MATHEMATICS Algebra, geometry, combinatoricsmarkl/teaching/ALGEBRA/Chapter7new.pdf · introduce you...

MATHEMATICSAlgebra, geometry, combinatorics

Dr Mark V Lawson

October 24, 2014

Contents

1 The nature of mathematics 11.1 What are algebra, geometry and combinatorics? . . . . . . . . 1

1.1.1 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.2 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . 51.1.3 Combinatorics . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 The scope of mathematics . . . . . . . . . . . . . . . . . . . . 81.3 Pure versus applied mathematics . . . . . . . . . . . . . . . . 91.4 The antiquity of mathematics . . . . . . . . . . . . . . . . . . 111.5 The modernity of mathematics . . . . . . . . . . . . . . . . . 121.6 The legacy of the Greeks . . . . . . . . . . . . . . . . . . . . . 141.7 The legacy of the Romans . . . . . . . . . . . . . . . . . . . . 151.8 What they didn’t tell you in school . . . . . . . . . . . . . . . 151.9 Further reading and links . . . . . . . . . . . . . . . . . . . . . 16

2 Proofs 192.1 How do we know what we think is true is true? . . . . . . . . 202.2 Three fundamental assumptions of logic . . . . . . . . . . . . 222.3 Examples of proofs . . . . . . . . . . . . . . . . . . . . . . . . 23

2.3.1 Proof 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.3.2 Proof 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.3.3 Proof 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 282.3.4 Proof 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.3.5 Proof 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2.4 Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372.5 Mathematics and the real world . . . . . . . . . . . . . . . . . 412.6 Proving something false . . . . . . . . . . . . . . . . . . . . . 412.7 Key points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422.8 Mathematical creativity . . . . . . . . . . . . . . . . . . . . . 43

i

ii CONTENTS

2.9 Set theory: the language of mathematics . . . . . . . . . . . . 432.10 Proof by induction . . . . . . . . . . . . . . . . . . . . . . . . 52

3 High-school algebra revisited 573.1 The rules of the game . . . . . . . . . . . . . . . . . . . . . . . 57

3.1.1 The axioms . . . . . . . . . . . . . . . . . . . . . . . . 573.1.2 Indices . . . . . . . . . . . . . . . . . . . . . . . . . . . 633.1.3 Sigma notation . . . . . . . . . . . . . . . . . . . . . . 663.1.4 Infinite sums . . . . . . . . . . . . . . . . . . . . . . . 68

3.2 Solving quadratic equations . . . . . . . . . . . . . . . . . . . 703.3 Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 763.4 The real numbers . . . . . . . . . . . . . . . . . . . . . . . . . 77

4 Number theory 814.1 The remainder theorem . . . . . . . . . . . . . . . . . . . . . . 814.2 Greatest common divisors . . . . . . . . . . . . . . . . . . . . 914.3 The fundamental theorem of arithmetic . . . . . . . . . . . . . 974.4 Modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . 108

4.4.1 Congruences . . . . . . . . . . . . . . . . . . . . . . . . 1094.4.2 Wilson’s theorem . . . . . . . . . . . . . . . . . . . . . 112

4.5 Continued fractions . . . . . . . . . . . . . . . . . . . . . . . . 1134.5.1 Fractions of fractions . . . . . . . . . . . . . . . . . . . 1134.5.2 Rabbits and pentagons . . . . . . . . . . . . . . . . . . 116

5 Complex numbers 1235.1 Complex number arithmetic . . . . . . . . . . . . . . . . . . . 1235.2 The fundamental theorem of algebra . . . . . . . . . . . . . . 131

5.2.1 The remainder theorem . . . . . . . . . . . . . . . . . . 1325.2.2 Roots of polynomials . . . . . . . . . . . . . . . . . . . 1345.2.3 The fundamental theorem of algebra . . . . . . . . . . 136

5.3 Complex number geometry . . . . . . . . . . . . . . . . . . . . 1415.3.1 sin and cos . . . . . . . . . . . . . . . . . . . . . . . . 1415.3.2 The complex plane . . . . . . . . . . . . . . . . . . . . 1415.3.3 Arbitrary roots of complex numbers . . . . . . . . . . . 1455.3.4 Euler’s formula . . . . . . . . . . . . . . . . . . . . . . 148

5.4 Making sense of complex numbers . . . . . . . . . . . . . . . . 1505.5 Radical solutions . . . . . . . . . . . . . . . . . . . . . . . . . 151

5.5.1 Cubic equations . . . . . . . . . . . . . . . . . . . . . . 151

CONTENTS iii

5.5.2 Quartic equations . . . . . . . . . . . . . . . . . . . . . 1545.5.3 Symmetries and particles . . . . . . . . . . . . . . . . . 156

5.6 Gaussian integers and factorizing primes . . . . . . . . . . . . 157

6 Rational functions 1596.1 Numerical partial fractions . . . . . . . . . . . . . . . . . . . . 1596.2 Analogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1626.3 Partial fractions . . . . . . . . . . . . . . . . . . . . . . . . . . 1636.4 Integrating rational functions . . . . . . . . . . . . . . . . . . 167

7 Matrices I: linear equations 1717.1 Matrix arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 171

7.1.1 Basic matrix definitions . . . . . . . . . . . . . . . . . 1717.1.2 Addition, subtraction, scalar multiplication and the

transpose . . . . . . . . . . . . . . . . . . . . . . . . . 1737.1.3 Matrix multiplication . . . . . . . . . . . . . . . . . . . 1757.1.4 Special matrices . . . . . . . . . . . . . . . . . . . . . . 1797.1.5 Linear equations . . . . . . . . . . . . . . . . . . . . . 1817.1.6 Conics and quadrics . . . . . . . . . . . . . . . . . . . 1827.1.7 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

7.2 Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 1867.2.1 Properties of matrix addition . . . . . . . . . . . . . . 1867.2.2 Properties of matrix multiplication . . . . . . . . . . . 1877.2.3 Properties of scalar multiplication . . . . . . . . . . . . 1887.2.4 Properties of the transpose . . . . . . . . . . . . . . . . 1897.2.5 Some proofs . . . . . . . . . . . . . . . . . . . . . . . . 189

7.3 Solving systems of linear equations . . . . . . . . . . . . . . . 1957.3.1 Some theory . . . . . . . . . . . . . . . . . . . . . . . . 1967.3.2 Gaussian elimination . . . . . . . . . . . . . . . . . . . 198

7.4 Blankinship’s algorithm . . . . . . . . . . . . . . . . . . . . . 206

8 Matrices II: inverses 2098.1 What is an inverse? . . . . . . . . . . . . . . . . . . . . . . . . 2098.2 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 2138.3 When is a matrix invertible? . . . . . . . . . . . . . . . . . . . 2178.4 Computing inverses . . . . . . . . . . . . . . . . . . . . . . . . 2238.5 The Cayley-Hamilton theorem . . . . . . . . . . . . . . . . . . 2278.6 Complex numbers via matrices . . . . . . . . . . . . . . . . . . 230

iv CONTENTS

9 Vectors 2319.1 Vector algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

9.1.1 Addition and scalar multiplication of vectors . . . . . . 2329.1.2 Inner, scalar or dot products . . . . . . . . . . . . . . . 2389.1.3 Vector or cross products . . . . . . . . . . . . . . . . . 2409.1.4 Scalar triple products . . . . . . . . . . . . . . . . . . . 243

9.2 Vector arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . 2459.2.1 i’s, j’s and k’s . . . . . . . . . . . . . . . . . . . . . . . 245

9.3 Geometry with vectors . . . . . . . . . . . . . . . . . . . . . . 2499.3.1 Position vectors . . . . . . . . . . . . . . . . . . . . . . 2499.3.2 Linear combinations . . . . . . . . . . . . . . . . . . . 2509.3.3 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2519.3.4 Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 2559.3.5 Determinants . . . . . . . . . . . . . . . . . . . . . . . 258

9.4 Summary of vectors . . . . . . . . . . . . . . . . . . . . . . . . 2639.5 *Two vector proofs* . . . . . . . . . . . . . . . . . . . . . . . 2669.6 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

Chapter 1

The nature of mathematics

This chapter is a guide to the mathematics described in this book.

1.1 What are algebra, geometry and combi-

natorics?

1.1.1 Algebra

Algebra started as the study of equations. The simplest kinds of equationsare ones like

3x− 1 = 0

where there is only one unknown x and that unknown occurs to the power1. This means we have x alone and not, say, x1000. It is easy to solve thisspecific equation. Add 1 to both sides to get

3x = 1

and then divide both sides by 3 to get

x =1

3.

This is the solution to my original equation and, to make sure, we check ouranswer by calculating

3 · 1

3− 1

1

2 CHAPTER 1. THE NATURE OF MATHEMATICS

and observing that we really do get 0 as required. Even this simple exampleraises an important point: to carry out these calculations, I had to knowwhat rules the numbers and symbols obeyed. You probably applied theserules unconsciously, but in this book it will be important to know explicitlywhat they are. The method used for the specific example above can beapplied to any equation of the form

ax+ b = 0

as long as a 6= 0. Here a, b are specific numbers, probably real numbers, andx is the real number I am trying to find. This equation is the most generalexample of a linear equation in one unknown.

If x occurs to the power 2 then we get

ax2 + bx+ c = 0

where a 6= 0. This is an example of a quadratic equation in one unknown. Youwill have learnt a formula to solve such equations. But there is no reason tostop at 2. If x occurs to the power 3 we get a cubic equation in one unknown

ax3 + bx2 + cx+ d = 0

where a 6= 0. Solving such equations is much harder than solving quadraticsbut there is also an algebraic formula for the roots. But there is no reason tostop at cubics. We could look at equations in which x occurs to the power 4,quartics, and once again there is a formula for finding the roots. The highestpower of x that occurs in such an equation is called its degree. These resultsmight lead you to expect that there are always algebraic formulae for findingthe roots of any polynomial equation whatever its degree. There aren’t.For equations of degree 5, the quintics, and more, there are no algebraicformulae which enable you to solve the equations. I don’t mean that noformulae have yet been discovered, I mean that someone has proved that sucha formula is impossible, that someone being the young French mathematicianEvariste Galois (1811–1832), the James Dean of mathematics. Galois’s workmeant the end of the view that algebra was about finding formulae to solveequations. We shall not study Galois’s work in this book but it has had ahuge impact on algebra. It is one of the reasons why the algebra you studylater in your university careers will look very different from the algebra youstudied at school. In fact, one of my goals in writing this book is to help younavigate this transition.

1.1. WHAT ARE ALGEBRA, GEOMETRY AND COMBINATORICS? 3

I have talked about solving equations where there is one unknown butthere is no reason to stop there. We can also study equations where thereare any finite number of unknowns and those unknowns occur to any powers.The best place to start is where we have any number of unknowns but eachunknown can occur only to the first power and no products of unknowns areallowed. This means we are studying linear equations like

x+ 2y + 3z = 4.

Our goal is to find all the values of x, y and z that satisfy this equation.Thus the solutions are ordered triples (x, y, z). For example, both (0, 2, 0)and (2, 1, 0) are solutions whereas (1, 1, 1) is not a solution. It is unusual tohave just one linear equation to solve. Usually we have two or more such as

x+ 2y + 3z = 4 and x+ y + z = 0.

We then need to find all the triples (x, y, z) that satisfy both equationssimultaneously. In fact, as you should check, all the triples

(λ− 4, 4− 2λ, λ)

where λ is any number satisfy both equations. For this reason, we oftenspeak about simultaneous linear equations. It turns out that solving systemsof linear equations never becomes difficult however many unknowns thereare. The modern way of studying systems of linear equations uses matrixtheory.

That leaves studying equations where there are at least 2 unknowns andwhere there are no constraints on the powers of the unknowns and the extentto which they may be multiplied together. This is much more complicated.If you only allow squares such as x2 or products of at most two unknowns,such as xy, then there are relatively simple methods for solving them. But,even here, strange things happen. For example, the solutions to

x2 + y2 = 1

can be written (x, y) = (sin θ, cos θ). If you allow cubes or products of morethan two unknowns then you enter the world of subjects like algebraic ge-ometry and even connect with current research.

In this book, I shall introduce you to the theory of polynomial equationsand also to the theory of linear equations. I shall also show you how to solveequations that look like this

ax2 + bxy + cy2 + dx+ ey + f = 0.


So far, I have been talking about the algebra of numbers. But I shall alsointroduce you to the algebra of matrices, and the algebra of vectors, and thealgebra of subsets of a set, amongst others. In fact, I think the first shockon encountering university mathematics can be summed up in the followingstatement.

There is not one algebra, but many different algebras, each de-signed for different purposes.

These different algebras are governed by different sets of rules. For thisreason, it becomes crucial in university mathematics to make those rulesexplicit. In this book, the algebra you studied at school I often call high-school algebra so we know what we are talking about.

In my description of solving equations, I have left to one side somethingthat probably seemed obvious: the nature of the solutions. These solutionsare of course numbers but what do we mean by ‘numbers’? You might thinkthat a number is a number but in mathematics this concept turns out tobe much more interesting than it might first appear. The everyday ideaof a number is essentially that of a real number. Informally, these are thenumbers that can be expressed as positive or negative decimals, with possiblyan infinite number of digits after the decimal place such as

π = 3 · 14159265358 . . .

where the dots indicate that this can be continued forever. Whilst suchnumbers are sufficient to solve linear equations in one unknown, they arenot enough to solve quadratics, cubics, quartics etc. These require the in-troduction of complex numbers which involve such apparent ineffabilities asthe square root of minus one. Because such numbers don’t occur in everydaylife, there is a temptation to view them as somehow artificial or of purelytheoretical interest. This is wrong with a capital w. All numbers are artifi-cial, in that they are artefacts of our imaginations that help us to understandthe world. Although you can see examples of two things you cannot see thenumber two. It is an idea, an abstraction. As for being of only theoreticalinterest, it is worth noting that quantum mechanics, the theory that explainsthe behaviour of atoms and their constituents, uses complex numbers in anessential way. In fact, for mathematicians the word ‘number’ usually means‘complex number’ and mathematics is unthinkable without them.


But this is not the end of our excavations of what we mean by the word‘number’. There are occasions when we want to restrict the solutions: wemight want whole number solutions or solutions as fractions. It turns outthat the usual high-school methods for solving equations don’t work in thesecases. For example, consider the equation

2x+ 4y = 3.

To find the real or complex solutions, we let x be any real or complex valueand then we can solve the equation to work out the corresponding value ofy. But suppose that we are only interested in whole number solutions? Infact, there are none. You can see why by noting that the lefthand side of theequation is exactly divisible by 2, whereas the righthand side isn’t. Whenwe are interested in solving equations, of whatever type, by means of wholenumbers or fractions we say that we are studying Diophantine equations. Thename comes from Diophantus of Alexandria who flourished around 250 CE,and who studied such equations in his book Arithmetica. It is ironic thatsolving Diophantine equations is often much harder than solving equationsusing real or complex numbers.

1.1.2 Geometry

If algebra is about manipulating symbols, then geometry is about pictures.The Ancient Greeks developed geometry to a very high level. Some of theirachievements are recorded in Euclid’s book the Elements which I shall havemore to say about later. It developed the whole of what became known asEuclidean geometry on the basis of a few rules known as axioms. This geom-etry gives every impression of being a faithful mathematical version of thegeometry of actual space and for that reason you might expect that, unlikealgebra, there is only one geometry and that’s that. In fact, it was discoveredin the nineteenth century that there are other mathematical geometries suchas spherical geometry and hyperbolic geometry. In the twentieth century, itbecame apparent that even the space we inhabit was much more complexthan it appeared. First came the four dimensional geometry of special rela-tivity and then the curved space-time of general relativity. Modern particlephysics suggests that there may be many more dimensions in real space thanwe can see. So, in fact, we have the following.


There is not one geometry, but many different geometries, eachdesigned for different purposes.

In this book, I will only talk about three-dimensional Euclidean geometry,but this is the gateway to all these other geometries.

This, however, is not the end of the story. In fact, any book about algebramust also be about geometry. The two are indivisible but it was not alwayslike that. Unlike geometry which began with a sort of Big Bang in AncientGreece, algebra crystallized much more slowly over time and in differentplaces. There is even some algebra, disguised, in the Elements. In the 17thcentury, Rene Descartes discovered the first connection between algebra andgeometry which will be completely familiar to you. For example, x2 + y2 = 1is an algebraic equation, but it also describes something geometric: a circleof unit radius centred on the origin. This connection between algebra andgeometry will play an important role in our study of linear equations andvectors. But it is just a beginning.

If you are studying an algebra look for an accompanying geometry,and if you are studying a geometry find a companion algebra.

This is quite a fancy way of saying things, but it boils down to the fact thatmanipulating symbols is often helped by drawing pictures, and sometimesthe pictures are to complex so it is helpful to replace them with symbols. It’snot a one-way street.

I want to give you some idea of why the connection between algebra andgeometry is so significant. Let me start with a problem that looks completelyalgebraic. Problem: find all whole numbers a, b, c that satisfy the equationa2 + b2 = c2. I’ll write solutions that satisfy this equation as (a, b, c). Suchnumbers are called Pythagorean triples. Thus (0, 0, 0) is a solution and so is(3, 4, 5), and I can put in minus signs since when squared they disappear so(−3, 4,−5) is a solution. In addition, if (a, b, c) is a solution so is (λa, λb, λc)where λ is any whole number. I shall now show that this problem is equivalentto one in geometry. Suppose first that a2 + b2 = c2. We exclude the casewhere c = 0 since then a = 0 and b = 0. We may therefore divide both sidesby c2 and get (a

c

)2

+

(b

c

)2

= 1.


Recall that a rational number is a real number that can be written in theform u

vwhere u and v are whole numbers and v 6= 0. It follows that

(x, y) =

(a

c,b

c

)is a rational point on the unit circle; that is, a point with rational co-ordinates.On the other hand, if

(x, y) =

(m

n,p

q

)is a rational point on the unit circle then

(mq)2

(nq)2+

(np)2

(nq)2= 1.

Thus (mq, pn, nq) is a Pythagorean triple. We may therefore interpret ouralgebraic question as a geometric one: to find all Pythagorean triples, findall those points on the unit circle with centre the origin whose x and y co-ordinates are both rational. In fact, this can be used to get a very nicesolution to the original algebraic problem as we shall show later.

1.1.3 Combinatorics

The term ‘combinatorics’ may not be familiar though the sorts of questionsit deals with are. Combinatorics is the branch of mathematics that dealswith arrangements and the counting of arrangements. The fact that it dealsin counting makes it sound like this should be an easy subject. In fact, it isoften very difficult. For example, counting lies behind probability theory, asubject that can often defy intuition. Let me give you a simple example. In aclass of, say, 25 students, how likely do you think it is that two students willshare the same birthday? By this I mean, the same date and month, thoughnot year. Unless you’ve seen this problem before, I think the instinct is to say‘not very’. This is because we imagine in our mind’s eye those 25 studentsto be arranged across 365 days without any pair of students landing on thesame date. In fact the answer, which you can calculate using the methods ofthis book, is just over a half. In other words, there is the same chance of twostudents sharing the same birthday as there is of tossing a coin and gettingheads. This little problem is often known as the birthday paradox. It is agood example of where maths can be used to correct our faulty intuition. But


this is really a counting problem. To get the right answers to such problems,you need to think about what you are counting in the right way.

1.2 The scope of mathematics

The most common replies to the question ‘what is mathematics?’ addressedto a non-mathematician are usually the depressing ‘arithmetic’ or ‘accoun-tancy’. Asked what they remember about school maths and they might beable to dredge up some more-or-less arcane words with challenging spellings:hypotenuse, isosceles, parallelogram. It either sounds a bit boring or a bitweird, but in any event is so obviously completely removed from real life thatit can safely be ignored.

Mathematics, therefore, has an image problem.I think part of the reason for this is the kind of maths that is taught in

schools and the way it is taught. School mathematics suffers by being basedon the narrow syllabuses proscribed by examining boards under politicaldirection. As a result, it is more by luck than design if anyone at schoolgets an idea of what maths is actually about. In addition, teaching too oftenmeans teaching to the exam, which means working through past exam papersand learning tricks1.

Let me begin by showing you just how vast a subject mathematics reallyis. The official Mathematics Subject Classification currently divides math-ematics into 64 broad areas in any one of which a mathematician couldwork their entire professional life. You can see what they are in the box.By the way, the missing numbers are deliberate and not because I cannotcount.

Mathematics Subject Classification 2010 (adapted)

00. General 01. History and biography 03. Mathematical logicand foundations 05. Combinatorics 06. Order theory 08. Gen-eral algebraic systems 11. Number theory 12. Field theory 13.Commutative rings 14. Algebraic geometry 15. Linear and multi-linear algebra 16. Associative rings 17. Non-associative rings 18.Category theory 19. K-theory 20. Group theory and generaliza-

1I say teaching and not teachers. My criticism is directed at policy not those who areforced to carry out that policy often under enormous pressures.

1.3. PURE VERSUS APPLIED MATHEMATICS 9

tions 22. Topological groups 26. Real functions 28. Measureand integration 30. Complex functions 31. Potential theory 32.Several complex variables 33. Special functions 34. Ordinary dif-ferential equations 35. Partial differential equations 37. Dynamicalsystems 39. Difference equations 40. Sequences, series, summa-bility 41. Approximations and expansions 42. Harmonic analysis43. Abstract harmonic analysis 44. Integral transforms 45. Integralequations 46. Functional analysis 47. Operator theory 49. Calcu-lus of variations 51. Geometry 52. Convex geometry and discretegeometry 53. Differential geometry 54. General topology 55.Algebraic topology 57. Manifolds 58. Global analysis 60. Proba-bility theory 62. Statistics 65. Numerical analysis 68. Computerscience 70. Mechanics 74. Mechanics of deformable solids 76.Fluid mechanics 78. Optics 80. Classical thermodynamics 81.Quantum theory 82. Statistical mechanics 83. Relativity 85. As-tronomy and astrophysics 86. Geophysics 90. Operations research91. Game theory 92. Biology 93. Systems theory 94. Informationand communication 97. Mathematics education

Each of these broad areas is then subdivided into a large number of smallerareas, any one of which could be the subject of a PhD thesis. This is a littleoverwhelming, so to make it more manageable it can be summarized, veryroughly, into the following ten areas:

Algebra Number theoryCalculus and analysis Probability and statisticsCombinatorics Differential equationsGeometry and topology Mathematical physicsLogic Computing

Most undergraduate courses will fit under one of these headings. But it isimportant to remember that mathematics is one subject — dividing it upinto smaller areas is done for convenience only. When solving a problem anyand all of the above areas might be needed.

1.3 Pure versus applied mathematics

Sometimes a distinction is drawn between pure and applied mathematics.Pure maths is supposed to be maths done for its own sake with no thought to


applications, whereas applied maths is maths used to solve some, presumablypractical, problem. I think there is often an implicit moralistic undertone tothis distinction with pure maths being viewed as perhaps rather self-indulgentand decorative, and applied maths as socially responsible grown-up mathsthat pays its way. Politicians prefer applied maths because they think it willmake money. Evidence for this distinction is the following quote from theEnglish mathematician G. H. Hardy (1877–1947) that is often used to provethe point:

“I have never done anything ‘useful’. No discovery of mine hasmade, or is likely to make, directly or indirectly, for good or ill,the least difference to the amenity of the world.”

Hardy was a truly great mathematician and a decent human being. Ashis dates show, he was of the generation that witnessed the First WorldWar where science and technology were applied to the business of wholesaleslaughter. His views on maths are therefore a not unnatural reaction onthe part of someone who taught young people who then went to war neverto return. Maths for him was perhaps a sanctuary2. In reality, the termspure and applied are extremely fuzzy. A mathematician might start work onsolving a real-life problem and then be led to develop new pure mathematics,or start in pure maths and develop an application. Calculus, for example,developed mainly out of the need to solve problems in physics and then wasapplied to pure maths. Complex numbers couldn’t have been more pure,introduced to provide the missing roots to polynomial equations, but are nowthe basis of quantum mechanics. In reality, there is just one mathematics.

The Banach-Tarski Paradox

The glory of mathematics is often to be found in its sheer weirdness.For a universe founded on logic, it can lead to some pretty confoundingconclusions. For example, a solid the size of a pea may be cut into afinite number of pieces which may then be reassembled in such a way asto form another solid the size of the sun. This is known as the Banach-Tarski Paradox (1924). There’s no trickery involved here and no sleightof hand. This is clearly pure maths — give me a real pea and whateverI do it will remain resolutely pea-sized — but the ideas it uses involve

2There was a similar reaction at the end of the Second World War amongst physicistswho turned instead to biology as an alternative to building weapons.

1.4. THE ANTIQUITY OF MATHEMATICS 11

such fundamental and seemingly straightforward notions as length, areaand volume that have important applications in applied maths.

1.4 The antiquity of mathematics

The history of chemistry or astronomy is not hugely relevant, however inter-esting it may be, to modern theories of chemistry or astronomy. A few hun-dred years ago, chemistry was alchemy and astronomy was astrology: modernchemists are not searching for the philosopher’s stone and astronomers don’tcaste horoscopes. Alchemists and astrologers are often the forbears theywould prefer to forget.3 Maths is different, since what was mathematicallytrue hundreds of years ago remains true today. Here is a famous example.Plimpton 322 is a small clay tablet kept in the George A. Plimpton Collectionat Columbia University dating to about 1,800 BCE. Impressed on the tabletare a number of columns of numbers written in cuneiform. The numbers arewritten not in base 10 but in base 60, the base that still lies behind the waywe tell the time and measure angles. The meaning and purpose of this claytablet is much disputed. But the second and third columns consist of thefollowing numbers, where I have given the usual corrected numbers. I havegiven the first seven lines of the table — there are fifteen in the original.

B C

1 119 1692 3367 48253 4601 66494 12709 185415 65 976 319 4817 2291 3541

If you calculate C2−B2 you will get a perfect square D2. Thus (B,D,C) isa Pythagorean triple. How such large Pythagorean triples were computed isa mystery.

This antiquity, combined with the fact that maths is a cumulative subject,meaning that you have to learnX before you can learn Y , has the unfortunate

3I am exaggerating a little here for rhetorical purposes. In fact, much fine work wascarried out under the guise of alchemy and astrology.


effect that most of the mathematics you learnt at school was invented before1800. Here is a very rough chronology.

BCE CE2000 Solving quadratics 1550 Solving cubics and quartics400 Existence of irrational numbers 1590 Logarithms300 Euclidean geometry 1630 Analytic geometry200 Conics 1675 Calculus

1700 Probability1795 Complex numbers

Only matrices (1850) and vectors (1880) were introduced more recently. How-ever, if you think of all the developments in physics since 1800 such as blackholes, the big bang theory, parallel universes, quantum then you might sus-pect that there have also been big developments in mathematics. There have,but you would be forgiven for not knowing about them because they are notpromoted in the media or taught in school.

I should add that like any other field of human endeavour, it is of coursetrue that mathematical ideas go in and out of fashion, but crucially theydon’t become wrong with time.

1.5 The modernity of mathematics

The fact that what’s taught in schools doesn’t seem to change much fromgeneration to generation leads to one of the biggest misconceptions aboutmathematics: that it has already all been discovered. To try and bring youup to date, I am going to say a little about three mathematicians and theirwork: Alan Turing (1912–1954), Sir Andrew Wiles (b. 1953), and TerenceTao (b. 1975). I have chosen them to illustrate some additional points I wantto make about maths.

Alan Turing

Alan Turing is the only mathematician I know who has had a West Endplay written about his life: the 1986 play Breaking the code by Hugh White-more. Turing is best known as one of the leading members of Bletchley Parkduring the Second World War, for his role in the British development ofcomputers during and after the War, and for the ultimately tragic nature of

1.5. THE MODERNITY OF MATHEMATICS 13

his early death. Here I want to return to Turing the mathematician. As agraduate student, he wrote a paper in 1936 entitled On computable numberswith an application to the Entscheidungsproblem, where the long Germanword means decision problem and refers to a specific question in mathemat-ical logic. It was as a result of solving this problem that Turing was led toformulate a precise mathematical blueprint for a computer now called Tur-ing machines in his honour. This is the most extreme example I know ofa problem in pure maths leading to new applied maths — in fact, it led tothe whole field of computer science and the information age we now inhabit.Amongst computer scientists, Turing is regarded as the father of computerscience. So, mathematicians invented the modern world.

Andrew Wiles

Mathematicians operate on a completely different timescale from everyoneelse. I have already talked about Pythagorean triples, those whole numbers(x, y, z) that satisfy the equation x2 +y2 = z2. Here’s an idle thought. Whathappens if we try to find whole number solutions to x3+y3 = z3 or x4+y4 = z4

or more generally xn + yn = zn where n ≥ 3. Let’s exclude the trivial casewhere some of the numbers x, y or z are 0. So, here is the question: for n ≥ 3find all whole number solutions to xn + yn = zn where xyz 6= 0. Back in the17th century, Pierre de Fermat (1601?–1665) wrote in the margin of a book,the Arithmetica of Diophantus, that he had found a proof that there wereno such solutions but that sadly there wasn’t enough room for him to recordit. This became known as Fermat’s Last Theorem. In fact, since Fermat’ssupposed proof was never found, it was really a conjecture. More to thepoint, it is highly unlikely that he ever had a proof since in the subsequentcenturies many attempts were made to prove this result, all in vain, althoughsubstantial progress was made. This problem became one of mathematics’many Mount Everests: the peak that everyone wanted to scale. Finally, onMonday 19th September, 1994, sitting at his desk, Andrew Wiles, buildingon over three centuries of work, and haunted by his premature announcementof his success the previous year, had a moment of inspiration as the followingquote from the Daily Telegraph dated 3rd May 1997 reveals

“Suddenly, totally unexpectedly, I had this incredible revelation.It was so indescribably beautiful, it was so simple and so elegant.”

As a result Fermat’s Conjecture really is a theorem, but the proof requiredtravelling through what can only be described as mathematical hyperspace.


Wiles’s reaction to his discovery is also a glimpse of the profound intellectualexcitement that engages the emotions as well as the intellect when doingmathematics4.

Terence Tao

Tao won the 2006 Field’s medal. This is a mathematical honour compa-rable with a Nobel Prize though with the added twist that you have to beunder 40 to get one. You can read his thoughts at his blog, as well as use itto find all manner of interesting things. So, what sorts of things does he do?Here is one example that is remarkably easy to explain though the proof isformidable. You know what primes are and, in any event, we shall talk aboutthem later. They can be regarded as the atoms of numbers and their prop-erties have inspired hard questions and deep results. One of the things thatinterests mathematicians is the sorts of patterns that can be found in primes.An arithmetic progression is a sequence of numbers of the form a+ dk wherea and d are fixed numbers. Consider the arithmetic progression 3 + 2k. Ob-serve that for the consecutive values of k = 0, 1, 2, the numbers 3, 5, 7 whicharise are all prime. But when k = 3 we get 9 which is not prime. Our littleexample is an instance of an arithmetic progression with 3 terms all prime.Here is one with 10 terms 199 + 210k where k = 0, 1, . . . , 9. In 2004, Taoand his colleague Ben Green proved that there were arithmetic progressionsof arbitrary length all of whose terms are prime. In other words, for anynumber n there is an arithmetic progression so that the first n terms are allprime.

1.6 The legacy of the Greeks

The word ‘mathematics’ is Greek. In fact, many mathematical terms areGreek: lemma, theorem, hypotenuse, orthogonal, polygon, to name just afew. The Greek alphabet is used as a standard part of mathematical nota-tion. The very concept of a mathematical proof is a Greek idea. All of thisreflects the fact that Ancient Greece is the single most important historicalinfluence on the development and content of mathematics. By Ancient Greek

4There is a BBC documentary directed by Simon Singh about Andrew Wiles madefor the BBC’s Horizon series. It is an exemplary example of how to portray complexmathematics in an accessible way and cannot be too highly recommend.

1.7. THE LEGACY OF THE ROMANS 15

mathematics, I mean the mathematics developed in the wider Greek worldaround the Mediterranean in the thousand or more years between roughly600 BCE and and 600 CE. It begins with the work of semi-mythical figures,such as Thales of Miletus and Pythagoras of Samos, and is developed in thebooks of such mathematicians as Euclid, Archimedes, Apollonius of Perga,Diophantus and Pappus. Of all the Ancient Greek mathematicians the great-est was Archimedes. His work is sophisticated mathematics of the highestorder. In particular, he developed methods that are close to those of integralcalculus and used them to calculate areas and volumes of complicated curvedshapes.

1.7 The legacy of the Romans

For all their aqueducts, roads, baths and maintenance of public order, it hasbeen said of the Romans that their only contribution to mathematics waswhen Cicero rediscovered the grave of Archimedes and had it restored5.

1.8 What they didn’t tell you in school

This book is written to help you make the transition from school maths touniversity maths. You might well still be in school, or you might have leftschool fifty years ago, it doesn’t matter. Maths as taught in school and themaths taught at university are very different, but the failure to understandthose differences can cause problems. To be successful in university mathe-matics you have to think in new ways. University Mathematics is not justSchool Mathematics with harder sums and fancier notation, it is different,fundamentally different, from what you did at school.

In much of school mathematics, you learn methods for solving spe-cific problems. Often, you just learn formulae.

A method for solving a problem that requires little thought in its appli-cation is called an algorithm. Computer programs are the supreme examplesof algorithms, and it is certainly true that finding algorithms for solving spe-cific problems is an important part of mathematics, but it is by no means the

5George Simmons, Calculus Gems, McGraw-Hill, Inc., New York, 1992, page 38.


only part. Problems do not come neatly labelled with the methods neededfor their solution. A new problem might be solvable using old methods orit might require you to adapt those methods. On the other hand, you mayhave to invent completely new methods to solve it. Such new methods re-quire new ideas. In fact, what you might not have appreciated from schoolmathematics is the important role played in mathematics by ideas. An ideais a tool to help you think.

Mathematics at school is often taught without reasons being givenfor why the methods work.

This is the fundamental difference between school mathematics and uni-versity mathematics. A reason why something works is called a proof. I shallsay a lot more about proofs in Chapter 2.

The Millennium Problems

Mathematics is difficult but intellectually rewarding. Just how hard canbe gauged by the following. The Millennium Problems is a list of sevenoutstanding problems posed by the Clay Institute in the year 2000. Acorrect solution to any one of them carries a one million dollar prize.To date, only one has been solved, the Poincare conjecture, by GrigoriPerelman in 2010, who declined to take the prize money. The point isthat no one offers a million dollars for something that is trivial. You canread more about these problems at

http://www.claymath.org/millennium-problems

1.9 Further reading and links

There is a wealth of material about mathematics available on the Web andI would encourage exploration. Here, I will point out some books and linksthat develop the themes of this chapter. A book that is in tune with thegoals of this chapter is

P. Davis, R. Hersh, E. A. Marchisotto, The mathematical experience, Birkhauser,2012.

1.9. FURTHER READING AND LINKS 17

It’s one of those books that you can dip into and you will learn somethinginteresting but, most importantly, it will expand your understanding of whatmathematics is, as it did mine.

A good source book for the history of mathematics, and again somethingthat can be dipped into, is

C. B. Boyer, U. C. Merzbach, A history of mathematics, Jossey Bass, 3rdEdition, 2011.

The books above are about maths rather than doing maths. Let me nowturn to some books that do maths in a readable way. There is a plethoraof popular maths books now available, and if you pick up any books by IanStewart — though if the book appears to be rather more about volcanoesthan is seemly in a maths book, you have Iain Stewart — and Peter Higginsthen you will find something interesting. Sir (William) Timothy Gowers wona Field’s Medal in 1998 and so can be assumed to know what he is talkingabout.

T. Gowers, Mathematics: A Very Short Introduction, Oxford UniversityPress, 2002

It is worth checking out his homepage for some interesting links. He alsohas his own blog which is worth checking out. I think the Web is serving tohumanize mathematicians: their ivory towers all have wi-fi. A classic bookof this type is

R. Courant, H. Robbins, What is mathematics, OUP, 1996.

This is also an introduction to university-level maths, and it has influencedmy thinking on the subject.

If you have never looked into Euclid’s book the Elements, then I wouldrecommend you do6. There is an online version that you can access via DavidE. Joyce’s website at Clark University. A handsome printed version, editedby Dana Densmore, has been published by Green Lion Press, Santa Fe, NewMexico.

6Whenever I refer to Euclid, it will always be to this book. It consists of thirteenchapters, themselves called ‘books’, which are numbered in the Roman fashion I–XIII.


Finally, let me mention the books of Martin Gardner. For a quarter ofa century, he wrote a monthly column on recreational mathematics for theScientific American which inspired amateurs and professionals alike. I wouldstart with

M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi:Martin Gardner’s first book of mathematical puzzles and games, CUP, 2002

and follow your interests.

Chapter 2

Proofs

Part of the argument sketch, Monty Python

M = Man looking for an argumentA = Arguer

M: An argument isn’t just contradiction.A: It can be.M: No it can’t. An argument is a connected series of statements intended toestablish a proposition.A: No it isn’t.M: Yes it is! It’s not just contradiction.A: Look, if I argue with you, I must take up a contrary position.M: Yes, but that’s not just saying ‘No it isn’t.’A: Yes it is!M: No it isn’t!A: Yes it is!M: Argument is an intellectual process. Contradiction is just the automaticgainsaying of any statement the other person makes.(short pause)A: No it isn’t.

The most fundamental difference between school and university mathe-matics lies in proofs. At school, you were probably told mathematical factsand given recipes that solved particular kinds of problems. But the chances

19

20 CHAPTER 2. PROOFS

are, you were not given any reasons to back up those facts or explanationas to why those recipes worked. University and professional mathematics isdifferent. Reasons and explanations are essential and are called proofs. Theyare the essence of mathematics. Mathematical truth, and the notion of proofthat supports it, is so different from what we encounter in everyday life thatI shall need to begin by setting the scene.

2.1 How do we know what we think is true is

true?

Human beings usually believe something first for emotional reasons, andthen look for the evidence to back it up. The pitfalls of this are obvious.We shall therefore be interested in reasons that do not involve emotion. Tobe concrete, how would you verify the following claim: Mount Everest isbetween 8 and 9 km high?

The appeal to authority

In the past, claims such as this would be resolved by consulting an en-cyclopedia or atlas whereas today, of course, we would simply go online. Ifyou do this, you will find that a height of about 8.8 km is quoted. For mostpurposes this would settle things. But it’s important to understand whatthis entails. We are, in effect, taking someone’s word for it. We assume thatwhoever posted this information knows what they are talking about. Whatwe are doing, therefore, is appealing to authority. Most of what we take tobe true is based on such appeals to authority: parents, teachers, politicians,religiosi etc tell us things that they claim to be true and more often than notwe believe them. There’s a small element of laziness involved on our part,but it is so convenient. The pitfalls of this are also obvious.

The appeal to experiment

But where did the figure of 8.8km come from? It wasn’t just pluckedfrom the sky. The height of Mount Everest was first measured as part of thegreat survey of India undertaken in the nineteenth century. This consisted ofa team of expert surveyers who not only employed extremely precise instru-ments that were used to take multiple measurements but who also tried to

2.1. HOW DO WE KNOW WHAT WE THINK IS TRUE IS TRUE? 21

minimize the effect of factors influencing the accuracy of their measurementssuch as temperature and, amazingly, variations in gravity. Making measure-ments and taking great pains over those measurements together with esti-mations of the error bounds is such an important part of science that scienceitself would be impossible without it. Let’s call this the appeal to experiment.

This brings me to how we know statements are true in mathematics. Theessential point is the following:

Neither of the above methods for ascertaining truth plays anyrole whatsoever in determining mathematical truth.

This is so important, I am going to say it again in a different way:

• Results are not true in maths because I say so or because someoneimportant said they were true a long time ago.

• Results are not true in mathematics because I have carried out exper-iments and I always get the same answer.

• Results are not true in maths ‘just because they are’.

How then can we determine whether something in mathematics is true?

• Results are true in maths only because they have been proved to betrue.

• A proof shows that a result is true.

• A proof is something that you yourself can follow and at the end youwill see the truth of what has been proved.

• A result that has been proved to be true is called a theorem.

• The appeal to authority and the appeal to experiment are both fallible.The appeal to proof is never fallible. The only truths we know forcertain are mathematical truths.

This is heady stuff. So what, then, is a proof? The remainder of thischapter is devoted to an introductory answer to this question.


2.2 Three fundamental assumptions of logic

In order to understand how mathematical proofs work, there are three sim-ple, but fundamental, assumptions you have to understand.

I. Mathematics only deals in statements that are capableof being either true or false.

Mathematics does not deal in statements which are ‘sometimes true’ or‘mostly false’. There are no approximations to the truth in mathematics andno grey areas. Either a statement is true or a statement is false, though wemight not know which. This is quite different from everyday life, where weoften say things which contain a grain of truth or where we say things forrhetorical reasons which we don’t entirely mean. Mathematics also doesn’tdeal in statements that are neither true nor false like exclamations such as‘Out damned spot!’ or with questions such as ‘To be or not to be?’.

II. If a statement is true then its negation is false, and ifa statement is false then its negation is true.

In natural languages, negating a sentence is achieved in different ways.In English, the negation of ‘It is raining’ is ‘It is not raining’. In French,the negation of ‘Il pleut’ is obtained by wrapping the verb in ‘ne . . . pas’ toget ‘It ne pleut pas’. To avoid grammatical idiosyncracies, we can use theformal phrase ‘it is not the case that’ and place it in front of any sentenceto negate it. So, ‘It is not the case that it is raining’ is the negation of ‘It israining’. In some languages, and French is one of them, adding negatives isused for emphasis. This used to be the case in older forms of English and isoften the case in informal English. In formal English, we are taught that twonegatives make a positive which is actually the rule taken from mathematicsabove where it is true. In fact, negating negatives in natural languages ismore complex than this. For example, if your partner says they are ‘not un-happy’ then this isn’t quite the same as being ‘happy’ and maybe you needto talk.

III. Mathematics is free of contradictions.

2.3. EXAMPLES OF PROOFS 23

A contradiction is where both a statement and its negation are true. Thisis impossible by (II) above. This assumption will play a vital role in proofsas we shall see later.

2.3 Examples of proofs

Armed with the three assumptions above, I am going to take you throughfive proofs of five results, three of them being major theorems. This willenable me to show you examples of proofs but will also illustrate importantissues about how proofs, and mathematics, work.

Although proofs can be long or short, hard or easy they all tend to followthe same script. First, there will be a statement of what is going to beproved. This usually has the form: if a bunch of things are assumed truethen something else is also true. If the things assumed true are lumpedtogether as A, for assumptions, and the thing to be proved true is labelledC, for conclusion, then a statement to be proved usually has the shape ‘if Athen C’ or ‘A implies C’ or, in notation, ‘A ⇒ C’. The proof itself shouldbe thought of as a (rational) argument between two protagonists whom weshall call Alice and Bob. We assume that Alice wants to prove C. She canuse any of the assumptions A, any previously proved theorems, the rules oflogic, which I shall describe as we meet them, and definitions. Bob’s roleis to act like an attorney and to demand that Alice justify each claim shemakes. Thus Alice cannot just make assertions without justifying them, andshe is limited in the sorts of things that count as justifications. At the endof this, Alice can say something like ‘ . . . and so C is proved’ and Bob willbe forced to agree.

2.3.1 Proof 1

We shall prove the following statement.

The square of an even number is even, and the square of an oddnumber is odd.

In fact, this is really two statements ‘If n is an even number then n2 is even’and ‘If n is an odd number then n2 is odd.’ Before we can prove them, we


need to understand what they are actually saying. The terms odd and evenare only used of whole numbers such as

0, 1, 2, 3, 4, . . .

These numbers are called the natural numbers and they are the first kindsof numbers we learn about as children. Thus we are being asked to prove astatement about natural numbers. The terms ‘odd’ and ‘even’ might seemobvious, but we need to be clear about how they are used in maths. Bydefinition, a natural number n is even if it is exactly divisible by 2, otherwiseit is said to be odd. In maths, we usually just say divisible rather than exactlydivisible. This definition of divisibility only makes sense when talking aboutwhole numbers. For fractions, for example, it is pointless since one fractionwill always divide another fraction. Notice that 0 is an even number because0 = 2 × 0. In other words, 0 is exactly divisible by 2. However, remember,you cannot divide by 0 but you can certainly divide into 0. You might havebeen told that a number is even if its last digit is one of the digits 0, 2, 4, 6, 8.In fact, this is a consequence of our definition rather than a definition itself.I shall ask you to prove this result in the exercises. I shall say no more aboutthe definition of even. What about the definition of odd? A number is oddif it is not even. This is not a very useful definition since a number is oddif it fails to be even. We want a more positive characterization. So we shalldescribe a better one. If you attempt to divide a number by 2 then there aretwo possibilities: either it goes exactly, in which case the number is even, orit goes so many times plus a remainder of 1, in which case the number is odd.It follows that a better way of defining an odd number n is one that can bewritten n = 2m + 1 for some natural number m. So, the even numbers arethose natural numbers that are divisible by 2, thus the numbers of the form2n for some n, and the odd numbers are those that leave the remainder 1when divided by 2, thus the numbers of the form 2n + 1 for some n. Everynumber is either odd or even but not both.

There is a moral to be drawn from what I have just done, and I shallstate it boldly because of its importance. It may seem obvious but experi-ence shows that it is, in fact, not.

Every time you are asked to prove a statement, you must ensurethat you understand what that statement is saying. This means,in particular, checking that you understand what all the words in


the statement mean.

The next point is that we are making a claim about all even numbers. Ifyou pick a few even numbers at random and square them then you will findin every case that the result is even but this does not prove our claim. Even ifyou checked a trillion even numbers and squared them and the results were alleven it wouldn’t prove the claim. Maths, remember, is not an experimentalscience. There are plenty of examples in maths of statements that look trueand are true for umpteen cases but are in fact bunkum.

This means that, in effect, we have to prove an infinite number of state-ments: 02 is even, and 22 is even, and 42 is even . . . I cannot therefore provemy claim by picking a specific even number, like 12, and checking that itssquare is even. This simply verifies one of the infinitely many statementsabove. As a result, the starting point for my proof cannot be a specific evennumber. It has to be a general even number. We are now in a position toprove our claims.

First, we prove that the square of an even number is even.

1. Let n be an even number. This is the assumption that gets the ballrolling. Notice that n is not a specific even number. We want to provesomething for all even numbers so we cannot argue with a specific one.

2. Then n = 2m for some natural number m. Here we are using thedefinition of what it means to be an even number.

3. Square both sides of the equation in (2) to get n2 = 4m2. To do thiscorrectly, you need to follow the rules of high-school algebra.

4. Now rewrite this equation as n2 = 2(2m2). This uses more basic high-school algebra.

5. Since 2m2 is a natural number, it follows that n2 is even using ourdefinition of an even number. This proves our claim.

Second, we prove that the square of an odd number is odd. I’ll provide lesscommentary than in the previous case.

1. Let n be an odd number.

2. By definition n = 2m+ 1 for some natural number m.


3. Square both sides of the equation in (2) to get n2 = 4m2 + 4m+ 1.

4. Now rewrite the equation in (3) as n2 = 2(2m2 + 2m) + 1.

5. Since 2m2 + 2m is a natural number, it follows that n2 is odd using ourdefinition of an odd number. This proves our claim.

We have therefore proved our two claims. I admit that they are notexciting but just bear with me.

2.3.2 Proof 2

We shall prove the following statement.

If the square of a number is even then that number is even, andif the square of a number is odd then that number is odd.

In fact, this is really two statements ‘If n2 is even then n is even’ and ‘If n2

is odd then n is odd’. At first reading, you might think that I am simplyrepeating what I proved above. But in Proof 1, I proved

‘if n is even then n2 is even’

whereas now I want to prove

‘if n2 is even then n is even’.

Our assumptions in each case are different and our conclusions in each caseare different. It is therefore important to distinguish between A ⇒ B andB ⇒ A. The statement B ⇒ A is called the converse of the statementA ⇒ B. Experience shows that people are prone to swapping assumptionsand conclusions without being aware of it.

We prove the first claim.

1. Suppose that n2 is even.

2. Now it is very tempting to try and use the definition of even here, justas we did in Proof 1, and write n2 = 2m for some natural number m.But this turns out to be a dead-end. Just like playing a game such aschess, not every possible move is a good one. Choosing the right movecomes with experience and sometimes just plain trial-and-error.


3. So we make a different move. We know that n is either odd or even.Our goal is to prove that it must be even.

4. Could n be odd? The answer is no, because as we showed in Proof 1,if n is odd then, as we showed above, n2 is odd.

5. Therefore n is not odd.

6. But a number that is not odd must be even. It follows that n is even.

We use a similar strategy to prove the second claim.

The proofs here were more subtle, and less direct, than in our first ex-ample and they employed the following important strategy: if there are twopossibilities exactly one of which is true; we rule out one of those possibilitiesand so deduce that the other possibility must be true.1

Here is a concrete example. There are two politicians, Alice and Bob.One of them always lies and the other always tells the truth. Suppose youask Bob the question: is it true that 2 + 2 = 5? If he replies ‘yes’ then youknow Bob is lying. Without further ado, you can deduce that Alice is thatparagon of politicians and always tells the truth.

If A ⇒ B and B ⇒ A then we say that A if, and only, if B or A iff Bor A ⇔ B. The use of the word iff is peculiar to mathematical English. Ifwe combine Proofs 1 and 2, we have proved the following two statements forall natural numbers n: ‘n is even if, and only if, n2 is even’ and ‘n is odd if,and only if, n2 is odd’.

It is important to remember that the statement ‘A if, and only, if B’ is infact two statements in one. It means (1) ‘A implies B’ and (2) ‘B impliesA’. So, to prove the statement ‘A if and only if B’ we have to prove TWOstatements: we have to prove ‘A implies B’ and we have to prove ‘B impliesA’.

The results of this example were trickier to prove than the previous ones,but not much more exciting. However, we have now laid the foundations fora truly remarkable result.

1This might be called the Sherlock Holmes method. “How often have I said to you thatwhen you have eliminated the impossible, whatever remains, however improbable, mustbe the truth?” The Sign of Four, 1890.


2.3.3 Proof 3

We shall now prove our first real theorem.

√2 cannot be written as an exact fraction.

If you square each of the fractions in turn

3

2,7

5,17

12,41

29, . . .

you will find that you get closer and closer to 2 and so each of these numbersis an approximation to the square root of 2. This raises the question: isit possible to find a fraction x

ywhose square is exactly 2? In fact, it isn’t

but that isn’t proved just because my attempts above failed. Maybe, I justhaven’t looked hard enough. So, I have to prove that it is impossible. Toprove that

√2 is not an exact fraction, I am actually going to begin by trying

to show you that it is.

1. Suppose that√

2 = xy

where x and y are positive whole numbers wherey 6= 0.

2. We may assume that xy

is a fraction in its lowest terms so that the onlynatural number that divides both x and y is 1. Keep your eye on thisassumption because it will come back to sting us later.

3. Square both sides of the equation in (2) to get 2 = x2

y2.

4. Multiply both sides of the equation in (3) by y2.

5. We therefore get the equation 2y2 = x2.

6. Since 2 divides the lefthandside of this equation, it must divide therighthandside. This means that x2 is even.

7. We now use Proof 2 to deduce that x is even.

8. We may therefore write x = 2u for some natural number u.

9. Substitute this value for x we have found in (5) to get 2y2 = 4u2.

10. Divide both sides of the equation in (9) by 2 to get y2 = 2u2.


11. Since the righthand-side of the equation in (10) is even so is the left-handside. Thus y2 is even.

12. Since y2 is even, it follows by Proof 2, that y is even.

13. If (1) is true then we are led to the following two conclusions. From (2),we have that the only natural number to divide both x and y is 1. From(7) and (12), 2 divides both x and y. This is a contradiction. Thus (1)cannot be true. Hence

√2 cannot be written as an exact fraction.

This result is phenomenal. It says that no matter how much money youspend on a computer it will never be able to calculate the exact value of√

2, just a very, very good approximation. We now make a very importantdefinition. A real number that is not rational is called irrational. We havetherefore proved that

√2 is irrational.

2.3.4 Proof 4

We now prove our second real theorem.

The sum of the angles in a triangle add up to 180◦.

This is a famous result that everyone knows. You might have learnt aboutit at school by drawing lots of triangles and measuring their angles but asI said above, maths is not an experimental science and so this enterprizeproves nothing. The proof I give is very old and occurs in Euclid’s book theElements: specifically, Book I, Proposition 32. Draw a triangle and call itsthree angles α, β and γ respectively.

α γ

β

Our goal is to prove that

α + β + γ = 180◦.

In fact, we shall show that the three angles add up to a straight line whichis the same thing. Draw a line through the point P parallel to the base ofthe triangle.


α γ

β

P

Then extend the two sides of the triangle that meet at the point P as shown.

α γ

β

β′γ′ α′

As a result, we get three angles that I have called α′, β′ and γ′. I now makethe following claims

• β′ = β because the angles are opposite each other in a pair of inter-secting straight line.

• α′ = α because these two angles are formed from a straight line cuttingtwo parallel lines.

• γ′ = γ for the same reason as above.

But since α′ and β′ and γ′ add up to give a straight line, we have proved theclaim.

Now this is all well and good, but we have proved our result on the basisof three other results currently unproved:

1. That given a line l and a point P not on that line I may draw a linethrough the point P and parallel to l.

2. If two line intersect, then opposite angles are equal.

3. If a line l cuts two parallel lines l1 and l2 the angle l makes with l1 isthe same as the angle it makes with l2.


How do we know they are true? Result (2) can readily be proved. We shalluse the diagram below.

α γβ

δ

The proof that α = γ follows from the simple observation that α+β = β+γ.This still leaves (1) and (3). I shall say more about them later when I talkabout axioms.

2.3.5 Proof 5

The most famous theorem of them all is the one attributed to Pythagorasand proved in Book I, Proposition 47 of Euclid. We begin with a right-angledtriangle.

ca

b

We want to prove, of course, that

a2 + b2 = c2.

Consider the shape below. It has been constructed from four copies of ourtriangle and two squares of areas a2 and b2, respectively. I claim that thisshape is actually a square. First, the sides all have the same length a + b.Second, the angles at the corners are right angles by Proof 4.


a2

b2

a

b

a b

Now look at the following picture. This is also a square with sides a+ b so ithas the same area as the first square. Using Proof 4, the shape in the middlereally is a square with area c2.

c2

b

a

b a

a b

a

b

If we subtract the four copies of the original triangle from both squares, theshapes that remain must have the same areas, and we have proved the claim.


Exercises 2.3

1. Raymond Smullyan is both a mathematician and a magician. Hereare two of his puzzles. On an island there are two kinds of people:knights who always tell the truth and knaves who always lie. They areindistinguishable.

(a) You meet three such inhabitants A, B and C. You ask A whetherhe is a knight or knave. He replies so softly that you cannot makeout what he said. You ask B what A said and they say ‘he saidhe is a knave’. At which point C interjects and says ‘that’s a lie!’.Was C a knight or a knave?

(b) You encounter three inhabitants: A, B and C.A says ‘exactly one of us is a knave’.B says ‘exactly two of us are knaves’.C says: ‘all of us are knaves’.What type is each?

2. There are five houses, from left to right, each of which is painted adifferent colour, their inhabitants are called W, C, O, S and M, but notnecessarily in that order, who own different pets, drink different drinksand drive different cars.

(a) There are five houses.

(b) W lives in the red house.

(c) C owns the dog.

(d) Coffee is drunk in the green house.

(e) O drinks tea.

(f) The green house is immediately to the right (that is: your right)of the ivory house.

(g) The Oldsmobile driver owns snails.

(h) The Bentley owner lives in the yellow house.

(i) Milk is drunk in the middle house.

(j) S lives in the first house.


(k) The person who drives the Chevy lives in the house next to theman with the fox.

(l) The Bentley owner lives in a house next to the house where thehorse is kept.

(m) The Lotus owner drinks orange juice.

(n) M drives the Porsche.

(o) S lives next to the blue house.

There are two questions: who drinks water and who owns the aardvark?

3. Prove that the sum of any two even numbers is even, that the sum ofany two odd numbers is even, and that the sum of an odd and an evennumber is odd.

4. Prove that the sum of the interior angles in any quadrilateral is equalto 360◦.

5.

(a) A rectangular box has side of length 2, 3 and 7 units. What is thelength of the longest diagonal?

(b) I draw a square. Without measuring any lengths, you now haveconstruct a square that has exactly twice the area.

(c) A right-angled triangle has sides with lengths x, y and hypotenusez. Prove that if the area of the triangle is z2

4then the triangle is

isosceles.

6.

(a) Prove that the last digit in the square of a positive whole numbermust be one of 0,1,4,5,6, or 9. Is the converse true?

(b) Prove that a natural number is even if, and only if, its last digitis even.

(c) Prove that a natural number is exactly divisible by 9 if, and onlyif, the sum of its digits is divisible by 9.

7. Prove that√

3 cannot be written as an exact fraction.


8. The goal of this question is to prove Ptolomy’s theorem2. This dealswith cyclic quadrilaterals, that is those quadrilaterals whose vertices lieon a circle. With reference to the diagram below,

A

B

C

D

b

c

d

a

x

y

this theorem states that

xy = ac+ bd.

Hint. Show that on the line BD there is a point X such that the angleXAD is equal to the angle BAC. Deduce that the triangles AXD andABC are similar, and that the triangles AXB and ACD are similar.Let the distance between D and X be e. Show that

e

a=c

xand that

y − ed

=b

x.

From this, the result follows by simple algebra. To help you show thatthe triangles are similar, you will need to use Proposition III.21 fromEuclid which is illustrated by the following diagram

2Claudius Ptolomeus was a Greek mathematician and astronomer who flourishedaround 150 CE in the city of Alexandria.


9. The goal of this question is to find all Pythagorean triples. That isnatural numbers (a, b, c) such that a2 + b2 = c2. We shall do this usinggeometry by finding all the rational points on the unit circle. We shalluse the diagram below.

A

P

We have drawn a unit circle centre the origin. From the point (−1, 0),called A, we draw a line to any other point P on the circle.

(a) Show that any line passing through the point A has the equationy = t(x+ 1) where t is any real number.

(b) Show that this line intersects the circle at some point P on thecircle, different from A, when

(x, y) =

(1− t2

1 + t2,

2t

1 + t2

).

(c) Deduce that the rational points on the circle correspond to thevalues of t which are rational.

2.4. AXIOMS 37

(d) Put t = pq, in its lowest terms. Deduce that all Pythagorean triples

are obtained as the following

(r(q2 − p2), 2pqr, r(p2 + q2))

where p, q, r are any integers.

10. Take any positive natural number n; so n = 1, 2, 3, . . . If n is even,divide it by 2 to get n

2; if n is odd, multiply it by 3 and add 1 to obtain

3n+1. Now repeat this process and stop only if you get 1. For example,if n = 6 you get 6, 3, 10, 5, 16, 8, 4, 2, 1. What happens if n = 11? Whatabout n = 27? Prove that no matter what number you start with, youwill always eventually reach 1.

2.4 Axioms

At this point, I need to confront some potential problems with the idea ofproof I have been developing. Once this is done, I will then be able tocomplete the proof of Proof 4. Suppose I am trying to prove the statementS. Then I am done if I can find a theorem S1 so that S1 ⇒ S. But this raisesthe question of how I know that S1 is a theorem. This can only be because Ican find a theorem S2 such that S2 ⇒ S1. There are now three possibilities:

1. At some point I find a theorem Sn such that S ⇒ Sn. This is clearlya bad thing. In trying to prove S I have in fact used S and so haven’tproved anything at all. This is an example of circular reasoning and hasto be avoided. I can do this by organizing what I know in a hierarchy— so to prove a result, I am only allowed to use those theorems alreadyproved. In this way, I can avoid going around in circles.

2. Assuming I have avoided the above pitall, the next nasty possibility isthat I get an infinite sequence of implications:

. . .⇒ Sn ⇒ Sn−1 ⇒ . . .⇒ S1 ⇒ S.

I never actually know that S is a theorem because it is always proved interms of something else without end. This is also clearly a bad thing.I establish relative truth, a statement is true if another is true, but notabsolute truth. I clearly don’t want this to happen. But if not, then Iam led inexorably to the third possibility.


3. To prove S, I only have to prove only a finite number of implications

Sn ⇒ Sn−1 ⇒ . . .⇒ S1 ⇒ S.

But, if Sn is supposed to be a theorem then how do I know it is true ifnot in terms of something else, contradicting the assumption that thiswas supposed to be a complete argument?

I shall now delve into case (3) above in more detail, since resolving it willlead to an important insight. Maths is supposed to be about proving theo-rems but the analysis above has led us to the uncomfortable possibility thatsome things have to be accepted as true ‘because they are’ which contradictswhat I went to great trouble to rubbish earlier. Before I explain the wayout of this conundrum, let me first consider an example from an apparentlycompletely different enterprize: playing a game.

To be concrete, let’s take the game of chess. Most people have learntchess at some point even if, like me, you are not very good at it. This gameconsists of a board and some pieces. The pieces are of different types — kings,queens, knights, bishops, castles, pawns — each of which can be moved indifferent ways. To play chess means to accept the rules of chess and to movethe pieces in accordance with the rules. Whether one player wins or there isa draw is also described by the rules of chess. It’s meaningless to ask whetherthe rules of chess are true. But a move in chess is valid, which is another wayof saying true, if it is made according to those rules. This example providesa way of understanding how maths works.

Maths should be viewed as a collection of different mathematical domainseach described by its own ‘rules of the game’ which in maths are termedaxioms. These axioms are the basic assumptions on which the theory is builtand are the building blocks of all proofs within that mathematical domain.Our goal is to prove interesting theorems from those axioms.

As an example, consider Euclidean geometry. The Greeks attributed thediscovery of geometry to the Ancient Egyptians who needed it in recalculat-ing land boundaries for the purposes of tax assessment after the yearly floodof the Nile. Thus geometry probably first existed as a collection of geomet-rical methods that worked: the tax was calculated, the pyramids built andeveryone was happy. But it was the Ancient Greeks themselves who elevatedit into a mathematical science and a model of what could be achieved inmathematics. Euclid’s book the Elements codified what was known aboutgeometry into a handful of axioms and then showed that all of geometry

2.4. AXIOMS 39

could be deduced from those axioms by the use of mathematcial proof. TheElements is not only the single most important mathematics book ever writ-ten but one of the most important books — fullstop. Here is a list of the keyaxioms.

1. Two distinct points determine a unique straight line.

2. A line segment can be extended infinitely in either direction.

3. Circles can be drawn with any centre and any radius.

4. Any two right angles are equal to each other.

5. Suppose that a straight line cuts two lines l1 and l2. If the interiorangles on the same side add up to strictly less than 180◦, then if l1 andl2 are extended on that side they will eventually meet.

The last axiom needs a picture to illustrate what is going on.

l1

l2

In principle, all of the results you learnt in school about triangles and cir-cles can be proved from these axioms. I say ‘in principle’ since there werea few bugs which were later fixed by a number of mathematicians most no-tably David Hilbert. But this shouldn’t detract from what an enormousachievement Euclid’s book was and is. We may now finish off Proof 4: claim


(1) is proved in Book I, Proposition 31, and claim (3) is proved in Book I,Proposition 29.

One way of teaching maths at university would therefore be to start witha list of axioms and start proving things. But this approach has a num-ber of disadvantages: it is time-consuming, laborious, sometimes, even, abit tedious, and takes a very, very long time to reach the really interestingtheorems. Therefore, in this book, I shall usually base each topic on quitehigh-level axioms so that we can get to the interesting theorems quickly,but I shall also give pointers to readers who want to see the full axiomaticdevelopment.

Exercises 2.4

1. Hofstadter’s MU-puzzle. A string is just an ordered sequence of sym-bols. In this puzzle, you will construct strings using the letters M, I, U .You are given the string MI which is your only axiom. You can makenew strings only by using the following rules any number of times insuccession in any order:

(I) If you have a string that ends in I then you can add a U on at theend.

(II) If you have a string Mx where x is a string then you may formMxx.

(III) If III occurs in a string then you may make a new string withIII replaced by U .

(IV) If UU occurs in a string then you may erase it.

I shall write x→ y to mean that y is the string obtained from the stringx by applying one of the above four rules. Here are some examples:

• By rule (I), MI →MIU .

• By rule (II), MIU →MIUIU .

• By rule (III), UMIIIMU → UMUMU .

• By rule (IV), MUUUII →MUII.

The question is: can you make MU?

2.5. MATHEMATICS AND THE REAL WORLD 41

2.5 Mathematics and the real world

Euclidean geometry appears to be about the real world. In fact, for thou-sands of years this was what mathematicians believed until they discoveredother geometries with different properties. On the surface of a sphere, forexample, the sum of the angles in a spherical triangle will actually be biggerthan 180◦, the exact amount being determined by the area of the triangle.This result played an important role in surveying. But our analysis aboveleads us to the following conclusion:

Mathematics is about logically consistent mathematical universes.

A mathematical truth is therefore something proved in one of those math-ematical universes, and is not a truth about ‘out there’. Despite this, math-ematical truths do help us to understand the actual physical universe weinhabit. For example, does the geometry of the universe follow the rules ofEuclidean geometry? Here is what NASA says on the basis of the WilkinsonMicrowave Anisotropy Probe (WMAP):

“WMAP also confirms the predictions that the amplitude of thevariations in the density of the universe on big scales should beslightly larger than smaller scales, and that the universe shouldobey the rules of Euclidean geometry so the sum of the interiorangles of a triangle add to 180 degrees.”

http://map.gsfc.nasa.gov/news/index.html

2.6 Proving something false

‘Proving a statement true’ and ‘proving a statement false’ sound similarbut it turns out that ‘proving a statement false’ requires a lot less workthan ‘proving a statement true’. There is an asymmetry between them. Toprove a statement false all you need do is find a counterexample. Here is anexample. Consider the following statement: every odd number bigger than 1is a prime. This is false. The reason is that 9 is odd, bigger than 1, and notprime. Thus 9 is a counterexample. The number 9 here can be regarded asa witness that shows the claim to be false. To prove a statement true, youhave to work hard. To prove a statement false, you only have to find one


counterexample and you are done. (Though in research mathematics findinga counterexample can be a Herculean task).

2.7 Key points

• One of the goals of this book is to introduce you to proofs. This doesnot mean that you will afterwards be able to do proofs. That takestime and practice.

• Initially, you should aim to understand proofs. This means seeingwhy a proof is true. A good test of whether you really understand aproof is whether you can explain it to someone else. It is much easierto check that a proof is correct then it is to invent the proof in thefirst place. Nevertheless, be warned, it can also take a long time justto understand a proof.

• I shall ask you to find some proofs for yourself. But do not expect tofind them in a few minutes. Constructing proofs takes time, trial anderror and, yes, luck.

• If you don’t understand the words used in a statement that you areasked to prove then you are not going to be able to prove that state-ment. Definitions are vitally important in mathematics.

• Every statement that you make in a proof must be justified: if it is adefinition, say that it is a definition; if it is a result known to be true,that is a theorem, say that it is known to be true; if it is one of theassumptions, say that it is one of the assumptions; if it is an axiom,say that it is an axiom.

• When starting out, it is probably best to write each statement of aproof on a separate line followed by its justification.

Finally, there are one or two pieces of terminology and notation that areworth mentioning here. The conclusion of a proof is marked using the symbol2. This replaces the older use of QED. If we believe something might be truebut there isn’t yet a proof we say that it is a conjecture. The things we canprove fall, roughly, into the following categories: a theorem is a major result,worthy of note; a proposition is a result, and a lemma is an auxiliary result,

2.8. MATHEMATICAL CREATIVITY 43

a tool, useful in many different places; a corollary is a result we can deducewith little or no effort from a proposition or theorem.

2.8 Mathematical creativity

Everything I have said above is true, but does need to be placed in perspec-tive. Where do proofs come from? More to the point, where do theoremscome from? Music is a useful analogy. You can learn how to write musicdown, but that doesn’t make you a musician. In fact, there are some talentedmusicians who cannot even read music. Proofs keep us honest and groundwhat we are doing, but what makes maths fun is that it is creative, andfor creativity there are no rules. For example, in dreaming up a theorem,experimentation may well play a role. Sometimes a theorem may evolve intandem with a proof, at other times the theorem, or more accurately, theconjecture comes first and then there is the struggle to prove it, which maytake place over many generations and centuries.

2.9 Set theory: the language of mathematics

Everyday English is good at everyday jobs, but can be hopelessly impre-cise where accuracy is important. To get around this, special varieties ofEnglish, little dialects, have been constructed for particular purposes. Inmathematics, we use precise versions of everyday language augmented withspecial symbols. Part of this special language is that of set theory, inventedby Georg Cantor (1845–1918) in the last quarter of the nineteenth century.This section is mainly a phrasebook of the most important terms we shallneed for most of this book. I shall develop this language further when I needto when studying combinatorics.

The starting point of set theory are the following two deceptively simpledefinitions:

• A set is a collection of objects which we wish to regard as a whole. Themembers of a set are called its elements3.

• Two sets are equal precisely when they have the same elements.

3Strictly speaking this definition is nonsense. Why?


We often use capital letters to name sets: such as A, B, or C or fancy capitalletters such as N and Z. The elements of a set are usually denoted by lowercase letters. If x is an element of the set A then we write

x ∈ A

and if x is not an element of the set A then we write

x /∈ A.

A set should be regarded as a bag of elements, and so the order of theelements within the set is not important. In addition, repetition of elementsis ignored.4

Examples 2.9.1.

1. The following sets are all equal: {a, b}, {b, a}, {a, a, b}, {a, a, a, a, b, b, b, a}because the order of the elements within a set is not important and anyrepetitions are ignored. Despite this it is usual to write sets withoutrepetitions to avoid confusion. We have that a ∈ {a, b} and b ∈ {a, b}but α /∈ {a, b}.

2. The set {} is empty and is called the empty set. It is given a specialsymbol ∅, which is taken from Danish and is the first letter of the Danishword meaning ‘empty’. Remember that ∅ means the same thing as {}.Take careful note that ∅ 6= {∅}. The reason is that the empty setcontains no elements whereas the set {∅} contains one element. By theway, the symbol for the emptyset is different from the Greek letter phi:φ or Φ.

The number of elements in a set is called its cardinality. If X is a setthen |X| denotes its cardinality. A set is finite if it only has a finite numberof elements, otherwise it is infinite. If a set has only finitely many elementsthen we might be able to list them if there aren’t too many: this is done byputting them in ‘curly brackets’ { and }. We can sometimes define infinitesets by using curly brackets but then, because we can’t list all elements inan infinite set, we use ‘. . .’ to mean ‘and so on in the obvious way’. This canalso be used to define finite sets where there is an obvious pattern. Often,

4If you want to take account of repetitions you have to use multisets.

2.9. SET THEORY: THE LANGUAGE OF MATHEMATICS 45

we describe a set by saying what properties an element must have to belongto the set. Thus

{x : P (x)}

means ‘the set of all things x which satisfy the condition P ’. Here are someexamples of sets defined in various ways.

Examples 2.9.2.

1. D = {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sun-day }, the set of the days of the week. This is a small finite set and sowe can conveniently list its elements.

2. M = { January, February, March, . . . , November, December }, the setof the months of the year. This is a finite set but I didn’t want to writedown all the elements so I wrote ‘. . . ’ to indicate that there were otherelements of the set which I was too lazy to write down explicitly butwhich are, nevertheless, there.

3. A = {x : x is a prime number}. I define a set by describing the proper-ties that the elements of the set must have. Here P (x) is the statement‘x is a prime number’ and those natural numbers x are admitted mem-bership to the set when they are indeed prime.

In this book, the following sets of numbers will play a special role. Weshall use this notation throughout and so it is worthwhile getting used to it.

Examples 2.9.3.

1. The set N = {0, 1, 2, 3, . . .} of all natural numbers.

2. The set Z = {. . . ,−3,−2,−1, 0, 1, 2, 3, . . .} of all integers. The reasonZ is used to designate this set is because ‘Z’ is the first letter of theword ‘Zahl’, the German for number.

3. The set Q of all rational numbers i.e. those numbers that can be writtenas fractions whether positive or negative.

4. The set R of all real numbers i.e. all numbers which can be representedby decimals with potentially infinitely many digits after the decimalpoint.


5. The set C of all complex numbers, which I shall introduce from scratchlater on.

Given a set A, a new set B can be formed by choosing elements fromA to put in B. We say that B is a subset of A, which is written B ⊆ A.In mathematics, the word ‘choose’ also includes the possibilty of choosingnothing and the possibility of choosing everything. In addition, there doesn’thave to be any rhyme or reason to your choices: you can pick elements ‘atrandom’ if you want. If B ⊆ A and A 6= B then we say that B is a propersubset of A.

Examples 2.9.4.

1. ∅ ⊆ A for every set A, where we choose no elements from A. It is avery common mistake to forget the empty set when listing subsets of aset.

2. A ⊆ A for every set A, where we choose all the elements from A. It isa very common mistake to forget the set itself when listing subsets ofa set.

3. N ⊆ Z ⊆ Q ⊆ R ⊆ C.

4. E, the set of even natural numbers, is a subset of N.

5. O, the set of odd natural numbers, is a subset of N.

6. P = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}, the set of primes, is a subset of N.

7. A = {x : x ∈ R and x2 = 4} which is just the set {−2, 2}.

There is a particular kind of subset that will be convenient to define now.If A and B are sets we define the set A \B to consist of those elements of Athat are not in B. Thus, in particular, A \ B ⊆ A. The operation is calledrelative complement. For example, N \E = O. The set R \Q is precisely theset of irrational numbers.

When set theory is first encountered it doesn’t look very impressive. Whatcould you possibly do with these very simple, if not simple-minded, defini-tions? In fact, all of mathematics can be developed using set theory. I amgoing to finish off this section with a first glimpse at the power of set theory.


Consider the set {a, b}. I have explained above that order doesn’t matterand so this is the same set as {b, a}. But there are many occasions wherewe do want order to matter. For example, in the Olympics it is importantto know who came first and second in the 100m sprint not merely that thefirst two over the finishing line were X and Y in alphabetical order. So weneed a new notion where order does matter. It is called an ordered pair andis written (a, b), where a is called the first component and b is called thesecond component. The key feature of this new object is that (a, b) = (c, d)if, and only if, a = c and b = d. So, order matters. For example, the or-dered pair (1, 2) is different from the ordered pair (2, 1). Furthermore, (1, 1)does not mean the same as 1 on its own. The idea of an ordered pair is afamiliar one from co-ordinate geometry. We use ordered pairs of real num-bers (x, y) to specifiy points in the plane. At first blush, set theory seemsinadequate to define ordered pairs. But in fact it can. I have put the detailsin a box and you don’t need to read them to understand the rest of the book.

Ordered Pairs

I am going to show you how sets, which don’t encode order directly,can nevertheless be used to define ordered pairs. It is an idea due toKuratowski (1896–1980). Define

(a, b) = {{a}, {a, b}}.

We have to prove, using only this definition, that we have (a, b) = (c, d)if, and only if, a = c and b = d. The proof is essentially an exercise inspecial cases. I shall prove the hard direction. Suppose that

{{a}, {a, b}} = {{c}, {c, d}}.

Since {a} is an element of the lefthand side it must be an element of therighthand side. So {a} ∈ {{c}, {c, d}}. There are now two possibilities.Either {a} = {c} or {a} = {c, d}. The first case gives us that a = c, andthe second case gives us that a = c = d. Since {a, b} is an element of thelefthand side it must be an element of the righthand side. So {a, b} ∈{{c}, {c, d}}. There are again two possibilities. Either {a, b} = {c} or{a, b} = {c, d}. The first case gives us that a = b = c, and the second


case gives us that (a = c and b = d) or (a = d and b = c). We thereforehave the following possibilities:

• a = b = c. But then {{a}, {a, b}} = {{a}}. It follows that c = dand so a = b = c = d and, in particular, a = c and b = d.

• a = c and b = d.

• In all remaining cases, a = b = c = d and so, in particular, a = cand b = d.

We can now build sets of ordered pairs. Let A and B be sets. DefineA×B, the product of A and B, to be the set

A×B = {(a, b) : a ∈ A and b ∈ B}.

Example 2.9.5. Let A = {1, 2, 3} and let B = {a, b}. Then

A×B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}

andB × A = {(a, 1), (1, b), (a, 2), (b, 2), (a, 3), (b, 3)}.

So, in particular, A×B 6= B × A, in general.

If A = B it is natural to abbreviate A× A as A2. This now agrees withthe notation R2 which is the set of all ordered pairs of real numbers and,geometrically, can be regarded as the real plane.

We have defined ordered pairs but there is no reason to stop with justpairs. We may also define ordered triples. This can be done by defining

(x, y, z) = ((x, y), z).

The key property of ordered triples is that if (a, b, c) = (d, e, f) then a = d,b = e and c = f . Given three sets A, B and C we may define their productA × B × C to be the set of all ordered triples (a, b, c) where a ∈ A, b ∈ Band c ∈ C. A good example of an ordered triple in everyday life is a datethat consist of a day, a month and a year. Thus the 16th June 1904 is reallyan ordered triple (16, June, 1904) where we specify day, month and year inthat order. If A = B = C then we write A3 rather than A × A × A. Thusthe set R3 consists of all Cartesian co-ordinates (x, y, z). In general, we maydefine ordered n-tuples, which look like this (x1, . . . , xn), and products of n-sets A1 × . . . × An. And if A1 = . . . = An then we write An for their n-foldproduct.


Russell’s Paradox

There is more to sets than meets the eye. I shall now describe a famousresult in the history of mathematics called Russell’s Paradox. Definethe following

R = {x : x /∈ x},

in other words: the set of all sets that do not contain themselves as anelement. For example, ∅ ∈ R. We now ask the question: is R ∈ R?Before resolving this question, let’s back off a bit and ask what it meansfor X ∈ R. From the entry requirements, we would have to show thatX /∈ X . Putting X = R we deduce that R ∈ R is true only if R /∈ R.Since this is an evident contradiction, we are inclined to deduce thatR /∈R. However, if R /∈ R then in fact R satisfies the entry requirementsto be an element of R and so R ∈ R. Thus exactly one of R ∈ R andR /∈ R must be true but assuming one is true implies the other is true.We therefore have an honest-to-goodness contradiction. Our only wayout is to conclude that, whatever R might be, it is not a set. But thisin turn contradicts my definition of a set as a collection of objects sinceR is a collection of objects. If you want to understand how to escapethis predicament, you will have to study set theory. Disconcerting as thismight be to you, imagine how much more so it was to the mathematicianGottlob Frege (1848–1925). He was working on a book which based thedevelopment of maths on sets when he received a letter from Russelldescribing this paradox and undermining what Frege was attempting toachieve.

Bertrand Russell himself was an Anglo-Welsh philosopher born in1872, when Queen Victoria still had another thirty years on the throneas ‘Queen empress’, and who died in 1970 a few months after Neil Arm-strong stepped onto the moon. As a young man he made importantcontributions to the foundations of mathematics but in the course of hisextraordinary life he found time to stand for parliament, encouraged thephilosopher Ludwig Wittgenstein, received two prison sentences, wonthe Nobel prize for literature, was the first president of CND, and cam-paigned against the Vietnam war. See Russell: a very short introductionby A. C. Grayling published by OUP, 2002, for a very short introduction.

I shall conclude this section by touching on a fundamental notion of math-ematics: that of a function. I shall approach it by first defining something


more general.Let A and B be any sets. By definition a subset X ⊆ A × B is called

a relation from A to B. To motivate this definition, and new terminology, Ishall consider an example.

Example 2.9.6. Let A be the set {A(dam),B(eth),C(ate),D(ave)} of peo-ple. Let B be the set {a(apples), b(ananas), o(ranges)} of fruit. Define X tobe the following set of ordered pairs

{(A, a), (A, o), (B, b), (D, a), (D, b), (D, o)}

which tells us who likes which fruit. Thus, for example, Adam likes applesand oranges (but not bananas) and Cate doesn’t like any of the fruit on offer.It is pretty irresistible to represent this information by means of a directedgraph, such as the one below. Clearly, such graphs can be drawn to representany relation. The term ‘relation’ is now explained by the fact that X tellsus how the elements of A are related to the elements of B. In this case, therelation is ‘likes to eat’.

D

C

B

A

b

o

a

Let X be a relation from A to B. We say that X is a function if it satisfiestwo additional conditions: first, for each a ∈ A there is at least one b ∈ Bsuch that (a, b) ∈ X; second, if (a, b), (a, c) ∈ X then b = c. If we think backto the graph in our example above, then the first condition says that everyelement in A is at the base of an arrow, and the second condition says thatfor each element in A is never at the base of two, or more, arrows. Slightlydifferent notation is used when dealing with functions. Rather than thinkingof ordered pairs, we think instead of inputs and outputs. Thus a functionfrom A to B is determined when for each a ∈ A there is associated exactlyone element b ∈ B. We think of a as the input and b as the corresponding,uniquely determined, output. If we denote our function by f then we writeb = f(a) or that a 7→ b. Thus the corresponding relation is the set of all


ordered pairs (a, f(a)) where a ∈ A We call the set A the domain of thefunction and the set B the codomain of the function. We write f : A→ B or

Af→ B.

Example 2.9.7. Here is an example of a function f : A→ B. Let A be theset of all students in the lecture theatre at this time. Let B be the set ofnatural numbers. Then f is defined when for each student a ∈ A we associatetheir age f(a). We can see why this is precisely a function and not merely arelation. First, everyone has an age and, assuming they don’t lie, they haveexactly one age. On the other hand, if we kept A as it is and let B be theset of nationalities then we will no longer have a function in general. Somepeople might be stateless, but even if we include that as a possibility in theset B, we still won’t necessarily have a function since some people own morethan one passport.

Exercises 2.9

1. LetA = {♣,♦,♥,♠}, B = {♠,♦,♣,♥} and C = {♠,♦,♣,♥,♣,♦,♥,♠}.Is it true or false that A = B and B = C? Explain.

2. Let X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Write down the following subsets ofX:

(a) The subset A of even elements of X.

(b) The subset B of odd elements of X.

(c) C = {x : x ∈ X and x ≥ 6}.(d) D = {x : x ∈ X and x > 10}.(e) E = {x : x ∈ X and x is prime}.(f) F = {x : x ∈ X and (x ≤ 4 or x ≥ 7)}.

3. (a) Find all subsets of {a, b}. How many are there? Write down alsothe number of subsets with respectively 0, 1 and 2 elements.

(b) Find all subsets of {a, b, c}. How many are there? Write down alsothe number of subsets with respectively 0, 1, 2 and 3 elements.


(c) Find all subsets of the set {a, b, c, d}. How many are there? Writedown also the number of subsets with respectively 0, 1, 2, 3 and4 elements.

(d) What patterns do you notice arising from these calculations?

4. If the set A has m elements and the set B has n elements how manyelements does the set A×B have?

5. If A has m elements, how many elements does the set An have?

6. Prove that that two sets A and B are equal if, and only if, A ⊆ B andB ⊆ A.

2.10 Proof by induction

This is a method of proof that, although useful, does not always deliver muchinsight into why something is true. The basis of this method is the following:

Let X be a subset of N that satisfies the following two conditions:first, 0 ∈ X, and second if n ∈ X then n+ 1 ∈ X. Then X = N.

This fact is called the induction principle, and can be viewed as one of thebasic axioms describing the natural numbers.

We may use it as a proof technique in the following way. Suppose wehave an infinite number of statements S0, S1, S2, . . . which we want to prove.By the induction principle, it is enough to do two things:

1. Show that S0 is true.

2. Show that if Sn is true then Sn+1 is also true.

It will then follow that Si is true for all positive i.Proofs by induction have the following script:

Base step Show that the case n = 0 holds.

Induction hypothesis (IH) Assume that the case where n = k holds.

Proof bit Now use (IH) to show that the case where n = k + 1 holds.

Conclude that the result holds for all n by the induction principle.

2.10. PROOF BY INDUCTION 53

Example 2.10.1. Prove by induction that n3 + 2n is exactly divisible by 3for all natural numbers n ≥ 0.

Base step: when n = 0, we have that 03 + 2 · 0 = 0 which is exactlydivisible by 3.

Induction hypothesis: assume result is true for n = k. We prove it forn = k + 1. We need to prove that (k + 1)3 + 2(k + 1) is exactly divisibleby 3 assuming only that k3 + 2k is exactly divisible by 3. We first expand(k + 1)3 + 2(k + 1) to get

k3 + 3k2 + 3k + 1 + 2k + 2.

This is equal to(k3 + 2k) + 3(k2 + k + 1)

which is exactly divisible by 3 using the induction hypothesis.

In practice, some simple variants of this principle are used. Rather thanthe whole set N, we often work with a set of the form

N≥k = N \ {0, 1, . . . , k − 1}

where k ≥ 1. Our induction principle is modified accordingly: a subset X ofN≥k that contains k and contains n+1 whenever it contains n must be equalto the whole of N≥k. In our script above, the base step involves checking thecase where n = k.

What I described above I shall call basic induction. There is also some-thing called the strong induction principle which runs as follows:

Let X be a subset of N that satisfies the following two conditions:first, 0 ∈ X and second, if {0, 1 . . . , n} ⊆ X, where n ≥ 1, then{0, 1 . . . , n+ 1} ⊆ X. Then X = N.

Finally, there is the well-ordering principle that states that every non-empty subset of the natural numbers has a smallest element.

Induction, strong induction and well-ordering look very different fromeach other. In fact, they are equivalent and all useful in proving theorems.

Proposition 2.10.2. The following are equivalent.

1. The induction principle.

2. The strong induction principle.


3. The well-ordering principle.

Proof. (1)⇒(2). I shall assume that the induction principle holds and provethat the strong induction principle holds. Let X ⊆ N be such that 0 ∈ Xand and if {0, 1 . . . , n} ⊆ X, where n ≥ 1, then {0, 1 . . . , n + 1} ⊆ X. Weshall use induction to prove that X = N. Let Y ⊆ N consist of all naturalnumbers n such that {0, 1, . . . , n} ⊆ X. We have that 0 ∈ Y and we havethat n + 1 ∈ Y whenever n ∈ Y . By induction, we deduce that Y = N. Itfollows that X = N.

(2)⇒(3). I shall assume that the strong induction principle holds andprove that the well-ordering principle holds. Let X ⊆ N be a subset that hasno smallest element. I shall prove that X must be empty. Put Y = N \X. Iclaim that 0 ∈ Y . If not, then 0 ∈ X and that would obviously have to be thesmallest element, which is a contradiction. Suppose that {0, 1, . . . , n} ⊆ Y .Then we must have that n + 1 ∈ X because otherwise n + 1 would be thesmalest element of X. We now invoke strong induction to deduce that Y = Nand so X = ∅.

(3)⇒(1). I shall assume the well-ordering principle and prove the induc-tion principle. Let X ⊆ N be a subset such that 0 ∈ X and whenever n ∈ Xthen n + 1 ∈ X. Suppose that N \ X is non-empty. Then it would have asmallest element k say. But then k − 1 ∈ X and so, by assumption, k ∈ X,which is a contradiction. Thus N \X is empty and so X = N.

Strong induction will be used in a few places in this book but I will discussit in more detail when needed.

Exercises 2.10

1. Prove that for each natural number n ≥ 3, we have that

n2 > 2n+ 1.

2. Prove that for each natural number n ≥ 5, we have that

2n > n2.

3. Prove that for each natural number n ≥ 1, the number 4n+2 is divisibleby 3.

2.10. PROOF BY INDUCTION 55

4. Prove that

1 + 2 + 3 + . . .+ n =n(n+ 1)

2.

5. Prove that2 + 4 + 6 + . . .+ 2n = n(n+ 1).

6. Prove that

13 + 23 + 33 + . . .+ n3 =

(n(n+ 1)

2

)2

.

7. Prove that a set with n ≥ 0 elements has exactly 2n subsets.

Chapter 3

High-school algebra revisited

In this chapter, I will review some of the basic constructions from high-schoolalgebra from the perspective of this book.

3.1 The rules of the game

3.1.1 The axioms

Algebra deals with the manipulation of symbols. This means that symbolsare altered and combined according to certain rules. In high-school, the alge-bra you studied was mainly based on the properties of the real numbers. Thismeans that when you write x you mean an unknown or yet-to-be-determinedreal number. In this section, I shall describe the rules, or axioms, that youuse for doing algebra with real numbers. The primary operations we areinterested in are addition x + y and multiplication x × y. As usual, I shallabbreviate the operation of multiplication by concatenation, which simplymeans we write xy. Sometimes, it is helpful to denote multiplication asfollows x · y. Of course, there are two other familiar operations: subtrac-tion and division. We shall see that these should be treated in a differentway: subtraction as the inverse of addition, and division as the inverse ofmultiplication.

Both addition and multiplication require two inputs and then deliverone output with the inputs and outputs all being taken from the same set.They are therefore examples of what are called binary operations and arethe commonest kinds of operations in algebra. For example, as we shall see

57

58 CHAPTER 3. HIGH-SCHOOL ALGEBRA REVISITED

later, matrix addition and matrix multiplication are both binary operations,the vector product of two vectors is a binary operation, and the intersectionand union of two sets are both binary operations. A binary operation on aset X is nothing other than a function from X × X to X. I shall use ∗ tomean any binary operation defined on some specified set X. We usually writebinary operations between the inputs rather than using the usual functionalnotation.

a

ba ∗ b∗

The two most important properties a binary operation may have is com-mutativity and associativity.

A binary operation is commutative if

a ∗ b = b ∗ a

in all cases. That is, the order in which you carry out the operation isnot important. Addition and multiplication of real, and as we shall see later,complex numbers are commutative. But we shall also meet binary operationsthat are not commutative: both matrix multiplication and vector productsare examples. Commutativity is therefore not automatic.

A binary operation is associative if

(a ∗ b) ∗ c = a ∗ (b ∗ c)

in all cases. Remember that the brackets tell you how to work out theproduct. Thus (a ∗ b) ∗ c means first work out a ∗ b, let’s call it d, and thenwork out d ∗ c. Almost all the binary operations we shall meet in this bookare associative, the one important exception being the vector product.

In order to show that a binary operation ∗ is associative, we have to checkthat all possible products (a ∗ b) ∗ c and a ∗ (b ∗ c) are equal. To show that abinary operation is not associative, we simply have to find specific values fora, b and c so that (a ∗ b) ∗ c 6= (a ∗ b) ∗ c. Here are examples of both of thesepossibilities.

Example 3.1.1. Let’s take the set or real numbers R and investigate a newbinary operation denoted by ◦ that is defined as follows

a ◦ b = a+ b+ ab.

3.1. THE RULES OF THE GAME 59

We shall prove that it is associative. First, we have to understand what it iswe have to show. From the definition of associativity, we have to prove that

(a ◦ b) ◦ c = a ◦ (b ◦ c)

for all real numbers a, b and c. To do this, we calculate first the lefthand sideand then the righthand side and then verify they are equal. Because we aretrying to prove a result true for all real numbers, we cannot choose specificvalues of a, b and c. We first calculate (a ◦ b) ◦ c. Using the axioms for realnumbers, we get that

(a ◦ b) ◦ c = (a+ b+ ab) ◦ c = (a+ b+ ab) + c+ (a+ b+ ab)c

which is equal to a+ b+ c+ ab+ ac+ bc+ abc. Now we calculate a ◦ (b ◦ c).We get that

a ◦ (b ◦ c) = a ◦ (b+ c+ bc) = a+ (b+ c+ bc) + a(b+ c+ bc)

which is equal to a+ b+ c+ ab+ ac+ bc+ abc. We now see that we get thesame answers however we bracket the product and so we have proved thatthe binary operation ◦ is associative.

Example 3.1.2. Let’s take the set N and define the binary operation ⊕ asfollows

a⊕ b = a2 + b2.

I shall show that this binary operation is not associative. Let’s calculate first(1⊕ 2)⊕ 3. By definition this is computed as follows

(1⊕ 2)⊕ 3 = (12 + 22)⊕ 3 = 5⊕ 3 = 52 + 32 = 25 + 9 = 34.

Now we calculate 1⊕ (2⊕ 3) as follows

1⊕ (2⊕ 3) = 1⊕ (22 + 32) = 1⊕ (4 + 9) = 1⊕ 13 = 12 + 132 = 1 + 169 = 170.

Therefore(1⊕ 2)⊕ 3 6= 1⊕ (2⊕ 3).

It follows that the binary operation ⊕ is not associative.

We are now ready to state the algebraic axioms that form the basis ofhigh-school algebra. We shall split them up into three groups: those dealingonly with addition, those dealing only with multiplication, and finally thosethat deal with both operations together.


Axioms for addition

(F1) Addition is associative. Let x, y and z be any real numbers. Then(x+ y) + z = x+ (y + z).

(F2) There is an additive identity. The number 0 (zero) is the additiveidentity. This means that for an real number x we have that x + 0 =x = 0 + x. Thus adding zero to a number leaves it unchanged.

(F3) Each element has a unique additive inverse. This means that for eachnumber x there is another number, written −x, with the property thatx+(−x) = 0 = (−x)+x. The number −x is called the additive inverseof the number x.

(F4) Addition is commutative. Let x and y be any real numbers. Thenx + y = y + x. The word commutative means that the order in whichyou add the numbers does not matter.

The first thing to understand is that none of these axioms should besurprising. They should all agree with your intuition.

Axioms for multiplication

(F5) Multiplication is associative. Let x, y and z be any real numbers. Then(xy)z = x(yz).

(F6) There is a multiplicative identity. The number 1 is the multiplicativeidentity. This means that for any real number x we have that 1x =x = x1.

(F7) Each non-zero number has a unique multiplicative inverse. Let x 6= 0.Then there is a unique real number written x−1 with the property thatx−1x = 1 = xx−1. The number x−1 is called the multiplicative inverseof x. It is, of course, the number 1

x. It is very important to observe

that zero does not have a multiplicative inverse.

(F8) Multiplication is commutative. Let x and y be any real numbers. Thenxy = yx. Once again the word commutative means that the order inwhich you carry out the operations doesn’t matter. In this case, theoperation is multiplication.


The axioms for multiplication are very similar to those for addition. Theonly real difference between them is axiom (F7). This expresses the fact thatyou cannot divide by zero.

Linking axioms

(F9) 0 6= 1.

(F10) The additive identity is a multiplicative zero. This means that 0x =0 = x0. If you multiply any real number by 0 then you get 0.

(F11) Multiplication distributes over addition on the left and the right. Thereare actually two distributive laws: the left distributive law

x(y + z) = xy + xz

and the right distributive law

(y + z)x = yx+ zx.

Let me come back to the omission of subtraction and division. These arenot viewed as binary operations in their own right. Instead, we define a− bto mean a + (−b). Thus to subtract b means the same thing as adding −b.Likewise, we define a÷ b, when b 6= 0 to mean a× b−1. Thus to divide by bis to multiply by b−1.

We have missed out one further ingredient in algebra, and that is theproperties of equality.

Properties of equality

(E1) If a = b then c+ a = c+ b.

(E2) If a = b then ca = cb.

Example 3.1.3. When I talked about algebra in Chapter 1, I mentionedthat the usual way of solving a linear equation in one unknown depended onthe properties of real numbers. Let me now show you how we use the above


axioms to solve ax+b = 0 where a 6= 0. Throughout, I use without commentthe two properties of equality I have listed above.

ax+ b = 0

(ax+ b) + (−b) = 0 + (−b) by (F3)

ax+ (b+ (−b)) = 0 + (−b) by (F1)

ax+ 0 = 0 + (−b) by (F3)

ax = 0 + (−b) by (F2)

ax = −b by (F2)

a−1(ax) = a−1(−b) by (F10) since a 6= 0

(a−1a)x = a−1(−b) by (F5)

1x = a−1(−b) by (F10)

x = a−1(−b) by (F5)

I don’t propose that you go into quite such gory detail when solvingequations, but I wanted to show you what actually lay behind the rules thatyou might have been taught at school.

Example 3.1.4. We can use our axioms to prove that−1×−1 = 1 somethingwhich is hard to understand in any other way. By definition, −1 is theadditive inverse of 1. This means that 1 + (−1) = 0. Let us calculate(−1)(−1)− 1. We have that

(−1)(−1)− 1 = (−1)(−1) + (−1) by definition of subtraction

= (−1)(−1) + (−1)1 since 1 is the multiplicative identity

= (−1)[(−1) + 1] by the left distributivity law

= (−1)0 by properties of additive inverses

= 0 by properties of zero

Hence (−1)(−1) = 1. In other words, the result follows from the usual rulesof algebra.

My final example explains the reason for the prohibition about dividingby zero.

Example 3.1.5. The following fallacious ‘proof’ shows that 1 = 2.

1. Let a = b.


2. Then a2 = ab when we multiply both sides by a.

3. Now add a2 to both sides to get 2a2 = a2 + ab.

4. Subtract 2ab from both sides to get 2a2 − 2ab = a2 + ab− 2ab.

5. Thus 2(a2 − ab) = a2 − ab.

6. We deduce that 2 = 1 by cancelling.

The source of the problem is in passing from line (5) to line (6). We are infact dividing by zero and this is the source of the problem.

3.1.2 Indices

We usually write a2 rather than aa, and a3 instead of aaa. In this section,I want to review the meaning of algebraic expressions such as a

rs where r

sis

any rational number. Our starting point is a result that I would encourageyou to assume as an axiom at a first reading. I have included the proof toshow you a more sophisticated example of proof by induction.

Lemma 3.1.6 (Generalized associativity). Let ∗ be any binary operationdefined on a set X. If ∗ is associative then however you bracket a productsuch as

x1 ∗ . . . ∗ xnyou will always get the same answer.

Proof. If x1, x2, · · · , xn are elements of the set X then one particular brack-eting will play an important role in our proof

x1 ∗ (x2 ∗ (· · · (xn−1 ∗ xn) · · · ))

which we write as [x1x2 . . . xn].The proof is by strong induction on the length n of the product in ques-

tion. The base case is where n = 3 and is just an application of the associativelaw. Assume that n ≥ 4 and that for all k < n, all bracketings of a sequenceof k elements of X lead to the same answer. This is therefore the induc-tion hypothesis for strong induction. Let X denote any properly bracketedexpression obtained by inserting brackets into the sequence x1, x2, · · · , xn.Observe that the computation of such a bracketed product involves comput-ing n − 1 products. This is because at each step we can only compute the


product of adjacent letters xi ∗ xi+1. Thus at each step of our calculationwe reduce the number of letters by one until there is only one letter left.However the expression may be bracketed, the final step in the computationwill be of the form Y ∗Z, where Y and Z will each have arisen from properlybracketed expressions. In the case of Y it will involve a bracketing of somesequence x1, x2, . . . , xr, and for Z the sequence xr+1, xr+2, . . . xn for some rsuch that 1 ≤ r ≤ n − 1. Since Y involves a product of length r < n, wemay assume by the induction hypothesis that Y = [x1x2 . . . xr]. Observe that[x1x2 . . . xr] = x1 ∗ [x2 . . . xr]. Hence by associativity,

X = Y ∗ Z = (x1 ∗ [x2 . . . xr]) ∗ Z = x1 ∗ ([x2 . . . xr] ∗ Z).

But [x2 . . . xr] ∗ Z is a properly bracketed expression of length n − 1 inx2, · · · , xn and so using the induction hypothesis must equal [x2x3 . . . xn].It follows that X = [x1x2 . . . xn]. We have therefore shown that all possiblebracketings yield the same result in the presence of associativity.

We illustrate a special case of the above proof in the example below.

Example 3.1.7. Take n = 5. Then the notation [x1x2x3x4x5] introducedin the above proof means x1 ∗ (x2 ∗ (x3 ∗ (x4 ∗ x5))). Consider the product((x1 ∗ x2) ∗ x3) ∗ (x4 ∗ x5). Here we have Y = (x1 ∗ x2) ∗ x3 and Z = x4 ∗ x5.By associativity Y = x1 ∗ (x2 ∗ x3). Thus Y ∗Z = (x1 ∗ (x2 ∗ x3)) ∗ (x4 ∗ x5).But this is equal to x1 ∗ ((x2 ∗ x3) ∗ (x4 ∗ x5)) again by associativity. By theinduction hypothesis (x2 ∗ x3) ∗ (x4 ∗ x5) = x2 ∗ (x3 ∗ (x4 ∗ x5)), and so

((x1 ∗ x2) ∗ x3) ∗ (x4 ∗ x5) = x1 ∗ (x2 ∗ (x3 ∗ (x4 ∗ x5))),

as required.

If a binary operation is associative then the above lemma tells us thatcomputing products of elements is straightforward because we never haveto worry about how to evaluate it as long as we maintain the order of theelements. We now consider a special case of this result. Let a be any realnumber. Define the nth power an of a, where n is a natural number, asfollows: a1 = a and an = aan−1 for any n ≥ 2. Generalized associativitytells us that an can in fact be calculated in any way we like because we shallalways obtain the same answer. The following result should be familiar. Ishall ask you to prove it in the exercises.

Lemma 3.1.8 (Laws of exponents). Let m,n ≥ 1 be any natural numbers.


1. am+n = aman.

2. (am)n = amn.

It follows from the above lemma that powers of the same element a com-mute with one another: aman = anam as both products equal am+n. Our goalnow is to define what am means when m is an arbitrary rational number. Weshall be guided by the requirement that the above laws of exponents shouldcontinue to hold. We may extend the laws of exponents to allow m or n tobe 0. The only way to do this is to define a0 = 1, where 1 is the identity anda 6= 0.

An extreme case! What about 00? This is a can of worms. For this book,it is probably best to define 00 = 1.

We have explained what an means when n is positive but what can we saywhen the exponent is negative? In other words, what does a−n mean? Weassume that the rules above still apply. Thus whatever a−n means we shouldhave that a−nan = a0 = 1. It follows that a−n = 1

an. With this interpretation

we have defined an for all integer values of x.We now investigate what a

1n should mean. If the law of exponents are to

continue holidng, then (a1n )n = a1 = a. It follows that a

1n = n√a.

We may now calculate ars it is equal to

ars = ( s

√a)r.

How do we calculate (ab)n? This is just ab times itself n times. But theorder in which we multiply a’s and b’s doesn’t matter and so we can arrangeall the a’s to the front. Thus (ab)n = anbn.

We also have similar results for addition. We define 2x = x + x andnx = x+ . . .+ x where the x occurs n times. We have 1x = x and 0x = 0.

Let {a1, . . . , an} be a set of n elements. If we write them all in someorder ai1 , . . . , ain then we have what is called a permutation of the elements.The following lemma can be treated as an axiom and the proof omitted untillater.

Lemma 3.1.9 (Generalized commutativity). Let ∗ be an associative andcommutative binary operation on a set X. Let a1, . . . , an be any n elementsof X. Then

a1 ∗ . . . ∗ an = ai1 ∗ . . . ∗ ain .


Proof. First prove by induction the result that

a1 ∗ . . . ∗ an ∗ b = b ∗ a1 ∗ . . . ∗ an.

Let a1, . . . , an, an+1 be n+1 elements. Consider the product ai1∗. . .∗ain∗ain+1 .Suppose that an+1 = air . Then

ai1 ∗ . . . ∗ air ∗ . . . ∗ ain ∗ ain+1 = (ai1 ∗ . . . ∗ ain) ∗ an+1

where the expression in the backets is a product of some permutation ofthe elements a1, . . . , an. We have used here our result above. But by theinduction hypothesis, we may write ai1 ∗ . . . ∗ ain = a1 ∗ . . . ∗ an.

3.1.3 Sigma notation

At this point, it is appropriate to introduce some useful notation. Leta1, a2, . . . , an be n numbers. Their sum is a1 + a2 + . . . + an and becauseof generalized associativity we don’t have to worry about brackets. We nowabbreviate this as

n∑i=1

ai.

Where∑

is Greek ‘S’ and stands for Sum. The letter i is called a subscript.The equality i = 1 tells us that we start the value of i at 1. The equalityi = n tells us that we end the value of i at n. Although I have started thesum at 1, I could, in other circumstances, have started at 0, or any otherappropriate number. This notation is very useful and can be manipulatedusing the rules above. If 1 < s < n, then we can write

n∑i=1

ai =s∑i=1

ai +n∑s+1

ai.

If b is any number then

b

(n∑i=1

ai

)=

n∑i=1

bai

is the generalized distributivity law that you are asked to prove in the exer-cises. These uses of sigma-notation shouldn’t cause any problems.


The most complicated use of∑

-notation arises when we have to sum upwhat is called an array of numbers aij where 1 ≤ i ≤ m and 1 ≤ j ≤ n.This arises in matrix theory, for example. For concreteness, I shall give theexample where m = 3 and n = 4. We can therefore think of the numbers aijas being arranged in a 3× 4 array as follows:

a11 a12 a13 a14

a21 a22 a23 a24

a31 a32 a33 a34

Observe that the first subscript tells you the row and the second subscripttells you the column. Thus a23 is the number in the second row and the thirdcolumn. Now we can add these numbers up in two different ways getting thesame answer in both cases. The first way is to add the numbers up along therows. So, we calculate the following sums

4∑j=1

a1j,4∑j=1

a2j,4∑j=1

a3j.

We then add up these three numbers

4∑j=1

a1j +4∑j=1

a2j +4∑j=1

a3j =3∑i=1

(4∑j=1

aij

).

The second way is to add the numbers up along the columns. So, we calculatethe following sums

3∑i=1

ai1,

3∑i=1

ai2,

3∑i=1

ai3,

3∑i=1

ai4.

We then add up these four numbers

n∑i=1

ai1 +n∑i=1

ai2 +n∑i=1

ai3 +n∑i=1

ai4 =4∑j=1

(3∑i=1

aij

).

The fact that3∑i=1

(4∑j=1

aij

)=

4∑j=1

(3∑i=1

aij

)


is a consequence of the generalized commutativity law that you are asked toprove in the exercises. We therefore have in general that

m∑i=1

(n∑j=1

aij

)=

n∑j=1

(m∑i=1

aij

).

3.1.4 Infinite sums

What I have defined so far are finite sums and form part of algebra. Thereare also infinite sums

∞∑i=1

ai

which form part of analysis, the subject that provides the foundations forcalculus. There is one place where we use infinite sums in everyday life, andthat is in the decimal representations of numbers. Thus the fraction 1

3can

be written as 0 · 3333 . . . and this is in fact an infinite sum: it means theinfinite sum

∞∑i=1

3

10i.

But in general infinite sums are problematic. For example, consider theinfinite sum

S =∞∑i=1

(−1)i+1.

So, this is justS = 1− 1 + 1− 1 + . . .

What is S? You’re first instinct might be to say 0 because

S = (1− 1) + (1− 1) + . . .

But it could equally well be 1 calculated as follows

S = 1 + (−1 + 1) + (−1 + 1) + . . .

In fact, it could even be 12

since S + S = 1 and so S = 12. There is clearly

something seriously awry here, and it is that infinite sums have to be handledvery carefully if they are to make sense. Just how is the business of analysis


and won’t be an issue in this book.

Warning! ∞ is not a number. It simply tells us to keep adding on termsfor increasing values of i without end so we never write

3

10∞.

Exercises 3.1

1. Prove the following identities using the axioms introduced.

(a) (a+ b)2 = a2 + 2ab+ b2.

(b) (a+ b)3 = a3 + 3a2b+ 3ab2 + b3

(c) a2 − b2 = (a+ b)(a− b)(d) (a2 + b2)(c2 + d2) = (ac− bd)2 + (ad+ bc)2

2. Calculate the following.

(a) 23.

(b) 213 .

(c) 2−4.

(d) 2−32 .

3. Assume that aij are assigned the following values

a11 = 1 a12 = 2 a13 = 3 a14 = 4a21 = 5 a22 = 6 a23 = 7 a24 = 8a31 = 9 a32 = 10 a33 = 11 a34 = 12

Calculate the following sums.

(a)∑3

i=1 ai2.

(b)∑4

j=1 a3j.

(c)∑3

i=1

(∑4j=1 a

2ij

).


4. Let a, b, c ∈ R. If ab = ac is it true that b = c? Explain.

5. Laws of exponents.

(a) Prove by induction that am+n = aman. To do this, fix m and thenprove the result by induction on n. Deduce that it holds for allm.

(b) Prove by induction that (am)n = amn. To do this, fix m and thenprove the result by induction on n. Deduce that it holds for allm.

6. Prove by induction that the left generalized distributivity law holds

a(b1 + b2 + b3 + . . .+ bn) = ab1 + ab2 + ab3 + . . .+ abn,

for any n ≥ 2.

3.2 Solving quadratic equations

The previous section might have given the impression that algebraic calcu-lations are routine. In fact, once you pass beyond linear equations, theyusually require good ideas. The first place where a good idea is needed is insolving quadratic equations. Quadratic equations were solved by the Baby-lonians and the Egyptians and are dealt with in all school algebra courses. Ihave included them here because I want to show you that you don’t have toremember a formula to solve such equations; what you have to remember isa method. Let’s recall some definitions. An expression of the form

ax2 + bx+ c

where a, b, c are numbers and a 6= 0 is called a quadratic polynomial or apolynomial of degree 2. The numbers a, b, c are called the coefficients of thequadratic. A quadratic where a = 1 is said to be monic. A number r suchthat

ar2 + br + c = 0

is called a root of the polynomial. The problem of finding all the roots of aquadratic is called solving the quadratic. Usually this problem is stated inthe form: ‘solve the quadratic equation ax2 + bx+ c = 0’. Equation because

3.2. SOLVING QUADRATIC EQUATIONS 71

we have set the polynomial equal to zero. I shall now show you how to solvea quadratic equation without having to remember a formula. Observe firstthat if ax2 + bx+ c = 0 then

x2 +b

ax+

c

a= 0.

Thus it is enough to find the roots of monic quadratics. We shall solve thisequation by trying to do the following: write x2 + b

ax as a perfect square plus

a number. This will turn out to be the crux of solving the quadratic. Weshall illustrate our construction by using some diagrams. First, we representgeometrically the expression x2 + b

ax.

x

x

ba

Now cut the red rectangle into two pieces along the dotted line and rearrangethem as shown below.


x

x

b2a

b2a

It is now geometrically obvious that if we add in the small dotted square, weget a new bigger square. This explain why the procedure is called completingthe square. We now express in algebraic terms what these diagrams suggest.

x2 +b

ax =

(x2 +

b

ax+

b2

4a2

)− b2

4a2=

(x+

b

2a

)2

− b2

4a2.

We therefore have that

x2 +b

ax =

(x+

b

2a

)2

− b2

4a2.

Look carefully at what we have done here: we have rewritten the lefthandside as a perfect square — the first term on the righthandside — plus anumber — the second term on the righthandside. It follows that

x2 +b

ax+

c

a=

(x+

b

2a

)2

− b2

4a2+c

a=

(x+

b

2a

)2

+4ac− b2

4a2.

Setting the last expression equal to zero and rearranging, we get(x+

b

2a

)2

=b2 − 4ac

4a2.

Now take square roots of both sides, remembering that a non-zero numberhas two square roots:

x+b

2a= ±

√b2 − 4ac

4a2


which of course simplifies to

x+b

2a= ±√b2 − 4ac

2a.

Thus

x =−b±

√b2 − 4ac

2a

the usual formula for finding the roots of a quadratic.

Example 3.2.1. Solve the quadratic equation

2x2 − 5x+ 1 = 0.

by completing the square. Divide through by 2 to make the quadratic monicgiving

x2 − 5

2x+

1

2= 0.

We now want to write

x2 − 5

2x

as a perfect square plus a number. We get

x2 − 5

2x =

(x− 5

4

)2

− 25

16.

Thus our quadratic becomes(x− 5

4

)2

− 25

16+

1

2= 0.

Rearranging and taking roots gives us

x =5

4±√

17

4=

5±√

17

4.

We now check our answer by substituting each of our two roots back intothe original quadratic and ensuring that we get zero in both cases.

For the quadratic equation

ax2 + bx+ c = 0

the number D = b2 − 4ac, called the discriminant of the quadratic, plays animportant role.


• If D > 0 then the quadratic equation has two distinct real solutions.

• If D = 0 then the quadratic equation has one real root repeated. In

this case, the quadratic is the perfect square(x+ b

2a

)2.

• If D < 0 then we shall see that the quadratic equation has two complexroots which are complex conjugate to each other. This is called theirreducible case.

If we put y = ax2 + bx+ c then we may draw the graph of this equation.The roots of the original quadratic therefore correspond to the points wherethis graph crosses the x-axis. The diagrams below illustrate the three casesthat can arise.

D > 0

D = 0


D < 0

Exercises 3.2

1. Calculate the discriminants of the following quadratics and so deter-mine whether they have two distinct roots, or repeated roots, or noreal roots.

(a) x2 + 6x+ 5.

(b) x2 − 4x+ 4.

(c) x2 − 2x+ 5.

2. Solve the following quadratic equations by completing the square. Checkyour answers.

(a) x2 + 10x+ 16 = 0.

(b) x2 + 4x+ 2 = 0.

(c) 2x2 − x− 7 = 0.

3. I am thinking of two numbers x and y. I tell you their sum a and theirproduct b. What are x and y in terms of a and b?

4. Let p(x) = x2 + bx + c be a monic quadratic with roots x1 and x2.Express the discriminant of p(x) in terms of x1 and x2.

5. This question is an interpretation of part of Book X of Euclid. We shallbe interested in numbers of the form a+

√b where a and b are rational

and b > 0 where√b is irrational1.

1Remember that irrational means not rational.


(a) If√a = b+

√c where

√c is irrational Then b = 0.

(b) If a+√b = c+

√d where a and c are rational and

√b and

√d are

irrational then a = c and√b =√d.

(c) Prove that the square roots of a+√b have the form ±(

√x+√y).

3.3 Order

In addition to algebraic operations, the real numbers are also ordered: wecan always say of two real numbers whether they are equal or whether one ofthem is bigger than the other. I shall write down first the axioms for orderthat hold both for rational and complex numbers. The following notation isimportant. If a ≤ b and a 6= b then we write a < b and say that a is strictlyless than b.

Axioms for order

(O1) For every element a ≤ a.

(O2) If a ≤ b and b ≤ a then a = b.

(O3) If a ≤ b and b ≤ c then a ≤ c.

(O4) Given any two elements a and b then either a ≤ b or b ≤ a or a = b.

If a > 0 the we say that it is positive and if a < 0 we say it is negative.

(O5) If a ≤ b and c ≤ d then a+ b ≤ b+ d.

(O6) If a ≤ b and c is positive then ac ≤ bc.

The only axiom that you really have to watch is (O6). Here is an exampleof a proof using these axioms.

Example 3.3.1. We prove that a ≤ b if, and only if, b− a is positive. Sincethis statement involves an ‘if, and only, if’ there are, as usual,two statementsto be proved. Suppose first that a ≤ b. By axiom (O5), we may add −a toboth sides to get a+(−a) ≤ b+(−a). But a+(−a) = 0 and b+(−a) = b−a,by definition. It follows that 0 ≤ b−a and so b−a is positive. Now we provethe converse. Suppose that b − a is positive. Then by definition 0 ≤ b − a.

3.4. THE REAL NUMBERS 77

Also by definition, b− a = b+ (−a). Thus 0 ≤ b+ (−a). By axiom (O5), wemay add a to both sides to get 0 + a ≤ (b + (−a)) + a. But 0 + a = a and(b + (−a)) + a quickly simplifies to b. We have therefore proved that a ≤ b,as required.

Exercises 3.3

1. Prove that between any two distinct rational numbers there is anotherrational number.

2. Prove the following using the axioms.

(a) If a ≤ b then −b ≤ −a.

(b) a2 is positive for all a 6= 0.

(c) If 0 < a < b then 0 < b−1 < a−1.

3.4 The real numbers

The axioms I have introduced so far apply equally well to both the rationalnumbers Q and the real numbers R. But we have seen that although Q ⊆ Rthe two sets are not equal because we have proved that

√2 /∈ Q. In fact, we

shall see later that there are many more irrational numbers than there arerational numbers. In this section, I shall explain the fundamental differencebetween rationals and reals. This material will not be needed in the rest ofthis book instead its role is to connect with the foundations of calculus, thatis, with analysis.

It is convenient to write K to mean either Q or R in what follows because Iwant to make the same definitions for both sets. A non-empty subset A ⊆ Kis said to be bounded above if there is some number b ∈ K so that for alla ∈ A we have that a ≤ b. For example, the set A = {2n : n ≥ 0} is notbounded above since its elements getter bigger and bigger without limit. Onthe other hand, the set B = {

(12

)n: n ≥ 0} is bounded above, for example

by 1. A non-empty subset A as above is said to have a least upper bound ifyou can find a number a ∈ K with the following two properties: first of all,a but be an upper bound for A and second of all if b is any upper bound for


A then a ≤ b. We shall now apply these definitions to a result we obtainedearlier.

Let

A = {a : a ∈ Q and a2 ≤ 2}

and let

B = {a : a ∈ R and a2 ≤ 2}.

Then A ⊆ Q and B ⊆ R. Both sets are bounded above: the number 112, for

example, works in both case. However, I shall prove that the subset A doesnot have a least upper bound, whereas the subset B does.

Let’s consider subset A first. Suppose that r were a least upper bound.I claim that r2 would have to equal 2 which is impossible because we haveproved that

√2 is irrational.

Suppose first that r2 < 2. Then I claim there is a rational number r1 suchthat r < r1 and r2

1 < 2. Choose any rational number h such that 0 < h < 1and

h <2− r2

2r + 1.

Put r1 = r + h. By construction r1 > r. We calculate r21 as follows

r21 = r2 + 2rh+ h2 = r2 + (2r + h)h < r2 + (2r + 1)h = r2 + 2− r2 = 2.

Thus r21 < 2 as claimed. But this contradicts the fact that r is an upper

bound of the set A.Suppose now that 2 < r2. Then I claim that I can find a rational number

r1 such that r1 < r and 2 < r21. Put h = r2−2

2rand define r1 = r− h. Clearly,

0 < r1 < r. We calculate r22 as follows

r21 = r2 − 2rh+ h2 = r2 − (r2 − 2) + h2 > r2 − (r2 − 2) = 2.

But this contradicts the fact that r is supposed to be a least upper bound.We have therefore proved that if r is a least upper bound of A then

r =√

2. But this is impossible because we have proved that√

2 is irrational.Thus the set A does not have a least upper bound in the rationals. However,by essentially the same reasoning the set B does have a least upper boundin the reals: the number

√2. This motivates the following definition. It is

this axiom that is needed to develop calculus properly.

3.4. THE REAL NUMBERS 79

The completeness axiom for R

Every non-empty subset of the reals that is bounded above has a leastupper bound.

The Peano Axioms

Set theory is supposed to be a framework in which all of mathematicscan take place. Let me briefly sketch out how we can construct thereal numbers using set theory. The starting point are the Peano axiomsstudied by G. Peano (1858–1932). These deal with a set P and anoperation on this set called the successor function which for each n ∈ Pproduces a unique element n+. The following four axioms should hold:

(P1) There is a distinguished element of P that we denote by 0.

(P2) There is no element n ∈ P such that n+ = 0.

(P3) If m,n ∈ P and m+ = n+ then m = n,

(P4) If X ⊆ P is such that 0 ∈ X and if n ∈ X then n+ ∈ X thenX = P .

By using ideas from set theory, one shows that P is essentially the setof natural numbers together with its operations of addition and multi-plication.

The natural numbers are deficient in that it is not always possibleto solve equations of the form a+ x = b because of the lack of negativenumbers. However, we can use set theory to construct Z from N byusing ordered pairs. The idea is to regard (a, b) as meaning a − b.However, there are many names for the same negative number so weshould have (0, 1) and (2, 3) and (3, 4) all signifying the same number:namely, −1. To make this work, one uses another idea from set theory,that of equivalence relations which we shall meet later. This gives riseto the set Z. Again using ideas from set theory, the usual operationscan be constructed on Z.

But the integers are deficient because we cannot always solve equa-tions of the form ax + b = 0 because of the lack of rational numbers.To construct them we use ordered pairs again. This time (a, b), where


b 6= 0, is interpreted as ab. But again we have the problem of multiple

names for what should be the same number. Thus (1, 2) should equal(−1,−2) should equal (2, 4) and so forth. Once again this problem issolved by using an equivalence relation, and once again, the set whicharises, which is denoted by Q, is endowed with the usual operations.

As we have seen, the rationals are deficient in not containing numberslike√

2. The intuitive idea behind the construction of the reals from therationals is that we want to construct R as all the numbers that canbe approximated arbitrarily by rational numbers. To do this, we formthe set of all subsets X of Q which have the following characteristics:X 6= ∅, X 6= Q, if x ∈ X and y ≤ x then y ∈ X, and X doesn’t havea biggest element. These subsets are called Dedekind cuts and shouldbe regarded as defining the real number r so that X consists of all therational numbers less than r.

Chapter 4

Number theory

Number theory is one of the oldest branches of mathematics and deals,mainly, with the properties of the integers, the simplest kinds of numbers. Itis a vast subject, and so this chapter can only be an introduction. The mainresult proved is that every natural number greater than one can be writtenas a product of powers of primes, a result known as the fundamental theoremof arithmetic. This shows that the primes are the building blocks, or atoms,from which all natural numbers are constructed. The primes are still thesubject of intensive research and the source of many unanswered questions.It is ironic that the numbers we learn about first as children are the sourceof some of mathematics’ most difficult and interesting questions. The toolthat enables this chapter to work is the remainder theorem so that is wherewe shall start.

4.1 The remainder theorem

We begin by stating a basic result that you may assume as an axiom butwhich I shall also set as a proof in one of the exercises.

Lemma 4.1.1 (Remainder Theorem). Let a and b be integers where b > 0.Then there are unique integers q and r such that

a = bq + r

where 0 ≤ r < b.

81

82 CHAPTER 4. NUMBER THEORY

The number q is called the quotient and the number r is called the re-mainder. For example, if we consider the pair of natural numbers 14 and 3then

14 = 3 · 4 + 2

where 4 is the quotient and 2 is the remainder. Your first reaction to thisresult is that it is obvious and you might conclude from this that it is there-fore uninteresting. But this would be wrong. It is certainly not hard tounderstand but despite that it is important. The reason is that whenever wehave a question that involves divisibility, it is very likely going to require theuse of this result.

Example 4.1.2. From the remainder theorem, we know that every naturalnumber n can be written as n = 10q + r where 0 ≤ r ≤ 9. The integer r isnothing other than the units digit in the usual base 10 representation of n.Thus, for example, 42 = 10 × 4 + 2. Similarly, it is the remainder theoremthat tells us that odd numbers are precisely those that leave remainder 1when divided by 2.

Let a and b be integers where a 6= 0. We say that a divides b or that bis divisible by a if there is a q such that b = aq. In other words, there is noremainder. We also say that a is a divisor or factor of b. We write a | b tomean the same thing as ‘a divides b’. It is very important to remember thata | b does not mean the same thing as a

b. The latter is a number, the former

is a statement about two numbers.As a very simple example of the remainder theorem, we shall look at how

we write numbers down.I don’t think our hunter-gatherer ancestors worried too much about writ-

ing numbers down because there wasn’t any need: they didn’t have to fill intax-returns and so didn’t need accountants. However, organizing cities doesneed accountants and so ways had to be found of writing numbers down.The simplest way of doing this is to use a mark like |, called a tally, for eachthing being counted. So

||||||||||

means 10 things. This system has advantages and disadvantages. The ad-vantage is that you don’t have to go on a training course to learn it. Thedisadvantage is that even quite small numbers need a lot of space like

||||||||||||||||||||||||||||||||||||||

4.1. THE REMAINDER THEOREM 83

It’s also hard to tell whether

|||||||||||||||||||||||||||||||||||||||

is the same number or not. (It’s not.) It’s inevitable that people will in-troduce abbreviations to make the system easier to use. Perhaps it was inthis way that the next development occurred. Both the ancient Egyptiansand Romans used similar systems but I’ll describe the Roman system becauseit involves letters rather than pictures. First, you have a list of basic symbols:

number 1 5 10 50 100 500 1000symbol I V X L C D M

There are more symbols for bigger numbers. Numbers are then writtenaccording to the additive principle. Thus MMVIIII is 2009. Incidently, Iunderstand that the custom of also using a subtractive principle so that, forexample, IX means 9 rather than using VIIII, is a more modern innovation.This system is clearly a great improvement on the tally-system. Even quitebig numbers are written compactly and it is easy to compare numbers. Onthe other hand, there is more to learn. The other disadvantage is that we needseparate symbols for different powers of 10 and their multiples by 5. Thiswas probably not too inconvenient in the ancient world where it is likely thatthe numbers needed on a day-to-day basis were never going to be that big.A common criticism of this system is that it is hard to do multiplication in.However, that turns out to be a non-problem because, like us, the Romansused pocket calculators or, more accurately, a device called an abacus thatcould easily be carried under a toga. The real evidence for the usefulness ofthis system of writing numbers is that it survived for hundreds and hundredsof years.

The system used throughout the world today is quite different and iscalled the positional number system. It seems to have been in place by theninth century in India but it was hundreds of years in development and theresult of ideas from many different cultures: the invention of zero on its ownis one of the great steps in human intellectual development. The genius ofthe system is that it requires only 10 symbols

0, 1, 2, 3, 4, 5, 6, 7, 8, 9


and every natural number can be written using a sequence of these symbols.The trick to making the system work is that we use the position on the pageof a symbol to tell us what number it means. Thus 2009 means

103 102 101 100

2 0 0 9

In other words

2× 103 + 0× 102 + 0× 101 + 9× 100.

Notice the important role played by the symbol 0 which makes it clear towhich column a symbol belongs otherwise we couldn’t tell 29 from 209 from2009. The disadvantage of this system is that you do have to go on a courseto learn it because it is a highly sophisticated way of writing numbers. On theother hand, it has the enormous advantage that any number can be writtendown in a compact way. Once the basic system had been accepted it couldbe adapted to deal not only with positive whole numbers but also negativewhole numbers, using the symbol −, and also fractions with the introductionof the decimal point. By the end of the sixteenth century, the full decimalsystem was in place.

Notation warning! In the UK, we use a raised decimal point like 0 · 123and not a comma. Also we generally write the number 1 without a longhook at the top. If you do write it like that there is a danger that people willconfuse it with the number 7 which is not always written in the UK with aline through it.

We shall now look in more detail at the way in which numbers can bewritten down using a positional notation. In order not to be biased, we shallnot just work in base 10 but show how any base can be used. Our main toolis the remainder theorem.

Let’s see how to represent numbers in base b where b ≥ 2. If d ≤ 10 thenwe represent numbers by sequences of symbols taken from the set

Zd = {0, 1, 2, 3, . . . d− 1}


but if d > 10 then we need new symbols for 10, 11, 12 and so forth. It’sconvenient to use A,B,C, . . .. For example, if we want to write numbers inbase 12 we use the set of symbols

{0, 1, . . . , 9, A,B}

whereas if we work in base 16 we use the set of symbols

{0, 1, . . . , 9, A,B,C,D,E, F}.

If x is a sequence of symbols then we write xd to make it clear that we areto interpret this sequence as a number in base d. Thus BAD16 is a numberin base 16.

The symbols in a sequence xd, reading from right to left, tell us the con-tribution each power of d such as d0, d1, d2, etc makes to the number thesequence represents. Here are some examples.

Examples 4.1.3. Converting from base d to base 10.

1. 11A912 is a number in base 12. This represents the following numberin base 10:

1× 123 + 1× 122 + A× 121 + 9× 120,

which is just the number

123 + 122 + 10× 12 + 9 = 2001.

2. BAD16 represents a number in base 16. This represents the followingnumber in base 10:

B × 162 + A× 161 +D × 160,

which is just the number

11× 162 + 10× 16 + 13 = 2989.

3. 55567 represents a number in base 7. This represents the followingnumber in base 10:

5× 73 + 5× 72 + 5× 71 + 6× 70 = 2001.


These examples show how easy it is to convert from base d to base 10.

There are two ways to convert from base 10 to base d.

1. The first runs in outline as follows. Let n be the number in base 10that we wish to write in base d. Look for the largest power m of d suchthat adm ≤ n where a < d. Then repeat for n − adm. Continuing inthis way, we write n as a sum of multiples of powers of d and so we canwrite n in base d.

2. The second makes use of the remainder theorem. The idea behind thismethod is as follows. Let

n = am . . . a1a0

in base d. We may think of this as

n = (am . . . a1)d+ a0

It follows that a0 is the remainder when n is divided by d, and thequotient is n′ = am . . . a1. Thus we can generate the digits of n in based from right to left by repeatedly finding the next quotient and nextremainder by dividing the current quotient by d; the process starts withour input number as first quotient.

Examples 4.1.4. Converting from base 10 to base d.

1. Write 2001 in base 7. I’ll solve this question in two different ways: thelong but direct route and then the short but more thought-provokingroute.

We see that 74 > 2001. Thus we divide 2001 by 73. This goes 5 timesplus a remainder. Thus 2001 = 5 × 73 + 286. We now repeat with286. We divide it by 72. It goes 5 times again plus a remainder. Thus286 = 5× 72 + 41. We now repeat with 41. We get that 41 = 5× 7 + 6.We have therefore shown that

2001 = 5× 73 + 5× 72 + 5× 7 + 6.

Thus 2001 in base 7 is just 5556.


Now for the short method.

quotient remainder

7 20017 285 67 40 57 5 5

0 5

Thus 2001 in base 7 is:5556.

2. Write 2001 in base 12.

quotient remainder

12 200112 166 912 13 10 = A12 1 1

0 1

Thus 2001 in base 12 is:11A9.

3. Write 2001 in base 2.

quotient remainder

2 20012 1000 12 500 02 250 02 125 02 62 12 31 02 15 12 7 12 3 12 1 1

0 1


Thus 2001 in base 2 is (reading from bottom to top):

11111010001.

When converting from one base to another it is always wise to checkyour calculations by converting back.

Number bases have some special terminology associated with them whichyou might encounter:

Base 2 binary.

Base 8 octal.

Base 10 decimal.

Base 12 duodecimal.

Base 16 hexadecimal.

Base 20 vigesimal.

Base 60 sexagesimal.

Binary, octal and hexadecimal occur in computer science; there are remnantsof a vigesimal system in French and the older Welsh system of counting; base60 was used by astronomers in ancient Mesopotamia and is still the basis oftime measurement (60 seconds = 1 minute, and 60 minutes = 1 hour) andangle measurement.

As a final example, of the importance of the remainder theorem, we lookat how we may write proper fractions as decimals. To see what’s involved,let’s calculate some decimal fractions.

Examples 4.1.5.

1. 120

= 0 · 05. This fraction has a finite decimal representation.

2. 17

= 0 · 142857142857142857142857142857 . . .. This fraction has aninfinite decimal representation, which consists of the same sequence ofnumbers repeated. We abbreviate this decimal to 0 · 142857.

3. 3784

= 0 · 44047619. This fraction has an infinite decimal representation,which consists of a non-repeating part followed by a part which repeats.


I shall characterize those fractions which have a finite decimal represen-tation once we have proved our main theorem. I want to focus here on thelast two cases. Case (2) is said to be a purely periodic decimal whereas case(3), which is more general, is said to be ultimately periodic.

Proposition 4.1.6. An infinite decimal fraction represents a rational num-ber if and only if it is ultimately periodic.

Proof. The key is in the remainders. Consider the ultimately periodic deci-mal number

r = 0 · a1 . . . asb1 . . . bt.

We shall prove that r is rational. Observe that

10sr = a1 . . . as · b1 . . . bt

and10s+t = a1 . . . asb1 . . . bt · b1 . . . bt.

From which we get that

10s+tr − 10sr = a1 . . . asb1 . . . bt − a1 . . . as

where the righthand side is the decimal form of some integer that we shallcall a. It follows that

r =a

10s+t − 10s

is a rational number.The proof of the converse is based on the method we use to compute

the decimal expansion of mn

. We carry out repeated divisions by n and ateach step of the computation we use the remainder obtained to calculatethe next digit. But there are only a finite number of possible remaindersand our expansion is assumed infinite. Thus at some point there must berepetition.

Example 4.1.7. We shall write the ultimately periodic decimal 0 · 94. as aproper fraction in its lowest terms. Put r = 0 · 94. Then

• r = 0 · 94.

• 10r = 9.444 . . .


• 100r = 94.444 . . ..

Thus 100r−10r = 94−9 = 85 and so r = 8590

. We can simplify this to r = 1718

.We can now easily check that this is correct.

Exercises 4.1

1. Find the quotients and remainders for each of the following pair ofnumbers. Divide the smaller into the larger.

(a) 30 and 6.

(b) 100 and 24.

(c) 364 and 12.

2. Write the number 2009 in

(a) Base 5.

(b) Base 12.

(c) Base 16.

3. Write the following numbers in base 10.

(a) DAB16.

(b) ABBA12.

(c) 443322115.

4. Write the following decimals as fractions in their lowest terms.

(a) 0 · 534.

(b) 0 · 2106.

(c) 0 · 076923.

5. Prove the following properties of the division relation on Z.

(a) If a 6= 0 then a | a.

(b) If a | b and b | a then a = ±b.(c) If a | b and b | c then a | c.

4.2. GREATEST COMMON DIVISORS 91

(d) If a | b and a | c then a | (b+ c).

6. This question develops a proof of the remainder theorem. Let a andb be integers with b > 0. Then there exist a unique pair of integers qand r such that a = qb+ r where 0 ≤ r < b.

(a) LetX = {a− nb : n ∈ Z}.

Show that this set contains non-negative elements.

(b) Let X+ be the subset of X consisting of non-negative elements.This subset is non-empty by the first step. Use the well-orderingprinciple to deduce that this set contains a minimum element r.Thus r = a− qb ≥ 0 for some q ∈ Z.

(c) Show that if r ≥ b then X+ in fact contains a smaller element,which is a contradiction.

(d) We therefore have that a = bq + r where 0 ≤ r < b. It remainsto prove that q and r are unique with these propertries. Assumetherefore that a = bq′ + r′ where 0 ≤ r′ < b. Deduce that q = q′

and r = r′.

4.2 Greatest common divisors

Let a, b ∈ N. A number d which divides both a and b is called a commondivisor of a and b. The largest number which divides both a and b is calledthe greatest common divisor of a and b and is denoted by gcd(a, b). A pairof natural numbers a and b is said to be coprime if gcd(a, b) = 1. For usgcd(0, 0) is undefined but if a 6= 0 then gcd(a, 0) = a.

Example 4.2.1. Consider the numbers 12 and 16. The set of divisors of12 is {1, 2, 3, 4, 6, 12}. The set of divisors of 16 is {1, 2, 4, 8, 16}. The set ofcommon divisors is the set of numbers that belong to both of these two sets:namely, {1, 2, 4}. The greatest common divisor of 12 and 16 is therefore 4.Thus gcd(12, 16) = 4.

One application of greatest common divisors is in simplifying fractions.For example, the fraction 12

16is equal to the fraction 3

4because we can divide

out by the common divisor of numerator and denominator. The fractionwhich results cannot be simplified further and is in its lowest terms.


Lemma 4.2.2. Let d = gcd(a, b). Then gcd(ad, bd) = 1.

Proof. Because d divides both a and b we may write a = a′d and b = b′d forsome natural numbers a′ and b′. We therefore need to prove that gcd(a′, b′) =1. Suppose that e | a′ and e | b′. Then a′ = ex and b′ = ey for some naturalnumbers x and y. Thus a = exd and b = eyd. Observe that ed | a and ed | band so ed is a common divisor of both a and b. But d is the greatest commondivisor and so e = 1, as required.

Let me paraphrase what the result above says since it is not surprising. IfI divide two numbers by their greatest common divisor then the numbers thatremain are coprime. This seems intuitively plausible and the proof ensuresthat our intuition is correct.

Example 4.2.3. Greatest common divisors arise naturally in solving lin-ear equations where we require the solutions to be integers. Consider, forexample, the linear equation

12x+ 16y = 5.

If we want our solutions (x, y) to have real number co-ordinates, then it isof course easy to solve this equation and find infinitely many solutions sincethe solutions form a line in the plane. But suppose now that we require(x, y) ∈ Z2; that is, we want the solutions to be integers. In other words, wewant to know whether the line contains any points with integer co-ordinates.We can see immediately that this is impossible. We have calculated thatgcd(12, 16) = 4. Thus if x and y are integers, the number 4 divides thelefthand side of our equation. But clearly, 4 does not divide the righthandside of our equation. Thus the set

{(x, y) : (x, y) ∈ Z2and 12x+ 16y = 5}

is empty.

If the numbers a and b are large, then calculating their gcd in the wayI did above would be time-consuming and error-prone. We want to find anefficient method of calculating the greatest common divisor. The followinglemma is the basis of just such a method.

Lemma 4.2.4. Let a, b ∈ N, where b 6= 0, and let a = bq+r where 0 ≤ r < b.Then

gcd(a, b) = gcd(b, r).


Proof. Let d be a common divisor of a and b. Since a = bq + r we have thata− bq = r so that d is also a divisor of r. It follows that any divisor of a andb is also a divisor of b and r.

Now let d be a common divisor of b and r. Since a = bq+ r we have thatd divides a. Thus any divisor of b and r is a divisor of a and b.

It follows that the set of common divisors of a and b is the same as theset of common divisors of b and r. Thus gcd(a, b) = gcd(b, r).

The point of the above result is that b < a and r < b. So calculat-ing gcd(b, r) will be easier than calculating gcd(a, b) because the numbersinvolved are smaller. Compare ︷︸︸︷

a = b q + r

witha = bq + r︸︷︷︸ .

The above result is the basis of an efficient algorithm for computing greatestcommon divisors. It was described in Propositions 1 and 2 of Book VII ofEuclid.

Algorithm 4.2.5 (Euclid’s algorithm).Input: a, b ∈ N such that a ≥ b and b 6= 0.Output: gcd(a, b).Procedure: write a = bq + r where 0 ≤ r < b. Then gcd(a, b) = gcd(b, r). Ifr 6= 0 then repeat this procedure with b and r and so on. The last non-zeroremainder is gcd(a, b)

Example 4.2.6. Let’s calculate gcd(19, 7) using Euclid’s algorithm. I havehighlighted the numbers that are involved at each stage.

19 = 7 · 2 + 5

7 = 5 · 1 + 2

5 = 2 · 2 + 1 ∗2 = 1 · 2 + 0

By Lemma 1.3.3 we have that

gcd(19, 7) = gcd(7, 5) = gcd(5, 2) = gcd(2, 1) = gcd(1, 0).

The last non-zero remainder is 1 and so gcd(19, 7) = 1 and, in this case, thenumbers are coprime.


There are occasions when we need to extract more information from Eu-clid’s algorithm as we shall discover later when we come to deal with primenumbers. The following provides what we need.

Theorem 4.2.7 (Bezout’s theorem). Let a and b be natural numbers. Thenthere are integers x and y such that

gcd(a, b) = xa+ yb.

I shall prove this theorem by describing an algorithm that will computethe integers x and y above. This is achieved by running Euclid’s algorithmin reverse and is called the extended Euclidean algorithm. The procedure fordoing so is outlined below but the details are explained in the example thatfollows it.

Algorithm 4.2.8 (Extended Euclidean algorithm).Input: a, b ∈ N where a ≥ b and b 6= 0.Output: numbers x, y ∈ Z such that gcd(a, b) = xa+ yb.Procedure: apply Euclid’s algorithm to a and b; working from bottom to toprewrite each remainder in turn.

Example 4.2.9. This is a little involved so I have split the process up intosteps. I shall apply the extended Euclidean algorithm to the example Icalculated above. I have highlighted the non-zero remainders wherever theyoccur, and I have discarded the last equality where the remainder was zero.I have also marked the last non-zero remainder.

19 = 7 · 2 + 5

7 = 5 · 1 + 2

5 = 2 · 2 + 1 ∗

The first step is to rearrange each equation so that the non-zero remainderis alone on the lefthand side.

5 = 19− 7 · 22 = 7− 5 · 11 = 5− 2 · 2


Next we reverse the order of the list

1 = 5− 2 · 22 = 7− 5 · 15 = 19− 7 · 2

We now start with the first equation. The lefthand side is the gcd we areinterested in. We treat all other remainders as algebraic quantities and sys-tematically substitute them in order. Thus we begin with the first equation

1 = 5− 2 · 2.

The next equation in our list is

2 = 7− 5 · 1

so we replace 2 in our first equation by the expression on the right to get

1 = 5− (7− 5 · 1) · 2.

We now rearrange this equation by collecting up like terms treating the high-lighted remainders as algebraic objects to get

1 = 3 · 5− 2 · 7.

We can of course make a check at this point to ensure that our arithmetic iscorrect. The next equation in our list is

5 = 19− 7 · 2

so we replace 5 in our new equation by the expression on the right to get

1 = 3 · (19− 7 · 2)− 2 · 7.

Again we rearrange to get

1 = 3 · 19 − 8 · 7 .

The algorithm now terminates and we can write

gcd(19, 7) = 3 · 19 + (−8) · 7 ,

as required. We can also, of course, easily check the answer!


I shall describe a much more efficient algorithm for implementing theextended Euclidean algorithm later in this book when I have discussed ma-trices.

A very useful application of Bezout’s theorem is the following.

Lemma 4.2.10. Let a and b be natural numbers. Then a and b are coprimeif, and only if, we may find integers x and y such that

1 = xa+ yb.

Proof. Suppose first that a and b are coprime. Then by Bezout’s theorem

gcd(a, b) = ax+ by

for some integers a and b. But, by assumption, gcd(a, b) = 1. Conversely,suppose that

1 = xa+ yb.

Then any natural number that divides both a and b must divide 1. It followsthat gcd(a, b) = 1.

The significance of the above lemma is that whenever you know that aand b are coprime, you can actually write down an expression 1 = xa + ybwhich means the same thing. This turns out to be enormously useful.

Exercises 4.2

1. Use Euclid’s algorithm to find the gcd’s of the following pairs of num-bers.

(a) 35, 65.

(b) 135, 144.

(c) 17017, 18900.

2. Use the extended Euclidean algorithm to find integers x and y suchthat gcd(a, b) = ax+by for each of the following pairs of numbers. Youshould ensure that your answers for x and y have the correct signs.

(a) 112, 267.

(b) 242, 1870.

4.3. THE FUNDAMENTAL THEOREM OF ARITHMETIC 97

3. We know how to find the greatest natural number that divides twonumbers. Define now gcd(a, b, c) to be the greatest common divisor ofa and b and c jointly. Prove that

gcd(a, b, c) = gcd(gcd(a, b), c).

Deduce that

gcd(gcd(a, b), c) = gcd(a, gcd(b, c)).

We may similarly define gcd(a, b, c, d) to be the greatest common divisorof a and b and c and d jointly. Calculate gcd(910, 780, 286, 195) andjustify your calculations.

4. The following question is by Dubisch Amer. Math. Mon. 69. DefineN∗ = N \ {0}. A binary operation ◦ defined on N∗ is known to havethe following properties:

(a) a ◦ b = b ◦ a.

(b) a ◦ a = a.

(c) a ◦ (a+ b) = a ◦ b.

Prove that a ◦ b = gcd(a, b). Hint: the question is not asking you toprove that gcd(a, b) has these properties.

5. You have an unlimited supply of 3 cent stamps and an unlimited supplyof 5 cent stamps. By combining stamps of different values you can makeup other values: for example, three 3 cent stamps and two 5 cent stampsmake the value 19 cents. What is the largest value you cannot make?Hint: you need to show that the question makes sense.

6. Let n ≥ 1. Define φ(n) to be the number of numbers less than or equalto n and coprime to n. This is the Euler totient function. Tabulate thevalues of φ(n) for 1 ≤ n ≤ 12.

4.3 The fundamental theorem of arithmetic

The goal of this section is to state and prove the most basic result about thenatural numbers: each natural number, excluding 0 and 1, can be written


as a product of powers of primes in essentially one way. The primes aretherefore the ‘atoms’ from which all natural numbers can be built.

A proper divisor of a natural number n is a divisor that is neither 1 norn. A natural number n is said to be prime if n ≥ 2 and the only divisors ofn are 1 and n itself. A number bigger than or equal to 2 which is not primeis said to be composite. It is important to remember that the number 1 isnot a prime. The only even prime is the number 2.

The properties of primes have exercised a great fascination ever since theywere first studied and continue to pose questions that mathematicians haveyet to solve. There are no nice formulae to tell us what the nth prime is butthere are still some interesting results in this direction. The polynomial

p(n) = n2 − n+ 41

has the property that its value for n = 1, 2, 3, 4, . . . , 40 is always prime. Ofcourse, for n = 41 it is clearly not prime. In 1971, the mathematician YuriMatijasevic found a polynomial in 26 variables of degree 25 with the propertythat when non-negative integers are substituted for the variables the positivevalues it takes are all and only the primes. However, this polynomial doesnot generate the primes in any particular order.

Lemma 4.3.1. Let n ≥ 2. Either n is prime or the smallest proper divisorof n is prime.

Proof. Suppose n is not prime. Let d be the smallest proper divisor of n. If dwere not prime then d would have a smallest proper divisor and this divisorwould in turn divide n, but this would contradict the fact that d was thesmallest proper divisor of n. Thus d must itself be prime.

The following was also proved by Euclid: it is Proposition 20 of Book IXof Euclid.

Theorem 4.3.2. There are infinitely many primes.

Proof. Let p1, . . . , pn be the first n primes. Put

N = (p1 . . . pn) + 1.

If N is a prime, then N is a prime bigger than pn. If N is composite, then Nhas a prime divisor p by Lemma 4.3.1. But p cannot equal any of the primesp1, . . . , pn because N leaves remainder 1 when divided by pi. It follows thatp is a prime bigger than pn. Thus we can always find a bigger prime. Itfollows that there must be an infinite number of primes.


Example 4.3.3. It’s interesting to consider some specific cases of the num-bers introduced in the above proof. The first few are already prime.

• 2 + 1 = 3 prime.

• 2 · 3 + 1 = 7 prime.

• 2 · 3 · 5 + 1 = 31 prime.

• 2 · 3 · 5 · 7 + 1 = 211 prime.

• 2 · 3 · 5 · 7 · 11 + 1 = 2, 311 prime.

• 2 · 3 · 5 · 7 · 11 · 13 + 1 = 30, 031 = 59 · 509.

The Prime Number Theorem

There are infinitely many primes but how are those primes distributed?For example, are they arranged fairly regularly, or do the gaps betweenthem get bigger and bigger? There are no formulae which output thenth prime in a usable way, but if we adopt a statistical approach then wecan obtain much more useful results. The idea is that for each naturalnumber n we count the number of primes π(n) less than or equal ton. The graph of π(n) has a staircase shape — it certainly isn’t smooth— but as you zoom away it begins to look smoother and smoother.This raises the question of whether there is a smooth function that is agood approximation to π(n). In 1792, the young Carl Friedrich Gauss(1777–1855) observed that π(n) appeared to be close to the value of theamazingly simple function n

ln(n). But proving that this was always true,

and not just an artefact of the comparatively small numbers he lookedat, turned out to be difficult. Eventually, in 1896 two mathematicians,Jacques Hadamard (1865–1963) and the spectacularly named CharlesJean Gustave Nicolas Baron de la Vallee Poussin (1866–1962), provedindependently of each other that

limx→∞

π(x)

x/ ln(x)= 1

a result known as the Prime Number Theorem. It was proved using com-plex analysis; that is, calculus using complex numbers. As an example,


we have thatπ(1, 000, 000) = 78, 498

whereas106

ln 106= 72, 382.

Algorithm 4.3.4. To decide whether a number n is prime or composite.Check to see if any prime p ≤

√n divides n. If none of them do, the number

n is prime. We shall now explain why this works. If a divides n then we canwrite n = ab for some number b. If a <

√n then b >

√n whilst if a >

√n

then b <√n. Thus to decide if n is prime or not we need only carry out trial

divisions by all numbers a ≤√n. However, this is inefficient because if a

divides n and a is not prime then a is divisible by some prime p which musttherefore also divide n. It follows that we need only carry out trial divisionsby the primes p ≤

√n.

Example 4.3.5. Determine whether 97 is prime using the above algorithm.We first calculate the largest whole number less than or equal to

√97. This

is 9. We now carry out trial divisions of 97 by each prime number p where2 ≤ p ≤ 9; by the way, if you aren’t certain which of these numbers is prime:just try them all. You’ll get the right answer although not as efficiently. Youmight also want to remember that if m doesn’t divide a number neither canany multiple of m. In any event, in this case we carry out trial divisions by2, 3, 5 and 7. None of them divides 97 exactly and so 97 is prime.

Cryptography

Prime numbers play an important role in exchanging secret information.In 1976, Whitfield Diffie and Martin Hellman wrote a paper on cryptog-raphy that can genuinely be called ground-breaking. In ‘New directionsin cryptography’ IEEE Transactions on Information Theory 22 (1976),644–654, they put forward the idea of a public-key cryptosystem whichwould enable

. . . a private conversation . . . [to] be held between any two in-dividuals regardless of whether they have ever communicatedbefore.

With considerable farsightedness, Diffie and Hellman foresaw that such


cryptosystems would be essential if communication between computerswas to reach its full potential. However, their paper did not describe aconcrete way of doing this. It was R. I. Rivest, A. Shamir and L. Adle-man (RSA) who found just such a concrete method described in theirpaper, ‘A method for obtaining digital signatures and public-key cryp-tosystems’ Communications of the ACM 21 (1978), 120–126. Theirmethod is based on the following observation. Given two prime num-bers it takes very little time to multiply them together, but if I giveyou a number that is a product of two primes and ask you to factorizeit then it takes a lot of time. You might like to think about why inrelation to the algorithm I gave for factroizing numbers above. Afterconsiderable experimentation, RSA showed how to use little more thanundergraduate mathematics to put together a public-key cryptosystemthat is an essential ingredient in e-commerce. Ironically, this secret codehad in fact been invented in 1973 at GCHQ, who had kept it secret.

The following is the key property of primes we shall need to prove thefundamental theorem of arithmetic. It is the main reason why we neededBezout’s theorem. It is Proposition 30 of Book VII of Euclid.

Lemma 4.3.6 (Euclid’s lemma).

1. Let p | ab where p is a prime. Then p | a or p | b.

2. Let p | a1 . . . an where p is a prime. Then p | ai for some i.

Proof. (1) Suppose that p does not divide a. We shall prove that p must thendivide b. If p does not divide a, then a and p are coprime. By Lemma 4.2.10,there exist integers x and y such that 1 = px+ay. Thus b = bpx+ bay. Nowp | bp and p | ba, by assumption, and so p | b, as required.

(2) This is a typical application for proof by induction. We have provedthe base case where n = 2. Assume that the result holds when n = k. Weprove that it holds for n = k + 1. Suppose that p | (a1 . . . ak)ak+1. Fromthe base case, either p | a1 . . . ak or p | ak+1. But we may now deduce thatp | pi for some 1 ≤ i ≤ k or p | ak+1 by the induction hypothesis. We havetherefore proved the result.

Example 4.3.7. The above result is not true if p is not a prime. For example,6 | 9× 4 but 6 divides neither 9 nor 4.


Lemma 4.3.6 is so important, I want to spell out in words what it says:

If a prime divides a product of numbers it must divide at leastone of them.

There is a very nice application of Euclid’s lemma to proving that certainnumbers are irrational. It generalizes our proof that

√2 is irrational described

in Chapter 2.

Theorem 4.3.8. The square root of every prime number is irrational.

Proof. We shall prove this by contradiction. Assume that we can write√p

as a rational. I shall show that this assumption leads to a contradiction andso must be false. We are assuming that

√p = a

b. By cancelling the greatest

common divisor of a and b we can in fact assume that gcd(a, b) = 1. Thiswill be crucial to our argument. Squaring both sides of the equation

√p = a

b

and multiplying the resulting equation by b2 we get that

pb2 = a2.

This says that a2 is divisible by p. But if a prime divides a product of twonumbers it must divide at least one of those numbers by Euclid’s lemma.Thus p divides a. Thus we can write a = pc for some natural number c.Substituting this into our equation above we get that

pb2 = p2c2.

Dividing both sides of this equation by p gives

b2 = pc2.

This tells us that b2 is divisible by p and so in the same way as above pdivides b. We have therefore shown that our assumption that

√p is rational

leads to both a and b being divisible by p. But this contradicts the fact thatgcd(a, b) = 1. Our assumption is therefore wrong, and so

√p is not a rational

number.

We now come to the main theorem of this chapter.

Theorem 4.3.9 (Fundamental theorem of arithmetic). Every number n ≥ 2can be written as a product of primes in one way if we ignore the order inwhich the primes appear. By product we allow the possibility that there isonly one prime.


Proof. Let n ≥ 2. If n is already a prime then there is nothing to prove, sowe can suppose that n is composite. Let p1 be the smallest prime divisor ofn. Then we can write n = p1n

′ where n′ < n. Once again, n′ is either primeor composite. Continuing in this way, we can write n as a product of primes.

We now prove uniqueness. Suppose that

n = p1 . . . ps = q1 . . . qt

are two ways of writing n as a product of primes. Now p1 | n and so p1 |q1 . . . qt. By Euclid’s Lemma, the prime p1 must divide one of the qi’s and,since they are themselves prime, it must actually equal one of the qi’s. Byrelabelling if necessary, we can assume that p1 = q1. Cancel p1 from bothsides and repeat with p2. Continuing in this way, we see that every primeoccurring on the lefthand side occurs on the righthand side. Changing sides,we see that every prime occurring on the righthand side occurs on the lefthandside. We deduce that the two prime decompositions are identical.

When we write a number as a product of primes we usually gather to-gether the same primes into a prime power, and write the primes in increasingorder which then gives a unique representation. This is illustrated in the ex-ample below.

Example 4.3.10. Let n = 999, 999. Write n as a product of primes. Thereare a number of ways of doing this but in this case there is an obvious placeto start. We have that

n = 32 ·111, 111 = 33 ·37, 037 = 33 ·7 ·5, 291 = 33 ·7 ·11 ·481 = 33 ·7 ·11 ·13 ·37.

Thus the prime factorisation of 999, 999 is

999, 999 = 33 · 7 · 11 · 13 · 37.

Supernatural Numbers

There are natural numbers. Are there super natural numbers? It soundslike a joke but in fact there are, though to be honest they are onlyencountered in advanced work. But since they are easy to understandand I like the name, I have included a brief description List the primesin order 2, 3, 5, 7, . . .. By the fundamental theorem of arithmetic, eachnatural number ≥ 2 may be expressed as a unique product of powers ofprimes. Let’s write each such natural number as a product all primes.


This can be achieved by including those primes not needed by raisingthem to the power 0. For example,

10 = 2 · 5 = 21 · 30 · 51 · 70 . . .

which we could write as

(1, 0, 1, 0, 0, 0 . . .)

and12 = 22 · 3 = 22 · 31 · 50 · 70 . . .

which we could write as

(2, 1, 0, 0, 0, 0 . . .)

Of course, for each natural number from some point on all the entrieswill be zero. Thus each natural number ≥ 2 is encoded by an infinitesequence of natural numbers that are zero from some point onwards.We now define a supernatural number to be any sequence

(a1, a2, a3, . . .)

where the ai are natural numbers. We define a natural number to be asupernatural number where the ai = 0 for all i ≥ m for some naturalnumber m ≥ 1. This makes sense because each natural supernaturalnumber can be regarded as the encoded version of a natural number inthe non-super sense. I shall denote the set of supernatural numbers byS; this is not yet the complete list since I still have to add some specialsuch numbers. I shall denote supernatural numbers by bold letters suchas a. I shall also denote the ith component by ai. Let a and b be twosupernatural numbers. We define their product as follows

(a · b)i = ai + bi.

This makes sense because, for example,

10 · 12 = 120


and

(1, 0, 1, 0, 0, 0 . . .) · (2, 1, 0, 0, 0, 0 . . .) = (3, 1, 1, 0, 0, 0 . . .)

which encodes 233151 = 120. I shall leave you to check that the multi-plication is associative. If we define

1 = (0, 0, 0, . . .)

and allow it to be supernatural then we also have a multiplicative iden-tity because

1 · a = a = a · 1.

Now introduce a new symbol ∞ which satisfies a +∞ = ∞ = ∞ + a.Then if we allow

0 = (∞,∞,∞, . . .)

as a supernatural number then we also have a zero in the set of super-natural numbers since

0 · a = 0 = a · 0.

Finally, allow ∞ to occur anyway any number of times in the definitionof a supernatural number. Then we have the full set of supernaturalnumbers. How do you think that we could define gcd(a,b) and lcm(a,b)of supernatural numbers?

I shall now describe two simple applications of our main theorem.The greatest common divisor of two numbers a and b is the largest number

that divides into both a and b. On the other hand, if a | c and b | c then wesay that c is a common multiple of a and b. The smallest common multipleof a and b is called the least common multiple of a and b and is denoted bylcm(a, b). You might expect that to calculate the least common multiple wewould need a new algorithm, but in fact we can use Euclid’s algorithm asthe following result shows.

Proposition 4.3.11. Let a and b be natural numbers not both zero. Then

gcd(a, b) · lcm(a, b) = ab.

Proof. We begin with a special case to motivate the idea. Suppose thata = pr and b = ps where p is a prime. Then it is immediate from the


properties of indices that

gcd(a, b) = pmin(r,s) and lcm(a, b) = pmax(r,s)

and so, in this special case, we have that gcd(a, b) · lcm(a, b) = ab. Nextsuppose that the prime factorizations of a and b are

a = pr11 . . . prmm and b = ps11 . . . psmm

where the pi are primes. We may easily determine the prime factorization ofgcd(a, b) when we bear in mind the following points. The primes that occurin the prime factorization of gcd(a, b) must be from the set {p1, . . . , pm}, the

number pmin(ri,si)i divides gcd(a, b) but no higher power does. It follows that

gcd(a, b) = pmin(r1,s1)1 . . . pmin(rm,sm)

m .

A similar kind of argument proves that

lcm(a, b) = pmax(r1,s1)1 . . . pmax(rm,sm)

m .

The proof of the fact that gcd(a, b)·lcm(a, b) = ab now follows by multiplyingthe above two prime factorizations together. In the above proof, we assumedthat a and b had prime factorizations using the same set of primes. Thisneed not be true in general, but by allowing zero powers of primes we caneasily arrange for the same sets of primes to occur and the argument aboveremains valid.

For our next result, we begin with an observation. Some fractions, suchas 1

2can be written with only a finite number of digits after the decimal

place, but others, such as 13

require an infinite number of digits. We can nowaccount for this using our main theorem.

Proposition 4.3.12. A proper rational number ab

in its lowest terms has afinite decimal expansion if and only if b = 2m5n for some natural numbers mand n.

Proof. Let ab

have the finite decimal representation 0 · a1 . . . an. This means

a

b=a1

10+

a2

102+ . . .+

an10n

.


The righthand side is just the fraction

a110n−1 + a210n−2 + . . .+ an10n

.

The denominator contains only the prime factors 2 and 5 and so the reducedform will also only contain at most the prime factors 2 and 5.

To prove the converse, consider the proper fraction

a

2α5β.

If α = β then the denominator is 10α. If α 6= β then multiply the fraction bya suitable power of 2 or 5 as appropriate so that the resulting fraction hasdenominator a power of 10. But any fraction with denominator a power of10 has a finite decimal expansion.

Exercises 4.3

1. List the primes less than 100. Hint: use the Sieve of Eratosthenes1

which can be used to construct a table of all primes up to the numberN . List all numbers from 2 to N inclusive. Mark 2 as prime and thencross out from the table all numbers which are multiples of 2. Theprocess now iterates as follows. Find the smallest number which is notmarked as a prime and which has not been crossed out. Mark it as aprime and cross out all its multiples. If no such number can be foundthen you have found all primes less than or equal to N .

2. For each of the following numbers use Algorithm 4.3.4 to determinewhether they are prime or composite. When they are composite find aprime factorization. Show all working.

(a) 131.

(b) 689.

(c) 5491.

3. Find the lowest common multiples of the following pairs of numbers.

1Eratosthenes of Cyrene who lived about 250 BCE. He is famous for using geometryand some simple observations to estimate the circumference of the earth.


(a) 22, 121.

(b) 48, 72.

(c) 25, 116.

4. Given 24 · 3 · 55 · 112 and 22 · 56 · 114, calculate their greatest commondivisor and least common multiple.

5. Use the fundamental theorem of arithmetic to show that we can alwayswrite

√n, where n is a natural number, as a product of a natural

number and a product of square roots of primes. Calculate the squareroots of the following numbers exactly using the above method.

(a) 10.

(b) 42.

(c) 54.

6. Let a and b be coprime. Prove that if a | bc then a | c.

4.4 Modular arithmetic

From an early age, we are taught to think of numbers as being strung outalong the number line

0 1 2 3−1−2−3

But that is not the only way we count. We count the seasons in a cyclicmanner

. . . autumn, winter, spring, summer . . .

and likewise the days of the week

. . . Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday . . .

Also the months of the year or the hours in a day, whether by means of the 12-hour clock or the 24-hour clock. The fact that we use words for these eventsobscures the fact that we really are counting. This is clearer in the namesfor the months since October, November and December were originally the

4.4. MODULAR ARITHMETIC 109

eighth, ninth and tenth months, respectively, until Roman politics intervenedand they were shifted. But the counting in all these cases is not linear butcyclic. Rather than using a number line to represent this type of counting,we use instead number circles, and rather than using the words above, Ishall use numbers. Here is the number circle for the seasons with numbersreplacing words.

0

1

2

3

Adding in these systems of arithmetic means stepping around in a clockwisedirection, whereas subtracting means stepping around in an anticlockwisedirection. Modular arithmetic is the name given to these different systemsof cyclic counting. It was Gauss who realised that these different systems ofcounting were mathematically interesting.

4.4.1 Congruences

Let n ≥ 2 be a fixed natural number which in this context we call themodulus. If a, b ∈ Z we write a ≡ b if, and only if, a and b leave the sameremainder when divided by n or, what amounts to the same thing, n | a− b.

Here are a couple of simple examples. If n = 2, then a ≡ b if, and onlyif, a and b are either both odd or both even. On the other hand, if n = 10then a ≡ b if, and only if, a and b have the same units digit.

The symbol ≡ is a modification of the equality symbol =. If a ≡ b withrespect to n we say that a is congruent to b modulo n. In fact, congruencebehaves like a weakened form of equality as we now show.

Lemma 4.4.1. Let n ≥ 2 be a fixed modulus.

1. a ≡ a.


2. a ≡ b implies b ≡ a.

3. a ≡ b and b ≡ c implies that a ≡ c.

4. a ≡ b and c ≡ d implies that a+ c ≡ b+ d.

5. a ≡ b and c ≡ d implies that ac ≡ bd.

Here is a very simple application of modular arithmetic.

Lemma 4.4.2. A natural number n is divisible by 9 if, and only if, the sumof the digits of n is divisible by 9.

Proof. We shall work modulo 9. The proof hinges on the fact that 10 ≡ 1modulo 9. By using Lemma 4.4.1, we quickly find that 10r ≡ 1 for all naturalnumbers r ≥ 1. We use this result now. Let

n = an10n + an−110n−1 + . . .+ a110 + a0.

Then n ≡ an+ . . .+a0. Thus n and the sum of the digits of n leave the sameremainder when divided by 9, and so n is divisible by 9 if, and only if, thesum of the digits of n are divisible by 9.

Solving a linear equation such as ax+by = c is very easy. For each possiblereal value of x we can compute the corresponding real value of y. But supposenow that a, b and c are integers and we only want to find solutions (x, y) whoseco-ordinates are integers? This is an example of a Diophantine equation. Weshall show how it may solved with the help of modular arithmetic. First,we shall show that the problem of finding integer solutions is equivalent tosolving a simple kind of liner equation in one unknown in modular arithmetic.

Lemma 4.4.3. Let a, b and c be integers. Then the following are equivalent.

1. The pair (x1, y1) is an integer solution to ax+ by = c for some y1.

2. The integer x1 is a solution to the equation ax ≡ c (mod b).

Proof. (1) ⇒ (2). Suppose that ax1 + by1 = c. Then it is immediate thatax1 ≡ c (mod b).

(2) ⇒ (1). Suppose that ax1 ≡ c (mod b). Then by definition, ax1− c =bz1 for some integer z1. Thus ax1 + b(−z1) = c. We may therefore puty1 = z1.

4.4. MODULAR ARITHMETIC 111

We shall now describe how to solve all equations of the form

ax ≡ b (mod n).

Lemma 4.4.4. Consider the linear congruence ax ≡ b (mod n).

1. The linear congruence has a solution if, and only if, d = gcd(a, n) issuch that d | b.

2. If the condition in part (1) holds and x0 is any solution, then all solu-tions have the form

x = x0 + tn

d

where t ∈ Z.

Proof. (1). Suppose first that x1 is a solution to our linear congruence. Thenby definition, ax1−b = nq for some integer q. It follows that ax1+n(−q) = b.By definition d | a and d | n and so d | b.

We now prove the converse. By Bezout’s theorem, we may find integersu and v such that au+nv = d. By assumption, d | b and so b = dw for someinteger w. It follows that auw + nvw = dw = b. Thus a(uw) ≡ b (mod n),and we have found a solution.

(2) Let x0 be any one solution to ax ≡ b (mod n). It is routine to checkthat x = x0 + tn

dfor any t ∈ Z. Let x1 be any solution to ax ≡ b (mod n).

Then a(x1− x0) ≡ 0 (mod n). Thus a(x1− x0) = tn for some integer t. Theresult now follows.

There is a special case of the above result that is very important. Itsproof is immediate.

Corollary 4.4.5. Let p be a prime. Then the linear congruence ax ≡ b(mod p), where a is not congruent to 0 modulo p, always has a solution, andall solutions are congruent modulo p.

Example 4.4.6. Let’s find all the points on the line 2x + 3y = 5 that haveinteger co-ordinates. Observe first that gcd(2, 3) = 1. Thus such points exist.In this case, by inspection, 1 = 2 · 2 + (−1)3. Thus 5 = 10 · 2 + (−5)3. Itfollows that (10,−5) is one point on the line with integer co-ordinates. Thusthe set of integer solutions is

{(10 + 3t,−5− 2t) : t ∈ Z}.


4.4.2 Wilson’s theorem

I shall finish off this section with an application of congruences to primes.It is the first hint of hidden patterns in the primes. We need some notationfirst. For each natural number n define n!, pronounced n factorial, or ifyou are more extrovert n shriek, as follows: 0! = 1 and for n > 0 definen! = n · (n − 1)!. In other words, n! is what you get when you multiplytogether all the positive integers less than or equal to n. For each naturalnumber n, we shall be interested in the value of (n− 1)! modulo n. Observethat there is no point in studying n! (mod n) since the answer is always 0.It’s worth doing some numerical calculations first to see if you can spot apattern.

Theorem 4.4.7 (Wilson’s Theorem). Let n be a natural number. Then n isa prime if, and only if,

(n− 1)! ≡ n− 1 (mod n)

Since n− 1 ≡ −1 (mod n) this is usually expressed in the form

(n− 1)! ≡ −1 (mod n).

Proof. The statement to be proved is an ‘if, and only if’ and so we have toprove two statements: (1) If n is prime then (n − 1)! ≡ n − 1 (mod n). (2)If (n− 1)! ≡ n− 1 (mod n) then n is prime.

We prove (1) first. Let n be a prime. The result is clearly true whenn = 2 so we may assume n is an odd prime. For each 1 ≤ a ≤ n − 1 thereis a unique number 1 ≤ b ≤ n − 1 such that ab ≡ 1 (mod n). If a = b thena1 ≡ 1 (mod n) which means that n | (a − 1)(a + 1). Since n is a primeeither n | a− 1 or a | a+ 1. This can only occur if a = 1 or a = n− 1. Thus(n− 1)! ≡ n− 1 (mod n), as claimed.

We now prove (2). Suppose that (n−1)! ≡ n−1 (mod n). We prove thatn is a prime. Observe that when n = 1 we have that (n − 1)! = 1 which isnot congruent to 0 modulo 1. When n = 4, we get that (4−1)! ≡ 2 (mod 4).Suppose that n > 4 is not prime. Then n = ab where 1 < a, b < n. If a 6= bthen ab occurs as a factor of (n− 1)! and so this is congruent to 0 modulo n.If a = b then a occurs in (n− 1)! and so does 2a. Thus n is again a factor of(n− 1)!.

This theorem is interesting for another reason. To show that a numberis prime, we would usually apply the algorithm we described earlier which

4.5. CONTINUED FRACTIONS 113

is just a systematic way of carrying out trial division. This theorem showsthat a number is prime in a completely different way. Although it is not apratical test for deciding whether a number is prime or composite, since n!gets very big very quickly, it shows that there might be backdoor ways ofshowing that a number is prime. This is a very important question in thelight of the role of prime numbers in cryptography.

4.5 Continued fractions

The goal of this section is to show how some of the ideas we have introducedso far can interact with each other. The material we cover is not neededelsewhere in this book.

4.5.1 Fractions of fractions

We return to an earlier calculation. We used Euclid’s algorithm to calculategcd(19, 7) as follows.

19 = 7 · 2 + 5

7 = 5 · 1 + 2

5 = 2 · 2 + 1

2 = 1 · 2 + 0

We first rewrite each line, except the last, as follows

19

7= 2 +

5

27

5= 1 +

2

55

2= 2 +

1

2

Take the first equality19

7= 2 +

5

2.

But 57

is the reciprocal of 75, and from the second equality

7

5= 1 +

2

5.


If we combine them, we get

19

7= 2 +

1

1 + 25

however strange this may look. We may repeat the process to get

19

7= 2 +

1

1 +1

2 + 12

Fractions like this are called continued fractions. Suppose I just gave you

2 +1

1 +1

2 + 12

You could work out what the usual rational expression was by working fromthe bottom up. First compute the part in bold below

2 +1

1 +1

2 + 12

to get

2 +1

1 +152

which simplifies to

2 +1

1 + 25

This process can no be repeated and we shall eventually obtain a standardfraction.

I am not going to develop the theory of continued fractions, but I shallshow you one more application. Let r be a real number. We may writer as r = m1 + r1 where 0 ≤ r1 < 1. For example, π may be written asπ = 3 · 14159265358 . . . where here m = 3 and r1 = 0 · 14159265358 . . .. Now


since r1 < 1 and assume that it is non-zero. Then 1r1> 1. We may therefore

repeat the above process and write 1r1

= m2 + r2 where once again r2 < 1.This begin to feel an aweful lot like what we did above. In fact, we may write

r = m1 +1

m2 + r2

,

and we can continue the above process with r2. It looks like we would obtaina continued fraction representation of r with the big difference that it couldbe infinite. Here is a concrete example.

Example 4.5.1. We apply the above process to√

3. Clearly, 1 <√

3 < 2.Thus √

3 = 1 + (√

3− 1)

where√

3− 1 < 1. We now focus on

1√3− 1

.

To convert this into a more usable form we multiple top and bottom by√3 + 1. We therefore get that

1√3− 1

=1

2(√

3 + 1).

It is clear that 1 < 12(√

3 + 1) < 112. Thus

1√3− 1

= 1 +

√3− 1

2.

We now focus on2√

3− 1

which simplifies to√

3 + 1. Clearly

2 <√

3 + 1 < 3.

Thus√

3 + 1 = 2 + (√

3 − 1). However, we have now gone full circle. Let’ssee what we have obtained. We have that

√3 = 1 +

1

1 +1

2 + (√

3− 1)

.


However, we saw above that the pattern repeats as√

3 − 1, so what weactually have is

√3 = 1 +

1

1 +1

2 +1

1 +1

. . .

.

Let’s see where we are by computing

1 +1

1 +1

2 +1

1

which simplifies to 74. You can check that this is an approximation to

√3.

4.5.2 Rabbits and pentagons

We now illustrate some of the ways that algebra and geometry may inter-act. We begin with an artificial looking question. In his book, Liber Abaci,Fibonacci raised the following little puzzle which I’ve taken from MacTutor:

“A certain man put a pair of rabbits in a place surrounded onall sides by a wall. How many pairs of rabbits can be producedfrom that pair in a year if it is supposed that every month eachpair begets a new pair which from the second month on becomesproductive?”

These are obviously mathematical rabbits rather than real ones so let mespell out the rules more explicitly:

Rule 1 The problem begins with one pair of immature rabbits.2

Rule 2 Each immature pair of rabbits takes one month to mature.

Rule 3 Each mature pair of rabbits produces a new immature pair at theend of a month.

2Fibonacci himself seems to have assumed that the starting pair was already maturebut we shan’t.


Rule 4 The rabbits are immortal.

The important point is that we must solve the problem using the rules wehave been given. To do this, I am going to draw a picture. I will representan immature pair of rabbits by ◦ and a mature pair by •. Rule 2 will berepresented by

◦

��•

and Rule 3 will be represented by

•

��~~~~~~~

��@@@@@@@

• ◦Rule 1 tells us that we start with ◦. Applying the rules we obtain the followingpicture for the first 4 months.

◦

��

1 pair

•

��

��<<<<<<<< 1 pair

•

��

��<<<<<<<< ◦

��<<<<<<<< 2 pairs

•

��

��<<<<<<<< ◦

��

•

��

��<<<<<<<< 3 pairs

• ◦ • • ◦ 5 pairs

We start with 1 pair and at the end of the first month we still have 1 pair, atthe end of the second month 2 pairs, at the end of the third month 3 pairs,and at the end of the fourth month 5 pairs. I shall write this F0 = 1, F1 = 1,F2 = 2, F3 = 3, F4 = 5, and so on. Thus the problem will be solved if wecan compute F12. There is an apparent pattern in the sequence of numbers1, 1, 2, 3, 5, . . . after the first two terms in the sequence each number is thesum of the previous two. Let’s check that we are not just seeing things.Suppose that the number of immature pairs of rabbits at a given time t is


It and the number of mature pairs is Mt. Then using our rules at time t+ 1we have that Mt+1 = Mt + It and It+1 = Mt. Thus

Ft+1 = 2Mt + It.

SimilarlyFt+2 = 3Mt + 2It.

It is now easy to check that

Ft+2 = Ft+1 + Ft.

The sequence of numbers such that F0 = 1, F1 = 1 and satisfying therule Ft+2 = Ft+1 + Ft is called the Fibonacci sequence. We have that

F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, F5 = 8, F6 = 13, F7 = 21,

F8 = 34, F9 = 55, F10 = 89, F11 = 144, F12 = 233.

The solution to the original question is therefore 233 pairs of rabbits.Fibonacci numbers arise in the most diverse situations: famously, in phyl-

lotaxis which is the study of how leaves and petals are arranged on plants.We shall now look for a formula that will enable us to calculate Fn directly.To begin, we’ll follow an idea due to the astronomer Jonannes Kepler, andlook at the behaviour of the fractions Fn+1

Fnas n gets bigger and bigger. I

have tabulated some calculations below.

F1

F0

F2

F1

F3

F2

F4

F3

F5

F4

F6

F5

F7

F6

F14

F13

1 2 1 · 5 1 · 6 1 · 625 1 · 615 1 · 619 1 · 6180

These ratios seem to be going somewhere; the question is: where? Noticethat

Fn+1

Fn=Fn + Fn−1

Fn= 1 +

Fn−1

Fn= 1 +

1Fn

Fn−1

.

But for very large n we suspect that Fn+1

Fnand Fn

Fn−1will be almost the same.

This suggests, but doesn’t prove, that we need to find the positive solutionx to

x = 1 +1

x.

Thus x is a number that when you take its reciprocal and add 1 you get xback again. This problem is really a quadratic equation in disguise

x2 = x+ 1 or more usually x2 − x− 1 = 0.


This equation can be solved very simply to give us

x =1±√

5

2.

That is

φ =1 +√

5

2and φ =

1−√

5

2.

The number φ is called the golden ratio, about which a deal of nonsense hasbeen written. Let’s go back and see if this calculation makes sense. First wecalculate φ and we get

φ = 1 · 618033988 . . .

I computeF19

F18

=6765

4181= 1 · 618033963

on my pocket calculator. This is pretty close.We can now get our formula for the Fibonacci numbers. Define

fn =1√5

(φn+1 − φn+1

).

I’m going to show you that Fn = fn. To do this, I’ll use the followingidentities which are straightforward to check

φ− φ =√

5 φ2 = φ+ 1 and φ2 = φ+ 1.

Let’s start with f0. We know that

φ− φ =√

5

and so we really do have that f0 = 1. To calculate f1 we use the otherformulae and again we get f1 = 1. We now calculate fn + fn+1 we get

fn + fn+1 =1√5

(φn+1 − φn+1

)+

1√5

(φn+2 − φn+2

)=

1√5

(φn+1 + φn+2 − (φn+1 + φn+2)

)=

1√5

(φn+1(1 + φ)− φn+1(1 + φ)

)=

1√5

(φn+1φ2 − φn+1φ2

)=

1√5

(φn+3 − φn+3

)= fn+2


Because fn and Fn start in the same place and satisfy the same rules, wehave therefore proved that

Fn = 1√5

(φn+1 − φn+1

).

At this point, we can go back and verify our original idea that the fractionsFn+1

Fnseem to get closer and closer to φ as n gets larger and larger. We have

that

Fn+1

Fn=

φn+2 − φn+2

φn+1 − φn+1

=φ

1− ( φφ)n+1

− 11φ(φφ)n+1 − 1

φ

I have rewritten it like this so that we can see what happens as n gets largerand larger. Observe that the absolute value of φ

φis less than 1. So as n gets

larger and larger the first term above gets closer and closer to φ. Now lookat the second term. The absolute value of the fraction φ

φis strictly greater

than 1. Thus as n gets larger and larger the denominator of the second termgets larger and larger and so the fraction as a whole gets smaller and smaller.Thus we have proved that Fn+1

Fnreally is close to φ when n is large.

So far, what we have been doing is algebra. I shall now show that thereis geometry here as well. Below is a picture of a regular pentagon. I haveassumed that the length of the sides is 1. I claim that the length of a diagonal,such as BE, is equal to φ.

A

E

B

C

D

φ

1

To prove this I am going to use Ptolomy’s theorem. We shall concentrate onthe cyclic quadrilateral formed by the vertices ABDE.


A

B

C

D

E

I’ll let the side of a diagonal be x. Then by Ptolomy’s theorem, we have that

x2 = 1 + x.

But this is precisely the quadratic equation we solved above. Its positivesolution is φ and so the length of a diagonal of a regular pentagon with side1 is φ.

This raises the question of whether we can somehow see the Fibonaccinumbers in the regular pentagon. The answer is: almost. Consider thediagram below.

A

B

C

D

E

a′

e′d′

b′c′

I’ve drawn in all the diagonals. The shaded triangle BCD is similar to theshaded triangle Ac′E. This means that they have exactly the same shapesjust different sizes. It follows that

Ac′

AE=BC

BD.


But AE is a side of the pentagon and so has unit length, and BD is of lengthφ. Thus

AC ′ =1

φ.

Now, Dc′ has the same length as BC which is a side of the pentagon. ThusDc′ = 1. We now have

φ = DA = Dc′ + c′A = 1 +1

φ.

Thus, just from geometry, we get

φ = 1 +1

φ.

This is a very odd equation because φ is mentioned on both sides. Let’s gowith it and repeat:

φ = 1 +1

1 + 1φ

and

φ = 1 +1

1 +1

1 + 1φ

and

φ = 1 +1

1 +1

1 +1

1 + 1φ

and so on. We therefore obtain a continued fraction. For each of thesefractions cover up the term 1

φand then calculate what you see to get

1, 1 +1

1= 2, 1 +

1

1 + 11

=3

2, 1 +

1

1 +1

1 + 11

=5

3, . . .

and the Fibonacci sequence reappears.

Chapter 5

Complex numbers

Why be one-dimensional when you can be two-dimensional?

0 1 2 3−1−2−3

?

?We begin by returning to the familiar number line, where I have placedthe question marks there appear to be no numbers. I shall rectify this bydefining the complex numbers which give us a number plane rather than just anumber line. Complex numbers play a fundamental role in mathematics. Forexample, in this chapter, I shall use them to show how e and π — numbersof radically different origins — are in fact connected.

5.1 Complex number arithmetic

In the set of real numbers we can add, subtract, multiply and divide, butwe cannot always extract square roots. For example, the real number 1 hasthe two real square roots 1 and −1, whereas the real number −1 has no real

123

124 CHAPTER 5. COMPLEX NUMBERS

square roots, the reason being that the square of any real non-zero number isalways positive. In this section, we shall repair this lack of square roots and,as we shall learn, we shall in fact have achieved much more than this. Com-plex numbers were first studied in the 1500’s but were only fully acceptedand used in the 1800’s.

Warning! If r is a positive real number then√r is usually interpreted to

mean the positive square root. If I want to emphasize that both square rootsneed to be considered I shall write ±

√r.

When the discriminant of a quadratic equation is strictly less than zero,we know that it has no real roots. In this section, we shall show that inthis case the equation has two complex roots. This will mean that quadraticequations will always have two roots. The key step is the following

We introduce a new number, denoted by i, whose defining propertyis that i2 = −1. We shall assume that in all other respects itsatisfies the usual axioms of high-school algebra. This assumptionwill be justified later.

We shall now explore the consequences of this definition which turns outto be a profound one for mathematics. The numbers i and −i are the two‘missing’ square roots of 1. In all other respects, the number i will behavelike a real number. Thus if b is any real number then bi is a number, and ifa is any real number then a+ bi is a number. We therefore formally define acomplex number to be a number of the form a+bi where a, b ∈ R. We denotethe set of complex numbers by C. Complex numbers are sometimes calledimaginary numbers. This is not such a good term: they are not figments ofour imagination like unicorns or dragons. Like all numbers they are, however,products of our imagination: no one has seen the complex number number ibut, then again, no one has seen the number 2. If z = a + bi then we call athe real part of z, denoted Re(z), and b the complex or imaginary part of z,denoted Im(z).

Two complex numbers a+ bi and c+ di are equal precisely whena = c and b = d. In other words, when their real parts are equaland when their complex parts are equal.

We can think of every real number as being a special kind of complexnumber because if a is real then a = a+ 0i. Thus R ⊆ C. Complex numbers

5.1. COMPLEX NUMBER ARITHMETIC 125

of the form bi are said to be purely imaginary. Now we show that we canadd, subtract, multiply and divide complex numbers. Addition, subtractionand multiplication are all easy. Let a+ bi, c+ di ∈ C. To add these numbersmeans to calculate (a + bi) + (c + di). We assume that the order in whichwe add complex numbers doesn’t matter and that we may bracket sums ofcomplex numbers how we like and still get the same answer and so we canrewrite this as a+c+bi+di. Next we assume that multiplication of complexnumbers distributes over addition of complex numbers to get (a+c)+(b+d)i.Thus

(a+ bi) + (c+ di) = (a+ c) + (b+ d)i.

The definition of subtraction is similar and justified in the same way

(a+ bi)− (c+ di) = (a− c) + (b− d)i.

To multiply our numbers means to calculate (a+ bi)(c+di). We first assumecomplex multiplication distributes over complex addition to get (a+ bi)(c+di) = ac + adi + bic + bidi. Next we assume that the order in which wemultiply complex numbers doesn’t matter to get ac+ adi+ bic+ bidi = ac+adi+bci+bdi2. Now we use the fact that i2 = −1 to get ac+adi+bci+bdi2 =ac+adi+bci−bd. We now rearrange the terms to get the following definitionof multiplication

(a+ bi)(c+ di) = (ac− bd) + (ad+ bc)i.

Examples 5.1.1. Carry out the following calculations.

1. (7 − i) + (−6 + 3i). We add together the real parts to get 1; addingtogether −i and 3i we get 2i. Thus the solution is 1 + 2i.

2. (2 + i)(1 + 2i). First we multiply out the brackets as usual to get2 + 4i+ i+ 2i2. We now use the fact that i2 = −1 to get 2 + 4i+ i− 2.Finally we simplify to get 0 + 5i = 5i.

3.(

1−i√2

)2

. Multiply out and simplify to get −i.

The final operation is division. We have to show that when a + ib 6= 0the reciprocal

1

a+ ib


is also a complex number. We use an idea that can also be applied in othersituations called rationalizing the denominator. It is convenient first to definea new operation on complex numbers. Let z = a+ bi ∈ C. Define

z = a− bi.

The number z is called the complex conjugate of z. Why is this operationuseful? Let’s calculate zz. We have

zz = (a+ bi)(a− bi) = a2 − abi+ abi− b2i2 = a2 + b2.

Notice that zz = 0 if and only if z = 0. Thus for non-zero complex numbersz, the number zz is a positive real number. Let’s see how we can use thecomplex conjugate to define division of complex numbers. Our goal is tocalculate

1

a+ bi

where a+bi 6= 0. The first step is to multiply top and bottom by the complexconjugate of a+ bi. We therefore get

a− bi(a+ bi)(a− bi)

=a− bia2 + b2

=1

a2 + b2(a− bi) .

Examples 5.1.2. Carry out the following calculations.

1. 1+ii

. The complex conjugate of i is −i. Multiply top and bottom of thefraction to get −i+1

1= 1− i.

2. i1−i . The complex conjugate of 1− i is 1 + i. Multiply top and bottom

of the fraction to get i(1+i)2

= i−12

.

3. 4+3i7−i . The complex conjugate of 7− i is 7 + i. Multiply top and bottom

of the fraction to get (4+3i)(7+i)50

= 1+i2

.

We shall need the following properties of the complex conjugate later on.

Lemma 5.1.3.

1. z1 + . . .+ zn = z1 + . . .+ zn.

2. z1 . . . zn = z1 . . . zn.


3. z is real if and only if z = z.

Proof. (1) We prove the case where n = 2. The general case can then beproved using induction. Let z1 = a + ib and z2 = c + id. Then z1 + z2 =(a + c) + i(b + d). Thus z1 + z2 = (a + c) − i(b + d). But z1 = a − ib andz2 = c − id and so z1 + z2 = (a − ib) + (c − id) = (a + c) − (b + d)i. Thusz1 + z2 = z1 + z2.

(2) We prove the case where n = 2. The general case can then be provedusing induction. Using the notation form part (1), we have that z1z2 =(ac− bd) + (ad+ bc)i. Thus z1z2 = (ac− bd)− (ad+ bc)i. On the other hand,z1z2 = (ac− bd)− i(ad+ bd), as required.

(3) If z is real then it is immediate that z = z. Suppose that z = z wherez = a+ ib. Then a+ ib = a− ib. Hence b = −b and so b = 0. It follows thatz is real.

We now introduce a way of thinking about complex numbers that enablesus to visualize them. A complex number z = a + bi has two components: aand b. It is irresistible to plot these as a point in the plane. The plane usedin this way is called the complex plane: the x-axis is the real axis and they-axis is interpreted as the complex axis.

a

ibz = a+ ib

Although a complex number can be thought of as labelling a point in thecomplex plane, it can also be regarded as labelling the directed line segmentfrom the origin to the point, and this turns out to be the more fruitfulviewpoint. By Pythagoras’ theorem, the length of this line is

√a2 + b2. We

define

|z| =√a2 + b2


where z = a + bi. This is called the modulus1 of the complex number z.Observe that

|z| =√zz.

We shall use the following important property of moduli.

Lemma 5.1.4. |wz| = |w| |z|.

Proof. Let w = a+ bi and z = c+di. Then wz = (ac− bd) + (ad+ bc)i. Now|wz| =

√(ac− bd)2 + (ad+ bc)2 whereas |w| |z| =

√(a2 + b2)(c2 + d2). But

(ac− bd)2 + (ad+ bc)2 = (ac)2 + (bd)2 + (ad)2 + (bc)2 = (a2 + b2)(c2 + d2).

Thus the result follows.

The complex numbers were obtained from the reals by simply adjoiningone new number, i, a square root of −1. Remarkably, every complex numberhas a square root — there is no need to invent any new numbers.

Theorem 5.1.5. Every nonzero complex number has exactly two squareroots.

Proof. Let z = a + bi be a nonzero complex number. We want to find acomplex number w so that w2 = z. Let w = x + yi. Then we need to findreal numbers x and y such that (x+yi)2 = a+bi. Thus (x2−y2)+2xyi = a+bi,and so equating real and imaginary parts, we have to solve the following twoequations

x2 − y2 = a and 2xy = b.

Now we actually have enough information to solve our problem, but we canmake life easier for ourselves by adding one extra equation. To get it, we usethe modulus function. From (x+yi)2 = a+bi we get that |x+ yi|2 = |a+ bi|.Now |x+ yi|2 = x2 + y2 and |a+ bi| =

√a2 + b2. We therefore have three

equations

x2 − y2 = a and 2xy = b and x2 + y2 =√a2 + b2.

If we add the first and third equation together we get

x2 =a

2+

√a2 + b2

2=a+√a2 + b2

2.

We can now solve for x and therefore for y.

1Plural: moduli


Example 5.1.6. Every negative real number has two square roots. We havethat the square roots of −r, where r > 0 are ±i

√r.

Example 5.1.7. Find both square roots of 3 + 4i and check your answers.We assume that there is a complex number x + yi where both x and y arereal such that

(x+ yi)2 = 3 + 4i.

Squaring and comparing real and imaginary parts we get that the followingtwo equations must be satisfied by x and y

x2 − y2 = 3 and 2xy = 4.

We also have a third equation by taking moduli

x2 + y2 = 5.

Adding the first and third equation together we get x = ±2. Thus y = 1 ifx = 2 and y = −1 if x = −2. The roots we want are therefore 2 + i and−2− i. Of course, one root will be minus the other. Now square either rootto check your answer: (2 + i)2 = 4 + 4i− 1 = 3 + 4i, as required.

Remark Notice that the two square roots of a non-zero complex number willhave the form w and −w; in other words, one root will be −1 times the other.

If we combine our method for solving quadratics with our method fordetermining the square roots of complex numbers, we have a method forfinding the roots of quadratics with any coefficients, whether they be real orcomplex.

Example 5.1.8. Solve the quadratic equation

4z2 + 4iz + (−13− 16i) = 0.

The complex numbers obey the same algebraic laws as the reals and so wecan solve this equation by completing the square or we can simply plug thenumbers into the formula for the roots of a quadratic. Here I shall completethe square. First, we convert the equation into a monic one

z2 + iz +(−13− 16i)

4= 0.


Next, we observe that (z +

i

2

)2

= z2 + iz − 1

4.

Thus

z2 + iz =

(z +

i

2

)2

+1

4.

Our equation therefore becomes(z +

i

2

)2

+1

4+

(−13

4− 4i

)= 0.

We therefore have (z +

i

2

)2

= 3 + 4i.

Taking square roots of both sides using a previous calculation, we have that

z +i

2= 2 + i or − 2− i.

It follows that z = 2 + i2

or − 2 − 3i2

. Now check that these roots really dowork.

Every quadratic equation ALWAYS has exactly two roots.

Exercises 5.1

1. Solve the following problems in complex number arithmetic. In eachcase, the answer should be in the form a+ ib where a and b are real.

(a) (2 + 3i) + (4 + i).

(b) (2 + 3i)(4 + i).

(c) (8 + 6i)2.

(d) 2+3i4+i

.

(e) 1i

+ 31+i

.

5.2. THE FUNDAMENTAL THEOREM OF ALGEBRA 131

(f) 3+4i3−4i− 3−4i

4+4i.

2. Find the square roots of each of the following complex numbers andcheck your answers.

(a) −i.(b) −1 + i

√24.

(c) −13− 84i.

3. Solve the following quadratic equations and check your answers.

(a) x2 + x+ 1 = 0.

(b) 2x2 − 3x+ 2 = 0.

(c) x2 − (2 + 3i)x− 1 + 3i = 0.

5.2 The fundamental theorem of algebra

We have proved that every quadratic equation has exactly two roots. Thegoal of this section is to generalize this result: I shall prove that every poly-nomial equation of degree n has exactly n roots. This result plays a key rolein calculus where it is used (in its real version which I also describe) to provethat any rational function can be integrated using partial fractions. In thissection, we shall work with arbitrary polynomials so I shall now recall someof the terminology needed to handle them. An expression

anxn + an−1x

n−1 + . . .+ a1x+ a0

where ai are complex numbers, called the coefficients, is called a polynomial.If all the coefficients are zero then the polynomial is identically zero and weshall call it the zero polynomial. We assume an 6= 0. The degree of thispolynomial is n. We abbreviate this to deg. If an = 1 the polynomial issaid to be monic. The term a0 is called the constant term and the termanx

n is called the leading term. Polynomials can be added, subtracted andmultiplied. Two polynomials are equal if they have the same degree and thecoefficients of terms of the same degree are equal.

• Polynomials of degree 1 are said to be linear.


• those of degree 2, quadratic.

• those of degree 3, cubic.

• those of degree 4, quartic.

• those of degree 5, quintic.

There are special terms for polynomials of degree higher than 5, if you wantthem. Why are polynomials interesting? There are two answers to this ques-tion. First, they have widespread applications such as in helping to solvelinear differential equations and in studying matrices. Second, a polynomialdefines a function which is calculated in a very simple way using the op-erations of addition, subtraction and multiplication. However many, morecomplicated, functions can be usefully approximated by polynomial ones.We denote by C[x] the set of polynomials with complex coefficients and byR[x], the set of polynomials with real coefficients. I will write F [x] to meanF = R or F = C.

5.2.1 The remainder theorem

The addition, subtraction and multiplication of polynomials is easy. We shalltherefore concentrate in this section on division. Let f(x), g(x) ∈ F [x]. Wesay that g(x) divides f(x), denoted by

g(x) | f(x),

if there is a polynomial q(x) ∈ F [x] such that f(x) = g(x)q(x). We say thatg(x) is a factor of f(x). There are obvious similarities here with our work inChapter 4.

Example 5.2.1. Let f(x) = x4 + 2x+ 1 and g(x) = x+ 1. Then

x+ 1 | x4 + 2x+ 1

since x4 + 2x+ 1 = (x+ 1)(x3 − x2 + x+ 1).

In multiplying and dividing polynomials the following result is key.

Lemma 5.2.2. Let f(x), g(x) ∈ F [x] be non-zero polynomials. Then

deg f(x)g(x) = deg f(x) + deg g(x).


Proof. Let f(x) have leading term amxm and let g(x) have leading term bnx

n.Then the leading term of f(x)g(x) is ambnx

m+n. Now ambn 6= 0 and so thedegree of f(x)g(x) is m+ n, as required.

The following result is analogous to the remainder theorem for integersLemma 4.1.1 I shall not prove it here.

Lemma 5.2.3 (Remainder theorem). Let f(x) and g(x) be polynomials inF [x] where deg f(x) ≥ deg g(x). Then either

g(x) | f(x)

orf(x) = g(x)q(x) + r(x)

where deg r(x) < deg g(x).

Example 5.2.4. Let f(x) = x3 +x+3 and g(x) = x2 +x. Then x3 +x+3 =(x − 1)(x2 + x) + (2x + 3). Here x − 1 is the quotient and 2x + 3 is theremainder.

The following example is a reminder of how to carry out long division ofpolynomials. Remember that answers can always be checked by multiplyingout.

Example 5.2.5. Divide 6x4 + 5x3 + 4x2 + 3x+ 2 by 2x2 + 4x+ 5 and so findthe quotient and remainder. We set out the computation in the followingform.

2x2 + 4x+ 5 6x4 + 5x3 + 4x2 + 3x+ 2

To get the term involving 6x4 we would have to multiply the lefthand sideby 3x2. As a result we write down the following

3x2

2x2 + 4x+ 5 6x4 + 5x3 + 4x2 + 3x+ 26x4 + 12x3 + 15x2

We now subtract the lower righthand side from the upper and we get

3x2

2x2 + 4x+ 5 6x4 + 5x3 + 4x2 + 3x+ 26x4 + 12x3 + 15x2

−7x3 − 11x2 + 3x+ 2


The procedure is now repeated with the new polynomial.

3x2 − 72x

2x2 + 4x+ 5 6x4 + 5x3 + 4x2 + 3x+ 26x4 + 12x3 + 15x2

−7x3 − 11x2 + 3x+ 2−7x3 − 14x2 − 35

2x

3x2 + 412x+ 2

The procedure is repeated one more time with the new polynomial

3x2 − 72x+ 3

2quotient

2x2 + 4x+ 5 6x4 + 5x3 + 4x2 + 3x+ 26x4 + 12x3 + 15x2

−7x3 − 11x2 + 3x+ 2−7x3 − 14x2 − 35

2x

3x2 + 412x+ 2

3x2 + 122x+ 15

2292x− 11

2remainder

This is the end of the line because the new polynomial we obtain has degreestrictly less than the polynomial we are dividing by. What we have shown isthat

6x4 + 5x3 + 4x2 + 3x+ 2 =(2x2 + 4x+ 5

)(3x2 − 7

2x+

3

2

)+

(29

2x− 11

2

).

You can verify this is true by multiplying out the righthand side.

5.2.2 Roots of polynomials

Let f(x) ∈ F [x]. A number r ∈ F is said to be a root or zero of f(x) iff(r) = 0. The roots of f(x) are the solutions of the equation f(x) = 0.

Example 5.2.6. The number 1 is a root of x100−2x98+1 because 1−2+1 = 0.

Checking whether a number is a root is easy, but finding a root in thefirst place is trickier. The next result tells us that when we find roots of poly-nomials we are in fact determining linear factors. It is crucial to eveythingwe shall do.


Proposition 5.2.7. Let r ∈ F . Then r is a root of f(x) ∈ F [x] if and onlyif (x− r) | f(x).

Proof. Suppose that (x − r) | f(x). Then by definition f(x) = (x − r)q(x)for some polynomial q(x). If we now calculate f(r) we see immediately thatit must be zero.

We now prove the converse. Suppose that r is a root of f(x). By theremainder theorem, either (x− r) | f(x) or f(x) = q(x)(x− r) + r(x) wheredeg(r(x)) < deg(x − r) = 1. If the former then we are done. If the latterthen it follows that r(x) is in fact a constant (that is, just a number). Callthis number a. If we calculate f(r) we get a. It follows that in fact a = 0and so (x− r) | f(x).

Example 5.2.8. We have seen that the number 1 is a root of x100−2x98 +1.Thus by the above result (x− 1) | x100 − 2x98 + 1.

A root r of a polynomial f(x) is said to have multiplicity m if

(x− r)m | f(x)

but (x− r)m+1 does not divide f(x). A root is always counted according toits multiplicity.

Example 5.2.9. The polynomial x2 + 2x+ 1 has −1 as a root and no otherroots. However (x + 1)2 = x2 + 2x + 1 and so the root −1 occurs withmultiplicity 2. Thus the polynomial has two roots counting multiplicities.This is the sense in which we can say that a quadratic equation always hastwo roots.

The following result is extremely useful. It provides an upper bound tothe number of roots a polynomial may have.

Theorem 5.2.10. A non-constant polynomial of degree n has at most nroots.

Proof. Let f(x) be a non-zero polynomial of degree n > 0. Suppose thatf(x) has a root a. Then f(x) = (x − a)f1(x) by Proposition 5.2.7 and thedegree of f1(x) is n − 1. This argument can be repeated and we reach thedesired conclusion.


5.2.3 The fundamental theorem of algebra

The big question I have so far not dealt with is whether a polynomial needhave a root at all. This is answered by the following theorem whose namereflects its importance when first discovered, though not its significance inmodern algebra. We shall not give a proof because that would require moreadvanced methods than are covered in this book. It was first proved byGauss.

Theorem 5.2.11 (Fundamental theorem of algebra (FTA)). Every non-constant polynomial of degree n with complex coefficients has a root.

The fundamental theorem of algebra has the following important conse-quence using Theorem 5.2.10.

Corollary 5.2.12. Every non-constant polynomial with complex coefficientsof degree n has exactly n complex roots (counting multiplicities). Thus everysuch polynomial can be written as a product of linear polynomials.

Proof. Let f(x) be a non-constant polynomial of degree n. By the FTA, thispolynomial has a root r1. Thus f(x) = (x−r1)f1(x) where f1(x) is a polyno-mial of degree n− 1. This argument can be repeated and we eventually endup with f(x) = a(x− r1) . . . (x− rn) where a is the last quotient, necessarilya complex number.

Example 5.2.13. It can be checked that the quartic x4− 5x2− 10x− 6 hasroots −1, 3, i− 1 and −1− i. We can therefore write

x4 − 5x2 − 10x− 6 = (x+ 1)(x− 3)(x+ 1 + i)(x+ 1− i).

In many practical examples, our polynomials will have real coefficientsand we will want any factors of the polynomial to likewise be real. The resultabove doesn’t do that because it could produce complex factors. However,we can rectify this situation at a very small price. We shall use the notion ofthe complex conjugate of a complex number that we introduced earlier. Wemay now prove the following key lemma.

Lemma 5.2.14. Let f(x) be a polynomial with real coefficients. If the com-plex number z is a root then so too is z.


Proof. Let

f(x) = anxn + an−1x

n−1 + . . .+ a1x+ a0

where the ai are real numbers. Let z be a complex root. Then

0 = anzn + an−1z

n−1 + . . .+ a1z + a0.

Take the complex conjugate of both side and use the properties of the complexconjugate to get

0 = anzn + an−1z

n−1 + . . .+ a1z + a0

and so z is also a root.

Example 5.2.15. We saw above that

x4 − 5x2 − 10x− 6 = (x+ 1)(x− 3)(x+ 1 + i)(x+ 1− i).

Observe that the complex roots −1 − i and −1 + i are complex conjugatesof each other.

Lemma 5.2.16. Let z be a complex number which is not real. Then

(x− z)(x− z)

is an irreducible quadratic with real coefficients.On the other hand, if x2 + bx + c is an irreducible quadratic with real

coefficients then its roots are complex conjugates of each other.

Proof. To prove the first claim, we multiply out to get

(x− z)(x− z) = x2 − (z + z)x+ zz.

Observe that z + z and zz are both real numbers. The discriminant of thispolynomial is (z− z)2. You can check that if z is complex and non-real thenz − z is purely complex. It follows that its square is negative. We havetherefore shown that our quadratic is irreducible.

The proof of the second claim follows from the formula for the roots of aquadratic combined with the fact that the square root of a negative real willhave the form ±αi where α is real.


Example 5.2.17. We saw above that

x4 − 5x2 − 10x− 6 = (x+ 1)(x− 3)(x+ 1 + i)(x+ 1− i).

Multiply out (x+ 1 + i)(x+ 1− i) and we get x2 + 2x+ 2. Thus

x4 − 5x2 − 10x− 6 = (x+ 1)(x− 3)(x2 + 2x+ 2)

with all the polynomials involved being real.

The following theorem is the one that we can use to help us solve problemsinvolving real polynomials.

Theorem 5.2.18 (Fundamental theorem of algebra for real polynomials).Every non-constant polynomial with real coefficients can be written as a prod-uct of polynomials with real coefficients which are either linear or irreduciblequadratic.

Proof. We can write the polynomial as a product of linear polynomials. Bringthe real linear factors to the front. The remaining linear polynomials willhave complex coefficients. They correspond to roots that come in complexconjugate pairs. Multiplying together those complex linear factors corre-sponding to complex conjugate roots we get real quadratics and the result isproved.

In fact, we can write any real polynomial as a real number times a productof monic linear and quadratic factors. This result is the basis of the methodof partial fractions used in integrating rational functions in calculus. This isdiscussed in Chapter 6.

Finding the exact roots of a polynomial is difficult, in general. However,the following result tells us how to find the rational roots of polynomials withinteger coefficients. It is a nice, and perhaps unexpected, application of thenumber theory we developed in Chapter 4.

Theorem 5.2.19 (Rational root theorem). Let

f(x) = anxn + an−1x

n−1 + . . .+ a1x+ a0

be a polynomial with integer coefficients. If rs

is a root with r and s coprimethen r | a0 and s | an. In particular, if the polynomial is monic then anyrational roots must be integers and divide the constant term.


Proof. Substituting rs

into f(x) we have, by assumption, that

0 = an(r

s)n + an−1(

r

s)n−1 + . . .+ a1(

r

s) + a0.

Multiply through by sn to get

0 = anrn + an−1sr

n−1 + . . .+ sn−1r + a0sn.

We now make two observations. First, r | a0sn. I claim that r and sn are

coprime. We may now deduce that r | a0 from a previous exercise. It onlyremains to prove the claim. Let p be any prime that divides r and sn. Thenby Euclid’s lemma, p divides r and s which is a contradiction since r and sare coprime. It follows that r | a0. Second, s | anrn. By a similar argumentto the previous case s | an.

Example 5.2.20. Find all the roots of the following polynomial

x4 − 8x3 + 23x2 − 28x+ 12.

The polynomial is monic and so the only possible rational roots are integersand must divide 12. Thus the only possible rational roots are

±1,±2,±3,±4,±6,±12.

We find immediately that 1 is a root and so (x−1) must be a factor. Dividingout by this factor we get the quotient

x3 − 7x2 + 16x− 12.

We check this polynomial for rational roots and find 2 works. Dividing outby (x− 2) we get the quotient

x2 − 5x+ 6.

Once we get down to a quadratic we can solve it directly. In this case itfactorizes as (x− 2)(x− 3). We therefore have that

x4 − 8x3 + 23x2 − 28x+ 12 = (x− 1)(x− 2)2(x− 3).

At this point, multiply out the righthand side and check that we really dohave an equality. In this case, all roots are rational and are 1,2,2,3.


Exercises 5.2

1. Find the quotient and remainder when the first polynomial is dividedby the second.

(a) x3 − 7x− 1 and x− 2.

(b) x4 − 2x2 − 1 and x2 + 3x− 1.

(c) 2x3 − 3x2 + 1 and x.

2. Find all roots using the information given.

(a) 4 is a root of 3x3 − 20x2 + 36x− 16.

(b) −1,−2 are both roots of x4 + 2x3 + x+ 2.

3. Find a cubic having roots 2,−3, 4.

4. Find a quartic having roots i, −i, 1 + i and 1− i.

5. The cubic x3 + ax2 + bx+ c has roots α, β and γ. Show that a, b, c caneach be written in terms of the roots.

6. 3 + i√

2 is a root of x4 + x3 − 25x2 + 41x + 66. Find the remainingroots.

7. 1− i√

5 is a root of x4−2x3 + 4x2 + 4x−12. Find the remaining roots.

8. Find all the roots of the following polynomials.

(a) x3 + x2 + x+ 1.

(b) x3 − x2 − 3x+ 6.

(c) x4 − x3 + 5x2 + x− 6.

9. Write each of the following polynomials as a product of linear or quadraticreal factors.

(a) x3 − 1.

(b) x4 − 1.

(c) x4 + 1.

5.3. COMPLEX NUMBER GEOMETRY 141

5.3 Complex number geometry

We have proved that every non-zero complex number has two square rootsand from the fundamental theorem of algebra (FTA), we know that everynon-zero complex number has three cube roots, and four fourth roots, andmore generally n nth roots. However, we didn’t prove the FTA. The maingoal of this section is to prove that every non-zero complex number has nnth-roots. To do this, we shall think about complex numbers in a geometric,rather than an algebraic, way. Throughout this section we shall not assumeFTA. We shall only need Theorem 5.2.10: every polynomial of degree n hasat most n roots.

5.3.1 sin and cos

We recall some well-known properties of the trigonometric functions sin andcos. First the addition formulae

sin(α + β) = sinα cos β + cosα sin β

andcos(α + β) = cosα cos β − sinα sin β.

These formulae were important historically because they enabled unknownvalues of sin’s and cos’s to be calculated from known ones, and so they wereuseful in constructing trig tables in the days before calculators

Angles are most naturally measured in radians rather than degrees —the system of angle measurement based on degrees is an historical accident.Why 360 degrees in a circle? You would have to ask the Ancient Babylonians.Recall that positive angles are measures in an anticlockwise direction.

The sin and cos functions are periodic functions with period 2π. Thismeans that for all angles θ

sin(θ + 2πn) = sin θ and cos(θ + 2πn) = cos θ

for all n ∈ Z. This fact will be crucial in what follows.

5.3.2 The complex plane

In this section, we shall describe in more detail an alternative way of thinkingabout complex numbers which turns out to be very fruitful. Recall that a


complex number z = a+ bi has two components: a and b. We can plot theseas a point in the plane. The plane used in this way is called the complexplane: the x-axis is the real axis and the y-axis is interpreted as the complexaxis. Although a complex number can be thought of as labelling a point inthe complex plane, it can more usefully be regarded as labelling the directedline segment from the origin to the point. This is how we shall regard it.Let z = a+ bi be a non-zero complex number and let θ be the angle that itmakes with the positive reals. The length of z as a directed line segment inthe complex plane is |z|, and by basic trig a = |z| cos θ and b = |z| sin θ. Itfollows that

z = |z| (cos θ + i sin θ) .

i |z| sin θ

|z| cos θ

z

θ

Observe that |z| is a non-negative real number. This way of writing complexnumbers is called the polar form.

At this point, I need to clarify the only feature of complex numbers thatcauses confusion. I have already mentioned that the functions sin and cosare periodic. For that reason, there is not just one number θ that yieldsthe complex number z but infinitely many of them: namely, all the numbersθ + 2πk where k ∈ Z. For this reason, we define the argument of z, denotedby arg z, not merely to be the single angle θ but the set

arg z = {θ + 2πk : where k ∈ Z}.

The angle θ is chosen so that 0 ≤ θ < 2π and is called, for convenience, theprincipal argument. But note that books vary on what they choose to callthe principal argument. This feature of the argument plays a crucial rolewhen we come to calculate nth roots.

Let w = r (cos θ + i sin θ) and z = s (cosφ+ i sinφ) be two non-zerocomplex numbers. We shall calculate wz. We have that


wz = rs (cos θ + i sin θ) (cosφ+ i sinφ)

= rs[(cos θ cosφ− sin θ sinφ) + (sin θ cosφ+ cos θ sinφ)i]

but using the properties of the sin and cos functions this reduces to

wz = rs (cos(θ + φ) + i sin(θ + φ)) .

We thus have the following important result:

when two non-zero complex numbers are multiplied together theirlengths are multiplied and their arguments are added.

This result helps us to understand the meaning of i. Multiplication by iis the same as a rotation about the origin by a right angle. Multiplication byi2 is therefore the same as a rotation about the origin by two right angles.But this is exactly the same as multiplication by −1.

1

i

−1

−i

We may apply similar reasoning to explain geometrically why −1×−1 = 1.We of course proved this algebraically in Chapter 3. Multiplication by −1 isinterpreted as rotation about the origin by 180◦. It follows that doing thistwice takes us back to where we started and so is equivalent to multiplicationby 1.

The proof of the next theorem follows by induction from the result weproved above. But it is important to note that it is the result above that isreally fundamental.


Theorem 5.3.1 (De Moivre). Let n be a positive integer. If z = r (cos θ + i sin θ)then

zn = rn (cosnθ + i sinnθ) .

Example 5.3.2. Observe that complex numbers of the form

S1 = {cos θ + i sin θ : θ ∈ R}

can be interpreted geometrically as being the unit circle with centre theorigin in the complex plane. Thus every non-zero complex number is a realnumber times a complex number lying on the unit circle. The set S1 hassome interesting algebraic properties as well. Observe that if u, v ∈ S1 thenuv ∈ S1, and that if u ∈ S1 then u−1 ∈ S1.

Our results above have nice applications in painlessly obtaining trigono-metric identities.

Example 5.3.3. If you remember that when multiplying complex numbersin polar form you add their arguments, then you can easily reconstitute theidentities we started with since

(cosα + i sin β)(cosα + i sin β) = cos(α + β) + i sin(α + β).

This is helpful in getting both sines and signs right.

Example 5.3.4. Express cos 3θ in terms of cos θ and sin θ using De Moivre’sTheorem. We have that

(cos θ + i sin θ)3 = cos 3θ + i sin 3θ.

However, we can expand the lefthand side to get

cos3 θ + 3i cos2 θ sin θ + 3 sin θ(i sin θ)2 + (i sin θ)3

which simplifies to(cos3 θ − 3 cos θ sin2 θ

)+ i(3 cos2 θ sin θ − sin3 θ

)where we use the fact that i2 = −1 and i3 = −i and i4 = 1. Equating realand imaginary parts we get

cos 3θ = cos3 θ − 3 cos θ sin2 θ.

We also get the formula

sin 3θ = 3 cos2 θ sin θ − sin3 θ

for free.


5.3.3 Arbitrary roots of complex numbers

In this section, we shall prove that every non-zero complex number has nnth roots: thus it has three cube roots, and four fourth roots and so on. Webegin with a special case that turns out to give us almost all the informationwe need to solve the general case. We shall also need the following importantidea. The word radical simply means a square root, or a cube root, or a fourthroot and so on. We regard the four basic operations of algebra — addition,subtraction, multiplication and division — together with the extraction ofnth roots as purely algebraic operations. Although slightly failing as a precisedefinition, I shall say that a radical expression is an algebraic expressioninvolving nth roots. For example, the formula for the roots of a quadraticdescribes the roots as radical expressions in terms of the coefficients of thequadratic. Thus a radical expression is supposed to be an explicit descriptionof some real number. The following table gives some easy to find radicalexpressions for the sines and cosines of some well-known angles.

θ sin θ cos θ

0◦ 0 1

30◦ 12

√3

2

45◦ 1√2

1√2

60◦√

32

12

90◦ 1 0

The nth roots of unity

We shall show that the number 1 has n nth roots — these are called then roots of unity. We know that the equation zn− 1 = 0 has at most n roots,so all we need do is find n roots and we are home and dry. We begin with amotivating example.

Example 5.3.5. We find the three cube roots of 1. There are two ways ofwriting these roots: as trigonometric expressions and as radical expressions.Divide the unit circle in the complex plane into an equilateral triangle with 1as one of its vertices. Then the other two roots are ω1 = cos 120◦+ i sin 120◦

obtained by dividing 2π by 3 and ω2 = cos 240◦+ i sin 240◦ which is twice 2π3

.If we put ω = ω1 then in fact ω2 = ω2. This is the trigonometric form of theroots.


1

ω

ω2

In this case, it is easy to write down the radical expressions for the rootsas well since we already have radical expressions for sin 60◦ and cos 60◦. Wetherefore have that

ω =1

2

(−1 + i

√3)

and ω2 = −1

2

(1 + i

√3).

The general case is solved in a similar way to our example above usingregular n-gons in the complex plane where one of the vertices is 1.

Theorem 5.3.6 (Roots of unity). The n roots of unity are given by thefollowing formula

cos2kπ

n+ i sin

2kπ

n

for k = 1, 2, . . . , n. These complex numbers are arranged uniformly on theunit circle and form a regular polygon with n sides: the cube roots of unityform an equilateral triangle, the fourth roots form a square, the fifth rootsform a pentagon, and so on.

There is one point here that is potentially confusing. It is always possible,and easy, to write down trigonometric expressions for the nth roots of unity.Using such an expression, we can then write down numerical values of thenth roots to any desired degree of accuracy. Thus, from a purely practicalpoint of view, we can find the nth roots of unity. It is also always possibleto write down the radical expressions of the nth roots of unity but this isfar from easy in general. In fact, it forms part of the advanced subjectknown as Galois theory.

Example 5.3.7. Gauss proved the following result which is highly non-trivial. You can verify that it is true by using a calculator — at least up tothe limits of your calculator. It is a good example of a radical expression


where, on this occasion, the only radicals that occur are square roots; thetheory Gauss developed showed that this implied that the 17-gon could beconstructed using only a ruler and compass.

16 cos2π

17= −1 +

√17 +

√34− 2

√17

+

√(68 + 12

√17− 16

√34 + 2

√17− 2(1−

√17)

√34− 2

√17

)Arbitrary nth roots

The nth roots of unity play an important role in finding arbitrary nthroots. We begin with an example to illustrate the idea.

Example 5.3.8. We find the three cube roots of 2. If you use your calculatoryou will simply find 3

√2, a real number. There should be two others: where

are they? The explanation is that the other two cube roots are complex. Letω be the complex cube root of 1 that we described above. Then the threecube roots of 2 are the following

3√

2, ω3√

2, ω2 3√

2.

The above example generalizes.

Theorem 5.3.9 (nth roots). Let z = r (cos θ + i sin θ) be a non-zero complexnumber. Put

u = n√r

(cos

θ

n+ i sin

θ

n

),

the obvious nth root, and put

ω = cos2π

n+ i sin

2π

n,

the first interesting nth root of unity. Then the nth roots of z are as follows

u, uω, . . . , uωn−1.

It follows that the nth roots of z = r (cos θ + i sin θ) can be written in theform

n√r

(cos

(θ

n+

2kπ

n

)+ i sin

(θ

n+

2kπ

n

))for k = 0, 1, 2, . . . , n− 1.

This is the reason why every non-zero number has two square roots thatdiffer by a multiple of −1: the two square roots of 1 are 1 and -1.


5.3.4 Euler’s formula

We have seen that every real number can be written as a whole number plusa possibly infinite decimal part. It turns out that many functions can also bewritten as a sort of decimal. I shall illustrate this by means of an example.Consider the function ex. All you need to know about this function is thatit is equal to its derivative and e0 = 1. We would like to write

ex = a0 + a1x+ a2x2 + a3x

3 + . . .

where the ai are real numbers that we have yet to determine. We can workout the value of a0 easily by putting x = 0. This tells us that a0 = 1. To getthe value of a1 we first differentiate our expression to get

ex = a1 + 2a2x+ 3a3x2 + . . .

Now put x = 0 again and this time we get that a1 = 1. To get the value ofa2 we differentiate our expression again to get

ex = 2a2 + 3 · 2 · a3x+ . . .

Now put x = 0 and we get that a2 = 12. Continuing in this way we quickly

spot the pattern for the values of the coefficient an. We find that an = 1n!

where n! = n(n − 1)(n − 2) . . . 2 · 1. What we have done for ex we can alsodo for sin x and cosx and we obtain the following series expansions of eachof these functions.

• ex = 1 + x+ x2

2!+ x3

3!+ x4

4!+ . . ..

• sinx = x− x3

3!+ x5

5!− x7

7!+ . . ..

• cosx = 1− x2

2!+ x4

4!− x6

6!+ . . ..

There are interesting connections between these three series. We shallnow show that complex numbers help to explain them. Without worryingabout the validity of doing so, we calculate the infinite series expansion ofeiθ. We have that

eiθ = 1 + (iθ) +1

2!(iθ)2 +

1

3!(iθ)3 + . . .


that is

eiθ = 1 + iθ − 1

2!θ2 − 1

3!θ3i +

1

4!θ4 + . . .

By separating out real and complex parts, and using the infinite series weobtained above, we get Euler’s remarkable formula

eiθ = cos θ + i sin θ.

Thus the complex numbers enable us to find the hidden connections betweenthe three most important functions of calculus: the exponential function andthe sine and cosine functions. It follows that every non-zero complex numbercan be written in the form reiθ. If we put θ = π in Euler’s formula, we getthe following result, which is widely regarded as one of the most amazing inmathematics.

Theorem 5.3.10 (Euler’s identity).

eπi = −1.

This result shows us that the real numbers π, e and −1 are connected,but that to establish that connection we have to use the complex number i.This is one of the important roles of the complex numbers in mathematics inthat they enable us to make connections between topics that look different:they form a mathematical hyperspace.

Exercises 5.3

1. Express cos 5x and sin 5x in terms of cosx and sin x.

2. Prove the following where x is real.2

(a) sinx = 12i

(eix − e−ix).(b) cos x = 1

2(eix + e−ix).

Hence show that cos4 x = 18[cos 4x+ 4 cos 2x+ 3].

3. Find the 4th roots of unity as radical expressions.

2Compare (a) and (b) below with sinhx = 12 (ex − e−x) and coshx = 1

2 (ex + e−x).




6. Solve x3 = −8i.

7. Find radical expresssions for the roots of x5 − 1, and so show that

cos 72◦ =

√5− 1

4and sin 72◦ =

√10 + 2

√5

4.

To do this, consider the equation

x4 + x3 + x2 + x+ 1 = 0.

Divide through by x2 to get

x2 +1

x2+ x+

1

x+ 1 = 0.

Put y = x+ 1x. Show that y satisfies the quadratic

y2 + y − 1 = 0.

You can now find all four values of x.

8. Determine all the values of ii. What do you notice?

5.4 Making sense of complex numbers

In this chapter, I have assumed that complex numbers exist and that theyobey the usual high-school rules of algebra. In this section, I shall sketch outa proof of this.

We start with the set R×R whose elements are ordered pairs (a, b) wherea and b are real numbers. It will be helpful to denote these ordered pairs bybold letters so a = (a1, a2). We define 0 = (0, 0), 1 = (1, 0) and i = (0, 1).We now define operations as follows

• If a = (a1, a2) and b = (b1, b2), define a + b = (a1 + b1, a2 + b2).

• If a = (a1, a2) define −a = (−a1,−a2).

5.5. RADICAL SOLUTIONS 151

• If a = (a1, a2) and b = (b1, b2), define

ab = (a1b1 − a2b2, a1b2 + a2b1).

• If a = (a1, a2) 6= 0 define

a−1 = (a1√a2

1 + a22

,−a2√a2

1 + a22

).

It is now a long exercise to check that all the usual axioms of high-schoolalgebra hold. Observe now that the element (a1, a2) can be written

(a1, 0)1 + (a2, 0)i

and thati2 = (0, 1)(0, 1) = (−1, 0) = −1.

The elements of the form (a, 0) can be identified with the real numbers. Thisproves that the complex numbers as I described them earlier in this chapterreally do exist.

5.5 Radical solutions

There are two great historical revolutions in the history of algebra. The firstis the discovery that there are irrational numbers — this means that we haveto learn to work with real numbers that are described by radical expressionssuch as

√2. The second is Galois’s discovery that the roots of a polynomial

need not be radical expressions of the coefficients of the polynomial — putsimply, that there is not always a formula for the roots of a polynomialequation. We begin by describing the way in which cubics and quartics canbe solved purely algebraically.

5.5.1 Cubic equations

Letf(x) = a3x

3 + a2x2 + a1x+ a0

where a3 6= 0. I shall assume all coefficients are real though the theoryworks in general. We shall find all the roots of f(x). This problem can be


simplified in two ways. First, we may divide through by a3 and so, withoutloss of generality, we may assume that f(x) is monic. That is a3 = 1. Second,by means of a substitution we may obtain a cubic in which the coefficient ofthe term in x2 is zero. Put x = y − a3

3. You should do this and check that

you get a polynomial of the form

g(y) = y3 + py + q.

We say that such a cubic is reduced. It follows that without loss of generality,we need only solve the cubic

g(x) = x3 + px+ q.

To do this needs what looks like a minor miracle. Let u and v be two complexvariables. Let ω = cos 2π

3+ i sin 2π

3, one of the complex cube roots of unity.

You should now check that the following cubic

t(x) = x3 − 3uv − (u3 + v3)

has the rootsu+ v, uω + vω2, uω2 + vω.

Now we can solvex3 + px+ q = 0

if we can find u and v such that

p = −3uv, q = −u3 − v3.

Now if we cube the first equation, we get the following two equations

−p27

= u3v3, −q = u3 + v3.

If we regard u3 and v3 as the unknowns we know their sum and we know theirproduct. This means that u3 and v3 are the roots of the quadratic equation

x2 + qx− p3

27= 0.


u3 =1

2

(−q +

√27q2 + 4p3

27

)


and

v3 =1

2

(−q −

√27q2 + 4p3

27

).

To find u we have to take a cube root of the number u3 and there are threepossible such roots. Choose one such value for u. We then choose the valueof v so that p = −3uv.

Example 5.5.1. Find the roots of x3− 9x− 2 = 0. Here p = 9 and q = −2.The quadratic equation we have to solve is therefore

x2 − 2x− 27 = 0.

This has roots 1 ± 2√

7. Put u3 = +2√

7. We may choose a real cube rootin this case to get

u =3

√1 +√

28.

We must then choose v to be

u =3

√1−√

28.

We may now write down the three roots of our original cubic.

The following cubic equation was studied by Bombelli in 1572 and hadan important influence on the development of complex numbers.

Example 5.5.2. Consider the cubic

x3 − 15x− 4 = 0.

The associated quadratic in this case is

x2 + 4x+ 125 = 0.

This gives the two solutions that Bombelli would have written in a wayequivalent to the following

x = 2±√−121.

We would write this asx = 2± 11i.


Thusu3 = 2 + 11i and v3 = 2− 11i.

There are three cube roots of 2 + 11i all complex. Let’s press on regardless.Write 3

√2 + 11i to represent one of those cube roots. Write 3

√2− 11i to

be the corresponding cube root such that their product is 5. Thus at leastsymbolically we may write

u+ v = 3√

2 + 11i+ 3√

2− 11i.

What is surprising is that for some choice of these cube roots this value mustbe real. The reason is that the graph of our cubic has one real root whichcan easily be checked to be 4. To see why, observe that

(2 + i)3 = 2 + 11i and (2− i)3 = 2− 11i.

If we choose 2 + i as one of the cube roots of 2 + 11i then we have to choose2− i as the corresponding cube root of 2− 11i. In this way, we get

4 = (2 + 11i) + (2− 11i)

as a root. It was the fact that real roots arose in this way that providedthe first inkling that there was a number system, the complex numbers,that extended the so-called real numbers, but had every much as tangibleexistence.

5.5.2 Quartic equations

Letf(x) = a4x

4 + a3x3 + a2x

2 + a1x+ a0.

As usual, we may assume that a4 = 1. By means of a suitable substitution,which is left as an exercise, we may eliminate the cubed term. We thereforeend up with a reduced quartic which it is convenient to write in the followingway

x4 = ax2 + bx+ c.

Suppose that we could write the righthand side as a perfect square (dx+ e)2.Then our quartic could be written as the product of two quadratics(

x2 − (dx+ e)) (x2 + dx+ e

).


The roots of each these two quadratics will be the four roots of our originalquartic. It is not true that we can always do this, but by means of anothermiracle we can transform the equation into one with the same roots wherewe can. Let t be a new variable whose value will be determined later. Wemay write

(x2 + t)2 = (a+ 2t)x2 + bx+ (c+ t2).

We now want to choose a value of t so that the righthand side is a perfectsquare. This happens when the discriminant of the quadratic (a + 2t)x2 +bx+ (c+ t2) is zero. That is when

b2 − 4(a+ 2t)(c+ t2) = 0.

Now this is a cubic in t. We now use the method of the previous section tofind a specific value of t say t1. We then get

(x2 + t1)2 = (a+ 2t1)

(x+

b

2(a+ 2t1)

)2

.

It follows that the roots of the original quartic are the roots of the followingtwo quadratics

(x2 + t1)−√a+ 2t1

(x+

b

2a+ 4t1

)= 0

and

(x2 + t1) +√a+ 2t1

(x+

b

2a+ 4t1

)= 0.

Example 5.5.3. Solve the quartic

x4 = 1− 4x.

We shall find a value of t below

(x2 + t)2 = t4 + 2x2t+ t2 = 2x2t− 4x+ (1 + t2)

which makes the righthand side a perfect square. This requires us to find aroot of the cubic

t3 + t− 2 = 0.


Here t = 1 works. Our quartic with t therefore becomes

(x2 + 1)2 = 2(x− 1)2.

Therefore the roots of our original quartic are the roots of the following twoquadratics

(x2 + 1)−√

2(x− 1) = 0 and x2 + 1 +√

2(x− 1) = 0.

The roots of our original quartic are therefore

1± i√√

8 + 1√2

and−1±

√√8− 1√

2.

5.5.3 Symmetries and particles

Although quadratic equations had been solved in antiquity, it was not untilthe 16th century that cubics and quartics were first solved. This great leapforward in the development of algebra was centred on a group of Italianmathematicians — Scipione del Ferro (1465–1525), Niccolo Tartaglia (1500–1557), Girolamo Cardano (1501–1576), Ludovico Ferrari (1522–1562), RafaelBombelli (1526–1572) — whose antics are worthy of an opera or Shakespearecomedy but the importance of their work cannot be overemphasized. Buttwo points arise. First, the solution of quadratics, cubics and quartics seemto rely on mathematical miracles. Second, we appear to see a pattern: tosolve cubics we need to solve an associated quadratic and to solve quarticswe need to solve an associated cubic. These two points were investigated bya number of mathematicians in great depth: in particular, Lagrange (1736–1813), Ruffini (1765–1822) and Abel (1802–1829). The expectation was highthat quintics should be solvable by using quartics in a way that continuedthe pattern. Then came the great surprize. Ruffini and Abel proved thatthe pattern does not continue and that one cannot always describe the rootsof a quintic in radical form — there are, of course, five roots — the pointis that these roots cannot in general be written down using an algebraicformula. The question is why and the answer to this question also explainsthe algebraic miracles we used above. It was discovered by Evariste Galois(1811–1832). I shall not go into the details of his biography — he was killed,for instance, in a duel — since you will find much more written about himelsewhere, some of it accurate, instead I shall focus on his mathematics.

5.6. GAUSSIAN INTEGERS AND FACTORIZING PRIMES 157

By building on the work of Lagrange, he explained the miracles above andmuch more. His approach was new: to determine whether the roots of apolynomial could be expressed in algebraic terms as radical expressions, hestudied the symmetries of the polynomial. Just what this means is explainedin a subject known as Galois theory after its founder. Crucially, this is nota mere extrapolation of existing algebraic manipulation, instead it involvesworking at a higher level of abstraction. As so often happens in mathematics,a development in one area led to developments in other areas. Sophus Lie(1811–1832) realized that symmetries could also be used to help understandthe tricks that were used to solve differential equations. It was in this waythat symmetry came to play a fundamental role in physics. If you hear aparticle physicist talking about symmetries, they are paying an unconscioustribute to Galois’ bold work in studying the nature of the roots of polynomialequations.

5.6 Gaussian integers and factorizing primes

Complex numbers may be used to factorize some primes. For example,

5 = (1− 2i)(1 + 2i).

To develop this example further, we shall need some definitions. The integersZ are a subset of the reals R. We define the Gaussian integers, denotedby Z[i], to be all complex numbers of the form m + in where m and n areintegers. What our example shows is that some primes can be factorized usingGaussian integers. The question is: which ones? Observe that 5 = 12 + 22.In other words, it can be written as a sum of two squares. Another exampleof a prime that can be written as a sum of two squares is 13. We have that

13 = 9 + 4 = 32 + 22.

This prime can also be factorizes using Gaussian integers

13 = (3 + 2i)(3− 2i).

In fact, any prime p that can be written as a sum of two squares p = a2 + b2,can also be factorized using Gaussian integers

p = (a+ ib)(a− ib).


This raises the question of exactly which primes can be written as a sum oftwo squares.

Lemma 5.6.1. Let p be an odd prime that can be written as a sum of twosquares. Then p ≡ 1 (mod 4).

Proof. Let p = a2 + b2. Since p is assumed odd, we must have that one of a2

and b2 is even and the other odd. Without loss of generality, we may assumethat a2 is odd and b2 is even. But from Chapter 2, this implies that a isodd and b is even. We may therefore write a = 2u and b = 2v + 1 for somenatural numbers u and v. But then p = 4u2 + 4v2 + 4v + 1. It follows thatp ≡ 1 (mod 4).

Lemma 5.6.2. Each odd prime p satisfies either p ≡ 1 (mod 4) or p ≡ 3(mod 4).

Proof. The possible remainder when p is divided by 4 are 0, 1, 2, 3. Since pis an odd prime both 0 and 2 are impossible and the result follows.

The lemma above tells us that each odd prime belongs to exactly one oftwo camps. The obvious question is whether both of these camps are infinite.

Proposition 5.6.3.

1. There are infinitely many primes p such that p ≡ 3 (mod 4).

2. There are infinitely many primes p such that p ≡ 1 (mod 4).

We have proved that if an odd prime p can be written as a sum of twosquares then p ≡ 1 (mod 4). The hard question is whether the converse istrue.

Theorem 5.6.4 (Euler, 1754). An odd prime p can be written as a sum oftwo squares if, and only if, p ≡ 1 (mod 4).

We may deduce from this theorem that every odd prime p ≡ 1 (mod 4)can be factorized by means of Gaussian integers.

Chapter 6

Rational functions

A (real) rational function is simply a quotient f(x)g(x)

where f(x) and g(x) are

any polynomials with real coefficients, the polynomial g(x) of course notbeing equal to the zero polynomial. If deg f(x) < deg g(x), I shall say thatthe rational function is proper. The set of all rational functions R(x) — noticeI use round brackets unlike the square brackets for the set of real polynomials— can be added, subtracted, multiplied and divided. In fact, they satisfy allthe algebraic laws of high-school algebra. Rational functions are enormouslyuseful in mathematics. The goal of this section is to show that every rationalfunction can be written as a sum of simpler rational functions. Once I haveshown how to do this, I will outline its application to integration.

6.1 Numerical partial fractions

This section is intended as motivation for the partial fraction representationof rational functions described in a later chapter, so it can be omitted at firstreading. The idea is to show how a fraction can be written as a sum of otherfractions having a particular form. Specifically, the goal of this section is toshow how a proper fraction can be written as a sum of proper fractions overprime power denominators. This involves two steps which I shall describe bymeans of examples. The theory is an application of the fundamental theoremor arithmetic and the extended Euclidean algorithm.

In order to add two fractions together, we first have to ensure that bothare expressed over the same denominator. For example, suppose we want to

159

160 CHAPTER 6. RATIONAL FUNCTIONS

add 57

and 813

. Since 7× 13 = 91 we have the following

5

7+

8

13=

65 + 56

91=

121

91.

We shall now consider the reverse process, using the fraction 8101003

as anexample. Observe that 1003 = 17 × 59 where 17 and 59 are coprime. Ourgoal is to write

810

1003=

a

17+

b

59

for some natural numbers a and b. By the extended Euclidean algorithm, wecan write

1 = 7 · 17− 2 · 59.

It follows that1

1003=

7 · 17− 2 · 59

17 · 59=

7

59− 2

17.

Now multiply both sides by 810 to get

810

1003=

7 · 810

59− 2 · 810

17= 96

6

59− 95

5

17= 1 +

6

59− 5

17.

Simplifying we get810

1003=

6

59+

12

17

as required.We shall now do something different. Consider the fraction 10

16. We have

that 16 = 24 and so we cannot write it as a product of coprime numbers.However, we can do something else. We can write 10 = 2+8 = 21 +23. Thus

10

16=

21 + 23

24=

21

24+

23

24=

1

23+

1

2

and so10

16=

1

21+

1

23.

Let’s now combine these two steps. Consider the fraction 4190

. The primefactorisation of 90 is 2 · 32 · 5. Our first goal is to write

41

90=a

2+

b

32+c

5.

6.1. NUMERICAL PARTIAL FRACTIONS 161

Thus we have to find a, b, c such that

41 = 45a+ 10b+ 18c.

By trial and error, remembering that a, b, c have to be integers, we find that

41 = 45 · 1 + 10 · 5 + (−3) · 18.

It follows that41

90=

1

2+

5

32− 3

5.

We now want to write5

32=d

3+

e

32

where |d| , |e| < 3. But 5 = 2 + 3 and so

5

32=

1

3+

2

32.

It follows that41

90=

1

2+

1

3+

2

9− 3

5.

We may summarise what we have found in the following theorem.

Theorem 6.1.1.

(i) Let ab

be a proper fraction, and let b = pn11 . . . pnr

r be the prime factorisationof b. Then

a

b=

r∑i=1

cipnii

for some integers ci, where each of the fractions is proper.

(ii) Now let p be a prime and cpn

a proper fraction. Then

c

pn=

n∑j=1

djpj

where each dj is such that |dj| < p.


6.2 Analogies

There are parallels between the properties of the natural numbers and theproperties of real polynomials. We have seen that there are remainder theo-rems for both natural numbers and polynomials. In the case of the naturalnumbers, we used the remainder theorem to develop Euclid’s algorithm andthe Extended Euclidean algorithm for computing greatest common divisors.We can do the same thing for polynomials. We define the greatest commondivisor of two real polynomials a(x) and b(x) to be a real polynomial oflargest degree dividing both a(x) and b(x). Any two such gcd’s will be realnumber multiples of each other. We say that a(x) and b(x) are coprime iftheir greatest common divisor is a constant polynomial. Euclid’s algorithmand the Extended Euclidean algorithm can both be proved for real polyno-mials. As a consequence, if a(x) and b(x) are coprime real polynomials, thenwe can find real polynomials c(x) and d(x) such that

1 = a(x)c(x) + b(x)d(x).

If f(x) is any real polynomial, then we can multiply both sides of the aboveequation by f(x) to get

f(x) = a(x)[f(x)c(x)] + b(x)[f(x)d(x)].

Thus f(x) can be written in terms of a(x) and b(x) in a very simple way.There is a simple refinement of this result I shall use below. If deg f(x) <deg a(x) + deg b(x) then using the remainder theorem, we can in fact write

f(x) = B(x)a(x) + A(x)b(x)

where degB(x) < deg b(x) and degA(x) < deg a(x).Every natural number can be written as a product of primes, where a

prime is a number which cannot be factorised in a non-trivial way. Theanalogue of a prime number for real polynomials is the notion of an irreduciblepolynomial. The real polynomial f(x) is said to be irreducible if it cannotbe factorised into real polynomials each having smaller degree than f(x).Unlike the case of prime numbers, we can characterise the real irreduciblepolynomials very easily. It is a consequence of the fundamental theorem forreal polynomials that there are only two kinds of irreducible real polynomials:linear polynomials c(x−a) and irreducible quadratic polynomials c(x2 +ax+b), that is, quadratics having only non-real roots.

6.3. PARTIAL FRACTIONS 163

We now have the following analogue of the fundamental theorem of arith-metic for real polynomials: every real polynomial of degree at least 1 can bewritten as a product of a real number and powers of distinct monic polyno-mials or distinct monic irreducible quadratic polynomials in essentially oneway.

6.3 Partial fractions

Let f(x)g(x)

be a rational function. If deg f(x) > deg g(x) then we may applythe Remainder Theorem and write

f(x)

g(x)= q(x) +

r(x)

g(x)

where deg r(x) < deg g(x). Thus without loss of generality, we may assumethat deg f(x) < deg g(x) in what follows. I shall also assume that g(x) ismonic; if it isn’t there will simply be a constant factor at the front.

By the fundamental theorem for real polynomials, we may write g(x) asa product of distinct factors of the form (x − a)r or (x2 + ax + b)s. Using

this decomposition of g(x), the rational function f(x)g(x)

can now be written asa sum of simpler rational functions which have the following forms:

• For each factor of g(x) of the form (x− a)r, we will have a sum of theform

A1

x− a+ . . .+

Ar−1

(x− a)r−1+

Ar(x− a)r

.

• For each factor of g(x) of the form (x2 + ax + b)s, we will have a sumof the form

A1x+B1

x2 + ax+ b+ . . .+

As−1x+Bs−1

(x2 + ax+ b)s−1+

Asx+Bs

(x2 + ax+ b)s.

This is called the partial fraction decomposition of f(x)g(x)

. The practicalmethod for finding such decompositions is best illustrated by means of someexamples.

Examples 6.3.1.


1. Write 5x2+x−6

in partial fractions. We have that x2 +x−6 = (x+3)(x−2), a product of two distinct linear factors. We expect a solution of theform

5

x2 + x− 6=

A

x+ 3+

B

x− 2

where A and B are real numbers to be determined. The RHS is just

A(x− 2) +B(x+ 3)

(x+ 3)(x− 2).

Comparing the LHS with the RHS we get that

5 = A(x− 2) +B(x+ 3)

which must hold for all values of x. Putting x = 2 we get B = 1 andputting x = −3 we get A = 1. Thus

5

x2 + x− 6=−1

x+ 3+

1

x− 2.

At this point, we check our solution.

2. Write 9(x−1)(x+2)2

in partial fractions. Here we have a single linear factorand a square of a linear factor. We therefore expect an answer in theform

9

(x− 1)(x+ 2)2=

A

x− 1+

B

x+ 2+

C

(x+ 2)2.

Carrying out the sum on the RHS, and comparing the LHS with theRHS we get that

9 = A(x+ 2)2 +B(x− 1)(x+ 2) + C(x− 1).

Putting x = 1 we get that A = 1, putting x = −2, we get that C = −3and putting x = −1 and using the values we have for A and C we getthat B = −1. Thus

9

(x− 1)(x+ 2)2=

1

x− 1− 1

x+ 2− 3

(x+ 2)2.

6.3. PARTIAL FRACTIONS 165

3. Write 16xx4−16

in partial fractions. We have that x4 − 16 = (x − 2)(x +2)(x2 + 4), a product of two distinct linear factors and a quadraticfactor. We expect a solution of the form

16x

x4 − 16=

A

x− 2+

B

x+ 2+Cx+D

x2 + 4.

This leads to

16x = A(x+ 2)(x2 + 4) +B(x− 2)(x2 + 4) + (Cx+D)(x− 2)(x+ 2).

Using appropriate values of x we get that A = 1, B = 1, C = 2 andD = 0. Thus

16x

x4 − 16=

1

x− 2+−1

x+ 2+

2x

x2 + 4.

4. Write 3x2+2x+1(x+2)(x2+x+1)2

in partial fractions. We expect a solution in theform

3x2 + 2x+ 1

(x+ 2)(x2 + x+ 1)2=

A

x+ 2+

Bx+ C

x2 + x+ 1+

Dx+ E

(x2 + x+ 1)2.

This leads to

3x2+2x+1 = A(x2+x+1)2+(Bx+C)(x+2)(x2+x+1)+(Dx+E)(x+2).

Putting x = −2 yields A = 1. There are four unknowns left and so weneed four equations. However, to avoid having to solve four equationsin four unknowns we can vary our procedure. Putting x = 0 gives0 = C + E. Putting x = 1 gives −1 = B + C + D + E. Thus−1 = B + D. On the RHS the highest power of x occurring is x2.On the LHS the highest power of x occurring apears to be x4 but thatimmediately implies that its coefficient must be zero. The coefficient ofx4 is 1 +B and so B = −1 which means that D = 0. Put x = 2. Thisgives 6 = 7C + E. This quickly leads to E = −1 and C = 1. Thus

3x2 + 2x+ 1

(x+ 2)(x2 + x+ 1)2=

1

x− 2+

1− xx2 + x+ 1

+−1

(x2 + x+ 1)2.


Let me conclude this section by sketching out why the partial fractiondecomposition of real rational functions is possible.

Consider the proper rational function

f(x)

a(x)b(x)

where a(x) and b(x) are coprime. Then we indicated above that we maywrite

f(x)

a(x)b(x)=A(x)

a(x)+B(x)

b(x)

where the rational functions are all proper. This may be generalised asfollows. Let f(x)

g(x)be a proper rational function. Let g(x) = a1 . . . am(x) be a

product of pairwise coprime polynomials. Then we may write

f(x)

g(x)=

m∑i=1

Ai(x)

ai(x),

where the rational functions are all proper.We shall now assume that the ai(x) are either powers of linear factors or

of quadratic factors and that these factors are distinct for different i.Consider the proper rational function h(x)

(x−a)rwhere r ≥ 1. Then we may

writeh(x) = a0 + a1(x− 1) + . . .+ ar−1(x− a)r−1

for some real numbers a0, . . . , ar−1 in a way analogous to writing a naturalnumber in a number base. Thus

h(x)

(x− a)r=

ar−1

x− a+ . . .+

a1

(x− a)r−1+

a0

(x− a)r.

Consider the proper rational function h(x)(x2+ax+b)r

where r ≥ 1. Then wemay similarly write

h(x) = (a0x+b0)+(a1x+b1)(x2 +ax+b)+ . . .+(ar−1x+br−1)(x2 +ax+b)r−1

for some real numbers a0, . . . , ar−1 and b0, . . . , br−1 in a way analogous towriting a natural number in a number base. Thus

h(x)

(x2 + ax+ b)r=ar−1x+ br−1

x2 + ax+ b+ . . .+

a1x+ b1

(x2 + ax+ b)r−1+

a0x+ b0

(x2 + ax+ b)r.

The existence of partial fraction decompositions of real rational functionsnow follows.

6.4. INTEGRATING RATIONAL FUNCTIONS 167

6.4 Integrating rational functions

In order to appreciate the significance of partial fractions it is essential tounderstand how they are used. The goal of this section is therefore to showyou how to calculate ∫

f(x)

g(x)dx

exactly, when f(x) and g(x) are real polynomials.We need to know one key property of integration: namely, if ai are real

numbers then ∫ n∑i=1

aifi(x)dx =n∑i=1

ai

∫fi(x)dx

This property is known as linearity. I shall break my discussion up into anumber of steps.

Step 1. Suppose that in f(x)g(x)

we have that deg f(x) > deg g(x). By theRemainder Theorem for polynomials we can write

f(x)

g(x)= q(x) +

r(x)

g(x)

where deg r(x) < deg g(x). By the linearity of integration, we have that∫f(x)

g(x)dx =

∫q(x)dx+

∫r(x)

g(x)dx.

In other words, to integrate an arbitrary rational function it is enough toknow how to integrate polynomials and proper rational functions.

Step 2. By linearity of integration, integrating arbitrary polynomials canbe reduced to integrating the following∫

xndx

where n ≥ 0.

Step 3. Let f(x)g(x)

be a proper rational function, so that deg f(x) < deg g(x).

We may factorise g(x) into a product of real linear polynomials and real


irreducible quadratic polynomials and then write f(x)g(x)

as a sum of rationalfunctions of one of the following two forms

a

(x− d)r,

where a and d are real and r ≥ 1, and

px+ q

(x2 + bx+ c)s

where p, q, b, c are real and s ≥ 1 and the quadratic has a pair of complexconjugate roots. By the linearity of integration, this reduces calculating∫

f(x)

g(x)dx

to calculating integrals of the following two forms∫a

(x− d)rdx

and ∫px+ q

(x2 + bx+ c)sdx.

Again by linearity of integration, this reduces to being able to calculate thefollowing three integrals∫

1

(x− d)rdx

∫x

(x2 + bx+ c)sdx

∫1

(x2 + bx+ c)sdx.

Step 4. We now concentrate on the two integrals involving quadratics. Bycompleting the square, we can write

x2 + bx+ c = (x+b

2)2 + (c− b2

4).

By assumption b2 − 4ac < 0 (why?). Put e2 = 4c−b24

(which makes sense).Thus

x2 + bx+ c = (x+b

2)2 + e2.

6.4. INTEGRATING RATIONAL FUNCTIONS 169

I shall now use a technique of calculus known as substitution and put y =x + b

2. Doing this, and returning to x as my variable, we need to be able to

integrate the following three integrals∫1

(x− d)rdx

∫x

(x2 + e2)sdx

∫1

(x2 + e2)sdx.

Step 5. The second integral above can be converted into the first by meansof the substitution x2 = u.

We have therefore proved the following.

Theorem 6.4.1. The integration of an arbitrary rational function can bereduced to integrals of the following three kinds:

1.∫xndx.

2.∫

1(x−d)r

dx.

3.∫

1(x2+e2)s

dx.

Chapter 7

Matrices I: linear equations

The term matrix was introduced by James Joseph Sylvester (1814–1897)in 1850, and the first paper on matrix algebra was published by ArthurCayley (1821–1895) in 18581. Matrices were introduced initially as packagingfor systems of linear equations, but then came to be investigated in theirown right. The main goal of this chapter is to introduce the basics of thearithmetic and algebra of matrices. This chapter and the two that followform the first steps in the subject known as linear algebra. It is hard tooveremphasize the importance of this subject throughout mathematics andits applications

7.1 Matrix arithmetic

In this section, I shall introduce matrices and three arithmetic operationsdefined on them. I shall also define an operation called the ‘transpose of amatrix’ that will be important in later work. This section forms the founda-tion for all that follows.

7.1.1 Basic matrix definitions

A matrix2 is a rectangular array of numbers. In this course, the numbers willusually be real numbers but, on occasion, I shall also use complex numbers

1A memoir on matrices, Philosphical Transactions of the Royal Society of London 148(1858), 17–37. This is well-worth reading.

2Plural: matrices.

171

172 CHAPTER 7. MATRICES I: LINEAR EQUATIONS

for variety.

Example 7.1.1. The following are all matrices:

(1 2 34 5 6

),

(41

),

1 1 −10 2 41 1 3

,(

6).

Usually the array of numbers that comprises a matrix is enclosed in roundbrackets. Occasionally books use square brackets with the same meaning.Later on, I shall introduce determinants and these are indicated by usingstraight brackets. In general, the kind of brackets you use is important andis not just a matter of taste.

We usually denote matrices by capital Roman letters: A, B, C, etc. Thesize of a matrix is m × n if it has m rows and n columns. The entries in amatrix are often called the elements of the matrix and are usually denotedby lower case Roman letters. If A is an m × n matrix, and 1 ≤ i ≤ m and1 ≤ j ≤ n, then the entry in the ith row and jth column of A is often denoted(A)ij. Thus ()ij means ‘the element in ith row and jth column’.

Examples 7.1.2.

1. Let

A =

(1 2 34 5 6

)Then A is a 2×3 matrix. We have that (A)11 = 1, (A)12 = 2, (A)13 = 3,(A)21 = 4, (A)22 = 5, (A)23 = 6.

2. Let

B =

(41

)Then B is a 2× 1 matrix. We have that (B)11 = 4, (B)21 = 1.

3. Let

C =

1 1 −10 2 41 1 3

Then C is a 3×3 matrix. (C)11 = 1, (C)12 = 1, (C)13 = −1, (C)21 = 0,(C)22 = 2, (C)23 = 4, (C)31 = 1, (C)32 = 1, (C)33 = 3.

7.1. MATRIX ARITHMETIC 173

4. LetD =

(6)

Then D is a 1× 1 matrix. We have that (D)11 = 6.

Matrices A and B are said to be equal, written A = B, if they have thesame size and corresponding entries are equal: that is, (A)ij = (B)ij for allallowable i and j.

Example 7.1.3. Given that(a 2 b4 5 c

)=

(3 x −2y z 0

)Find a, b, c, x, y, z. This example simply illustrates what it means for twomatrices to be equal. By definition a = 3, 2 = x, b = −2, 4 = y, 5 = z andc = 0.

When we want to talk about an arbitrary matrix A we usually denoteits elements by aij where i tells you the row the element lives in and j thecolumn. For example, a typical 2× 3 matrix A would be written

A =

(a11 a12 a13

a21 a22 a23

)7.1.2 Addition, subtraction, scalar multiplication and

the transpose

We define first the operations that cause us no trouble.

Addition Let A and B be two matrices of the same size. Then their sumA+B is the matrix defined by

(A+B)ij = (A)ij + (B)ij.

That is, corresponding entries of A and B are added. If A and B arenot the same size then their sum is not defined.

Subtraction Let A and B be two matrices of the same size. Then theirdifference A−B is the matrix defined by

(A−B)ij = (A)ij − (B)ij.

That is, corresponding entries of A and B are subtracted. If A and Bare not the same size then their difference is not defined.


Scalar multiplication In matrix theory, numbers are often called scalars.For us scalars will usually be either real or complex. Let A be anymatrix and λ any scalar. Then the matrix λA is defined as follows:

(λA)ij = λ(A)ij.

In other words, every element of A is multiplied by λ.

Transpose of a matrix Let A be an m × n matrix. Then the transposeof A, denoted AT , is the n×m matrix defined by (AT )ij = (A)ji. Wetherefore interchange rows and columns: the first row of A becomes thefirst column of AT , the second row of A becomes the second column ofAT , and so on.

Examples 7.1.4.

1. (1 2 −13 −4 6

)+

(2 1 3−5 2 1

)=

(1 + 2 2 + 1 −1 + 3

3 + (−5) −4 + 2 6 + 1

)which gives (

3 3 2−2 −2 7

)2. (

1 2 −13 −4 6

)−(

2 1 3−5 2 1

)=

(1− 2 2− 1 −1− 3

3− (−5) −4− 2 6− 1

)which gives (

−1 1 −48 −6 5

)3. (

1 12 1

)−(

3 3 2−2 −2 7

)is not defined since the matrices have different sizes.

4.

2

(3 3 2−2 −2 7

)=

(6 6 4−4 −4 14

)


5. The transposes of the following matrices

(1 2 34 5 6

) (41

) 1 1 −10 2 41 1 3

(6)

are, respectively, 1 42 53 6

(4 1

) 1 0 11 2 1−1 4 3

(6).

7.1.3 Matrix multiplication

This is more complicated than the other operations and, like them, is notalways defined. To define this operation it is useful to work with two specialclasses of matrix. A row matrix or row vector is a matrix with one row (butany number of columns). A column matrix or column vector is a matrix withone column but any number of rows. Row and column matrices are oftendenoted by bold lower case Roman letters a,b, c . . .. The ith element of therow or column matrix a will be denoted by ai.

Examples 7.1.5. The matrix (1 2 3 4

)is a row matrix whilst

1234

is a column matrix.

I shall build up to the definition of matrix multiplication in three stages.

Stage 1. Let a be a row matrix and b a column matrix, where

a = (a1 a2 . . . am)


and

b =

b1

b2

.

.

.bn

Then their product ab is defined if, and only if, the number of columns ofa is equal to the number of rows of b, that is m = n, in which case theirproduct is the 1× 1 matrix

ab = (a1b1 + a2b2 + . . .+ anbn).

The numbera1b1 + a2b2 + . . .+ anbn

is called the inner product of a and b and is denoted by a · b. Using thisnotation we have that

ab = (a · b).

Example 7.1.6. This odd way of multiplying is actually quite natural.Here’s an example of where it arises in real life. If you buy y items whoseunit cost is x then you spend xy. This can be generalized as follows whenyou buy a number of different kinds of items at different prices. Let a be therow matrix (

0 · 6 1 0 · 2)

where 0 · 6 is the price of a bottle of milk, 1 is the price of a loaf of bread,and 0 · 2 is the price of an egg. Let b be the column matrix 2

310

where 2 is the number of bottles of milk bought, 3 is the number of loavesof bread bought, and 10 is the number of eggs bought. Thus a is the pricerow matrix and b is the quantity column matrix. The total amount spent istherefore

0 · 6× 2 + 1× 3 + 0 · 2× 10 :

namely, the sum over all the commodities bought of the price of each com-modity times the number of items of that commodity purchased. This num-ber is precisely the inner product a · b: namely, 6 · 20.


Stage 2. Let a be a row matrix as above and let B be a matrix. Thus a isa 1 ×m matrix and B is a p × q matrix. Then their product aB is definedif, and only if, the number of columns of a is equal to the number of rowsof B. Thus m = p. To calculate the product think of B as consisting of qcolumn matrices b1, . . . ,bq. We calculate the q numbers a · b1, . . . , a · bq asin stage 1, and the q numbers that result become the entries of aB. ThusaB is a 1× q matrix whose jth entry is the number a · bj.

Example 7.1.7. Let a be the cost matrix of our previous example. Let B bethe 3× 5 matrix whose columns tell me the quantity of commodities boughton each of the days of the week Monday to Friday:

B =

2 0 2 0 43 0 4 0 8

10 0 10 0 20

Thus on Tuesday and Thursday no purchases were made, whilst on Fridayextra commodities were bought in preparation for the weekend. The matrixaB is a 1× 5 matrix which tells us how much was spent on each day of theweek. Thus

aB =(

0 · 6 1 0 · 2) 2 0 2 0 4

3 0 4 0 810 0 10 0 20

which is equal to (

6 · 2 0 7 · 2 0 14 · 4)

Stage 3. Let A be an m × n matrix and let B be a p × q matrix. Theirproduct AB is defined if, and only if, the number of columns of A is equalto the number of rows of B: that is n = p. If this is so then AB is an m× qmatrix. To define this product we think of A as consisting of m row matricesa1, . . . , am and we think of B as consisting of q column matrices b1, . . . ,bq.As in Stage 2 above, we multiply the first row of A into each of the columnsof B and this gives us the first row of A; we then multiply the second row ofA into each of the columns of B to get the second row of B, and so on.

Example 7.1.8. Let B be the 3× 5 matrix of the previous example whosecolumns tell me the quantity of commodities bought on each of the days


Monday to Friday

B =

2 0 2 0 43 0 4 0 8

10 0 10 0 20

Let A be the 2×3 matrix whose first row tells me the cost of the commoditiesin shop 1 and whose second row tells me the cost of the commodities in shop 2.

A =

(0 · 6 1 0 · 2

0 · 65 1 · 05 0 · 30

)The first row of AB tells me how much was spent on each day of the weekin shop 1, and the second row of AB tells me how much was spent on eachday of the week in shop 2. Thus

AB =

(0 · 6 1 0 · 2

0 · 65 1 · 05 0 · 30

) 2 0 2 0 43 0 4 0 8

10 0 10 0 20

which is equal to (

6 · 2 0 7 · 2 0 14 · 47 · 45 0 8 · 5 0 17

)Examples 7.1.9.

1.

(1 −1 0 2 1

)

231−1

3

=(

0)

2. The product (1 −1 23 0 1

)(0 −2 32 1 −1

)doesn’t exist because the number of columns of the first matrix is notequal to the number of rows of the second matrix.

3. The product (1 2 42 6 0

) 4 1 4 30 −1 3 12 7 5 2


exists because the first matrix is a 2×3 and the second is a 3×4. Thusthe product will be a 2× 4 matrix and is(

12 27 30 138 −4 26 12

)Summary of matrix multiplication

• Let A be an m × n matrix and B a p × q matrix. The product ABis defined if, and only if, n = p and the result will then be an m × qmatrix. In other words:

(m× n)(n× q) = (m× q).

• (AB)ij is the inner product of the ith row of A and the jth column ofB.

• It follows that the inner product of the ith row of A and each of thecolumns of B in turn yields each of the elements of the ith row of ABin turn.

If ai are row matrices and bj are column matrices then the product oftwo matrices can be written as follows

a1

.

.

.am

( b1 . . .bn)

=

a1 · b1 . . . a1 · bn. . . . .. . . . .. . . . .

am · b1 . . . am · bn

7.1.4 Special matrices

Matrices come in all shapes and sizes, but some of these are important enoughto warrant their own terminology. A matrix all of whose elements are zero iscalled a zero matrix. The m× n zero matrix is denoted Om,n or just O andwe let the context determine the size of O. A square matrix is one in whichthe number of rows is equal to the number of columns. In a square matrix Athe elements (A)11, (A)22, . . . , (A)nn are called the diagonal elements. All theother elements of A are called the off-diagonal elements. A diagonal matrix is


a square matrix in which all off-diagonal elements are zero. A scalar matrixis a diagonal matrix in which the diagonal elements are all the same. Then× n identity matrix is the scalar matrix in which all the diagonal elementsare the number one. This is denoted by In or just I where we allow thecontext to determine the size of I. Thus scalar matrices are those of theform λI where λ is any scalar. A matrix is real if all its elements are realnumbers, and complex if all its elements are complex numbers. A matrix Ais said to be symmetric if AT = A. In particular, symmetric matrices arealways square.

Examples 7.1.10.

1. The matrix 1 0 00 2 00 0 3

is a 3× 3 diagonal matrix.

2. The matrix 1 0 0 00 1 0 00 0 1 00 0 0 1

is the 4× 4 identity matrix.

3. The matrix 42 0 0 0 00 42 0 0 00 0 42 0 00 0 0 42 00 0 0 0 42

is a 5× 5 scalar matrix.

4. The matrix 0 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 00 0 0 0 0


is a 6× 5 zero matrix.

5. The matrix 1 2 32 4 53 5 6

is a 3× 3 symmetric matrix.

7.1.5 Linear equations

Matrices are extremely useful in helping us to solve systems of linear equa-tions. For the time being, I shall simply show you how matrices provide aconvenient notation for writing down such equations.

A system of m linear equations in n unknowns is a list of equations of thefollowing form

a11x1 + a12x2 + . . .+ a1nxn = b1

a21x1 + a22x2 + . . .+ a2nxn = b2

· · ·am1x1 + am2x2 + . . .+ amnxn = bm

If we have only a few unknowns then we often use w, x, y, z rather thanx1, x2, x3, x4. A solution is a set of values of x1, . . . , xn that satisfy all theequations. The set of all solutions is called the solution set or general solution.The equations above can be conveniently represented using matrices. Let Abe the m×n matrix (A)ij = aij, let b be the m× 1 matrix (b)i = bi, and letx be the n× 1 matrix (x)j = xj. Then the system of linear equations abovecan be written in the form

Ax = b.

The matrix A is called the coefficient matrix. At the moment, we are justusing matrices as packaging for the equations.

Example 7.1.11. The following system of linear equations

2x+ 3y = 1

x+ y = 2


may be written in matrix form as follows.(2 31 1

)(xy

)=

(12

)

7.1.6 Conics and quadrics

We have dealt with polynomial equations in one unknown and, in this chap-ter, we shall deal with linear equations in several unknowns. But what aboutequations in several unknowns where both products and powers of the un-knowns can occur? The simplest class of such equations are the conics. Theseare equations of the form

ax2 + bxy + cy2 + dx+ ey + f = 0

where a, b, c, d, e, f are numbers of some kind. These are equations in twovariables and variables either appear to degree zero, which is the constantterm, directly as linear terms or as binary products such as xy or x2. Ingeneral, the roots or zeroes of such equations form curves in the plane suchas circles, ellipses and hyperbolas. The term conic arises from the way thatthey were first defined by the Greeks as those curves that arise when you cuta double cone by means of a plane. These curves are important in astronomysince it can be proved that the orbits of satellites, planets, space-craft etcalways follow conics. The reason for introducing them here is that they canbe represented as matrix equations as follows.

xTAx + JTx + (h) = (0)

where

A =

(a 1

2b

12b c

)x =

(xy

)J =

(fg

)This is not just a notational convenience. The fact that the matrix A issymmetric means that powerful ideas from matrix theory, to be developedlater in this book, can be brought to bear on studying such conics. If wereplace the x above by the matrix

x =

xyz


and A by a 3 × 3 symmetric matrix and J by a 3 × 1 matrix then we getthe matrix equation of a quadric surface. Examples of such surfaces are thesurface of a sphere or the surface described by a cooling tower. But eventhough we are dealing with three rather than two dimensions, the matrixalgebra we shall develop applies just as well.

7.1.7 Graphs

The word ‘graph’ is used in two, completely different, ways in mathematics:to mean the graph of a function, and to mean a certain kind of diagram. Itis in the second sense that we shall use it here. A graph consists of a set ofvertices and a collection of edges, where this collection may contain repeats.By an edge we mean either a set of two vertices or a single vertex. Graphsare represented by means of diagrams: the vertices are represented by circlesand the edges by means of lines joining the edges. An edge joining a vertexto itself is called a loop. An example of a graph is given below.

1 2

34

5

This information can be represented by means of a 5×5 symmetric matrix Ggiven below. The entry (G)ij is just the number of edges connecting verticesi and j.

G =

0 1 0 1 01 0 1 0 00 1 0 1 11 0 1 0 10 0 1 1 0

Exercises 7.1

1. Let A =

1 21 0−1 1

and B =

1 4−1 1

0 3

. Find A+B, A−B and

−3B.


2. Let A =

0 4 2−1 1 3

2 0 2

and B =

1 −3 52 0 −43 2 0

. Find the matrices

AB and BA.

3. Let A =

(3 10 −1

), B =

0 1−1 1

3 1

and C =

(1 0 3−1 1 1

).

Calculate BA, AA and CB. Can any other pairs of these matrices bemultiplied ? Multiply those which can.

4. Calculate 1234

( 1 2 3)

5. If A =

2 1−1 0

2 3

, B =

(3 0−2 1

)and C =

(−1 2 3

4 0 1

). Calcu-

late both (AB)C and A(BC) and check that you get the same answer.

6. Calculate 2 −1 21 2 −43 −1 1

xyz

7. Calculate (

2 + i 1 + 2ii 3 + i

)(2i 2 + i

1 + i 1 + 2i

)where i is the complex number i.

8. Calculate a 0 00 b 00 0 c

d 0 00 e 00 0 f

9. Calculate


(a) 1 0 00 1 00 0 1

a b cd e fg h i

(b) 0 1 0

1 0 00 0 1

a b cd e fg h i

(c) a b c

d e fg h i

0 1 01 0 00 0 1

10. Find the transposes of each of the following matrices

A =

1 21 0−1 1

, B =

1 −3 52 0 −43 2 0

, C =

1234

11. This question deals with the following 4 matrices with complex entries

and their negatives: I,X, Y, Z where

I =

(1 00 1

)X =

(0 1−1 0

)Y =

(i 00 −i

)and Z =

(0 −i−i 0

)Show that the product of any two such matrices is again a matrix ofthis type by completing the following table for multiplication where theentry in row A and column B is AB in that order.

I X Y Z −I −X −Y −ZIXYZ−I−X−Y−Z


7.2 Matrix algebra

In this section, we shall look at algebra where the variables are matrices.This algebra is similar to high-school algebra but also differs significantly inone or two places. For example, if A and B are matrices it is not true ingeneral that AB = BA even if both products are defined. We will learn inthis section which rules of high-school algebra apply to matrices and thosewhich don’t.

7.2.1 Properties of matrix addition

In Chapter 3, I introduced the idea of a binary operation. Matrix additionand multiplication both have two inputs just as addition and multiplicationof real numbers, but there is an added complication that not all pairs ofmatrices can be added and not all pairs of matrices can be multiplied. Despitethis difference, I shall nevertheless use the same terminology I introduced inChapter 3 but in this slightly different setting.

(MA1) (A + B) + C = A + (B + C). This is the associative law for matrixaddition.

(MA2) A+ O = A = O + A. The zero matrix O, the same size as A, is theadditive identity for matrices the same size as A.

(MA3) A + (−A) = O = (−A) + A. The matrix −A is the unique additiveinverse of A.

(MA4) A+B = B + A. Matrix addition is commutative.

Thus matrix addition has the same properties as the addition of real num-bers, apart from the fact that the sum of two matrices is only defined whenthey have the same size. The role of zero is played by the zero matrix O ofthe appropriate size.

Example 7.2.1. Calculate

2A− 3B + 6I

where

A =

(1 23 4

)and B =

(0 12 1

)

7.2. MATRIX ALGEBRA 187

Because we are dealing with matrix addition and scalar multiplication therules we apply are the same as those in high-school algebra. We get(

8 10 11

)

7.2.2 Properties of matrix multiplication

(MM1) (AB)C = A(BC). This is the associative law for matrix multiplica-tion.

(MM2) Let A be an m× n matrix. Then ImA = A = AIn. The matrices Imand In are the left and right multiplicative identities, respectively.

(MM3) A(B + C) = AB + AC and (B + C)A = BA + CA. These are theleft and right distributivity laws for matrix multiplication over matrixaddition.

Thus matrix multiplication has the same properties as the multiplicationof real numbers, apart from the fact that the product is not always defined,except the following three major differences.

1. Matrix multiplication is not commutative.Consider the matrices

A =

(1 23 4

)and B =

(1 1−1 1

)Then AB 6= BA. One consequence of the fact that matrix multiplication isnot commutative is that

(A+B)2 6= A2 + 2AB +B2,

in general (see below).

2. The product of two matrices can be a zero matrix without eithermatrix being a zero matrix.Consider the matrices

A =

(1 22 4

)and B =

(−2 −6

1 3

)


Then AB = O.

3. Cancellation of matrices is not allowed.Consider the matrices

A =

(0 20 1

)and B =

(2 31 4

)and C =

(−1 1

1 4

)Then A 6= O and AB = AC but B 6= C.

Just how different matrix algebra is from high-school algebra is shown bythe following example.

Example 7.2.2. Suppose that X2 = I. Then X2 − I = O and so we mayfactorize to get (X − I)(X + I) = O. But we cannot conclude from thisthat X = I or X = −I because we cannot conclude from the fact that theproduct of two matrices is a zero matrix then one of the matrices must itselfby a zero matrix. We have seen that this is false. We therefore cannot deducethat the identity matrix has two square roots. In fact, it has infinitely manyas we now show. Let

A =

(a bc −a

)and suppose that a2 + bc = 1. Check that A2 = I. Examples of matricessatisfying these conditions are( √

1 + n2 −nn −

√1 + n2

)where n is any positive integer. Thus the 2× 2 identity matrix has infinitelymany square roots.

7.2.3 Properties of scalar multiplication

(S1) 1A = A.

(S2) λ(A+B) = λA+ λB

(S3) (λµ)A = λ(µA).

(S4) (λ+ µ)A = λA+ µA.

(S5) (λA)B = A(λB) = λ(AB).


7.2.4 Properties of the transpose

(T1) (AT )T = A.

(T2) (A+B)T = AT +BT .

(T3) (αA)T = αAT .

(T4) (AB)T = BTAT .

It is important to observe that the transpose of a product reverses theorder of the matrices.

There are some important consequences of the above properties:

• Because matrix addition is associative we can write sums without brack-ets.

• Because matrix multiplication is associative we can write matrix prod-ucts without brackets.

• The left and right distributivity laws can be extended to arbitrary finitesums.

7.2.5 Some proofs

In this section, I shall prove that the algebraic properties of matrices statedreally do hold. I shan’t prove all of them: just a representative sample. I shallleave you the pleasure of proving the rest. It is important to observe that allthe properties of matrix algebra are ultimately proved using the propertiesof real numbers.

Let A be an m× n matrix whose entry in the ith row and jth column isaij. Let B be an n× p matrix whose entry in the jth row and kth column isbjk. By definition (AB)ik is the number equal to the product of the ith rowof A times the kth column of B. This is just

(AB)ik =n∑j=1

aijbjk.

Theorem 7.2.3.

1. (A+B) + C = A+ (B + C).


2. A(BC) = (AB)C.

3. (λ+ µ)A = λA+ µA.

Proof. (1) To show that (A + B) + C = A + (B + C) we have to prove twothings. First, the size of (A+B) +C is the same as the size of A+ (B+C).Second, elements of (A+B) +C and A+ (B+C) in corresponding positionsare equal. To add A and B they have to be the same size and the resultwill be the same size as both of them. Thus C is the same size as A and B.It’s clear that both sides of the equation really are the same size. We nowcompare corresponding elements:

((A+B) + C)ij = (A+B)ij + (C)ij = ((A)ij + (B)ij) + (C)ij.

But now we use the associativity of addition of real numbers to get

((A)ij+(B)ij)+(C)ij = (A)ij+((B)ij+(C)ij) = (A)ij+(B+C)ij = (A+(B+C))ij,

as required.(2) Let A be an m× n matrix with entries aij, let B be an n× p matrix

with entries bjk, and let C be a p × q matrix with entries ckl. It’s evidentthat A(BC) and (AB)C have the same size, so it remains to show thatcorresponding elements are the same. We shall prove that

(A(BC))il = ((AB)C)il.

By definition

(A(BC))il =n∑t=1

ait(BC)tl,

and

(BC)tl =

p∑s=1

btscsl.

Thus

(A(BC))il =n∑t=1

ait

(p∑s=1

btscsl

).

Using distributivity of multiplication over addition for real numbers this sumis just

n∑t=1

p∑s=1

aitbtscsl.


Now change the order in which we add up these real numbers to get

p∑s=1

n∑t=1

aitbtscsl.

Now use distributivity again

p∑s=1

(n∑t=1

aitbts

)csl.

The sum within the brackets is just

(AB)is

and so the whole sum isp∑s=1

(AB)iscsl

which is precisely

((AB)C)il.

(3) Clearly (λ + µ)A and λA + µA have the same sizes. We show thatcorresponding elements are the same:

((λ+ µ)A)ij = (λ+ µ)(A)ij = λ(A)ij + µ(A)ij = (λA)ij + (µA)ij

which is just (λA+ µA)ij, as required.

We now prove the properties satisfied by the transpose.

Theorem 7.2.4.

1. (AT )T = A.

2. (A+B)T = AT +BT .

3. (αA)T = αAT .

4. (AB)T = BTAT .


Proof. (1) We have that

((AT )T )ij = (AT )ji = (A)ij.

(2) We have that

((A+B)T )ij = (A+B)ji = (A)ji + (B)ji = (AT )ij + (BT )ij

which is just(AT +BT )ij.

(3) We have that

((αA)T )ij = (αA)ji = α(A)ji = α(AT )ji = (αAT )ij.

(4) Let A be an m×n matrix and B an n×p matrix. Thus AB is definedand is m × p. Hence (AB)T is p ×m. Now BT is p × n and AT is n ×m.Thus BTAT is defined and is p×m. Hence (AB)T and BTAT have the samesize. We now show that corresponding elements are equal. By definition

((AB)T )ij = (AB)ji.

This is equal ton∑s=1

(A)js(B)si =n∑s=1

(AT )sj(BT )is.

But real numbers commute under multiplication and son∑s=1

(AT )sj(BT )is =

n∑s=1

(BT )is(AT )sj = (BTAT )ij,

as required.

Quantum Mechanics

Quantum mechanics is one of the fundamental theories of physics. Atits heart are matrices. We have defined the transpose of a matrix but formatrices with complex entries there is another, related, operation. Givenany complex matrix A we define the matrix A† to be the one obtained bytransposing A and then taking the complex conjugate of all entries. It istherefore the conjugate-transpose of A. A matrix A is called Hermitianif A† = A. Observe that a real matrix is Hermitian precisely when it issymmetric. It turns out that quantum mechanics is based on Hermitianmatrices and their generalizations. The fact that matrix multiplication


is not commutative is one of the reasons that quantum mechanics is sodifferent from classical mechanics. The theory of quantum computingmakes heavy use of Hermitian matrices and their properties.

Exercises 7.2

1. Calculate (2 07 −1

)+

(1 11 0

)+

(0 11 1

)+

(2 23 3

)2. Calculate 1

23

( 3 2 1) 1−1−4

( 3 1 5)

3. Calculate (x y

)( 5 44 4

)(xy

)4. If

A =

(1 −11 2

)calculate A2, A3 and A4.

5. Let A =

(1 11 0

)and x =

(10

). Calculate Ax, A2x, A3x, A4x and

A5x. What do you notice?

6. Calculate A2 where

A =

(cos θ sin θ− sin θ cos θ

)7. Show that

A =

(1 23 4

)satisfies A2 − 5A− 2I = O.


8. Let A be the following 3× 3 matrix 2 4 40 1 −10 1 3

Calculate

A3 − 6A2 + 12A− 8I

where I is the 3× 3 identity matrix.

9. Let A =

3 1 −12 2 −12 2 0

Calculate

A3 − 5A2 + 8A− 4I

where I is the 3× 3 identity matrix.

10. If 3X + A = B, find X in terms of A and B.

11. If X + Y =

(1 12 2

)and X − Y =

(2 21 1

)find X and Y .

12. If AB = BA show that A2B = BA2.

13. Is it true that AABB = ABAB?

14. Show that(A+B)2 − (A−B)2 = 2(AB +BA).

15. Let A and B be n× n matrices. Is it necessarily true that

(A−B)(A+B) = A2 −B2?

If so, prove it. If not, find a counterexample.

16. Expand (A+ I)4 carefully.

17. A matrix A is said to be symmetric if AT = A.

(a) Show that a symmetric matrix must be square.

(b) Show that if A is any matrix then AAT is defined and symmetric.

7.3. SOLVING SYSTEMS OF LINEAR EQUATIONS 195

(c) Let A and B be symmetric matrices of the same size. Prove thatAB is symmetric if and only if AB = BA.

18. An n× n-matrix A is said to be skew-symmetric if AT = − A.

(a) Show that the diagonal entries of a skew-symmetric matrix are allzero.

(b) If B is any n × n-matrix, show that B + BT is symmetric andthat B − BT is skew-symmetric.

(c) Deduce that every square matrix can be expressed as the sum ofa symmetric matrix and a skew-symmetric matrix.

19. Let A, B and C be square matrices of the same size. Define [A,B] =AB −BA. Calculate

[[A,B], C] + [[B,C], A] + [[C,A], B].

20. Let A be a 2× 2 matrix such that AB = BA for all 2× 2 matrices B.Show that

A =

(λ 00 λ

)for some scalar λ.

21. Let A be a 2× 2 matrix. The trace of A, denoted tr(A), is the sum ofthe diagonal elements.

(a) Show that tr(A+B) = tr(A) + tr(B); tr(λA) = λtr(A); tr(AB) =tr(BA).

(b) Let A be a known matrix. Show that the equation AX −XA = Icannot be solved for X.

7.3 Solving systems of linear equations

The goal of this section is to use matrices to help us solve systems of linearequations. We begin by proving some general results on linear equations,and then we describe Gaussian elimination, an algorithm for solving systemsof linear equations.


7.3.1 Some theory

A system of m linear equations in n unknowns is a list of equations of thefollowing form

a11x1 + a12x2 + . . .+ a1nxn = b1

a21x1 + a22x2 + . . .+ a2nxn = b2

· · ·am1x1 + am2x2 + . . .+ amnxn = bm

A solution is a sequence of values of x1, . . . , xn that satisfy all the equa-tions. The set of all solutions is called the solution set or general solution.

The equations above can be conveniently represented using matrices. LetA be the m×n matrix (A)ij = aij, let b be the m× 1 matrix (b)i1 = bi, andlet x be the n × 1 matrix (x)j1 = xj. Then the system of linear equationsabove can be written in the form

Ax = b

If b is a zero matrix, we say that the equations are homogeneous, otherwisethey are said to be inhomogeneous.

A system of linear equations that has no solution is said to be inconsistent;otherwise, it is said to be consistent.

We begin with some results that tell us what to expect when solvingsystems of linear equations.

Proposition 7.3.1. Homogeneous equations Ax = 0 are always consistent,because x = 0 is always a solution. In addition, the sum of any two solutionsis again a solution, and the scalar multiple of any solution is again a solution.

Proof. Let Ax = 0 be our homogeneous system of equations. Let a and b besolutions. That is Aa = 0 and Ab = 0. We now calculate A(a + b). To dothis we use the fact that matrix multiplication satisfies the left distributivitylaw

A(a + b) = Aa + Ab = 0 + 0 = 0.

Now let a be a solution and λ any scalar. Then

A(λa) = λAa = λ0 = 0.


Proposition 7.3.2. Let

Ax = b

be a consistent system of linear equations. Let p be any one solution. Thenevery solution of the equation is of the form p + h for some solution h ofAx = 0.

Proof. Let a be any solution to Ax = b. Let h = a−p. Then Ah = 0. Theresult now follows.

Theorem 7.3.3 (Fundamental theorem of linear equations). We assumethat the scalars are the rationals, the reals or the complexes. A system oflinear equations Ax = b has either

• No solutions.

• Exactly one solution.

• Infinitely many solutions.

Proof. We prove that if we can find two different solutions we can in factfind infinitely many solutions. Let u and v be two distinct solutions tothis equation then Au = b and Av = b. Consider now the column matrixw = u− v. Then

Aw = A(u− v) = Au− Av = 0

using the distributive law. Thus w is a non-zero column matrix that satisfiesthe equation Ax = 0. Consider now the column matrices of the form

u + λw

where λ is any real number. This is therefore a set of infinitely many differentcolumn matrices. We calculate

A(u + λw) = Au + λAw = b

using the distributive law and properties of scalars. It follows that the in-finitely many column matrices u + λw are solutions to the equation Ax =b.


7.3.2 Gaussian elimination

In this section, we shall develop an algorithm that will take as input a systemof linear equations and produce as output the following: if the system hasno solutions it will tell us, on the other hand if it has solutions then it willdetermine them all. Our method is based on three simple ideas:

1. Certain systems of linear equations have a shape that makes them veryeasy to solve.

2. Certain operations can be carried out on systems of linear equationswhich simplify them but do not change the solutions.

3. Everything can be done using matrices.

Here are examples of each of these ideas.

Example 7.3.4. The system of equations

2x+ 3y = 1

y = −3

is very easy to solve. From the second equation we get y = −3. Substitutingthis value into the first equation gives us x = 5. We can check that thissolution is correct by checking that these two values satisfy every equation.


2x+ 3y = 1

x+ y = 2

can be converted into a system with the same solutions but which is easierto solve. Multiply the second equation by 2. This gives us the new equations

2x+ 3y = 1

2x+ 2y = 4

which have the same solutions as the original equations. Next, subtract thefirst equation from the second equation to get

2x+ 3y = 1

−y = 3


Finally, multiply the last equation by −1. The resulting equations have thesame solutions as the original equations, but they can now be easily solvedas we showed above.


2x+ 3y = 1

x+ y = 2

can be written in matrix form as the matrix equation(2 31 1

)(xy

)=

(12

)For the purposes of our algorithm, we rewrite this equation in terms of whatis called an augmented matrix (

2 3 11 1 2

)The operations carried out in the previous example can be applied directlyto the augmented matrix.(

2 3 11 1 2

)=⇒

(2 3 12 2 4

)=⇒

(2 3 10 −1 3

)=⇒

(2 3 10 1 −3

)This augmented matrix can then be converted back into the usual matrixform and solved

2x+ 3y = 1

y = −3

We now formalize the above ideas.A matrix is called a row echelon matrix or to be in row echelon form if it

satisfies the following three conditions:

1. Any zero rows are at the bottom of the matrix.

2. If there are non-zero rows then they begin with the number 1, calledthe leading 1.

3. In the column beneath a leading 1, the elements are all zero.


The following operations on a matrix are called elementary row operations:

1. Multiply row i by a non-zero scalar λ. We notate this operation byRi ← λRi.

2. Interchange rows i and j. We notate this operation by Ri ↔ Rj.

3. Add a multiple λ of row i to another row j. We notate this operationby Rj ← Rj + λRi.

The following result is not hard to prove.

Proposition 7.3.7. Applying the elementary row operations to a system oflinear equations does not change their solution set.

Given a system of linear equations

Ax = b

the matrix(A|b)

is called the augmented matrix.

Algorithm 7.3.8. (Gaussian elimination) This is an algorithm for solvingsystems of linear equations. In outline, the algorithm runs as follows:

1. Given a system of equations

Ax = b

form the augmented matrix

(A|b).

2. By using elementary row operations, convert

(A|b)

into an augmented matrix(A′|b′)

which is a row echelon matrix.


3. Solve the equations obtained from

(A′|b′)

by back substitution.

Remarks

• The process in step (2) has to be carried out systematically to avoidgoing around in circles.

• Elementary row operations applied to a set of linear equations do notchange the solution set. Thus the solution sets of

Ax = b and A′x = b′

are the same.

• Solving systems of linear equations where the associated augmentedmatrix is a row echelon matrix is easy and can be accomplished byback substitution.

Here is a more detailed description of step (2) of the algorithm — theinput is a matrix B and the output is a matrix B′ which is a row echelonmatrix:

1. Locate the leftmost column that does not consist entirely of zeros.

2. Interchange the top row with another row if necessary to bring a non-zero entry to the top of the column found in step 1.

3. If the entry now at the top of the column found in step 1 is a, thenmultiply the first row by 1

ain order to introduce a leading 1.

4. Add suitable multiples of the top row to the rows below so that allentries below the leading 1 become zeros.

5. Now cover up the top row, and begin again with step 1 applied to thematrix that remains. Continue in this way until the entire matrix is arow echelon matrix.


The important thing to remember is to start at the top and work down-wards.

We now look in more detail at the final part of the overall algorithm whereour set of equations A′x = b′ is derived from an augmented matrix whichis a row echelon matrix. Assume that there is more than one solution. Thevariables are divided into two groups: those variables corresponding to thecolumns of A′ containing leading 1’s, called leading variables, and the rest,called free variables. We solve for the leading variables in terms of the freevariables; the free variables can be assigned arbitrary values independentlyof each other.

Examples 7.3.9.

1. We shall show that the following system of equations is inconsistent.

x+ 2y − 3z = −1

3x− y + 2z = 7

5x+ 3y − 4z = 2

The first step is to write down the augmented matrix of the system. Inthis case, this is the matrix 1 2 −3 −1

3 −1 2 75 3 −4 2

Carry out the elementary row operations R2 ← R2 − 3R1 and R3 ←R3 − 5R1. This gives us 1 2 −3 −1

0 −7 11 100 −7 11 7

Now carry out the elementary row operation R3 ← R3 − R2 whichyields 1 2 −3 −1

0 −7 11 100 0 0 −3

The equation corresponding to the last line of the augmented matrixis 0x+ 0y + 0z = −3. Clearly, this equation has no solutions becauseit is zero on the left and non-zero on the right. It follows thatthe original set of equations has no solutions.


2. We shall show that the following system of equations has exactly onesolution, and we shall also check it.

x+ 2y + 3z = 4

2x+ 2y + 4z = 0

3x+ 4y + 5z = 2

We first write down the augmented matrix 1 2 3 42 2 4 03 4 5 2

We then carry out the elementary row operations R2 ← R2 − 2R1 andR3 ← R3 − 3R1 to get 1 2 3 4

0 −2 −2 −80 −2 −4 −10

Then carry out the elementary row operations R2 ← −1

2R2 and R3 ←

−12R3 that yield 1 2 3 4

0 1 1 40 1 2 5

Finally, carry out the elementary row operation R3 ← R3 −R2 1 2 3 4

0 1 1 40 0 1 1

This is now a row echelon matrix. Write down the corresponding setof equations

x+ 2y + 3z = 4

y + z = 4

z = 1


Now solve by back substitution to get x = −5, y = 3 and z = 1.Finally, we check that 1 2 3

2 2 43 4 5

−531

=

402

3. We shall show that the following system of equations has infinitely

many solutions, and we shall check them.

x+ 2y − 3z = 6

2x− y + 4z = 2

4x+ 3y − 2z = 14

The augmented matrix for this system is 1 2 −3 62 −1 4 24 3 −2 14

We transform this matrix into an echelon matrix by means of the fol-lowing elementary row operations R2 ← R2 − 2R1, R3 ← R3 − 4R1,R2 ← −1

5R2, R3 ← −1

5R3 and R3 ← R3 −R2. This yields 1 2 −3 6

0 1 −2 20 0 0 0

Because the bottom row consists entirely of zeros, this means that wehave only two equations

x+ 2y − 3z = 6

y − 2z = 2

By back substitution, both x and y can be expressed in terms of z, andz may take any value we like. We say that z is a free variable. Letz = λ ∈ R. Then the set of solutions can be written in the form x

yz

=

220

+ λ

−121


We now check that these solutions work 1 2 −32 −1 44 3 −2

2− λ2 + 2λ

λ

=

62

14

as required.

Exercises 7.3

1. In each case, determine whether the system of equations is consistentor not. When consistent, find all solutions and show that they work.

(a)

2x+ y − z = 1

3x+ 3y − z = 2

2x+ 4y + 0z = 2

(b)

2x+ y − z = 1

3x+ 3y − z = 2

2x+ 4y + 0z = 3

(c)

2x+ y − 2z = 10

3x+ 2y + 2z = 1

5x+ 4y + 3z = 4

(d)

x+ y + z + w = 0

4x+ 5y + 3z + 3w = 1

2x+ 3y + z + w = 1

5x+ 7y + 3z + 3w = 2


7.4 Blankinship’s algorithm

The ideas of this chapter lead to an alternative, and better, procedure3 forcalculating the integers x and y such that gcd(a, b) = xa + yb. To explainhow it works, let’s go back to the basic step of Euclid’s algorithm. If a ≥ bthen we divide b into a and write

a = bq + r

where 0 ≤ r ≤ b. The key point is that gcd(a, b) = gcd(b, r). We shall nowthink of (a, b) and (b, r) as column matrices(

ab

),

(rb

).

We want the 2× 2 matrix that maps(ab

)to

(rb

).

This is the matrix (1 −q0 1

).

Thus (1 −q0 1

)(ab

)=

(rb

).

Finally, we can describe the process by the following matrix operation(1 0 a0 1 b

)→(

1 −q r0 1 b

)by carrying out an elementary row operation. This procedure can be iterated.It will terminate when one of the entries in the righthand column is 0. Thenon-zero entry will then be the greatest common divisor of a and b and thematrix on the lefthand side will tell you how to get to 0, gcd(a, b) from a, band so will provide the information that the Euclidean algorithm provides.All of this is best illustrated by means of an example.

3It was described by W. A. Blankinship in his paper ‘A new version of the Euclideanalgorithm’ American Mathematical Monthly 70 (1963), 742–745.

7.4. BLANKINSHIP’S ALGORITHM 207

Let’s calculate x, y such that gcd(2520, 154) = x2520 + y154. We startwith the matrix (

1 0 25200 1 154

)If we divide 154 into 2520 it goes 16 times plus a remainder. Thus we subtract16 times the second row from the first to get(

1 −16 560 1 154

)We now repeat the process but, since the larger number, 154, is on thebottom, we have to subtract some multiple of the first row from the second.This time we subtract twice the first row from the second to get(

1 −16 56−2 33 42

)Now repeat this procedure to get(

3 −49 14−2 33 42

)And again (

3 −49 14−11 180 0

)The process now terminates because we have a zero in the rightmost column.The non-zero entry in the rightmost column is gcd(2520, 154). We also knowthat (

3 −49−11 180

)(2520154

)=

(140

).

Now this matrix equation corresponds to two equations. It is the one corre-sponding to the non-zero value that says

14 = 3× 2520− 49× 154

which is both true and solves the extended Euclidean problem.

Chapter 8

Matrices II: inverses

We have learnt how to add subtract and multiply matrices but we have notdefined division. The reason is that in general it cannot always be defined.In this chapter, we shall explore when it can be. All matrices will be squareand I will always denote an identity matrix of the appropriate size.

8.1 What is an inverse?

The simplest kind of linear equation is ax = b where a and b are scalars. Ifa 6= 0 we can solve this by multiplying by a−1 on both sides to get a−1(ax) =a−1b. We now use associativity to get (a−1a)x = a−1b. Finally, a−1a = 1 andso 1x = a−1b and this gives x = a−1b. The number a−1 is the multiplicativeinverse of the non-zero number a.

We now try to copy this approach for the matrix equation

Ax = b.

We suppose that there is a matrix B such that BA = I.

• Multiply on the left both sides of our equation Ax = b to get B(Ax) =Bb. Because order matters when you multiply matrices, which sideyou multiply on also matters.

• Use associativity of matrix mulitiplication to get (BA)x = Bb.

• Now use our assumption that BA = I to get Ix = Bb.

• Finally, we use the properties of the identity matrix to get x = Bb.

209

210 CHAPTER 8. MATRICES II: INVERSES

We appear to have solved our equation, but we need to check it. We calculateA(Bb). By associativity this is (AB)b. At this point we also have to assumethat AB = I. This gives Ib = b, as required. We conclude that in order tocopy the method for solving a linear equation in one unknown, our coefficientmatrix A must have the property that there is a matrix B such that

AB = I = BA.

We take this as the basis of the following definition.A matrix A is said to be invertible if we can find a matrix B such that

AB = I = BA. The matrix B we call it an inverse of A, and we say thatthe matrix A is invertible. Observe that A has to be square. A matrix thatis not invertible is said to be singular.

Example 8.1.1. A real number r regarded as a 1× 1 matrix is invertible ifand only if it is non-zero, in which case an inverse is its reciprocal.

It’s clear that if A is a zero matrix, then it can’t be invertible just as inthe case of real numbers. However, the next example shows that even if A isnot a zero matrix, then it need not be invertible.

Example 8.1.2. Let A be the matrix(1 10 0

)We shall show that there is no matrix B such that AB = I = BA. Let B bethe matrix (

a bc d

)From BA = I we get

a = 1 and a = 0.

It’s impossible to meet both these conditions at the same time and so Bdoesn’t exist.

On the other hand here is an example of a matrix that is invertible.

Example 8.1.3. Let

A =

1 2 30 1 40 0 1

8.1. WHAT IS AN INVERSE? 211

and

B =

1 −2 50 1 −40 0 1

Check that AB = I = BA. We deduce that A is invertible with inverse B.

As always, in passing from numbers to matrices things become morecomplicated. Before going any further, I need to clarify one point which willat least make our lives a little simpler.

Lemma 8.1.4. Let A be invertible and suppose that B and C are matricessuch that

AB = I = BA and AC = I = CA.

Then B = C.

Proof. Multiply AB = I both sides on the left by C. Then C(AB) = CI.Now CI = C, because I is the identity matrix, and C(AB) = (CA)B sincematrix multiplication is associative. But CA = I thus (CA)B = IB = B. Itfollows that C = B.

The above result tells us that if a matrix A is invertible then there is onlyone matrix B such that AB = I = BA. We call the matrix B the inverse ofA. It is usually denoted by A−1. It is important to remember that we canonly write A−1 if we know that A is invertible. In the following, we describesome important properties of the inverse of a matrix.

Lemma 8.1.5.

1. If A is invertible then A−1 is invertible and its inverse is A.

2. If A and B are both invertible and AB is defined then AB is invertiblewith inverse B−1A−1.

3. If A1, . . . , An are all invertible and A1 . . . An is defined then A1 . . . Anis invertible and its inverse is A−1

n . . . A−11 .

Proof. (1) This is immediate from the equations A−1A = I = AA−1.(2) Show that

AB(B−1A−1) = I = (B−1A−1)AB.

(3) This follows from (2) above and induction.


We shall deal with the practical computation of inverse later. Let meconclude this section by returning to my original motivation for introducingan inverse.

Theorem 8.1.6 (Matrix inverse method). A system of linear equations

Ax = b

in which A is invertible has unique solution

x = A−1b.

Proof. Observe that

A(A−1b) = (AA−1)b = Ib = b.

Thus A−1b is a solution. It is unique because if x′ is any solution then

Ax′ = b

givingA−1(Ax′) = A−1b

and sox′ = A−1b.

Example 8.1.7. We shall solve the following system of equations using thematrix inverse method

x+ 2y = 1

3x+ y = 2

Write the equations in matrix form.(1 23 1

)(xy

)=

(12

)Determine the inverse of the coefficient matrix. In this case, you can checkthat this is the following

A−1 =−1

5

(1 −2−3 1

)

8.2. DETERMINANTS 213

Now we may solve the equations. From Ax = b we get that x = A−1b. Thusin this case

x =−1

5

(1 −2−3 1

)(12

)=

(3515

)Thus x = 3

5and y = 1

5. Finally, it is always a good idea to check the solutions.

There are two (equivalent) ways of doing this. The first is to check by directsubstitution

x+ 2y =3

5+ 2 · 1

5= 1

and

3x+ y = 3 · 3

5+

1

5= 2

Alternatively, you can check by matrix mutiplication(1 23 1

)(3515

)which gives (

12

)You can see that both calculations are, in fact, identical.

8.2 Determinants

The obvious questions that arise from the previous section are how do wedecide whether a matrix is invertible or not and, if it is invertible, how dowe compute its inverse? The material in this section is key to answeringboth of these questions. I shall define a number, called the determinant, thatcan be calculated from any square matrix. Unfortunately, the definition iscompletely unmotivated but it will justify itself by being useful.

Let A be a square matrix. We denote its determinant by det(A) or byreplacing the round brackets of the matrix A with straight brackets. It isdefined inductively: this means that I define an n× n determinant in termsof (n− 1)× (n− 1) determinants.

• The determinant of the 1× 1 matrix(a)

is a.


• The determinant of the 2× 2 matrix

A =

(a bc d

)denoted ∣∣∣∣ a b

c d

∣∣∣∣is the number ad− bc.

• The determinant of the 3× 3 matrix a b cd e fg h i

denoted ∣∣∣∣∣∣

a b cd e fg h i

∣∣∣∣∣∣is the number

a

∣∣∣∣ e fh i

∣∣∣∣− b ∣∣∣∣ d fg i

∣∣∣∣+ c

∣∣∣∣ d eg h

∣∣∣∣We could in fact define the determinant of any square matrix of whatever

size in much the same way. However, we shall limit ourselves to calculatingthe determinants of 3× 3 matrices at most. It’s important to pay attentionto the signs in the definition. You multiply alternately by plus one and minusone

+ − + − . . .

Examples 8.2.1.

1. ∣∣∣∣ 2 34 5

∣∣∣∣ = 2× 5− 3× 4 = −2.

2. ∣∣∣∣∣∣2 1 01 0 20 1 1

∣∣∣∣∣∣ = 2

∣∣∣∣ 0 21 1

∣∣∣∣− 1

∣∣∣∣ 1 20 1

∣∣∣∣ = −5

8.2. DETERMINANTS 215

3. ∣∣∣∣∣∣1 2 13 1 02 0 1

∣∣∣∣∣∣ = 1

∣∣∣∣ 1 00 1

∣∣∣∣− 2

∣∣∣∣ 3 02 1

∣∣∣∣+

∣∣∣∣ 3 12 0

∣∣∣∣ = −7

Determinants have many interesting properties, but as far as their con-nection with inverses is concerned, the following is the most important.

Theorem 8.2.2. Let A and B be square matrices having the same size. Then

det(AB) = det(A) det(B).

Proof. The result is true in general, but I shall only prove it for 2×2 matrices.Let

A =

(a bc d

)and B =

(e fg h

)We prove directly that det(AB) = det(A) det(B). First

AB =

(ae+ bg af + bhce+ dg cf + dh

)Thus

det(AB) = (ae+ bg)(cf + dh)− (af + bh)(ce+ dg).

The first bracket multiplies out as

acef + adeh+ bcgf + bdgh

and the second asacef + adfg + bceh+ bdgh.

Subtracting these two expressions we get

adeh+ bcgf − adfg − bceh.

Now we calculate det(A) det(B). This is just

(ad− bc)(eh− fg)

which multiplies out to give

adeh+ bcfg − adfg − bceh.

Thus the two sides are equal, and we have proved the result.


I shall mention one other property of determinants that we shall needwhen we come to study vectors and which will be useful in developing thetheory of inverses. It can be proved in the 2 × 2 and 3 × 3 cases by directverification.

Theorem 8.2.3. Let A be a square matrix and let B be obtained from A byinterchanging any two columns. Then det(B) = − det(A).

There is a very important consequence of the above result.

Proposition 8.2.4. If two columns of a determinant are equal then the de-terminant is zero.

Proof. Let A be a matrix with two columns equal. Then if we swap thosetwo columns the matrix remains unchanged. Thus by Theorem 8.2.3, wehave that detA = − detA. It follows that detA = 0.

Exercises 8.2

1. Compute the following determinants.

(a) ∣∣∣∣ 1 −12 3

∣∣∣∣(b) ∣∣∣∣ 3 2

6 4

∣∣∣∣(c) ∣∣∣∣∣∣

1 −1 12 3 40 0 1

∣∣∣∣∣∣(d) ∣∣∣∣∣∣

1 2 00 1 12 3 1

∣∣∣∣∣∣

8.3. WHEN IS A MATRIX INVERTIBLE? 217

(e) ∣∣∣∣∣∣2 2 21 0 5

100 200 300

∣∣∣∣∣∣(f) ∣∣∣∣∣∣

1 3 5102 303 5041000 3005 4999

∣∣∣∣∣∣(g) ∣∣∣∣∣∣

1 1 22 1 11 2 1

∣∣∣∣∣∣(h) ∣∣∣∣∣∣

15 16 1718 19 2021 22 23

∣∣∣∣∣∣2. Solve

∣∣∣∣ 1− x 42 3− x

∣∣∣∣ = 0.

3. Calculate ∣∣∣∣∣∣x cosx sinx1 − sinx cosx0 − cosx − sinx

∣∣∣∣∣∣4. Prove that ∣∣∣∣ a b

c d

∣∣∣∣ = 0

if, and only if, one column is a scalar multiple of the other. Hint:consider two cases: ad = bc 6= 0 and ad = bc = 0 for this case you willneed to consider various possibilities.

8.3 When is a matrix invertible?

Recall from Theorem 8.2.2 that

det(AB) = det(A) det(B).


I use this property below to get a necessary condition for a matrix to beinvertible.

Lemma 8.3.1. If A is invertible then det(A) 6= 0.

Proof. By assumption, there is a matrix B such that AB = I. Take deter-minants of both side of the equation

AB = I

to getdet(AB) = det(I).

By the key property of determinants recalled above

det(AB) = det(A) det(B)

and sodet(A) det(B) = det(I).

But det(I) = 1 and sodet(A) det(B) = 1.

In particular, det(A) 6= 0.

Are there any other properties that a matrix must satisfy in order to havean inverse? The answer is, surprisingly, no. We shall prove that a squarematrix A is invertible if, and only if, detA 6= 0. I shall only be able tosketch out the proof of this theorem below. The practical issue of actuallycomputing inverses is dealt with in the next section.

To motivate things, we start with a 2 × 2 matrix A where we can proveeverything. Let

A =

(a bc d

)We construct a new matrix as follows. Replace each entry aij of A by theelement you get when you cross out the ith row and jth column. Thus weget (

d cb a

)We now use the following matrix of signs(

+ −− +

)


to get (d −c−b a

)We now take the transpose of this matrix to get the matrix we call theadjugate of A

adj(A) =

(d −b−c a

)The defining characteristic of the adjugate is that

Aadj(A) = det(A)I = adj(A)A

which can easily be checked. We deduce from the defining characteristic ofthe adjugate that if det(A) 6= 0 then

A−1 =1

det(A)

(d −b−c a

)We have therefore proved the following.

Proposition 8.3.2. A 2×2 matrix is invertible if and only if its determinantis non-zero.

Example 8.3.3. Let A =

(1 23 1

). Determine if A is invertible and, if it

is, find its inverse, and check the answer. We calculate det(A) = −5. This isnon-zero, and so A is invertible. We now form the adjugate of A:

adj(A) =

(1 −2−3 1

)Thus the inverse of A is

A−1 = −1

5

(1 −2−3 1

)We now check that AA−1 = I (to make sure that we haven’t made anymistakes).

We now consider the general case. Here I will simply sketch out theargument. Let A be an n×n matrix with entries aij. We define its adjugateas the result of the following sequence of operations.


• Pick a particular row i and column j. If we cross out this row andcolumn we get an (n − 1) × (n − 1) matrix which I shall denote byM(A)ij. It is called a submatrix of the original matrix A.

• The determinant det(M(A)ij) is called the minor of the element aij.

• Finally, if we multiply det(M(A)ij) by the corresponding sign we getthe cofactor cij = (−1)i+j det(M(A)ij) of the element aij.

• If we replace each element aij by its cofactor, we get the matrix C(A)of cofactors of A.

• The transpose of the matrix of cofactors C(A), denoted adj(A), iscalled the adjugate1 matrix of A. Thus the adjugate is the transposeof the matrix of signed minors.

The crucial property of the adjugate is described in the next result.

Theorem 8.3.4. For any square matrix A, we have that

A(adj(A)) = det(A)I = (adj(A))A.

Proof. We have verified the above result in the case of 2×2 matrices. I shallnow prove it in the case of 3 × 3 matrices by means of an argument thatgeneralizes. Let A = (aij) and we write

B = adj(A) =

c11 c21 c31

c12 c22 c32

c13 c23 c33

We shall compute AB. We have that

(AB)11 = a11c11 + a12c12 + a13c13 = detA

by expanding the determinant along the top row. The next element is

(AB)12 = a11c21 + a12c22 + a13c23.

But this is the determinant of the matrix a11 a12 a13

a11 a12 a13

a31 a32 a33

1This odd word comes from Latin and means ‘yoked together’.


which, having two rows equal, must be zero by Proposition 8.2.4. This pat-tern now continues with all the off-diagonal entries being zero for similarreasons and the diagonal entries all being the determinant.

We may now prove the main theorem of this section.

Theorem 8.3.5. Let A be a square matrix. Then A is invertible if and onlyif det(A) 6= 0. When A is invertible, its inverse is given by

A−1 =1

det(A)adj(A).

Proof. Let A be invertible. By our lemma above, det(A) 6= 0 and so we canform the matrix

1

det(A)adj(A).

We now calculate

A1

det(A)adj(A) =

1

det(A)A adj(A) = I

by our theorem above. Thus A has the advertised inverse.

Conversely, suppose that det(A) 6= 0. Then again we can form the matrix

1

det(A)adj(A)

and verify that this is the inverse of A and so A is invertible.

Example 8.3.6. Let

A =

1 2 32 0 1−1 1 2

We show that A is invertible and calculate its inverse. First, det(A) = −5and so A is invertible. The matrix of minors is −1 5 2

1 5 32 −5 −4


The matrix of cofactors is −1 −5 2−1 5 −3

2 5 −4

The adjugate is the transpose of the matrix of cofactors −1 −1 2

−5 5 52 3 −4

Thus the inverse of A is the adjugate with each entry divided by the deter-minant of A

A−1 = −1

5

−1 −1 2−5 5 5

2 −3 −4

The Moore-Penrose Inverse

We have proved that a square matrix has an inverse if, and only if,it has a non-zero determinant. For rectangular matrices, the existenceof an inverse doesn’t even come up for discussion. However, in laterapplications of matrix theory it is very convenient if every matrix havean ‘inverse’. Let A be any matrix. We say that A+ is its Moore-Penroseinverse if the following conditions hold:

1. A = AA+A.

2. A+ = A+AA+.

3. (A+A)T = A+A.

4. (AA+)T = AA+.

It is not obvious, but every matrix A has a Moore-Penrose inverse A+

and, in fact, such an inverse is uniquely determined by the above fourconditions. In the case, where A is invertible in the vanilla-sense, itsMoore-Penrose inverse is just its inverse. But even singular matriceshave Moore-Penrose inverse. You can check that the matrix defined

8.4. COMPUTING INVERSES 223

below satisfies the four conditions above(1 23 6

)+

=

(0 · 002 0 · 0060 · 04 0 · 12

)The Moore-Penrose inverse can be used to find approximate solutuionsto systems of linear equations that might otherwise have no solution.

Exercises 8.3

1. Use the adjugate method to compute the inverses of the following ma-trices. In each case, check that your solution works.

(a)

(1 00 2

)(b)

(1 11 2

)

(c)

1 0 00 2 00 0 3

(d)

1 2 32 0 1−1 1 2

(e)

1 2 31 3 31 2 4

(f)

2 2 1−2 1 2

1 −2 2

8.4 Computing inverses

The practical way to compute inverses is to use elementary row operations.I shall first describe the method and then I shall prove that it works. Let Abe a square n×n matrix. We want to determine whether it is invertible and,


if it is, we want to calculate its inverse. We shall do this at the same timeand we shall not need to calculate a determinant. We write down a new kindof augmented matrix this time of the form B = (A | I) where I is the n× nidentity matrix. The first part of the algorithm is to carry out elementaryrow operations on B guided by A. Our goal is to convert A into a row echelonmatrix. This will have zeros below the leading diagonal. We are interestedin what entries lie on the leading diagonal. If one of them is zero we stop andsay that A is not invertible. If all of them are 1 then the algorithm continues.We now use the 1’s that lie on the leading diagonal to remove all elementabove each 1. Our original matrix B now has the following form (I | A′). Iclaim that A′ = A−1. I shall illustrate this method by means of an example.

Example 8.4.1. Let

A =

2 −2 42 3 2−1 1 1

We shall show that A is invertible and calculate its inverse. We first writedown the augmented matrix 2 −2 4 1 0 0

2 3 2 0 1 0−1 1 1 0 0 1

We now carry out a sequence of elementary row operations to get the follow-ing 1 −1 1 0 0 −1

0 1 0 0 15

25

0 0 1 12

0 1

The leading diagonal contains only 1’s and so our original matrix is invertible.We now use these 1’s to insert zeros above using elementary row operations. 1 0 0 −1

215−8

5

0 1 0 0 15

25

0 0 1 12

0 1

It follows that the inverse of A is

A−1 =

−12

15−8

5

0 15

25

12

0 1

8.4. COMPUTING INVERSES 225

At this point, it is always advisiable to check that A−1A = I in fact ratherthan just in theory.

We now need to explain why this method works. An n × n matrix E iscalled an elementary matrix if it is obtained from the n× n identity matrixby means of a single elementary row operation.

Example 8.4.2. Let’s find all the 2× 2 elementary matrices. The first oneis obtained by interchanging two rows and so is(

0 11 0

)Next we obtain two matrices by multiplying each row by a non-zero scalar λ(

λ 00 1

) (1 00 λ

)Finally, we obtain two matrices by adding a scalar multiple of one row toanother row (

1 λ0 1

) (1 0λ 1

)There are now two key results we shall need.

Lemma 8.4.3.

1. Let B be obtained from A by means of a single elementary row operationρ. Thus B = ρ(A). Let E = ρ(I). Then B = EA.

2. Each elementary row matrix is invertible.

Proof. (1) This has to be verified for each of the three types of elementary rowoperation. I shall deal with the third class of such operations: Rj ← Rj+λRi.Apply this elementary row operation to the n × n identity matrix I to getthe matrix E. This agrees with the identity matrix everywhere except thejth row. There it has a λ in the ith column and, of course, a 1 in the jthcolumn. We now calculate the effect of E on any suitable matrix A. ThenEA will be the same as A except in the jth row. This will consist of the jthrow of A to which λ times the ith row of A has been added.

(2) Let E be the elementary matrix that arises from the elementary rowoperation ρ. Thus E = ρ(I). Let ρ′ be the elementary row operation thatundoes the effect of ρ. Thus ρρ′ and ρ′ρ are both identity functions. LetE ′ = ρ′(I). Then E ′E = ρ′(E) = ρ′(ρ(I)) = I. Similarly, EE ′ = I. Itfollows that E is invertible with inverse E ′.


Example 8.4.4. We give an example of 2×2 elementary matrices. Considerthe elementary matrix

E =

(1 0λ 1

)which is obtained from the 2 × 2 identity matrix by carrying out the ele-mentary row operation R2 ← R2 + λR1. We now calculate the effect of thismatrix when we multiply it into the following matrix

A =

(a b cd e f

)and we get

EA =

(a b c

λa+ d λb+ e λc+ f

)But this matrix is what we would get if we applied the elementary rowoperation directly to the matrix A.

We may now prove that our the elementary row operation method forcalculating inverses which we described above really works.

Proposition 8.4.5. If (I | B) can be obtained from (A | I) by means ofelementary row operations then A is invertible with inverse B.

Proof. Let the elementary row operations that transform (A | I) to (I | B)be ρ1, . . . , ρn in this order. Thus

(ρn . . . ρ1)(A) = I and (ρn . . . ρ1)(I) = B.

Let Ei be the elementary matrix corresponding to the elementary row oper-ation ρi. Then

(En . . . E1)A = I and (En . . . E1)I = B.

Now the matrices Ei are invertible and so

A = (En . . . E1)−1 and B = En . . . E1.

Thus B is the inverse of A as claimed.

8.5. THE CAYLEY-HAMILTON THEOREM 227

Exercises 8.4

1. Use elementary row operations to compute the inverses of the followingmatrices. In each case, check that your solution works.

(a)

(1 00 2

)(b)

(1 11 2

)

(c)

1 0 00 2 00 0 3

(d)

1 2 32 0 1−1 1 2

(e)

1 2 31 3 31 2 4

(f)

2 2 1−2 1 2

1 −2 2

8.5 The Cayley-Hamilton theorem

The goal of this section is to prove a major theorem about square matrices. Itis true in general, although I shall only prove it for 2×2 matrices. It provides afirst indication of the importance of certain polynomials in studying matrices.

Let A be a square matrix. We can therefore form the product AA whichwe write as A2. When it comes to multiplying A by itself three times thereare apparently two possibilities: A(AA) and (AA)A. However, matrix multi-plication is associative and so these two products are equal. We write this asA3. In general An+1 = AAn = AnA. We define A0 = I, the identity matrixthe same size as A. The usual properties of exponents hold

AmAn = Am+n and (Am)n = Amn.


One important consequence is that powers of A commute so that

AmAn = AnAm.

We can form powers of matrices, multiply them by scalars and add themtogether. We can therefore form sums like

A3 + 3A2 + A+ 4I.

In other words, we can substitute A in the polynomial

x3 + 3x2 + x+ 4

remembering that 4 = 4x0 and so has to be replaced by 4I.

Example 8.5.1. Let f(x) = x2 + x+ 2 and let

A =

(1 11 0

)We calculate f(A). Remember that x2 +x+ 2 is really x2 +x+ 2x0. Replacex by A and so x0 is replaced by A0 which is I. We therefore get A2 +A+ 2Iand calculating gives (

5 22 3

)It is important to remember that when a square matrix A is substituted

into a polynomial, you must replace the constant term of the polynomial bythe constant term times the identity matrix. The identity matrix you usewill have the same size as A.

We now come to an important extension of what we mean by a root. Iff(x) is a polynomial and A is a square matrix, we say that A is a matrix rootof f(x) if f(A) is the zero matrix.

Let A be a square n× n matrix. Put

χA(x) = det(A− xI).

Observe that x is essentially a complex variable and so cannot be replacedby a matrix. Then χA(x) is a polynomial of degree n called the characteristicpolynomial of A. It is worth observing that when x = 0 we get that χA(0) =det(A), which is therefore the value of the constant term of the characteristicpolynomial.

8.5. THE CAYLEY-HAMILTON THEOREM 229

Theorem 8.5.2 (Cayley-Hamilton). Every square matrix is a root of itscharacteristic polynomial.

Proof. I shall only prove this theorem in the 2× 2 case, though it is true ingeneral. Let

A =

(a bc d

)Then from the definition the characteristic polynomial is∣∣∣∣ a− x b

c d− x

∣∣∣∣Thus

χA(x) = x2 − (a+ d)x+ (ad− bc).

We now calculate χA(A) which is just(a bc d

)2

− (a+ d)

(a bc d

)+

(ad− bc 0

0 ad− bc

)This simplifies to (

0 00 0

)which proves the theorem in this case. The general proof uses the adjugatematrix of A− xI.

There is one very nice application of this theorem.

Proposition 8.5.3. Let A be an invertible n × n matrix. Then the inverseof A can be written as a polynomial in A of degree n− 1.

Proof. We may write χA(x) = f(x) + det(A) where f(x) is a polynomialwith constant term zero. Thus f(x) = xg(x) for some polynomial g(x) ofdegree n− 1. By the Cayley-Hamilton theorem, 0 = Ag(A) + det(A)I ThusAg(A) = − det(A)I. Put B = − 1

det(A)g(A). Then AB = I. But A and B

commute since B is a polynomial in A. Thus BA = I. We have thereforeproved that A−1 = B.


8.6 Complex numbers via matrices

Consider all matrices that have the following shape(a −bb a

)where a and b are arbitrary real numbers. You should show first that the sum,difference and product of any two matrices having this shape is also a matrixof this shape. Rather remarkably matrix multiplication is commutative formatrices of this shape. Observe that the determinant of our matrix above isa2 +b2. It follows that every non-zero matrix of the above shape is invertible.The inverse of the above matrix in the non-zero case is

1

a2 + b2

(a b−b a

)and again has the same form. It follows that the set of all these matricessatisfies the axioms of high-school algebra. Define

1 =

(1 00 1

)and

i =

(0 −11 0

)We may therefore write our matrices in the form

a1 + bi.

Observe thati2 = −1.

It follows that our set of matrices can be regarded as the complex numbersin disguise.

Chapter 9

Vectors

Euclid’s book codified what was known about geometry into a handful ofaxioms and then showed that all of geometry could be deduced from thoseaxioms by the use of mathematcial proof. Impressive though Euclid’s achieve-ment was, it does suffer one drawback in that it is not the easiest system touse. Even proving simple results, like Pythagoras’s theorem, takes dozensof intermediate results. So although it is a great theoretical achievement, itis not such a practical one. It was not until the nineteenth century that apractical tool for doing three-dimensional geometry was constructed. On thebasis of the work carried out by Hamilton on quaternions — I say a littlemore about this later — the theory of vectors, the subject of this chapter,was developed by the American Josiah Willard Gibbs and promoted by theEnglish electrical engineer Oliver Heaviside (whose formal schooling endedat the age of 16). In addition to setting up an algebraic system that willenable us to carry out geometrical calculations easily, I shall also touch on adeep connection with the work of the previous chapter. Each linear equationin three unknowns is in fact the equation of a plane in three-dimensionalspace. This means that the theory of linear equations in three unknownshas a geometrical interpretation. This may be generalized: the theory ofmatrices combined with a theory of vectors in arbitrary dimensions is knownas linear algebra, one of the most important branches of algebra. I have notattempted to develop the subject in this chapter completely rigorously, so Ioften make appeals to geometric intuition in setting up the algebraic theoryof vectors.

231

232 CHAPTER 9. VECTORS

9.1 Vector algebra

I shall assume you are familiar with the following ideas from school:

• The notion of a point.

• The notion of a line and of a line segment.

• The notion of the length of a line segment and the angle between twoline segments.

• The notion of parallel lines.

The notion of a pair of lines being parallel is fundamental to Euclidean ge-ometry. We used it to prove that the angles in a triangle add up to two rightangles.

9.1.1 Addition and scalar multiplication of vectors

Definition of a vector Two directed line segments which are parallel, havethe same length, and point in the same direction are said to represent thesame vector.

The word ‘vector’ means carrier in Latin and what a vector carries isinformation about length and direction and nothing else. Because vectorsstay the same when they move parallel to themselves, they also preserveinformation about angles.

Thus vectors have length and direction but no other properties. I shalldenote vectors by bold letters a,b, . . . If P and Q are points then the directed

line segment from P to Q is written PQ or−→PQ. If P = Q then PQ is just a

point. The zero vector 0 is represented by the degenerate line segment PP .Vectors are denoted by arrows: the vector starts at the base of the arrow(where the feathers would be) we shall call this the tail of the vector andends at the tip (where the arrowhead is) which we shall call the point of thevector.

9.1. VECTOR ALGEBRA 233

Example 9.1.1. In the diagram below all the vectors shown are equal.

??��

??��

??��??��

??��

??��??��

??��

??��

The set of vectors in space can be equipped with two operations: vectoraddition and multiplication by a scalar.

Let a and b be vectors. Then their sum is defined as follows: slide thevectors parallel to themselves so that the point of a touches the tail of b.The directed line segment from the tail of a to the point of b represents thevector a + b.

b

��???????????????

a

GG��

a+b

77ooooooooooooooooooooooo

This definition does make sense though I will not justify that here. If a isa vector, then −a is defined to be the vector with the same length as a butpointing in the opposite direction.

−a

��

a

??��


Theorem 9.1.2 (Properties of vector addition).

(VA1) a + (b + c) = (a + b) + c. This is the associative law for vectoraddition.

(VA2) 0 + a = a = a + 0. The zero vector is the additive identity.

(VA3) a + (−a) = 0 = (−a) + a. The vector −a is the additive inverse of a.

(VA4) a + b = b + a. This is the commutative law for vector addition.

The proof of the commutativity of vector addition is illustrated below.

b //

a

??��b

//

a+b

b+a

77ooooooooooooooooooooooooooooooooooooooooooooo

a

??��

The proof of associativity is illustrated below.

We define a− b = a + (−b).


Advanced remark We have seen the above properties before: real numberswith respect to addition, and m × n matrices with respect to matrix addi-tion. A set equipped with a binary operation that is associative, possessesan identity, possesses unique inverses and is commutative is called an abeliangroup.

Example 9.1.3. Consider the following square of vectors.

a //

b

��

d

OO

coo

Then we havea + b + c + d = 0.

Thus, in particular,d = −c− b− a.


Example 9.1.4. Consider the following diagram

b //

k

��

c

��???????????????????????????????

a

OO

f

??��

e

''OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOg

ooh

oo

d

��

(i) We may write c in terms of e, d and f . By following the arrows we getthat c = d + ef

(ii) We may write g in terms of c, d, e and k. By following the arrows weget that g = −k + c + d− e.

(iii) We may solve x + b = f using similar methods to high-school algebrato get x = f − b which is just a.

(iv) We may solve x + h = d− e in a similar fashion to get x = d− e− hwhich is just g.

If a is a vector then ‖a‖ is its length.If ‖a‖ = 1 then a is called a unit vector. We have that ‖a‖ ≥ 0, and

‖a‖ = 0 iff a = 0. By results on triangles we have the triangle inequality

‖a + b‖ ≤ ‖a‖+ ‖b‖ .

We now define multiplication of a vector by a scalar. Let λ be a scalarand a a vector. If λ = 0 then λa = 0; if λ > 0 then λa has the same direction


as a and length λ ‖a‖; if λ < 0 then λa has the opposite direction to a andlength (−λ) ‖a‖. Observe that in all cases

‖λa‖ = |λ| ‖a‖ .If a is non-zero then

a =a

‖a‖is a unit vector in the same direction as a. We call this process normalisation.Vectors that differ by a scalar multiple are said to be parallel.

Theorem 9.1.5 (Properties of scalar multiplication).

(i) 0a = 0.

(ii) 1a = a.

(iii) (−1)a = −a.

(iv) (λ+ µ)a = λa + µa.

(v) λ(a + b) = λa + λb.

(vi) λ(µa) = (λµ)a.

We can use what we have introduced so far to prove simple geometrictheorems.

Example 9.1.6. If the midpoints of the consecutive sides of any quadrilateralare joined by line segments, then the resulting quadrilateral is a parallelo-gram. We refer to the picture below.


We have thata + b + c + d = 0.

Now−→AB = 1

2a + 1

2b and

−−→CD = 1

2c + 1

2d. But a + b = −(c + d). It follows

that−→AB = −

−−→CD. Hence the line segment AB is parallel to the line segment

CD and they have the same lengths. Similarly, BC is parallel to AD andhas the same length.

9.1.2 Inner, scalar or dot products

We now introduce a notion that will enable us to measure angles and lengths.It is a development of the idea of the perpendicular projection of a line ontoanother line.

Let a and b be two vectors. If a,b 6= 0 then we define

a · b = ‖a‖ ‖b‖ cos θ

where θ is the angle between a and b.Note that this angle is always chosen to be 0 ≤ θ ≤ π. If either a or b

is zero then a · b is defined to be zero. We call a · b the inner product of aand b. It is also sometimes called the scalar product and the dot product. Itis important to remember that it is a scalar and not a vector.

We say that non-zero vectors a and b are orthogonal to each other if theangle between them is ninety degrees. The key property of the inner productis that for non-zero a and b we have that

a · b = 0 iff a and b are orthogonal.

Theorem 9.1.7 (Properties of the inner product).

(i) a · b = b · a.

(ii) a · a = ‖a‖2.

(iii) λ(a · b) = (λa) · b = a · (λb).

(iv) a · (b + c) = a · b + a · c.

Remarks

(i) The inner product a · a is often abbreviated a2.


(ii) Property (iv) says that the inner product of a sum is the sum of theinner products. It will be very important to us. It is the only propertythat takes a bit of work to prove. I give the proof later.

The inner product enables us to prove much more interesting theorems.

Example 9.1.8. The angle in a semicircle is a right angle. Draw a semicircle.Choose any point on the circumference of the semicircle and join it to thepoints at either end of the diameter of the semicircle. Then the claim is thatthe resulting triangle is right-angled.

We are interested in the angle formed by AB and AC. Observe that−→AB = −(a + b) and

−→AC = a− b. Thus

−→AB ·

−→AC = −(a + b) · (a− b)

= −(a2 − a · b + b · a− b2)

= −(a2 − b2)

= 0

using the fact that a ·b = b ·a and ‖a‖ = ‖b‖, because this is just the radiusof the semicircle. It follows that the angle BAC is a right angle, as claimed.


Example 9.1.9. Pythagoras’ theorem proved using vectors.

We have thata + b + c = 0

and so a + b = −c. Now

(a + b)2 = (−c) · (−c) = ‖c‖2 .

But(a + b)2 = ‖a‖2 + 2a · b + ‖b‖2

and this is equal to ‖a‖2 + ‖b‖2 because a · b = 0. It follows that

‖a‖2 + ‖b‖2 = ‖c‖2 .

Remark The set of 3-dimensional vectors equipped with the operations ofvector addition and scalar multiplication together with the inner product iscalled three dimensional Euclidean space E3. This is precisely the space ofEuclid’s geometry, but done in a modern way.

9.1.3 Vector or cross products

In three dimensional space there is another operation available to us that isuseful in many applications. Let a and b be non-zero vectors. We define anew vector

a× b = ‖a‖ ‖b‖ sin θn


where θ is the angle between a and b, and n is a unit vector at right angles tothe plane containing a and b — this determines n up to sign: we choose thedirection of n so that when rotating a to b in a clockwise direction throughthe angle θ we are looking in the direction of n.

a×b

OO

b //

a

��?????????????????????????

If a or b is zero then a × b is the zero vector. We call it the vectorproduct of a and b. It is sometimes called the cross product. It is importantto remember that it is a vector. The key property of the vector product isthat for non-zero vectors

a× b = 0 iff a and b are parallel.

Theorem 9.1.10 (Properties of the vector product).

(i) a× b = −b× a.

(ii) λ(a× b) = (λa)× b = a× (λb).

(iii) a× (b + c) = a× b + a× c.

Remark Property (iii) says that the vector product distributes over addition.This is the hardest property to prove; I give the proof later.

Warning! a× (b× c) 6= (a× b)× c. In other words, the vector product isnot associative.


Warning! Distinguish between the following:

• λa. This is a scalar λ times a vector a and the result is a vector.

• a · b. This is the inner product of two vectors and is a scalar.

• a× b. This is the vector product of two vectors and is a vector.

You must not interchange notation for these different products (unlikeschool algebra where you can).

Example 9.1.11. The area of the parallelogram determined by the vectorsa and b is ‖a× b‖ as the following picture shows.

Example 9.1.12. We shall prove the law of sines for triangles using thevector product. With reference to the diagram below

we have thatsinA

a=

sinB

b=

sinC

c.


We choose vectors as shown so that

‖a‖ = a, ‖b‖ = b, ‖c‖ = c.

Thena + b + c = 0.

Hencea + b = −c.

Take the vector product of this equation on both sides on the left with a,band c in turn. We get

1. a× b = c× a.

2. b× a = c× b.

3. c× a = b× c.

From (1), we get‖a× b‖ = ‖c× a‖ .

Thus‖b‖ sinC = ‖c‖ sinB

which gives us the second equation in the statement of the result. Theremaining results follow similarly.

9.1.4 Scalar triple products

This product is nothing more than a combination of the previous two. How-ever, it is included because, as we shall see, it has an important geometricinterpretation.

Let a, b and c be three vectors. Then b× c is a vector. Thus a · (b× c)is a scalar. We define

[a,b, c] = a · (b× c).

It is called the scalar triple product. Its properties are determined by theproperties of the inner and vectors products. What it means geometricallywill be described later.

Exercises 4.1


1. Consider the following diagram.

Aa //

c

��

B b // C

D E F

Now answer the following questions.

(i) Write the vector BD in terms of a and c

(ii) Write the vector AE in terms of a and c

(iii) What is the vector DE?

(iv) What is the vector CF?

(v) What is the vector AC?

(vi) What is the vector BF?

2. If a, b, c and d represent the consecutive sides of a quadrilateral, showthat the quadrilateral is a parallelogram if and only if a + c = 0.

3. In the regular pentagon ABCDE, let AB = a, BC = b, CD = c, andDE = d. Express EA, DA, DB, CA, EC, BE in terms of a, b, c,and d.

4. Let a and b represent adjacent sides of a regular hexagon so that theinitial point of b is the terminal point of a. Represent the remainingsides by means of vectors expressed in terms of a and b.

5. Prove that ‖a‖b + ‖b‖ a is orthogonal to ‖a‖b− ‖b‖ a for all vectorsa and b.

6. Let a and b be two non-zero vectors. Let

u =

(a · ba · a

)a.

Show that b− u is orthogonal to a.

9.2. VECTOR ARITHMETIC 245

7. Simplify (u + v)× (u− v).

8. Let a and b be two unit vectors the angle between them being π3. Show

that 2b− a and a are orthogonal.

9. Prove that

‖u− v‖2 + ‖u + v‖2 = 2(‖u‖2 + ‖v‖2).

Deduce that the sum of the squares of the diagonals of a parallelogramis equal to the sum of the squares of all four sides.

9.2 Vector arithmetic

The theory I introduced in Section 4.1 is useful for proving general resultsabout geometry, but what if we want to calculate with particular vectors: howdo we describe them? To do this we need coordinates, and vectors viewed interms of coordinates will occupy us for the remainder of this section.

9.2.1 i’s, j’s and k’s

Set up a cartesian coordinate system consisting of x, y and z axes. Weorient the system so that in rotating the x axis clockwise to the y axis, weare looking in the direction of the positive z axis. Let i, j and k be unitvectors parallel to the x, y and z axes respectively (pointing in the positivedirections). Every vector a can be uniquely written in the form

a = a1i + a2j + a3k

for some scalars a1, a2, a3. This is achieved by orthogonal projection of thevector a (moved so that it starts at the origin) onto each of the three coor-dinate axes. The numbers ai are called the components of a in each of thethree directions.

Remarks

• If a = a1i + a2j + a3k and b = b1i + b2j + b3k then a = b iff ai = bi;that is, corresponding components are equal.

• 0 = 0i + 0j + 0k.


• If a = a1i + a2j + a3k and b = b1i + b2j + b3k then

a + b = (a1 + b1)i + (a2 + b2)j + (a3 + c3)k.

• If a = a1i + a2j + a3k then λa = λa1i + λa2j + λa3k.

Theorem 9.2.1 (Scalar products). Let a = a1i + a2j + a3k and b = b1i +b2j + b3k. Then

a · b = a1b1 + a2b2 + a3b3.

Proof. This is proved using Theorem 9.1.7 (iv) and the following table

· i j k

i 1 0 0j 0 1 0k 0 0 1

computed from the definition of the inner product. We have that

a · b = a · (b1i + b2j + b3k) = b1(a · i) + b2(a · j) + b3(a · k).

We now compute a · i, a · j, and a · k in turn:

• a · i = a1.

• a · j = a2.

• a · k = a3.

Putting everything together we get

a · b = a1b1 + a2b2 + a3b3,

as required.

Remark If a = a1i + a2j + a3k then ‖a‖ =√a2

1 + a22 + a2

3.

Theorem 9.2.2 (Vector products). Let a = a1i + a2j + a3k and b = b1i +b2j + b3k. Then

a× b =

∣∣∣∣∣∣i j ka1 a2 a3

b1 b2 b3

∣∣∣∣∣∣Warning! This ‘determinant’ can only be expanded along the first row.

9.2. VECTOR ARITHMETIC 247

Proof. This follows by Theorem 9.1.10 (iii) and the following table

× i j k

i 0 k −jj −k 0 ik j −i 0

computed from the definition of the vector product. We have that

a× b = a× (b1i + b2j + b3k) = b1(a× i) + b2(a× j) + b3(a× k).

We now compute a× i, a× j, and a× k in turn:

• a× i = −a2k + a3j.

• a× j = a1k− a3i.

• a× k = −a1j + a2i.

Putting everything together we get

a× b = (a2b3 − a3b2)i− (a1b3 − a3b1)j + (a1b2 − a2b1)k

which is equal to the given determinant.

The proof of the following now follows by our two theorems above.

Theorem 9.2.3 (Scalar triple products and determinants). Let

a = a1i + a2j + a3k, b = b1i + b2j + b3k, c = c1i + c2j + c3k.

Then

[a,b, c] =

∣∣∣∣∣∣a1 a2 a3

b1 b2 b3

c1 c2 c3

∣∣∣∣∣∣Thus the properties of scalar triple products are the same as the properties of3× 3 determinants.

Proof. We calculate a · (b× c). This is equal to

(a1i + a2j + a3k) · [(b2c3 − b3c2)i− (b1c3 − b3c1)j + (b1c2 − b2c1)k].


But this is equal to

a1(b2c3 − b3c2)− a2(b1c3 − b3c1) + a3(b1c2 − b2c1)

which is nothing other than ∣∣∣∣∣∣a1 a2 a3

b1 b2 b3

c1 c2 c3

∣∣∣∣∣∣

Before we look at some examples it is worth stepping back a bit to seewhere we are.

Summary

In Section 4.1, we defined vectors and vector operations geomet-rically. In Section 4.2, we showed that once we had chosen a co-ordinate system, vectors and vector operations could be describedalgebraically. The important point to remember in what followsis that the two approaches must give the same answers.

Exercises 4.2

1. Let a = 3i + 4j, b = 2i + 2j− k and c = 3i− 4k.

(i) Find ‖a‖, ‖b‖, and ‖c‖.(ii) Find a + b and a− c.

(iii) Determine ‖a− c‖.

2. (i) Let a = 4i + j − 3k and b = i + 2j + 2k. Find a · b. Are a and borthogonal?

(ii) Find the angle between −2(i− j) + k and j− i.

3. The unit cube is determined by the three vectors i, j and k. Find theangle between the long diagonal of the unit cube and one of its edges.

9.3. GEOMETRY WITH VECTORS 249

4. Calculate i× (i× k) and (i× i)× k. What do you deduce as a resultof this?

5. Calculate u · (v × w) where u = 3i − 2j − 5k, v = i + 4j − 4k, andw = 3j + 2k.

6. If [a,b, c] = 0 what can you deduce?

9.3 Geometry with vectors

There are two kinds of vectors: the free vectors that we have been dealingwith up to now and the position vectors we introduce next.

9.3.1 Position vectors

So far, we have used vectors to describe line segments. But we can also usevectors to describe the precise location of points. To do this, we have tochoose and fix a point O in space, called an origin. We can then consider allthe directed line segments that start at O. Each such segment represents avector and every vector is thus represented. The tops of the line segments arepoints in space, and every point thus occurs. It follows that once an originhas been fixed, vectors can be used to describe points. We talk about theposition vectors of points. However, we can only talk about position vectorswith respect to some fixed point O.


Example 9.3.1. The point A has position vector a = −i + j and the pointB has position vector b = 2i + j − k. Find the position vector of the pointP which is 2

3of the way between A and B.

A

2

?????????????????????????

P

1

?????????????????????????

Ob

//

a

OO

p

??��B

We have that

−→OP =

−→OA+

−→AP

=−→OA+

2

3

−→AB

= a +2

3(b− a)

=1

3a +

2

3b

=1

3(−i + j) +

2

3(2i + j− k)

= i + j− 2

3k

9.3.2 Linear combinations

Let v1, . . . ,vn be n vectors and let λ1, . . . , λn be n scalars. Then the vector

v = λ1v1 + . . .+ λnvn


is called a linear combination of the n vectors.

Only two cases of this definition are needed in this course. If we aregiven just one vector v1 then a linear combination is just a scalar multiple ofthat vector. The other case if where we have two vectors v1 and v2. Linearcombinations then look like this

λ1v1 + λ2v2.

Let v be any non-zero vector. Then any vector parallel to this vector hasthe form λv for some scalar λ.

Now let v1 and v2 be two non-zero vectors where neither is a multiple ofthe other. Then these two vectors determine a plane in space. This planeis not rooted to any point and so, for convenience, we may move it parallelto itself so that it passes through some fixed point that we may treat as anorigin. Now let v be any vector which is parallel to this plane. We may moveit parallel to itself so that its tail is at the origin. By plane geometry, wemay find real numbers λ1 and λ2 such that

v = λ1v1 + λ2v2.

We shall use these ideas in deriving formulae for lines and planes in spacein the sections below.

9.3.3 Lines

Intuitively, a line in space is determined by one of the following two piecesof information:

1. Two distinct points each described by a position vector.

2. One point and a direction where the point is given by a position vectorand the direction by a (free) vector.

Let’s see how we can use vectors to obtain the equation of that line.

Let a and b be the position vectors of two distinct points. Let r =xi + yj + zk be the position vector of a point on the line they determine.Observe that the line determined by the two points will be parallel to thevector b− a which is the direction the line is parallel to.


The vectors r−a and b−a will be parallel. Thus there is a scalar λ suchthat r− a = λ(b− a). It follows that

r = a + λ(b− a).

This is called the (vector form of) the parametric equation of the line. Theparameter in question is λ.

We now derive the coordinate form of the parametric equation. Let

a = a1i + a2j + a3k and b = b1i + b2j + b3k.

Substituting in our vector form above and equating components we obtain

x = a1 + λ(b1 − a1), y = a2 + λ(b2 − a2), z = a3 + λ(b3 − a3).

For convenience, put ci = bi−ai. Thus the coordinate form of the parametricequation for the line is

x = a1 + λc1, y = a2 + λc2, z = a3 + λc3.


If c1, c2, c3 6= 0 then we can eliminate the parameters in the above equa-tions to get the non-parametric equations of the line:

x− a1

c1

=y − a2

c2

,y − a2

c2

=z − a3

c3

.

It’s worth noting that

• The parametric equation is useful for generating points on the line (bychoosing values of the parameter λ).

• The non-parametric equation is useful for checking that given pointslie on a given line.

Example 9.3.2. Find the parametric and the non-parametric equations ofthe line through the point with position vector i+ 2j+ 3k and parallel to thevector 4i + 5j + 6k. In this question, we are given the direction that the lineis parallel to. Thus

r− (i + 2j + 3k)

is parallel to

4i + 5j + 6k.

It follows that

r = i + 2j + 3k + λ(4i + 5j + 6k)

is the vector form of the parametric equation of the line. We now find thecartesian form of the parametric equation. Put

r = xi + yj + zk.

Then

xi + yj + zk = i + 2j + 3k + λ(4i + 5j + 6k).

These two vectors are equal iff their coordinates are equal. Thus we havethat

x = 1 + 4λ

y = 2 + 5λ

z = 3 + 6λ


This is the cartesian form of the parametric equation of the line. Finally, weeliminate λ to get the non-parametric equation of the line

x− 1

4=y − 2

5and

y − 2

5=z − 3

6.

These two equations can be rewritten in the form

5x− 4y = −3 and 6y − 5z = −3.


9.3.4 Planes

Intuitively, a plane in space is determined by one of the following three piecesof information:

1. Any three points that do not all lie in a straight line; that is, the pointsform the vertices of a triangle.

2. One point and two non-parallel directions.

3. One point and a direction which is perpendicular or normal to theplane.

We shall begin by finding the parametric equation of the plane determinedby the three points with position vectors a,b and c.

The vectors b − a and c − a are both parallel to the plane, but are notparallel to each other. Thus every vector parallel to the plane they determinehas the form

λ(b− a) + µ(c− a)

for some scalars λ and µ. Here we use the ideas of Section 4.3.2. Thus ifthe position vector of an arbitrary point on the plane is r, then r − a =


λ(b − a) + µ(c − a). Thus the (vector form of) the parametric equation ofthe plane is

r = a + λ(b− a) + µ(c− a).

This can easily be written in coordinate form by equating components.To find the non-parametric equation of a plane, we use the fact that a

plane is determined once a point on the plane is known and a vector orthog-onal to every vector in the plane — such a vector is said to be normal tothe plane. Let n be a vector normal to our plane, and let a be the positionvector of a point in the plane.

Then r− a is orthogonal to n. Thus

(r− a) · n = 0.

This is the (vector form) of the non-parametric equation of the plane. Tofind the coordinate form of the non-parametric equation, let

r = xi + yj + zk, a = a1i + a2j + a3k, n = n1i + n2j + n3k.

From (r− a) · n = 0 we get (x− a1)n1 + (y − a2)n2 + (z − a3)n3 = 0. Thusthe non-parametric equation of the plane is

n1x+ n2y + n3z = a1n1 + a2n2 + a3n3.

Remark From the equation above, we deduce that the solutions of a linearequation in three unknowns

ax+ by + cz = d


all lie on a plane in general (although there are some degenerate cases wheresomething different from a plane will be obtained).

We observe that the non-parametric equation of the line in fact describesthe line as the intersection of two planes.

If we have three equations in three unknowns then, as long as the planesare angled correctly, they will intersect in a point — that is, the equationswill have a unique solution. However, there are many cases where either theplanes have no points in common (no solution) of have lines or indeed planesin common (infinitely many solutions).

Thus the nature of the solutions of a system of linear equations in threeunknowns is intimately bound up with the geometry of the planes they de-termine.

We have one final question to answer: given the parametric equation ofthe plane, how do we find the non-parametric equation? The vectors b − aand c−a are parallel to the plane but not parallel to each other. The vector

n = (b− a)× (c− a)

is normal to our plane.

Example 9.3.3. Find the parametric and non-parametric equations of theplane containing the three points with position vectors

a = j− k, b = i + j, c = i + 2j.

We have thatb− a = i + k

andc− a = i + j + k.

Thus the parametric equation of the plane is

r = j− k + λ(i + k) + µ(i + j + k).

To find the non-parametric equation, we need to find a vector normal to theplane. We calculate

(b− a)× (c− a) = k− i.

Thus(r− a) · (k− i) = 0.


That is(xi + (y − 1)j + (z + 1)k) · (k− i) = 0.

This simplifies toz − x = −1,

the non-parametric equation of the plane. We now check that our threeoriginal points satisfies this equation. The point a has co-ordinates (0, 1,−1);the point b has co-ordinates (1, 1, 0); the point c has co-ordinates (1, 2, 0).It is easy to check that each set of co-ordinates satisfies the equation.

9.3.5 Determinants

Let’s start with 1× 1 matrices. The determinant of (a) is just a. The lengthof a is |a|, the absolute value of the determinant of (a).


Theorem 9.3.4. Let a = ai + cj and b = bi + dj be a pair of plane vectors.Then the area of the parallelogram determined by these vectors is the absolutevalue of the determinant ∣∣∣∣ a b

c d

∣∣∣∣Proof. The proof I give will be for the case where both vectors are in thefirst quadrant. I shall consider two cases.

(Case 1): b is to the left of a when standing at the origin and lookingalong a. Let

a = ai + cj and b = bi + dj.

The area of the parallelogram is the area of the rectangle defined by thepoints

0, (a+ b)i, a + b, (c+ d)j

minus the area of two rectangles the same size, labelled (1), two triangles thesame size, labelled (2), and another two triangles of the same size, labelled(3). That is

(a+ b)(c+ d)− 2bc− 2(1

2)ac− 2(

1

2)bd

which is equal to

ac+ ad+ bc+ bd− 2bc− bd− ac = ad− bc.


(Case 2): b is to the right of a when standing at the origin and lookingalong a. A similar argument shows that the area is bc − ad which is thenegative of the determinant.

Putting these two cases together, we see that the area is the absolute valueof the determinant, because we usually expect areas to be non-negative.

Theorem 9.3.5. Let

a = ai + dj + gk, b = bi + ej + hk, c = ci + f j + ik

be three vectors. Then the volume of the parallelepiped (‘squashed box’) de-termined by these three vectors is the absolute value of the determinant∣∣∣∣∣∣

a b cd e fg h i

∣∣∣∣∣∣or its transpose.

Proof. We refer to the diagram below.

The volume of the box determined by the vectors a,b, c is equal to thebase area times the vertical height. This is equal to the absolute value of

‖a‖ ‖b‖ sin θ ‖c‖ cosφ.

We have to use the absolute value of this expression because cos(φ) can takenegative values if c is below rather than above the plane of a and b as I havedrawn it. Now


• a×b = ‖a‖ ‖b‖ sin θn, where n is the unit vector orthogonal to a andb and in the correct direction.

• n · c = ‖c‖ cosφ.

Thus‖a‖ ‖b‖ sin θ ‖c‖ cosφ = (a× b) · c.

By the properties of the inner product

(a× b) · c = c · (a× b) = [c, a,b].

We now use properties of the determinant

[c, a,b] = −[a, c,b] = [a,b, c].

It follows that the volume of the box is the absolute value of

[a,b, c].

It follows from the above theorem and our theorem on scalar triple prod-ucts that the volume of the parallelepiped determined by the three vectorsa, b, and c is the absolute value of the scalar triple product [a,b, c].

The geometric significance of determinants is that they enable us to mea-sure lengths, areas and volumes.

Exercises 4.3

1. (i) Find the parametric and the non-parametric equations of the linethrough the two points with position vectors i − j + 2k and 2i +3j + 4k.

(ii) Find the parametric and the non-parametric equations of the planecontaining the three points with position vectors i+3k, i+2j−k,and 3i− j− 2k.

2. Let c be the position vector of the centre of a sphere with radius R.Let an arbitrary point on the sphere have position vector r. Why is‖r− c‖ = R? Squaring both sides we get

(r− c) · (r− c) = R2.


If r = xi + yj + zk and c = c1i + c2j + c3k, deduce that the equation ofthe sphere with centre c1i + c2j + c3k and radius R is

(x− c1)2 + (y − c2)2 + (z − c3)2 = R2.

(i) Find the equation of the sphere with centre i + j + k and radius 2.

(ii) Find the centre and radius of the sphere with equation

x2 + y2 + z2 − 2x− 4y − 6z − 2 = 0.

3. The distance of a point from a line is defined to be the length of theperpendicular from the point to the line. Let the line in question haveparametric equation

r = p + λd

and let the position vector of the point be q. Show that the distanceof the point from the line is

‖d× (q− p)‖‖d‖

.

4. The distance of a point from a plane is defined to be the length of theperpendicular to the plane. Let the position vector of the point be qand the equation of the plane be (r−p) ·n = 0. Show that the distanceof the point from the plane is

|(q− p) · n|‖n‖

.

9.4. SUMMARY OF VECTORS 263

9.4 Summary of vectors

Inner products

Definition

Let a and b be two vectors. If a,b 6= 0 then we define

a · b = ‖a‖ ‖b‖ cos θ

where θ is the angle between a and b. Note that this angle is always chosento be 0 ≤ θ ≤ π. If either a or b is zero then a · b is defined to be zero. Wecall a · b the inner product of a and b.

Co-ordinate form

Let a = a1i + a2j + a3k and b = b1i + b2j + b3k. Then

a · b = a1b1 + a2b2 + a3b3.

Uses

• The most important application is the following: if the vectors a and bare non-zero then a · b = 0 precisely when a and b are orthogonal —meaning ‘at right angles to each other’.

• The inner product can more generally be used to work out the anglebetween two vectors

cos θ =a · b‖a‖ ‖b‖

where θ is the angle between the non-zero vectors a and b.

• The inner product can be used to work out the lengths of vectors:‖a‖ =

√a · a.


Vector products

Definition

Let a and b be non-zero vectors. We define a new vector

a× b = ‖a‖ ‖b‖ sin θn

where θ is the angle between a and b, and n is a unit vector at right angles tothe plane containing a and b. This determines n up to sign: we choose thedirection of n so that when rotating a to b in a clockwise direction throughthe angle θ we are looking in the direction of n. If a or b is zero then a× bis the zero vector. We call it the vector product of a and b.

Co-ordinate form

Let a = a1i + a2j + a3k and b = b1i + b2j + b3k. Then

a× b =

∣∣∣∣∣∣i j ka1 a2 a3

b1 b2 b3

∣∣∣∣∣∣Uses

• The most important application of the vector product is in construct-ing a vector orthogonal to two other vectors, and in particular in con-structing a vector orthogonal to a plane. That is, a vector normal tothe plane.

• If the vectors a and b are non-zero then a × b = 0 precisely when aand b are parallel to each other.

• The vector product can be used to calculate the sine of the angle be-tween two vectors

sin θ =‖a× b‖‖a‖ ‖b‖

where θ is the angle between the non-zero vectors a and b.

9.4. SUMMARY OF VECTORS 265

Scalar triple products

Definition

Let a, b and c be three vectors. Then b× c is a vector. Thus a · (b× c)is a scalar. We define

[a,b, c] = a · (b× c).

It is called the scalar triple product.

Co-ordinate form

Let a = a1i +a2j +a3k, b = b1i + b2j + b3k, and c = c1i + c2j + c3k. Then

[a,b, c] =

∣∣∣∣∣∣a1 a2 a3

b1 b2 b3

c1 c2 c3

∣∣∣∣∣∣Uses

• The absolute value of [a,b, c] is the volume of the parallelepiped (‘squashedbox’) determined by the three vectors.

• The scalar triple product gives a geometric interpretation of 3× 3 de-terminants.


9.5 *Two vector proofs*

This section will not be examined in 2013.

My development of the theory of vectors in this chapter depended on twoimportant results: Theorem 9.1.7 (iv), the fact that

a · (b + c) = a · b + a · c

and Theorem 9.1.10 (iii), the fact that

a× (b + c) = a× b + a× c.

I shall sketch out proofs of both of these results here. The proof of the firstis not too difficult.

Theorem 9.5.1.a · (b + c) = a · b + a · c.

Proof. Let x and y be a pair of vectors. Then the component of x in thedirection of y, written comp(x,y), is by definition the number ‖x‖ cos θ whereθ is the angle between x and y. Clearly

x · y = ‖y‖ comp(x,y).

Geometry shows (this means you should draw the pictures) that

comp(b + c, a) = comp(b, a) + comp(c, a).


(b + c) · a = ‖a‖ comp(b + c, a)

= ‖a‖ comp(b, a) + ‖a‖ comp(c, a)

= b · a + c · a

The proof of the second is hairier.

Theorem 9.5.2.a× (b + c) = a× b + a× c.

9.5. *TWO VECTOR PROOFS* 267

Proof. We defined the vector product in terms of geometry and so we shallhave to prove this property by means of geometry. I shall sketch out a prooffollowing one given in Pettofrezzo’s book.

We begin with what is in effect a lemma. Let a and b be a pair ofvectors. It is convenient to move them so that they are both emanating fromthe same point P . They determine a plane. In that plane, we can drawa line perpendicular to the vector a and passing through the point P . Weproject the vector b onto this line and we get a vector b′. We claim thata×b = a×b′. The proof follows by observing that these two vectors clearlyhave the same direction and a calculation shows that they have the samelength.

We now prove our theorem. We orientate ourselves so that the vector ais at right angles to the page and pointing at you the reader. We project thevectors a and b onto the plane of the page to get the vectors a′ and b′.

We shall prove that

a× (b′ + c′) = a× b′ + a× c′.

Let’s see first why this result is enough to prove the theorem. The vectors aand b+c define a plane. As in our lemma above, we have that a× (b+c) =a×(b+c)′. Also a×b = a×b′ and a×c = a×c′. As long as (b+c)′ = b′+c′,our theorem will follow.

We now prove that

a× (b′ + c′) = a× b′ + a× c′.

Now, by the way we have defined our vectors, a × b′ and a × c′ are in theplane of the page and are orthogonal to b′ and c′, respectively. This leads tothe crux of the proof: the angle between a×b′ and a× c′ is the same as theangle between b′ and c′. The point is that because a is pointing out of thepage, the operator a × − has the effect of rotating vectors by a right anglein the plane of the page.

It follows that a×b′+a×c′ is at right angles to b′+c′. Thus a×b′+a×c′

and a× (b′ + c′) are vectors pointing in the same direction.We now compare the lengths of these two vectors. We shall use the fact

that the triangles formed by the vectors a × b′ and a × c′ is similar to thetriangle formed by the vectors b′ and c′. Thus

‖a× b′ + a× c′‖‖b′ + c′‖

=‖a× b′‖‖b′‖

.


But this works out to give that

‖a× b′ + a× c′‖ = ‖a‖ ‖b′ + c′‖ .

Our claim is now proved.

9.6 Quaternions

The set of quaternions, denoted by H, was invented by the Irish mathemati-cian Sir William Rowan Hamilton in 1843. They are 4-dimensional gener-alisations of the complex numbers. It was from the theory of quaternionsthat the modern theory of vectors with inner and vector products developed.To describe what they are, I shall reverse history and derive them from vec-tors. Recall the following from an earlier exercise. The Pauli matrices are:I,X, Y, Z,−I,−X,−Y,−Z where

X =

(0 1−1 0

)Y =

(i 00 −i

)and Z =

(0 −i−i 0

)where i is the complex number i. You were asked to show that the productof any two Pauli matrices is again a Pauli matrix by completing a table. Weshall just need a portion of that table relating to X, Y and Z. This is

X Y Z

X −I Z −YY −Z −I XY Y −X −I

We shall now consider matrices of the form

λI + αX + βY + γZ

where λ, α, β, γ ∈ R. We calculate the product of two such matrices using thedistributivity and scalar multiplication properties of matrix multiplicationand the above multiplication table. The product

(λI + αX + βY + γZ)(µI + α′X + β′Y + γ′Z)

can be written in the form aI + bX + cY + dZ where a, b, c, d ∈ R althoughI shall write it in a slightly different form

9.6. QUATERNIONS 269

(λµ− αα′ − ββ′ − γγ′)I +

λ(α′X + β′Y + γ′Z) + µ(αX + βY + γZ) +

(βγ′ − γβ′)X + (γα′ − αγ′)Y + (αβ′ − βα′)Z.

Although this looks complicated there are some familiar things within it:the first term contains what looks like an inner product and the last termcontains what looks like a vector product. Note that because this is matrixmultiplication this operation is associative.

The above calculation motivates the following construction. Let E3 de-note the set of all 3-dimensional vectors. Thus a typical element of E3 isαi + βj + γk. Put

H = R× E3.

The elements of H are therefore ordered pairs (λ, a) consisting of a realnumber λ and a vector a. We define the sum of two elements of H in a verysimple way

(λ, a) + (µ, a′) = (λ+ µ, a + a′).

The product is defined in a way that mimics what I did above (you shouldcheck this)

(λ, a)(µ, a′) = (λµ− a · a′, λa′ + µa + (a× a′)) .

It follows that this product is associative !We shall now investigate what we can do with H. I shall only deal with

multiplication because addition poses no problems.

• Consider the subset R of H which consists of elements of the form(λ,0). You can check that (λ,0)(µ,0) = (λµ,0). Thus R mimics thereal numbers.

• Consider the subset C of H which consists of the elements of the form(λ, ai). You can check that

(λ, ai)(µ, a′i) = (λµ− aa′, (λa′ + µa)i).

In particular, (0, i)(0, i) = (−1,0). Thus C mimics the set of complexnumbers.


• Consider the subset E of H which consists of elements of the form (0, a).You can check that

(0, a)(0, a′) = (−a · a′, a× a′).

Thus E mimics vectors, the inner product and the vector product.

The set H with the above operations of addition and multiplication isthe set of quaternions. This structure pulls together most of the importantelements of this course: complex numbers, vectors and matrices.

MATHEMATICS Algebra, geometry, combinatoricsmarkl/teaching/ALGEBRA/Chapter7new.pdf · introduce you...

Documents

Transcript of MATHEMATICS Algebra, geometry, combinatoricsmarkl/teaching/ALGEBRA/Chapter7new.pdf · introduce you...