
PMATH 340 Lecture Notes on Elementary Number Theory

Anton Mosunov
Department of Pure Mathematics

University of Waterloo

Winter, 2017

Contents

1 Introduction
2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic
3 Greatest Common Divisor. Least Common Multiple. Bezout’s Lemma.
4 Diophantine Equations. The Linear Diophantine Equation $ax+by = c$
5 Euclidean Algorithm. Extended Euclidean Algorithm
6 Congruences. The Double-and-Add Algorithm
7 The Ring of Residue Classes $\mathbb{Z}_n$
8 Linear Congruences
9 The Group of Units $\mathbb{Z}_n^*$
10 Euler’s Theorem and Fermat’s Little Theorem
11 The Chinese Remainder Theorem
12 Polynomial Congruences
13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$
14 The Primitive Root Theorem
15 Big-O Notation
16 Primality Testing
16.1 Trial Division
16.2 Fermat’s Primality Test
16.3 Miller-Rabin Primality Test
17 Public Key Cryptosystems. The RSA Cryptosystem
18 The Diffie-Hellman Key Exchange Protocol
19 Integer Factorization
19.1 Fermat’s Factorization Method
19.2 Dixon’s Factorization Method
20 Quadratic Residues
21 The Law of Quadratic Reciprocity
22 Multiplicative Functions
23 The Möbius Inversion
24 The Prime Number Theorem
25 The Density of Squarefree Numbers
26 Perfect Numbers
27 Pythagorean Triples
28 Fermat’s Infinite Descent. Fermat’s Last Theorem
29 Gaussian Integers
30 Fermat’s Theorem on Sums of Two Squares
31 Continued Fractions
32 Pell’s Equation
33 Algebraic and Transcendental Numbers. Liouville’s Approximation Theorem
34 Elliptic Curves

1 Introduction

This is a course on number theory, one of the oldest mathematical disciplines known to the world. Number theory studies the properties of numbers. These may be integers, like −2, 0 or 7, rational numbers like 1/3 or −7/9, algebraic numbers like √2 or i, or transcendental numbers like e or π. Though most of the course will be dedicated to Elementary Number Theory, which studies congruences and various divisibility properties of the integers, we will also dedicate several lectures to Analytic Number Theory, Algebraic Number Theory, and other subareas of number theory.

There are many interesting questions that one might ask about numbers. In search of answers to these questions, mathematicians unravel fascinating properties of numbers, some of which are quite profound. Here are several curious facts about prime numbers:

1. Every odd integer greater than 5 can be expressed as a sum of three primes (Helfgott-Vinogradov Theorem, 2013. In 1937, Vinogradov proved the result for all odd $n > B$ for some constant $B$, and in 2013 Helfgott demonstrated that one can take $B = 5$);

2. There are infinitely many pairs of distinct primes $p$ and $q$ such that $|p - q| \le 246$ (Zhang’s Theorem, 2013. Zhang proved the result with $7 \cdot 10^7$ in place of 246, and in 2014 the constant was reduced to 246 by the Polymath project, building on the work of Maynard and Tao);

3. For all $n \ge \exp(\exp(33.217))$ there always exists a prime between $n^3$ and $(n+1)^3$ (Ingham’s Theorem, 1937. Ingham proved the result for all $n \ge B$ for some $B$, and in 2014 Dudek demonstrated that one can take $B$ as above);

4. There are infinitely many primes of the form $x^2 + y^4$ (Friedlander-Iwaniec Theorem, 1997);

5. The number of primes not exceeding $x$ is “approximately” $x/\log x$, in the sense that the ratio of these two quantities tends to 1 as $x \to \infty$ (Prime Number Theorem, 1896);

6. Given a positive integer $d$, there exist distinct prime numbers $p_1, p_2, \ldots, p_d$ which form an arithmetic progression (Green-Tao Theorem, 2004).

Despite the simplicity of their formulations, all of these results are highly non-trivial and their proofs rely on deep theories. For example, the Green-Tao Theorem relies on Szemerédi’s Theorem, a deep result in combinatorics.

There are many number theoretical problems out there that are still open. At the 1912 International Congress of Mathematicians, the German mathematician Edmund Landau listed the following four basic problems about primes, which still remain unresolved:

1. Can every even integer greater than 2 be written as a sum of two primes? (Goldbach’s Conjecture, 1742);

2. Are there infinitely many pairs of primes $p$ and $q$ such that $|p - q| = 2$? (Twin Prime Conjecture, 1849);

3. Does there always exist a prime between two consecutive perfect squares? (Legendre’s Conjecture, circa 1800);

4. Are there infinitely many primes of the form $n^2 + 1$? (see Bunyakovsky’s Conjecture, 1857).

It is widely believed that the answer to each of the questions above is “yes”. There is a lot of computational evidence for each of them, and for some of them conjectural asymptotic formulas have been established. However, none of them has been proved.

Aside from being an interesting theoretical subject, number theory also has many practical applications. It is widely used in cryptographic protocols, such as RSA (Rivest-Shamir-Adleman, 1977), the Diffie-Hellman protocol (1976), and ECIES (Elliptic Curve Integrated Encryption Scheme). These protocols rely on certain fundamental properties of modular arithmetic (RSA, D-H) and of elliptic curves defined over finite fields (ECIES). For example, consider the Discrete Logarithm Problem: given a prime $p$ and integers $c, m$, one may ask whether there exists an integer $d$ such that $c^d - m$ is divisible by $p$, and if so, what its value is. We may write this in the form of a congruence

$$c^d \equiv m \pmod{p}.$$

When $p$ is extremely large (hundreds of digits) and $c, m$ are chosen properly, this problem is widely believed to be intractable; that is, no known algorithm can solve it on modern computers in a reasonable amount of time (the computation would require billions of years). The hardness of this problem underlies many cryptosystems, including the Diffie-Hellman protocol mentioned above, while the security of RSA relies on the difficulty of factoring large integers. Many cryptosystems, like RSA, could be broken by sufficiently large quantum computers. The construction of protocols resistant to attacks by quantum computers is the subject of Post-Quantum Cryptography, and number theory plays a crucial role there (see Lattice-Based or Isogeny-Based Cryptography).
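To make the Discrete Logarithm Problem concrete, here is a minimal Python sketch that simply tries every exponent in turn. The function name `discrete_log_bruteforce` is our own, not from these notes. It only illustrates the definition: the number of candidates grows with $p$, which is exactly why this approach is hopeless when $p$ has hundreds of digits.

```python
def discrete_log_bruteforce(c, m, p):
    """Return the least d >= 0 with c**d ≡ m (mod p), or None if no such d exists.

    Exhaustive search; only practical for small p.
    """
    target = m % p
    power = 1 % p  # c**0 reduced mod p
    # By the pigeonhole principle the values c**0, ..., c**p (mod p) must repeat,
    # so every residue that is a power of c already occurs for some d in 0..p-1.
    for d in range(p):
        if power == target:
            return d
        power = (power * c) % p
    return None


print(discrete_log_bruteforce(2, 3, 11))  # 2**8 = 256 = 23*11 + 3
print(discrete_log_bruteforce(4, 2, 11))  # powers of 4 mod 11 are only 1, 4, 5, 9, 3
```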

2 Divisibility. Factorization of Integers. The Fundamental Theorem of Arithmetic

Before we proceed, let us introduce a little bit of notation:

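$\mathbb{N} = \{1, 2, 3, \ldots\}$: the natural numbers;
$\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$: the ring of integers;
$\mathbb{Q} = \{m/n : m \in \mathbb{Z}, n \in \mathbb{N}\}$: the field of fractions;
$\mathbb{R}$: the field of real numbers;
$\mathbb{C} = \{a + bi : a, b \in \mathbb{R}, i^2 = -1\}$: the field of complex numbers.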

We call $\mathbb{Z}$ a ring because $0, 1 \in \mathbb{Z}$, and $a, b \in \mathbb{Z}$ implies $a \pm b \in \mathbb{Z}$ and $a \cdot b \in \mathbb{Z}$. In other words, $\mathbb{Z}$ is closed under addition, subtraction and multiplication. Note, however, that $a, b \in \mathbb{Z}$ with $b \ne 0$ does not imply that $a/b \in \mathbb{Z}$, so $\mathbb{Z}$ is not closed under division. A collection that is closed under addition, subtraction, multiplication and division by a non-zero element is called a field. According to this definition, every field is also a ring.

Exercise 2.1. Demonstrate the proper inclusions $\mathbb{N} \subsetneq \mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \mathbb{R} \subsetneq \mathbb{C}$. No proofs are required.

Definition 2.2. Let $a, b \in \mathbb{Z}$. We say that $a$ divides $b$, or that $a$ is a factor of $b$, when $b = ak$ for some $k \in \mathbb{Z}$. We write $a \mid b$ if this is the case, and $a \nmid b$ otherwise.

Example 2.3. $3 \mid 12$ because $12 = 3 \cdot 4$; $3 \nmid 13$; $-1 \mid 7$ because $7 = (-1) \cdot (-7)$; $0 \nmid 3$.
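Definition 2.2 translates directly into code, with one subtlety: the remainder operator cannot be used when $a = 0$, which is exactly the case that the definition handles on its own ($0 \mid b$ only for $b = 0$). A minimal Python sketch, where the name `divides` is our own:

```python
def divides(a: int, b: int) -> bool:
    """Return True if a | b, i.e. b = a*k for some integer k (Definition 2.2)."""
    if a == 0:
        # 0*k = 0 for every k, so 0 divides b exactly when b = 0.
        return b == 0
    # For a != 0, b = a*k has an integer solution k exactly when b % a == 0;
    # this also holds for negative a or b in Python.
    return b % a == 0


# The four claims of Example 2.3:
print(divides(3, 12), divides(3, 13), divides(-1, 7), divides(0, 3))
```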

Proposition 2.4.¹ Let $a, b, c, x, y \in \mathbb{Z}$.

1. If $a \mid b$ and $b \mid c$, then $a \mid c$;

2. If $c \mid a$ and $c \mid b$, then $c \mid ax \pm by$;

3. If $c \mid a$ and $c \nmid b$, then $c \nmid a \pm b$;

4. If $a \mid b$ and $b \ne 0$, then $|a| \le |b|$;

5. If $a \mid b$ and $b \mid a$, then $a = \pm b$;

6. If $a \mid b$, then $\pm a \mid \pm b$;

7. $1 \mid a$ for all $a \in \mathbb{Z}$;

8. $a \mid 0$ for all $a \in \mathbb{Z}$;

9. $0 \mid a$ if and only if $a = 0$.

¹ Proposition 1.2 in Frank Zorzitto, A Taste of Number Theory.

Proof. Exercise.

Definition 2.5. Let $p \ge 2$ be a natural number. Then $p$ is called prime if the only positive integers that divide $p$ are 1 and $p$ itself. It is called composite otherwise.

We remark that 1 is neither prime nor composite. We will also use the above terminology only with respect to integers exceeding 1 (so according to this convention −3 is not prime and −6 is not composite).

Exercise 2.6. Among the collection −5,1,5,6, which numbers are prime?

Theorem 2.7. For each integer n≥ 2 there exists a prime p such that p | n.

Proof. We will prove this result using strong induction on $n$.

Base case. For $n = 2$ we have $2 \mid n$. Since 2 is prime, the theorem holds.

Induction hypothesis. Suppose that the theorem is true for $n = 2, 3, \ldots, k$, where $k \ge 2$.

Induction step. We will show that the theorem is true for $n = k + 1$. If $n$ is prime, the result holds. Otherwise there exists a positive integer $d$ such that $d \mid n$, $d \ne 1$ and $d \ne n$. By property 4 of Proposition 2.4 we have $d \le n$, and since $d \ne 1$ and $d \ne n$ we conclude that $2 \le d \le n - 1 = k$. Thus $d$ satisfies the induction hypothesis, so there exists a prime $p$ such that $p \mid d$. Since $p \mid d$ and $d \mid n$, by property 1 of Proposition 2.4 we conclude that $p \mid n$.
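Theorem 2.7 can also be made constructive. The sketch below is not the induction argument above. It relies on two standard observations: the least divisor $d \ge 2$ of $n$ must be prime, since a divisor $e \ge 2$ of $d$ smaller than $d$ would also divide $n$; and a composite $n$ has such a divisor with $d^2 \le n$. The function name is our own.

```python
def smallest_prime_factor(n: int) -> int:
    """Return the least prime p with p | n, for an integer n >= 2."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d  # the least divisor >= 2 is necessarily prime
        d += 1
    return n  # no divisor d with 2 <= d <= sqrt(n), so n itself is prime


print([smallest_prime_factor(n) for n in (2, 60, 91, 97)])
```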

Theorem 2.8. (Euclid’s Theorem, circa 300 BC) There are infinitely many prime numbers.


Proof. Suppose not, so that there are only finitely many prime numbers, say $p_1, p_2, \ldots, p_k$. Consider the number

$$q = p_1 p_2 \cdots p_k + 1.$$

Since $q \ge 2$, by Theorem 2.7 there exists some prime, say $p_i$, which divides $q$. On the other hand, since $p_i \mid p_1 p_2 \cdots p_k$ and $p_i \nmid 1$, by property 3 of Proposition 2.4 it is the case that $p_i \nmid q$. This leads us to a contradiction. Hence there are infinitely many prime numbers.
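The construction in the proof is easy to try out. In this hedged illustration (the helper names are our own), starting from the primes $2, 3, 5, 7, 11, 13$, the number $q = 30031$ is not itself prime, since $30031 = 59 \cdot 509$. The proof only guarantees that the prime factors of $q$ are new, not that $q$ is prime.

```python
import math


def smallest_prime_factor(n: int) -> int:
    """Least prime divisor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def euclid_new_prime(primes):
    """Return a prime not in `primes`: the least prime factor of p1*p2*...*pk + 1."""
    q = math.prod(primes) + 1
    return smallest_prime_factor(q)


print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # q = 30031 = 59 * 509
```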

There are many alternative proofs of this fact, due to Euler, Erdős, Furstenberg, and other mathematicians (see the Wikipedia page for Euclid’s Theorem). At the end of this section, we will see the proof given by Euler.

We will now turn our attention to the Fundamental Theorem of Arithmetic, which states that any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.

Example 2.9. The number 60 can be written as $60 = 2^2 \cdot 3 \cdot 5$.
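Before proving the theorem, it may help to see how such a factorization is found in practice for small numbers. The following Python sketch (the function name is hypothetical) uses trial division. This is fine for small inputs but, as Section 3 points out, far too slow for very large integers.

```python
def prime_factorization(n: int):
    """Return [(p1, a1), ..., (pk, ak)] with p1 < ... < pk and n = p1**a1 * ... * pk**ak."""
    if n < 2:
        raise ValueError("n must be at least 2")
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            # All smaller primes were already divided out, so p is prime here.
            exponent = 0
            while n % p == 0:
                n //= p
                exponent += 1
            factors.append((p, exponent))
        p += 1
    if n > 1:
        # The remaining n has no divisor d with 2 <= d <= sqrt(n), hence it is prime.
        factors.append((n, 1))
    return factors


print(prime_factorization(60))  # 60 = 2^2 * 3 * 5
```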

In order to prove the theorem, we will utilize the following tools:

1. Well-Ordering Principle. Let $S$ be a non-empty subset of the natural numbers $\mathbb{N}$. Then $S$ contains a smallest element. To spell it out, there exists $x \in S$ such that the inequality $x \le y$ holds for every $y \in S$.

2. Generalized Euclid’s Lemma.² Let $p$ be a prime number and $a_1, a_2, \ldots, a_k$ be integers. If $p \mid a_1 a_2 \cdots a_k$, then there exists an index $i$, $1 \le i \le k$, such that $p \mid a_i$.

² We will prove this result in Corollary 3.15, once we have introduced the notion of a greatest common divisor.

Theorem 2.10. (The Fundamental Theorem of Arithmetic) Any integer greater than 1 can be written uniquely (up to reordering) as a product of primes.

Proof. We will start by proving that every integer greater than 1 can be written as a product of primes. Let $S$ denote the collection of all integers greater than 1 that cannot be written as a product of primes. Suppose that $S$ is not empty. Since $S \subseteq \mathbb{N}$ and $\mathbb{N}$ is well-ordered, $S$ contains a smallest element, say $n$. Clearly, $n$ is not a prime (a prime is a product of a single prime). Thus there exists a positive integer $d$ such that $d \mid n$, $d \ne 1$ and $d \ne n$. Then both $d$ and $n/d$ are strictly less than $n$ and greater than 1. Furthermore, $d$ and $n/d$ cannot both be products of primes, for otherwise $n = d \cdot (n/d)$ would be a product of primes. Thus either $d$ or $n/d$ is in $S$, which contradicts the fact that $n$ is the smallest element of $S$. This means that $S$ is empty, so every integer greater than 1 is a product of primes.

To prove uniqueness, consider two prime power decompositions

$$n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k} = q_1^{b_1} q_2^{b_2} \cdots q_\ell^{b_\ell},$$

where all exponents are positive. We will show that they are in fact the same.³ Without loss of generality, we may assume that $p_1 < p_2 < \ldots < p_k$ and $q_1 < q_2 < \ldots < q_\ell$. Pick some index $i$ such that $1 \le i \le k$. Since $p_i \mid n = q_1^{b_1} q_2^{b_2} \cdots q_\ell^{b_\ell}$, by Generalized Euclid’s Lemma there exists some index $j(i)$, $1 \le j(i) \le \ell$, such that

$$p_i \mid q_{j(i)}^{b_{j(i)}}.$$

Now apply Generalized Euclid’s Lemma once again to deduce that $p_i \mid q_{j(i)}$. Since $q_{j(i)}$ is prime, its only positive divisors are 1 and $q_{j(i)}$, which means that $p_i = q_{j(i)}$. Since $p_1 < p_2 < \ldots < p_k$, we see that $j(i_1) \ne j(i_2)$ whenever $i_1 \ne i_2$.

From the above we conclude that to each $i$ with $1 \le i \le k$ we can assign an index $j(i)$ with $1 \le j(i) \le \ell$, and distinct values of $i$ receive distinct indices $j(i)$. Hence there are at least as many $j$’s as there are $i$’s, so $k \le \ell$.

Apply Generalized Euclid’s Lemma once again, but with the roles of $p_i$ and $q_j$ reversed, thus observing that to each $j$ with $1 \le j \le \ell$ we can assign an index $i(j)$ with $1 \le i(j) \le k$, distinct values of $j$ receiving distinct indices $i(j)$, so $\ell \le k$. Since $k \le \ell$ and $\ell \le k$, it is the case that $k = \ell$.

Consequently, both lists consist of the same primes written in increasing order, so $p_i = q_i$ for every $i$. From here we deduce that $p_i^{a_i} \mid q_i^{b_i}$ and $q_i^{b_i} \mid p_i^{a_i}$: if, say, $a_i > b_i$, then cancelling $p_i^{b_i}$ from both decompositions would leave $p_i$ dividing $\prod_{j \ne i} q_j^{b_j}$, a product of primes different from $p_i$, contradicting Generalized Euclid’s Lemma (and symmetrically if $b_i > a_i$). By property 5 of Proposition 2.4, we have $p_i^{a_i} = q_i^{b_i}$. Since $p_i = q_i$, it is the case that $a_i = b_i$.

³ Note that this is not a proof by contradiction, for we do not assume that these prime power decompositions are distinct.

The fact that the prime factorization is unique was utilized by Euler to provide an alternative proof of Euclid’s Theorem.

Theorem 2.8. (Euclid’s Theorem, circa 300 BC) There are infinitely many prime numbers.

Proof. (Euler’s proof, 1700s) Consider the harmonic series

$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots.$$

It is well known that this series diverges. Now let $p > 1$ and recall the formula for the infinite geometric series:

$$\sum_{k=0}^{\infty} \frac{1}{p^k} = 1 + \frac{1}{p} + \frac{1}{p^2} + \cdots = \frac{1}{1 - 1/p}.$$

Using this formula, we observe that

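$$\prod_{p \text{ prime}} \frac{1}{1 - 1/p} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p} + \frac{1}{p^2} + \cdots\right) = \sum_{n=1}^{\infty} \frac{1}{n},$$

where the last equality holds by the Fundamental Theorem of Arithmetic: expanding the product, each term $1/n$ arises exactly once, from the prime factorization of $n$. If there were only finitely many primes, the product on the left-hand side would be finite, which contradicts the fact that the series on the right-hand side diverges.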
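Euler’s argument can be watched numerically. Expanding the finite product $\prod_{p \le x} (1 - 1/p)^{-1}$ produces every term $1/n$ with $n \le x$ (and more), so it is at least the partial harmonic sum $\sum_{n \le x} 1/n$, which grows without bound. The sketch below compares the two quantities; the function names are our own, and floating point is used only for illustration.

```python
import math


def primes_up_to(x: int):
    """Sieve of Eratosthenes: the list of primes p <= x."""
    if x < 2:
        return []
    is_prime = [True] * (x + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(x) + 1):
        if is_prime[p]:
            for multiple in range(p * p, x + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, x + 1) if is_prime[p]]


def euler_partial_product(x: int) -> float:
    """Product of 1 / (1 - 1/p) over the primes p <= x."""
    result = 1.0
    for p in primes_up_to(x):
        result /= 1.0 - 1.0 / p
    return result


def harmonic_partial_sum(x: int) -> float:
    """1 + 1/2 + ... + 1/x."""
    return sum(1.0 / n for n in range(1, x + 1))


for x in (10, 100, 1000, 10000):
    print(x, round(harmonic_partial_sum(x), 3), round(euler_partial_product(x), 3))
```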

3 Greatest Common Divisor. Least Common Multiple. Bezout’s Lemma.

When divisibility fails, we speak of quotients and remainders.

Theorem 3.1. (The Remainder Theorem) Let $a, b$ be integers, $a > 0$. Then there exist unique integers $q$ and $r$ such that

b = aq+ r,

where 0≤ r < a.

Proof. Recall that every real number $x$ “sits” between two consecutive integers; that is, there exists a unique integer $q$ such that

$$q \le x < q + 1.$$

Now set $x = b/a$. Then from the above inequality it follows that

$$aq \le b < aq + a.$$

But then

$$0 \le b - aq < a.$$

If we now put $r = b - aq$, then

$$b = aq + r$$

and $r$ satisfies $0 \le r < a$. The integers $q$ and $r$ are unique: if also $b = aq' + r'$ with $0 \le r' < a$, then $q' \le b/a < q' + 1$, so $q' = q$ by the uniqueness of $q$ above, and consequently $r' = b - aq' = r$.

Definition 3.2. Let $a, b$ be integers, $a > 0$. Write $b = aq + r$, where $0 \le r < a$. Then $a$ is called the modulus, $b$ is called the dividend, $q$ is called the quotient and $r$ is called the remainder.

Note that for $a > 0$ the expression $a \mid b$ simply means that in $b = aq + r$ the remainder $r$ is equal to zero.

Given $a$ and $b$, one can easily compute $q$ and $r$ using a calculator. First, compute $b/a$ and round it down to the nearest integer; the result is precisely $q = \lfloor b/a \rfloor$ (for $b \ge 0$ this is just the integer part of $b/a$). Then compute $r$ with the formula $r = b - aq$.
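In a programming language the same computation should be done with exact integer arithmetic rather than a floating-point calculator. In the minimal Python sketch below (the wrapper name is our own), the built-in `divmod` rounds the quotient down, so for $a > 0$ it returns exactly the $q$ and $r$ of Theorem 3.1, even when $b$ is negative. By contrast, C’s integer division truncates toward zero, giving $q = -2$ and $r = -1$ for $b = -7$, $a = 3$, which violates $0 \le r < a$.

```python
def quotient_remainder(a: int, b: int):
    """Return (q, r) with b = a*q + r and 0 <= r < a, as in Theorem 3.1 (a > 0)."""
    if a <= 0:
        raise ValueError("Theorem 3.1 requires a > 0")
    q, r = divmod(b, a)  # floor division: q = floor(b / a), computed exactly
    return q, r


print(quotient_remainder(300, 440))  # 440 = 300*1 + 140
print(quotient_remainder(3, -7))     # -7 = 3*(-3) + 2
```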

Definition 3.3. Let $a$ and $b$ be integers. An integer $d$ such that $d \mid a$ and $d \mid b$ is called a common divisor of $a$ and $b$. When at least one of $a$ and $b$ is not zero, the largest integer with this property is called the greatest common divisor of $a$ and $b$ and is denoted by $\gcd(a, b)$. When $a = b = 0$, we define $\gcd(a, b) := 0$.

The greatest common divisor of $a$ and $b$ possesses many interesting properties. Let us demonstrate several of them.

Proposition 3.4. Let

$$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k},$$

where $p_1, p_2, \ldots, p_k$ are distinct prime numbers and $e_1, e_2, \ldots, e_k, f_1, f_2, \ldots, f_k$ are integers $\ge 0$. Then

$$\gcd(a, b) = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}. \tag{1}$$

Further, any common divisor c of a and b must also divide gcd(a,b).

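Proof. Note that

$$g = p_1^{\min\{e_1, f_1\}} p_2^{\min\{e_2, f_2\}} \cdots p_k^{\min\{e_k, f_k\}}$$

divides both $a$ and $b$. By the Fundamental Theorem of Arithmetic, any positive divisor of $a$ or $b$ has the form

$$c = p_1^{g_1} p_2^{g_2} \cdots p_k^{g_k}$$

with $g_i \ge 0$, and such a $c$ with $g_i > \min\{e_i, f_i\}$ for some $i$ fails to divide either $a$ or $b$. Hence any positive common divisor $c$ satisfies $g_i \le \min\{e_i, f_i\}$ for all $i$, $1 \le i \le k$, so $c$ divides $g$ (and then so does $-c$). Maximizing the exponents for each index, we see that $g$ is in fact the greatest common divisor.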

Note that Proposition 3.4 suggests one formula for the computation of $\gcd(a, b)$. First, one has to factor $a$ and $b$ by writing them in the form

$$a = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad \text{and} \quad b = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k},$$

where the exponents $e_i$ and $f_j$ are allowed to be 0 (convince yourself that any two positive integers can be written in this form). Then one may simply apply formula (1). This approach works fine when the numbers are small and easily factored, but unfortunately no efficient factorization algorithm is known for very large numbers on classical computers (an efficient algorithm does exist for quantum computers, see Shor’s Algorithm). In fact, the security of the RSA public key cryptosystem is based on the difficulty of factorization.

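Example 3.5. Let us compute the greatest common divisor of 440 and 300. The prime factorizations are $440 = 2^3 \cdot 5 \cdot 11$ and $300 = 2^2 \cdot 3 \cdot 5^2$. We see that

$$440 = 2^3 \cdot 3^0 \cdot 5^1 \cdot 11^1 \quad \text{and} \quad 300 = 2^2 \cdot 3^1 \cdot 5^2 \cdot 11^0.$$

Thus

$$\gcd(440, 300) = 2^{\min\{3,2\}} \cdot 3^{\min\{0,1\}} \cdot 5^{\min\{1,2\}} \cdot 11^{\min\{1,0\}} = 2^2 \cdot 3^0 \cdot 5^1 \cdot 11^0 = 20.$$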
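The procedure of Proposition 3.4 and Example 3.5 can be sketched as follows, assuming trial-division factorization, which is adequate only for small inputs. The function names are hypothetical. Python’s `collections.Counter` conveniently reports exponent 0 for primes that do not occur, which matches the convention that $e_i$ and $f_i$ may be 0. The built-in `math.gcd` is used only as a cross-check.

```python
import math
from collections import Counter


def factorize(n: int) -> Counter:
    """Prime factorization of n >= 1 as a Counter {prime: exponent}, by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors


def gcd_via_factorization(a: int, b: int) -> int:
    """gcd of positive a, b by formula (1): keep each prime with the smaller exponent."""
    fa, fb = factorize(a), factorize(b)
    result = 1
    for p in set(fa) | set(fb):
        result *= p ** min(fa[p], fb[p])  # a prime missing from one side has exponent 0
    return result


print(gcd_via_factorization(440, 300), math.gcd(440, 300))  # Example 3.5
```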

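Exercise 3.6. Let $a$ and $b$ be integers. An integer $\ell$ is called a common multiple of $a$ and $b$ if it satisfies $a \mid \ell$ and $b \mid \ell$. When $a$ and $b$ are both non-zero, the smallest positive integer with this property is called the least common multiple of $a$ and $b$ and is denoted by $\mathrm{lcm}(a, b)$; when $a = 0$ or $b = 0$, the only common multiple is 0, and we set $\mathrm{lcm}(a, b) := 0$. Given the statement as in Proposition 3.4, prove that

$$\mathrm{lcm}(a, b) = p_1^{\max\{e_1, f_1\}} p_2^{\max\{e_2, f_2\}} \cdots p_k^{\max\{e_k, f_k\}} \tag{2}$$

and that every common multiple $c$ of $a$ and $b$ is divisible by $\mathrm{lcm}(a, b)$. That is, if $a \mid c$ and $b \mid c$, then $\mathrm{lcm}(a, b) \mid c$.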

Exercise 3.7. Let $a$ and $b$ be non-negative integers. Prove that

$$ab = \gcd(a, b)\,\mathrm{lcm}(a, b). \tag{3}$$

Exercise 3.8. Compute lcm(440,300) using formulas (2) and (3).

We will now address the following question: which integers $c$ can be written in the form $ax + by$, where $x$ and $y$ are integers? Speaking in fancy mathematical language, the identity $c = ax + by$ means that $c$ is an integer (linear) combination of $a$ and $b$.

Let us play around a little bit with the quantity $ax + by$. Clearly, $a$ can be written in this form, since $a = a \cdot 1 + b \cdot 0$. The same applies to $b$, since $b = a \cdot 0 + b \cdot 1$. The number 0 can always be represented in this form, since $0 = a \cdot 0 + b \cdot 0$. Note that, when at least one of $a$ and $b$ is not zero, $ax + by$ will always represent at least one positive integer, because $a \cdot a + b \cdot b > 0$. It turns out that the least positive integer $d$ represented by $ax + by$ is precisely the greatest common divisor of $a$ and $b$.

Example 3.9. Consider $a = 7$ and $b = 15$. Then the equation

$$7x + 15y = 1$$

has a solution $(x, y) = (-2, 1)$. In fact, it has infinitely many solutions, as every pair of the form $(x, y) = (-2 + 15n, 1 - 7n)$ with $n \in \mathbb{Z}$ is a solution, too.

However, when a = 7 and b = 14 the equation

7x+14y = 1

has no solutions, as the left-hand side is always divisible by 7, while the right-hand side is not. So the number 1 cannot be represented as an integer linear combination of 7 and 14. Hence the question: which numbers can be represented in this form?

Theorem 3.10. (Bezout’s lemma) Let $a, b$ be integers such that $a \ne 0$ or $b \ne 0$. If $d$ is the least positive integer combination of $a$ and $b$, then $d$ divides every integer combination of $a$ and $b$. Furthermore, $d = \gcd(a, b)$.

Proof. By assumption, $d = ax + by > 0$ for some $x, y \in \mathbb{Z}$. Now consider some integer combination

$$c = as + bt,$$

where $s, t \in \mathbb{Z}$. We want to show that $d \mid c$. By Theorem 3.1,

$$c = dq + r$$

for some $q, r \in \mathbb{Z}$, where $0 \le r < d$. Thus

$$0 \le r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq) < d.$$

We see that $r$ is an integer combination of $a$ and $b$ which is non-negative and less than $d$. Because $d$ is the least positive integer combination of $a$ and $b$, the only option is that $r = 0$. Hence $d \mid c$. In particular, $d \mid a$ and $d \mid b$, because $a, b$ are integer combinations of $a$ and $b$.

We will now show that $d = \gcd(a, b)$. On one hand, we know that $d \mid a$ and $d \mid b$, so $d$ is a common divisor of $a$ and $b$. By Proposition 3.4, $d$ must divide the greatest common divisor of $a$ and $b$, i.e. $d \mid \gcd(a, b)$. On the other hand, writing $a = \gcd(a, b)\, a_1$ and $b = \gcd(a, b)\, b_1$ with $a_1, b_1 \in \mathbb{Z}$, we get $d = ax + by = \gcd(a, b)(a_1 x + b_1 y)$, so $\gcd(a, b) \mid d$. Since $d \mid \gcd(a, b)$ and $\gcd(a, b) \mid d$, by property 5 of Proposition 2.4 we conclude that $d = \pm \gcd(a, b)$, and since both are positive, $d = \gcd(a, b)$.
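Bezout’s lemma can be checked experimentally on small inputs by searching directly for the least positive value of $ax + by$. In this minimal sketch the names are our own. It assumes, as a standard fact not proved in these notes, that searching $|x|, |y| \le \max\{|a|, |b|\}$ is enough to find a minimizing pair. It is a sanity check only, not a way to compute $\gcd(a, b)$; Section 5 gives an efficient method.

```python
import math


def least_positive_combination(a: int, b: int) -> int:
    """Least positive value of a*x + b*y over the window |x|, |y| <= max(|a|, |b|)."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    bound = max(abs(a), abs(b))  # assumed large enough to contain a minimizing (x, y)
    values = (a * x + b * y
              for x in range(-bound, bound + 1)
              for y in range(-bound, bound + 1))
    return min(v for v in values if v > 0)


# Theorem 3.10 predicts that the last two columns agree.
for a, b in [(7, 15), (7, 14), (440, 300)]:
    print(a, b, least_positive_combination(a, b), math.gcd(a, b))
```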

With the help of Theorem 3.10 we can answer the question of which numbers can be represented in the form $ax + by$. Since

$$\gcd(a, b) = ax + by$$

for some $x, y \in \mathbb{Z}$ and $\gcd(a, b)$ is the smallest positive integer representable in this form, we see that any integer $c$ divisible by $\gcd(a, b)$ can be written in this way, since for some $k \in \mathbb{Z}$ it is the case that

$$c = k \cdot \gcd(a, b) = k(ax + by) = a(kx) + b(ky).$$

On the other hand, if $\gcd(a, b) \nmid c$, then $c$ cannot be written as an integer combination of $a$ and $b$, since by Theorem 3.10 every integer combination of $a$ and $b$ is divisible by $\gcd(a, b)$.

We will now use Bezout’s lemma to establish a few more properties of prime numbers. In particular, we will prove Euclid’s lemma, which we already saw in Section 2.

Definition 3.11. Let $a$ and $b$ be integers. We say that $a$ and $b$ are coprime if $\gcd(a, b) = 1$.

Proposition 3.12. Let $a, b, c$ be integers with $a, b$ coprime. If $a \mid c$ and $b \mid c$, then $ab \mid c$.

Proof. Since $a$ and $b$ are coprime, by Bezout’s lemma there exist integers $x$ and $y$ such that

$$ax + by = 1.$$

Thus

$$a(cx) + b(cy) = c.$$

After dividing both sides of the above equality by $ab$ we obtain

$$\frac{c}{b} \cdot x + \frac{c}{a} \cdot y = \frac{c}{ab}.$$

Since $a \mid c$ and $b \mid c$, the quantity on the left-hand side of the above equality is an integer. Hence the same applies to the quantity on the right-hand side, so $c/(ab)$ is an integer.

Proposition 3.13. Let $a, b, c$ be integers with $a, b$ coprime. If $a \mid bc$, then $a \mid c$.

Proof. Since $a$ and $b$ are coprime, by Bezout’s lemma there exist integers $x$ and $y$ such that

$$ax + by = 1.$$

Thus

$$a(cx) + b(cy) = c.$$

After dividing both sides of the above equality by $a$ we obtain

$$c \cdot x + \frac{bc}{a} \cdot y = \frac{c}{a}.$$

Since $a \mid bc$, the quantity on the left-hand side of the above equality is an integer. Hence the same applies to the quantity on the right-hand side, so $c/a$ is an integer.

Proposition 3.14. (Euclid’s lemma) If $p$ is prime and $p \mid ab$ for some integers $a, b$, then $p \mid a$ or $p \mid b$.⁶

⁶ The proof is from Frank Zorzitto’s “A Taste of Number Theory” (see Proposition 2.4 on page 31).

Proof. Suppose $p \nmid a$. Let $d = \gcd(p, a)$. Since $d \mid p$ and $d > 0$, the definition of primes forces $d = 1$ or $d = p$, and since $p \nmid a$, it must be that $d = 1$, so $p$ and $a$ are coprime. From Proposition 3.13 it follows that $p \mid b$.

Corollary 3.15. (Generalized Euclid’s lemma) Let $p$ be a prime number and $a_1, a_2, \ldots, a_k$ be integers. If $p \mid a_1 a_2 \cdots a_k$, then there exists an index $i$, $1 \le i \le k$, such that $p \mid a_i$.

Proof. The result clearly holds for $k = 1$, so assume that $k \ge 2$. If $p \mid a_1$, we are done. If not, then $p$ and $a_1$ are coprime (as in the proof of Proposition 3.14), so by Proposition 3.13 it must be the case that $p \mid a_2 a_3 \cdots a_k$. If $p \mid a_2$ we are done. If not, then $p$ and $a_2$ are coprime, so by Proposition 3.13 it must be the case that $p \mid a_3 a_4 \cdots a_k$. Proceeding in the same fashion, we eventually reach $p \mid a_{k-1} a_k$, where we may apply Euclid’s lemma to draw the desired conclusion.

Exercise 3.16. Show that the coprimality condition can be removed from neither Proposition 3.12 nor Proposition 3.13.

4 Diophantine Equations. The Linear Diophantine Equation $ax + by = c$

An equation is called Diophantine if we are only concerned with its integer solutions. Any equation can be converted into its Diophantine form. For example, instead of looking at $x^2 + y^2 = 1$ for $(x, y) \in \mathbb{R}^2$ we may restrict our attention to $(x, y) \in \mathbb{Z}^2$. Note that in the former case there are infinitely many solutions (in fact, there are uncountably many of them). These are all the points lying on the circle centered at the origin with radius 1. However, if we look at $(x, y) \in \mathbb{Z}^2$, then there are only four solutions, namely $(\pm 1, 0)$ and $(0, \pm 1)$. (Do you see why?)

Sometimes, converting an equation into its Diophantine form is not very interesting. This is the case for the equation $x^2 + y^2 = 1$. Another example is the equation $y = x\sqrt{2}$, which has no integer solutions aside from $(0, 0)$ due to the irrationality of $\sqrt{2}$. But sometimes understanding integer solutions can get difficult, even extremely difficult. The reason is that, when considering an equation over the real numbers $\mathbb{R}$ or, even better, over the complex numbers $\mathbb{C}$, there are many analytical tools that we can utilize. Say, if we are looking at the equation $f(x) = 0$ for $x \in \mathbb{R}$, we might utilize the fact that $f(x)$ is continuous, or differentiable, or maybe even smooth. Another reason why equations can be easier to analyze over $\mathbb{R}$, $\mathbb{C}$, or $\mathbb{Q}$ is that all three are fields.

Quite often we can say a lot about a Diophantine equation by “lifting” it and considering it, for example, over $\mathbb{Q}$: if there are only finitely many solutions over $\mathbb{Q}$, then there are only finitely many solutions over $\mathbb{Z}$. Such a technique applies to hyperelliptic equations, like $y^2 = x^5 + 2$ (see Faltings’ Theorem). However, sometimes there are infinitely many solutions over $\mathbb{Q}$, but only finitely many (or even none!) over $\mathbb{Z}$. The fact that $\mathbb{Q}$ is a field can be utilized to prove that there are infinitely many rational solutions to the elliptic equations

$$y^2 = x^3 + 46,$$

$$y^2 = x^3 - 2.$$

Note that the first equation has a solution $(-7/4, 51/8)$, while the second equation has a solution $(129/100, 383/1000)$. Unlike $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, the ring of integers $\mathbb{Z}$ is not closed under division by a non-zero element, so we need to use different techniques to study it. For example, the equation $y^2 = x^3 + 46$ has no solutions in integers, while the equation $y^2 = x^3 - 2$ has two solutions, $(3, \pm 5)$.
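A computer search can list the integer solutions in a bounded range, although it can never prove that no further solutions exist. The sketch below (the function name is hypothetical) looks for integer points on $y^2 = x^3 + k$ with $|x| \le X$. It then uses exact rational arithmetic from Python’s `fractions` module to confirm the two rational points quoted above.

```python
import math
from fractions import Fraction


def integer_points(k: int, x_max: int):
    """All integer (x, y) with y^2 = x^3 + k and |x| <= x_max (a finite search only)."""
    points = []
    for x in range(-x_max, x_max + 1):
        rhs = x ** 3 + k
        if rhs < 0:
            continue  # a square cannot be negative
        y = math.isqrt(rhs)
        if y * y == rhs:
            points.extend([(x, -y), (x, y)] if y > 0 else [(x, 0)])
    return points


print(integer_points(-2, 1000))  # only (3, -5) and (3, 5) in this range

# Exact checks of the rational solutions mentioned in the text.
print(Fraction(51, 8) ** 2 == Fraction(-7, 4) ** 3 + 46)
print(Fraction(383, 1000) ** 2 == Fraction(129, 100) ** 3 - 2)
```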

Example 4.1. Let $a, b, c, d, n$ be fixed integers, $n \ge 3$, and $x, y, z$ be integer variables. Here are several examples of Diophantine equations:

$ax + by = c$: linear Diophantine equation in two variables;
$x^2 + y^2 = z^2$: Pythagorean equation;
$x^2 - dy^2 = \pm 1$: Pell equation;
$y^2 = x^3 + ax + b$: Weierstrass equation;
$ax^n + by^n = c$: Thue equation;
$ax^n + by^n = cz^n$: Fermat type equation;
$x^2 + 7 = 2^y$: Ramanujan-Nagell equation.

When analyzing equations, we would like to answer the following questions:

1. Do solutions exist?

2. If solutions exist, how many of them are there? (finitely many, countably many, uncountably many)

3. What are the solutions?

4. Are there algorithms which can generate solutions?


We address the same questions when analyzing Diophantine equations. Of course,in this case the number of solutions will be at most countable.

We will now turn our attention to the linear Diophantine equation in two variables

ax+by = c.

Here $a, b, c$ are fixed integers and $x, y$ are integer variables. We will fully classify the solutions to this equation.

The question of existence of a solution was fully resolved at the end of Section 3, where we established that (for $a$, $b$ not both zero) solutions exist if and only if $\gcd(a, b) \mid c$. Hence the only thing left for us to do is to find all the solutions when they exist, and to come up with a procedure for their computation. As the following proposition shows, by knowing one solution to $ax + by = c$ we can deduce all of the solutions.

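Proposition 4.2. Let $a, b, c$ be integers with $a$ and $b$ not both zero. Let $(x, y)$ be a pair of integers such that

$$ax + by = c.$$

Then any pair of integers $(x', y')$ such that $c = ax' + by'$ must be of the form

$$(x', y') = \left(x - \frac{b}{\gcd(a, b)}\, n,\; y + \frac{a}{\gcd(a, b)}\, n\right),$$

where $n$ ranges over the integers. Conversely, every such pair is a solution.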

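Proof. Suppose that $(x, y)$ and $(x', y')$ are integer pairs such that

$$c = ax + by = ax' + by'.$$

Then $a(x - x') = b(y' - y)$. Dividing both sides by $\gcd(a, b)$, we see that $\frac{a}{\gcd(a, b)}$ divides $\frac{b}{\gcd(a, b)}(y' - y)$. The integers $\frac{a}{\gcd(a, b)}$ and $\frac{b}{\gcd(a, b)}$ are coprime (divide both sides of Bezout’s identity $ax_0 + by_0 = \gcd(a, b)$ by $\gcd(a, b)$), so by Proposition 3.13

$$\frac{a}{\gcd(a, b)} \mid (y' - y).$$

This means that

$$y' = y + n\, \frac{a}{\gcd(a, b)}$$

for some $n \in \mathbb{Z}$. Substituting this relation into the equation $a(x - x') = b(y' - y)$, we see that

$$a(x - x') = n\, \frac{ab}{\gcd(a, b)},$$

which, for $a \ne 0$, means that

$$x' = x - n\, \frac{b}{\gcd(a, b)}.$$

(If $a = 0$, then $b \ne 0$ and we argue symmetrically, starting from $b \mid a(x - x')$.) Conversely, substituting such a pair into $ax' + by'$ gives $ax + by = c$.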

Thus we see that from one solution to $ax + by = c$ (if it exists) we may produce all solutions once we compute $\gcd(a, b)$. In order to determine one solution to this equation, we use the Extended Euclidean Algorithm. This algorithm allows one to compute a pair of integers $(x, y)$ such that

ax+by = gcd(a,b).

This allows us to produce a solution to $ax + by = c$ whenever one exists: in that case $\gcd(a, b) \mid c$, so for some integer $k$ we have

c = k gcd(a,b) = k(ax+by) = a(kx)+b(ky).

We may then use Proposition 4.2 to compute all solutions to the linear Diophantine equation $ax + by = c$. We will learn about the Extended Euclidean Algorithm in the following section.
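Looking ahead, here is a hedged Python sketch that combines this recipe with Proposition 4.2. It uses the standard iterative form of the Extended Euclidean Algorithm, which these notes have not yet described, so treat `extended_gcd` (a name of our own) as a preview rather than the notes’ own procedure. For $7x + 15y = 1$ it recovers the particular solution $(-2, 1)$ of Example 3.9.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    # Invariant: old_r == a*old_x + b*old_y and r == a*x + b*y.
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # make the gcd non-negative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def linear_diophantine_solutions(a: int, b: int, c: int, n_max: int = 3):
    """Solutions of a*x + b*y = c for n = -n_max..n_max via Proposition 4.2, or None."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be zero")
    g, x0, y0 = extended_gcd(a, b)
    if c % g != 0:
        return None  # solvable if and only if gcd(a, b) | c
    k = c // g
    x0, y0 = k * x0, k * y0  # one particular solution
    return [(x0 - (b // g) * n, y0 + (a // g) * n) for n in range(-n_max, n_max + 1)]


print(extended_gcd(7, 15))                        # (1, -2, 1), as in Example 3.9
print(linear_diophantine_solutions(7, 15, 1, 1))  # the solutions for n = -1, 0, 1
print(linear_diophantine_solutions(7, 14, 1))     # None: no solutions
```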

Exercise 4.3. Let $a_1, a_2, \ldots, a_k$ be integers at least one of which is not 0. The largest integer $d$ such that $d \mid a_i$ for all $i$, $1 \le i \le k$, is called the greatest common divisor of $a_1, a_2, \ldots, a_k$. It is denoted by $\gcd(a_1, a_2, \ldots, a_k)$. When $a_1 = a_2 = \ldots = a_k = 0$, we define $\gcd(a_1, a_2, \ldots, a_k) := 0$.

Determine the formulas for $\gcd(a_1, a_2, \ldots, a_k)$ and $\mathrm{lcm}(a_1, a_2, \ldots, a_k)$ that are analogous to (1) and (2). Does a formula similar to (3) hold? Explain why or why not.

Exercise 4.4. Let $a_1, a_2, \ldots, a_k$ be integers. We say that $c \in \mathbb{Z}$ can be represented as an integer linear combination of $a_1, a_2, \ldots, a_k$ if there exist $x_1, x_2, \ldots, x_k \in \mathbb{Z}$ such that

$$c = a_1 x_1 + a_2 x_2 + \ldots + a_k x_k.$$

Given integers $a_1, a_2, \ldots, a_k$, which integers can be written as an integer combination of $a_1, a_2, \ldots, a_k$?

5 Euclidean Algorithm. Extended Euclidean Algorithm

Let $a, b$ be integers at least one of which is not 0. In Section 3, we found one formula for the computation of $\gcd(a, b)$, namely (1). Though useful, it is not very efficient, as no fast algorithm for integer factorization is known.⁷ However, there exists a much more efficient algorithm to compute $\gcd(a, b)$, described by Euclid in his fundamental work Elements. It is called the Euclidean Algorithm.

We begin our explorations by first showing yet another interesting property of the greatest common divisor. In particular, if $a, b$ are integers at least one of which is not zero, then $\gcd(a, b)$ does not change if we replace $b$ with $b + ak$, where $k$ is an arbitrary integer.

Proposition 5.1. Suppose $a, b$ are two integers. Then for any integer $k$ it is the case that

gcd(a,b) = gcd(a,b+ak).

Proof. Let $d_1 = \gcd(a, b)$ and $d_2 = \gcd(a, b + ak)$. We will show that $d_1 \mid d_2$ and $d_2 \mid d_1$.

Since $d_1 \mid a$ and $d_1 \mid b$, it is the case that $d_1 \mid (b + ak)$. Since $d_1$ is a common divisor of $a$ and $b + ak$, by Proposition 3.4 it must divide their greatest common divisor $d_2$. Thus $d_1 \mid d_2$.

Now observe that $d_2 \mid a$ and $d_2 \mid b + ak$. Thus $a = d_2 r_1$ and $b + ak = d_2 r_2$ for some $r_1, r_2 \in \mathbb{Z}$. But then

$$b = d_2 r_2 - ak = d_2 r_2 - d_2 r_1 k = d_2(r_2 - r_1 k).$$

Hence $d_2 \mid b$, which means that $d_2$ is a common divisor of $a$ and $b$. By Proposition 3.4 it must divide their greatest common divisor $d_1$. Thus $d_2 \mid d_1$. Since $d_1 \mid d_2$ and $d_2 \mid d_1$, property 5 of Proposition 2.4 gives $d_1 = \pm d_2$, and since both are non-negative, $d_1 = d_2$.

We will now describe the Euclidean Algorithm. Let a,b be positive integerssuch that ab 6= 0, since when ab = 0 it is easy to compute gcd(a,b). Without lossof generality, we suppose that a > b (if a < b we may interchange a and b, and ifa = b then gcd(a,b) = a). We define the finite sequence of integers a1,a2, . . . asfollows. Set r1 = a, r2 = b, and write

$$r_1 = q_1r_2 + r_3.$$

Note that the remainder $r_3$ satisfies $0 \le r_3 < r_2 = b$. Then compute

$$r_2 = q_2r_3 + r_4,$$

$$r_3 = q_3r_4 + r_5,$$

7 By “fast” we mean “polynomial time”.

and so on, as long as the remainder is non-zero. Since $r_1 > r_2 > r_3 > \cdots$ is a strictly decreasing sequence of non-negative integers, after finitely many steps we obtain the remainder 0. Let $r_n$ denote the last non-zero term of the sequence, that is, its smallest positive term. We will show that $r_n$ is precisely $\gcd(a,b)$.

Why does this process allow one to compute $\gcd(a,b)$? By Proposition 5.1 (applied with the roles of the two arguments interchanged),

$$\gcd(r_1, r_2) = \gcd(r_1 - q_1r_2, r_2) = \gcd(r_3, r_2).$$

Let us compute one more step:

$$\gcd(r_3, r_2) = \gcd(r_3, r_2 - q_2r_3) = \gcd(r_3, r_4).$$

Proceeding in the same fashion, we see that

$$\gcd(a,b) = \gcd(r_1, r_2) = \gcd(r_2, r_3) = \cdots = \gcd(r_i, r_{i+1})$$

for all $i$ such that $1 \le i \le n-1$. The calculations get easier with each step, and in the end we obtain

$$\gcd(a,b) = \gcd(r_1, r_2) = \cdots = \gcd(r_{n-1}, r_n) = \gcd(r_n, 0) = r_n.$$

Theorem 5.2. Let $a, b$ be positive integers with $a > b$. Let $r_1 > r_2 > \cdots$ be the finite sequence defined above, and let $r_n$ be the smallest positive integer in this sequence. Then $r_n = \gcd(a,b)$.

Proof. Recall that $d = \gcd(a,b) = \gcd(r_i, r_{i+1})$ for $i = 1, 2, \ldots, n-1$. Now consider the last equation in which $r_n$ appears as a remainder,

$$r_{n-2} = q_{n-2}r_{n-1} + r_n.$$

The remainder in the next division,

$$r_{n-1} = q_{n-1}r_n + r_{n+1},$$

satisfies $0 \le r_{n+1} < r_n$. Since $r_n$ is the smallest positive integer in this sequence, the only possibility is $r_{n+1} = 0$, which means that $r_n$ divides $r_{n-1}$. But then

$$r_n = \gcd(r_{n-1}, r_n) = \gcd(r_{n-2}, r_{n-1}) = \cdots = \gcd(r_1, r_2) = \gcd(a,b).$$
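As an illustration only (the notes themselves contain no code), here is a minimal Python sketch of the Euclidean Algorithm as just described. The function name `euclid_gcd` is hypothetical; it keeps only the two most recent terms $r_{i-1}, r_i$ of the sequence and returns the last non-zero remainder $r_n$.

```python
def euclid_gcd(a: int, b: int) -> int:
    """Return gcd(a, b) for positive integers a, b by repeated division with remainder."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    r_prev, r_curr = max(a, b), min(a, b)  # r_1 >= r_2
    while r_curr != 0:
        # r_{i-1} = q * r_i + r_{i+1}; only the last two terms are kept
        r_prev, r_curr = r_curr, r_prev % r_curr
    return r_prev  # the last non-zero remainder r_n


print(euclid_gcd(440, 300))  # 20
print(euclid_gcd(233, 144))  # 1
```

Storing only two terms is a design choice: the proof needs the whole sequence, but the computation never looks further back than $r_{i-1}$.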

Consider several examples.


Example 5.3. Let us compute $\gcd(440, 300)$ using the Euclidean Algorithm. We have

$$\begin{aligned} 440 &= 1 \cdot 300 + 140,\\ 300 &= 2 \cdot 140 + 20,\\ 140 &= 7 \cdot 20 + 0. \end{aligned}$$

Thus $\gcd(440, 300) = 20$.

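Example 5.4. Let us compute $\gcd(233, 144)$ using the Euclidean Algorithm. We have

$$\begin{aligned} 233 &= 1 \cdot 144 + 89,\\ 144 &= 1 \cdot 89 + 55,\\ 89 &= 1 \cdot 55 + 34,\\ 55 &= 1 \cdot 34 + 21,\\ 34 &= 1 \cdot 21 + 13,\\ 21 &= 1 \cdot 13 + 8,\\ 13 &= 1 \cdot 8 + 5,\\ 8 &= 1 \cdot 5 + 3,\\ 5 &= 1 \cdot 3 + 2,\\ 3 &= 1 \cdot 2 + 1,\\ 2 &= 2 \cdot 1 + 0. \end{aligned}$$

Thus $\gcd(233, 144) = 1$.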

Note that both numbers in Example 5.4 are smaller than those in Example 5.3. Nevertheless, in Example 5.4 the Euclidean Algorithm terminated after 11 divisions, while in Example 5.3 it terminated after 3. This is because in Example 5.4 we chose our integers to be consecutive Fibonacci numbers. Recall that the Fibonacci numbers are defined recursively by $F_1 = 1$, $F_2 = 2$ and $F_n = F_{n-1} + F_{n-2}$ for $n \ge 3$; with this indexing, $233 = F_{12}$ and $144 = F_{11}$. It turns out that consecutive Fibonacci numbers are the worst case for the Euclidean Algorithm: they are the smallest inputs that require a given number of steps. Nevertheless, the algorithm works in polynomial time. In 1844, Gabriel Lamé proved that the number of steps required for the completion of the Euclidean Algorithm is at most five times the number of decimal digits of $\min\{a,b\}$, so the number of steps grows only linearly with the number of digits of the input.
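The step counts above and Lamé's bound are easy to check numerically. The following Python sketch is illustrative only: `euclid_steps` and `fibonacci` are hypothetical helper names, a "step" is taken to mean one division with remainder, and the Fibonacci indexing $F_1 = 1$, $F_2 = 2$ of these notes is assumed.

```python
def euclid_steps(a: int, b: int) -> int:
    """Number of divisions with remainder performed by the Euclidean Algorithm on a >= b > 0."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps


def fibonacci(n: int) -> int:
    """F_1 = 1, F_2 = 2, F_n = F_{n-1} + F_{n-2}, as indexed in these notes."""
    if n == 1:
        return 1
    f_prev, f_curr = 1, 2  # F_1, F_2
    for _ in range(n - 2):
        f_prev, f_curr = f_curr, f_prev + f_curr
    return f_curr


print(euclid_steps(440, 300))  # 3
print(euclid_steps(233, 144))  # 11
# Lamé: at most five steps per decimal digit of the smaller input
print(euclid_steps(233, 144) <= 5 * len(str(144)))  # True
```

Running `euclid_steps(fibonacci(n + 1), fibonacci(n))` for several $n$ is a quick way to experiment with Exercise 5.5 before proving it.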

Exercise 5.5. Let $F_1 = 1$, $F_2 = 2$, and for an integer $n \ge 3$ define $F_n = F_{n-1} + F_{n-2}$. The number $F_n$ is called the $n$-th Fibonacci number. Prove that the computation of $\gcd(F_{n+1}, F_n)$ with the Euclidean Algorithm requires $n$ steps.

Above we managed to compute $\gcd(a,b)$. Still, we do not know how to produce integer solutions $(x,y)$ to the Diophantine equation

$$ax + by = \gcd(a,b).$$

This can be achieved with the help of the Extended Euclidean Algorithm. It is essentially the same as the Euclidean Algorithm, but along with the sequence $r_1, r_2, \ldots$ we also keep track of two additional sequences $s_1, s_2, \ldots$ and $t_1, t_2, \ldots$. The algorithm is as follows. Set

$$r_1 = a,\ r_2 = b; \qquad s_1 = 1,\ s_2 = 0; \qquad t_1 = 0,\ t_2 = 1.$$

For $i \ge 2$, let $q_{i-1}$ be the quotient in the division of $r_{i-1}$ by $r_i$, and compute

$$r_{i+1} = r_{i-1} - q_{i-1}r_i; \qquad s_{i+1} = s_{i-1} - q_{i-1}s_i; \qquad t_{i+1} = t_{i-1} - q_{i-1}t_i.$$

Note that, of the three formulas above, the Euclidean Algorithm computes only the first one. We claim that, if $r_n$ is the last non-zero term of the sequence $r_1, r_2, \ldots$, then the integers $s_n$ and $t_n$ satisfy $as_n + bt_n = \gcd(a,b)$.

Theorem 5.6. Let $a, b$ be positive integers with $a > b$. Let $r_1 > r_2 > \cdots > r_n > 0$, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_n$ be the sequences defined above. Then

$$as_n + bt_n = \gcd(a,b).$$

Proof. We claim that the equation

$$as_i + bt_i = r_i$$

is satisfied for all $i = 1, 2, \ldots, n$. Since Theorem 5.2 asserts that $r_n = \gcd(a,b)$, this would imply the result. To prove this claim, we proceed by induction on $i$.

Base case. According to our setup, $r_1 = a$, $r_2 = b$, $s_2 = t_1 = 0$ and $s_1 = t_2 = 1$. Thus $as_1 + bt_1 = r_1$ and $as_2 + bt_2 = r_2$, so the claim holds for $i = 1, 2$.

Induction hypothesis. Assume that $as_i + bt_i = r_i$ for $i = k-1, k$, where $2 \le k \le n-1$.

Induction step. We will demonstrate that the claim holds for $i = k+1$:

$$\begin{aligned} r_{k+1} &= r_{k-1} - q_{k-1}r_k \\ &= (as_{k-1} + bt_{k-1}) - (as_k + bt_k)q_{k-1} \\ &= a(s_{k-1} - q_{k-1}s_k) + b(t_{k-1} - q_{k-1}t_k) \\ &= as_{k+1} + bt_{k+1}. \end{aligned}$$

We conclude that $as_i + bt_i = r_i$ for all $i = 1, 2, \ldots, n$, as claimed.
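The three recurrences translate directly into code. In this Python sketch (the name `extended_euclid` is hypothetical), only the last two terms of each sequence are stored, and each loop iteration performs one step $r_{i+1} = r_{i-1} - q_{i-1}r_i$ together with the matching updates of $s$ and $t$.

```python
def extended_euclid(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t = g, for positive a, b."""
    r_prev, r_curr = a, b  # r_1, r_2
    s_prev, s_curr = 1, 0  # s_1, s_2
    t_prev, t_curr = 0, 1  # t_1, t_2
    while r_curr != 0:
        q = r_prev // r_curr  # quotient q_{i-1} of r_{i-1} by r_i
        r_prev, r_curr = r_curr, r_prev - q * r_curr
        s_prev, s_curr = s_curr, s_prev - q * s_curr
        t_prev, t_curr = t_curr, t_prev - q * t_curr
    # r_prev is now r_n, the last non-zero remainder
    return r_prev, s_prev, t_prev


print(extended_euclid(440, 300))  # (20, -2, 3)
```

The invariant $as_i + bt_i = r_i$ from the proof of Theorem 5.6 holds after every iteration, which makes it a convenient property to assert when testing such a function.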


Using the Extended Euclidean Algorithm, we can finally solve the Diophantine equation $ax + by = c$.

Example 5.7. Let us determine all solutions to the Diophantine equation

$$440x + 300y = 80$$

using the Extended Euclidean Algorithm. Set

$$r_1 = 440,\ r_2 = 300; \qquad s_1 = 1,\ s_2 = 0; \qquad t_1 = 0,\ t_2 = 1.$$

Step 1. $440 = 1 \cdot 300 + 140$, so $q_1 = 1$ and $r_3 = 140$. Thus

$$s_3 = s_1 - q_1s_2 = 1 - 1 \cdot 0 = 1; \qquad t_3 = t_1 - q_1t_2 = 0 - 1 \cdot 1 = -1.$$

Step 2. $300 = 2 \cdot 140 + 20$, so $q_2 = 2$ and $r_4 = 20$. Thus

$$s_4 = s_2 - q_2s_3 = 0 - 2 \cdot 1 = -2; \qquad t_4 = t_2 - q_2t_3 = 1 - 2 \cdot (-1) = 3.$$

Step 3. Since $20 \mid 140$, the algorithm terminates.

We conclude that

$$440 \cdot (-2) + 300 \cdot 3 = 20.$$

After multiplying both sides of the above equality by 4, we obtain a solution $(x,y) = (-8, 12)$ to the Diophantine equation $440x + 300y = 80$. By Proposition 4.2, if $a = 440$ and $b = 300$, then all solutions to this Diophantine equation must be of the form

$$\left(x - \frac{b}{\gcd(a,b)}\,n,\ y + \frac{a}{\gcd(a,b)}\,n\right) = (-8 - 15n,\ 12 + 22n),$$

where $n$ ranges over the integers.
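Putting the pieces of Example 5.7 together, a Python sketch of the whole procedure might look as follows. The function `solve_linear_diophantine` and its return convention $(x_0, y_0, dx, dy)$, meaning solutions $(x_0 + dx \cdot n,\ y_0 + dy \cdot n)$, are hypothetical choices made for this illustration, and positive $a, b$ are assumed.

```python
def solve_linear_diophantine(a: int, b: int, c: int):
    """Parametrize the integer solutions of a*x + b*y = c for positive a, b.

    Returns (x0, y0, dx, dy) so that the solutions are (x0 + dx*n, y0 + dy*n), n in Z,
    or None when gcd(a, b) does not divide c.
    """
    # Extended Euclidean Algorithm: afterwards a*s + b*t = g = gcd(a, b)
    r_prev, r_curr, s_prev, s_curr, t_prev, t_curr = a, b, 1, 0, 0, 1
    while r_curr != 0:
        q = r_prev // r_curr
        r_prev, r_curr = r_curr, r_prev - q * r_curr
        s_prev, s_curr = s_curr, s_prev - q * s_curr
        t_prev, t_curr = t_curr, t_prev - q * t_curr
    g, s, t = r_prev, s_prev, t_prev
    if c % g != 0:
        return None
    k = c // g  # scale a*s + b*t = g up to c
    return s * k, t * k, -(b // g), a // g


print(solve_linear_diophantine(440, 300, 80))  # (-8, 12, -15, 22)
```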

Exercise 5.8. (a) Let $a, b, c$ be integers such that $a \ne 0$ or $b \ne 0$, and $\gcd(a,b) \mid c$. Consider the Diophantine equation $ax + by = c$. Prove that there exists a unique solution $(x,y)$ such that $0 \le x < b/\gcd(a,b)$ and a unique solution $(x', y')$ such that $0 \le y' < a/\gcd(a,b)$;

(b) For $(x,y) \in \mathbb{R}^2$, let $\|(x,y)\| := \sqrt{x^2 + y^2}$ denote the Euclidean norm. Let $a, b, c$ be integers such that $c \ne 0$ and $\gcd(a,b) = 1$, and consider the linear Diophantine equation

$$ax + by = c.$$

Prove that the solution $(x,y) \in \mathbb{Z}^2$ of the above equation that corresponds to the smallest value of $\|(x,y)\|$ satisfies

$$\frac{|c|}{\|(a,b)\|} \le \|(x,y)\| \le \frac{|c|}{\|(a,b)\|} + \frac{\|(a,b)\|}{2}.$$

6 Congruences. The Double-and-Add Algorithm

Throughout this section, we fix a positive integer n, which we call the modulus.

Definition 6.1. We say that integers $a$ and $b$ are congruent modulo $n$ if $n$ divides $a - b$. We denote this by

$$a \equiv b \pmod n.$$

To say that $a$ and $b$ are congruent modulo $n$ is the same as to say that their remainders after division by $n$ are the same. That is, if

$$a = q_1n + r_1 \quad \text{and} \quad b = q_2n + r_2,$$

where $0 \le r_1, r_2 < n$, then $a \equiv b \pmod n$ if and only if $r_1 = r_2$. A rather surprising fact is that the congruence relation $\equiv$ behaves much like the equality relation $=$.

Proposition 6.2. The congruence relation $\equiv$ is an equivalence relation. That is, it satisfies the following three properties:

(a) Reflexivity. If $a$ is any integer, then $a \equiv a \pmod n$;

(b) Symmetry. If $a \equiv b \pmod n$, then $b \equiv a \pmod n$;

(c) Transitivity. If $a \equiv b \pmod n$ and $b \equiv c \pmod n$, then $a \equiv c \pmod n$.

Proof. Exercise.


Example 6.3. Let $n = 5$. Then the numbers 7 and 27 are congruent to each other modulo 5, because $5 \mid (27 - 7)$. Also note that both 7 and 27 have the same remainder after division by 5:

$$7 = 1 \cdot 5 + 2 \quad \text{and} \quad 27 = 5 \cdot 5 + 2.$$

In fact, there are infinitely many numbers congruent to 7 modulo 5. Convince yourself that all of them belong to the set

$$\{5q + 2 : q \in \mathbb{Z}\} = \{\ldots, -8, -3, 2, 7, 12, \ldots\}.$$

Proposition 6.4.⁸ Let $n$ be a modulus, and suppose that

$$a \equiv a_1 \pmod n, \qquad b \equiv b_1 \pmod n.$$

Then

$$a \pm b \equiv a_1 \pm b_1 \pmod n, \qquad ab \equiv a_1b_1 \pmod n.$$

Proof. Let us first show that $a + b \equiv a_1 + b_1 \pmod n$. Note that $n \mid (a - a_1)$ and $n \mid (b - b_1)$. By property 2 of Proposition 2.4,

$$n \mid (a - a_1) + (b - b_1) = (a + b) - (a_1 + b_1),$$

so by definition we see that $a + b \equiv a_1 + b_1 \pmod n$. An analogous proof holds if we replace the plus sign with the minus sign.

To see that $ab \equiv a_1b_1 \pmod n$, observe that

$$ab - a_1b_1 = ab - a_1b + a_1b - a_1b_1 = (a - a_1)b + a_1(b - b_1).$$

Since $n \mid (a - a_1)$ and $n \mid (b - b_1)$, once again, by property 2 of Proposition 2.4 it is the case that

$$n \mid (a - a_1)b + a_1(b - b_1) = ab - a_1b_1,$$

and by definition this means that $ab \equiv a_1b_1 \pmod n$.

If we now combine Propositions 6.2 and 6.4, it becomes clear that in any congruence which involves only addition, subtraction and multiplication of integers, we can replace $a$ with $a_1$ whenever $a \equiv a_1 \pmod n$. This is known as the replacement principle.

Example 6.5. Let $f(x) = x^5 - 10x + 7$. We can compute the remainder of $f(27)$ divided by 5 as follows. Note that $27 \equiv 2 \pmod 5$. Since $f(x)$ involves only addition, subtraction and multiplication of integers, by the replacement principle we can compute $f(2)$ instead of $f(27)$, because $f(27) \equiv f(2) \pmod 5$. Also, since $10 \equiv 0 \pmod 5$ and $7 \equiv 2 \pmod 5$, we see that

$$\begin{aligned} f(27) &\equiv f(2) \\ &\equiv 2^5 - 10 \cdot 2 + 7 \\ &\equiv 2^5 - 0 \cdot 2 + 2 \equiv 34 \equiv 4 \pmod 5. \end{aligned}$$

Since $0 \le 4 < 5$, we conclude that 4 is the remainder of $f(27)$ divided by 5.
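The replacement principle is exactly what lets a program reduce modulo $n$ after every addition and multiplication, so that intermediate values never grow large. A minimal Python sketch, assuming the coefficients are listed from the highest degree down; the name `poly_mod` and the use of Horner's rule are choices made for this illustration, not part of the notes.

```python
def poly_mod(coeffs: list[int], x: int, n: int) -> int:
    """Evaluate c_d*x^d + ... + c_1*x + c_0 modulo n, where coeffs = [c_d, ..., c_1, c_0].

    Horner's rule, reducing modulo n after each step (replacement principle).
    """
    x %= n
    value = 0
    for c in coeffs:
        value = (value * x + c) % n
    return value


# f(x) = x^5 - 10x + 7 from Example 6.5
print(poly_mod([1, 0, 0, 0, -10, 7], 27, 5))  # 4
```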

Example 6.6. Let us compute the last decimal digit of $307^{99}$. Note that this is the same as finding the remainder of $307^{99}$ divided by 10. By the replacement principle, reading from left to right and top to bottom, we have

$$\begin{aligned} 307^{99} &\equiv 7^{99} \equiv (7^3)^{33} \equiv 343^{33} \equiv 3^{33} \\ &\equiv (3^3)^{11} \equiv 27^{11} \equiv 7^{11} \equiv 7^2 \cdot (7^3)^3 \\ &\equiv 49 \cdot 3^3 \equiv 9 \cdot 27 \equiv 9 \cdot 7 \equiv 63 \equiv 3 \pmod{10}. \end{aligned}$$

Thus 3 is the last decimal digit of $307^{99}$. Analogously, we can determine the last $k$ decimal digits of any number by applying the replacement principle modulo $10^k$ instead of 10. However, as the modulus grows, the computations become more and more challenging.

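In practice, in order to compute $a^\ell \pmod n$ for some large power $\ell$, we use the so-called Double-and-Add Algorithm. The algorithm is as follows. First, write the integer $\ell$ in its binary expansion, i.e.

$$\ell = \sum_{i=0}^{k} c_i2^i = c_k2^k + c_{k-1}2^{k-1} + \cdots + c_1 \cdot 2 + c_0,$$

where $c_i \in \{0, 1\}$. Then

$$a^\ell = a^{c_k2^k + c_{k-1}2^{k-1} + \cdots + c_1 \cdot 2 + c_0} \equiv \left(a^{2^k}\right)^{c_k} \cdot \left(a^{2^{k-1}}\right)^{c_{k-1}} \cdots \left(a^2\right)^{c_1} \cdot a^{c_0} \pmod n.$$

Now note that, for $1 \le j \le k$, we can deduce the value of $a^{2^j}$ modulo $n$ from the value of $a^{2^{j-1}}$ modulo $n$ as follows:

$$a^{2^j} \equiv \left(a^{2^{j-1}}\right)^2 \pmod n.$$

Therefore, starting from $a$, we can compute $a^2, a^{2^2}, \ldots, a^{2^k}$ modulo $n$ using $k$ squarings, after which $a^\ell$ is obtained by multiplying together at most $k + 1$ of these values.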
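The algorithm above (often called square-and-multiply when it is applied to powers) can be sketched in Python as follows; `double_and_add_pow` is a hypothetical name. It reads the binary digits $c_0, c_1, \ldots$ of the exponent from the right, keeps the current value of $a^{2^j} \bmod n$, and multiplies it into the result exactly when $c_j = 1$.

```python
def double_and_add_pow(a: int, exponent: int, n: int) -> int:
    """Compute a**exponent mod n by scanning the binary digits c_0, c_1, ... of the exponent."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % n
    power = a % n  # holds a^(2^j) mod n
    while exponent > 0:
        if exponent & 1:  # binary digit c_j is 1
            result = (result * power) % n
        power = (power * power) % n  # a^(2^(j+1)) = (a^(2^j))^2
        exponent >>= 1
    return result


print(double_and_add_pow(7, 114, 23))  # 9, as in Example 6.7
print(double_and_add_pow(307, 99, 10))  # 3, as in Example 6.6
```

In practice, Python's built-in three-argument `pow(a, e, n)` computes the same value.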

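Example 6.7. Let us compute the integer $r$ such that $r \equiv 7^{114} \pmod{23}$ and $0 \le r < 23$. Note that

$$114 = 64 + 32 + 16 + 2 = 2^6 + 2^5 + 2^4 + 2.$$

Then

$$\begin{aligned} 7^2 &\equiv 49 \equiv 3 \pmod{23};\\ 7^4 &\equiv (7^2)^2 \equiv 3^2 \equiv 9 \pmod{23};\\ 7^8 &\equiv (7^4)^2 \equiv 9^2 \equiv 81 \equiv 12 \pmod{23};\\ 7^{16} &\equiv (7^8)^2 \equiv 12^2 \equiv 144 \equiv 6 \pmod{23};\\ 7^{32} &\equiv (7^{16})^2 \equiv 6^2 \equiv 36 \equiv 13 \pmod{23};\\ 7^{64} &\equiv (7^{32})^2 \equiv 13^2 \equiv 169 \equiv 8 \pmod{23}. \end{aligned}$$

We can use the table above in our calculations:

$$7^{114} \equiv 7^{64+32+16+2} \equiv 7^{64} \cdot 7^{32} \cdot 7^{16} \cdot 7^2 \equiv 8 \cdot 13 \cdot 6 \cdot 3 \equiv 1872 \equiv 9 \pmod{23}.$$

Thus $r = 9$.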

We will now take a look at some interesting applications of modular arithmetic. For example, it can be used to demonstrate that certain Diophantine equations have no solutions.

Example 6.8. Let us show that the Diophantine equation

$$x^2 + y^2 = 4z + 3$$

has no solutions. This is the same as showing that the congruence

$$x^2 + y^2 \equiv 3 \pmod 4$$

has no solutions in integers $x$ and $y$. Since every integer is congruent to 0, 1, 2 or 3 modulo 4, there are essentially 16 possible combinations of $x$ and $y$ that we can check. However, the problem becomes even simpler if we note that

$$0^2 \equiv 0, \quad 1^2 \equiv 1, \quad 2^2 \equiv 0, \quad 3^2 \equiv 1 \pmod 4.$$

Thus every perfect square is congruent to either 0 or 1 modulo 4. Since we are dealing with the sum of two perfect squares, there are now only three options left to check, namely

$$0 + 0 \equiv 0, \quad 0 + 1 \equiv 1, \quad 1 + 1 \equiv 2 \pmod 4.$$

As we can see, none of them is congruent to 3, which means that $x^2 + y^2 \equiv 3 \pmod 4$ has no solutions in integers $x$ and $y$. Therefore there are no solutions to the Diophantine equation $x^2 + y^2 = 4z + 3$.
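By the replacement principle it suffices to check the representatives 0, 1, 2, 3, so the case analysis of Example 6.8 can be reproduced mechanically. A short Python sketch (variable names are our own):

```python
# Residues of perfect squares modulo 4: by the replacement principle,
# x in {0, 1, 2, 3} covers every integer x.
squares_mod_4 = {(x * x) % 4 for x in range(4)}

# Residues of x^2 + y^2 modulo 4
sums_mod_4 = {(s + t) % 4 for s in squares_mod_4 for t in squares_mod_4}

print(sorted(squares_mod_4))  # [0, 1]
print(sorted(sums_mod_4))  # [0, 1, 2]
print(3 in sums_mod_4)  # False, so x^2 + y^2 = 4z + 3 has no solutions
```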

Exercise 6.9. (a) Show that the Diophantine equation $x^2 + y^2 + z^2 = 8t + 7$ has no solutions for $x, y, z, t \in \mathbb{Z}$;

(b) Let $\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}$. Show that there exists a solution to $x^2 + y^2 + z^2 = 8t + 7$ for $x, y, z, t \in \mathbb{Z}[\sqrt{2}]$;

(c) Show that integers $x, y, z$ satisfy $x^2 + y^2 + z^2 = 8t + 3$ for some integer $t$ if and only if $x$, $y$ and $z$ are odd.

In school, you probably heard of divisibility rules for various integers. For example, in order to check that some integer is divisible by 3, one just has to add up all of its decimal digits and verify that the resulting number is divisible by 3. To verify that some integer $n$ is divisible by 4, one just has to check that the number formed by the last two decimal digits of $n$ is divisible by 4. These divisibility rules are consequences of modular arithmetic.

Example 6.10. Let us prove the following divisibility rule for 3 and 9. Let $n$ be a positive integer, and let $m$ be the sum of the decimal digits of $n$. Then $3 \mid n$ if and only if $3 \mid m$, and $9 \mid n$ if and only if $9 \mid m$.

Let us prove the divisibility rule for 3, as the divisibility rule for 9 is analogous to it. We write the number $n$ in base 10:

$$n = \sum_{i=0}^{k} a_i10^i,$$

where $a_i \in \{0, 1, \ldots, 9\}$. Then, by definition,

$$m = a_k + a_{k-1} + \cdots + a_1 + a_0.$$

Since $10 \equiv 1 \pmod 3$,

$$\begin{aligned} n &\equiv a_k10^k + a_{k-1}10^{k-1} + \cdots + a_1 \cdot 10 + a_0 \\ &\equiv a_k \cdot 1^k + a_{k-1} \cdot 1^{k-1} + \cdots + a_1 \cdot 1 + a_0 \\ &\equiv a_k + a_{k-1} + \cdots + a_1 + a_0 \\ &\equiv m \pmod 3. \end{aligned}$$

We conclude that $3 \mid (n - m)$, so there exists an integer $k_1$ such that $n - m = 3k_1$. Now assume that $3 \mid m$. Then there exists an integer $k_2$ such that $m = 3k_2$. But then

$$3k_1 = n - m = n - 3k_2$$

implies $n = 3(k_1 + k_2)$, which means that $3 \mid n$. Analogously, we can show that if $3 \mid n$, then $3 \mid m$. If we replace the modulus 3 with the modulus 9, the proof remains the same, since $10 \equiv 1 \pmod 9$ as well.
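A quick numerical sanity check of $n \equiv m \pmod 9$ (and hence modulo 3), sketched in Python with a hypothetical helper `digit_sum`:

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer n."""
    total = 0
    while n > 0:
        n, digit = divmod(n, 10)
        total += digit
    return total


# n and its digit sum m agree modulo 9, hence also modulo 3
assert all((n - digit_sum(n)) % 9 == 0 for n in range(1, 10_000))
print(digit_sum(3928881), 3928881 % 3)  # 39 0
```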

Exercise 6.11. Prove the following divisibility rule for 11. Let $n$ be a positive integer, and let $m$ be the sum of the two-digit blocks of $n$, formed from right to left. Then $11 \mid n$ if and only if $11 \mid m$.

Example: If $n = 3928881$, then $m = 3 + 92 + 88 + 81 = 264$, which is divisible by 11. Thus 3928881 is divisible by 11 as well.

7 The Ring of Residue Classes $\mathbb{Z}_n$

Recall that, according to our terminology, the set of all integers $\mathbb{Z}$ forms a ring: $0, 1 \in \mathbb{Z}$, and for all $a$ and $b$ in $\mathbb{Z}$ we have $a \pm b \in \mathbb{Z}$ and $a \cdot b \in \mathbb{Z}$. Now let $n$ be a modulus. In this section, we will introduce our first example of a finite ring, $\mathbb{Z}_n$, and study its properties. As the name suggests, this ring has only finitely many elements. Just like the ring of integers $\mathbb{Z}$, it contains its own analogues of 0 and 1, and we will endow it with operations of addition, subtraction and multiplication that closely resemble the operations in $\mathbb{Z}$.

Definition 7.1. Let $a$ be an integer. The set

$$[a] := \{nq + a : q \in \mathbb{Z}\}$$

is called the residue class of $a$ modulo $n$. The integer $a$ is called a representative of the residue class $[a]$.

Note that $[a] = [b]$ if and only if $a \equiv b \pmod n$. Also, each residue class contains exactly one integer $r$ such that $0 \le r < n$, and it is conventional to pick such integers as representatives. For example, if $n = 5$, even though one can denote the set of all integers congruent to 17 modulo 5 by $[17]$, we would rather use $[2]$ instead, since $17 \equiv 2 \pmod 5$ and $0 \le 2 < 5$. Since there are exactly $n$ integers $r$ with $0 \le r < n$, namely

$$0, 1, 2, \ldots, n-1,$$

and each integer is congruent modulo $n$ to exactly one of these numbers, we see that there are exactly $n$ residue classes modulo $n$.

Exercise 7.2. Let $n$ be a positive integer. Prove that the residue classes $[0], [1], \ldots, [n-1]$ modulo $n$ partition the integers. That is,

$$[0] \cup [1] \cup \cdots \cup [n-1] = \mathbb{Z},$$

and also $[a] \cap [b] \ne \emptyset$ implies $[a] = [b]$. Hint: use Proposition 6.2.

We denote the collection of residue classes modulo $n$ by $\mathbb{Z}/n\mathbb{Z}$ or $\mathbb{Z}_n$.⁹ Since the notation $\mathbb{Z}_n$ is used in your course notes, we will stick with it in these lecture notes.

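Proposition 7.3. Let $n$ be a positive integer and consider the collection $\mathbb{Z}_n$ of all residue classes modulo $n$. Define the binary operations $+$, $-$ and $\cdot$ as follows:

$$[a] \pm [b] := [a \pm b] \quad \text{and} \quad [a] \cdot [b] := [a \cdot b].$$

Then these operations are well defined (they do not depend on the choice of representatives), and under them $\mathbb{Z}_n$ forms a ring.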

Proof. Exercise. Hint: use Proposition 6.4.

Note that $\mathbb{Z}_n$ is a finite ring. When we carry out operations in $\mathbb{Z}_n$, we are doing modular arithmetic: we carry out the regular arithmetic and then replace the result with any integer congruent to it modulo $n$ (once again, conventionally we pick a representative $r$ such that $0 \le r < n$).

9 The latter notation might be ambiguous, as when $p$ is prime the symbol $\mathbb{Z}_p$ is also used to represent the ring of $p$-adic integers.

Example 7.4. Here are two examples of modular arithmetic in $\mathbb{Z}_{17}$:

$$[33] + [12] = [16] + [12] = [28] = [11],$$

$$[11] \cdot [19] = [11] \cdot [2] = [22] = [5].$$

Note that, in the case of addition, one could slightly simplify the computations by noting that $33 \equiv -1 \pmod{17}$:

$$[33] + [12] = [-1] + [12] = [11].$$

After all, dealing with $-1$ is much simpler than dealing with 16.

Despite the fact that $\mathbb{Z}_n$ behaves much like $\mathbb{Z}$, some of its properties might be rather unpleasant. For example, $\mathbb{Z}$ has no zero divisors apart from 0. In other words, the identity $ab = 0$ implies that either $a = 0$ or $b = 0$. In general, this is not true for $\mathbb{Z}_n$.

Example 7.5. To see that $\mathbb{Z}_6$ contains zero divisors other than $[0]$, note that

$$[2] \cdot [3] = [6] = [0] = [2] \cdot [0].$$

Thus we see that $[2] \cdot [3] = [0]$ in $\mathbb{Z}_6$, even though $[2] \ne [0]$ and $[3] \ne [0]$. The same phenomenon occurs in $\mathbb{Z}_{15}$:

$$[3] \cdot [5] = [15] = [0] = [3] \cdot [0].$$

Thus we see another major difference between $\mathbb{Z}$ and $\mathbb{Z}_n$: in $\mathbb{Z}$, the identity $ab = ac$ with $a \ne 0$ implies $b = c$. However, in general, this is no longer true in $\mathbb{Z}_n$. It is not difficult to show that, in fact, $\mathbb{Z}_n$ has no non-trivial zero divisors if and only if $n$ is prime or $n = 1$.
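For small moduli, the zero divisors of $\mathbb{Z}_n$ can simply be enumerated. This Python sketch (hypothetical function `zero_divisors`) lists the representatives $1 \le r < n$ of non-zero classes $[r]$ for which some non-zero $[s]$ gives $[r][s] = [0]$; for $n \ge 2$ the list comes out empty exactly when $n$ is prime, in line with the remark above.

```python
def zero_divisors(n: int) -> list[int]:
    """Representatives r, 1 <= r < n, of classes [r] != [0] with [r][s] = [0] for some [s] != [0]."""
    return [r for r in range(1, n) if any((r * s) % n == 0 for s in range(1, n))]


print(zero_divisors(6))  # [2, 3, 4]
print(zero_divisors(15))  # [3, 5, 6, 9, 10, 12]
print(zero_divisors(13))  # []
```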

8 Linear Congruences

Let $n$ be a modulus. We will now turn our attention to equations in $\mathbb{Z}_n$. Let $a, b$ be integers, and consider the linear equation

$$[a][x] = [b],$$

where $x$ is an unknown integer.

Example 8.1. The linear equation

$$[7][x] = [3]$$

has only one solution in $\mathbb{Z}_{13}$, namely $[x] = [6]$. As there are only finitely many possibilities, we may check all of them, from $[0]$ to $[12]$, in order to find a solution. Even though there is only one solution in $\mathbb{Z}_{13}$, there are infinitely many solutions in $\mathbb{Z}$. This is because any integer $y \in [6]$ (that is, any integer of the form $y = 13q + 6$) satisfies

$$7y \equiv 3 \pmod{13}.$$

The linear equation

$$[3][x] = [6]$$

has three solutions in $\mathbb{Z}_9$, namely $[x] = [2]$, $[x] = [5]$ and $[x] = [8]$. Here we see the principal difference between the linear equation in $\mathbb{Z}_n$ and the linear equation $cx = d$ in $\mathbb{Z}$: the only way $cx = d$ can have more than one solution is if $c = d = 0$.

Finally, the equation

$$[3][x] = [7]$$

has no solutions in $\mathbb{Z}_9$. Once again, we can easily verify this by plugging in all the possible values $[x] = [0], [1], \ldots, [8]$.

It turns out that the tools we already have can help us solve linear congruences easily. Observe that

$$[a][x] = [ax] = [b],$$

and this is the same as solving the congruence

$$ax \equiv b \pmod n.$$

Now by definition, $n$ has to divide $ax - b$, so there exists an integer $y$ such that

$$ax - b = n(-y).$$

In other words, the linear equation $[a][x] = [b]$ has a solution if and only if the Diophantine equation

$$ax + ny = b$$

has a solution in integers $x$ and $y$. From what we have learned in Section 3, it immediately follows that the linear equation $[a][x] = [b]$ has no solutions if and only if $\gcd(a,n) \nmid b$ (verify that this is the case for the last two equations in Example 8.1). When solutions exist, we can use the Extended Euclidean Algorithm to find them.


Example 8.2. Let us consider the linear equation $[440][x] = [80]$ in $\mathbb{Z}_{300}$. From Example 5.7 (after replacing $n$ with $-n$) we know that the solutions to

$$440x + 300y = 80$$

in integers $x$ and $y$ are of the form

$$x = -8 + 15n \quad \text{and} \quad y = 12 - 22n,$$

where $n$ is an integer. Thus $[440][-8 + 15n] = [80]$ in $\mathbb{Z}_{300}$. By evaluating $-8 + 15n$ at $n = 1, 2, \ldots, 20$ we obtain 20 distinct solutions in $\mathbb{Z}_{300}$, namely

$$[7], [22], [37], \ldots, [292].$$

Note that $\gcd(440, 300) = 20$ and there are 20 distinct solutions. In Exercise 8.3, you are asked to prove that this phenomenon holds in general.
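The procedure of this section (decide solvability via $\gcd(a,n)$, find one solution with the Extended Euclidean Algorithm, then list all $\gcd(a,n)$ residue classes) can be sketched in Python. The function names are hypothetical, and the spacing $n/\gcd(a,n)$ between consecutive solutions is the general form of the step 15 seen in Example 8.2 (Exercise 8.3 asks you to justify it).

```python
def ext_gcd(x: int, y: int) -> tuple[int, int, int]:
    """Extended Euclidean Algorithm: return (g, s, t) with x*s + y*t = g = gcd(x, y)."""
    s_prev, s_curr, t_prev, t_curr = 1, 0, 0, 1
    while y != 0:
        q = x // y
        x, y = y, x - q * y
        s_prev, s_curr = s_curr, s_prev - q * s_curr
        t_prev, t_curr = t_curr, t_prev - q * t_curr
    return x, s_prev, t_prev


def solve_linear_congruence(a: int, b: int, n: int) -> list[int]:
    """All representatives 0 <= r < n with a*r = b (mod n); empty if gcd(a, n) does not divide b."""
    g, s, _ = ext_gcd(a % n, n)
    if b % g != 0:
        return []
    x0 = (s * (b // g)) % n  # one particular solution
    step = n // g  # solutions repeat every n / gcd(a, n)
    return sorted((x0 + k * step) % n for k in range(g))


print(solve_linear_congruence(7, 3, 13))  # [6]
print(solve_linear_congruence(3, 6, 9))  # [2, 5, 8]
print(solve_linear_congruence(3, 7, 9))  # []
print(len(solve_linear_congruence(440, 80, 300)))  # 20
```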

Exercise 8.3. Let $n \ge 1$ be a modulus, and let $a, b$ be integers such that $a \ne 0$. Prove that, if $\gcd(a,n) \mid b$, then the total number of distinct residue classes $[x]$ satisfying $[a][x] = [b]$ is equal to $\gcd(a,n)$.

9 The Group of Units $\mathbb{Z}_n^*$

Let $n$ be a modulus and consider the finite ring $\mathbb{Z}_n$ of residue classes modulo $n$. Recall that, in general, the ring $\mathbb{Z}_n$ does not enjoy the property that if $[a][b] = [a][c]$ and $[a] \ne [0]$ then $[b] = [c]$ (see Example 7.5). However, for special values of $[a]$, called units, this cancellation law actually holds.

Definition 9.1. The residue class $[a]$ in $\mathbb{Z}_n$ is called a unit if there exists a solution to $[a][x] = [1]$ in $\mathbb{Z}_n$. If $[a]$ is a unit, we say that any integer $b \in [a]$ is invertible modulo $n$.

Proposition 9.2. Let $a$ be an integer. The following statements are equivalent:

1. $[a]$ is a unit in $\mathbb{Z}_n$;

2. For all integers $b$ and $c$, $[a][b] = [a][c]$ implies $[b] = [c]$;

3. $a$ and $n$ are coprime.

Proof. Let us prove that 1 implies 2. Since $[a]$ is a unit, there exists an integer $x$ such that $[a][x] = [1]$. Now suppose that $[a][b] = [a][c]$ for some integers $b$ and $c$. Then

$$[x][a][b] = [x][a][c].$$

Since $\mathbb{Z}_n$ is a commutative ring, we see that $[x][a] = [a][x] = [1]$. Thus the above equality simplifies to

$$[1][b] = [1][c],$$

and this implies $[b] = [c]$.

To prove that 2 implies 3, suppose that the statement is false, so that $a$ and $n$ are not coprime. Without loss of generality, we may assume that $0 \le a < n$. Then there exists an integer $p > 1$ (for example, $p = \gcd(a,n)$) such that $a = pk_1$ and $n = pk_2$ for some integers $k_1$ and $k_2$. Since $p > 1$, we conclude that $1 \le k_2 < n$, which in turn implies

$$k_2 \not\equiv 0 \pmod n.$$

But then

$$ak_2 = pk_1k_2 = pk_2k_1 = nk_1 \equiv 0 \equiv a \cdot 0 \pmod n.$$

Thus we see that $[a][k_2] = [a][0]$, even though $[k_2] \ne [0]$. This contradicts statement 2, so $a$ and $n$ are coprime.

Finally, let us demonstrate that 3 implies 1. Since $a$ and $n$ are coprime, by Bezout's lemma there exist integers $x$ and $y$ such that $ax + ny = 1$. This means that $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ is a unit.

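Corollary 9.3. Let $[a]$ be a unit in $\mathbb{Z}_n$. Then for any integer $b$ the equation $[a][x] = [b]$ has a unique solution.

Proof. A solution exists: if $[a][u] = [1]$, then $[x] = [u][b]$ satisfies $[a][x] = [b]$. To see that it is unique, suppose that there are two solutions $[x]$ and $[y]$, so

$$[a][x] = [b] = [a][y].$$

By property 2 of Proposition 9.2, the identity $[a][x] = [a][y]$ implies $[x] = [y]$.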

Note that the statements of Proposition 9.2 and Corollary 9.3 can be translated from the language of residue classes to the language of congruences. For example, property 1 simply states that $ax \equiv 1 \pmod n$ has a solution, while property 2 states that $ab \equiv ac \pmod n$ implies $b \equiv c \pmod n$. Finally, Corollary 9.3 implies that, when $a$ is invertible modulo $n$, the congruence $ax \equiv b \pmod n$ has a unique solution $x$ such that $0 \le x < n$, and all integer solutions to this congruence are of the form $x + nq$ for $q \in \mathbb{Z}$.

Proposition 9.4. If $p$ is prime and $[a] \ne [0]$ in $\mathbb{Z}_p$, then $[a]$ is a unit. Furthermore, $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Proof. Since $[a] \ne [0]$, without loss of generality we may assume that $1 \le a < p$. Note that this implies that $a$ and $p$ are coprime, for otherwise $d = \gcd(a,p) > 1$ would be a divisor of the prime $p$ greater than 1, so $d = p$. But $d \mid a$ implies $p = d \le a$, while $a < p$, a contradiction. Since $\gcd(a,p) = 1$, by Bezout's lemma there exist integers $x$ and $y$ such that $ax + py = 1$. But then $[a][x] = [1]$, so by Definition 9.1 the residue class $[a]$ must be a unit in $\mathbb{Z}_p$. Since every unit obeys the cancellation law stated in property 2 of Proposition 9.2, it follows that $\mathbb{Z}_p$ has no zero divisors apart from $[0]$ itself.

Definition 9.5. Let $[a]$ be a unit in $\mathbb{Z}_n$. The element $[x]$ satisfying $[a][x] = [1]$ is called the inverse of $[a]$ in $\mathbb{Z}_n$ and is denoted by $[a]^{-1}$.

When translated to the language of congruences, the fact that $a$ is invertible modulo $n$ implies the existence of an integer, which we denote by $a^{-1}$, such that

$$a \cdot a^{-1} \equiv 1 \pmod n.$$
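Computing $a^{-1}$ modulo $n$ amounts to running the Extended Euclidean Algorithm on $a$ and $n$, exactly as in the proof that 3 implies 1 in Proposition 9.2. A Python sketch with a hypothetical function `inverse_mod`; since only the coefficient of $a$ is needed, the $t$-sequence is dropped. (Python 3.8 and later also provide `pow(a, -1, n)` for the same purpose.)

```python
def inverse_mod(a: int, n: int) -> int:
    """Return the representative in [0, n) of [a]^(-1); raise ValueError if gcd(a, n) != 1."""
    r_prev, r_curr = a % n, n
    s_prev, s_curr = 1, 0  # invariant: r_i = s_i * a (mod n)
    while r_curr != 0:
        q = r_prev // r_curr
        r_prev, r_curr = r_curr, r_prev - q * r_curr
        s_prev, s_curr = s_curr, s_prev - q * s_curr
    if r_prev != 1:  # r_prev = gcd(a, n)
        raise ValueError(f"{a} is not invertible modulo {n}")
    return s_prev % n


print(inverse_mod(7, 13))  # 2, since 7 * 2 = 14 = 1 (mod 13)
print(inverse_mod(3, 10))  # 7
```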

Definition 9.6. The set of all units of $\mathbb{Z}_n$ is called the group of units of $\mathbb{Z}_n$ and is denoted by $\mathbb{Z}_n^*$.

Proposition 9.7. The set of all units of $\mathbb{Z}_n$ forms a group under the operation of multiplication. That is, it satisfies the following four group axioms:

1. Closure. For all $[a], [b] \in \mathbb{Z}_n^*$, $[a] \cdot [b] \in \mathbb{Z}_n^*$;

2. Associativity. $([a] \cdot [b]) \cdot [c] = [a] \cdot ([b] \cdot [c])$ for all $[a], [b], [c] \in \mathbb{Z}_n^*$;

3. Identity element. For all $[a]$ in $\mathbb{Z}_n^*$, the element $[1]$ satisfies

$$[a] \cdot [1] = [1] \cdot [a] = [a];$$

4. Inverse element. For each $[a]$ in $\mathbb{Z}_n^*$ there exists an element $[a]^{-1}$ in $\mathbb{Z}_n^*$ such that

$$[a] \cdot [a]^{-1} = [a]^{-1} \cdot [a] = [1].$$

Furthermore, the group of units $\mathbb{Z}_n^*$ is finite and Abelian:¹⁰

5. Abelianness. For all $[a], [b] \in \mathbb{Z}_n^*$, $[a] \cdot [b] = [b] \cdot [a]$;

6. Finiteness. There are only finitely many elements in $\mathbb{Z}_n^*$.

10 In the context of groups, it is conventional to use the word “Abelian” instead of “commutative”.

Proof. Exercise.

Example 9.8. Let us compute $\mathbb{Z}_{10}^*$. By Proposition 9.2, it suffices to find all integers $m$, $0 \le m < 10$, that are coprime to 10. Thus $\mathbb{Z}_{10}^* = \{[1], [3], [7], [9]\}$. To convince ourselves that $\mathbb{Z}_{10}^*$ is closed under multiplication, let us construct the multiplication table:

$$\begin{array}{c|cccc} \cdot & 1 & 3 & 7 & 9 \\ \hline 1 & 1 & 3 & 7 & 9 \\ 3 & 3 & 9 & 1 & 7 \\ 7 & 7 & 1 & 9 & 3 \\ 9 & 9 & 7 & 3 & 1 \end{array}$$

We can see that all of the entries in the multiplication table are indeed in $\mathbb{Z}_{10}^*$. Furthermore, each row, as well as each column, of this table is a permutation of 1, 3, 7 and 9. In the future, we will see that this is not a coincidence.
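The computation in Example 9.8 can be reproduced for any modulus. A short Python sketch using `math.gcd` from the standard library; the helper names are hypothetical, and elements are shown by their representatives $0 \le m < n$.

```python
from math import gcd


def units(n: int) -> list[int]:
    """Representatives m, 0 <= m < n, of the units of Z_n (those with gcd(m, n) = 1)."""
    return [m for m in range(n) if gcd(m, n) == 1]


def multiplication_table(n: int) -> list[list[int]]:
    """Multiplication table of the units of Z_n, as representatives in [0, n)."""
    u = units(n)
    return [[(a * b) % n for b in u] for a in u]


print(units(10))  # [1, 3, 7, 9]
for row in multiplication_table(10):
    print(row)
# Each row is a permutation of the units (closure plus cancellation)
print(all(sorted(row) == units(10) for row in multiplication_table(10)))  # True
```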

10 Euler's Theorem and Fermat's Little Theorem

We will now prove our first non-trivial result: Euler's Theorem.

Definition 10.1. Let $\phi(n)$ denote the number of integers $m$ such that $0 \le m < n$ and $\gcd(m,n) = 1$. The function $\phi$ is called Euler's totient function.

Exercise 10.2. Let $\#X$ denote the cardinality of a set $X$. Let $n$ be a modulus. Prove that $\phi(n) = \#\mathbb{Z}_n^*$.

Theorem 10.3. (Euler's Theorem) If $[a] \in \mathbb{Z}_n^*$, then $[a]^{\phi(n)} = [1]$.

Proof.¹¹ Let $k = \phi(n)$. Let

$$[1] = [u_1], [u_2], \ldots, [u_k]$$

be the complete list of elements of $\mathbb{Z}_n^*$. Since $\mathbb{Z}_n^*$ is a group, all the elements

$$[a] \cdot [u_1], [a] \cdot [u_2], \ldots, [a] \cdot [u_k]$$

are in $\mathbb{Z}_n^*$. Furthermore, no element appears in this list twice, for if $[a] \cdot [u_i] = [a] \cdot [u_j]$ for some $i \ne j$, then $[u_i] = [u_j]$ by property 2 of Proposition 9.2. Hence the second list is just a permutation of $[u_1], [u_2], \ldots, [u_k]$. Thus

$$[u_1] \cdot [u_2] \cdots [u_k] = ([a] \cdot [u_1]) \cdot ([a] \cdot [u_2]) \cdots ([a] \cdot [u_k]).$$

Since $\mathbb{Z}_n^*$ is an Abelian group, we can rearrange the order of multiplication to obtain

$$[u_1] \cdot [u_2] \cdots [u_k] = [a]^k \cdot [u_1] \cdot [u_2] \cdots [u_k].$$

Finally, since the product $[u_1] \cdot [u_2] \cdots [u_k]$ is a unit, we can use property 2 of Proposition 9.2 to cancel it and conclude that $[a]^k = [1]$.

11 Theorem 3.16 in Frank Zorzitto, A Taste of Number Theory.

In the language of congruences, Euler's Theorem translates to

$$a^{\phi(n)} \equiv 1 \pmod n$$

for every integer $a$ that is invertible modulo $n$.
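Euler's Theorem is easy to test numerically for small moduli. The Python sketch below computes $\phi(n)$ straight from Definition 10.1 (fine for small $n$, far too slow for large ones) and checks $a^{\phi(n)} \equiv 1 \pmod n$ with the built-in three-argument `pow`; the function names are hypothetical.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's totient, straight from Definition 10.1 (brute force)."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)


def euler_holds(n: int) -> bool:
    """Check a^phi(n) = 1 (mod n) for every a in [0, n) coprime to n."""
    k = phi(n)
    return all(pow(a, k, n) == 1 % n for a in range(n) if gcd(a, n) == 1)


print(phi(10), phi(1223))  # 4 1222
print(pow(623, 1222, 1223))  # 1, as in Example 10.4 below
print(all(euler_holds(n) for n in range(1, 300)))  # True
```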

Example 10.4. Let us prove that 1223 divides $623^{1222} - 1$. This becomes evident once we note that $\phi(1223) = 1222$ (since 1223 is prime) and $\gcd(1223, 623) = 1$ (so $[623]$ is a unit in $\mathbb{Z}_{1223}$). By Euler's Theorem,

$$623^{1222} \equiv 1 \pmod{1223},$$

which means that 1223 divides $623^{1222} - 1$.

Corollary 10.5. (Fermat's Little Theorem) Let $p$ be prime. Then for any integer $a$ such that $p \nmid a$ it is the case that $[a]^{p-1} = [1]$. In other words,

$$a^{p-1} \equiv 1 \pmod p.$$

Proof. Replacing $a$ by its remainder modulo $p$, we may assume that $1 \le a < p$, and for any such $a$ we have $\gcd(a,p) = 1$. Thus $[a] \in \mathbb{Z}_p^*$, and $\phi(p) = p - 1$. The result then follows from Euler's Theorem.

The theorems of Euler and Fermat give us a useful tool for raising integers to high powers modulo $n$.

Proposition 10.6.¹² If $n$ is a modulus, $a$ is coprime to $n$, and $k, \ell$ are non-negative integers such that

$$k \equiv \ell \pmod{\phi(n)},$$

then

$$a^k \equiv a^\ell \pmod n.$$

Proof. Say $k \le \ell$. We are given that $\ell = q\phi(n) + k$ for some $q \ge 0$. Then, by Euler's Theorem,

$$a^\ell = a^{q\phi(n) + k} = \left(a^{\phi(n)}\right)^q a^k \equiv 1^q a^k = a^k \pmod n.$$

Example 10.7. Let us compute $17^{7^{155}}$ modulo 33. Note that $\phi(33) = 20$. Since $\gcd(17, 33) = 1$, by Proposition 10.6 it first makes sense to reduce $7^{155}$ modulo 20. We can apply Euler's Theorem again here. Note that $\phi(20) = 8$, and since $\gcd(7, 20) = 1$ we see that $7^8 \equiv 1 \pmod{20}$. But then, by Proposition 10.6,

$$7^{155} = 7^{19 \cdot 8 + 3} \equiv 7^3 \equiv 343 \equiv 3 \pmod{20}.$$

Thus

$$17^{7^{155}} \equiv 17^3 \equiv 4913 \equiv 29 \pmod{33}.$$
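Example 10.7 reduces the top exponent modulo $\phi(n)$ before exponentiating. A Python sketch of that idea for towers $a^{b^c}$, assuming $\gcd(a, n) = 1$ as Proposition 10.6 requires; the helper names are hypothetical, and $\phi$ is again computed by brute force, so this is only meant for small moduli.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's totient by brute force (Definition 10.1)."""
    return sum(1 for m in range(n) if gcd(m, n) == 1)


def tower_mod(a: int, b: int, c: int, n: int) -> int:
    """Compute a^(b^c) mod n, assuming gcd(a, n) = 1 (Proposition 10.6)."""
    if gcd(a, n) != 1:
        raise ValueError("Proposition 10.6 needs gcd(a, n) = 1")
    # Replace the exponent b^c by its least non-negative residue modulo phi(n).
    exponent = pow(b, c, phi(n))
    return pow(a, exponent, n)


print(tower_mod(17, 7, 155, 33))  # 29
```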

Exercise 10.8. Compute the integer $n$, $0 \le n < 55$, such that

$$n \equiv 8^{13^{2134}} \pmod{55}.$$

11 The Chinese Remainder Theorem

Now that we know how to solve linear congruences, let us try to understand how to work with systems of congruences. Since the congruence relation $\equiv$ behaves much like the equality relation $=$, solving a system of linear congruences with a single modulus is very similar to solving a system of linear equations, which we already know how to handle through the methods of linear algebra.

On the other hand, if we consider systems with different moduli, things might get interesting. We will consider only the simplest example of such systems, namely

$$\begin{aligned} x &\equiv a_1 \pmod{n_1},\\ x &\equiv a_2 \pmod{n_2},\\ &\ \ \vdots\\ x &\equiv a_k \pmod{n_k}, \end{aligned}$$

where $a_1, a_2, \ldots, a_k$ are integers and $n_1, n_2, \ldots, n_k$ are pairwise coprime integers greater than 1. Our goal here is to determine an integer $x$ which satisfies all $k$ congruences above. The existence of such an $x$ is asserted by the Chinese Remainder Theorem. Before proceeding to its statement, let us recall Proposition 3.12 and the following consequence of it.

Proposition 11.1. Let $m$ and $n$ be coprime integers greater than 1. Then the congruence

$$a \equiv b \pmod{mn}$$

holds if and only if both of the congruences

$$a \equiv b \pmod m, \qquad a \equiv b \pmod n$$

hold.

Proof. Suppose that $a \equiv b \pmod{mn}$. Then $mn \mid (a - b)$. But then $m \mid (a - b)$ and $n \mid (a - b)$, so, by definition, $a \equiv b \pmod m$ and $a \equiv b \pmod n$.

To prove the converse, suppose that $a \equiv b \pmod m$ and $a \equiv b \pmod n$. Then $m \mid (a - b)$ and $n \mid (a - b)$. Since $\gcd(m,n) = 1$, we may apply Proposition 3.12 to conclude that $mn \mid (a - b)$. Thus $a \equiv b \pmod{mn}$.

Theorem 11.2. (The Chinese Remainder Theorem)¹³ If $m, n$ are coprime moduli and $a, b$ are any integers, then the congruences

$$x \equiv a \pmod m, \qquad x \equiv b \pmod n$$

have a common solution $x$. Furthermore, any two solutions $x, y$ to this pair of congruences satisfy $x \equiv y \pmod{mn}$.

Proof. Since $m$ and $n$ are coprime, by Bezout's lemma the equation

$$mt - ns = b - a$$

can be solved in integers $s$ and $t$. Set $x = mt + a = ns + b$. Then $x \equiv a \pmod m$ and $x \equiv b \pmod n$, which makes $x$ a solution to both congruences.

If $y$ is another solution to the system of congruences, then

$$x \equiv y \pmod m, \qquad x \equiv y \pmod n.$$

By Proposition 11.1, we conclude that $x \equiv y \pmod{mn}$.

We can easily generalize this result to an arbitrary number of pairwise coprime moduli.

Theorem 11.3. (Generalized Chinese Remainder Theorem)¹⁴ Suppose $n_1, n_2, \ldots, n_k$ are moduli that are pairwise coprime; that is, $n_i$ and $n_j$ are coprime when $i \ne j$. If $a_1, a_2, \ldots, a_k$ are integers, then there exists an integer $x$ such that

$$\begin{aligned} x &\equiv a_1 \pmod{n_1},\\ x &\equiv a_2 \pmod{n_2},\\ &\ \ \vdots\\ x &\equiv a_k \pmod{n_k}. \end{aligned}$$

Furthermore, if $x_0$ is such a solution of these congruences, then the complete solution is given by all $x$ such that

$$x \equiv x_0 \pmod{n_1n_2 \cdots n_k}.$$

Example 11.4. Let us solve the system of congruences

$$\begin{cases} x \equiv 3 \pmod 6,\\ x \equiv 7 \pmod{13}. \end{cases}$$

Since 6 and 13 are coprime, by Bezout's lemma there exist integers $u$ and $v$ such that

$$6u + 13v = 1.$$

Note that $u = -2$ and $v = 1$ give us an answer. We can multiply both sides of the above equality by $7 - 3 = 4$ to obtain a solution to

$$6u' + 13v' = 7 - 3.$$

Such a solution is given by $u' = 4 \cdot (-2) = -8$ and $v' = 4 \cdot 1 = 4$. After rearranging, we get

$$3 + 6u' = 7 - 13v' = -45.$$

Note that $-45 \equiv 3 \pmod 6$ and $-45 \equiv 7 \pmod{13}$. Since 6 and 13 are coprime, by the Chinese Remainder Theorem the congruence $x \equiv -45 \equiv 33 \pmod{78}$ captures all integer solutions to the original system of congruences.
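The construction can be automated by combining the congruences two at a time, as in the proof of Theorem 11.2. In this Python sketch (hypothetical name `crt`), the Bezout step is replaced by a modular inverse obtained from the built-in `pow(m, -1, n)` (Python 3.8+), which raises `ValueError` when $m$ and $n$ are not coprime.

```python
def crt(residues: list[int], moduli: list[int]) -> tuple[int, int]:
    """Return (x0, N) with N = n_1 * ... * n_k such that the system x = a_i (mod n_i)
    is equivalent to x = x0 (mod N). Assumes pairwise coprime moduli."""
    x, m = 0, 1  # x = 0 (mod 1) imposes no condition
    for a, n in zip(residues, moduli):
        # Find t with x + m*t = a (mod n), i.e. m*t = a - x (mod n).
        t = ((a - x) * pow(m, -1, n)) % n
        x, m = (x + m * t) % (m * n), m * n
    return x, m


print(crt([3, 7], [6, 13]))  # (33, 78), as in Example 11.4
```

The new value $x + mt$ still satisfies every earlier congruence because it differs from $x$ by a multiple of $m$, which is the product of the moduli already processed.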

Exercise 11.5. Solve the system of congruences

$$\begin{cases} x \equiv 3 \pmod 5,\\ x \equiv 5 \pmod 7,\\ x \equiv 7 \pmod{11}. \end{cases}$$

12 Polynomial Congruences

The Chinese Remainder Theorem can also be used to solve polynomial congruences. Let $d$ be a positive integer and consider a polynomial

$$f(x) = c_dx^d + c_{d-1}x^{d-1} + \cdots + c_1x + c_0$$

with integer coefficients $c_0, c_1, c_2, \ldots, c_d$. Then a congruence of the form

$$f(x) \equiv 0 \pmod n \tag{4}$$

is called a polynomial congruence. We would like to find all integers $x$ which satisfy such a congruence. Note that, if we replace the coefficients $c_i$ of $f(x)$ with their residue classes $[c_i]$, thus “reducing” our polynomial from $\mathbb{Z}$ to $\mathbb{Z}_n$, then solving the congruence (4) is equivalent to solving the equation

$$f([x]) = [0]$$

in $\mathbb{Z}_n$. If such an equation is satisfied by some residue class $[x_0]$, we say that $[x_0]$ is a root of $f(x)$ in $\mathbb{Z}_n$.

41

Letn = pe1

1 pe22 · · · p

ekk

be the prime factorization of n. Then, as it turns out, there is a one-to-one corre-spondence between solutions to the congruence (4) and solutions to the system ofcongruences

f (x)≡ 0 (mod pe11 );

f (x)≡ 0 (mod pe22 );

. . .

f (x)≡ 0 (mod pekk ).

This result follows from the next proposition, which is very similar to Proposition 11.1.

Proposition 12.1. Let $f(x) \in \mathbb{Z}[x]$ be a polynomial. Let $m$ and $n$ be coprime moduli. Then

$$f(x) \equiv 0 \pmod{mn}$$

if and only if

$$\begin{cases} f(x) \equiv 0 \pmod{m};\\ f(x) \equiv 0 \pmod{n}. \end{cases}$$

Proof. Suppose that $f(x) \equiv 0 \pmod{mn}$. Then $mn \mid f(x)$, which means that $m \mid f(x)$ and $n \mid f(x)$.

Conversely, suppose that $f(x) \equiv 0 \pmod{m}$ and $f(x) \equiv 0 \pmod{n}$. Then $m \mid f(x)$ and $n \mid f(x)$. Since $m$ and $n$ are coprime, it follows from Proposition 3.12 that $mn \mid f(x)$.

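Coming back to our previous notation, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, and integers $x_1, x_2, \ldots, x_k$ satisfy

$$f(x_i) \equiv 0 \pmod{p_i^{e_i}}$$

for $i = 1, 2, \ldots, k$, then we can find $x$ such that $x \equiv x_i \pmod{p_i^{e_i}}$ for all $i$ using the Generalized Chinese Remainder Theorem. Since $f(x)$ has integer coefficients, $x \equiv x_i \pmod{p_i^{e_i}}$ implies $f(x) \equiv f(x_i) \pmod{p_i^{e_i}}$, so such an $x$ satisfies $f(x) \equiv 0 \pmod{p_i^{e_i}}$ for all $i$, and therefore $f(x) \equiv 0 \pmod{n}$ by repeated application of Proposition 12.1. From here it follows that, if each congruence $f(x) \equiv 0 \pmod{p_i^{e_i}}$ has $s_i$ solutions modulo $p_i^{e_i}$, then the congruence $f(x) \equiv 0 \pmod{n}$ has $s_1 s_2 \cdots s_k$ solutions modulo $n$.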
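The counting argument above can be checked by computer for small moduli. The Python sketch below is our own illustration (the helper names `poly_eval`, `roots_mod`, `crt` and `roots_via_crt` are hypothetical): it finds the roots modulo each prime power by brute force and glues them together with the Chinese Remainder Theorem. It assumes the prime-power factorization of $n$ is supplied, and lists coefficients as $[c_0, c_1, \ldots, c_d]$.

```python
from itertools import product
from math import prod


def poly_eval(coeffs: list[int], x: int, m: int) -> int:
    """Evaluate c_0 + c_1 x + ... + c_d x^d modulo m (coeffs[i] = c_i)."""
    value = 0
    for c in reversed(coeffs):        # Horner's rule, highest degree first
        value = (value * x + c) % m
    return value


def roots_mod(coeffs: list[int], m: int) -> list[int]:
    """All residues x in [0, m) with f(x) = 0 (mod m), by brute force."""
    return [x for x in range(m) if poly_eval(coeffs, x, m) == 0]


def crt(residues: list[int], moduli: list[int]) -> int:
    """Combine x = a_i (mod m_i) for pairwise coprime m_i."""
    n = prod(moduli)
    x = 0
    for a, m in zip(residues, moduli):
        partial = n // m
        x += a * partial * pow(partial, -1, m)   # inverse of n/m modulo m
    return x % n


def roots_via_crt(coeffs: list[int], prime_powers: list[int]) -> list[int]:
    """Roots modulo n = prod(prime_powers), built from roots modulo each p_i^e_i."""
    per_modulus = [roots_mod(coeffs, q) for q in prime_powers]
    return sorted(crt(list(choice), prime_powers) for choice in product(*per_modulus))
```

For example, $f(x) = x^2 + 1$ has 2 roots modulo 5 and 2 roots modulo 13, so `roots_via_crt([1, 0, 1], [5, 13])` produces $2 \cdot 2 = 4$ roots modulo 65, and they agree with a direct search modulo 65.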

Now we would like to determine how many solutions a polynomial congruence $f(x) \equiv 0 \pmod{p^e}$ has. Due to time limitations, we will answer this question only in the case $e = 1$, and show that there are at most $d$ solutions, where $d$ is the degree of $f(x)$. We remark that for $e > 1$ the number of solutions can exceed $d$ unless $f(x)$ satisfies additional conditions (for example, $x^2 \equiv 0 \pmod{9}$ has the three solutions $x \equiv 0, 3, 6$); under suitable conditions on $f(x)$, there are at most $d$ solutions when $p$ is an odd prime, and at most $2d$ solutions when $p = 2$. The most accurate estimates on the number of solutions of polynomial congruences were established in 1991 by the Canadian mathematician Cameron L. Stewart, who is currently a professor at the University of Waterloo.

Proposition 12.2.¹⁵ If $p$ is prime and $f(x)$ is a polynomial of degree $d$ with coefficients in $\mathbb{Z}_p$, then $f(x)$ has at most $d$ roots in $\mathbb{Z}_p$.

Proof. We will prove this result by induction on the degree $d$ of a polynomial $f(x)$.

Base case. Let $d = 0$. Then $f(x) = \alpha_0$ for some non-zero $\alpha_0$ in $\mathbb{Z}_p$. Clearly, this polynomial has no roots, so it has at most $d = 0$ roots and the result holds.

Induction hypothesis. Suppose that the result is true for all polynomials of degrees $k = 0, 1, \ldots, d - 1$.

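Induction step. We will show that the result holds for every polynomial of degree $k = d$. Let

$$f(x) = \alpha_d x^d + \alpha_{d-1} x^{d-1} + \cdots + \alpha_1 x + \alpha_0,$$

where $\alpha_d \neq 0$. If $f(x)$ has no roots, then surely it has at most $d$ roots. Otherwise $f(x)$ has a root, say $\beta$. Then

$$
\begin{aligned}
f(x) &= f(x) - 0 = f(x) - f(\beta)\\
&= \alpha_d (x^d - \beta^d) + \alpha_{d-1} (x^{d-1} - \beta^{d-1}) + \cdots + \alpha_1 (x - \beta).
\end{aligned}
$$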

Now recall that, for any integer $j \geq 2$, it is the case that

$$x^j - \beta^j = (x - \beta)(x^{j-1} + x^{j-2}\beta + x^{j-3}\beta^2 + \cdots + x\beta^{j-2} + \beta^{j-1}).$$

Now we see that we can factor out $(x - \beta)$ in the expression for $f(x)$ given above, which means that

$$f(x) = (x - \beta) g(x)$$

for some polynomial $g(x)$ with coefficients in $\mathbb{Z}_p$. Comparing leading coefficients, $g(x)$ has degree exactly $d - 1$ (its leading coefficient is $\alpha_d \neq 0$), so we can apply the induction hypothesis to conclude that $g(x)$ has at most $d - 1$ roots.



Let $\gamma \neq \beta$ be some root of $f(x)$. Then

$$0 = f(\gamma) = (\gamma - \beta) g(\gamma).$$

We claim that $g(\gamma) = 0$. For assume otherwise, so that $g(\gamma) \neq 0$ and $\gamma - \beta \neq 0$. But then both $\gamma - \beta$ and $g(\gamma)$ are non-trivial zero divisors in $\mathbb{Z}_p$, and this contradicts Proposition 9.4, which asserts that there are no non-trivial zero divisors in $\mathbb{Z}_p$ whenever $p$ is prime. We conclude that $g(\gamma) = 0$.

Since every root of $f(x)$ is either equal to $\beta$ or one of the at most $d - 1$ roots of $g(x)$, we conclude that $f(x)$ has at most $d$ roots.
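The factorization $f(x) = (x - \beta) g(x)$ from the induction step can be computed explicitly by synthetic division (Horner's scheme), a standard technique we bring in for illustration. In this minimal Python sketch the function name `divide_by_linear` is hypothetical, and coefficients are listed from the highest degree down, i.e. $[\alpha_d, \ldots, \alpha_1, \alpha_0]$.

```python
def divide_by_linear(coeffs: list[int], beta: int, p: int) -> tuple[list[int], int]:
    """Divide f(x) by (x - beta) over Z_p using synthetic division.

    coeffs = [alpha_d, ..., alpha_1, alpha_0] (highest degree first).
    Returns (coefficients of g, remainder), where
    f(x) = (x - beta) * g(x) + remainder and remainder == f(beta) mod p.
    """
    quotient = []
    carry = 0
    for c in coeffs:
        carry = (carry * beta + c) % p
        quotient.append(carry)
    remainder = quotient.pop()        # the last value computed is f(beta) mod p
    return quotient, remainder
```

For example, dividing $2x^3 + x + 3$ by $x - 2$ over $\mathbb{Z}_7$ gives $g(x) = 2x^2 + 4x + 2$ with remainder 0, confirming that 2 is a root (this polynomial reappears in Example 12.3). As in the proof, a zero remainder certifies a root, and the quotient has degree $d - 1$.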

Example 12.3. Let us solve the polynomial congruence

$$x^{49} + 2x^{33} + 24 \equiv 0 \pmod{119}.$$

Note that $119 = 7 \cdot 17$. By Proposition 12.1, there is a one-to-one correspondence between the roots of the above congruence and the roots of the system of congruences

$$\begin{cases} x^{49} + 2x^{33} + 24 \equiv 0 \pmod{7};\\ x^{49} + 2x^{33} + 24 \equiv 0 \pmod{17}. \end{cases}$$

Let us solve each of these congruences separately.

Consider the case $n = 7$ with $\varphi(7) = 6$. Note that $x \equiv 0 \pmod{7}$ is not a solution. This means that any solution satisfies $\gcd(x, 7) = 1$, so we may apply Euler's Theorem:

$$
\begin{aligned}
x^{49} + 2x^{33} + 24 &\equiv x^{8 \cdot 6 + 1} + 2x^{5 \cdot 6 + 3} + 24\\
&\equiv x + 2x^3 + 24\\
&\equiv 2x^3 + x + 3 \pmod{7}.
\end{aligned}
$$

Thus we need to solve the congruence

$$2x^3 + x + 3 \equiv 0 \pmod{7}.$$

After evaluating the left-hand side at $x = 1, 2, 3, 4, 5, 6$, we can convince ourselves that there are only two solutions, namely

$$x \equiv 2 \pmod{7} \quad \text{and} \quad x \equiv 6 \pmod{7}.$$

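Consider the case $n = 17$ with $\varphi(17) = 16$. Note that $x \equiv 0 \pmod{17}$ is not a solution. This means that any solution satisfies $\gcd(x, 17) = 1$, so we may apply Euler's Theorem:

$$x^{49} + 2x^{33} + 24 \equiv x^{3 \cdot 16 + 1} + 2x^{2 \cdot 16 + 1} + 24 \equiv x + 2x + 24 \equiv 3x + 24 \pmod{17}.$$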


Thus we need to solve the congruence

$$3x + 24 \equiv 0 \pmod{17}.$$

We see that $x \equiv -8 \equiv 9 \pmod{17}$ is a solution. Since 17 is prime, it follows from Proposition 12.2 that this is the only solution.

Since there are two solutions modulo 7 and only one solution modulo 17, we conclude that there are $2 \cdot 1 = 2$ solutions modulo $7 \cdot 17 = 119$. These solutions correspond to two systems of congruences:

$$\begin{cases} x \equiv 2 \pmod{7},\\ x \equiv 9 \pmod{17}; \end{cases} \qquad \text{and} \qquad \begin{cases} x \equiv 6 \pmod{7},\\ x \equiv 9 \pmod{17}. \end{cases}$$

We can compute solutions modulo 119 using the Extended Euclidean Algorithm. Consider the first system of congruences. Since 7 and 17 are coprime, by Bezout's lemma there exists a solution to

$$7x + 17y = 1.$$

For example, $x = 5$ and $y = -2$. By multiplying both sides of the above equality by $9 - 2 = 7$, we can find a solution to

$$7x' + 17y' = 9 - 2 = 7,$$

namely $x' = 7 \cdot 5 = 35$ and $y' = 7 \cdot (-2) = -14$. But then

$$x_1 = 2 + 7x' = 9 - 17y' = 247$$

satisfies $x_1 \equiv 2 \pmod{7}$ and $x_1 \equiv 9 \pmod{17}$. Therefore $x_1 \equiv 247 \equiv 9 \pmod{119}$ is a solution. The second system of congruences can be solved analogously and gives us a solution $x_2 \equiv 111 \pmod{119}$.
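For a modulus as small as 119, the whole example can be double-checked by exhaustive search. The Python sketch below is our own check (the helper `euler_reduced_exponent` is a hypothetical name): it reproduces the exponent reductions modulo $\varphi(7) = 6$ and $\varphi(17) = 16$, then confirms by brute force that the only roots modulo 119 are 9 and 111. Python's built-in three-argument `pow(x, e, m)` performs the modular exponentiation.

```python
def euler_reduced_exponent(e: int, phi: int) -> int:
    """Exponent r with x^e = x^r (mod p) for x coprime to p, where phi = phi(p).

    Uses Euler's Theorem: x^(q*phi + r) = (x^phi)^q * x^r = x^r.
    """
    return e % phi


def f(x: int, m: int) -> int:
    """x^49 + 2x^33 + 24 reduced modulo m."""
    return (pow(x, 49, m) + 2 * pow(x, 33, m) + 24) % m


# Reduced exponents used in Example 12.3: modulo 7 (phi = 6) and modulo 17 (phi = 16).
reduced_mod_7 = (euler_reduced_exponent(49, 6), euler_reduced_exponent(33, 6))
reduced_mod_17 = (euler_reduced_exponent(49, 16), euler_reduced_exponent(33, 16))

roots_mod_7 = [x for x in range(7) if f(x, 7) == 0]
roots_mod_17 = [x for x in range(17) if f(x, 17) == 0]
roots_mod_119 = [x for x in range(119) if f(x, 119) == 0]

print(reduced_mod_7, reduced_mod_17)            # (1, 3) (1, 1)
print(roots_mod_7, roots_mod_17, roots_mod_119)  # [2, 6] [9] [9, 111]
```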

Exercise 12.4. Give examples of polynomials with coefficients in $\mathbb{Z}_8$ and $\mathbb{Z}_{15}$ for which the conclusion of Proposition 12.2 does not hold.

13 The Discrete Logarithm Problem. The Order of Elements in $\mathbb{Z}_n^*$

Let $n$ be a modulus. We have already looked at certain kinds of equations in $\mathbb{Z}_n$. For example, in Section 6, we learned that neither $[x]^2 + [y]^2 = [3]$ in $\mathbb{Z}_4$ nor $[x]^2 + [y]^2 + [z]^2 = [7]$ in $\mathbb{Z}_8$ has solutions. In Section 8, we studied the equation $[a][x] = [b]$ in $\mathbb{Z}_n$ and saw that the usual application of the Extended Euclidean Algorithm allows us to produce all of its solutions.

Now we want to understand how to handle exponential equations in $\mathbb{Z}_n^*$. In these kinds of equations, we are given residue classes $[a]$ and $[b]$ from $\mathbb{Z}_n^*$, and we want to determine all integer solutions $x$ to the equation $[a]^x = [b]$. This is essentially the same as solving the congruence

$$a^x \equiv b \pmod{n}.$$

The problem of finding solutions to these exponential equations is known as the discrete logarithm problem, or DLP.
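Since these notes solve small instances of the DLP by brute force, here is what that looks like as a minimal Python sketch (the function name `discrete_log_brute_force` is ours). It walks through the powers $a^0, a^1, a^2, \ldots$ modulo $n$ and stops at the first match; the work grows with the size of the answer, which is hopeless for the large moduli used in cryptography.

```python
from typing import Optional


def discrete_log_brute_force(a: int, b: int, n: int) -> Optional[int]:
    """Smallest x >= 0 with a^x = b (mod n), or None if there is none.

    Assumes n >= 2 and gcd(a, n) == 1, so the powers of a repeat with
    period ord(a) <= n - 1 and scanning x < n is enough.
    """
    target = b % n
    power = 1 % n                     # a^0
    for x in range(n):
        if power == target:
            return x
        power = (power * a) % n
    return None
```

For example, `discrete_log_brute_force(3, 13, 17)` returns 4, since $3^4 = 81 \equiv 13 \pmod{17}$.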

Example 13.1. In Section 10, we already saw an example of an exponential equation in $\mathbb{Z}_n^*$, namely

$$a^x \equiv 1 \pmod{n}.$$

According to Euler's Theorem, this equation always has a non-zero solution whenever $a$ and $n$ are coprime. In particular, any $x \equiv 0 \pmod{\varphi(n)}$ satisfies the above congruence, for if $x = \varphi(n)k$ for some integer $k$, then

$$a^x \equiv a^{\varphi(n)k} \equiv (a^{\varphi(n)})^k \equiv 1^k \equiv 1 \pmod{n}.$$

However, we do not yet know whether this equation has any other solutions. Depending on the choice of $a$, there might exist other solutions as well.

In general, the discrete logarithm problem is hard to solve. This problem lies at the foundation of certain cryptosystems, which we will study in the future. Examples include the ElGamal encryption scheme and the Diffie-Hellman key exchange. There are algorithms for solving the discrete logarithm problem, such as Shanks's baby-step giant-step algorithm or the number field sieve, but none of these algorithms runs in polynomial time. However, just like for the problem of integer factorization, there are quantum algorithms which solve the discrete logarithm problem in polynomial time. In these notes, when solving the discrete logarithm problem, we will use brute force or apply Euler's Theorem.
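For comparison with brute force, here is a minimal Python sketch of Shanks's baby-step giant-step algorithm, which the notes mention but do not describe; the details below (names, the choice of $m$) are our own. With $m$ the smallest integer such that $m^2 \geq n$, it stores the baby steps $a^j$ for $0 \leq j < m$, then tries the giant steps $b \cdot a^{-mi}$ for $0 \leq i < m$; a match gives $x = im + j$. This costs roughly $\sqrt{n}$ multiplications and $\sqrt{n}$ memory, in line with the $O(\sqrt{n})$ estimate quoted in Example 15.4.

```python
from math import isqrt
from typing import Optional


def baby_step_giant_step(a: int, b: int, n: int) -> Optional[int]:
    """Smallest x >= 0 with a^x = b (mod n), or None.

    Assumes n >= 2 and gcd(a, n) == 1 (needed for the inverse of a^m).
    Every solution class has a representative < ord(a) <= n - 1 < m*m,
    so m giant steps suffice.
    """
    m = isqrt(n - 1) + 1              # smallest m with m*m >= n
    # Baby steps: remember the first (smallest) j with a^j = value.
    table: dict[int, int] = {}
    value = 1 % n
    for j in range(m):
        table.setdefault(value, j)
        value = (value * a) % n
    # Giant steps: multiply b repeatedly by a^(-m).
    factor = pow(a, -m, n)            # modular inverse power (Python 3.8+)
    gamma = b % n
    for i in range(m):
        if gamma in table:
            return i * m + table[gamma]
        gamma = (gamma * factor) % n
    return None
```

On the toy example $3^x \equiv 13 \pmod{17}$ it returns 4, the same answer as brute force, but it inspects only about $2\sqrt{17}$ values.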

In order to understand what the solutions to $a^x \equiv b \pmod{n}$ look like, we need to understand certain fundamental properties of the group of units $\mathbb{Z}_n^*$.

Definition 13.2. If $\alpha \in \mathbb{Z}_n^*$, the order of $\alpha$ is the smallest exponent $k \geq 1$ such that $\alpha^k = 1$. The order is denoted by $k = \operatorname{ord}(\alpha)$ or, if $\alpha = [a]$ for some integer $a$, by $k = \operatorname{ord}(a)$.


From Euler's Theorem, it follows that $\operatorname{ord}(\alpha) \leq \varphi(n)$ for all $\alpha \in \mathbb{Z}_n^*$. In fact, a much stronger result holds.

Proposition 13.3.¹⁶ Let $\alpha \in \mathbb{Z}_n^*$. A positive integer $m$ satisfies $\alpha^m = 1$ if and only if $\operatorname{ord}(\alpha) \mid m$. Consequently, $\operatorname{ord}(\alpha) \mid \varphi(n)$.

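Proof. Let $k = \operatorname{ord}(\alpha)$ and suppose that $\alpha^m = 1$. We apply the Remainder Theorem and write

$$m = kq + r,$$

where $0 \leq r < k$. Then, since $\alpha^k = 1$, we obtain

$$1 = \alpha^m = \alpha^{kq + r} = (\alpha^k)^q \alpha^r = 1^q \alpha^r = \alpha^r.$$

Since $k$ is the smallest positive integer satisfying $\alpha^k = 1$ and $0 \leq r < k$, it must be the case that $r = 0$, so $k \mid m$.

For the converse, let $m = kq$. Then

$$\alpha^m = \alpha^{kq} = (\alpha^k)^q = 1^q = 1.$$

Finally, according to Euler's Theorem it is the case that $\alpha^{\varphi(n)} = 1$. But then it follows from what we proved above that $\operatorname{ord}(\alpha) \mid \varphi(n)$.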

Example 13.4. Let us determine $\operatorname{ord}(\alpha)$ in $\mathbb{Z}_n^*$ for $n = 17$ and $\alpha = [3]$. We have $\varphi(n) = 16$. Note that $D = \{1, 2, 4, 8, 16\}$ is the complete list of positive divisors of $\varphi(n)$. It follows from Proposition 13.3 that $\operatorname{ord}(\alpha) \in D$. Thus, in order to find the order of $\alpha$, we just need to iterate over all elements in $D$. The smallest element $d$ satisfying $[3]^d = [1]$ is the order. We have

$$
\begin{aligned}
3^1 &\equiv 3 \pmod{17},\\
3^2 &\equiv 9 \pmod{17},\\
3^4 &\equiv (3^2)^2 \equiv 9^2 \equiv 81 \equiv -4 \pmod{17},\\
3^8 &\equiv (3^4)^2 \equiv (-4)^2 \equiv 16 \equiv -1 \pmod{17},\\
3^{16} &\equiv (3^8)^2 \equiv (-1)^2 \equiv 1 \pmod{17}.
\end{aligned}
$$

Thus we see that $\operatorname{ord}(\alpha) = 16$, which is the largest possible order that an element of $\mathbb{Z}_{17}^*$ can attain. Note that there was no need for us to compute $3^{16}$ modulo 17, because we know the result from Euler's Theorem.

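In contrast, consider the element $\beta = [9]$ in $\mathbb{Z}_{17}^*$. We have

$$1 \equiv 3^{16} \equiv (3^2)^8 \equiv 9^8 \pmod{17},$$

which means, by Proposition 13.3, that $\operatorname{ord}(\beta) \mid 8$; in particular, $\operatorname{ord}(\beta) \leq 8$. Convince yourself that, in fact, $\operatorname{ord}(\beta) = 8$.

¹⁶ Proposition 5.5 in Frank Zorzitto, A Taste of Number Theory.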
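The procedure of Example 13.4 (test only the divisors of $\varphi(n)$, in increasing order) can be sketched in Python as follows. The helper names are ours, the sketch assumes $n \geq 2$, and $\varphi(n)$ is computed naively by counting, which is fine for small $n$.

```python
from math import gcd


def euler_phi(n: int) -> int:
    """phi(n) by direct counting (adequate for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def order(a: int, n: int) -> int:
    """ord(a) in Z_n^*, trying only divisors of phi(n) (Proposition 13.3)."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    phi = euler_phi(n)
    for d in range(1, phi + 1):
        if phi % d == 0 and pow(a, d, n) == 1:
            return d                  # smallest divisor d of phi(n) with a^d = 1
    raise AssertionError("unreachable by Euler's Theorem")
```

Here `order(3, 17)` returns 16 and `order(9, 17)` returns 8, matching Example 13.4 and the exercise just posed.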


Proposition 13.3 allows us to classify all solutions to the exponential equation $[a]^x = [b]$.

Proposition 13.5. Let $[a], [b]$ be elements of $\mathbb{Z}_n^*$. If $x$ satisfies the equation $[a]^x = [b]$, then all solutions $x'$ to this equation satisfy

$$x' \equiv x \pmod{\operatorname{ord}(a)}.$$

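Proof. Let $x$ be a solution to $a^x \equiv b \pmod{n}$ and let $k = \operatorname{ord}(a)$. By the Remainder Theorem, we can write

$$x = kq + r,$$

where $0 \leq r < k$. But then

$$a^x \equiv a^{kq + r} \equiv (a^k)^q \cdot a^r \equiv 1 \cdot a^r \equiv a^r \pmod{n}.$$

Thus, replacing $x$ by its remainder modulo $k$, we may assume without loss of generality that $0 \leq x < k$. Now suppose that there exists some other $x'$ such that $a^{x'} \equiv b \pmod{n}$. Once again, we may replace $x'$ by its remainder modulo $k$, and by relabelling we may assume that $0 \leq x \leq x' < k$. But then

$$a^x \equiv b \equiv a^{x'} \pmod{n}$$

implies (since $a$ is invertible modulo $n$)

$$a^{x' - x} \equiv 1 \pmod{n}.$$

Since $0 \leq x' - x < k$, it must be the case that $x = x'$, for otherwise we would get a contradiction to the fact that $k$ is the smallest positive integer satisfying $a^k \equiv 1 \pmod{n}$. Therefore all solutions to $[a]^x = [b]$ are of the form $x' \equiv x \pmod{\operatorname{ord}(a)}$. Conversely, every such $x' = x + kt$ is a solution, because $a^{x + kt} \equiv a^x (a^k)^t \equiv b \pmod{n}$.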

Example 13.6. Let us compare the solutions to the exponential equations

$$3^x \equiv 1 \pmod{17} \quad \text{and} \quad 9^y \equiv 1 \pmod{17}.$$

In the first case, we see that the congruence $x \equiv 0 \pmod{16}$ captures all solutions. However, in the second case, even though $y \equiv 0 \pmod{16}$ does provide solutions, it clearly does not cover all of the possibilities because, for example, $y = 8$ also satisfies $9^y \equiv 1 \pmod{17}$. In fact, Proposition 13.5 implies that the solutions are precisely the integers $y \equiv 0 \pmod{8}$.
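Proposition 13.5 can be sanity-checked numerically: listing every exponent in a window that solves the equation should produce exactly one residue class modulo the order. A small Python check of our own (the helper name `solutions_in_range` is hypothetical):

```python
def solutions_in_range(a: int, b: int, n: int, limit: int) -> list[int]:
    """All exponents 0 <= x < limit with a^x = b (mod n)."""
    return [x for x in range(limit) if pow(a, x, n) == b % n]


print(solutions_in_range(3, 1, 17, 48))   # [0, 16, 32]
print(solutions_in_range(9, 1, 17, 48))   # [0, 8, 16, 24, 32, 40]
```

The spacing between consecutive solutions is 16 for $a = 3$ and 8 for $a = 9$, i.e. exactly $\operatorname{ord}(3)$ and $\operatorname{ord}(9)$ in $\mathbb{Z}_{17}^*$.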

We conclude this section with several general observations about orders of elements of $\mathbb{Z}_n^*$.


Proposition 13.7.¹⁷ If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then the list

$$\alpha, \alpha^2, \alpha^3, \ldots, \alpha^k = 1$$

does not repeat itself.

Proof. Suppose that we have a repetition $\alpha^i = \alpha^j$, where $1 \leq i < j \leq k$. Thus $\alpha^{j - i} = 1$. Since $1 \leq j - i < k$, this contradicts the minimality of $k$ as the order of $\alpha$.

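Proposition 13.8.¹⁸ If $\alpha \in \mathbb{Z}_n^*$ and $k = \operatorname{ord}(\alpha)$, then

$$\operatorname{ord}(\alpha^j) = \frac{k}{\gcd(j, k)}.$$

Proof. Let $\operatorname{ord}(\alpha^j) = \ell$. We will show that $\ell = k / \gcd(j, k)$. Note that

$$\alpha^{j\ell} = (\alpha^j)^\ell = 1.$$

It follows from Proposition 13.3 that $k \mid j\ell$. That is, $j\ell = ku$ for some integer $u$. But then

$$\frac{j}{\gcd(j, k)}\, \ell = \frac{k}{\gcd(j, k)}\, u,$$

and since $j / \gcd(j, k)$ and $k / \gcd(j, k)$ are coprime, it follows from Proposition 3.13 that $k / \gcd(j, k)$ divides $\ell$.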

On the other hand, since $k$ is the order of $\alpha$,

$$(\alpha^j)^{k / \gcd(j, k)} = (\alpha^k)^{j / \gcd(j, k)} = 1^{j / \gcd(j, k)} = 1.$$

By Proposition 13.3 applied to the order of $\alpha^j$, we obtain that $\ell \mid k / \gcd(j, k)$. Since $k / \gcd(j, k) \mid \ell$ and $\ell \mid k / \gcd(j, k)$, we conclude that $\ell = k / \gcd(j, k)$.
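As a quick numerical check of Proposition 13.8 (an illustration we added, using brute-force orders), one can compare $\operatorname{ord}(3^j)$ with $16 / \gcd(j, 16)$ in $\mathbb{Z}_{17}^*$ for many values of $j$:

```python
from math import gcd


def order_brute_force(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


k = order_brute_force(3, 17)           # 16, as in Example 13.4
mismatches = [j for j in range(1, 2 * k + 1)
              if order_brute_force(pow(3, j, 17), 17) != k // gcd(j, k)]
print(k, mismatches)                    # 16 []
```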

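Corollary 13.9.¹⁹ Let $\alpha$ be an element of $\mathbb{Z}_n^*$. Then $\operatorname{ord}(\alpha^j) = \operatorname{ord}(\alpha)$ if and only if $\gcd(j, \operatorname{ord}(\alpha)) = 1$.

Proposition 13.10.²⁰ Let $\alpha, \beta$ in $\mathbb{Z}_n^*$ have orders $k$ and $\ell$, respectively. If $k$ and $\ell$ are coprime, then

$$\operatorname{ord}(\alpha\beta) = k\ell.$$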

¹⁷ Proposition 5.6 in Frank Zorzitto, A Taste of Number Theory.
¹⁸ Proposition 5.7 in Frank Zorzitto, A Taste of Number Theory.
¹⁹ Proposition 5.9 in Frank Zorzitto, A Taste of Number Theory.
²⁰ Proposition 5.16 in Frank Zorzitto, A Taste of Number Theory.


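Proof. Let $m = \operatorname{ord}(\alpha\beta)$. Since

$$(\alpha\beta)^{k\ell} = \alpha^{k\ell}\beta^{k\ell} = (\alpha^k)^\ell (\beta^\ell)^k = 1^\ell 1^k = 1,$$

we see from Proposition 13.3 that $m \mid k\ell$.

We will now show that $k\ell \mid m$. Since $\gcd(k, \ell) = 1$, it follows from Proposition 3.12 that we only need to demonstrate $k \mid m$ and $\ell \mid m$. On one hand,

$$(\alpha^m)^k = \alpha^{mk} = (\alpha^k)^m = 1^m = 1$$

and

$$(\beta^m)^\ell = \beta^{m\ell} = (\beta^\ell)^m = 1^m = 1.$$

On the other hand,

$$
\begin{aligned}
(\alpha^m)^\ell &= (\alpha^m)^\ell \cdot 1 = (\alpha^m)^\ell (\beta^m)^\ell\\
&= (\alpha^m \beta^m)^\ell\\
&= ((\alpha\beta)^m)^\ell\\
&= 1^\ell\\
&= 1.
\end{aligned}
$$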

It follows from the above calculations, as well as from Proposition 13.3, that $k \mid m\ell$. Since $k$ and $\ell$ are coprime, Proposition 3.13 allows us to conclude that $k \mid m$. We can carry out an analogous calculation to show that $(\beta^m)^k = 1$, which would imply $\ell \mid m$. But then $k\ell \mid m$, and since we already demonstrated that $m \mid k\ell$, it must be the case that $m = k\ell$.
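To see Proposition 13.10 in action we need coprime orders, which cannot happen non-trivially in $\mathbb{Z}_{17}^*$ (all orders there divide 16). The Python check below, an illustration of our own, uses $\mathbb{Z}_{13}^*$ instead, where $[3]$ has order 3 and $[5]$ has order 4:

```python
def order_brute_force(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


n = 13
k = order_brute_force(3, n)                # 3^3 = 27 = 1 (mod 13)
ell = order_brute_force(5, n)              # 5^2 = -1, so 5^4 = 1 (mod 13)
product_order = order_brute_force(3 * 5 % n, n)
print(k, ell, product_order)               # 3 4 12
```

Since $3 \cdot 4 = 12 = \varphi(13)$, the product $[3][5] = [2]$ has the largest possible order in $\mathbb{Z}_{13}^*$.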

14 The Primitive Root Theorem

Let $n$ be a modulus. The elements $\alpha \in \mathbb{Z}_n^*$ whose order is equal to $\varphi(n)$ deserve special attention. According to Proposition 13.7, they generate the whole group $\mathbb{Z}_n^*$ simply by computing the powers $\alpha, \alpha^2, \ldots, \alpha^{\varphi(n)} = 1$ (these are $\varphi(n)$ distinct elements of a group with $\varphi(n)$ elements). Such elements are called primitive roots, and in this section we address the question of their existence in $\mathbb{Z}_n^*$. We will answer this question only partially, by proving the Primitive Root Theorem.

Definition 14.1. An element $\alpha \in \mathbb{Z}_n^*$ is called a primitive root if $\operatorname{ord}(\alpha) = \varphi(n)$.


Example 14.2. Let us demonstrate that $\mathbb{Z}_{17}^*$ contains a primitive root. If we reduce the elements in the list $\{3, 3^2, 3^3, \ldots, 3^{16}\}$ modulo 17, then the resulting list is

$$\{3, 9, 10, 13, 5, 15, 11, 16, 14, 8, 7, 4, 12, 2, 6, 1\}.$$

Note that all 16 elements are distinct and they constitute the whole of $\mathbb{Z}_{17}^*$.

Not every element of $\mathbb{Z}_{17}^*$ is a primitive root. For example, the observation made above does not hold for the list $\{9, 9^2, 9^3, \ldots, 9^{16}\}$ reduced modulo 17:

$$\{9, 13, 15, 16, 8, 4, 2, 1, 9, 13, 15, 16, 8, 4, 2, 1\}.$$

The first 8 elements are distinct, and starting from the 9th element the pattern repeats. Hence $9, 9^2, \ldots, 9^{\varphi(17)} = 1$ do not produce all of $\mathbb{Z}_{17}^*$, which is not a surprise, because from Example 13.4 we know that $\operatorname{ord}(9) = 8$.

There are groups which have no primitive roots at all. For example, there are no primitive roots in $\mathbb{Z}_n^*$ whenever $n$ has at least two distinct odd prime divisors. Examples include $\mathbb{Z}_{15}^*$, $\mathbb{Z}_{21}^*$ or $\mathbb{Z}_{35}^*$, and we leave it as an exercise to the reader to verify that each of these three groups has no primitive roots.

Before jumping into the proof of the Primitive Root Theorem, let us determine how many primitive roots there are in $\mathbb{Z}_n^*$.

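Proposition 14.3.²¹ If $\mathbb{Z}_n^*$ has a primitive root, then the total number of primitive roots in $\mathbb{Z}_n^*$ is $\varphi(\varphi(n))$.

Proof. Let $\alpha$ be a primitive root, so that $\operatorname{ord}(\alpha) = \varphi(n)$ and

$$\alpha, \alpha^2, \ldots, \alpha^{\varphi(n)} = 1$$

cover all of $\mathbb{Z}_n^*$ without repetition. The other primitive roots are those powers $\alpha^j$ in the list for which

$$\operatorname{ord}(\alpha^j) = \varphi(n) = \operatorname{ord}(\alpha).$$

According to Corollary 13.9, these are the powers $\alpha^j$ where $j$ ranges over the integers from 1 to $\varphi(n)$ that are coprime to $\varphi(n)$, and there are precisely $\varphi(\varphi(n))$ such $j$'s.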
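For a small prime such as 17, Proposition 14.3 can be confirmed by listing the primitive roots directly. The Python sketch below (helper names are ours; orders and $\varphi$ are computed by brute force, so only small $n \geq 2$ are practical) checks that the count equals $\varphi(\varphi(17)) = \varphi(16) = 8$:

```python
from math import gcd


def euler_phi(n: int) -> int:
    """phi(n) by direct counting (adequate for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


def order_brute_force(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod n); assumes n >= 2 and gcd(a, n) == 1."""
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k


def primitive_roots(n: int) -> list[int]:
    """All a in [1, n) with gcd(a, n) = 1 and ord(a) = phi(n)."""
    phi = euler_phi(n)
    return [a for a in range(1, n)
            if gcd(a, n) == 1 and order_brute_force(a, n) == phi]


roots = primitive_roots(17)
print(len(roots), euler_phi(euler_phi(17)))   # 8 8
```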

We are now ready to state the Primitive Root Theorem.

Theorem 14.4 (The Primitive Root Theorem).²² Let $p$ be prime. Then $\mathbb{Z}_p^*$ contains a primitive root.

²¹ Proposition 5.10 in Frank Zorzitto, A Taste of Number Theory.
²² Theorem 5.17 in Frank Zorzitto, A Taste of Number Theory.


If you are familiar with the basics of group theory, then you can translate the statement of the theorem into group-theoretic language by saying that the group $\mathbb{Z}_p^*$ is cyclic whenever $p$ is prime. In order to prove this result, we first need one lemma.

Lemma 14.5.²³ Let $p$ be prime. If $\alpha$ is an element of $\mathbb{Z}_p^*$ of order $k$, then

$$\alpha, \alpha^2, \ldots, \alpha^{k-1}, \alpha^k = 1$$

is the complete, non-repeating list of all $\beta$ in $\mathbb{Z}_p^*$ such that $\beta^k = 1$.

Proof. According to Proposition 13.7, the list $\alpha, \alpha^2, \ldots, \alpha^k$ contains no repetitions. Every $\alpha^j$ in the list satisfies

$$(\alpha^j)^k = (\alpha^k)^j = 1^j = 1.$$

Hence every element in the list is a root of the polynomial $x^k - 1$. Since we found $k$ distinct roots of the polynomial $x^k - 1$, whose degree is $k$, it follows from Proposition 12.2 that there are no other roots.

Proof (of Theorem 14.4). Let $\alpha$ be an element of $\mathbb{Z}_p^*$. If $\operatorname{ord}(\alpha) = p - 1$, then $\alpha$ is a primitive root, so we are done. Thus we may assume that $k = \operatorname{ord}(\alpha) < p - 1$. According to Lemma 14.5, the list $\alpha, \alpha^2, \ldots, \alpha^k = 1$ picks up all roots of $x^k - 1$ in $\mathbb{Z}_p^*$. Since $k < p - 1$, there is some $\gamma$ in $\mathbb{Z}_p^*$ which is not on this list. Hence $\gamma^k \neq 1$.

Let $\ell = \operatorname{ord}(\gamma)$. Notice that $\ell \nmid k$, for otherwise we would have $\gamma^k = (\gamma^\ell)^{k/\ell} = 1^{k/\ell} = 1$. This means that in the unique factorizations of $k$ and $\ell$, there is a prime number $q$ that appears more often in $\ell$ than it does in $k$. Therefore

$$k = q^d k_1 \quad \text{and} \quad \ell = q^e \ell_1,$$

where $0 \leq d < e$ and $q \nmid k_1$, $q \nmid \ell_1$. Let $\beta = \alpha^{q^d} \gamma^{\ell_1}$. Then, according to Proposition 13.8,

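$$
\begin{aligned}
\operatorname{ord}(\alpha^{q^d}) &= \frac{k}{\gcd(k, q^d)} = \frac{k}{q^d} = k_1,\\
\operatorname{ord}(\gamma^{\ell_1}) &= \frac{\ell}{\gcd(\ell, \ell_1)} = \frac{\ell}{\ell_1} = q^e.
\end{aligned}
$$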



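Since $k_1$ and $q^e$ are coprime, it follows from Proposition 13.10 that

$$\operatorname{ord}(\beta) = \operatorname{ord}(\alpha^{q^d}\gamma^{\ell_1}) = \operatorname{ord}(\alpha^{q^d})\operatorname{ord}(\gamma^{\ell_1}) = q^e k_1 > q^d k_1 = k = \operatorname{ord}(\alpha).$$

In this way, whenever an element has order less than $p - 1$, we can find an element of strictly larger order in $\mathbb{Z}_p^*$. Since the order of an element of $\mathbb{Z}_p^*$ never exceeds $\varphi(p) = p - 1$, repeating this process must eventually produce an element of order $p - 1$. By definition, this element is a primitive root.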
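The proof above is constructive, but it is not the most convenient way to find a primitive root in practice. A common alternative, which we include as our own illustration, follows from Proposition 13.3: $\alpha$ has order $p - 1$ if and only if $\alpha^{(p-1)/q} \neq 1$ for every prime $q$ dividing $p - 1$. The Python sketch below assumes $p$ is prime and that $p - 1$ is small enough to factor by trial division; the function names are hypothetical.

```python
def prime_factors(m: int) -> list[int]:
    """Distinct prime factors of m >= 1, by trial division."""
    factors = []
    d = 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            while m % d == 0:
                m //= d
        d += 1
    if m > 1:
        factors.append(m)
    return factors


def smallest_primitive_root(p: int) -> int:
    """Smallest primitive root modulo an (assumed) prime p."""
    if p == 2:
        return 1                      # Z_2^* = {1}
    qs = prime_factors(p - 1)
    for a in range(2, p):
        # ord(a) = p - 1 iff a^((p-1)/q) != 1 for every prime q dividing p - 1.
        if all(pow(a, (p - 1) // q, p) != 1 for q in qs):
            return a
    raise ValueError("no primitive root found; is p prime?")
```

For $p = 17$ it returns 3, in agreement with Example 14.2, after only two exponentiations per candidate (since $16 = 2^4$ has the single prime factor 2).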

In conclusion, we provide a statement of the Generalized Primitive Root Theorem, which gives a full classification of the moduli $n$ such that $\mathbb{Z}_n^*$ contains a primitive root. Due to time limitations, we will refrain from proving this result.

Theorem 14.6 (Generalized Primitive Root Theorem). The group of units $\mathbb{Z}_n^*$ contains a primitive root if and only if $n = 2$, $4$, an odd prime power, or an odd prime power multiplied by two.

15 Big-O Notation

Before we proceed to the discussion of primality tests and integer factorization algorithms, let us introduce several important definitions. When analyzing the performance of algorithms, we will often use big-O notation and the notion of a polynomial-time (or subexponential-time, or exponential-time) algorithm.

Definition 15.1. Let $f(n)$ and $g(n)$ be two functions of $n$. We say that $f(n) = O(g(n))$ if there exists a positive real number $M$ such that $|f(n)| \leq M|g(n)|$ for all sufficiently large $n$.

Example 15.2. Let $f(n) = n^2 + 4n + 7$ and $g(n) = n^3$. Note that

$$
\begin{aligned}
12 &= f(1) > g(1) = 1,\\
19 &= f(2) > g(2) = 8,\\
28 &= f(3) > g(3) = 27,\\
39 &= f(4) < g(4) = 64,\\
52 &= f(5) < g(5) = 125,\\
&\;\;\vdots
\end{aligned}
$$

so we see that, even though $f(n)$ dominates $g(n)$ for $n = 1, 2, 3$, the pattern changes for $n = 4, 5$, and in fact it so happens that $f(n) < g(n)$ for all $n \geq 4$. Thus $f(n) = O(g(n))$. Note, however, that $g(n) \neq O(f(n))$.


Another example is $f(n) = e^n$ and $g(n) = 5e^n + e^{n/2}$. Evidently, $f(n) \leq g(n)$, so $f(n) = O(g(n))$. However, one may also notice that $g(n) = O(f(n))$, because $e^{n/2} \leq e^n$ for $n \geq 0$, and this implies that

$$g(n) = 5e^n + e^{n/2} \leq 5e^n + e^n = 6e^n = 6f(n),$$

which means that $g(n) = O(f(n))$. In this case, we say that $f(n)$ and $g(n)$ have the same asymptotic behaviour as $n$ approaches infinity.

The big-O notation is used in order to simplify $f(n)$ whenever we are interested not in its precise form, but rather in its behaviour for very large $n$. For example, the function

$$f(n) = n^5 + 2e^n + 3\log(n)$$

simplifies to $f(n) = O(e^n)$, because $2e^n$ dominates all other summands present above (note that $3\log(n) < n^5 < 2e^n$ for sufficiently large $n$). Also, according to our definition, we may ignore the constant 2 in front of $2e^n$, because it is absorbed into the constant $M$ implicit in the expression $f(n) = O(e^n)$. Thus, when writing a certain expression in its big-O form, all that we need to do is to identify some “simple” function that dominates $f(n)$, and we want to pick this function in the best way possible. Say, in the example above we could have written $f(n) = O(e^{2n})$, but this is a less sharp estimate than $f(n) = O(e^n)$, because $e^{2n}$ grows much faster than $e^n$. Thus the expression $f(n) = O(e^n)$ tells us more about the function $f(n)$ than the expression $f(n) = O(e^{2n})$. The most common types of functions that we will encounter are

$O(1)$: at most constant growth;
$O(\log n)$: at most logarithmic growth;
$O(n^k)$: at most polynomial growth ($k > 0$);
$O(\exp(c\, n^{1/k}))$: at most subexponential growth ($c > 0$, $k > 1$);
$O(\exp(cn))$: at most exponential growth ($c > 0$).

When analyzing the performance of algorithms, the function $f(n)$ will represent the number of steps required for the algorithm to terminate given the input $n$. For example, it was proved by Gabriel Lamé that the computation of $\gcd(a, b)$ with the Euclidean algorithm requires at most five times as many division steps as $\min\{a, b\}$ has decimal digits, that is, at most $5(\lfloor \log_{10}(\min\{a, b\}) \rfloor + 1)$ steps, and this allows us to conclude that the performance of the Euclidean algorithm is $O(\log(\min\{a, b\}))$. So the number of steps required for the algorithm to terminate grows at most logarithmically as $\min\{a, b\}$ approaches infinity.
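Lamé's bound is easy to test empirically. The Python sketch below, our own illustration, counts the division steps of the Euclidean algorithm on inputs ordered so that $a \geq b \geq 1$, and checks them against five times the number of decimal digits of $b$. Consecutive Fibonacci numbers are the classical worst case.

```python
def euclid_steps(a: int, b: int) -> int:
    """Number of divisions a = q*b + r performed by the Euclidean algorithm."""
    a, b = max(a, b), min(a, b)
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps


def lame_bound(a: int, b: int) -> int:
    """Five times the number of decimal digits of min(a, b)."""
    return 5 * len(str(min(a, b)))


# Consecutive Fibonacci numbers: (8, 5) needs 4 divisions.
print(euclid_steps(8, 5))                                  # 4
print(all(euclid_steps(a, b) <= lame_bound(a, b)
          for a in range(1, 400) for b in range(1, 400)))  # True
```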


Definition 15.3. Suppose that an algorithm takes a positive integer $n$ as its input. We say that the algorithm works in polynomial time if there exists a positive real number $k$ such that the number of steps required for it to compute is $O((\log n)^k)$.

Once again, consider the Euclidean Algorithm. As the number of steps required to compute $\gcd(a, b)$ is $O(\log(\min\{a, b\}))$, we see that we may take $k = 1$ in order to conclude that the algorithm works in polynomial time. This may seem a bit strange, because $(\log n)^k$ is not a polynomial function (compare it to, say, $n^2$ or $n^3 + n + 7$, which are polynomials). But when talking about an algorithm, we are interested in its performance not with respect to an input $n$, but rather with respect to the size of an input. You may think of the size of $n$ as the number of decimal digits of $n$. This number equals $\lfloor \log_{10} n \rfloor + 1$, so it is logarithmic in terms of $n$. So, if we provide $n = 1000000$ as an input to some algorithm, roughly speaking we would consider it efficient if it terminates in $7^k$ steps for some positive integer $k$ (note that 7 is the number of decimal digits of $n$) rather than in $1000000^k$ steps. From this perspective, any algorithm which works in $O(n) = O(e^{\log n})$ steps would actually be considered an algorithm which works in exponential time. Such algorithms can be used to compute values only for relatively small values of $n$.

Example 15.4. Here are some examples of famous algorithms and their asymptotic running time.

• Karatsuba's algorithm for integer multiplication, published in 1962, was the first method to beat the schoolbook approach. Given two positive integers $a$ and $b$, for $d = \log(\max\{a, b\})$ this algorithm requires $O(d^{1.585})$ steps to compute (the exponent is $\log_2 3 \approx 1.585$), so it works in polynomial time. Asymptotically faster algorithms, such as Toom-Cook multiplication and the Schönhage-Strassen algorithm, are also known;

• Shanks's Baby-Step Giant-Step Algorithm, which was invented in 1971, allows one to compute discrete logarithms modulo $n$. If $d = \log n$, then the running time of the algorithm is $O(\sqrt{n}) = O(e^{d/2})$, so it works in exponential time;

• The general number field sieve is the fastest algorithm known to date for factoring large integers. If $n$ is an integer and $d = \log n$, the algorithm works in $O(e^{2 d^{1/3} (\log d)^{2/3}})$ steps. The constant 2 in this expression is not optimal. We see that this algorithm is neither polynomial nor exponential. These types of algorithms are called subexponential.


16 Primality Testing

For more details, please refer to the monograph by R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective, 2001.

As was mentioned in the introduction, number theory is heavily used in cryptography. In the upcoming sections, we will look at several cryptographic protocols, all of which, in one way or another, involve primality testing. For example, in order to ensure that the communication provided by the RSA cryptosystem is secure, one has to be able to generate a pair of very large prime numbers (several thousand bits). But how do we ensure that some given number $n$ is prime, when we know that factoring large integers is infeasible for electronic computers? It turns out that there are several alternative ways to verify that $n$ is prime, which do not require the factorization of $n$.

There are three kinds of primality tests, namely

1. Heuristic tests: tests that work well in practice, but rest on a heuristic explanation rather than on a proof (Fermat's Primality Test);

2. Probabilistic tests: given $n$, these tests verify whether $n$ is a pseudoprime, i.e., a number that is prime with very high probability (Miller-Rabin Primality Test);

3. Deterministic tests: given $n$, these tests guarantee the primality or the compositeness of $n$ (trial division, AKS Primality Test, Elliptic Curve Primality Test).

In this section, we will study the trial division method, Fermat's Primality Test and the Miller-Rabin Primality Test. We remark that the best-known primality test, the AKS Primality Test, was invented by the Indian mathematicians Manindra Agrawal, Neeraj Kayal and Nitin Saxena in 2002. It was the first deterministic, unconditional, polynomial-time algorithm for primality testing. In 2005, its asymptotic running time was improved by C. Pomerance and H. W. Lenstra, Jr. to $O((\log n)^6)$. Despite the theoretical benefits of AKS, the probabilistic Miller-Rabin Primality Test is used far more often in practice. If $k$ denotes the number of times the algorithm has to run before we conclude that $n$ is a pseudoprime, the asymptotic running time of the Miller-Rabin Primality Test is $O(k(\log n)^3)$.


16.1 Trial Division

What is the most obvious way to determine whether a given integer $n \geq 2$ is composite? Well, one just has to find one of its non-trivial factors! That is, if we can show that there exists some integer $d$ such that $d \mid n$ and $1 < d < n$, then $n$ is composite.

For example, if $n = 35$, we just have to check that $2 \nmid 35$, $3 \nmid 35$, $4 \nmid 35$, until we find out that $5 \mid 35$. Therefore, 35 is a composite number. Of course, if we consider $n = 37$, a problem arises, as now we have to check $2 \nmid 37$, $3 \nmid 37$, $\ldots$, $36 \nmid 37$ before we find out that $n$ is prime. Fortunately, as the following proposition suggests, there is no need to check all $n - 2$ numbers between 1 and $n$ to be certain that $n$ is prime.

Proposition 16.1. For any composite integer $n \geq 2$ there exists a divisor $d$ of $n$ such that $1 < d \leq \sqrt{n}$. Furthermore, we may assume that $d$ is prime.

Proof. Let $n = dk$ for some non-trivial divisors $d$ and $k$. If we now suppose that both $d > \sqrt{n}$ and $k > \sqrt{n}$, then $dk > n$, a contradiction. Therefore either $1 < d \leq \sqrt{n}$ or $1 < k \leq \sqrt{n}$ holds. Without loss of generality, assume the former. Since Theorem 2.7 asserts the existence of a prime $p$ dividing $d$, and $d \leq \sqrt{n}$, we see that $1 < p \leq d \leq \sqrt{n}$. Since $p \mid d$ and $d \mid n$, the prime $p$ also divides $n$.

Now we may adjust our primality test as follows. Let $\lfloor x \rfloor$ denote the largest integer $\leq x$. According to Proposition 16.1, in order to verify that $n$ is prime, we just have to ensure that

$$2 \nmid n, \quad 3 \nmid n, \quad \ldots, \quad \lfloor \sqrt{n} \rfloor \nmid n.$$

For example, in the case of $n = 37$, we have $\lfloor \sqrt{37} \rfloor = \lfloor 6.083 \rfloor = 6$, and $2 \nmid 37$, $3 \nmid 37$, $\ldots$, $6 \nmid 37$. Therefore 37 is prime. Thus we were able to reduce the number of steps in our primality test from $n - 2$ to $\lfloor \sqrt{n} \rfloor - 1$. Quite a significant improvement!

We can actually do slightly better. According to Proposition 16.1, we can limit ourselves only to prime divisors of $n$. So, in the case of $n = 37$, there was no need to check its divisibility by 4 or 6, since these numbers are composite. So we could achieve the same conclusion simply by testing $2 \nmid 37$, $3 \nmid 37$ and $5 \nmid 37$.
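Trial division up to $\lfloor \sqrt{n} \rfloor$ takes only a few lines. In the Python sketch below (function names ours), `math.isqrt` computes $\lfloor \sqrt{n} \rfloor$ exactly, avoiding floating-point rounding for large $n$. As a cheap partial version of the "prime divisors only" refinement, it tries 2 and then odd candidates only; a full version would read the candidates from a precomputed table of primes.

```python
from math import isqrt
from typing import Optional


def smallest_factor(n: int) -> Optional[int]:
    """Smallest prime divisor d <= isqrt(n) of n >= 2, or None if n is prime."""
    if n % 2 == 0:
        return 2 if n > 2 else None
    for d in range(3, isqrt(n) + 1, 2):   # odd candidates only
        if n % d == 0:
            return d                      # the smallest divisor > 1 is prime
    return None


def is_prime_trial_division(n: int) -> bool:
    """Proposition 16.1: n >= 2 is prime iff no d in [2, isqrt(n)] divides it."""
    return n >= 2 and smallest_factor(n) is None
```

With these definitions, `smallest_factor(35)` returns 5 and `is_prime_trial_division(37)` returns `True` after testing only the candidates 3 and 5 beyond the parity check.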

In order to make this further improvement, we need to know all prime numbers $\leq \sqrt{n}$. Fortunately, there is a rather simple method called the Sieve of Eratosthenes, which allows us to produce all prime numbers up to $X$ in $O(X \log\log X)$ steps (see Assignment 3). The method was discovered by the Greek mathematician Eratosthenes of Cyrene (≈ 250 BC), and goes as follows:


1. Initialize a table $A$ of $X$ elements by setting $A[1] = 1$ and $A[i] = 0$ for all $2 \leq i \leq X$;

2. Let $p = 2$;

3. Set $A[2p] = 1$, $A[3p] = 1$, $A[4p] = 1$, and so on, for all multiples of $p$ in the table $A$;

4. Change $p$ to the smallest index $k > p$ such that $A[k] = 0$. If $p > \sqrt{X}$, terminate. Otherwise, return to step 3.

In the end, all elements $i$ such that $A[i] = 0$ will correspond to prime numbers. It follows from Mertens' second theorem that the asymptotic running time of the Sieve of Eratosthenes is $O(X \log\log X)$ (see Assignment 3). The sieve can be sped up further if we start eliminating not from $2p$ (i.e. $2p, 3p, 4p$, and so on), but from $p^2$, thus crossing out $p^2, (p+1)p, (p+2)p$, etc. The improvement becomes evident once we note that by the time the algorithm reaches the prime $p$, the numbers $2p, 3p, \ldots, (p-1)p$ have already been eliminated by some prime less than $p$. This saves a noticeable amount of work, although the asymptotic running time remains $O(X \log\log X)$; reaching $O(X)$ requires more refined variants of the sieve.
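A compact Python version of the sieve, starting each round of crossing out at $p^2$ as just described, might look like this. It is a sketch of our own: the `bytearray` named `composite` plays the role of the table $A$ (1 means crossed out), indexed from 0 for convenience, and Python slice assignment crosses out a whole arithmetic progression at once.

```python
from math import isqrt


def sieve_of_eratosthenes(X: int) -> list[int]:
    """All primes <= X. composite[i] == 1 plays the role of A[i] = 1."""
    if X < 2:
        return []
    composite = bytearray(X + 1)          # all entries start at 0
    composite[0] = composite[1] = 1       # 0 and 1 are not prime
    for p in range(2, isqrt(X) + 1):      # stop once p > sqrt(X)
        if not composite[p]:
            # Cross out p^2, (p+1)p, (p+2)p, ...; smaller multiples are already done.
            start = p * p
            composite[start::p] = b"\x01" * len(range(start, X + 1, p))
    return [i for i in range(X + 1) if not composite[i]]
```

For instance, `len(sieve_of_eratosthenes(10**6))` is 78498, the number of primes below one million used in the discussion of trial division with a precomputed table.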

Of course, it is impractical to run the Sieve of Eratosthenes up to $\sqrt{n}$ each time we try to factor $n$, as then the running time will always be at least of order $\sqrt{n}$. This is why in practice one usually runs the Sieve of Eratosthenes up to some large bound first, then stores all prime numbers in a table, and later uses this table to factor integers. It follows from the Prime Number Theorem that the number of primes $\leq X$ is $O(X / \log X)$. So, assuming that the table of prime numbers up to $\sqrt{n}$ is given to us a priori, the trial division will now take $O(\sqrt{n} / \log n)$ steps instead of $O(\sqrt{n})$.

Note the power of this method: for example, given a number $n \leq 10^{12}$, we just have to check $p \mid n$ for all primes $p \leq 10^6$. Given the table containing the 78498 prime numbers less than a million, this verification can be done by a computer almost immediately. In fact, this method should work quite fast for all numbers with at most 18 decimal digits. However, when the number of digits of $n$ exceeds 18, things start to get more complicated: there are too many prime numbers to check, and it is difficult to fit all of them into memory at once.

16.2 Fermat's Primality Test

Another interesting way of demonstrating that a number $n$ is composite is to use Fermat's Little Theorem, which states that, if $n$ is prime and $a$ is coprime to $n$, then

$$a^n \equiv a \pmod{n}.$$

Therefore all that we have to do to prove that $n$ is composite is to find $a$ such that $a^n \not\equiv a \pmod{n}$. If $a$ satisfies such a property, we call it a witness for the non-primality of $n$. In practice, the computation of $a^n \bmod n$ can be done relatively fast using the Double-and-Add Algorithm.

Example 16.2. Let us use Fermat's Primality Test to prove that $n = 323$ is not prime. Note that

$$323 = 2^8 + 2^6 + 2 + 1 = 256 + 64 + 2 + 1.$$

Now pick a random $a$ such that $1 < a < 323$, say $a = 5$. If $n$ is prime, then Fermat's Little Theorem should hold for $a$. We use the Double-and-Add Algorithm to check whether this is the case:

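$$
\begin{aligned}
5^2 &\equiv 25, & 5^{32} &\equiv (5^{16})^2 \equiv 256,\\
5^4 &\equiv (5^2)^2 \equiv 302, & 5^{64} &\equiv (5^{32})^2 \equiv 290,\\
5^8 &\equiv (5^4)^2 \equiv 118, & 5^{128} &\equiv (5^{64})^2 \equiv 120,\\
5^{16} &\equiv (5^8)^2 \equiv 35, & 5^{256} &\equiv (5^{128})^2 \equiv 188 \pmod{323}.
\end{aligned}
$$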

Thus

$$5^{323} \equiv 5^{256} \cdot 5^{64} \cdot 5^2 \cdot 5 \equiv 188 \cdot 290 \cdot 25 \cdot 5 \equiv 256 \cdot 125 \equiv 23 \not\equiv 5 \pmod{323}.$$

This result allows us to conclude that 323 is not prime. Note, however, that if we were to randomly pick $a = 18$, $152$, $170$ or any other number for which $a^{323} \equiv a \pmod{323}$ actually holds, we would not be able to draw any conclusion about $n$. Fortunately, for 323 there are only 7 possible values of $a$ with $1 < a < 323$ such that $a^{323} \equiv a \pmod{323}$, so the probability of this happening is relatively small. And even if this happens, we could just pick yet another random value of $a$, for which $a^{323} \not\equiv a \pmod{323}$ might be true.
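The repeated squaring in Example 16.2 can be written as a loop over the binary digits of the exponent. The Python sketch below (function name ours) is a right-to-left square-and-multiply routine, which is how we read the multiplicative version of the Double-and-Add Algorithm; Python's built-in `pow(a, e, n)` computes the same value and can serve as a cross-check.

```python
def power_mod(a: int, e: int, n: int) -> int:
    """Compute a^e mod n for e >= 0 by repeated squaring.

    `square` runs through a, a^2, a^4, a^8, ... (mod n); it is multiplied
    into `result` exactly when the corresponding binary digit of e is 1.
    """
    result = 1 % n
    square = a % n
    while e > 0:
        if e & 1:                     # current binary digit of e is 1
            result = (result * square) % n
        square = (square * square) % n
        e >>= 1
    return result


print(power_mod(5, 323, 323))                          # 23, so 5 is a witness
print([a for a in range(2, 323) if power_mod(a, 323, 323) == a])
# [18, 152, 153, 170, 171, 305, 322]
```

The second line lists the 7 values of $a$ mentioned in the example, for which Fermat's test is inconclusive.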

From Example 16.2, the algorithm becomes clear. Let $n$ be an integer, and let $k \geq 1$ be the maximal number of times that we are going to choose $a$ at random. Then do the following:


1. Set $i = 0$;

2. If $i = k$, conclude that $n$ is a pseudoprime. Otherwise pick a random integer $a$ such that $1 < a < n$;

3. Compute $a^n \bmod n$ using the Double-and-Add Algorithm;

4. If $a^n \not\equiv a \pmod{n}$, conclude that $n$ is composite. Otherwise increment $i$ and go back to step 2.
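The four steps above translate directly into Python. In this sketch (names ours) the random generator is passed in so that runs can be reproduced, the built-in `pow` stands in for the Double-and-Add Algorithm, and we assume $n \geq 3$ so that there is at least one $a$ with $1 < a < n$.

```python
import random


def fermat_test(n: int, k: int, rng: random.Random) -> bool:
    """Fermat's Primality Test for n >= 3.

    Returns False as soon as a witness a with a^n != a (mod n) is found
    ("n is composite"); returns True if k random choices of a all pass
    ("n is a pseudoprime" in the sense of these notes).
    """
    for _ in range(k):                # step 2: i runs from 0 to k - 1
        a = rng.randrange(2, n)       # random a with 1 < a < n
        if pow(a, n, n) != a:         # step 4: a is a witness
            return False
    return True
```

Calling `fermat_test(561, 50, random.Random(0))` returns `True` no matter which bases are drawn, even though $561 = 3 \cdot 11 \cdot 17$ is composite; this is exactly the weakness taken up next.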

According to this algorithm, we conclude that $n$ is a pseudoprime whenever all $k$ random choices of $a$ result in $a^n \equiv a \pmod{n}$. In practice, this algorithm works quite well, even though it is purely heuristic. However, there are some special composite numbers which admit no witnesses of their non-primality at all.

Definition 16.3. A composite integer n is called a Carmichael number whenever

$$a^n \equiv a \pmod{n}$$

for all integers a.

There exist infinitely many Carmichael numbers, and the first 10 of them are

561,1105,1729,2465,2821,6601,8911,10585,15841,29341.

They were discovered by the American mathematician Robert Carmichael. Interestingly, the criterion for determining Carmichael numbers was found by the German mathematician Alwin Korselt in 1899, even before Carmichael numbers were discovered.

Theorem 16.4. 24 An integer n is a Carmichael number if and only if

1. $n = p_1 p_2 \cdots p_k$, where $k > 1$ and the $p_j$ are distinct primes;

2. $p_j - 1$ divides $n - 1$ for every $j$.

Therefore every Carmichael number will always be reported as a pseudoprime by Fermat’s Primality Test, and this is unavoidable.
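Korselt's criterion is easy to check by computer for small $n$. In the Python sketch below, `prime_factors` and `is_carmichael` are illustrative names of our own, and factoring by trial division is only practical for small inputs. The last line confirms that no base is a witness for 561.

```python
def prime_factors(n: int) -> list:
    """Prime factors of n with repetition, found by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def is_carmichael(n: int) -> bool:
    """Korselt's criterion (Theorem 16.4)."""
    ps = prime_factors(n)
    distinct = len(ps) == len(set(ps))             # primes without repetition
    return len(ps) > 1 and distinct and all((n - 1) % (p - 1) == 0 for p in ps)

print([n for n in range(2, 30000) if is_carmichael(n)])
# [561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341]
print(all(pow(a, 561, 561) == a for a in range(561)))   # True
```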



16.3 Miller-Rabin Primality Test

This test was originally developed by Gary Miller in 1976. It was deterministic, but the proof that it works relied on a reasonable but unproved conjecture called the Extended Riemann Hypothesis. In 1980, Michael Rabin converted it into an unconditional but probabilistic algorithm. This is the algorithm that we are going to learn about.

To understand the idea behind the Miller-Rabin primality test, recall that the congruence

$$x^2 \equiv 1 \pmod{p}$$

has exactly two solutions, namely $x \equiv \pm 1 \pmod{p}$, whenever $p$ is an odd prime. This simply follows from Proposition 12.2 applied to the quadratic polynomial $x^2 - 1$ with coefficients in $\mathbb{Z}_p$.

Now let $n > 2$ be prime and let $a$ be an integer with $1 \le a \le n - 1$. Then $n - 1 = 2^s d$ for some positive integers $s$ and $d$, where $d$ is odd. According to Fermat’s Little Theorem,

$$a^{n-1} \equiv a^{2^s d} \equiv \left(a^{2^{s-1} d}\right)^2 \equiv 1 \pmod{n}.$$

Thus we see that $a^{2^{s-1} d}$ is a root of $x^2 - 1$ modulo $n$. Since $n$ is prime, $a^{2^{s-1} d} \equiv \pm 1 \pmod{n}$. If $a^{2^{s-1} d} \equiv -1 \pmod{n}$, we stop. Otherwise, we can extract the square root one more time, so that $a^{2^{s-2} d} \equiv \pm 1 \pmod{n}$, and so on, until we either reach $a^{2^r d} \equiv -1 \pmod{n}$ for some $r$ or $a^d \equiv 1 \pmod{n}$. We conclude that, if $n$ is prime, then

• either $a^d \equiv 1 \pmod{n}$; or

• $a^{2^r d} \equiv -1 \pmod{n}$ for some $r$ such that $0 \le r \le s - 1$.

Thus, if we could show that

$$a^d \not\equiv 1 \pmod{n}$$

and

$$a^{2^r d} \not\equiv -1 \pmod{n}$$

for all $r$ such that $0 \le r \le s - 1$, then $n$ has to be composite. Note that with Fermat’s Primality Test we would only check whether $a^{2^s d} \equiv 1 \pmod{n}$, whereas in the Miller-Rabin primality test we examine the $s$ values $a^d, a^{2d}, \ldots, a^{2^{s-1} d} \pmod{n}$. As it turns out, this is more than enough to fix many problems that we saw with Fermat’s Primality Test. For example, Carmichael numbers can be recognized as composite numbers. Furthermore, one can prove that, for an odd composite number $n > 9$, at least $3/4$ of the $a$'s coprime to $n$ are witnesses of $n$'s compositeness. Therefore, the probability that a single round of the Miller-Rabin Test fails is at most $1/4$, which means that after $k$ rounds the probability that a composite $n$ is reported as a pseudoprime is at most $(1/4)^k = 4^{-k}$.

Unfortunately, one cannot do better than that and predict the location of witnesses in $\mathbb{Z}/n\mathbb{Z}$. Their distribution can vary greatly from one $n$ to another, and this is why choosing $a$ at random is better than using $a = 2, 3, 5, \ldots$ iteratively. For example, Arnault found a 397-digit composite number for which no prime base $a < 307$ is a witness. This number was reported to be prime by the Maple isprime() function, because it picked the prime bases $a = 2, 3, 5, \ldots$ iteratively rather than randomly.

Example 16.5. Let us show that $n = 323$ is composite using the Miller-Rabin Primality Test with base $a = 18$. Note that $a^{323} \equiv a \pmod{n}$, so if we used Fermat’s Primality Test on $n$ only once with this base, it would report $n$ as a pseudoprime. However, $322 = 2 \cdot 161$ (so $s = 1$ and $d = 161$), and we note that

$$18^{161} \equiv 18 \not\equiv \pm 1 \pmod{323},$$

so n = 323 would be reported as composite by the Miller-Rabin Primality Test.
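The checks described in this section, together with Example 16.5, can be sketched in Python. The names `is_strong_witness` and `miller_rabin` are illustrative; bases are drawn with $1 < a < n$ as in Fermat’s test, and the built-in `pow` again stands in for the Double-and-Add Algorithm.

```python
import random
from typing import Optional

def is_strong_witness(a: int, n: int) -> bool:
    """True if a proves that the odd integer n > 2 is composite."""
    s, d = 0, n - 1
    while d % 2 == 0:                 # write n - 1 = 2^s * d with d odd
        s += 1
        d //= 2
    x = pow(a, d, n)                  # a^d mod n
    if x == 1 or x == n - 1:          # a^d ≡ 1 or a^d ≡ -1: no conclusion
        return False
    for _ in range(s - 1):            # a^(2^r d) for r = 1, ..., s - 1
        x = x * x % n
        if x == n - 1:                # a^(2^r d) ≡ -1: no conclusion
            return False
    return True                       # every check failed: n is composite

def miller_rabin(n: int, k: int, rng: Optional[random.Random] = None) -> str:
    """Run k rounds with random bases 1 < a < n; return "composite" or "pseudoprime"."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer greater than 2")
    rng = rng if rng is not None else random.Random()
    for _ in range(k):
        if is_strong_witness(rng.randrange(2, n), n):
            return "composite"
    return "pseudoprime"

print(pow(18, 323, 323) == 18)        # True: Fermat's test with a = 18 is fooled
print(is_strong_witness(18, 323))     # True: 18^161 ≡ 18, which is not ±1
```

Only $a^d$ and its $s - 1$ successive squares are examined, so one round costs about as much as one round of Fermat’s test.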

17 Public Key Cryptosystems. The RSA Cryptosystem

For more details, please refer to the monograph by W. Trappe and L. C. Washington, Introduction to Cryptography with Coding Theory, 2nd edition, 2006.

Suppose that Alice wants to send a secret message to Bob. Because they are too far away from each other and personal communication is impossible, she needs to send this message over the internet. The channel between Alice’s computer and Bob’s computer is unprotected. While travelling from one computer to the other, the message passes through many different routers, and it is possible to intercept it by listening on the channel. For example, this can be done with packet analyzers like Wireshark. Though interception of the message is hardly avoidable, it is possible to protect the information itself through encryption.

Since antiquity, humanity has used what we now call private key cryptosystems. Perhaps the most famous example of a private key encryption is the so-called Caesar cypher. According to Suetonius, Julius Caesar used this cypher to encrypt messages of military significance. The cypher shifts each letter of the message by 3 positions to the left: A → X, B → Y, C → Z, D → A, ..., Y → T, Z → V (note that we use the 23-letter Latin alphabet, which lacks J, U and W, instead of the English alphabet). For example, the phrase

DEVS EX MACHINA

can be encrypted using Caesar’s cypher as follows:

ABRP BS IXZEFKX
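The shift can be checked with a few lines of Python. The constant `LATIN` encodes our assumption about the 23-letter classical Latin alphabet used above, and `caesar` is an illustrative helper name; a negative shift moves letters to the left.

```python
# The 23-letter classical Latin alphabet assumed in the text (no J, U or W).
LATIN = "ABCDEFGHIKLMNOPQRSTVXYZ"

def caesar(text: str, shift: int) -> str:
    """Shift each Latin letter by `shift` positions (negative = to the left)."""
    out = []
    for ch in text:
        if ch in LATIN:
            out.append(LATIN[(LATIN.index(ch) + shift) % len(LATIN)])
        else:
            out.append(ch)             # leave spaces and other symbols unchanged
    return "".join(out)

ciphertext = caesar("DEVS EX MACHINA", -3)
print(ciphertext)                      # ABRP BS IXZEFKX
print(caesar(ciphertext, 3))           # DEVS EX MACHINA
```

Decryption is the same function with the opposite shift, which is exactly why the shared number 3 acts as the secret key.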

This cypher is not terribly sophisticated, but back in Caesar’s time it was considered quite complex, and the receiver would have to know the magic number 3 in order to decrypt the message by shifting the letters three positions to the right. So, as we can see, the sender and the receiver must agree not only on the encryption/decryption procedure but also on some private key, which in this case is equal to 3. Many ciphers, such as the Vigenère cipher, the renowned Enigma cipher, or modern ciphers such as the Data Encryption Standard (DES) or Rijndael (AES), work according to this principle: once the sender and the receiver agree on some secret key, they can both encrypt and decrypt messages, and thus communicate securely. But what if the sender and the receiver are too far away from each other? If Alice is in Australia and Bob is in Bulgaria, how can they agree on a secret key? One answer to this problem is public key cryptography. The key insight is that Alice and Bob do not even have to agree on a private key in order to send encrypted messages to each other!

The RSA cryptosystem was invented in 1977 by Ron Rivest, Adi Shamir and Leonard Adleman. It was the first practical, widely deployed public key cryptosystem. This is how RSA works. Bob generates two really large distinct prime numbers $p$ and $q$, and computes $n = pq$ as well as $\varphi(n) = (p-1)(q-1)$. Then he chooses an encryption exponent $e$ such that

$$\gcd(e, \varphi(n)) = 1,$$

and solves the congruence

$$de \equiv 1 \pmod{\varphi(n)}$$

for $d$. Then he sends the public key $(n, e)$ to Alice. Alternatively, he can publish $(n, e)$ on his webpage, thus making this key publicly available to everyone. However, he does not release the private key $(p, q, d)$. No one knows the values of $p$, $q$ and $d$ except for Bob.


Now Alice can use Bob’s public key $(n, e)$ to send messages to Bob securely. Suppose that Alice wants to send a message written in English. First, she converts this message into a number $m$. For example, this can be done using the ASCII table. According to the ASCII table, every upper or lower case letter of the English alphabet, every digit, and some special characters such as the asterisk, the dollar sign, the exclamation mark or the percent sign correspond to some number between 0 and 127. For example, in the message

Hello!

the letter ‘H’ corresponds to 72, letter ‘e’ corresponds to 101, and so on:

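$$\begin{array}{ccc}
\text{Character} & \text{Base 10} & \text{Base 2}\\ \hline
\texttt{H} & 72 & 01001000_2\\
\texttt{e} & 101 & 01100101_2\\
\texttt{l} & 108 & 01101100_2\\
\texttt{o} & 111 & 01101111_2\\
\texttt{!} & 33 & 00100001_2
\end{array}$$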

We concatenate the base 2 representations of the ASCII numbers corresponding to our characters, thus obtaining a bigger number $m$:

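$$m = \underbrace{01001000}_{\texttt{H}}\,\underbrace{01100101}_{\texttt{e}}\,\underbrace{01101100}_{\texttt{l}}\,\underbrace{01101100}_{\texttt{l}}\,\underbrace{01101111}_{\texttt{o}}\,\underbrace{00100001}_{\texttt{!}}{}_2.$$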

Note that each character fits into 1 byte = 8 bits. Since there are 6 characters in our message, the resulting number $m$ satisfies $0 \le m < 2^{6 \cdot 8} = 2^{48}$. Now, when Bob receives this number $m$, he can easily decode the message by reading off 8 bits at a time and matching them to the corresponding characters in the ASCII table.
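In Python, this encoding is a byte-level conversion between text and integers. The helper names `text_to_int` and `int_to_text` are ours. We assume that the receiver knows the number of characters, because a leading zero byte would otherwise be invisible in $m$.

```python
def text_to_int(message: str) -> int:
    """Concatenate the 8-bit ASCII codes of the characters into one integer m."""
    return int.from_bytes(message.encode("ascii"), byteorder="big")

def int_to_text(m: int, length: int) -> str:
    """Read m off 8 bits at a time; `length` is the number of characters."""
    return m.to_bytes(length, byteorder="big").decode("ascii")

m = text_to_int("Hello!")
print(m < 2**48)           # True
print(int_to_text(m, 6))   # Hello!
```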

Before encrypting the message, Alice needs to verify that $0 \le m < n$, so that no information is lost during the transmission. If it so happens that $m \ge n$, she breaks the message into $k = \lfloor \log_n m \rfloor + 1$ pieces $m_1, m_2, \ldots, m_k$ (for example, the digits of $m$ written in base $n$) such that $0 \le m_i < n$ for all $i$, $1 \le i \le k$, and then sends $m_1, m_2, \ldots, m_k$ to Bob consecutively.

Suppose that $0 \le m < n$. Now Alice uses Bob’s public key $(n, e)$ and computes the integer $c$, $0 \le c < n$, such that

$$c \equiv m^e \pmod{n}.$$

This number $c$ is the result of RSA encryption, and Alice sends this encrypted message to Bob over the unprotected channel.


When Bob receives the encrypted message $c$, he can decrypt it and obtain the original message $m$ using the private key $d$:

$$c^d \equiv (m^e)^d \equiv m^{de} \equiv m \pmod{n}.$$

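Note that above we utilized the fact that $de \equiv 1 \pmod{\varphi(n)}$: writing $de = 1 + t\varphi(n)$, Euler’s Theorem gives $m^{de} = m \cdot (m^{\varphi(n)})^t \equiv m \pmod{n}$ whenever $\gcd(m, n) = 1$. In fact, $m^{de} \equiv m \pmod{n}$ holds for every $m$, as one can check separately modulo $p$ and modulo $q$ and then apply the Chinese Remainder Theorem.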

Example 17.1. Suppose that Bob chose p = 1597 and q = 4139. Then

n = pq = 1597 ·4139 = 6609983,

ϕ(n) = (p−1)(q−1) = 1596 ·4138 = 6604248.

Bob chooses the encryption exponent e = 3263993 and then computes

$$d \equiv e^{-1} \equiv 3263993^{-1} \equiv 2051801 \pmod{6604248}.$$

Now he keeps $p$, $q$ and $d$ secret, and makes $(n, e)$ publicly available.

Now, in order to send the message “Hi!” to Bob, Alice converts it into an integer $m$ using the ASCII table:

$$m = \underbrace{01001000}_{\texttt{H}}\,\underbrace{01101001}_{\texttt{i}}\,\underbrace{00100001}_{\texttt{!}}{}_2 = 4745505.$$

Alice verifies that $0 \le m < n$, and then computes the encrypted message $c$ with the Double-and-Add Algorithm using Bob’s encryption exponent $e$:

$$c \equiv m^e \equiv 4745505^{3263993} \equiv 673426 \pmod{6609983}.$$

Then Alice sends $c = 673426$ to Bob.

When Bob receives $c$, he computes $m$ with the Double-and-Add Algorithm using his private key $d$:

$$m \equiv c^d \equiv 673426^{2051801} \equiv 4745505 \pmod{6609983}.$$

After that, using the ASCII table, Bob converts the 3-byte number $m$ back into the three-character message “Hi!” that Alice sent him.
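Python’s arbitrary-precision integers make it easy to replay Example 17.1. The sketch below is a textbook-RSA illustration only, with no padding. It assumes Python 3.8 or later, where `pow(e, -1, phi)` computes a modular inverse. The printed ciphertext can be compared with the value of $c$ above.

```python
# Toy parameters from Example 17.1; real RSA moduli are far larger.
p, q = 1597, 4139
n = p * q                          # public modulus
phi = (p - 1) * (q - 1)            # phi(n), kept secret
e = 3263993                        # public encryption exponent
d = pow(e, -1, phi)                # private exponent: inverse of e mod phi(n)

m = int.from_bytes(b"Hi!", "big")  # "Hi!" as an integer via ASCII
assert 0 <= m < n
c = pow(m, e, n)                   # Alice: c ≡ m^e (mod n)
decrypted = pow(c, d, n)           # Bob:   m ≡ c^d (mod n)
print(n, phi, d)                   # 6609983 6604248 2051801
print(c, decrypted.to_bytes(3, "big").decode("ascii"))
```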

Now, why is this method of communication secure? Suppose that some malicious adversary Eve managed to eavesdrop on the unprotected channel and intercept the message $c$. Since Bob’s public key $(n, e)$ is available to everyone, Eve also knows both $n$ and $e$. Therefore Eve’s goal is to obtain $m$ from $(n, e)$ and $c$. The most obvious way to solve this problem is to find an integer $d$ such that $de \equiv 1 \pmod{\varphi(n)}$. In order to do so, Eve has to compute $\varphi(n) = (p-1)(q-1)$ from $n$. Unfortunately for Eve, computing $\varphi(n)$ from a composite $n$ is difficult: for $n = pq$ it is as hard as factoring $n$. To this day, we do not know any polynomial time factorization algorithm. The best known ones, namely the Quadratic Sieve and the General Number Field Sieve, are subexponential. Thus, if we choose $n$ large enough (the National Institute of Standards and Technology (NIST) recommends choosing $n > 2^{1024}$), the factorization of $n$ becomes infeasible for modern electronic computers, even if the workload is distributed among several supercomputers.

Of course, Bob should choose the numbers $p$, $q$ and $e$ very carefully. For example, if either $p$ or $q$ is really small, then it can be found using trial division. If either $p$ or $q$ is really close to $\sqrt{n} = \sqrt{pq}$, say $|p - \sqrt{n}| \le 2n^{1/4}$, then $n$ can be factored using Fermat’s Factorization Method. If the prime divisors of either $p - 1$ or $q - 1$ are all really small, then $n$ can be factored using Pollard’s $p - 1$ Algorithm (see Assignment 3). If $e$ is chosen so that $d$ is really small, say $d < \frac{1}{3} n^{1/4}$, then $d$ can be calculated in polynomial time $O(\log n)$ (see Section 6.2.1 in Trappe and Washington).

When sending the message, Alice has to be really cautious as well. For example, if the number $m$ is relatively small in comparison to $n$, then even without the knowledge of $d$ or the factorization of $n$, Eve can decrypt the message using the Short Plaintext Attack (see Section 6.2.2 in Trappe and Washington). To solve this problem, Alice can pad her message with some random characters either at the beginning or at the end. So, as you can see, there are many things that both Alice and Bob have to check before establishing secure communication.

The RSA cryptosystem can be utilized not only for secure communication, but also for authentication purposes. Imagine a situation in which Alice sends a message $m$ to Bob, and Bob cares not so much about the privacy of their communication as about the authenticity of the sender. That is, he wants to be absolutely sure that the message $m$ was sent to him by Alice and no one else. This can be done with RSA as follows: using her own RSA key pair, with public key $(n, e)$ and private key $d$, Alice puts a digital signature $s$ on the message $m$:

$$s \equiv m^d \pmod{n}.$$

Then she sends $(m, s)$ to Bob. When Bob receives the message with Alice’s signature, he can verify that it belongs to Alice by using her public key $(n, e)$ and checking that

$$m \equiv s^e \pmod{n}.$$
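A minimal sketch of this signature scheme follows, with hypothetical helper names `sign` and `verify`. As an assumption made purely for illustration, Alice’s key pair reuses the toy primes and exponent of Example 17.1. Practical RSA signatures also hash and pad the message, and that step is omitted here.

```python
# Hypothetical toy key for Alice, reusing the primes of Example 17.1.
p, q = 1597, 4139
n, phi = p * q, (p - 1) * (q - 1)
e = 3263993                       # Alice's public exponent
d = pow(e, -1, phi)               # Alice's private exponent (Python 3.8+)

def sign(m: int) -> int:
    """s ≡ m^d (mod n): only the holder of d can compute this."""
    return pow(m, d, n)

def verify(m: int, s: int) -> bool:
    """Check m ≡ s^e (mod n) using only the public key (n, e)."""
    return pow(s, e, n) == m % n

s = sign(100)
print(verify(100, s))             # True
print(verify(101, s))             # False: s is not a signature for 101
```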


Exercise 17.2. Use your favourite computer algebra system to encrypt the message $m = 12345$ with RSA using the public key $(n, e) = (786073, 221891)$. Then break the system by factoring $n = pq$, determining the private key $d$, and then decrypting the message $c = 547988$.

Exercise 17.3. Use your favourite computer algebra system to verify that the message $(m, s) = (100, 1580073)$ belongs to the owner of the public key $(n, e) = (5988889, 4324055)$. Then break the system and put a fake digital signature $s'$ on the message $m' = 1000000$, so that $(m', s')$ passes the verification with the public key $(n, e)$.

Exercise 17.4. (Exercise 7 in Trappe and Washington) Naive Nelson uses RSA to receive a single ciphertext $c$, corresponding to the message $m$. His public modulus is $n$ and his public encryption exponent is $e$. Since he feels guilty that his system was used only once, he agrees to decrypt any ciphertext that someone sends him, as long as it is not $c$, and return the answer to that person. Eve sends him the ciphertext $2^e c \pmod{n}$. Show how this allows Eve to find $m$.

Exercise 17.5. (Exercise 8 in Trappe and Washington) In order to increase security, Bob chooses $n$ and two encryption exponents $e_1$, $e_2$. He asks Alice to encrypt her message $m$ to him by first computing $c_1 \equiv m^{e_1} \pmod{n}$, then encrypting $c_1$ to get $c_2 \equiv c_1^{e_2} \pmod{n}$. Alice then sends $c_2$ to Bob. Does this double encryption increase security over single encryption? Why or why not?

Exercise 17.6. (Exercise 10 in Trappe and Washington) The exponents $e = 1$ and $e = 2$ should not be used in RSA. Why?

18 The Diffie-Hellman Key Exchange Protocol

There are many benefits to using RSA, but there is one big problem: despite the fact that it works in polynomial time, it is quite slow. For suppose that we want to compute

$$c \equiv m^e \pmod{n}.$$

The Double-and-Add Algorithm requires at most $\log e$ squarings and at most $\log e$ multiplications, thus resulting in at most $2\log e \le 2\log n$ arithmetic operations in total. Each multiplication involves numbers of size at most $\log n$. A fast multiplication algorithm, the Toom-Cook Algorithm (in its three-way variant), requires $O((\log n)^{1.465})$ steps to multiply two integers of size at most $\log n$. Since there are at most $2\log n$ multiplications, encryption and decryption then require $O((\log n)^{2.465})$ steps.


Roughly speaking, this means that if $n$ is a 2048-bit number, then one can encrypt or decrypt messages in about $2048^{2.465} \approx 1.45 \cdot 10^8$ steps.

Private key cryptosystems (also referred to as symmetric ciphers or block ciphers) are much faster, because their execution does not involve any complex mathematical computations. Instead, in order to encrypt the message they use logical operations, such as AND, OR, NOT and XOR, as well as bit shifts and bit permutations. The Caesar cipher is an example of a cipher which uses only shifts, but on letters of the alphabet rather than on bits. Anagrams, like “eHll!o”, are examples of permutations of letters. These operations are very simple and in fact require only $O(1)$ steps to compute (compare this to multiplication, which requires $O((\log n)^{1.465})$ steps). In the end, both encryption and decryption for these ciphers require $O(\log n)$ steps. The most widely deployed symmetric ciphers are 3-DES (Triple Data Encryption Standard) and AES (Advanced Encryption Standard), which is also commonly referred to as Rijndael.

As was mentioned in Section 17, in order to use a private key cryptosystem, two parties must agree on a secret key. So how can this be done when Alice and Bob are too far away from each other? Here is one way: Alice generates a secret key $K$, encrypts it using RSA with Bob’s public key, and then sends the encrypted message to Bob. Bob decrypts the message, and so now Alice and Bob share a secret $K$. Then they may use whichever symmetric algorithm they want, such as 3-DES or AES.

But there is another way for Alice and Bob to agree on a common key. This procedure, called the Diffie-Hellman Key Exchange Protocol, was patented by Whitfield Diffie and Martin Hellman in 1977. Its security is based on the Discrete Logarithm Problem, and it works as follows. Alice generates a large prime number $p$, an integer $g$ such that $0 \le g < p$, and an integer $x$ such that $1 \le x \le p - 2$. She computes $g^x \bmod p$, and then sends $p$, $g$ and $g^x \bmod p$ to Bob. When Bob receives $p$, $g$ and $g^x \bmod p$, he generates an integer $y$ such that $1 \le y \le p - 2$, computes $g^y \bmod p$, and then sends it back to Alice. Finally, since Alice knows $x$ and $g^y \bmod p$, she can compute

$$g^{xy} \equiv (g^y)^x \pmod{p},$$

and since Bob knows y and gx (mod p), he can compute

$$g^{xy} \equiv (g^x)^y \pmod{p}.$$

So in the end both Alice and Bob share a common secret, namely $g^{xy} \bmod p$.
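The exchange can be simulated in a few lines of Python. The parameters $p = 23$ and $g = 5$ (a primitive root modulo 23) are our own toy choices and offer no security. The secret exponents are drawn with the standard `secrets` module, and `dh_keypair` is an illustrative helper name.

```python
import secrets

def dh_keypair(p: int, g: int) -> tuple:
    """Choose a secret exponent 1 <= x <= p - 2 and return (x, g^x mod p)."""
    x = secrets.randbelow(p - 2) + 1
    return x, pow(g, x, p)

p, g = 23, 5                      # toy parameters, far too small in practice
x, A = dh_keypair(p, g)           # Alice keeps x and sends p, g, A = g^x mod p
y, B = dh_keypair(p, g)           # Bob keeps y and sends B = g^y mod p
alice_key = pow(B, x, p)          # (g^y)^x mod p
bob_key = pow(A, y, p)            # (g^x)^y mod p
print(alice_key == bob_key)       # True
```

Eve sees `p`, `g`, `A` and `B` but not `x` or `y`, which is exactly the situation described next.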


Why is this secure? If a malicious adversary Eve listens on the communication between Alice and Bob, she can intercept $p$, $g$, $g^x \bmod p$ and $g^y \bmod p$, and from this information she would have to compute $g^{xy} \bmod p$. This problem is called the Diffie-Hellman Problem, and it is no harder than the Discrete Logarithm Problem. That is, if Eve knew how to solve the Discrete Logarithm Problem, she would be able to solve the Diffie-Hellman Problem (see Assignment 3). However, it is not known whether these two problems are equivalent. We do not know any polynomial time algorithm for computing discrete logarithms; a subexponential algorithm due to Adleman utilizes index calculus. The discrete logarithm can be computed quite fast in some special cases, but if the parameters $p$, $g$, $x$ and $y$ are chosen properly, the problem becomes intractable for modern electronic computers. There are many things that need to be verified in order to ensure that the communication is secure, but we will just mention that the parameter $g$ should be chosen so that $\operatorname{ord}(g)$ in $\mathbb{Z}_p^{*}$ is sufficiently large.

As a final remark, we would like to mention that there exists an efficient quantum algorithm for computing discrete logarithms, which was invented by Peter Shor in 1994.

19 Integer Factorization

The next computational problem that we address is the integer factorization problem. That is, given a composite integer $n$, we would like to find a non-trivial divisor of $n$. Unlike for primality testing, we do not know any polynomial time algorithm for integer factorization. Many mathematicians believe that the integer factorization problem is hard, and several cryptographic protocols, such as RSA, rely on this assumption. If you want to become a famous mathematician, try inventing a polynomial time algorithm for integer factorization. Note, however, that there exists an efficient quantum algorithm for integer factorization, which was invented by Peter Shor in 1994.

There are many algorithms for integer factorization. The most obvious one, trial division, we studied in Section 16. Of course, this algorithm allows us to factor an integer $n$ in $O(\sqrt{n}) = O(e^{(\log n)/2})$ steps, so this algorithm is exponential and is no good for factoring large integers.

In this section, we will study two factorization algorithms, namely Fermat’s Algorithm and its optimized variant, Dixon’s Algorithm. The former is an exponential algorithm and the latter is a subexponential algorithm.


You will also learn about Euler’s Factorization Method in Assignment 3.

19.1 Fermat’s Factorization Method

Fermat’s Factorization Method was suggested by the French mathematician Pierre de Fermat back in the 17th century. The idea is simple: given an integer $n$, the goal is to find integers $x$ and $y$ such that

$$n = x^2 - y^2.$$

Then

$$n = (x - y)(x + y),$$

and if neither $x - y$ nor $x + y$ is equal to 1, this results in a non-trivial factorization of $n$. Note that numbers $n \equiv 2 \pmod{4}$ cannot be represented in this form, since $x^2 - y^2$ is always either odd or divisible by 4; however, we may easily disregard even numbers from consideration, since every even number greater than 2 has the non-trivial divisor 2. Odd integers, on the other hand, can always be represented as a difference of two perfect squares, for if $n = k\ell$, then

$$n = \left(\frac{k + \ell}{2}\right)^2 - \left(\frac{k - \ell}{2}\right)^2.$$

Since $n$ is odd, so are $k$ and $\ell$, which means that both $(k + \ell)/2$ and $(k - \ell)/2$ are integers, too. If $n = k\ell$ is a multiple of 4, such a representation is also possible once we choose both $k$ and $\ell$ to be even. From the formula above it is also evident that an integer can have many representations as a difference of two perfect squares.

Let $\lceil x \rceil$ denote the smallest integer $\ge x$. We will now convert the observations made above into an algorithm:

1. Put $x := \lceil \sqrt{n} \rceil$ and then set $y := x^2 - n$;

2. If $y$ is a perfect square, return $x - \sqrt{y}$; otherwise proceed to Step 3;

3. Increase $x$ by 1 and then set $y := x^2 - n$;

4. Go back to Step 2.

Note that, for odd $n$, the algorithm always terminates. Furthermore, if the algorithm returns 1, then the number $n$ must be prime.
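Steps 1 to 4 translate almost line by line into Python. In the sketch below, `fermat_factor` is our own name, `math.isqrt` (Python 3.8+) supplies exact integer square roots, and the input is assumed to be odd so that the loop terminates.

```python
from math import isqrt

def fermat_factor(n: int) -> int:
    """Fermat's Algorithm for odd n: return x - sqrt(y) once y = x^2 - n is a square."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    x = isqrt(n)
    if x * x < n:                      # Step 1: x = ceil(sqrt(n))
        x += 1
    y = x * x - n
    while isqrt(y) ** 2 != y:          # Step 2: is y a perfect square?
        x += 1                         # Step 3: increase x and recompute y
        y = x * x - n
    return x - isqrt(y)

print(fermat_factor(8023))             # 71, since 8023 = 92^2 - 21^2 = 71 * 113
print(fermat_factor(13))               # 1: 13 is prime
```

Exact integer arithmetic matters here: testing squareness with floating-point square roots can give wrong answers for large $n$.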


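Example 19.1. Let us use Fermat’s Algorithm to factorize $n = 8023$. Note that $\sqrt{n} \approx 89.57$, so we begin with $x = 90$ and $y = x^2 - n = 90^2 - 8023 = 77$. We see that

$$\begin{array}{c|c|c}
x & y & \text{is } y \text{ a perfect square?}\\ \hline
90 & 77 & \text{no}\\
91 & 258 & \text{no}\\
92 & 441 & \text{yes}
\end{array}$$

Since $\sqrt{441} = 21$, we see that

$$8023 = 92^2 - 21^2 = (92 - 21)(92 + 21) = 71 \cdot 113.$$

Thus Fermat’s Factorization Algorithm terminated in just three steps, resulting in the non-trivial factor $x - \sqrt{y} = 92 - 21 = 71$.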

Exercise 19.2. Use Fermat’s Algorithm to factor integers 4747 and 7303.

Now let us analyze the performance of the algorithm above. We will count the number of times that $x$ is increased in Step 3. If $n = k\ell$ and $k$ is the largest divisor of $n$ such that $k \le \sqrt{n}$, then Fermat’s Algorithm will return $k$ as a result. In this case, the final value of $x$ is $(k + \ell)/2$, which means that the number of times $x$ is increased is equal to

$$\frac{k + \ell}{2} - \lceil \sqrt{n} \rceil.$$

We can bound this quantity from above as follows:

$$\frac{k + \ell}{2} - \lceil \sqrt{n} \rceil \le \frac{k + \ell}{2} - \sqrt{n} = \frac{(\sqrt{k} - \sqrt{\ell})^2}{2} = \frac{(\sqrt{n} - k)^2}{2k}.$$

We see that, if $n$ is prime, then $k = 1$ and the algorithm requires $O(n)$ steps. Therefore, in its worst case, the algorithm is exponential. Note that it is even worse than trial division, because trial division requires only $O(\sqrt{n})$ steps.

Why, then, do we care about Fermat’s Factorization Method? First of all, in some special cases it performs really well. For suppose that $k$ satisfies

$$\sqrt{n} - k \le 2n^{1/4},$$


so that $k$ is relatively close to $\sqrt{n}$. Then for all $n > 1296 = 6^4$ it is the case that

$$\frac{(\sqrt{n} - k)^2}{2k} \le \frac{4\sqrt{n}}{2(\sqrt{n} - 2n^{1/4})} = \frac{2}{1 - 2n^{-1/4}} < 3,$$

which means that Fermat’s Algorithm terminates after increasing $x$ at most twice! Of course, this is much faster than trial division. This is why Fermat’s Factorization Method is usually used in combination with the Trial Division Method. First, one chooses a constant $c > \sqrt{n}$ and runs Fermat’s Algorithm for all $x$ between $\lceil \sqrt{n} \rceil$ and $c$. After that, one only has to check prime divisors of $n$ with the trial division method up to $c - \sqrt{c^2 - n}$ instead of $\sqrt{n}$. Even though this observation does not allow us to push the bound below $O(n^{1/2})$, it helps to decrease the constant implicit in the big-O notation significantly. Further improvements can be made through sieving, and in 1974 Lehman managed to combine all of these improvements into a factorization algorithm, based on Fermat’s Factorization Method and trial division, with asymptotic running time $O(n^{1/3})$.

Though Fermat’s Algorithm can be quite slow in its worst case, it lies at the foundation of the best factorization algorithms known to date, namely the quadratic sieve and the general number field sieve, which have subexponential asymptotic running time. Both of these algorithms evolved from the factorization method due to Dixon.

19.2 Dixon’s Factorization Method

Dixon’s Factorization Method was proposed in 1981 by the Canadian mathematician John D. Dixon, who is a professor emeritus at Carleton University, Ottawa. Recall that in Fermat’s Factorization Method we were choosing an integer $x$ between 0 and $n$ and then evaluating $x^2 \bmod n$, hoping that the result would be a perfect square; that is,

$$x^2 \equiv y^2 \pmod{n}.$$

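Unfortunately, up to $n$ there are only $\lfloor \sqrt{n} \rfloor$ perfect squares, and so for very large $n$ the proportion of perfect squares less than $n$ tends to zero:

$$\frac{\lfloor \sqrt{n} \rfloor}{n} \le \frac{\sqrt{n}}{n} = \frac{1}{\sqrt{n}} \longrightarrow 0.$$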


Dixon’s method suggests that, instead of looking for a perfect square, we can actually construct it from many random samples. The idea is as follows: by picking distinct $x_1, x_2, \ldots$ between 0 and $n$ at random, we obtain relations of the form

$$\begin{aligned}
x_1^2 &\equiv z_1 \pmod{n},\\
x_2^2 &\equiv z_2 \pmod{n},\\
&\;\;\vdots
\end{aligned}$$

where $z_1, z_2, \ldots$ are integers between 0 and $n$. One would then hope to select relations $i_1, i_2, \ldots, i_r$ so that the number $z_{i_1} z_{i_2} \cdots z_{i_r} = y^2$ is a perfect square. But then

$$(x_{i_1} x_{i_2} \cdots x_{i_r})^2 \equiv y^2 \pmod{n},$$

which means that one can compute a divisor d of n by evaluating

$$d = \gcd(x_{i_1} x_{i_2} \cdots x_{i_r} - y, n).$$

If it so happens that $d = 1$ or $d = n$, we construct a new set of random samples, or select a different $r$-tuple $i_1, i_2, \ldots, i_r$ with the property described above.

Now the main question is: how do we construct congruences $x_i^2 \equiv z_i \pmod{n}$ from which we can produce a non-trivial perfect square? The main idea here is to pick only those $x_i$ for which the resulting values $z_i$ are so-called $B$-smooth numbers.

Definition 19.3. Let $B \ge 2$ be a real number. An integer $n$ is called $B$-smooth if every prime $p \mid n$ satisfies $p \le B$.

Example 19.4. For example, the numbers 2, 3, 4, 5, 6, 8, 9, 10, 12 are all 5-smooth. The reason is that every prime $p$ dividing an integer from that list satisfies $p \le 5$. The numbers 7 and 11, however, are not 5-smooth, but they are both 11-smooth.
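Checking $B$-smoothness by trial division, and recording the exponents along the way, might look as follows. The helper names `primes_up_to` and `exponent_vector` are illustrative. The exponent vector anticipates the linear algebra used in Dixon’s method below.

```python
from math import isqrt
from typing import List, Optional

def primes_up_to(B: int) -> List[int]:
    """All primes p <= B (sieve of Eratosthenes)."""
    sieve = [True] * (B + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, isqrt(B) + 1):
        if sieve[i]:
            for j in range(i * i, B + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def exponent_vector(z: int, base: List[int]) -> Optional[List[int]]:
    """Exponents of z over the primes in `base`, or None if z is not smooth."""
    if z < 1:
        return None
    exps = []
    for p in base:                  # trial division by each prime <= B
        e = 0
        while z % p == 0:
            z //= p
            e += 1
        exps.append(e)
    return exps if z == 1 else None

base = primes_up_to(5)              # [2, 3, 5]
print([z for z in range(2, 13) if exponent_vector(z, base) is not None])
# [2, 3, 4, 5, 6, 8, 9, 10, 12]
print(exponent_vector(12, base))    # [2, 1, 0]
```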

Now every time we choose a random $x$ and then evaluate $z \equiv x^2 \pmod{n}$ such that $0 \le z < n$, we need to verify that $z$ is $B$-smooth. One can check that a given number $z$ is $B$-smooth in just $O(B)$ steps using trial division. Note that, if $p_1 < p_2 < \ldots < p_k$ are all the prime numbers $\le B$, then every $B$-smooth number can be written in the form

$$z = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},$$

where $e_1, e_2, \ldots, e_k$ are non-negative integers. Thus we obtain a vector $v = (e_1, e_2, \ldots, e_k)$ in $\mathbb{Z}^k$. Further, we can reduce the entries of this vector modulo 2, thus obtaining a vector $\tilde{v} = (\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k)$ in $\mathbb{Z}_2^k$ with $\tilde{e}_1, \tilde{e}_2, \ldots, \tilde{e}_k \in \{0, 1\}$. Because $\mathbb{Z}_2$ forms a field (that is, division by a non-zero element is always allowed), the set $\mathbb{Z}_2^k$ constitutes a $k$-dimensional vector space over $\mathbb{Z}_2$, which means that we can analyze it from the perspective of linear algebra. In particular, any collection of $k + 1$ vectors in $\mathbb{Z}_2^k$ will always be linearly dependent.

Now suppose that for distinct values $x_1, x_2, \ldots, x_{k+1}$ we managed to compute $B$-smooth values $z_1, z_2, \ldots, z_{k+1}$, which correspond to vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ in $\mathbb{Z}_2^k$. Since $\mathbb{Z}_2^k$ has dimension $k$, the vectors $\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}$ must be linearly dependent in $\mathbb{Z}_2^k$. But then there must exist indices $i_1, i_2, \ldots, i_r$ for some $r \le k + 1$ such that

$$\tilde{v}_{i_1} + \tilde{v}_{i_2} + \ldots + \tilde{v}_{i_r} \equiv 0 \pmod{2},$$

which means that $z_{i_1} z_{i_2} \cdots z_{i_r}$ is a perfect square. In order to find such linearly dependent vectors $\tilde{v}_{i_1}, \tilde{v}_{i_2}, \ldots, \tilde{v}_{i_r}$ in $\mathbb{Z}_2^k$, we row reduce the $(k + 1) \times k$ matrix

$$M = [\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{k+1}]^T,$$

whose coefficients belong to $\mathbb{Z}_2$. Note that the row reduction requires $O(k^3) = O(B^3)$ steps. At this point, we can compute the value

$$d = \gcd(x_{i_1} x_{i_2} \cdots x_{i_r} - y, n), \quad \text{where } y = \sqrt{z_{i_1} z_{i_2} \cdots z_{i_r}},$$

and, in case $d = 1$ or $d = n$, repeat the procedure of choosing distinct random values $x_1, x_2, \ldots, x_{k+1}$ once again.

The only thing that is left for us to establish is the value of $B$. As it turns out, the optimal choice for $B$ is $B = e^{O(\sqrt{\log n \log\log n})}$, so the asymptotic running time of Dixon’s algorithm is subexponential.
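Putting the pieces together, here is a compact Python sketch of Dixon’s method. It is an illustration under our own assumptions rather than a reference implementation. The names are ours, and $B$ is passed in by hand instead of being tuned. The row reduction is ordinary Gaussian elimination over $\mathbb{Z}_2$ with rows stored as bit masks, and the sketch is only meant for small odd composite $n$ that are not prime powers.

```python
import random
from math import gcd, isqrt
from typing import List, Optional

def primes_up_to(B: int) -> List[int]:
    """All primes p <= B (sieve of Eratosthenes)."""
    sieve = [True] * (B + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, isqrt(B) + 1):
        if sieve[i]:
            for j in range(i * i, B + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def exponent_vector(z: int, base: List[int]) -> Optional[List[int]]:
    """Exponents of z over the factor base, or None if z is not B-smooth."""
    if z < 1:
        return None
    exps = []
    for p in base:
        e = 0
        while z % p == 0:
            z //= p
            e += 1
        exps.append(e)
    return exps if z == 1 else None

def dixon(n: int, B: int, rng: random.Random) -> int:
    """Search for a non-trivial factor of n (odd, composite, not a prime power)."""
    base = primes_up_to(B)
    k = len(base)
    while True:
        # Collect k + 1 relations x_i^2 ≡ z_i (mod n) with distinct x_i and B-smooth z_i.
        xs, vecs = [], []
        while len(xs) < k + 1:
            x = rng.randrange(2, n)
            if gcd(x, n) > 1:
                return gcd(x, n)          # lucky guess: x shares a factor with n
            v = exponent_vector(x * x % n, base)
            if v is not None and x not in xs:
                xs.append(x)
                vecs.append(v)
        # Row reduce over Z_2: a row is (parity bits of v_i, bit mask of relations used).
        rows = [(sum((e % 2) << j for j, e in enumerate(v)), 1 << i)
                for i, v in enumerate(vecs)]
        for col in range(k):
            pivot = next((r for r, (bits, _) in enumerate(rows) if bits >> col & 1), None)
            if pivot is None:
                continue
            pbits, pused = rows.pop(pivot)
            rows = [(bits ^ pbits, used ^ pused) if bits >> col & 1 else (bits, used)
                    for bits, used in rows]
        # Every remaining row is zero: its relations multiply to a perfect square.
        for _, used in rows:
            chosen = [i for i in range(k + 1) if used >> i & 1]
            X, total = 1, [0] * k
            for i in chosen:
                X = X * xs[i] % n
                total = [a + b for a, b in zip(total, vecs[i])]
            Y = 1
            for p, e in zip(base, total):
                Y = Y * pow(p, e // 2, n) % n   # Y ≡ sqrt(z_i1 * ... * z_ir) (mod n)
            d = gcd(X - Y, n)
            if 1 < d < n:
                return d
        # Every dependency gave d = 1 or d = n: start over with fresh samples.

print(dixon(84923, 50, random.Random(1)))   # 163 or 521, since 84923 = 163 * 521
```

Keeping, next to each reduced row, the mask of original relations that produced it is what lets us read off the indices $i_1, \ldots, i_r$ once a row becomes zero.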

Exercise 19.5. In this exercise, we will use Dixon’s method to find a non-trivial factor of 34081.

(a) Factorize integers 15, 486, 24010 to ensure that they are all 7-smooth;

(b) Suppose that the execution of Dixon’s Factorization Algorithm allowed us to locate the congruences

$$805^2 \equiv 486 \pmod{34081}; \quad 846^2 \equiv 15 \pmod{34081}; \quad 954^2 \equiv 24010 \pmod{34081}.$$

Using the above congruences, as well as the factorizations obtained in Part (a), find integers $x$ and $y$ such that

$$x^2 \equiv y^2 \pmod{34081},$$

and then use these x and y to compute a non-trivial factor of 34081.


20 Quadratic Residues

Let $n \ge 2$ be a modulus and $a, b, c$ be arbitrary integers. We will now turn our attention to quadratic congruences

$$ax^2 + bx + c \equiv 0 \pmod{n}.$$

We require that $n \nmid a$, for otherwise the above congruence would reduce to the linear congruence $bx + c \equiv 0 \pmod{n}$. Also, if $n = 2$, then by Fermat’s Little Theorem $x^2 \equiv x \pmod{2}$ regardless of $x$. Thus

$$ax^2 + bx + c \equiv (a + b)x + c \pmod{2},$$

so once again we obtain a linear congruence. Thus it is reasonable to assume that $n \ge 3$. Finally, for simplicity of exposition, we will assume that $n$ is an odd prime, and we will indicate that by writing $p$ instead of $n$. Note that $p - 1$ is even, so $(p - 1)/2$ is an integer.

In this section, we will not aim to solve quadratic congruences. Instead, we will investigate when solutions exist. Note that it follows from Proposition 12.2 that the polynomial $[a][x]^2 + [b][x] + [c]$ has at most 2 roots in $\mathbb{Z}_p$.

Proposition 20.1. 25 Let $p$ be an odd prime, and $a$, $b$, $c$ be integers where $p \nmid a$. The quadratic congruence

$$ax^2 + bx + c \equiv 0 \pmod{p}$$

has a solution x if and only if the congruence

$$y^2 \equiv b^2 - 4ac \pmod{p}$$

has a solution $y$. In that case, $y \equiv 2ax + b \pmod{p}$.

Proof. Multiply both sides of the quadratic congruence by $4a$ to get

$$4a^2x^2 + 4abx + 4ac \equiv 0 \pmod{p}.$$

This can be rewritten as

$$(2ax + b)^2 - b^2 + 4ac \equiv 0 \pmod{p},$$



which is the same as

$$(2ax + b)^2 \equiv b^2 - 4ac \pmod{p}.$$

Hence $y = 2ax + b$ is a solution to $y^2 \equiv b^2 - 4ac \pmod{p}$.

Conversely, suppose that $y$ is a solution to $y^2 \equiv b^2 - 4ac \pmod{p}$. Note that we can solve the linear congruence $2ax + b \equiv y \pmod{p}$ for $x$, because $[2a]$ is a unit in $\mathbb{Z}_p$. Thus

$$(2ax + b)^2 \equiv y^2 \equiv b^2 - 4ac \pmod{p},$$

which is the same as

$$4a^2x^2 + 4abx + 4ac \equiv 0 \pmod{p}.$$

Since $[4a]$ is a unit in $\mathbb{Z}_p$, we can multiply both sides of the above congruence by $(4a)^{-1} \bmod p$ in order to obtain

$$ax^2 + bx + c \equiv 0 \pmod{p}.$$

Therefore the $x$ which satisfies $2ax + b \equiv y \pmod{p}$ is a solution to the original quadratic congruence.
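Proposition 20.1 also suggests a way to compute the solutions: find the square roots $y$ of the discriminant and then solve $2ax + b \equiv y \pmod{p}$ for $x$. The sketch below finds square roots by brute force, so it only suits small $p$. The name `solve_quadratic_mod_p` is illustrative, and `pow(2 * a, -1, p)` requires Python 3.8 or later.

```python
def solve_quadratic_mod_p(a: int, b: int, c: int, p: int) -> list:
    """All x in {0, ..., p-1} with a x^2 + b x + c ≡ 0 (mod p); p odd prime, p ∤ a."""
    disc = (b * b - 4 * a * c) % p
    inv_2a = pow(2 * a, -1, p)                         # [2a] is a unit in Z_p
    roots_of_disc = [y for y in range(p) if y * y % p == disc]
    return sorted({(y - b) * inv_2a % p for y in roots_of_disc})   # 2ax + b ≡ y

print(solve_quadratic_mod_p(1, 1, 1, 7))   # [2, 4]
```

Here the discriminant is $1 - 4 \equiv 4 \pmod{7}$, whose square roots $y = 2, 5$ give $x = 4, 2$.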

Proposition 20.1 tells us that solving the quadratic congruence

$$ax^2 + bx + c \equiv 0 \pmod{p}$$

is equivalent to solving a simplified quadratic congruence

$$x^2 \equiv d \pmod{p},$$

where $d = b^2 - 4ac$. The integer $d$ is called the discriminant of the quadratic polynomial $aX^2 + bX + c$. Thus, in order to find solutions to $x^2 \equiv d \pmod{p}$, we need to understand which residue classes of $\mathbb{Z}_p$ are squares.

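Definition 20.2. A residue $\alpha$ in $\mathbb{Z}_p$ is called a quadratic residue when $\alpha \in \mathbb{Z}_p^{*}$ and $\alpha = \beta^2$ for some residue $\beta$ in $\mathbb{Z}_p^{*}$. If $\alpha \in \mathbb{Z}_p^{*}$ and no such $\beta$ exists, then $\alpha$ is called a quadratic nonresidue.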

When translated to the language of congruences, we say that an integer $a$ is a quadratic residue modulo an odd prime $p$ if $p \nmid a$ and $a \equiv x^2 \pmod{p}$ for some integer $x$.


Example 20.3. Let us find all quadratic residues in $\mathbb{Z}_{13}^{*}$. We note that

$$\begin{array}{ll}
{[1]}^2 = [1], & [7]^2 = [10],\\
{[2]}^2 = [4], & [8]^2 = [12],\\
{[3]}^2 = [9], & [9]^2 = [3],\\
{[4]}^2 = [3], & [10]^2 = [9],\\
{[5]}^2 = [12], & [11]^2 = [4],\\
{[6]}^2 = [10], & [12]^2 = [1].
\end{array}$$

Thus the quadratic residues are [1], [3], [4], [9], [10], [12].
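Squaring every unit, exactly as in Example 20.3, takes one line of Python. The name `quadratic_residues` is illustrative, and the brute-force approach is only meant for small primes.

```python
def quadratic_residues(p: int) -> list:
    """Sorted quadratic residues in Z_p^* for an odd prime p, by squaring every unit."""
    return sorted({x * x % p for x in range(1, p)})

print(quadratic_residues(13))   # [1, 3, 4, 9, 10, 12]
```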

Exercise 20.4. Determine all quadratic residues in $\mathbb{Z}_{17}^{*}$, $\mathbb{Z}_{19}^{*}$ and $\mathbb{Z}_{23}^{*}$.

Proposition 20.5. Let $p$ be an odd prime. Then the group of units $\mathbb{Z}_p^{*}$ has exactly $(p - 1)/2$ quadratic residues and exactly $(p - 1)/2$ quadratic nonresidues.

Proof. Note that, for any $[a]$ in $\mathbb{Z}_p^*$, it is the case that $[a]^2 = (-[a])^2$. Since every residue in $\mathbb{Z}_p^*$ equals $\pm[a]$ for some $a$ with $1 \le a \le (p-1)/2$, it is sufficient to look at such $a$'s. We now claim that all the elements in the collection

$$[1]^2,\ [2]^2,\ \ldots,\ \left[\tfrac{p-1}{2}\right]^2$$

are distinct. Suppose not, and $[a]^2 = [b]^2 = [c]$ for some residue $[c]$ and some $1 \le a < b \le (p-1)/2$. Then both $[a]$ and $[b]$ are roots of the polynomial $X^2 - [c]$ in $\mathbb{Z}_p$. By Proposition 12.2, such a polynomial has at most 2 roots in $\mathbb{Z}_p$. However, it has at least 4 distinct roots, namely $\pm[a]$ and $\pm[b]$; these are distinct because $0 < a < b < p - b < p - a < p$. Thus we obtain a contradiction. Therefore the above collection has no repetitions, so $\mathbb{Z}_p^*$ contains exactly $(p-1)/2$ quadratic residues. Since every element of $\mathbb{Z}_p^*$ which is not a quadratic residue is a quadratic nonresidue, we conclude that there are exactly $(p-1)/2$ quadratic nonresidues.

Definition 20.6. For an odd prime $p$ and an integer $a$ coprime with $p$, we let

$$\left(\frac{a}{p}\right) := \begin{cases} +1 & \text{if } a \text{ is a quadratic residue modulo } p;\\ -1 & \text{if } a \text{ is a quadratic nonresidue modulo } p. \end{cases}$$

The symbol $\left(\frac{a}{p}\right)$ is called the Legendre symbol of $a$ modulo $p$.

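Example 20.7. Note that $\left(\frac{8}{17}\right) = +1$ while $\left(\frac{6}{17}\right) = -1$.

Also, for any odd prime $p$ it is clear that 1 is a quadratic residue, i.e. $\left(\frac{1}{p}\right) = +1$. However, the value of $\left(\frac{-1}{p}\right)$ varies with $p$. For example, $\left(\frac{-1}{13}\right) = +1$ while $\left(\frac{-1}{19}\right) = -1$.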

We will now give an alternative proof of Proposition 20.5 using primitive roots.

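Proof (of Proposition 20.5). Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. That is, for every residue $\alpha$ in $\mathbb{Z}_p^*$ there exists an integer $j$, $1 \le j \le p-1$, such that $\alpha = \gamma^j$.

First of all, let us demonstrate that it is impossible to represent $\alpha$ by both odd and even powers of $\gamma$. For suppose that $\alpha = \gamma^i = \gamma^j$ for some integers $0 \le i \le j$. Then $\gamma^{j-i} = 1$. By Proposition 13.3, $\operatorname{ord}(\gamma) \mid j - i$. Since $\operatorname{ord}(\gamma) = p - 1$, we conclude that the even number $p - 1$ divides $j - i$. But then $j - i$ is even, which means that either both $i$ and $j$ are odd or both $i$ and $j$ are even.

Now recall that, since $\gamma$ is a primitive root in $\mathbb{Z}_p^*$, the elements $\gamma, \gamma^2, \ldots, \gamma^{p-1}$ are distinct, and exactly half of them, namely $\gamma^2, \gamma^4, \ldots, \gamma^{p-1}$, are even powers of $\gamma$. These are the quadratic residues, since $\gamma^{2k} = (\gamma^k)^2$. On the other hand, all odd powers of $\gamma$ are quadratic nonresidues: if $\gamma^{2k+1} = \beta^2$ with $\beta = \gamma^m$, then $\gamma^{2k+1} = \gamma^{2m}$ would represent the same residue by both an odd and an even power of $\gamma$, which we have just shown to be impossible.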

Proposition 20.8. Let $p$ be an odd prime and let $\alpha$ and $\beta$ be elements of $\mathbb{Z}_p^*$. Then

• If $\alpha$ and $\beta$ are quadratic residues then $\alpha\beta$ is a quadratic residue;

• If $\alpha$ is a quadratic residue and $\beta$ is a quadratic nonresidue then $\alpha\beta$ is a quadratic nonresidue;

• If $\alpha$ and $\beta$ are quadratic nonresidues then $\alpha\beta$ is a quadratic residue.

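Proof. Since $p$ is an odd prime, it follows from the Primitive Root Theorem that there exists a primitive root $\gamma$ in $\mathbb{Z}_p^*$. Then $\alpha = \gamma^i$ and $\beta = \gamma^j$ for some integers $i$ and $j$, so $\alpha\beta = \gamma^{i+j}$. Now, as we saw in the second proof of Proposition 20.5, a residue is a quadratic residue exactly when it is an even power of $\gamma$. So if $\alpha$ and $\beta$ are quadratic residues then both $i$ and $j$ are even, which means that $i + j$ is even as well. Therefore $\alpha\beta = \left(\gamma^{(i+j)/2}\right)^2$ is a quadratic residue. We can prove the other two statements analogously.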

The propositions above suggest one algorithm for calculating the Legendre symbol $\left(\frac{a}{p}\right)$. First, we need to find a primitive root $\gamma$ in $\mathbb{Z}_p^*$ and then determine the parity of $x$ in $\gamma^x = [a]$. Fortunately, Euler came up with a much simpler procedure.
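To make the primitive-root procedure concrete, here is a minimal Python sketch, assuming that brute-force searches are acceptable (so it only makes sense for small $p$). It finds the smallest primitive root by listing powers, then searches for the exponent $x$ with $\gamma^x \equiv a \pmod{p}$ and reports its parity. All function names are hypothetical helpers introduced here.

```python
def is_primitive_root(g, p):
    """Check whether g generates Z_p^* by listing all of its powers."""
    return len({pow(g, k, p) for k in range(1, p)}) == p - 1


def primitive_root(p):
    """Return the smallest primitive root modulo the odd prime p (brute force)."""
    for g in range(2, p):
        if is_primitive_root(g, p):
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def legendre_via_primitive_root(a, p):
    """Compute (a/p) from the parity of x, where g^x = a (mod p) and 1 <= x <= p - 1."""
    if a % p == 0:
        raise ValueError("a must be coprime to p")
    g = primitive_root(p)
    x, power = 1, g % p
    while power != a % p:  # naive discrete logarithm: try x = 1, 2, ...
        x += 1
        power = power * g % p
    return 1 if x % 2 == 0 else -1  # even exponent <=> quadratic residue


print(legendre_via_primitive_root(8, 17), legendre_via_primitive_root(6, 17))  # 1 -1
```

The loop is a naive discrete logarithm, which is exactly the expensive step that Euler's Test avoids.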


Proposition 20.9. (Euler’s Test) [26] If $p$ is an odd prime and $a$ is an integer such that $p \nmid a$, then

$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}.$$

In other words, if $a$ is a quadratic residue modulo $p$, then $a^{\frac{p-1}{2}} \equiv +1 \pmod{p}$, and if $a$ is a quadratic nonresidue modulo $p$, then $a^{\frac{p-1}{2}} \equiv -1 \pmod{p}$.

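Proof. Let $[b]$ be a primitive root in $\mathbb{Z}_p^*$. Suppose that $a$ is a quadratic residue. Then, as in the second proof of Proposition 20.5,

$$a \equiv b^{2j} \pmod{p}$$

for some non-negative integer $j$. Thus

$$a^{\frac{p-1}{2}} \equiv \left(b^{2j}\right)^{\frac{p-1}{2}} \equiv b^{(p-1)j} \equiv \left(b^j\right)^{p-1} \equiv 1 \pmod{p}.$$

Thus $a^{\frac{p-1}{2}} \equiv 1 = \left(\frac{a}{p}\right) \pmod{p}$, as claimed.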

Now suppose that $a$ is a quadratic nonresidue. Then

$$a \equiv b^{2j+1} \pmod{p}$$

for some non-negative integer $j$. Then

$$a^{\frac{p-1}{2}} \equiv \left(b^{2j+1}\right)^{\frac{p-1}{2}} \equiv b^{\frac{p-1}{2}}\, b^{(p-1)j} \equiv b^{\frac{p-1}{2}} \pmod{p}.$$

Note that

$$\left(b^{\frac{p-1}{2}}\right)^2 \equiv b^{p-1} \equiv 1 \pmod{p},$$

so the residue class $\left[b^{\frac{p-1}{2}}\right]$ is a root of the polynomial $X^2 - 1$ in $\mathbb{Z}_p$. Since $p$ is an odd prime, by Proposition 12.2, this polynomial has at most two roots. In fact, it has exactly two roots, namely $X = \pm[1]$. Therefore

$$b^{\frac{p-1}{2}} \equiv \pm 1 \pmod{p}.$$

Note that it cannot happen that $b^{\frac{p-1}{2}} \equiv 1 \pmod{p}$, because then the order of $[b]$ would be strictly less than $p - 1 = \varphi(p)$, which contradicts the fact that $[b]$ is a primitive root in $\mathbb{Z}_p^*$. Therefore

$$b^{\frac{p-1}{2}} \equiv -1 \pmod{p},$$

and so we conclude that, when $a$ is a quadratic nonresidue,

$$a^{\frac{p-1}{2}} \equiv -1 \pmod{p}.$$

Therefore for any $a$ such that $p \nmid a$ it is the case that $a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}$.

Corollary 20.10. [27] The integer $-1$ is a quadratic residue modulo an odd prime $p$ if and only if $p \equiv 1 \pmod{4}$.

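Proof. By Euler’s Test,

$$\left(\frac{-1}{p}\right) \equiv (-1)^{\frac{p-1}{2}} \pmod{p}.$$

Since both sides of the above congruence are equal to $\pm 1$ and $p > 2$, this congruence is actually an equality. The result then follows from the fact that

$$(-1)^{\frac{p-1}{2}} = \begin{cases} 1 & \text{if } p \equiv 1 \pmod{4};\\ -1 & \text{if } p \equiv 3 \pmod{4}. \end{cases}$$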

Example 20.11. Is $a = 138$ a quadratic residue modulo $p = 557$? We use Euler’s Test to answer this question. Note that $\frac{p-1}{2} = 278$. We can now compute $a^{\frac{p-1}{2}} \pmod{p}$ using the Double-and-Add algorithm:

$$a^{\frac{p-1}{2}} \equiv 138^{278} \equiv -1 \pmod{557}.$$

Therefore 138 is a quadratic nonresidue modulo 557.
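Euler’s Test is straightforward to implement. The sketch below assumes that the Double-and-Add algorithm of Section 6 is the usual binary method (square the base once per binary digit of the exponent, multiply it in when the digit is 1); `power_mod` and `euler_test` are names we chose, and Python’s built-in three-argument `pow(a, e, n)` computes the same modular power.

```python
def power_mod(a, e, n):
    """Compute a^e mod n by repeated squaring over the binary digits of e."""
    result, base = 1, a % n
    while e > 0:
        if e & 1:                  # current binary digit of e is 1: multiply it in
            result = result * base % n
        base = base * base % n     # square for the next binary digit
        e >>= 1
    return result


def euler_test(a, p):
    """Return (a/p) for an odd prime p not dividing a, using Euler's Test."""
    r = power_mod(a, (p - 1) // 2, p)
    if r == 1:
        return 1
    if r == p - 1:                 # the residue -1 modulo p
        return -1
    raise ValueError("p is not an odd prime, or p divides a")


print(euler_test(138, 557))  # -1, as in Example 20.11
```

The `ValueError` branch is a cheap sanity check: for an odd prime $p \nmid a$ the result can only be $\pm 1$, so any other value signals bad input.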

Exercise 20.12. Compute $\left(\frac{51}{199}\right)$, $\left(\frac{364}{503}\right)$ and $\left(\frac{273}{461}\right)$ using Euler’s Test.

At the end of this section, let us take a look at one curious application of the theory of quadratic residuosity.

Proposition 20.13. [28] There are infinitely many primes congruent to 1 modulo 4.

[27] Proposition 6.10 in Frank Zorzitto, A Taste of Number Theory.
[28] Proposition 6.11 in Frank Zorzitto, A Taste of Number Theory.

Proof. Suppose we have a finite list of primes $p_1, p_2, \ldots, p_n$ congruent to 1 modulo 4. We will show how to produce yet another prime congruent to 1 modulo 4 that is not on this list. Let

$$x = (2 \cdot p_1 \cdot p_2 \cdots p_n)^2 + 1.$$

Let $q$ be any prime factor of $x$. If $q \in \{2, p_1, p_2, \ldots, p_n\}$, then $q \mid 1$, which is impossible. Since $q$ divides $x$, we see that

$$-1 \equiv (2 \cdot p_1 \cdot p_2 \cdots p_n)^2 \pmod{q},$$

which means that $-1$ is a quadratic residue modulo the odd prime $q$. But then it follows from Corollary 20.10 that $q \equiv 1 \pmod{4}$. Thus we were able to produce one more prime congruent to 1 modulo 4 which is not in the original list. Repeating this procedure with the list $p_1, p_2, \ldots, p_n, p_{n+1} = q$, we can produce yet another prime congruent to 1 modulo 4, and so on. Hence we can generate infinitely many distinct primes that are congruent to 1 modulo 4.
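The proof is constructive, so it can be run. In this sketch (helper names are ours) each round forms $x = (2 p_1 \cdots p_n)^2 + 1$ and takes its smallest prime factor by trial division; the proof allows any prime factor, so choosing the smallest one is simply our convention. Starting from the empty list is permitted by the argument and yields 5 and then 101; trial division means only a few rounds are practical because $x$ grows very quickly.

```python
def smallest_prime_factor(n):
    """Return the smallest prime factor of n >= 2 by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def extend_primes_1_mod_4(primes, steps):
    """Repeat the construction from the proof of Proposition 20.13 `steps` times.

    Each round computes x = (2 * p1 * ... * pn)^2 + 1 and appends a prime
    factor q of x (here the smallest one); by the proof, q is new and q = 1 (mod 4).
    """
    primes = list(primes)
    for _ in range(steps):
        product = 2
        for p in primes:
            product *= p
        x = product * product + 1
        primes.append(smallest_prime_factor(x))
    return primes


print(extend_primes_1_mod_4([], 3))
```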

21 The Law of Quadratic Reciprocity

Let $p \ge 3$ be prime and $a$ be an integer such that $p \nmid a$. We have already seen several approaches for computing $\left(\frac{a}{p}\right)$, for example Euler’s Test. In this section, we will investigate one more approach, based on a theorem first proved by Gauss. In fact, he established what we now call the Law of Quadratic Reciprocity, which encapsulates very important properties of quadratic residues.

We begin by proving the following proposition on the multiplicativity of the Legendre symbol.

Proposition 21.1. [29] The Legendre symbol is multiplicative. That is, if $p$ is an odd prime and $a, b$ are integers coprime to $p$, then

$$\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right).$$

Furthermore, if $a \equiv b \pmod{p}$, then $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$.

[29] Proposition 6.15 in Frank Zorzitto, A Taste of Number Theory.

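Proof. The second statement is obvious because congruent integers represent the same residue class in $\mathbb{Z}_p$.

To prove that $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$ for any $a$ and $b$ coprime to $p$, we apply Euler’s Test (see Proposition 20.9):

$$\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) \equiv a^{\frac{p-1}{2}}\, b^{\frac{p-1}{2}} \equiv (ab)^{\frac{p-1}{2}} \equiv \left(\frac{ab}{p}\right) \pmod{p}.$$

Since $\left(\frac{a}{p}\right)\left(\frac{b}{p}\right) = \pm 1$ and $\left(\frac{ab}{p}\right) = \pm 1$, and these two integers are congruent modulo the odd prime $p$, they have to be identical.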

By the Fundamental Theorem of Arithmetic, every positive integer $a > 1$ is a product of primes. That is, $a = q_1 q_2 \cdots q_n$ for some primes $q_1, q_2, \ldots, q_n$, with repetitions allowed. By Proposition 21.1,

$$\left(\frac{a}{p}\right) = \left(\frac{q_1}{p}\right)\left(\frac{q_2}{p}\right)\cdots\left(\frac{q_n}{p}\right).$$

Also, if $a$ is a negative integer, then $a = -1 \cdot b$ for some positive integer $b$, which means that

$$\left(\frac{a}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{b}{p}\right).$$

We conclude that, in order to determine the value of $\left(\frac{a}{p}\right)$, one has to explore the values of $\left(\frac{q}{p}\right)$ for distinct primes $p$ and $q$.

Essentially, for any fixed prime $q$, the Law of Quadratic Reciprocity allows us to understand which values the Legendre symbol $\left(\frac{q}{p}\right)$ takes as the odd prime $p$ varies. As a very simple example, let us explore the case $q = 2$.

Proposition 21.2. [30] If $p$ is an odd prime then

$$\left(\frac{2}{p}\right) = \begin{cases} +1 & \text{if } p \equiv 1, 7 \pmod{8};\\ -1 & \text{if } p \equiv 3, 5 \pmod{8}. \end{cases}$$

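Proof. Suppose $p = 8k + 1$ for some positive integer $k$. There are $4k = \frac{p-1}{2}$ even integers between 1 and $p$, namely

$$2, 4, 6, \ldots, 4k-2, 4k, 4k+2, 4k+4, \ldots, 8k-2, 8k.$$

Let us compute their product:

$$\begin{aligned}
x &= 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (4k+2) \cdot (4k+4) \cdots (8k-2) \cdot (8k)\\
&= 2^{4k}\bigl(1 \cdot 2 \cdot 3 \cdots (2k) \cdot (2k+1) \cdot (2k+2) \cdots (4k-1) \cdot (4k)\bigr)\\
&= 2^{4k}(4k)!.
\end{aligned}$$

However, subtracting $p = 8k+1$ from each of the even integers greater than $4k$ gives

$$\begin{aligned}
4k+2 &\equiv 1-4k \pmod{p},\\
4k+4 &\equiv 3-4k \pmod{p},\\
&\;\;\vdots\\
8k-2 &\equiv -3 \pmod{p},\\
8k &\equiv -1 \pmod{p}.
\end{aligned}$$

Using the above information, we can compute $x \pmod{p}$ as follows:

$$\begin{aligned}
x &\equiv 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (1-4k) \cdot (3-4k) \cdot (5-4k) \cdots (-3) \cdot (-1)\\
&\equiv 2 \cdot 4 \cdot 6 \cdots (4k-2) \cdot (4k) \cdot (4k-1) \cdot (4k-3) \cdot (4k-5) \cdots 3 \cdot 1 \cdot (-1)^{2k}\\
&\equiv (4k)! \pmod{p},
\end{aligned}$$

where the factor $(-1)^{2k}$ appears because we changed the sign of each of the $2k$ odd factors. We conclude that

$$2^{4k}(4k)! \equiv (4k)! \pmod{p}.$$

Since $4k < p$, the prime $p$ does not divide $(4k)!$, so after cancelling $(4k)!$ on both sides we obtain

$$2^{\frac{p-1}{2}} \equiv 2^{4k} \equiv 1 \pmod{p}.$$

By Euler’s Test, the integer 2 is a quadratic residue modulo $p$. The cases $p \equiv 3, 5, 7 \pmod{8}$ can be studied analogously and are left as an exercise to the reader.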

Since we managed to understand how $\left(\frac{q}{p}\right)$ behaves for the fixed prime $q = 2$, one would hope that such a result can be established for all other primes. Indeed, this can be achieved with the Law of Quadratic Reciprocity, proved by the German mathematician Carl Friedrich Gauss at the age of 19.

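Theorem 21.3. (Gauss’s Law of Quadratic Reciprocity) [31] Let $p$ and $q$ be distinct odd prime numbers. Then

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

In other words,

$$\left(\frac{p}{q}\right) = \begin{cases} \left(\frac{q}{p}\right) & \text{if } p \equiv 1 \pmod{4} \text{ or } q \equiv 1 \pmod{4};\\ -\left(\frac{q}{p}\right) & \text{if } p \equiv 3 \pmod{4} \text{ and } q \equiv 3 \pmod{4}. \end{cases}$$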

The proof is quite non-trivial, so due to time limitations we will not present it in class or in these notes. If you would like to see the proof, see Section 6.4 in Frank Zorzitto, A Taste of Number Theory.

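Example 21.4. Let us examine how the value of $\left(\frac{3}{p}\right)$ depends on the odd prime $p \ne 3$. By the Law of Quadratic Reciprocity,

$$\left(\frac{3}{p}\right)\left(\frac{p}{3}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{3-1}{2}} = (-1)^{\frac{p-1}{2}}.$$

Multiplying both sides of the above equality by $\left(\frac{p}{3}\right)$ and using $\left(\frac{p}{3}\right)^2 = 1$, we obtain

$$\left(\frac{3}{p}\right) = (-1)^{\frac{p-1}{2}}\left(\frac{p}{3}\right).$$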

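Now there are two cases to consider:

1. Suppose that $p \equiv 1 \pmod{4}$. Then $\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)$, so the value of $\left(\frac{3}{p}\right)$ depends on the congruence class of $p$ modulo 3. Note that $\left(\frac{1}{3}\right) = +1$ and $\left(\frac{2}{3}\right) = -1$. We conclude that $\left(\frac{3}{p}\right) = +1$ if $p \equiv 1 \pmod{4}$ and $p \equiv 1 \pmod{3}$, and $\left(\frac{3}{p}\right) = -1$ if $p \equiv 1 \pmod{4}$ and $p \equiv 2 \pmod{3}$. Since 3 and 4 are coprime, we can apply the Chinese Remainder Theorem to conclude that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 1 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 5 \pmod{12}$.

2. Analogously, we can analyze the case $p \equiv 3 \pmod{4}$. We have $\left(\frac{3}{p}\right) = -\left(\frac{p}{3}\right)$, which means that $\left(\frac{3}{p}\right) = +1$ if $p \equiv 3 \pmod{4}$ and $p \equiv 2 \pmod{3}$, and $\left(\frac{3}{p}\right) = -1$ if $p \equiv 3 \pmod{4}$ and $p \equiv 1 \pmod{3}$. Applying the Chinese Remainder Theorem, we see that $\left(\frac{3}{p}\right) = +1$ when $p \equiv 11 \pmod{12}$ and $\left(\frac{3}{p}\right) = -1$ when $p \equiv 7 \pmod{12}$.

We conclude that

$$\left(\frac{3}{p}\right) = \begin{cases} +1 & \text{if } p \equiv 1, 11 \pmod{12};\\ -1 & \text{if } p \equiv 5, 7 \pmod{12}. \end{cases}$$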
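Rules such as Proposition 21.2 and the conclusion of Example 21.4 are easy to sanity-check numerically against Euler’s Test. The sketch below (all helper names are ours) compares the predicted values of $\left(\frac{2}{p}\right)$ and $\left(\frac{3}{p}\right)$ with Euler’s Test for every prime $5 \le p < 5000$. A finite check like this is of course not a proof, but it is a quick way to catch a mistyped residue class.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def legendre_euler(a, p):
    """(a/p) via Euler's Test; assumes p is an odd prime not dividing a."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def rule_for_2(p):
    """(2/p) as predicted by Proposition 21.2."""
    return 1 if p % 8 in (1, 7) else -1


def rule_for_3(p):
    """(3/p) as predicted by Example 21.4 (p an odd prime, p != 3)."""
    return 1 if p % 12 in (1, 11) else -1


odd_primes = [p for p in range(5, 5000) if is_prime(p)]
print(all(legendre_euler(2, p) == rule_for_2(p) for p in odd_primes))  # True
print(all(legendre_euler(3, p) == rule_for_3(p) for p in odd_primes))  # True
```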

Exercise 21.5. Determine for which odd primes $p$ the Legendre symbols $\left(\frac{\pm 5}{p}\right)$ and $\left(\frac{\pm 7}{p}\right)$ are equal to $+1$ or $-1$.

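Example 21.6. Let us determine the value of $\left(\frac{247}{479}\right)$. Note that $247 = 13 \cdot 19$, $13 \equiv 1 \pmod{4}$ and $19 \equiv 479 \equiv 3 \pmod{4}$. Then we may use the multiplicativity of the Legendre symbol and the Law of Quadratic Reciprocity as follows:

$$\begin{aligned}
\left(\frac{247}{479}\right) &= \left(\frac{13}{479}\right)\left(\frac{19}{479}\right) = \left(\frac{479}{13}\right)\cdot\left(-\left(\frac{479}{19}\right)\right) = -\left(\frac{11}{13}\right)\left(\frac{4}{19}\right)\\
&= -\left(\frac{11}{13}\right)\left(\frac{2}{19}\right)^2 = -\left(\frac{13}{11}\right) = -\left(\frac{2}{11}\right) = 1.
\end{aligned}$$

Note that the last equality holds because the only quadratic residues in $\mathbb{Z}_{11}^*$ are $[1]$, $[3]$, $[4]$, $[5]$ and $[9]$. Since $[2]$ is not in this list, it is a quadratic nonresidue.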
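The computation in Example 21.6 follows a mechanical recipe: reduce $a$ modulo $p$, factor it, handle the factor 2 with Proposition 21.2, and flip every odd prime factor $q$ using the Law of Quadratic Reciprocity, which replaces the modulus $p$ by the smaller prime $q$. A minimal recursive Python sketch of that recipe follows; it factors by trial division (our choice, which keeps it close to the notes but makes it slow for large inputs), and the names `prime_factors` and `legendre` are ours.

```python
def prime_factors(n):
    """Prime factorization of n >= 1 as a list of primes with repetition."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def legendre(a, p):
    """(a/p) for an odd prime p not dividing a, using only Sections 20 and 21."""
    a %= p                       # (a/p) depends only on a mod p (Proposition 21.1)
    if a == 0:
        raise ValueError("p must not divide a")
    result = 1
    for q in prime_factors(a):   # multiplicativity (Proposition 21.1)
        if q == 2:               # Proposition 21.2
            result *= 1 if p % 8 in (1, 7) else -1
        else:                    # odd prime q < p: Law of Quadratic Reciprocity
            sign = -1 if (p % 4 == 3 and q % 4 == 3) else 1
            result *= sign * legendre(p, q)
    return result


print(legendre(247, 479))  # 1, as in Example 21.6
```

Because $a$ is first reduced modulo $p$, negative inputs such as $a = -1$ need no special case, and every recursive call uses a strictly smaller prime modulus, so the recursion terminates.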

22 Multiplicative Functions

The last 16 sections were all devoted to the theory of congruences, and at this point it is time to switch gears and move towards other topics. In this section, we begin our first excursion into analytic number theory.

In analytic number theory, we utilize the tools of real or complex analysis in order to answer questions in number theory. For example, the techniques of analytic number theory allow us to explain the asymptotic behaviour of the functions

$$\pi(x) = \#\{p \le x : p \text{ is prime}\} \quad\text{or}\quad Q(x) = \#\{n \le x : n \text{ is squarefree}\}.$$

Here $\#X$ denotes the cardinality of the set $X$. The study of analytic number theory begins with the introduction of multiplicative and totally multiplicative functions.

Definition 22.1. A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called multiplicative if for any coprime positive integers $m$ and $n$ it is the case that

$$f(mn) = f(m)f(n).$$

Definition 22.2. A non-zero function $f : \mathbb{N} \to \mathbb{C}$ is called totally multiplicative if for any positive integers $m$ and $n$, not necessarily coprime, it is the case that

$$f(mn) = f(m)f(n).$$

Example 22.3. Here are some examples of multiplicative and totally multiplicative functions:

1. The indicator function $I(n)$ is totally multiplicative:
$$I(n) = \begin{cases} 1, & \text{if } n = 1;\\ 0, & \text{if } n \ne 1; \end{cases}$$

2. The constant function $\mathbf{1}(n)$ is totally multiplicative:
$$\mathbf{1}(n) = 1 \text{ for all } n;$$

3. The identity function $i(n)$ is totally multiplicative:
$$i(n) = n \text{ for all } n;$$

4. The Legendre symbol $\left(\frac{n}{p}\right)$ for a fixed odd prime $p$ is totally multiplicative in accordance with Proposition 21.1;

5. The Euler totient function $\varphi(n)$ is multiplicative, but not totally multiplicative;

6. The number of divisors function $\tau(n)$ is multiplicative, but not totally multiplicative:
$$\tau(n) = \#\{d : d \mid n,\ d > 0\};$$

7. The sum of divisors function $\sigma(n)$ is multiplicative, but not totally multiplicative:
$$\sigma(n) = \sum_{d \mid n,\ d > 0} d;$$

8. The Möbius function is multiplicative, but not totally multiplicative (you will prove this fact in Assignment 5):
$$\mu(n) = \begin{cases} 1, & \text{if } n = 1;\\ 0, & \text{if } n \text{ is not squarefree};\\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors}. \end{cases}$$
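The claims in Example 22.3 can be spot-checked by computer. The Python sketch below defines $\tau$, $\sigma$, $\varphi$ and $\mu$ straight from their definitions (brute force, small $n$ only) and tests the multiplicativity conditions for all $mn \le N$. The checker names are hypothetical helpers, and a finite check is evidence, not a proof (in particular it does not replace Assignment 5).

```python
from math import gcd


def divisors(n):
    """Positive divisors of n, by testing every d from 1 to n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tau(n):
    return len(divisors(n))


def sigma(n):
    return sum(divisors(n))


def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def mu(n):
    """Mobius function via trial division: 0 if a square divides n, else (-1)^k."""
    k, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:       # d^2 divided the original n
                return 0
            k += 1
        d += 1
    if n > 1:                    # one remaining prime factor
        k += 1
    return (-1) ** k


def is_multiplicative_up_to(f, N):
    """Check f(mn) = f(m) f(n) for all coprime m, n with mn <= N."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, N + 1) for n in range(1, N // m + 1) if gcd(m, n) == 1)


def is_totally_multiplicative_up_to(f, N):
    """Check f(mn) = f(m) f(n) for all m, n with mn <= N."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, N + 1) for n in range(1, N // m + 1))


print(tau(12), sigma(12), phi(12), mu(30))  # 6 28 4 -1
```

For instance, $\tau(4) = 3 \ne \tau(2)^2 = 4$, which is why `is_totally_multiplicative_up_to(tau, 10)` returns `False`.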

We will now explore some properties of multiplicative functions.

Proposition 22.4. [32] If $m$ and $n$ are coprime positive integers, then every positive divisor $d$ of their product $mn$ comes from a unique pair of positive integers $a$ and $b$ such that

$$a \mid m, \quad b \mid n \quad\text{and}\quad ab = d.$$

Proof. If the unique factorizations of $m$ and $n$ are given by

$$m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad\text{and}\quad n = q_1^{f_1} q_2^{f_2} \cdots q_\ell^{f_\ell},$$

where the primes $p_1, \ldots, p_k, q_1, \ldots, q_\ell$ are distinct because $\gcd(m, n) = 1$, then every positive divisor $d$ of $mn$ takes the form

$$d = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k}\, q_1^{s_1} q_2^{s_2} \cdots q_\ell^{s_\ell},$$

where $0 \le r_i \le e_i$ and $0 \le s_j \le f_j$. If we now set

$$a = p_1^{r_1} p_2^{r_2} \cdots p_k^{r_k} \quad\text{and}\quad b = q_1^{s_1} q_2^{s_2} \cdots q_\ell^{s_\ell},$$

it becomes obvious that $a \mid m$, $b \mid n$ and $ab = d$.

Now we need to confirm that the above $a$ and $b$ are unique. Suppose that there exist positive integers $c$ and $e$ such that $c \mid m$, $e \mid n$ and $ce = d$. Then $ce = ab$. Since $c \mid m$ and $b \mid n$, it must be the case that $c$ and $b$ are coprime. Therefore $c \mid a$. By a symmetric argument, $a \mid c$, whence $a = c$, and then $b = e$.

Proposition 22.5. Let $f : \mathbb{N} \to \mathbb{C}$ be a multiplicative function. Then

1. $f(1) = 1$;

2. The function $f(n)$ is fully determined by its values at prime powers;

3. The function $g(n)$ given by
$$g(n) := \sum_{d \mid n,\ d > 0} f(d)$$
is multiplicative.

Proof. Property 1 is obvious, because $\gcd(1, n) = 1$ and so

$$f(n) = f(1 \cdot n) = f(1)f(n).$$

By definition, $f$ is non-zero, so there exists some $n$ such that $f(n) \ne 0$. For such $n$, we may cancel $f(n)$ on both sides of the above equality, thus leaving $f(1) = 1$.

To establish property 2, let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be the prime factorization of $n$. Then

$$\begin{aligned}
f(n) &= f(p_1^{e_1})\, f(p_2^{e_2} \cdots p_k^{e_k}) && \text{since } \gcd(p_1^{e_1}, p_2^{e_2} \cdots p_k^{e_k}) = 1;\\
&= f(p_1^{e_1})\, f(p_2^{e_2})\, f(p_3^{e_3} \cdots p_k^{e_k}) && \text{since } \gcd(p_2^{e_2}, p_3^{e_3} \cdots p_k^{e_k}) = 1;\\
&\;\;\vdots\\
&= f(p_1^{e_1})\, f(p_2^{e_2}) \cdots f(p_k^{e_k}).
\end{aligned}$$

Thus if we know the values of $f(p^e)$ for all prime powers $p^e$, we know the values of $f(n)$ for all positive integers $n$.

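To establish property 3, note first that $g(1) = f(1) = 1$, so $g$ is non-zero. Now let $m$ and $n$ be coprime positive integers. We use Proposition 22.4:

$$\begin{aligned}
g(mn) &= \sum_{d \mid mn} f(d)\\
&= \sum_{a \mid m,\ b \mid n} f(ab) && \text{by Proposition 22.4;}\\
&= \sum_{a \mid m,\ b \mid n} f(a)f(b) && \text{since } \gcd(a, b) = 1 \text{ and } f \text{ is multiplicative;}\\
&= \left(\sum_{a \mid m} f(a)\right)\left(\sum_{b \mid n} f(b)\right) = g(m)g(n).
\end{aligned}$$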

Proposition 22.6. The Euler totient function $\varphi(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then

$$\varphi(n) = \left(p_1^{e_1} - p_1^{e_1-1}\right)\left(p_2^{e_2} - p_2^{e_2-1}\right) \cdots \left(p_k^{e_k} - p_k^{e_k-1}\right).$$

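Proof. For an integer $x$, let us use the notation $[x]_n$ to indicate the residue class of $x$ modulo $n$.

Let $m$ and $n$ be coprime integers exceeding 1. We will show that $\mathbb{Z}_{mn}^*$ is in one-to-one correspondence with the Cartesian product

$$\mathbb{Z}_m^* \times \mathbb{Z}_n^* = \{(\alpha, \beta) : \alpha \in \mathbb{Z}_m^*,\ \beta \in \mathbb{Z}_n^*\}$$

via the map $[x]_{mn} \mapsto ([x]_m, [x]_n)$.

Let $[x]_{mn} \in \mathbb{Z}_{mn}^*$. Then $\gcd(x, mn) = 1$, which means that $\gcd(x, m) = 1$ and $\gcd(x, n) = 1$. But then $[x]_m$ and $[x]_n$ must be units in $\mathbb{Z}_m$ and $\mathbb{Z}_n$ respectively, so $[x]_m \in \mathbb{Z}_m^*$ and $[x]_n \in \mathbb{Z}_n^*$.

Conversely, if $[a]_m \in \mathbb{Z}_m^*$ and $[b]_n \in \mathbb{Z}_n^*$, then by the Chinese Remainder Theorem there exists a unique $[x]_{mn} \in \mathbb{Z}_{mn}$ such that

$$[x]_m = [a]_m \in \mathbb{Z}_m^* \quad\text{and}\quad [x]_n = [b]_n \in \mathbb{Z}_n^*.$$

Therefore $x$ is coprime to both $m$ and $n$, and so $x$ is coprime to $mn$. Thus we conclude that $[x]_{mn} \in \mathbb{Z}_{mn}^*$.

Now that we have seen that there exists a one-to-one correspondence between $\mathbb{Z}_{mn}^*$ and $\mathbb{Z}_m^* \times \mathbb{Z}_n^*$, we can conclude that

$$\#\mathbb{Z}_{mn}^* = \#\left(\mathbb{Z}_m^* \times \mathbb{Z}_n^*\right).$$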

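But since the cardinality of a Cartesian product is equal to the product of the cardinalities of the individual sets, i.e.

$$\#\left(\mathbb{Z}_m^* \times \mathbb{Z}_n^*\right) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^*,$$

with the help of Exercise 10.2 we can conclude that

$$\varphi(mn) = \#\mathbb{Z}_{mn}^* = \#\left(\mathbb{Z}_m^* \times \mathbb{Z}_n^*\right) = \#\mathbb{Z}_m^* \cdot \#\mathbb{Z}_n^* = \varphi(m)\varphi(n).$$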

In order to establish the formula for $\varphi(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\varphi(p^e)$ for a prime power $p^e$. The only positive integers less than $p^e$ that are not coprime to it are $p, 2p, 3p, \ldots, (p^{e-1}-1)p$. There are $p^{e-1} - 1$ such numbers in total, which means that

$$\varphi(p^e) = (p^e - 1) - (p^{e-1} - 1) = p^e - p^{e-1}.$$

Now that we know the formula for $\varphi(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\varphi(n)$ because $\varphi$ is multiplicative.

Proposition 22.7. The number of divisors function $\tau(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then

$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1).$$

Proof. To see that $\tau(n)$ is multiplicative, let $n \ge 2$ be an integer and consider the prime factorization of $n$:

$$n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}.$$

Then every positive divisor $d$ of $n$ must be of the form

$$d = p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k},$$

where $0 \le f_i \le e_i$ for all $i = 1, 2, \ldots, k$. Each $f_i$ has $e_i + 1$ possibilities, so we see that there are exactly

$$\tau(n) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1)$$

possible divisors of $n$.

Now suppose that

$$m = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} \quad\text{and}\quad n = q_1^{f_1} q_2^{f_2} \cdots q_\ell^{f_\ell}$$

are coprime, i.e. the prime numbers $p_1, p_2, \ldots, p_k, q_1, q_2, \ldots, q_\ell$ are distinct. Then

$$\tau(mn) = (e_1 + 1)(e_2 + 1) \cdots (e_k + 1)(f_1 + 1)(f_2 + 1) \cdots (f_\ell + 1) = \tau(m)\tau(n),$$

which means that $\tau(n)$ is a multiplicative function.

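Proposition 22.8. The sum of divisors function $\sigma(n)$ is multiplicative. Furthermore, if $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ is the prime factorization of $n$, then

$$\sigma(n) = \left(\frac{p_1^{e_1+1} - 1}{p_1 - 1}\right)\left(\frac{p_2^{e_2+1} - 1}{p_2 - 1}\right) \cdots \left(\frac{p_k^{e_k+1} - 1}{p_k - 1}\right).$$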

Proof. To see that $\sigma(n)$ is multiplicative, note that

$$\sigma(n) = \sum_{d \mid n,\ d > 0} d = \sum_{d \mid n,\ d > 0} i(d),$$

where $i(n) = n$ is the identity function. Since the identity function $i(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $\sigma(n)$ is multiplicative as well.

In order to establish the formula for $\sigma(n)$, recall that according to property 2 of Proposition 22.5 it is sufficient to compute $\sigma(p^e)$ for a prime power $p^e$. The divisors of $p^e$ are $1, p, p^2, \ldots, p^e$, so

$$\sigma(p^e) = 1 + p + p^2 + \cdots + p^e = \frac{p^{e+1} - 1}{p - 1}.$$

Note that the last equality holds because the sequence $1, p, \ldots, p^e$ constitutes an $(e+1)$-term geometric progression with first term 1 and common ratio $p$. Now that we know the formula for $\sigma(p^e)$ when $p^e$ is a prime power, it is straightforward to write down the general formula for $\sigma(n)$ because $\sigma$ is multiplicative.
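Propositions 22.6, 22.7 and 22.8 turn a single factorization of $n$ into closed formulas for $\varphi(n)$, $\tau(n)$ and $\sigma(n)$. The Python sketch below (helper names are ours; factorization is by trial division) evaluates those product formulas; for example $36 = 2^2 \cdot 3^2$ gives $\varphi(36) = 2 \cdot 6 = 12$, $\tau(36) = 3 \cdot 3 = 9$ and $\sigma(36) = 7 \cdot 13 = 91$.

```python
def factorize(n):
    """Return the prime factorization of n >= 1 as a dict {p: e}, by trial division."""
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi_formula(n):
    result = 1
    for p, e in factorize(n).items():
        result *= p ** e - p ** (e - 1)            # Proposition 22.6
    return result


def tau_formula(n):
    result = 1
    for e in factorize(n).values():
        result *= e + 1                            # Proposition 22.7
    return result


def sigma_formula(n):
    result = 1
    for p, e in factorize(n).items():
        result *= (p ** (e + 1) - 1) // (p - 1)    # Proposition 22.8 (exact division)
    return result


print(phi_formula(36), tau_formula(36), sigma_formula(36))  # 12 9 91
```

For $n = 1$ the factorization is empty and every product is 1, which matches $\varphi(1) = \tau(1) = \sigma(1) = 1$.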

23 The Möbius Inversion

From now on, when writing $d \mid n$, we will always assume that the divisor $d$ is positive. As we shall see, the Möbius function

$$\mu(n) = \begin{cases} 1, & \text{if } n = 1;\\ 0, & \text{if } n \text{ is not squarefree};\\ (-1)^k, & \text{if } n \text{ is squarefree with } k \text{ distinct prime factors} \end{cases}$$

plays a crucial role in analytic number theory.

Proposition 23.1. [33] For every $n \ge 1$,

$$\sum_{d \mid n} \mu(d) = I(n).$$

Proof. Let $g(n) = \sum_{d \mid n} \mu(d)$. Note that

$$g(1) = \mu(1) = 1 = I(1).$$

Now let $n \ge 2$. Since $\mu(n)$ is multiplicative, it follows from property 3 of Proposition 22.5 that $g(n)$ is multiplicative as well. Since $I(n)$ is also multiplicative, by property 2 of Proposition 22.5 it suffices to check that $g(p^e) = 0 = I(p^e)$ for every prime power $p^e$ with $e \ge 1$. We have

$$\begin{aligned}
g(p^e) &= \sum_{d \mid p^e} \mu(d)\\
&= \mu(1) + \mu(p) + \mu(p^2) + \cdots + \mu(p^e)\\
&= 1 - 1 + 0 + \cdots + 0 = 0 = I(p^e),
\end{aligned}$$

so the result follows.

The Möbius function is important because it allows us to express the function $f$ in terms of $g$ whenever these two functions are connected by the relation

$$g(n) = \sum_{d \mid n} f(d).$$

The operation of expressing $f$ through $g$ is called the Möbius inversion.

Proposition 23.2. [34] If $f$ and $g$ are arbitrary functions, not necessarily multiplicative, that are defined on the set of positive integers and satisfy

$$g(n) = \sum_{d \mid n} f(d)$$

for all $n \ge 1$, then

$$f(n) = \sum_{d \mid n} g(d)\,\mu\!\left(\frac{n}{d}\right) = \sum_{d \mid n} g\!\left(\frac{n}{d}\right)\mu(d).$$

[33] Proposition 8.6 in Frank Zorzitto, A Taste of Number Theory.
[34] Theorem 8.7 in Frank Zorzitto, A Taste of Number Theory.

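Proof. First, note that for a positive integer $n$ and a pair of positive integers $d, e$ it is the case that $de \mid n$ if and only if $d \mid n$ and $e \mid n/d$.

Second, note that

$$\begin{aligned}
\sum_{d \mid n} g\!\left(\frac{n}{d}\right)\mu(d) &= \sum_{d \mid n} \left(\sum_{e \mid \frac{n}{d}} f(e)\right)\mu(d)\\
&= \sum_{d \mid n,\ e \mid \frac{n}{d}} f(e)\mu(d)\\
&= \sum_{ed \mid n} f(e)\mu(d)\\
&= \sum_{e \mid n,\ d \mid \frac{n}{e}} f(e)\mu(d)\\
&= \sum_{e \mid n} \left(\sum_{d \mid \frac{n}{e}} \mu(d)\right) f(e)\\
&= \sum_{e \mid n} I\!\left(\frac{n}{e}\right) f(e)\\
&= f(n),
\end{aligned}$$

where the second-to-last equality is Proposition 23.1 and the last one holds because $I(n/e) = 1$ only when $e = n$. The other formula for $f(n)$ follows by replacing each divisor $d$ of $n$ with $n/d$.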
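Propositions 23.1 and 23.2 can be checked numerically. The sketch below implements the second inversion formula, $f(n) = \sum_{d \mid n} g(n/d)\mu(d)$, for any Python function `g`; the names `divisors`, `mu` and `mobius_invert` are ours, and everything is brute force for small $n$. Taking $g = i$, the identity function, recovers $\varphi$, in line with Proposition 23.3 and Example 23.4 below.

```python
def divisors(n):
    """Positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def mu(n):
    """Mobius function via trial division: 0 if a square divides n, else (-1)^k."""
    k, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            k += 1
        d += 1
    if n > 1:
        k += 1
    return (-1) ** k


def mobius_invert(g, n):
    """Recover f(n) from g(n) = sum_{d | n} f(d) via f(n) = sum_{d | n} g(n/d) mu(d)."""
    return sum(mu(d) * g(n // d) for d in divisors(n))


print(mobius_invert(lambda m: m, 12))  # 4, which is phi(12)
```

The usage line relies on $n = \sum_{d \mid n} \varphi(d)$: inverting the identity function gives back $\varphi$.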

Before proceeding to examples of the Möbius inversion, let us prove the following fact about the Euler totient function $\varphi(n)$.

Proposition 23.3. [35] For every positive integer $n$,

$$\sum_{d \mid n} \varphi(d) = n.$$

Proof. Let $g(n) = \sum_{d \mid n} \varphi(d)$. By property 3 of Proposition 22.5, the function $g(n)$ is multiplicative. Therefore, by property 2 of Proposition 22.5, it is sufficient to understand its values $g(p^e)$ for prime powers $p^e$. Using the formula given in Proposition 22.6, we obtain

$$\begin{aligned}
g(p^e) &= \varphi(1) + \varphi(p) + \varphi(p^2) + \cdots + \varphi(p^e)\\
&= 1 + (p - 1) + (p^2 - p) + \cdots + (p^e - p^{e-1})\\
&= p^e,
\end{aligned}$$

since the sum telescopes.


And now, since $g(n)$ is multiplicative, for any integer $n$ with the prime factorization $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ we may conclude that

$$g(n) = g(p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}) = g(p_1^{e_1})\, g(p_2^{e_2}) \cdots g(p_k^{e_k}) = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} = n.$$

Now that we have established the connection between the identity function $i(n)$ and the Euler totient function $\varphi(n)$, we can write down a new formula for $\varphi(n)$ via the Möbius inversion.

Example 23.4. Let us prove that for every positive integer $n$ it is the case that

$$\varphi(n) = n \sum_{d \mid n} \frac{\mu(d)}{d}.$$

By Proposition 23.3, the identity function $i(n)$ and the Euler totient function $\varphi(n)$ are connected by means of the relation

$$i(n) = \sum_{d \mid n} \varphi(d).$$

Now the Möbius inversion formula tells us that

$$\varphi(n) = \sum_{d \mid n} \mu(d)\, i\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \mu(d)\frac{n}{d} = n \sum_{d \mid n} \frac{\mu(d)}{d}.$$

Example 23.5. Note that

$$\sigma(n) = \sum_{d \mid n} d,$$

which means that there is a connection between the sum of divisors function $\sigma(n)$ and the identity function $i(n)$. But then it follows from the Möbius inversion formula that

$$n = \sum_{d \mid n} \mu(d)\, \sigma\!\left(\frac{n}{d}\right).$$

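Exercise 23.6. The von Mangoldt function, denoted by $\Lambda(n)$, is defined as

$$\Lambda(n) = \begin{cases} \log p, & \text{if } n = p^k \text{ for some prime } p \text{ and integer } k \ge 1;\\ 0, & \text{otherwise.} \end{cases}$$

Prove that

$$\log n = \sum_{d \mid n} \Lambda(d),$$

and then use the Möbius inversion to establish the formula

$$\Lambda(n) = -\sum_{d \mid n} \mu(d) \log d.$$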

24 The Prime Number Theorem

In 1797 or 1798, Legendre conjectured that the number of primes up to $x$ is approximated by the function $\frac{x}{A \log x + B}$, where $A$ and $B$ are some constants. According to the recollections of Gauss, “in the year 1792 or 1793”, when he was 15 or 16 years old, he made a similar observation. In simple terms, this conjecture states that, up to $x$, there are “roughly” $\frac{x}{\log x}$ prime numbers.

The Prime Number Theorem confirms the conjecture made by Legendre and Gauss. It is one of the most renowned results in analytic number theory. The Prime Number Theorem was proved independently by Jacques Hadamard and Charles Jean de la Vallée Poussin in 1896.

Theorem 24.1. (The Prime Number Theorem) Let

$$\pi(x) := \#\{p \le x : p \text{ is prime}\}.$$

Then

$$\lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1.$$

A more accurate statement of the Prime Number Theorem is the following one:

$$\pi(x) = \operatorname{Li}(x) + O\!\left(x e^{-a\sqrt{\log x}}\right),$$

where $a$ is a positive constant and

$$\operatorname{Li}(x) = \int_2^x \frac{dt}{\log t}.$$

Indeed, the function $\operatorname{Li}(x)$ describes the behaviour of the prime counting function more precisely than $\frac{x}{\log x}$. In this form, we also see the error term, which tells us how far the value of $\pi(x)$ is from the value of $\operatorname{Li}(x)$.
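To see these approximations side by side, the following Python sketch computes $\pi(x)$ exactly with the sieve of Eratosthenes and approximates $\operatorname{Li}(x)$ with the composite trapezoid rule. The sieve and the trapezoid rule are our choices for illustration (they are not discussed in the notes), as are the names `prime_count` and `li` and the step count, which trades speed for accuracy.

```python
import math


def prime_count(x):
    """pi(x): the number of primes p <= x, via the sieve of Eratosthenes."""
    n = int(x)
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for i in range(2, math.isqrt(n) + 1):
        if is_prime[i]:
            # cross out i*i, i*i + i, ..., which are all composite
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return sum(is_prime)


def li(x, steps=100_000):
    """Approximate Li(x) = integral from 2 to x of dt / log t (composite trapezoid rule)."""
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    total += sum(1 / math.log(2 + k * h) for k in range(1, steps))
    return total * h


for x in (10**3, 10**4, 10**5, 10**6):
    print(x, prime_count(x), round(x / math.log(x)), round(li(x)))
```

For example, $\pi(10^6) = 78498$; running the loop shows $\operatorname{Li}(x)$ landing much closer to $\pi(x)$ than $x/\log x$ does.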

The analytic proof of the Prime Number Theorem relies heavily on complex analysis, so it is not “elementary”. More precisely, it requires a delicate analysis of the (non-trivial) zeros of the Riemann zeta function

$$\zeta(s) := \sum_{n=1}^{\infty} \frac{1}{n^s},$$

where $s$ is a complex number with $\operatorname{Re}(s) > 1$ (the zeros in question lie outside this half-plane and are studied through the analytic continuation of $\zeta(s)$). An elementary proof of the Prime Number Theorem was discovered half a century later, in 1948, by the Norwegian mathematician Atle Selberg. [36]

Since the proof was introduced, the error term $O\!\left(x e^{-a\sqrt{\log x}}\right)$ has been improved many times. If the Riemann Hypothesis is true, the error term can be improved to $O(\sqrt{x} \log x)$. The Riemann Hypothesis concerns the distribution of non-trivial zeros of $\zeta(s)$. It is undoubtedly one of the hardest open mathematical problems. At the University of Waterloo, there are several experts who work in analytic number theory and on problems related to the distribution of zeros of the Riemann zeta function, including Yu-Ru Liu and Michael Rubinstein.

It is worth mentioning a very interesting elementary argument of Erdős, which explains why the function $\frac{x}{\log x}$ “captures” the behaviour of $\pi(x)$. The proof does not involve any analytic techniques and should be quite accessible to second or third year undergraduate students in mathematics. To those who are interested in the subject, we recommend this proof for further reading.

Theorem 24.2. (Erdős, 1949) For $x \ge 2$,

$$\left(\frac{3 \log 2}{8}\right)\frac{x}{\log x} < \pi(x) < (6 \log 2)\frac{x}{\log x}.$$

Proof. See Theorem 4 in https://uwaterloo.ca/pure-mathematics/sites/ca.pure-mathematics/files/uploads/files/pmath440notes_0.pdf.

25 The Density of Squarefree Numbers

In this section, we will see one basic analytic result on the density of squarefree numbers.

[36] On the history of the elementary proof of the Prime Number Theorem and Selberg’s dispute with Erdős, see the article of D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective, 2003.

Theorem 25.1. Let

$$Q(x) = \#\{1 \le n \le x : n \text{ is squarefree}\}.$$

Then the natural asymptotic density of squarefree numbers is given by

$$\lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2} \approx 0.6079.$$

In other words, Theorem 25.1 tells us that, in the sense of natural density, about 60.8% of all positive integers are squarefree. Before proceeding to the proof, let us establish the following simple lemma.

Lemma 25.2. Let $f(n)$ be a multiplicative function such that the series

$$\sum_{n=1}^{\infty} |f(n)|$$

converges. Then

$$\sum_{n=1}^{\infty} f(n) = \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right).$$

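Proof. For a fixed positive number $y$, the following identity holds:

$$\prod_{\substack{p \text{ prime}\\ p < y}} \left(1 + f(p) + f(p^2) + \cdots\right) = \sum_{\substack{n \ge 1\\ p \mid n \,\Rightarrow\, p < y}} f(n).$$

Indeed, there are finitely many factors on the left and each of them converges absolutely, so the product may be expanded term by term; by unique factorization and the multiplicativity of $f$, this produces each $f(n)$, with all prime factors of $n$ less than $y$, exactly once.

We now show that, as $y$ approaches infinity, the right hand side approaches $\sum_{n=1}^{\infty} f(n)$, while the left hand side approaches the desired Euler product. Since the series $\sum_{n=1}^{\infty} |f(n)|$ converges, it must be the case that

$$\sum_{n \ge y} |f(n)| \to 0$$

as $y$ approaches infinity. We can utilize this fact in order to show that, as $y$ approaches infinity,

$$\left|\sum_{n=1}^{\infty} f(n) - \sum_{\substack{n \ge 1\\ p \mid n \,\Rightarrow\, p < y}} f(n)\right| = \left|\sum_{\substack{n \ge 1\\ \exists p \mid n :\, p \ge y}} f(n)\right| \le \sum_{n \ge y} |f(n)| \to 0.$$

This observation allows us to conclude that

$$\sum_{n=1}^{\infty} f(n) = \lim_{y \to \infty} \sum_{\substack{n \ge 1\\ p \mid n \,\Rightarrow\, p < y}} f(n) = \lim_{y \to \infty} \prod_{\substack{p \text{ prime}\\ p < y}} \left(1 + f(p) + f(p^2) + \cdots\right) = \prod_{p \text{ prime}} \left(1 + f(p) + f(p^2) + \cdots\right).$$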

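Proof (of Theorem 25.1). Note that

$$\mu^2(n) = \begin{cases} 1, & \text{if } n \text{ is squarefree};\\ 0, & \text{otherwise}, \end{cases}$$

which means that

$$Q(x) = \sum_{n \le x} \mu^2(n).$$

Let $\ell(n)$ denote the largest positive integer such that $\ell(n)^2 \mid n$. Since $d^2 \mid n$ if and only if $d \mid \ell(n)$, it follows from Proposition 23.1 that

$$\begin{aligned}
\mu^2(n) &= \begin{cases} 1, & \text{if } \ell(n) = 1;\\ 0, & \text{otherwise}; \end{cases}\\
&= I(\ell(n)) = \sum_{d \mid \ell(n)} \mu(d) = \sum_{d^2 \mid n} \mu(d).
\end{aligned}$$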

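As it turns out, this formula is much easier to analyze than $\mu^2(n)$ itself. Now let $\{x\} := x - \lfloor x \rfloor$ denote the fractional part of $x$. Note that $\{x\}$ satisfies $0 \le \{x\} < 1$ for any $x$. Then

$$\begin{aligned}
Q(x) &= \sum_{n \le x} \mu^2(n) = \sum_{n \le x} \sum_{d^2 \mid n} \mu(d) = \sum_{d \le \sqrt{x}} \mu(d) \sum_{\substack{n \le x\\ d^2 \mid n}} 1\\
&= \sum_{d \le \sqrt{x}} \mu(d) \left\lfloor \frac{x}{d^2} \right\rfloor = \sum_{d \le \sqrt{x}} \mu(d) \left(\frac{x}{d^2} - \left\{\frac{x}{d^2}\right\}\right)\\
&= \sum_{d \le \sqrt{x}} \mu(d)\frac{x}{d^2} - \sum_{d \le \sqrt{x}} \mu(d)\left\{\frac{x}{d^2}\right\}.
\end{aligned}$$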

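Since $\left|\mu(d)\left\{\frac{x}{d^2}\right\}\right| < 1$, we conclude that

$$\begin{aligned}
Q(x) &= \sum_{d \le \sqrt{x}} \mu(d)\frac{x}{d^2} - \sum_{d \le \sqrt{x}} \mu(d)\left\{\frac{x}{d^2}\right\} < x \sum_{d \le \sqrt{x}} \frac{\mu(d)}{d^2} + \sum_{d \le \sqrt{x}} 1\\
&= x \sum_{d \le \sqrt{x}} \frac{\mu(d)}{d^2} + \lfloor \sqrt{x} \rfloor \le x \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x}.
\end{aligned}$$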

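Now observe that

$$\left|\sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2}\right| \le \sum_{d > \sqrt{x}} \frac{1}{d^2} < \int_{\lfloor \sqrt{x} \rfloor}^{\infty} \frac{dt}{t^2} = \frac{1}{\lfloor \sqrt{x} \rfloor} \le \frac{2}{\sqrt{x}}.$$

Above we utilized the fact that $\sqrt{x} \le 2\lfloor \sqrt{x} \rfloor$ for all $x \ge 2$. For convenience, define the constant $c$ as

$$c := \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2}.$$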

Then

$$Q(x) \le cx - x \sum_{d > \sqrt{x}} \frac{\mu(d)}{d^2} + \sqrt{x} < cx + x \cdot \frac{2}{\sqrt{x}} + \sqrt{x} = cx + 3\sqrt{x}.$$

Through analogous observations, we can also establish the lower bound on $Q(x)$, and obtain the final relation

$$cx - 3\sqrt{x} < Q(x) < cx + 3\sqrt{x}.$$

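Now the only thing that is left for us to do is to compute $c$. Recall that

$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}.$$

This result was proved by Leonhard Euler in 1734. Further, by an argument analogous to the second proof of Theorem 2.10, we see that

$$\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} = \prod_{p \text{ prime}} \left(1 + \frac{1}{p^2} + \frac{1}{p^4} + \cdots\right) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right)^{-1}.$$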

Note that the last equality holds due to the formula for the infinite geometric series. Since the function $\mu(n)/n^2$ is multiplicative and

$$\sum_{d=1}^{\infty} \left|\frac{\mu(d)}{d^2}\right| \le \sum_{d=1}^{\infty} \frac{1}{d^2} = \frac{\pi^2}{6} < \infty,$$

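we can apply Lemma 25.2 to the series $\sum_{d=1}^{\infty} \mu(d)/d^2$ in order to obtain

$$c = \sum_{d=1}^{\infty} \frac{\mu(d)}{d^2} = \prod_{p \text{ prime}} \left(1 + \frac{\mu(p)}{p^2} + \frac{\mu(p^2)}{p^4} + \cdots\right) = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^2}\right) = \left(\sum_{n=1}^{\infty} \frac{1}{n^2}\right)^{-1} = \frac{6}{\pi^2}.$$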

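Thus we conclude that

$$\frac{6}{\pi^2}x - 3\sqrt{x} < Q(x) < \frac{6}{\pi^2}x + 3\sqrt{x},$$

and further, dividing by $x$,

$$\frac{6}{\pi^2} - \frac{3}{\sqrt{x}} < \frac{Q(x)}{x} < \frac{6}{\pi^2} + \frac{3}{\sqrt{x}}.$$

By letting $x$ tend to infinity, we see that the Squeeze Theorem implies

$$\lim_{x \to \infty} \frac{Q(x)}{x} = \frac{6}{\pi^2}.$$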
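The key identity in the proof, $Q(x) = \sum_{d \le \sqrt{x}} \mu(d)\lfloor x/d^2 \rfloor$, also gives a fast way to count squarefree numbers, since it needs only $\sqrt{x}$ values of $\mu$. The Python sketch below (function names are ours) compares it with a direct count and with the bounds $\frac{6}{\pi^2}x \pm 3\sqrt{x}$ for integer $x$.

```python
import math


def mu(n):
    """Mobius function via trial division: 0 if a square divides n, else (-1)^k."""
    k, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            k += 1
        d += 1
    if n > 1:
        k += 1
    return (-1) ** k


def squarefree_count_direct(x):
    """Q(x) for an integer x >= 1: test each n <= x for being squarefree."""
    return sum(1 for n in range(1, x + 1) if mu(n) != 0)


def squarefree_count_mobius(x):
    """Q(x) = sum over d <= sqrt(x) of mu(d) * floor(x / d^2), as in the proof."""
    return sum(mu(d) * (x // (d * d)) for d in range(1, math.isqrt(x) + 1))


for x in (10**2, 10**4, 10**6):
    q = squarefree_count_mobius(x)
    print(x, q, q / x, 6 / math.pi**2)
```

For instance, $Q(100) = 100 - 25 - 11 - 4 + 2 - 2 + 1 = 61$, coming from $d = 1, 2, 3, 5, 6, 7, 10$ (the terms with $\mu(d) = 0$ drop out).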

26 Perfect Numbers

One of the oldest problems in mathematics concerns the existence of odd perfect numbers. Around 300 BC, perfect numbers were introduced by Euclid in his book Elements (VII.22).

Definition 26.1. A positive integer $n$ is called perfect if the sum of its positive divisors is equal to $2n$, or in other words $\sigma(n) = 2n$.

The first eight perfect numbers are

$$6,\ 28,\ 496,\ 8128,\ 33550336,\ 8589869056,\ 137438691328,\ 2305843008139952128.$$

Aside from the fact that they grow quite quickly (which we shall explain later), we may notice one thing that they all have in common, namely that they are all even. But do there exist odd perfect numbers? We do not know. This question has been studied thoroughly over the past two centuries, and quite a few things are known about odd perfect numbers. For example, if an odd perfect number $n$ exists, it must satisfy the following three (out of many other) criteria:

1. $n > 10^{1500}$;

2. $n$ has at least 101 prime factors (counted with multiplicity) and at least 10 distinct prime factors;

3. The largest prime factor of $n$ is greater than $10^8$.

In 2003, Carl Pomerance gave a heuristic argument for why the existence of odd perfect numbers is highly unlikely. Those who are interested can find his argument here: http://home.earthlink.net/~oddperfect/pomerance.html.

Unlike odd perfect numbers, even perfect numbers are known to exist. Even more than that, we know exactly what even perfect numbers look like. However, we still do not know whether there are infinitely many even perfect numbers. As we shall see later, this problem is equivalent to showing that there are infinitely many Mersenne primes.

Definition 26.2. Let $M_n := 2^n - 1$. An integer $M_p = 2^p - 1$ is called a Mersenne prime if it is prime.

The first eight Mersenne primes are

$$3,\ 7,\ 31,\ 127,\ 8191,\ 131071,\ 524287,\ 2147483647.$$

As we will see in the proof of the Euclid-Euler Theorem, which was proved by Leonhard Euler in 1747, the even perfect numbers and Mersenne primes are closely related.

Theorem 26.3. (Euclid-Euler Theorem, 1747)37 An even positive integer n is aperfect number if and only if it has the form n = 2p−1Mp, where Mp is a Mersenneprime.

Proof. The sufficient condition was proved by Euclid around 300 BC. You are asked to reproduce his proof in Assignment 5, so we omit it in these lecture notes.



For the necessary condition, suppose that $n$ is even and perfect. Let us write $n = 2^{p-1}m$, where $p \ge 2$ and $m$ is odd. Note that $p \ge 2$ because $n$ is even. We will show that $m = 2^p - 1$, and that $m$ is prime.

We have that n is perfect, and so

$$\sigma(n) = 2n = 2^p m.$$

Because $2^{p-1}$ and $m$ are coprime and $\sigma$ is multiplicative, we also have

$$\sigma(n) = \sigma(2^{p-1})\sigma(m).$$

By adding up the divisors of $2^{p-1}$ we obtain

$$\sigma(2^{p-1}) = 1 + 2 + 2^2 + \cdots + 2^{p-1} = 2^p - 1.$$

We conclude that
$$\sigma(n) = (2^p - 1)\sigma(m),$$

and so
$$2^p m = (2^p - 1)\sigma(m).$$

Since $2^p$ and $2^p - 1$ are coprime, $2^p - 1 \mid m$. So

$$m = (2^p - 1)d$$

for some positive integer $d$.

Now we need to prove that in the expression $m = (2^p - 1)d$ we have $d = 1$. We plug this expression into the equality $2^p m = (2^p - 1)\sigma(m)$ in order to obtain

$$2^p(2^p - 1)d = (2^p - 1)\sigma(m),$$

and thus $2^p d = \sigma(m)$. From $m = (2^p - 1)d$ and $2^p d = \sigma(m)$ we come to

$$m + d = 2^p d = \sigma(m).$$

Now suppose that $d > 1$. Since $m = (2^p - 1)d$ and $2^p - 1 \ge 3$, we have $d < m$, so $m$ has at least three divisors, namely $1$, $d$ and $m$. So $\sigma(m) \ge m + d + 1$, and this contradicts the fact that $\sigma(m) = m + d$. Therefore $d = 1$.

To see that $m$ is prime, note that $\sigma(m) = m + d = m + 1$. Since the divisors of $m$ add up to $m + 1$, our $m$ can have only $1$ and $m$ as divisors, which makes $m$ a prime. Hence our even perfect number $n$ is of the form $2^{p-1}M_p$, where $M_p = 2^p - 1$ is a Mersenne prime.
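The Euclid-Euler Theorem turns the search for even perfect numbers into a search for Mersenne primes. The Python sketch below is our own illustration (`is_prime`, `sigma` and `even_perfect_numbers` are hypothetical helper names): it tests each $M_p = 2^p - 1$ by trial division, which is only practical for small $p$, and reproduces the eight perfect numbers listed at the start of this section. It also rechecks $\sigma(n) = 2n$ directly for the smaller ones.

```python
def is_prime(n):
    """Trial division; adequate for n up to roughly 10**12."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def sigma(n):
    """Sum of the positive divisors of n, pairing each d <= sqrt(n) with n // d."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total


def even_perfect_numbers(max_p):
    """Euclid-Euler: 2**(p-1) * M_p for every Mersenne prime M_p = 2**p - 1 with p <= max_p."""
    result = []
    for p in range(2, max_p + 1):
        m = 2**p - 1
        # M_p can only be prime when p is prime, so composite p are skipped early
        if is_prime(p) and is_prime(m):
            result.append(2**(p - 1) * m)
    return result


perfect = even_perfect_numbers(31)
print(perfect)
# [6, 28, 496, 8128, 33550336, 8589869056, 137438691328, 2305843008139952128]
print(all(sigma(n) == 2 * n for n in perfect[:6]))  # True
```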


Though we do not know if there are infinitely many Mersenne primes, we do know quite a few of them. On January 7th 2016, the Great Internet Mersenne Prime Search reported the discovery of the 49th known Mersenne prime, which was the largest Mersenne prime known at the time of writing. This prime is $M_{74207281}$, and it has 22338618 decimal digits. If you want to make a significant impact on Computational Number Theory, try to search for other Mersenne primes!

27 Pythagorean Triples

In Section 4, we learned how to solve the linear Diophantine equation $ax + by = c$. We will now turn our attention to equations of degree two or more. The analysis of such equations can be much more challenging, and many Diophantine equations, such as Thue equations, remain the objects of active research nowadays.

In this section, we will classify all positive integer solutions to the Pythagorean equation

$$x^2 + y^2 = z^2.$$

Note that if the integers $x$, $y$ and $z$ satisfy the above equation, then so do the integers $dx$, $dy$ and $dz$ for any integer $d$. Thus it is only interesting to consider the case when $\gcd(x, y, z) = 1$. In this case, we call the solution $(x, y, z)$ primitive. The first three primitive solutions to the Pythagorean equation are $(x, y, z) = (3, 4, 5)$, $(5, 12, 13)$ and $(8, 15, 17)$.

Theorem 27.1. Suppose positive integers $x$, $y$ and $z$ satisfy the Pythagorean equation $x^2 + y^2 = z^2$. Then, after possibly interchanging $x$ and $y$, there exist integers $d, m, n$ such that

$$x = d(n^2 - m^2), \quad y = 2dmn, \quad z = d(n^2 + m^2).$$

Proof.[38] Let $d = \gcd(x, y, z)$. Then the triple $(x/d, y/d, z/d)$ is also a solution, so without loss of generality we may assume that $\gcd(x, y, z) = 1$, i.e. $(x, y, z)$ is a primitive solution. From here it follows that $x$ and $y$ have different parity. Indeed, if both $x$ and $y$ were even, then $z$ would be even as well, contradicting $\gcd(x, y, z) = 1$; and if both $x$ and $y$ were odd, then $x^2 + y^2 \equiv 2 \pmod{4}$, which contradicts the fact that $z^2 \equiv 0, 1 \pmod{4}$ for any integer $z$. Interchanging $x$ and $y$ if necessary, we may assume that $x$ is odd and $y$ is even, which means that $z$ is odd. Now we write

$$y^2 = z^2 - x^2 = (z - x)(z + x).$$

[38] The proof is taken from Section 1.1 of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.


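If we let $g = \gcd(z - x, z + x) = \gcd(2z, z + x) = \gcd(z - x, 2x)$ (see Proposition 5.1), then $g \mid 2z$ and $g \mid 2x$, which means that $g \mid \gcd(2z, 2x) = 2\gcd(z, x)$. Since $(x, y, z)$ is a primitive solution, it must be the case that $\gcd(z, x) = 1$: any prime dividing both $x$ and $z$ would divide $y^2 = z^2 - x^2$, and hence $y$. This means that $g \mid 2$, and since $x$ and $z$ are odd, both $z - x$ and $z + x$ are even, so it must be the case that $g = 2$.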

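Now we can write

$$\left(\frac{y}{2}\right)^2 = \left(\frac{z - x}{2}\right)\left(\frac{z + x}{2}\right).$$

Since the value on the left hand side of the above equality is a perfect square and $\frac{z - x}{2}$, $\frac{z + x}{2}$ are coprime positive integers (note that $z > x > 0$), it must be the case that $\frac{z - x}{2}$ and $\frac{z + x}{2}$ are perfect squares. Put

$$\frac{z - x}{2} = m^2 \quad \text{and} \quad \frac{z + x}{2} = n^2.$$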

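But then $x = n^2 - m^2$, $y = 2mn$ and $z = n^2 + m^2$. Returning to the original solution, which is $d = \gcd(x, y, z)$ times this primitive one, we obtain $x = d(n^2 - m^2)$, $y = 2dmn$ and $z = d(n^2 + m^2)$, as claimed. Conversely, for any integers $d$, $m$, $n$ the identity

$$\big(d(n^2 - m^2)\big)^2 + (2dmn)^2 = \big(d(n^2 + m^2)\big)^2$$

holds, so every triple of this form is indeed a solution to $x^2 + y^2 = z^2$.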
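The parametrization in Theorem 27.1 gives a simple generator of primitive triples. In the Python sketch below (our own; `primitive_triples` and `brute_force_primitive` are hypothetical names), we additionally require $n > m > 0$, $\gcd(m, n) = 1$ and $n - m$ odd. These are the conditions that the proof imposes on a primitive solution with $x$ odd ($m^2$ and $n^2$ are coprime, and $x = n^2 - m^2$ is odd). A brute-force search is included for comparison.

```python
from math import gcd, isqrt


def primitive_triples(max_n):
    """Triples (n^2 - m^2, 2mn, n^2 + m^2) with n > m > 0, gcd(m, n) = 1, n - m odd, n <= max_n."""
    triples = []
    for n in range(2, max_n + 1):
        for m in range(1, n):
            if (n - m) % 2 == 1 and gcd(m, n) == 1:
                triples.append((n * n - m * m, 2 * m * n, n * n + m * m))
    return sorted(triples, key=lambda t: t[2])


def brute_force_primitive(max_z):
    """All primitive solutions (x, y, z) of x^2 + y^2 = z^2 with y even and z <= max_z."""
    found = set()
    for z in range(1, max_z + 1):
        for y in range(2, z, 2):
            x = isqrt(z * z - y * y)
            if x > 0 and x * x + y * y == z * z and gcd(gcd(x, y), z) == 1:
                found.add((x, y, z))
    return found


print(primitive_triples(4))  # [(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25)]
# every primitive triple with z <= 100 has n <= 9, so max_n = 10 is enough
print({t for t in primitive_triples(10) if t[2] <= 100} == brute_force_primitive(100))  # True
```

Because the sketch always puts the odd leg first, the triple $(8, 15, 17)$ from the text appears as $(15, 8, 17)$.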

28 Fermat’s Infinite Descent. Fermat’s Last Theorem

Perhaps the most famous mathematical story is the story of Fermat’s Last Theorem. Around 1637, Fermat wrote his Last Theorem in the margin of his copy of Diophantus’s Arithmetica. In modern language, his claim reads as follows:

Theorem 28.1. (Fermat’s Last Theorem) Let $n \ge 3$. Then the equation $x^n + y^n = z^n$ has no solutions in positive integers $x$, $y$ and $z$.

He claimed to have discovered a “truly marvellous” proof of this fact, but could not write it down because the margin of the book he was reading was too narrow to contain it.

Many mathematicians tried to prove Fermat’s Last Theorem. The case $n = 4$ was proved by Fermat himself in 1636. In 1753, Euler proved it for the case $n = 3$. Alternative proofs were given by Kausler, Legendre, Calzolari, Lamé, and many others. In his proof, Euler utilized Fermat’s idea of infinite descent, which we shall discuss in this section. The case $n = 5$ was proved by Dirichlet and Legendre around 1825, and alternative proofs were given by Gauss, Lebesgue, Lamé, and others. The case $n = 7$ was proved by Gabriel Lamé in 1839.

In the 1820s, Sophie Germain developed an approach to attack the problem for several exponents at the same time. In particular, she managed to show that the first case of Fermat’s Last Theorem, in which the prime exponent $p$ does not divide $xyz$, holds for all primes $p < 100$.

In 1847, Gabriel Lamé suggested approaching the problem by factoring the equation $x^p + y^p = z^p$ for an odd prime $p$ as follows:

$$z^p = x^p + y^p = (x + y)(x + \zeta_p y)(x + \zeta_p^2 y) \cdots (x + \zeta_p^{p-1} y), \tag{5}$$

where $\zeta_p = \exp(2\pi i/p)$ is a primitive $p$-th root of unity. If instead of the standard ring of integers $\mathbb{Z}$ one considers the ring

$$\mathbb{Z}[\zeta_p] = \{x_0 + x_1\zeta_p + x_2\zeta_p^2 + \cdots + x_{p-1}\zeta_p^{p-1} : x_0, x_1, \ldots, x_{p-1} \in \mathbb{Z}\},$$

then one would hope that such notions as unique factorization or coprimality behave in $\mathbb{Z}[\zeta_p]$ just like they do in $\mathbb{Z}$. Assuming that this is the case, one could show that the algebraic integers $x + y, x + \zeta_p y, \ldots, x + \zeta_p^{p-1} y$ are coprime, and since the expression (5) has a $p$-th power of an integer $z$ on its left hand side, one could then hope that $x + \zeta_p^i y = q_i^p$ for some $q_i \in \mathbb{Z}[\zeta_p]$, where $i = 0, 1, \ldots, p - 1$. In other words, each of the numbers $x + y, x + \zeta_p y, \ldots, x + \zeta_p^{p-1} y$ would be a perfect $p$-th power, and one could then try to prove that this is impossible. Note how similar this idea is to the one presented in the proof of Theorem 27.1.

Unfortunately, there is a flaw in this argument: it is not necessarily true that the ring $\mathbb{Z}[\alpha]$, for an algebraic number $\alpha$, has unique factorization. Perhaps the most famous example is that in the ring

$$\mathbb{Z}[\sqrt{-5}] = \{x_1 + x_2\sqrt{-5} : x_1, x_2 \in \mathbb{Z}\}$$

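one can write the number 6 in two different ways:

$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5}),$$

and none of the four factors $2$, $3$, $1 \pm \sqrt{-5}$ can be factored further into non-units of $\mathbb{Z}[\sqrt{-5}]$.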
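One way to see that $2$, $3$ and $1 \pm \sqrt{-5}$ cannot be factored further is to use the norm $N(a + b\sqrt{-5}) = a^2 + 5b^2$. We assume here that it is multiplicative; it is the analogue of the norm on Gaussian integers introduced in Section 29. The four factors have norms 4, 9, 6 and 6, so splitting any of them into two non-units (elements of norm greater than 1) would require an element of norm 2 or 3. The Python sketch below (helper names are hypothetical) confirms by exhaustive search that no such element exists; the search is complete because $|a|$ and $|b|$ are bounded by the square root of the target norm.

```python
from math import isqrt


def norm(a, b):
    """Norm of a + b*sqrt(-5), namely a^2 + 5b^2."""
    return a * a + 5 * b * b


def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)) = (ac - 5bd) + (ad + bc)*sqrt(-5)."""
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(target):
    """All pairs (a, b) with a^2 + 5b^2 == target; |a|, |b| <= isqrt(target) suffices."""
    r = isqrt(target)
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1) if norm(a, b) == target]


print(mul((1, 1), (1, -1)))                               # (6, 0): (1 + sqrt(-5))(1 - sqrt(-5)) = 6
print([norm(2, 0), norm(3, 0), norm(1, 1), norm(1, -1)])  # [4, 9, 6, 6]
print(elements_of_norm(2), elements_of_norm(3))           # [] []
```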

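The odd primes $p$ for which this approach, suitably refined by Kummer, breaks down are called irregular primes; they are called regular otherwise. Regularity is a weaker condition than unique factorization in $\mathbb{Z}[\zeta_p]$ (which in fact holds only for $p \le 19$), and its precise definition, in terms of the class number of $\mathbb{Q}(\zeta_p)$, is beyond the scope of these notes. The first eight irregular primes are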

37, 59, 67, 101, 103, 131, 149, 157.


Therefore Kummer’s refinement of Lamé’s strategy applies to all primes $p < 100$, except for $p = 37$, $59$ and $67$: around 1850, Ernst Kummer managed to prove that for every regular prime $p$ the Fermat equation $x^p + y^p = z^p$ has no solutions in positive integers. However, it is still unknown whether there are infinitely many regular primes. In 1964, Carl Ludwig Siegel conjectured that approximately 60.65% of all prime numbers are regular. The techniques suggested by Lamé and Kummer (and Euler before that) evolved into a whole new area of mathematics, known nowadays as Algebraic Number Theory. The next few sections will contain a brief introduction to this subject.

Fermat’s Last Theorem was proved by the English mathematician Andrew Wiles. His proof, completed in 1994, was published in 1995 in a special issue of Annals of Mathematics. The original paper is available here: https://math.stanford.edu/~lekheng/flt/wiles.pdf. As an exercise, try to understand at least the first page! Since the Fields medal, which is one of the most important awards for mathematicians, is restricted to those under age 40, and Andrew Wiles proved Fermat’s Last Theorem at the age of 41, he received a silver plaque from the International Mathematical Union instead of the Fields medal.

The proof of Andrew Wiles combined many areas of number theory. It is an interconnection of the Theory of Elliptic Curves, the Theory of Modular Forms, Representation Theory, Iwasawa Theory, and many other mathematical subjects. In short, Andrew Wiles managed to do the following. Consider the equation

$$y^2 = x^3 + ax + b,$$

where $a$ and $b$ are complex numbers such that $4a^3 + 27b^2 \ne 0$. When $a$ and $b$ are real, such an equation defines a plane curve, called an elliptic curve. In 1985, the German mathematician Gerhard Frey pointed out that for an integer $n \ge 3$ the elliptic curve

$$y^2 = x(x - a^n)(x + b^n),$$

where $a$ and $b$ are positive integers such that $a^n + b^n = c^n$ for some integer $c$, must be very special. In particular, such a curve would be semistable, and Frey suggested that it could not be modular; this was proved by Kenneth Ribet in 1986. The fact that it is non-modular would then contradict the so-called Taniyama-Shimura Conjecture, proposed by the Japanese mathematicians Yutaka Taniyama and Goro Shimura in 1957. The conjecture stated that every elliptic curve over the rational numbers, semistable or not, has to be modular. Andrew Wiles managed to prove this conjecture in the semistable case, and Fermat’s Last Theorem follows from this result. The fact that all elliptic curves over the rationals, semistable or not, are modular was proved in 2001 by Christophe Breuil, Brian Conrad, Fred Diamond and Richard Taylor. This result is known as the Modularity Theorem. It took more than 350 years for the proof of Fermat’s Last Theorem to be discovered.

Fermat claimed to have a proof of his Last Theorem. Of course, it is highly unlikely that the argument he had in mind was as involved as the one given by Andrew Wiles. Most likely, Fermat believed that the theorem could be proved using the technique of infinite descent, which he developed. This technique allowed him to prove the theorem in the special case $n = 4$; we present a slightly more general result in the following proposition. The idea of infinite descent can be summarized as follows: when considering certain Diophantine equations, like $x^3 + 2y^3 + 4z^3 = 0$ or $x^4 + y^4 = z^2$, one can show that the existence of one solution leads to the existence of another solution, which is “smaller” than the previous one. One would then obtain an infinite strictly decreasing sequence of positive integers $x_1 > x_2 > x_3 > \cdots$, which would contradict the fact that the positive integers are bounded below by 1. We will demonstrate the application of this technique in two special cases. More examples can be found in the following survey of Keith Conrad: http://www.math.uconn.edu/~kconrad/blurbs/ugradnumthy/descent.pdf.

Proposition 28.2. (Fermat, 1636)[39] The equation $x^4 + y^4 = z^2$ has no solutions in positive integers $x$, $y$ and $z$.

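Proof. By Theorem 27.1 and its proof, every primitive solution $(x, y, z)$ to the equation $x^2 + y^2 = z^2$ with $x$ odd must be of the form

$$x = n^2 - m^2, \quad y = 2mn, \quad z = n^2 + m^2,$$

where $n > m > 0$ and $\gcd(m, n) = 1$ (since $m^2$ and $n^2$ are the coprime integers $\frac{z - x}{2}$ and $\frac{z + x}{2}$).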

Assume that there is a solution to $x^4 + y^4 = z^2$, where $x$, $y$ and $z$ are positive integers. Without loss of generality, we may suppose that $\gcd(x, y) = 1$: if $e = \gcd(x, y)$, then $e^4 \mid z^2$, so $e^2 \mid z$, and $(x/e, y/e, z/e^2)$ is again a solution. This means that $\gcd(x, z) = 1$ and $\gcd(y, z) = 1$. We will find a second positive integer solution $(x', y', z')$ with $\gcd(x', y') = 1$ that is smaller than $(x, y, z)$ in a suitable sense.

Since $\gcd(x, y) = 1$, the numbers $x$ and $y$ are not both even. They are not both odd either, since otherwise $z^2 = x^4 + y^4 \equiv 2 \pmod{4}$, and this congruence, as we saw before, has no solutions. Without loss of generality, we may assume that $x$ is odd and $y$ is even. Then $z$ is odd. Since $(x^2)^2 + (y^2)^2 = z^2$ and $\gcd(x^2, y^2) = 1$, the triple $(x^2, y^2, z)$ must be a primitive Pythagorean triple, so there exist coprime positive integers $m < n$ such that

$$x^2 = n^2 - m^2, \quad y^2 = 2mn, \quad z = n^2 + m^2. \tag{6}$$

[39] Theorem 3.1 in Keith Conrad, Proofs by descent.


Since $x^2 + m^2 = n^2$ and $\gcd(m, n) = 1$, we conclude that $(x, m, n)$ is another primitive Pythagorean triple. Since $x$ is odd (so that $m$ is the even leg), the formula for primitive Pythagorean triples once again tells us that

$$x = a^2 - b^2, \quad m = 2ab, \quad n = a^2 + b^2, \tag{7}$$

where $a$ and $b$ are coprime positive integers. Substituting the values of $m$ and $n$ from (7) into the second equation of (6), we obtain $y^2 = 4(a^2 + b^2)ab$. Since $y$ is even,

$$\left(\frac{y}{2}\right)^2 = (a^2 + b^2)ab.$$

Since $\gcd(a, b) = 1$, the three factors on the right are pairwise coprime. Since they are all positive, each of them must be a perfect square:

$$a = x'^2, \quad b = y'^2, \quad a^2 + b^2 = z'^2.$$

Since $\gcd(a, b) = 1$, it must be the case that $\gcd(x', y') = 1$. Now the last equation can be rewritten as $x'^4 + y'^4 = z'^2$, so $(x', y', z')$ is another solution to our original equation with $\gcd(x', y') = 1$.

Now we compare $z'$ to $z$. Since

$$0 < z' \le z'^2 = a^2 + b^2 = n \le n^2 < n^2 + m^2 = z,$$

we see that from one primitive solution $(x, y, z)$ to $x^4 + y^4 = z^2$ we can produce another primitive solution $(x', y', z')$ such that $z > z'$. But then we could produce an infinite strictly decreasing sequence of positive integers $z > z' > z'' > \cdots$, and this contradicts the fact that the positive integers are bounded below by 1.
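A finite computer search cannot replace the descent argument, but it is a useful sanity check on Proposition 28.2. The Python sketch below (our own; the function name is hypothetical) looks for positive $x \le y \le 300$ such that $x^4 + y^4$ is a perfect square, using the exact integer square root `math.isqrt` to avoid floating-point errors. As the proposition predicts, it finds nothing.

```python
from math import isqrt


def solutions_x4_y4_z2(limit):
    """All (x, y, z) with 1 <= x <= y <= limit and x^4 + y^4 == z^2."""
    found = []
    for x in range(1, limit + 1):
        for y in range(x, limit + 1):
            s = x**4 + y**4
            z = isqrt(s)  # exact floor of the square root, no rounding issues
            if z * z == s:
                found.append((x, y, z))
    return found


print(solutions_x4_y4_z2(300))  # []
```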

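Corollary 28.3. Fermat’s Last Theorem holds for $n = 4$. In other words, the equation $x^4 + y^4 = z^4$ has no solutions in positive integers $x$, $y$ and $z$. Indeed, any such solution would give the solution $(x, y, z^2)$ to $x^4 + y^4 = w^2$, contradicting Proposition 28.2.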

Another example of the proof by infinite descent is the proof of irrationality of $\sqrt{2}$. This proof was discovered by the Pythagoreans, who showed that the ratio of the diagonal of a square to its side cannot be represented as a ratio of two integers. The Pythagoreans kept the proof of this fact secret and, according to legend, its discoverer (possibly Hippasus of Metapontum) was murdered for divulging it.

Proposition 28.4. The number $\sqrt{2}$ is irrational. That is, there exist no integers $m$ and $n$ with $n \ne 0$ such that

$$\sqrt{2} = m/n.$$


Proof. Suppose not, so that there exist positive integers $m$ and $n$ such that $\sqrt{2} = m/n$. Then $m = \sqrt{2}\,n$. Raising both sides of this equation to the power of two, we obtain

$$m^2 = 2n^2,$$

so $(m, n)$ is a positive solution to the Diophantine equation $x^2 = 2y^2$. From the above equality we see that $2 \mid m^2$, which means that $2 \mid m$ because 2 is prime. But then we can write $m$ as $m = 2m'$ for some positive integer $m'$. Therefore

$$m^2 = (2m')^2 = 4m'^2 = 2n^2.$$

Thus we obtain $4m'^2 = 2n^2$, and by cancelling 2 on both sides we get

$$n^2 = 2m'^2.$$

Thus from the positive integer solution $(m, n)$ we can obtain another positive integer solution $(n, m')$ to the Diophantine equation $x^2 = 2y^2$. Note that

$$m' = \frac{1}{2}m = \frac{1}{\sqrt{2}}n < n,$$

so the second coordinate in the solution $(m, n)$ is strictly greater than the second coordinate in the solution $(n, m')$. Thus, if there were a positive integer solution to the Diophantine equation $x^2 = 2y^2$, we could produce an infinite strictly decreasing sequence of positive integers $n > m' > m'' > \cdots$, and this contradicts the fact that the positive integers are bounded below by 1.

Exercise 28.5. Let $k$ be a positive integer. Prove that the number $\sqrt{k}$ is rational if and only if $k$ is a perfect square.

29 Gaussian Integers

Let $i$ denote one of the complex roots of the polynomial $x^2 + 1$. That is, the number $i$ satisfies the equation $i^2 = -1$. Notice that if $i$ is a root of $x^2 + 1$, then so is $-i$.

Definition 29.1. A complex number of the form $a + bi$, where $a, b \in \mathbb{Z}$, is called a Gaussian integer. The set of Gaussian integers is denoted by $\mathbb{Z}[i]$.

The notation $\mathbb{Z}[i]$ suggests that the set of Gaussian integers is analogous to the ring of rational integers $\mathbb{Z}$, where we now treat the numbers $i$ and $-i$ as (Gaussian) integers as well. The similarities between the two sets become even more obvious once we note that, just like the set of rational integers $\mathbb{Z}$, the set $\mathbb{Z}[i]$ forms a commutative ring under the standard operations of addition and multiplication.


Proposition 29.2. The set of Gaussian integers

$$\mathbb{Z}[i] := \{a + bi : a, b \in \mathbb{Z}\}$$

forms a commutative ring under the standard operations of addition and multiplication.

Proof. Strictly speaking, to prove this result one would have to do the routine verification of the ring axioms and commutativity. We leave this part as an exercise. What is worth mentioning is that both $0 = 0 + 0 \cdot i$ and $1 = 1 + 0 \cdot i$ are elements of $\mathbb{Z}[i]$, and also that $\mathbb{Z}[i]$ is closed under addition, subtraction and multiplication. That is, for all $a + bi, c + di \in \mathbb{Z}[i]$, their sum, difference and product are elements of $\mathbb{Z}[i]$:

$$(a + bi) \pm (c + di) = (a \pm c) + (b \pm d)i \in \mathbb{Z}[i];$$

$$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \in \mathbb{Z}[i].$$

Also, note that $\mathbb{Z} \subsetneq \mathbb{Z}[i]$, so every rational integer is also a Gaussian integer.

We will see that the Gaussian integers will be of great help when we try to answer the question of which integers can be represented as a sum of two squares. In other words, we will use Gaussian integers to solve the Diophantine equation $n = a^2 + b^2$, where $n$ is fixed and $a, b \in \mathbb{Z}$ are variables. Note that, if $n \in \mathbb{N}$ is representable as a sum of two squares, then

$$n = a^2 + b^2 = (a + bi)(a - bi),$$

so we have factored the rational integer $n$, which is also a Gaussian integer, as a product of two Gaussian integers $a + bi$ and $a - bi$.
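The closure formulas in the proof of Proposition 29.2 translate directly into code. In the Python sketch below, a Gaussian integer $a + bi$ is represented as a pair of Python integers `(a, b)`. This representation and the helper names are our own choice, made so that all arithmetic stays exact (Python's built-in `complex` type uses floating point). The last lines check the factorization $n = (a + bi)(a - bi)$ for $n = 5 = 2^2 + 1^2$.

```python
def g_add(w, z):
    """(a + bi) + (c + di) = (a + c) + (b + d)i."""
    (a, b), (c, d) = w, z
    return (a + c, b + d)


def g_sub(w, z):
    """(a + bi) - (c + di) = (a - c) + (b - d)i."""
    (a, b), (c, d) = w, z
    return (a - c, b - d)


def g_mul(w, z):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


def conj(z):
    """Complex conjugate of a + bi, namely a - bi."""
    a, b = z
    return (a, -b)


print(g_mul((2, 1), conj((2, 1))))  # (5, 0): 5 = (2 + i)(2 - i)
print(g_mul((3, 4), (1, -2)))       # (11, -2), agreeing with complex(3, 4) * complex(1, -2)
```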

Definition 29.3. Let $a, b \in \mathbb{Z}[i]$. We say that $a$ divides $b$, or that $a$ is a factor of $b$, when $b = ak$ for some $k \in \mathbb{Z}[i]$. We write $a \mid b$ if this is the case, and $a \nmid b$ otherwise.

Example 29.4. For example, $5 = (1 + 2i)(1 - 2i)$, so $(1 + 2i) \mid 5$ and $(1 - 2i) \mid 5$.

One of the most important invariants attached to a Gaussian integer $z$ is its norm, which we denote by $N(z)$.


Definition 29.5. The norm function is defined to be

$$N : \mathbb{Z}[i] \to \mathbb{N} \cup \{0\}, \quad a + bi \mapsto a^2 + b^2.$$

Definition 29.6. Let $z = a + bi$ be a Gaussian integer. The complex conjugate of $z$ is $\bar{z} = a - bi$. The absolute value of $z$ is

$$|z| := \sqrt{z\bar{z}} = \sqrt{(a + bi)(a - bi)} = \sqrt{a^2 + b^2}.$$

Note the obvious connection between the norm of a Gaussian integer $z = a + bi$ and the absolute value of $z$:

$$N(z) = N(a + bi) = a^2 + b^2 = \left(\sqrt{a^2 + b^2}\right)^2 = |z|^2.$$

The norm map has many nice properties. For example, it is multiplicative; that is, the norm of the product of two Gaussian integers is equal to the product of their norms:

$$N(zw) = |zw|^2 = (|z||w|)^2 = |z|^2|w|^2 = N(z)N(w).$$

We will see the usefulness of this property later. Another important thing to mention is that the only Gaussian integer whose norm is equal to zero is zero itself. That is, $N(z) = 0$ if and only if $z = 0$.
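The two properties of the norm just stated can also be checked numerically. The Python sketch below is our own illustration, with Gaussian integers again stored as pairs `(a, b)` and hypothetical helper names. It compares $N(zw)$ with $N(z)N(w)$ for all $z, w$ whose coordinates lie in $[-5, 5]$, and confirms that only $0$ has norm $0$ in that range. A finite check is no substitute for the one-line proof above; it merely guards against algebra slips.

```python
def norm(z):
    """N(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b


def g_mul(w, z):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


# all Gaussian integers a + bi with -5 <= a, b <= 5
box = [(a, b) for a in range(-5, 6) for b in range(-5, 6)]
print(all(norm(g_mul(z, w)) == norm(z) * norm(w) for z in box for w in box))  # True
print([z for z in box if norm(z) == 0])  # [(0, 0)]
```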

Now it is time to discuss the geometric interpretation of the Gaussian integers. Consider Figure 1,[40] which depicts the complex plane. The Gaussian integers $a + bi$ form a square grid located at the points $(a, b)$, where the coordinates are rational integers. If $z = a + bi$ is a Gaussian integer, then the point $(a, -b)$, which corresponds to the complex conjugate $\bar{z} = a - bi$ of $z$, is just the reflection of the point $(a, b)$ across the $x$-axis. In turn, the absolute value $|z|$ represents the distance from the point $(a, b)$ to the origin. Note that it is equal to the distance from the point $(a, -b)$ to the origin.

The next important concept that we need to introduce is the concept of units.

Definition 29.7. A Gaussian integer $u$ is a unit of $\mathbb{Z}[i]$ when $u \mid w$ for all $w$ in $\mathbb{Z}[i]$.

In other words, the units are those very special numbers that divide every single element of the ring. The notion of a unit is not specific to the ring of Gaussian integers; it applies to any commutative ring with identity. For example, in $\mathbb{Z}$ the only units are $1$ and $-1$. When talking about the prime factorization of rational integers, we always omit $\pm 1$. When doing so, we actually mean that the prime factorization is unique up to multiplication by $\pm 1$. We will see that the analogue of the Fundamental Theorem of Arithmetic holds for Gaussian integers, and so every nonzero Gaussian integer that is not a unit has a unique prime factorization up to reordering and multiplication by units. In the next proposition, we prove that the only units in the ring of Gaussian integers are $\pm 1$ and $\pm i$.

Figure 1: Gaussian integers

[40] The picture is taken from https://upload.wikimedia.org/wikipedia/commons/7/7d/Gaussian_integer_lattice.png.

Proposition 29.8.[41] For a Gaussian integer $z$, the following are equivalent:

1. $z$ is a unit in $\mathbb{Z}[i]$;

2. $N(z) = 1$;

3. $z \in \{1, -1, i, -i\}$;

4. $z \ne 0$ and the inverse complex number $z^{-1} := 1/z$ is also a Gaussian integer.

Proof. Suppose that $z$ is a unit. Then $z \mid 1$, since $z$ divides every Gaussian integer. Thus $1 = zw$ for some $w$ in $\mathbb{Z}[i]$. Then

$$1 = 1^2 + 0^2 = N(1) = N(zw) = N(z)N(w).$$

Since $N(z)$ and $N(w)$ are non-negative integers whose product is 1, we deduce that $N(z) = 1$ (and $N(w) = 1$).



Suppose that $N(z) = 1$, where $z = a + bi$ for some $a, b \in \mathbb{Z}$. We have $a^2 + b^2 = 1$, which means that $a^2 = 1, b = 0$ or $a = 0, b^2 = 1$. In the first case, $a = \pm 1$ and $b = 0$, which means that $z = \pm 1$. In the second case, $a = 0$ and $b = \pm 1$, which means that $z = \pm i$.

If $z$ is one of $1, -1, i, -i$, its inverse is $1, -1, -i, i$, respectively, and these are again Gaussian integers.

Finally, suppose that $z$ and $z^{-1}$ are Gaussian integers. If $w$ is any other Gaussian integer, we see that $z \mid w$, because $w = z(z^{-1}w)$ and $z^{-1}w$ is a Gaussian integer.

We will now turn our attention to establishing the analogue of the Fundamental Theorem of Arithmetic in the ring of Gaussian integers. For this purpose, we need to introduce the definition of a Gaussian prime.

Definition 29.9. Let $z$ be a Gaussian integer. Then $z$ is called a Gaussian prime if it is not a unit and any factorization $z = wu$ in $\mathbb{Z}[i]$ forces $w$ or $u$ to be a unit.

Compare this definition to Definition 2.5, where we introduced the notion of a rational prime. The similarity is clear: an ordinary rational prime can be factored in $\mathbb{Z}$ only if one of its factors is a unit, and the units of $\mathbb{Z}$ are $\pm 1$.

Example 29.10. The integer 2 is a prime in $\mathbb{Z}$, but it is not a Gaussian prime because $2 = (1 + i)(1 - i)$, and neither $1 + i$ nor $1 - i$ is a unit. The number 3, however, is not only a rational prime, but also a Gaussian prime. For suppose that $3 = zw$ for some Gaussian integers $z$ and $w$. Then

$$9 = N(3) = N(zw) = N(z)N(w),$$

which means that $N(z)$ and $N(w)$ are positive divisors of 9 whose product is 9. Suppose that $N(z) = 3$, and let $z = a + bi$. Then

$$3 = N(z) = N(a + bi) = a^2 + b^2.$$

However, we saw many times that integers congruent to 3 modulo 4 cannot be represented as a sum of two squares, which means that $N(z) \ne 3$. Hence either $N(z) = 1$, or $N(z) = 9$ and $N(w) = 1$. By Proposition 29.8, either $z$ or $w$ is a unit.


Exercise 29.11. Prove that every rational prime $p$ such that $p \equiv 3 \pmod{4}$ is a Gaussian prime.

The next step is to establish the analogue of the Remainder Theorem for Gaussian integers.

Proposition 29.12.[42] If $z, w$ are Gaussian integers and $z \ne 0$, then there exist Gaussian integers $q$ and $r$ such that $w = qz + r$, where $N(r) < N(z)$.

Proof. Recall the geometric interpretation of the Gaussian integers, given in Figure 1. The complex number $w/z$ is located somewhere on the complex plane $\mathbb{C}$. This $w/z$ need not be a Gaussian integer. However, as Figure 2 demonstrates, it falls into one of the unit squares whose vertices are Gaussian integers.

Figure 2: Complex number in a square with Gaussian integers as vertices

We pick our Gaussian integer $q$ so that the distance between the point corresponding to $q$ and the point corresponding to $w/z$ is the smallest. By inspection, we can see that such a Gaussian integer $q$ must satisfy

$$\left|\frac{w}{z} - q\right| \le \frac{1}{\sqrt{2}}.$$

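Indeed, the unit square containing $w/z$ splits into four boxes of side $1/2$, as shown in Figure 3. The point $w/z$ lies in one of these boxes, and its nearest Gaussian integer $q$ is the corner of that box which is also a vertex of the unit square. The diagonal of each box has length

$$\sqrt{\left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2} = \frac{1}{\sqrt{2}}.$$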


Figure 3: Gaussian integer closest to a given complex number

We conclude that

$$\left|\frac{w}{z} - q\right|^2 \le \frac{1}{2} < 1.$$

Therefore

$$\left|\frac{w - zq}{z}\right|^2 < 1,$$

and so $|w - zq|^2 < |z|^2$, which is the same as $N(w - zq) < N(z)$. Put $r := w - zq$, and obtain $w = zq + r$, where $N(r) < N(z)$.
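The proof of Proposition 29.12 is constructive, so it yields an algorithm. The Python sketch below is our own (`g_divmod` and `nearest` are hypothetical names). It stores $a + bi$ as the pair `(a, b)` and avoids floating point by writing $w/z = w\bar{z}/N(z)$ and rounding the real and imaginary parts of this exact fraction to the nearest integers. Ties, where a coordinate lies exactly halfway between two integers, are rounded up here. Either choice keeps each coordinate of $w/z - q$ within $1/2$, so $N(r) \le N(z)/2 < N(z)$, as in the proof. The usage line reproduces Example 29.13 below.

```python
def norm(z):
    """N(a + bi) = a^2 + b^2."""
    return z[0] * z[0] + z[1] * z[1]


def g_sub(w, z):
    """Difference of two Gaussian integers."""
    return (w[0] - z[0], w[1] - z[1])


def g_mul(w, z):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


def nearest(num, den):
    """Nearest integer to num/den for den > 0, i.e. floor(num/den + 1/2)."""
    return (2 * num + den) // (2 * den)


def g_divmod(w, z):
    """Return (q, r) with w = q*z + r and N(r) < N(z), taking q nearest to w/z."""
    n = norm(z)
    if n == 0:
        raise ZeroDivisionError("division by the Gaussian integer 0")
    # w/z = w * conj(z) / N(z), so round both coordinates of w * conj(z) / n
    a, b = g_mul(w, (z[0], -z[1]))
    q = (nearest(a, n), nearest(b, n))
    r = g_sub(w, g_mul(q, z))
    return q, r


print(g_divmod((4, 7), (1, -3)))  # ((-2, 2), (0, -1)), i.e. 4 + 7i = (-2 + 2i)(1 - 3i) - i
```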

Example 29.13. Let us see how the Remainder Theorem for Gaussian integers works. Let $w = 4 + 7i$ and $z = 1 - 3i$. Then

$$\frac{w}{z} = \frac{4 + 7i}{1 - 3i} = \frac{(4 + 7i)(1 + 3i)}{(1 - 3i)(1 + 3i)} = \frac{-17 + 19i}{10} = -1.7 + 1.9i.$$

We see that the nearest integer point to $(-1.7, 1.9)$ is $(-2, 2)$. Thus $q = -2 + 2i$. Then

$$r = w - qz = (4 + 7i) - (1 - 3i)(-2 + 2i) = (4 + 7i) - (4 + 8i) = -i.$$

We conclude that

$$4 + 7i = (-2 + 2i)(1 - 3i) - i.$$

Note that $N(-i) = 1 < 10 = N(z)$.



We will now prove the analogue of Bezout’s lemma for Gaussian integers. For $a, b \in \mathbb{Z}[i]$, we call a Gaussian integer $ax + by$ with $x, y \in \mathbb{Z}[i]$ a Gaussian combination of $a$ and $b$. In the following proposition, it is crucial that for every Gaussian integer $a$ the value $N(a)$ is a non-negative rational integer, so that among all nonzero combinations there is one of smallest norm.

Proposition 29.14. Let $a, b$ be Gaussian integers such that $a \ne 0$ or $b \ne 0$. If $d$ is a nonzero Gaussian combination of $a$ and $b$ such that $N(d)$ is minimal among all nonzero Gaussian combinations, then $d$ divides every Gaussian combination of $a$ and $b$.

Proof. We know that $ax + by = d$ for some $x, y \in \mathbb{Z}[i]$, and that $N(d) > 0$ is minimal. Now consider some Gaussian combination

$$c = as + bt,$$

where $s, t \in \mathbb{Z}[i]$. We want to show that $d \mid c$. By Proposition 29.12,

$$c = dq + r$$

for some $q, r \in \mathbb{Z}[i]$, where $N(r) < N(d)$. Thus

$$r = c - dq = as + bt - (ax + by)q = a(s - xq) + b(t - yq).$$

We see that $r$ is a Gaussian combination of $a$ and $b$ such that $N(r) < N(d)$. Because $d$ is a nonzero Gaussian combination of $a$ and $b$ such that $N(d)$ is minimal, the only option is that $N(r) = 0$, i.e. $r = 0$. Hence $d \mid c$. In particular, $d \mid a$ and $d \mid b$, because $a = a \cdot 1 + b \cdot 0$ and $b = a \cdot 0 + b \cdot 1$ are Gaussian combinations of $a$ and $b$.

Definition 29.15. A Gaussian integer $d = ax + by$ such that $x, y$ are Gaussian integers, $d \mid a$ and $d \mid b$ is called a greatest common divisor of Gaussian integers $a$ and $b$.
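These notes obtain a greatest common divisor from a minimality argument rather than by computation. By analogy with the Extended Euclidean Algorithm of Section 5, one can also compute a greatest common divisor together with the coefficients $x, y$ by repeating the division of Proposition 29.12; the process stops because the norms of the remainders strictly decrease. The Python sketch below is our own illustration of this idea (all names are hypothetical), with $a + bi$ stored as the pair `(a, b)`. Its output $d$ is a greatest common divisor in the sense of Definition 29.15, determined only up to a unit.

```python
def norm(z):
    return z[0] * z[0] + z[1] * z[1]


def g_add(w, z):
    return (w[0] + z[0], w[1] + z[1])


def g_sub(w, z):
    return (w[0] - z[0], w[1] - z[1])


def g_mul(w, z):
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


def g_divmod(w, z):
    """Division with remainder as in Proposition 29.12 (q is the Gaussian integer nearest to w/z)."""
    n = norm(z)
    a, b = g_mul(w, (z[0], -z[1]))  # w * conj(z) = (w/z) * N(z)
    q = ((2 * a + n) // (2 * n), (2 * b + n) // (2 * n))
    return q, g_sub(w, g_mul(q, z))


def g_xgcd(a, b):
    """Return (d, x, y) with d = a*x + b*y dividing both a and b (a, b not both zero)."""
    # Invariant: r0 = a*x0 + b*y0 and r1 = a*x1 + b*y1.
    r0, x0, y0 = a, (1, 0), (0, 0)
    r1, x1, y1 = b, (0, 0), (1, 0)
    while r1 != (0, 0):  # N(r1) strictly decreases, so the loop terminates
        q, r = g_divmod(r0, r1)
        r0, r1 = r1, r
        x0, x1 = x1, g_sub(x0, g_mul(q, x1))
        y0, y1 = y1, g_sub(y0, g_mul(q, y1))
    return r0, x0, y0


d, x, y = g_xgcd((5, 0), (3, 1))
print(d, x, y)  # (-1, -2) (1, 0) (-2, 0)
print(g_add(g_mul((5, 0), x), g_mul((3, 1), y)) == d)  # True
```

Here $d = -1 - 2i = -i(2 - i)$ is an associate of $2 - i$, which indeed divides both $5 = (2 - i)(2 + i)$ and $3 + i = (1 + i)(2 - i)$.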

Exercise 29.16. Let $d_1$ and $d_2$ be greatest common divisors of Gaussian integers $a$ and $b$. Prove that $d_1 = ud_2$ for some unit $u$ in $\mathbb{Z}[i]$.

Finally, we prove the analogue of Euclid’s lemma for Gaussian integers.

Proposition 29.17.[43] If $p$ is a Gaussian prime and $p \mid zw$ for some Gaussian integers $z, w$, then $p \mid z$ or $p \mid w$.



Proof. Suppose that $p \nmid z$. We will show that $p \mid w$. Let $u$ be a greatest common divisor of $p$ and $z$, which exists by Proposition 29.14. Thus $u = pt + zs$ for some $t, s \in \mathbb{Z}[i]$, and $u \mid p$, $u \mid z$. Write $p = uk$ for some $k \in \mathbb{Z}[i]$. Since $p$ is a Gaussian prime, one of $u$ or $k$ is a unit in $\mathbb{Z}[i]$.

If $k$ is a unit, then $u = pk^{-1}$ with $k^{-1} \in \mathbb{Z}[i]$, and so $p \mid u$. Since $p \mid u$ and $u \mid z$, it must be that $p \mid z$, contrary to our assumption. Thus $u$ is a unit with inverse $u^{-1} \in \mathbb{Z}[i]$.

Now multiply $u = pt + zs$ by $wu^{-1}$:

$$w = ptwu^{-1} + zswu^{-1}.$$

Clearly, $p \mid ptwu^{-1}$, and since we are given that $p \mid zw$, also $p \mid zswu^{-1}$. Thus $p$ divides their sum, which is $w$.

Exercise 29.18. Use the results established above to prove the Fundamental Theorem of Arithmetic for Gaussian integers: any nonzero Gaussian integer that is not a unit can be written uniquely (up to reordering and multiplication by units) as a product of Gaussian primes.

Exercise 29.19. Compute the quotient and the remainder after division of $w$ by $z$, when $(w, z) = (6 + i, 2 - i), (27 - 5i, 3 - 7i), (4 + 7i, 8 - i)$.

Exercise 29.20. Let $\omega$ denote the primitive third root of unity. That is,

$$\omega = e^{2\pi i/3} = \frac{-1 + \sqrt{-3}}{2}.$$

Note that $\omega$ satisfies the equation $\omega^2 + \omega + 1 = 0$. The set

$$\mathbb{Z}[\omega] := \{a + b\omega : a, b \in \mathbb{Z}\}$$

is called the ring of Eisenstein integers. For any Eisenstein integer $\alpha = a + b\omega$, where $a, b \in \mathbb{Z}$, the norm map is defined by

$$N(a + b\omega) := a^2 - ab + b^2. \tag{8}$$

Just like the ring of Gaussian integers, the ring of Eisenstein integers is a Unique Factorization Domain. Geometrically, Eisenstein integers form a lattice on the complex plane (see Figure 4).

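1. Prove that $\mathbb{Z}[\omega]$ is a ring by showing that $0, 1 \in \mathbb{Z}[\omega]$, and for all $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $\alpha \pm \beta \in \mathbb{Z}[\omega]$ and $\alpha \cdot \beta \in \mathbb{Z}[\omega]$;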


2. Prove that the norm map defined in (8) is multiplicative. That is, for every $\alpha, \beta \in \mathbb{Z}[\omega]$ it is the case that $N(\alpha\beta) = N(\alpha)N(\beta)$. Explain why $N(\alpha) \ge 0$ for every $\alpha \in \mathbb{Z}[\omega]$ and why $N(\alpha) = 0$ if and only if $\alpha = 0$;

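3. We say that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\omega]$. Prove that $\upsilon \in \mathbb{Z}[\omega]$ is a unit if and only if $N(\upsilon) = 1$;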

4. Find all units in $\mathbb{Z}[\omega]$.

Figure 4: Eisenstein integers

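Exercise 29.21. Let

$$\mathbb{Z}[\sqrt{2}] := \{a + b\sqrt{2} : a, b \in \mathbb{Z}\}.$$

We say that $\upsilon \in \mathbb{Z}[\sqrt{2}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{2}]$. Prove that there are infinitely many units in $\mathbb{Z}[\sqrt{2}]$.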

Hint: Consider the Pell equation $x^2 - 2y^2 = \pm 1$. Explain why, for every $(x_1, y_1)$ satisfying this Diophantine equation, the value $x_1 + y_1\sqrt{2}$ is a unit in $\mathbb{Z}[\sqrt{2}]$. Find any solution $(x_1, y_1)$, and then prove that, for every positive integer $n$, the integer coefficients $x_n$ and $y_n$ of the number $x_n + y_n\sqrt{2} := (x_1 + y_1\sqrt{2})^n$ also satisfy the equation $x_n^2 - 2y_n^2 = \pm 1$.

Exercise 29.22. Consider the ring

$$\mathbb{Z}[\sqrt{-13}] = \{a + b\sqrt{-13} : a, b \in \mathbb{Z}\}.$$

For every $a, b \in \mathbb{Z}$, the norm map on $\mathbb{Z}[\sqrt{-13}]$ is defined by

$$N(a + b\sqrt{-13}) := a^2 + 13b^2.$$


You may assume that the norm is multiplicative. We will show that unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$. To solve this problem, you might want to refer to Section 2.3 in Frank Zorzitto, A Taste of Number Theory.

1. Prove that the only units of $\mathbb{Z}[\sqrt{-13}]$ are $\pm 1$.

Hint: Let $\upsilon = a + b\sqrt{-13}$ for $a, b \in \mathbb{Z}$. By definition, $\upsilon \in \mathbb{Z}[\sqrt{-13}]$ is a unit if $\upsilon \mid \alpha$ for every $\alpha \in \mathbb{Z}[\sqrt{-13}]$. Thus, in particular, $\upsilon \mid 1$. Explain why this fact implies the equality $a^2 + 13b^2 = 1$. What are the solutions to this Diophantine equation?

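2. We say that a non-zero number $\gamma \in \mathbb{Z}[\sqrt{-13}]$ is prime if the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\sqrt{-13}]$ implies that either $\alpha$ is a unit or $\beta$ is a unit. Prove that the numbers $2$, $7$, $1 + \sqrt{-13}$ and $1 - \sqrt{-13}$ are prime in $\mathbb{Z}[\sqrt{-13}]$;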

3. Using Part 2, explain why unique factorization fails in $\mathbb{Z}[\sqrt{-13}]$.

30 Fermat’s Theorem on Sums of Two Squares

We will now turn our attention to the Diophantine equation $n = a^2 + b^2$, where $n$ is a fixed positive integer and $a, b$ are integer variables. On December 25th 1640, Fermat announced the following theorem in a letter to Mersenne, which is why in some sources it is called Fermat’s Christmas Theorem. This theorem will allow us to explain which positive integers are representable as a sum of two squares, and how many solutions the equation $n = a^2 + b^2$ has.

Theorem 30.1. (Fermat’s Theorem on Sums of Two Squares)[44] If $p$ is an odd rational prime and $p \equiv 1 \pmod{4}$, then $p = a^2 + b^2$ for some rational integers $a$ and $b$.

Proof. (Richard Dedekind, circa 1894) Since $p \equiv 1 \pmod{4}$, it follows from Corollary 20.10 that $-1$ is a quadratic residue modulo $p$. Thus $-1 \equiv x^2 \pmod{p}$ for some rational integer $x$. Thus $p \mid x^2 + 1$ in $\mathbb{Z}$, and so $p \mid (x + i)(x - i)$ in $\mathbb{Z}[i]$.

Now note that $p \nmid x + i$, for if we assume that $x + i = p(c + di)$ for some Gaussian integer $c + di$, then by equating the imaginary parts we get $pd = 1$, which contradicts the fact that $p \nmid 1$. Likewise, $p \nmid x - i$.



Since $p$ divides a product without dividing either of the factors, Proposition 29.17 tells us that $p$ is not a Gaussian prime. Thus $p = uv$, where $u, v \in \mathbb{Z}[i]$ are not units. But then

$$p^2 = N(p) = N(uv) = N(u)N(v),$$

so $N(u) = 1$, $p$ or $p^2$, because $p$ is a rational prime. If $N(u) = 1$, then $u$ is a unit. If $N(u) = p^2$, then $N(v) = 1$, so $v$ is a unit. Hence $N(u) = N(v) = p$. But if we now write $u = a + bi$, then

$$p = N(u) = N(a + bi) = a^2 + b^2,$$

so p is a sum of two squares of rational integers.
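Dedekind's argument can be turned into an algorithm, although the notes do not do so. Once $x$ with $x^2 \equiv -1 \pmod{p}$ is known, one can check that a greatest common divisor $u$ of $p$ and $x + i$ in $\mathbb{Z}[i]$, computed by repeated division as in Proposition 29.12, satisfies $N(u) = p$. The Python sketch below is our own illustration under these assumptions (all helper names are hypothetical). To find $x$, it tries $c = 2, 3, \ldots$ and tests whether $x = c^{(p-1)/4} \bmod p$ satisfies $x^2 \equiv -1 \pmod{p}$, which happens exactly when $c$ is a quadratic non-residue modulo $p$.

```python
def g_mul(w, z):
    (a, b), (c, d) = w, z
    return (a * c - b * d, a * d + b * c)


def g_mod(w, z):
    """Remainder of w on division by z != 0, with the quotient nearest to w/z (Proposition 29.12)."""
    n = z[0] * z[0] + z[1] * z[1]
    a, b = g_mul(w, (z[0], -z[1]))  # w * conj(z) = (w/z) * N(z)
    q = ((2 * a + n) // (2 * n), (2 * b + n) // (2 * n))
    qz = g_mul(q, z)
    return (w[0] - qz[0], w[1] - qz[1])


def g_gcd(a, b):
    """A greatest common divisor of a and b in Z[i] (up to a unit), by repeated division."""
    while b != (0, 0):
        a, b = b, g_mod(a, b)
    return a


def sqrt_minus_one(p):
    """Some x with x^2 = -1 (mod p), for a prime p = 1 (mod 4)."""
    for c in range(2, p):
        x = pow(c, (p - 1) // 4, p)
        if x * x % p == p - 1:
            return x
    raise ValueError("p must be a prime congruent to 1 mod 4")


def two_squares(p):
    """Return (a, b) with 0 < a < b and a^2 + b^2 = p, for a prime p = 1 (mod 4)."""
    x = sqrt_minus_one(p)
    u = g_gcd((p, 0), (x, 1))  # a Gaussian divisor of p of norm p
    return tuple(sorted((abs(u[0]), abs(u[1]))))


print(two_squares(5), two_squares(13), two_squares(10009))  # (1, 2) (2, 3) (3, 100)
```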

Now we know that, when $p$ is an odd prime, the equation $p = x^2 + y^2$ has a solution in positive integers $x$ and $y$ if and only if $p \equiv 1 \pmod{4}$ (recall that a sum of two squares is never congruent to 3 modulo 4). Notice that it also has a solution when $p = 2$, because $2 = 1^2 + 1^2$. We would now like to generalize this result to all positive integers $n$. For this purpose, we need to prove the following lemma.

Lemma 30.2.[45] If $p$ in $\mathbb{Z}[i]$ is a Gaussian prime and $p^k \mid uv$ for some Gaussian integers $u$ and $v$ and exponent $k \ge 1$, then there are exponents $j, \ell \in \{0, 1, \ldots, k\}$ such that $p^j \mid u$, $p^\ell \mid v$ and $j + \ell = k$.

Proof. We will prove this statement using the principle of mathematical induction.

Base case. For $k = 1$, the result is equivalent to Euclid’s lemma for Gaussian integers, stated in Proposition 29.17.

Induction hypothesis. Suppose that the lemma is true for $k - 1$.

Induction step. Let $p^k \mid uv$. Then $p \mid uv$, so $p \mid u$ or $p \mid v$; by symmetry, we may suppose that $p \mid v$. Write $v = wp$ for some $w$ in $\mathbb{Z}[i]$. Then $p^k \mid uwp$, and cancelling the nonzero factor $p$ gives $p^{k-1} \mid uw$. According to the induction hypothesis, there exist integers $j$ and $m$, $0 \le j, m \le k - 1$, such that $p^j \mid u$, $p^m \mid w$, and $j + m = k - 1$. But then $p^{m+1} \mid wp = v$. If we now put $\ell = m + 1$, then $p^j \mid u$, $p^\ell \mid v$, and $j + \ell = k$, as claimed.

Proposition 30.3. Let $n$ be a positive integer. The Diophantine equation $n = x^2 + y^2$ has a solution in integers $x, y$ if and only if $n$ has the prime factorization

$$n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$

where $p_j \equiv 1 \pmod{4}$ for all $j = 1, 2, \ldots, k$ and $q_j \equiv 3 \pmod{4}$ for all $j = 1, 2, \ldots, \ell$.



Proof. Let $w = a+bi$ and $z = c+di$ be Gaussian integers. Since the norm map is multiplicative, it is the case that

$$(a^2+b^2)(c^2+d^2) = N(w)N(z) = N(wz) = N\big((ac-bd) + (ad+bc)i\big) = (ac-bd)^2 + (ad+bc)^2.$$

The identity above allows us to conclude that the product $mn$ of any two numbers $m = a^2+b^2$ and $n = c^2+d^2$ will be representable as a sum of two squares as well:

$$mn = (a^2+b^2)(c^2+d^2) = (ac-bd)^2 + (ad+bc)^2.$$

Since 2 is representable as a sum of two squares, as well as any odd prime $p \equiv 1 \pmod 4$, we conclude that every integer $n$ with the prime factorization

$$n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k},$$

where $p_j \equiv 1 \pmod 4$ for all $j = 1, 2, \dots, k$, is representable as a sum of two squares. We know that for every rational prime $q \equiv 3 \pmod 4$ the Diophantine equation $q^{2f+1} = a^2 + b^2$ has no solutions for any non-negative integer $f$, because $q^{2f+1} \equiv 3 \pmod 4$, while a sum of two squares is congruent to 0, 1 or 2 modulo 4. However, every even power of $q$ is representable as a sum of two squares, because $q^{2f} = (q^f)^2 + 0^2$ for every positive integer $f$. But then once again we can use the identity

$$(a^2+b^2)(c^2+d^2) = (ac-bd)^2 + (ad+bc)^2$$

to conclude that every integer $n$ with the prime factorization

$$n = 2^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$

where $p_j \equiv 1 \pmod 4$ for all $j = 1, 2, \dots, k$ and $q_j \equiv 3 \pmod 4$ for all $j = 1, 2, \dots, \ell$, is representable as a sum of two squares. We will now show that these are the only numbers representable as a sum of two squares.

To prove this fact, all that we have to do is to show that, whenever $n = x^2 + y^2$ and some prime $q \equiv 3 \pmod 4$ satisfies $n = q^k m$, where $m$ is an integer such that $q \nmid m$, then the exponent $k$ has to be even. We see that

$$q^k \mid x^2 + y^2 = (x+yi)(x-yi).$$


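Since every rational prime $q \equiv 3 \pmod 4$ is also a Gaussian prime, it follows from Lemma 30.2 (we may assume $k \ge 1$, since $k = 0$ is already even) that there exist integers $j$ and $\ell$, $0 \le j, \ell \le k$, such that $j + \ell = k$,

$$q^j \mid (x+yi) \quad\text{and}\quad q^\ell \mid (x-yi).$$

Suppose that $j \ge \ell$ (the case $\ell > j$ is handled in the same way, with $x - yi$ in place of $x + yi$). Then $x + yi = q^j(c+di)$ for some integers $c$ and $d$. Therefore

$$x + yi = q^j c + q^j d\, i,$$

which means that $x = q^j c$ and $y = q^j d$. But then

$$n = x^2 + y^2 = q^{2j}c^2 + q^{2j}d^2 = q^{2j}(c^2+d^2).$$

Since $j \ge \ell$, we see that $2j = j + j \ge j + \ell = k$, and since $q^k$ is the highest power of $q$ that divides $n$ and $q^{2j} \mid n$, we must conclude that $k = 2j$, which is an even number.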
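The criterion of Proposition 30.3 is easy to test numerically. The following Python sketch is only an illustration (the function names are our own, and trial division is suitable only for small $n$): it factors $n$, checks that every prime $q \equiv 3 \pmod 4$ occurs to an even power, and compares the answer with a direct search for $x$ and $y$. It also implements the product identity used in the proof.

```python
from math import isqrt

def factorize(n: int) -> dict[int, int]:
    """Prime factorization {prime: exponent} by trial division (small n only)."""
    factors: dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def criterion(n: int) -> bool:
    """Proposition 30.3: every prime q = 3 (mod 4) must divide n to an even power."""
    return all(e % 2 == 0 for q, e in factorize(n).items() if q % 4 == 3)

def brute_force(n: int) -> bool:
    """Search directly for integers x, y >= 0 with x^2 + y^2 = n."""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        if isqrt(rest) ** 2 == rest:
            return True
    return False

def compose(first: tuple[int, int], second: tuple[int, int]) -> tuple[int, int]:
    """(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2."""
    (a, b), (c, d) = first, second
    return (a * c - b * d, a * d + b * c)
```

For example, `compose((1, 2), (2, 3))` returns `(-4, 7)`, which exhibits $65 = 5 \cdot 13 = (-4)^2 + 7^2$.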

Now that we know for which positive integers $n$ the Diophantine equation $n = x^2 + y^2$ has a solution, two questions remain: how many solutions are there, and how does one compute them? Let $r_2(n)$ denote the number of integer solutions to $n = x^2 + y^2$, where $x, y \in \mathbb{Z}$ are allowed to be positive, negative or zero. As it turns out,

$$r_2(n) = 4\big(d_1(n) - d_3(n)\big),$$

where $d_1(n)$ and $d_3(n)$ denote the number of divisors of $n$ congruent to 1 and 3 modulo 4, respectively. This formula can also be rewritten as follows:

$$r_2(n) = 4 \sum_{\substack{d \mid n \\ d \equiv 1, 3 \pmod 4}} (-1)^{\frac{d-1}{2}}.$$

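From this formula it follows that for every prime $p \equiv 1 \pmod 4$ the Diophantine equation $p = x^2 + y^2$ has exactly $r_2(p) = 4 \cdot 2 = 8$ integer solutions (the divisors of $p$ are 1 and $p$, both congruent to 1 modulo 4): if $(x, y)$ is one of them, then all eight are $(\pm x, \pm y)$ and $(\pm y, \pm x)$. In particular, up to signs and order, the representation of $p$ as a sum of two squares is unique.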
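The formula for $r_2(n)$ can be checked against a brute-force count. The Python sketch below is an illustration only (the function names are ours): it counts all integer pairs $(x, y)$, with signs and order both counted, and compares the result with $4(d_1(n) - d_3(n))$.

```python
from math import isqrt

def r2_brute(n: int) -> int:
    """Count pairs (x, y) of integers with x^2 + y^2 = n; signs and order both count."""
    count = 0
    r = isqrt(n)
    for x in range(-r, r + 1):
        rest = n - x * x
        y = isqrt(rest)
        if y * y == rest:
            count += 1 if y == 0 else 2   # the pairs (x, y) and (x, -y)
    return count

def r2_formula(n: int) -> int:
    """r2(n) = 4 * (d1(n) - d3(n)), where d_i counts divisors of n that are i mod 4."""
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    d1 = sum(1 for d in divisors if d % 4 == 1)
    d3 = sum(1 for d in divisors if d % 4 == 3)
    return 4 * (d1 - d3)
```

For a prime such as $p = 13$ both functions return 8, matching the solutions $(\pm 3, \pm 2)$ and $(\pm 2, \pm 3)$.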

As for the computation of the actual solutions, when $p \equiv 1 \pmod 4$ is prime, the computation of $x$ and $y$ such that $p = x^2 + y^2$ basically reduces to finding a square root of $-1$ modulo $p$. This can be done in (randomized) polynomial time using the Tonelli-Shanks Algorithm. If $z$ is an integer such that $z^2 \equiv -1 \pmod p$, then one can use the Euclidean algorithm for Gaussian integers to compute $x + yi = \gcd(z+i, p)$. In order to find a solution to $n = x^2 + y^2$ for a composite integer $n$ one would have to factor $n$ first, and as we know, integer factorization is in general a difficult problem. In fact, as we saw in Assignment 3, the ability to represent a composite integer $n$ as a sum of two squares in two different ways yields a non-trivial factorization of $n$. Such a method of factorization is called the Euler Factorization Method. Leonhard Euler used this method to factor the integer $1000009 = 293 \cdot 3413$ by knowing the fact that

$$1000009 = 1000^2 + 3^2 = 972^2 + 235^2.$$
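Both computations described above can be sketched in Python. This is a hedged illustration, not necessarily the approach of Assignment 3, and all function names are hypothetical. To find $z$ with $z^2 \equiv -1 \pmod p$ we do not use Tonelli-Shanks but a simpler substitute: $z = c^{(p-1)/4} \bmod p$ for a quadratic non-residue $c$, which works because Euler's criterion gives $c^{(p-1)/2} \equiv -1 \pmod p$. Gaussian integers are pairs `(re, im)` and division rounds to the nearest Gaussian integer. The factoring step relies on the identity $(ad - bc)(ad + bc) = n(a^2 - c^2)$, valid whenever $n = a^2 + b^2 = c^2 + d^2$, so a gcd with $n$ may expose a factor; we simply report `None` when it does not.

```python
from math import gcd

def g_mul(a, b):
    """Product of Gaussian integers a = (a0, a1) and b = (b0, b1)."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def round_div(x: int, n: int) -> int:
    """Nearest integer to x / n (for n > 0), in exact integer arithmetic."""
    return (2 * x + n) // (2 * n)

def g_mod(a, b):
    """Remainder a - q*b, where q is a/b rounded to the nearest Gaussian integer."""
    norm_b = b[0] * b[0] + b[1] * b[1]
    num = g_mul(a, (b[0], -b[1]))                  # a * conj(b)
    q = (round_div(num[0], norm_b), round_div(num[1], norm_b))
    qb = g_mul(q, b)
    return (a[0] - qb[0], a[1] - qb[1])            # its norm is at most N(b)/2

def g_gcd(a, b):
    """Euclidean algorithm in Z[i]; the result is defined up to a unit."""
    while b != (0, 0):
        a, b = b, g_mod(a, b)
    return a

def two_squares_prime(p: int) -> tuple[int, int]:
    """For a prime p = 1 (mod 4), return (x, y) with x^2 + y^2 = p."""
    c = 2
    while pow(c, (p - 1) // 2, p) != p - 1:        # find a quadratic non-residue c
        c += 1
    z = pow(c, (p - 1) // 4, p)                    # z^2 = -1 (mod p)
    x, y = g_gcd((p, 0), (z, 1))                   # gcd(z + i, p) has norm p
    return abs(x), abs(y)

def euler_factor(n: int, a: int, b: int, c: int, d: int):
    """Try to split n using two representations n = a^2 + b^2 = c^2 + d^2."""
    for t in (a * d - b * c, a * d + b * c):
        g = gcd(n, t)
        if 1 < g < n:
            return g
    return None                                    # these representations did not help
```

With Euler's data, `euler_factor(1000009, 1000, 3, 972, 235)` returns 3413 (and $1000009 / 3413 = 293$), while `two_squares_prime(293)` recovers $293 = 2^2 + 17^2$ up to order.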

Exercise 30.4. Consider the setup as in Exercise 29.19. We say that $\gamma \neq 0$ is an Eisenstein prime if the factorization $\gamma = \alpha\beta$ for $\alpha, \beta \in \mathbb{Z}[\omega]$ implies that either $\alpha$ is a unit or $\beta$ is a unit.

1. Prove that every rational prime $p \equiv 2 \pmod 3$ is also an Eisenstein prime.

Hint: See Example 29.10.

2. Note that $3 = (1-\omega)(1-\omega^2)$, so 3 is not an Eisenstein prime. Also, it can be shown that every rational prime $p \equiv 1 \pmod 3$ is not an Eisenstein prime. Use this fact, as well as Parts (a) and (b), to show that every integer $n$ with the prime factorization

$$n = 3^t p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k} q_1^{2f_1} q_2^{2f_2} \cdots q_\ell^{2f_\ell},$$

where $p_i \equiv 1 \pmod 3$ for all $i = 1, 2, \dots, k$ and $q_j \equiv 2 \pmod 3$ for all $j = 1, 2, \dots, \ell$, admits a non-trivial solution $(x, y)$ to the Diophantine equation $n = x^2 - xy + y^2$.

31 Continued Fractions

Even though most real numbers are not rational, to simplify calculations we approximate them by rationals. However, some rationals are better than others, so which ones should we pick? For example, we can round the decimal expansion of the number $\pi = 3.1415926535\ldots$ to 9 decimal places and approximate $\pi$ by the rational number $3141592654/10^9$. However, after a careful investigation we discover that the rational number $103993/33102$ also approximates $\pi$ to 9 decimal digits while having a significantly smaller denominator. So we can ask the following question:


For a given real number $\alpha$ and a positive integer $Q$, which rational numbers $p/q$ with $1 \le q \le Q$ correspond to the minimal value of $|\alpha - p/q|$?

This question lies at the core of the subarea of Number Theory called Diophantine Approximation. As we will find out, the best possible rational approximations to a non-zero real number $\alpha$ form a sequence $\{p_n/q_n\}_{n=0}^{\infty}$ of convergents of the canonical continued fraction expansion of $\alpha$. Every canonical continued fraction is a special case of what is called a partial fraction, whose properties we will now investigate.

Definition 31.1. Let $a_0, a_1, \dots, a_N$ be real numbers such that $a_i > 0$ for all $i$ satisfying $1 \le i \le N$. Define the partial fraction $[a_0, a_1, \dots, a_N]$ by

$$[a_0, a_1, \dots, a_N] := a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_N}}}.$$

The numbers $a_0, a_1, \dots, a_N$ are called partial coefficients of $[a_0, a_1, \dots, a_N]$. If $n$ is an integer such that $0 \le n \le N$, the partial fraction $[a_0, a_1, \dots, a_n]$ is called the $n$-th convergent to $[a_0, a_1, \dots, a_N]$.

Note that in the definition of a partial fraction we let the $a_i$ be real numbers such that $a_i > 0$ for all $i$ satisfying $1 \le i \le N$. If we allow the $a_i$ to be negative or complex, then not every choice of the $a_i$ is admissible, as the examples $[1, 1, -1]$ and $[i, i, i]$ demonstrate (both lead to a division by zero). Soon we will introduce canonical continued fractions and restrict the domain of the $a_i$ from real numbers to integers.

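Example 31.2. Let us determine the value of $[\sqrt2, \sqrt2, \sqrt2]$. We have

$$[\sqrt2, \sqrt2, \sqrt2] = \sqrt2 + \cfrac{1}{\sqrt2 + \cfrac{1}{\sqrt2}} = \sqrt2 + \frac{\sqrt2}{3} = \frac{4\sqrt2}{3}.$$

Also, we see that

$$\frac{4\sqrt2}{3} = 1 + \cfrac{1}{1 + \cfrac{1}{\frac72 + 3\sqrt2}} = \left[1, 1, \tfrac72 + 3\sqrt2\right].$$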

Thus several continued fractions can correspond to the same number. Some continued fractions, like

$$\frac{4\sqrt2}{3} = [1, 1, 7, 1, 2, 1, 7, 1, 2, 1, \dots],$$


appear to be periodic, while some continued fractions, like

$$\sqrt[3]{3} = [1, 2, 3, 1, 4, 1, 5, 1, 1, 6, 2, 5, 8, \dots],$$

seem to be aperiodic. They can also be infinite. Certain numbers have quite elegant continued fraction expansions. For example,

$$\tan(1) = [1, 1, 1, 3, 1, 5, 1, 7, 1, 9, 1, 11, 1, 13, \dots].$$

Exercise 31.3. Compute $[1, 2, 3, 4, 5]$ and $[\sqrt5, 2\sqrt5, 3\sqrt5]$. Give an example of a continued fraction of $\sqrt[3]{2}$ with at least five terms.

Some elementary properties of continued fractions are

$$[a_0, a_1, \dots, a_n] = \left[a_0, a_1, \dots, a_{n-2}, a_{n-1} + \frac{1}{a_n}\right],$$

$$[a_0, a_1, \dots, a_n] = \big[a_0, [a_1, \dots, a_n]\big]$$

and, more generally,

$$[a_0, a_1, \dots, a_n] = \big[a_0, a_1, \dots, a_{m-1}, [a_m, \dots, a_n]\big].$$

Proposition 31.4. Let $a_0, a_1, \dots, a_N$ be real numbers such that $a_i > 0$ for all $i$ satisfying $1 \le i \le N$. For a non-negative integer $n \le N$, define the real numbers $p_n$ and $q_n$ by

$$\begin{aligned} p_0 &= a_0, & q_0 &= 1,\\ p_1 &= a_1 a_0 + 1, & q_1 &= a_1,\\ p_n &= a_n p_{n-1} + p_{n-2}, & q_n &= a_n q_{n-1} + q_{n-2} \quad (2 \le n \le N). \end{aligned}$$

Then $[a_0, a_1, \dots, a_n] = p_n/q_n$.

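Proof. We will prove this statement using the principle of mathematical induction on $n$.

Base case. Clearly, we have

$$[a_0] = a_0 = \frac{a_0}{1} = \frac{p_0}{q_0}, \qquad [a_0, a_1] = \frac{a_0 a_1 + 1}{a_1} = \frac{p_1}{q_1},$$

so the result holds for $n = 0, 1$.

Induction hypothesis. Suppose that, for some $m$ with $1 \le m < N$, the statement holds for $n = m$ and for every admissible choice of partial coefficients. (When $m = 1$ we use the conventions $p_{-1} = 1$ and $q_{-1} = 0$, which agree with $p_1 = a_1 p_0 + p_{-1}$ and $q_1 = a_1 q_0 + q_{-1}$.)

Induction step. We will show that the result holds for $n = m+1$. We apply the induction hypothesis to the $m+1$ partial coefficients $a_0, \dots, a_{m-1}, a_m + 1/a_{m+1}$; note that $p_{m-1}, p_{m-2}, q_{m-1}, q_{m-2}$ depend only on $a_0, \dots, a_{m-1}$. We have

$$\begin{aligned}
[a_0, a_1, \dots, a_{m+1}] &= \left[a_0, a_1, \dots, a_{m-1}, a_m + \frac{1}{a_{m+1}}\right]\\
&= \frac{\left(a_m + \frac{1}{a_{m+1}}\right) p_{m-1} + p_{m-2}}{\left(a_m + \frac{1}{a_{m+1}}\right) q_{m-1} + q_{m-2}}\\
&= \frac{a_{m+1}(a_m p_{m-1} + p_{m-2}) + p_{m-1}}{a_{m+1}(a_m q_{m-1} + q_{m-2}) + q_{m-1}}\\
&= \frac{a_{m+1} p_m + p_{m-1}}{a_{m+1} q_m + q_{m-1}}\\
&= \frac{p_{m+1}}{q_{m+1}}.
\end{aligned}$$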
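The recurrence of Proposition 31.4 translates directly into code. The Python sketch below is an illustration with hypothetical names, restricted to integer or rational partial coefficients so that `fractions.Fraction` keeps everything exact. It computes the pairs $(p_n, q_n)$ and, independently, evaluates $[a_0, \dots, a_n]$ from the innermost term outwards, so the two can be compared. Seeding the loop with $p_{-1} = 1$, $q_{-1} = 0$ (and $p_{-2} = 0$, $q_{-2} = 1$) is a standard convention that reproduces $p_0 = a_0$, $q_0 = 1$ and $p_1 = a_1 a_0 + 1$, $q_1 = a_1$.

```python
from fractions import Fraction

def convergents(coeffs):
    """Return [(p_n, q_n)] via p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2}."""
    p_prev2, p_prev1 = 0, 1      # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0      # q_{-2}, q_{-1}
    result = []
    for a in coeffs:
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        result.append((p, q))
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q
    return result

def evaluate(coeffs):
    """Evaluate [a_0, ..., a_N] directly, starting from the innermost term."""
    value = Fraction(coeffs[-1])
    for a in reversed(coeffs[:-1]):
        value = a + 1 / value
    return value
```

Applied to the partial coefficients $[2, 1, 1, 1, 4, 1, 1, 1, 4, 1]$ of $\sqrt7$ from Example 31.8, `convergents` returns $(2, 1), (3, 1), (5, 2), (8, 3), (37, 14), \dots, (717, 271)$, and one can confirm $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$ from Proposition 31.5 on these values.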

Proposition 31.5. For any positive integer $n$, it is the case that

$$p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$$

or, equivalently,

$$\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}.$$

Proof. See Assignment 6.

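Proposition 31.6. For any integer $n \ge 2$, it is the case that

$$p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$$

or, equivalently,

$$\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{(-1)^n a_n}{q_n q_{n-2}}.$$

Proof. The result follows from Proposition 31.5:

$$\begin{aligned}
\frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} &= \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}} - \frac{p_{n-2}}{q_{n-2}}\\
&= \frac{q_{n-2}(a_n p_{n-1} + p_{n-2}) - p_{n-2}(a_n q_{n-1} + q_{n-2})}{q_{n-2}(a_n q_{n-1} + q_{n-2})}\\
&= \frac{a_n(p_{n-1} q_{n-2} - p_{n-2} q_{n-1})}{q_n q_{n-2}}\\
&= \frac{(-1)^n a_n}{q_n q_{n-2}}.
\end{aligned}$$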


Proposition 31.7. Let $a_0, a_1, \dots, a_N$ be real numbers such that $a_i > 0$ for all $i$ satisfying $1 \le i \le N$. Let $x_n = p_n/q_n$. Then the following hold:

1. It is the case that $$x_0 < x_2 < x_4 < \cdots \quad\text{and}\quad x_1 > x_3 > x_5 > \cdots.$$

2. Every odd convergent is greater than any even convergent. That is, $$x_{2k+1} > x_{2\ell}$$ for any $k$ and $\ell$;

3. The $N$-th convergent $x_N$ is greater than any other even convergent and less than any other odd convergent.

Proof. Let us prove property 1. Note first that $q_n > 0$ for all $n$, since $a_i > 0$ for $i \ge 1$. If $n \ge 2$ is even, then it follows from Proposition 31.6 that

$$x_n - x_{n-2} = \frac{p_n}{q_n} - \frac{p_{n-2}}{q_{n-2}} = \frac{a_n}{q_n q_{n-2}} > 0.$$

Therefore

$$x_{n-2} = \frac{p_{n-2}}{q_{n-2}} < \frac{p_n}{q_n} = x_n$$

for all even $n$. Analogously, one can show that $x_{n-2} > x_n$ for all odd $n$.

To establish property 2, recall that by Proposition 31.5 we have $x_{2k+1} > x_{2k}$ for all $k \ge 0$. If $\ell \le k$, then $x_{2k} \ge x_{2\ell}$, so $x_{2k+1} > x_{2\ell}$. If $\ell > k$, then $x_{2\ell} < x_{2\ell+1}$; since $x_{2k+1} > x_{2\ell+1}$, it follows that $x_{2k+1} > x_{2\ell}$.

Finally, to see that property 3 holds, we note that if $N$ is even then by property 1 we have $x_0 < x_2 < \cdots < x_N$. Thus $x_N$ is greater than any other even convergent. On the other hand, by property 2, every even convergent, including $x_N$, is less than every odd convergent. The result then follows for all even $N$, and similarly one can also argue that it is true when $N$ is odd.

Example 31.8. Let us see an example of the phenomenon described in Proposition 31.7. Consider the following continued fraction expansion of $\sqrt7$:

$$\sqrt7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, \dots] = 2.64575\ldots.$$


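The first 10 convergents of $\sqrt7$ are

$$2,\ 3,\ \frac52,\ \frac83,\ \frac{37}{14},\ \frac{45}{17},\ \frac{82}{31},\ \frac{127}{48},\ \frac{590}{223},\ \frac{717}{271}.$$

We see that

$$2 < \frac52 < \frac{37}{14} < \frac{82}{31} < \frac{590}{223} < \cdots < \sqrt7 < \cdots < \frac{717}{271} < \frac{127}{48} < \frac{45}{17} < \frac83 < 3.$$

The $n$-th convergents to the left of $\sqrt7$ correspond to even $n$, while the $n$-th convergents to the right of $\sqrt7$ correspond to odd $n$.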

Now let $\alpha$ be a real number. We construct a canonical continued fraction expansion of $\alpha$ as follows:

Step 1. Define $a_0 := \lfloor \alpha \rfloor$. If $\alpha = a_0$ then $\alpha = [a_0]$. Otherwise let $\alpha = a_0 + 1/\alpha_1$ for some $\alpha_1$; since $0 < \alpha - a_0 < 1$, we have $\alpha_1 > 1$.

Step 2. Let $a_1 = \lfloor \alpha_1 \rfloor$. If $\alpha_1 = a_1$ then $\alpha = a_0 + 1/a_1 = [a_0, a_1]$. Otherwise let $\alpha_1 = a_1 + 1/\alpha_2$ for some $\alpha_2 > 1$.

We repeat this procedure. If it stops after a finite number of steps then $\alpha = [a_0, \dots, a_N]$. Otherwise $\alpha = [a_0, a_1, \dots]$ has an infinite canonical continued fraction expansion.
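For a rational input the procedure above can be carried out exactly. The Python sketch below is our illustration (the name `canonical_cf` and the safety cap `max_terms` are assumptions, not notation from these notes): it repeats Steps 1, 2, ... using `fractions.Fraction`, so no rounding error creeps in.

```python
from fractions import Fraction
from math import floor

def canonical_cf(alpha: Fraction, max_terms: int = 50) -> list[int]:
    """Run Steps 1, 2, ... of the canonical expansion on an exact rational alpha."""
    terms = []
    while len(terms) < max_terms:
        a = floor(alpha)              # a_n = floor(alpha_n)
        terms.append(a)
        if alpha == a:                # alpha_n is an integer: the expansion stops
            break
        alpha = 1 / (alpha - a)       # alpha_n = a_n + 1 / alpha_{n+1}
    return terms
```

For instance, `canonical_cf(Fraction(103993, 33102))` returns `[3, 7, 15, 1, 292]`, and `canonical_cf(Fraction(355, 113))` returns `[3, 7, 16]`, which is the canonical form of $[3, 7, 15, 1]$.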

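Example 31.9. Let us determine the first five terms in the canonical continued fraction expansion of $\pi = 3.14159\ldots$, as well as the first five convergents of $\pi$.

Step 1. Define $a_0 := \lfloor \pi \rfloor = 3$. Then

$$\pi = [3, \alpha_1] = 3 + \frac{1}{\alpha_1},$$

where $\alpha_1 = 1/(\pi - 3) = 7.06251\ldots$.

Step 2. Define $a_1 := \lfloor \alpha_1 \rfloor = 7$. Then

$$\pi = [3, 7, \alpha_2] = 3 + \cfrac{1}{7 + \cfrac{1}{\alpha_2}},$$

where $\alpha_2 = 1/(\alpha_1 - 7) = 15.99659\ldots$. We see that the first convergent to $\pi$ is

$$\frac{p_1}{q_1} = a_0 + \frac{1}{a_1} = 3 + \frac17 = \frac{22}{7}.$$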


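Step 3. Define $a_2 := \lfloor \alpha_2 \rfloor = 15$. Then

$$\pi = [3, 7, 15, \alpha_3] = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{\alpha_3}}},$$

where $\alpha_3 = 1/(\alpha_2 - 15) = 1.00342\ldots$. We see that the second convergent to $\pi$ is

$$\frac{p_2}{q_2} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}} = 3 + \cfrac{1}{7 + \cfrac{1}{15}} = \frac{333}{106}.$$

Proceeding in the same fashion, we see that

$$\pi = [3, 7, 15, 1, 292, 1, \dots],$$

and the first five convergents of $\pi$, starting from $p_1/q_1$, are

$$\frac{22}{7},\ \frac{333}{106},\ \frac{355}{113},\ \frac{103993}{33102},\ \frac{104348}{33215}.$$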
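The decimal values of $\alpha_1, \alpha_2, \alpha_3$ above hint at a practical issue: if we start from a floating-point value of $\pi$, rounding errors eventually corrupt the terms. A common workaround, sketched below in Python as an illustration (the function names and the 30-digit bounds are our choices), is to expand a rational lower bound and a rational upper bound for $\pi$ exactly and keep only the leading terms on which the two expansions agree. Since the real numbers sharing a given set of leading partial coefficients form an interval, every number between the bounds, in particular $\pi$, has these leading terms.

```python
from fractions import Fraction
from math import floor

def cf_terms(x: Fraction) -> list[int]:
    """Canonical continued fraction of a rational number, computed exactly."""
    terms = []
    while True:
        a = floor(x)
        terms.append(a)
        if x == a:
            return terms
        x = 1 / (x - a)

def certified_cf(lower: Fraction, upper: Fraction) -> list[int]:
    """Leading partial coefficients shared by every real number in [lower, upper]."""
    common = []
    for a, b in zip(cf_terms(lower), cf_terms(upper)):
        if a != b:
            break
        common.append(a)
    return common

# pi lies strictly between these two rationals (30 decimal places).
PI_LOW = Fraction("3.141592653589793238462643383279")
PI_HIGH = PI_LOW + Fraction(1, 10**30)
```

Here `certified_cf(PI_LOW, PI_HIGH)` begins with `[3, 7, 15, 1, 292, 1, ...]`, in agreement with the expansion above; feeding these terms into the recurrence of Proposition 31.4 reproduces the convergents listed in the example.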

Exercise 31.10. Determine the first five terms in the canonical continued fraction expansion of Euler's number $e = 2.71828\ldots$, as well as the first five convergents of $e$.

Exercise 31.11. Prove that $\alpha$ has a finite canonical continued fraction expansion if and only if $\alpha$ is a rational number.

Proposition 31.12. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then

$$|q_1\alpha - p_1| > |q_2\alpha - p_2| > |q_3\alpha - p_3| > \cdots.$$

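Proof. Let $\alpha = [a_0, a_1, \dots, a_n, \alpha_{n+1}]$. Then

$$\alpha = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}}.$$

It follows from Proposition 31.5 that

$$|q_n\alpha - p_n| = \left| q_n \left( \frac{p_n \alpha_{n+1} + p_{n-1}}{q_n \alpha_{n+1} + q_{n-1}} \right) - p_n \right| = \frac{|q_n p_{n-1} - p_n q_{n-1}|}{|q_n \alpha_{n+1} + q_{n-1}|} = \frac{1}{q_n \alpha_{n+1} + q_{n-1}}.$$

Now note that, since $\alpha_{n+1} > 1$ and $\alpha_n < a_n + 1$,

$$\begin{aligned} q_n\alpha_{n+1} + q_{n-1} &\ge q_n + q_{n-1}\\ &= a_n q_{n-1} + q_{n-2} + q_{n-1}\\ &= q_{n-1}(a_n + 1) + q_{n-2}\\ &> q_{n-1}\alpha_n + q_{n-2}. \end{aligned}$$

The observation made above allows us to conclude that

$$|q_n\alpha - p_n| = \frac{1}{q_n\alpha_{n+1} + q_{n-1}} < \frac{1}{q_{n-1}\alpha_n + q_{n-2}} = |q_{n-1}\alpha - p_{n-1}|.$$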

Proposition 31.13. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then

$$\frac{1}{(a_{n+1} + 2) q_n^2} < \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}.$$

Proof. Let $\alpha = [a_0, a_1, \dots, a_n, \alpha_{n+1}]$ for some $\alpha_{n+1}$ such that $a_{n+1} \le \alpha_{n+1} < a_{n+1} + 1$. Also, let $p_n/q_n = [a_0, a_1, \dots, a_n]$ be the $n$-th convergent to $\alpha$. Then it follows from the formula

$$\alpha = \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}},$$

as well as from Proposition 31.5, that

$$\left| \alpha - \frac{p_n}{q_n} \right| = \frac{1}{q_n(\alpha_{n+1} q_n + q_{n-1})}.$$

Since $q_n > q_{n-1}$, we can deduce the desired result by establishing the following inequalities:

$$a_{n+1} q_n < \alpha_{n+1} q_n + q_{n-1} < (a_{n+1} + 1) q_n + q_n = (a_{n+1} + 2) q_n.$$

Proposition 31.14. Let $\alpha$ be a real number and let $p_n/q_n$ be the $n$-th convergent in the canonical continued fraction expansion of $\alpha$. Then for all integers $p$ and $q$ such that $0 < q < q_{n+1}$ it is the case that $|q\alpha - p| \ge |q_n\alpha - p_n|$.


Proof. Note that if $p/q = p_n/q_n$, then $(p, q) = c(p_n, q_n)$ for some positive integer $c$, because $\gcd(p_n, q_n) = 1$ by Proposition 31.5, and the result holds since $|q\alpha - p| = c\,|q_n\alpha - p_n|$. Thus we may assume that $p/q \neq p_n/q_n$. Recall from Proposition 31.5 that

$$p_n q_{n+1} - q_n p_{n+1} = (-1)^{n+1}.$$

Then the matrix

$$A = \begin{pmatrix} p_n & p_{n+1} \\ q_n & q_{n+1} \end{pmatrix}$$

has a non-zero determinant $\det A = (-1)^{n+1}$, which means that it is invertible. Furthermore, the inverse matrix is

$$A^{-1} = \frac{1}{\det A} \begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix} = (-1)^{n+1} \begin{pmatrix} q_{n+1} & -p_{n+1} \\ -q_n & p_n \end{pmatrix}.$$

As we can see, the matrix $A^{-1}$ has integer coefficients. This means that the matrix equation

$$\begin{pmatrix} p \\ q \end{pmatrix} = A \begin{pmatrix} u \\ v \end{pmatrix}$$

can be solved in integers $u$ and $v$, and the solution is

$$u = (-1)^{n+1}(q_{n+1} p - p_{n+1} q), \qquad v = (-1)^{n+1}(p_n q - q_n p).$$

Note that $v \neq 0$, for otherwise $(p, q) = u(p_n, q_n)$ and hence $p/q = p_n/q_n$, which we have excluded. Also $u \neq 0$, for otherwise $q = v q_{n+1}$ with $v$ a non-zero integer, so $q \ge q_{n+1}$ (as $q > 0$), contradicting the hypothesis $q < q_{n+1}$.

Now consider the expressions

$$p = u p_n + v p_{n+1}, \qquad q = u q_n + v q_{n+1}.$$

Note that

$$q = u q_n + v q_{n+1} < q_{n+1}.$$

We claim that $u$ and $v$ have opposite signs. If we assume that both $u$ and $v$ are negative then $q$ would be negative, which contradicts the assumption $q > 0$. On the other hand, if we assume that both $u$ and $v$ are positive, then $q$ would have to exceed $q_{n+1}$. This would lead us to a contradiction to the inequality established above. Since neither $u$ nor $v$ can be zero, we see that our claim holds; that is, the numbers $u$ and $v$ have different signs.


Next, recall that according to property 3 of Proposition 31.7 either

$$\frac{p_n}{q_n} < \alpha < \frac{p_{n+1}}{q_{n+1}} \quad\text{or}\quad \frac{p_{n+1}}{q_{n+1}} < \alpha < \frac{p_n}{q_n}$$

must hold, depending on whether $n$ is even or odd. In any case, it must be that $\alpha q_n - p_n$ and $\alpha q_{n+1} - p_{n+1}$ have different signs. Since $u, v$ have different signs and $\alpha q_n - p_n, \alpha q_{n+1} - p_{n+1}$ have different signs, the signs of $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ match. Hence

$$\begin{aligned} |q\alpha - p| &= |\alpha(u q_n + v q_{n+1}) - (u p_n + v p_{n+1})|\\ &= |u(q_n\alpha - p_n) + v(q_{n+1}\alpha - p_{n+1})|\\ &= |u(q_n\alpha - p_n)| + |v(q_{n+1}\alpha - p_{n+1})|\\ &\ge |u|\,|q_n\alpha - p_n| \ge |q_n\alpha - p_n|. \end{aligned}$$

The fact that $u(q_n\alpha - p_n)$ and $v(q_{n+1}\alpha - p_{n+1})$ have the same sign was used to establish the third equality. In turn, the last inequality follows from the fact that $u$ is a non-zero integer.
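Proposition 31.14 can be checked by brute force for small denominators. In the Python sketch below (illustrative only; the 35-digit rational stand-in `PI` for $\pi$ and the function names are our assumptions), we compute for each denominator $q$ the smallest possible value of $|q\alpha - p|$, attained when $p$ is the integer nearest to $q\alpha$, and confirm that no $q < q_{n+1}$ does better than the convergent $p_n/q_n$.

```python
from fractions import Fraction

# A rational stand-in for pi, within 10^(-35) of the true value.
PI = Fraction("3.14159265358979323846264338327950288")

def best_residual(alpha: Fraction, q: int) -> Fraction:
    """min over integers p of |q*alpha - p|, attained at the integer nearest q*alpha."""
    t = q * alpha
    return abs(t - round(t))

def no_better_denominator(alpha: Fraction, p_n: int, q_n: int, q_next: int) -> bool:
    """Check |q*alpha - p| >= |q_n*alpha - p_n| for all p and all 0 < q < q_next."""
    target = abs(q_n * alpha - p_n)
    return all(best_residual(alpha, q) >= target for q in range(1, q_next))
```

For the convergents $22/7$, $333/106$ and $355/113$ of $\pi$, with next denominators $106$, $113$ and $33102$ respectively, the check succeeds, in line with the proposition.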

Corollary 31.15. Let $p/q$ be a rational number and let $\alpha$ be a real number. Then the inequality

$$\left| \alpha - \frac{p}{q} \right| < \frac{1}{2q^2}$$

implies that $p/q = p_n/q_n$ for some non-negative integer $n$. That is, the number $p/q$ appears as a convergent in the canonical continued fraction expansion of $\alpha$.

Proof. See Assignment 6.

We conclude this section by discussing the question of periodicity of canonicalcontinued fraction expansions.

Definition 31.16. Let $\alpha$ be a real number with the canonical continued fraction expansion

$$\alpha = [a_0, a_1, \dots, a_n; b_1, b_2, \dots, b_k, b_1, b_2, \dots, b_k, b_1, \dots].$$

In other words, at some point the elements of the continued fraction expansion start to repeat. We indicate this by writing

$$\alpha = [a_0, a_1, \dots, a_n; \overline{b_1, b_2, \dots, b_k}].$$


A canonical continued fraction expansion of such kind is called preperiodic, and if the terms $a_0, a_1, a_2, \dots, a_n$ are missing we say that it is periodic. The smallest number $k$ such that the terms repeat is called the period of a continued fraction.

It was proved by Joseph-Louis Lagrange that a real number $\alpha$ has a preperiodic canonical continued fraction expansion if and only if it is a quadratic irrational. That is, $\alpha = a + b\sqrt d$ for some rational numbers $a$ and $b \neq 0$ and some positive integer $d$ that is not a perfect square.

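Example 31.17. Let us determine the canonical continued fraction expansion of $\sqrt7$. By computing the first few terms, we see that

$$\sqrt7 = [2, 1, 1, 1, 4, 1, 1, 1, 4, 1, \dots].$$

Thus we can guess that $\sqrt7 = [2, \overline{1, 1, 1, 4}]$. Let us prove this fact. Let $\theta = [\overline{1, 1, 1, 4}]$. Then

$$\theta = [\overline{1, 1, 1, 4}] = [1, 1, 1, 4, \theta] = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{4 + \cfrac{1}{\theta}}}} = \frac{14\theta + 3}{9\theta + 2}.$$

We see that $\theta$ satisfies the equation

$$3\theta^2 - 4\theta - 1 = 0.$$

The above equation has two roots, but since $\theta > 0$ we can conclude that

$$\theta = \frac{2 + \sqrt7}{3}.$$

Then

$$[2, \overline{1, 1, 1, 4}] = [2, \theta] = 2 + \frac{1}{\theta} = 2 + \frac{3}{2 + \sqrt7} = \frac{7 + 2\sqrt7}{2 + \sqrt7} = \sqrt7,$$

as claimed.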
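For $\alpha = \sqrt d$ the steps of the canonical expansion can be carried out with integers only. The Python sketch below uses a standard technique that these notes do not derive, so treat it as an assumption to be checked: each complete quotient is kept in the form $\alpha_k = (\sqrt d + m_k)/s_k$ with integers $m_k, s_k$, and the period of $\sqrt d$ is known to end at the first term equal to $2a_0$. The name `sqrt_cf` is hypothetical.

```python
from math import isqrt

def sqrt_cf(d: int) -> tuple[int, list[int]]:
    """Return (a_0, period) such that sqrt(d) = [a_0, period, period, ...]."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0        # alpha_0 = (sqrt(d) + 0) / 1
    period = []
    while a != 2 * a0:        # the period ends with the term 2 * a_0
        m = s * a - m         # alpha_{k+1} = (sqrt(d) + m) / s
        s = (d - m * m) // s  # this division is always exact
        a = (a0 + m) // s     # a_{k+1} = floor(alpha_{k+1})
        period.append(a)
    return a0, period
```

For example, `sqrt_cf(7)` returns `(2, [1, 1, 1, 4])`, in agreement with Example 31.17, and `sqrt_cf(23)` returns `(4, [1, 3, 1, 8])`.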

Exercise 31.18. Determine canonical continued fraction expansions for $\frac{1+\sqrt5}{2}$ and $\sqrt2$. Are they both preperiodic? Are they both periodic? What are the periods of their continued fraction expansions?

Exercise 31.19. Prove that if a real number $\alpha$ has a preperiodic canonical continued fraction expansion, then there exist rational integers $a, b$ and $c$, not all zero, such that

$$a\alpha^2 + b\alpha + c = 0.$$


32 The Pell's Equation

For more details on the subject, we refer the reader to the monograph of M. J. Jacobson, Jr. and H. C. Williams, Solving the Pell Equation, 2009.

In 1773, Gotthold Ephraim Lessing was appointed librarian of the Herzog August Library in Wolfenbüttel, Germany. In this library, he discovered an ancient Greek manuscript with a poem of 44 lines, which contained an interesting arithmetical problem. This problem is attributed to Archimedes and is called the Archimedes' Cattle Problem. The problem was to calculate the number of cattle in the herd of Helios, the god of the sun. There were two parts to this problem, the first of which could be solved relatively easily by setting up a system of seven equations with eight unknowns, one for each type of bulls and cows present in the herd. Much more challenging was the second part of the problem, which, in its essence, asked the reader to calculate a solution to the equation

$$x^2 - 4729494y^2 = 1.$$

Despite its innocent look, the smallest solution of this equation that yields a valid herd has more than 100000 digits. In 1880, A. Amthor discovered that the smallest herd that could satisfy both parts of this problem had approximately $7.76 \times 10^{206544}$ bulls. In comparison, it is conjectured that there are between $10^{78}$ and $10^{82}$ atoms in the known, observable universe.$^{46}$ Of course, Amthor himself did not calculate this number precisely. In 1965, the precise answer to the Archimedes' Cattle Problem was given by Hugh Williams, Gus German and Robert Zanke, who were University of Waterloo students at that time. To calculate the answer, they used a combination of the IBM 7040 and IBM 1620 computers. You can find a fascinating article about the history of computing at the University of Waterloo here: https://cs.uwaterloo.ca/40th/Chronology/printable.shtml.

$^{46}$According to http://www.universetoday.com/36302/atoms-in-the-universe/.

An equation of the form

$$x^2 - dy^2 = \pm 1,$$

where $d$ is a positive integer that is not a perfect square, is called a Pell's equation. The name is due to Euler, who attributed the method of solving this equation to John Pell. It is widely believed that Euler actually made a mistake and confused John Pell with William Brouncker. The English mathematician William Brouncker discovered a general method for solving the Pell's equation, which was based on continued fractions. He was able to apply it to the equation

$$x^2 - 313y^2 = 1$$

and find the smallest positive solution

$$x = 32188120829134849, \qquad y = 1819380158564160.$$

When writing to Frenicle de Bessy, who proposed this problem to him, Brouncker claimed that it only took him “an hour or two” to find the solution. In 1768, Joseph-Louis Lagrange managed to prove that the equation $x^2 - dy^2 = 1$ has a solution different from $(\pm1, 0)$ for every positive integer $d$ that is not a perfect square.

We will now apply Corollary 31.15 to show that every positive solution to Pell's equation

$$x^2 - dy^2 = \pm 1$$

must arise as a convergent of $\sqrt d$.

Theorem 32.1. Let $d$ be a positive integer that is not a perfect square. Then every solution $(x, y) \neq (\pm1, 0)$ to Pell's equation

$$x^2 - dy^2 = \pm 1$$

must satisfy $|x|/|y| = p_n/q_n$ for some non-negative integer $n$, where $p_n/q_n$ is the $n$-th convergent of $\sqrt d$.

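Proof. Suppose that $(x, y) \neq (\pm1, 0)$ is a solution. Without loss of generality, we may assume that $x$ and $y$ are positive. Then

$$x \ge \sqrt{dy^2 - 1} \ge y\sqrt{d - 1},$$

where the last inequality uses $y \ge 1$. Therefore

$$|x - \sqrt d\, y| = \frac{|x^2 - dy^2|}{x + \sqrt d\, y} = \frac{1}{x + \sqrt d\, y} \le \frac{1}{\sqrt d\, y\left(1 + \sqrt{1 - \frac1d}\right)} < \frac{1}{2y},$$

since $\sqrt d + \sqrt{d-1} > 2$ for $d \ge 2$. Thus

$$\left| \sqrt d - \frac{x}{y} \right| < \frac{1}{2y^2}.$$

It follows from Corollary 31.15 that $x/y$ is a convergent of $\sqrt d$.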
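Theorem 32.1 suggests a simple search for the smallest positive solution of $x^2 - dy^2 = 1$: walk through the convergents $p_n/q_n$ of $\sqrt d$ and stop at the first one with $p_n^2 - dq_n^2 = 1$. Because every positive solution is a convergent and the denominators $q_n$ do not decrease, the first hit should be the smallest positive solution. The Python sketch below does this with exact integer arithmetic; the partial coefficients are generated by the integer recurrence $m_{k+1} = s_k a_k - m_k$, $s_{k+1} = (d - m_{k+1}^2)/s_k$, $a_{k+1} = \lfloor (a_0 + m_{k+1})/s_{k+1} \rfloor$, a standard technique that we assume here rather than derive. The function name is hypothetical.

```python
from math import isqrt

def pell_fundamental(d: int) -> tuple[int, int]:
    """Smallest positive solution of x^2 - d*y^2 = 1, found among convergents of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, s, a = 0, 1, a0                 # current complete quotient (sqrt(d) + m) / s
    p_prev, p = 1, a0                  # p_{-1}, p_0
    q_prev, q = 0, 1                   # q_{-1}, q_0
    while p * p - d * q * q != 1:
        m = s * a - m                  # next complete quotient
        s = (d - m * m) // s
        a = (a0 + m) // s              # next partial coefficient
        p_prev, p = p, a * p + p_prev  # p_n = a_n p_{n-1} + p_{n-2}
        q_prev, q = q, a * q + q_prev  # q_n = a_n q_{n-1} + q_{n-2}
    return p, q
```

For example, `pell_fundamental(7)` returns `(8, 3)`, and `pell_fundamental(313)` reproduces Brouncker's solution $x = 32188120829134849$, $y = 1819380158564160$ almost instantly.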


33 Algebraic and Transcendental Numbers. Liouville's Approximation Theorem

In 1840, the French mathematician Joseph Marie Liouville proved the so-called Approximation Theorem, which allowed him to discover the first transcendental number $\sum_{k=0}^{\infty} 10^{-k!}$. This number is called the Liouville Number. You are asked to reproduce Liouville's proof for a different number in Exercise 33.7.

Definition 33.1. A complex number $\alpha$ is called algebraic if there exists a non-zero polynomial $f(t)$ with rational coefficients such that $f(\alpha) = 0$. Otherwise, it is called transcendental.

Definition 33.2. Let $\alpha$ be an algebraic number. Let

$$f(t) = c_d t^d + c_{d-1} t^{d-1} + \dots + c_1 t + c_0$$

be a polynomial such that

a) $f(\alpha) = 0$;

b) $c_0, c_1, \dots, c_d \in \mathbb{Z}$;

c) $c_d > 0$;

d) $\gcd(c_0, c_1, \dots, c_d) = 1$;

e) The polynomial $f(t)$ has the smallest degree among all non-zero polynomials satisfying a), b), c) and d).

Then $f(t)$ is called the minimal polynomial of $\alpha$. It is a fact from algebraic number theory that such a polynomial is unique. We say that the algebraic number $\alpha$ has degree $d$ if the degree of its minimal polynomial is equal to $d$, i.e. $\deg f = d$.

Example 33.3. Consider the number $\sqrt2$. This number is algebraic, since $\sqrt2$ is a root of the polynomial $f(t) = t^2 - 2$, which has rational coefficients. Note that it is also a root of $f_1(t) = 0$, or $f_2(t) = t^3 + 3t^2 - 2t - 6$, or $f_3(t) = 6t^2 - 12$. However, none of these polynomials satisfy Definition 33.2.

Exercise 33.4. Explain why the numbers $\alpha = 0, 1/2, i, \sqrt{\sqrt2 + \sqrt3}$ are algebraic. For each $\alpha$, find a non-zero monic polynomial $f(t)$ with rational coefficients such that $f(\alpha) = 0$.


Exercise 33.5. a) Prove that every rational number $x/y$ has degree 1;

b) Prove that every quadratic irrational has degree 2. In other words, show that every number of the form $a + b\sqrt d$, where $a, b, d \in \mathbb{Q}$, $b \neq 0$ and $d \neq \pm r^2$ for every $r \in \mathbb{Q}$, satisfies some polynomial $f(x)$ with rational coefficients of degree 2 and does not satisfy any polynomial of degree 1.

Some properties of minimal polynomials:

• For a given algebraic number $\alpha$ the minimal polynomial of $\alpha$ is unique;

• Every minimal polynomial $f(t)$ is irreducible over the field of rational numbers. That is, if $g(t) \mid f(t)$ and $g(t) \in \mathbb{Q}[t]$, then $g(t)$ is either a non-zero constant or a constant multiple of $f(t)$;

• Let $\alpha$ be a root of its minimal polynomial $f(t)$. Then $f'(\alpha) \neq 0$. That is, in $\mathbb{C}[t]$ it is the case that $(t - \alpha) \mid f(t)$ while $(t - \alpha)^2 \nmid f(t)$.

Theorem 33.6. (Liouville's Approximation Theorem, 1840) Let $\alpha$ be a real irrational algebraic number (that is, a real algebraic number of degree $d \ge 2$). Then there exists some constant $C > 0$, which depends only on $\alpha$, such that for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$ the following inequality holds:

$$\left| \alpha - \frac{x}{y} \right| \ge \frac{C}{y^d}.$$

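Proof.$^{47}$ Let $f(t) = c_d t^d + \dots + c_1 t + c_0$ be the minimal polynomial of $\alpha$. Since $f$ is irreducible over $\mathbb{Q}$ and is of degree $d \ge 2$, it has no rational roots, so $f(x/y) \neq 0$ for any $x \in \mathbb{Z}$, $y \in \mathbb{N}$. Furthermore,

$$\left| f\left(\frac{x}{y}\right) \right| = \left| \sum_{k=0}^{d} c_k \left(\frac{x}{y}\right)^k \right| = \frac{1}{y^d} \underbrace{\left| \sum_{k=0}^{d} c_k x^k y^{d-k} \right|}_{\in\, \mathbb{N}} \ge \frac{1}{y^d}.$$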

We now apply the Mean Value Theorem and observe that there exists some real $\xi$ between $\alpha$ and $x/y$ satisfying

$$f'(\xi) = \frac{f(\alpha) - f(x/y)}{\alpha - x/y} = -\frac{f(x/y)}{\alpha - x/y}.$$

$^{47}$The proof is from P. Garrett, Liouville's theorem on diophantine approximation, 2013. See http://www.math.umn.edu/~garrett/m/mfms/notes_2013-14/04b_Liouville_approx.pdf. Note that there is an error in these notes: instead of estimating $|f'(\xi)|$ from above, the author obtains the estimate from below.


Rearranging the terms of the above equality, we get

$$\left| \alpha - \frac{x}{y} \right| = \left| f\left(\frac{x}{y}\right) \right| \cdot |f'(\xi)|^{-1} \ge \frac{|f'(\xi)|^{-1}}{y^d}.$$

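For now, our constant $|f'(\xi)|^{-1}$ depends on $\alpha$ and $y$ (note that $x$ depends on $\alpha$ and $y$: for a fixed $y$ it suffices to prove the inequality when $x$ is the integer closest to $\alpha y$, because this choice minimizes $|\alpha - x/y|$), but it is not hard to eliminate the dependency on $y$ by slightly adjusting our constant. In particular, since $f$ is minimal, the multiplicity of $\alpha$ is 1, which means that $f'(\alpha) \neq 0$. By continuity of $f'$, there is a small open interval $U_\alpha$ around $\alpha$ such that $0 < |f'(\xi)| \le 2|f'(\alpha)|$ for all $\xi \in U_\alpha$. Since $|\alpha - x/y| \le 1/(2y)$ for this choice of $x$, there exists some large $y_0$, which depends only on $\alpha$, such that $x/y \in U_\alpha$, and hence also $\xi \in U_\alpha$, for every $y \ge y_0$. We conclude that

$$\left| \alpha - \frac{x}{y} \right| \ge \frac{|f'(\xi)|^{-1}}{y^d} \ge \frac{|f'(\alpha)|^{-1}}{2y^d}$$

for all $y \ge y_0$. Finally, we choose our constant $C$ by picking the minimum between $2^{-1}|f'(\alpha)|^{-1}$ and the finitely many values $y^d|\alpha - x/y|$ with $y < y_0$, each of which is positive because $\alpha$ is irrational. This concludes the proof.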

Liouville's Approximation Theorem is a very elegant result which can be explained on a rather intuitive level. As $y$ grows, we certainly expect our approximations $x/y$ of $\alpha$ to be more precise. The question is, to what extent, and how can we measure the “quality” of our approximation? The theorem tells us that an irrational algebraic number cannot be approximated “too well” by rational numbers. One intuitive explanation of this phenomenon is the following: no fraction will approximate $\alpha$ better than up to

$$\left\lfloor d + \log_y \frac{1}{C} \right\rfloor$$

base-$y$ places.

For example, when $\alpha = \sqrt2$ one may take $C = 1/4$, and observe that for all $y \ge 2$

$$\left| \sqrt2 - \frac{x}{y} \right| \ge \frac{1}{4y^2}.$$

One of the ways to interpret the above inequality is as follows: no fraction $x/y$ with $y > 2$ will approximate $\sqrt2$ significantly better than up to 2 base-$y$ places.
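The claim about $\sqrt2$ is easy to probe numerically. The Python sketch below is an illustration only (floating point is adequate here because the denominators stay small): it computes the minimum of $y^2 |\sqrt2 - x/y| = y\,|y\sqrt2 - x|$ over $1 \le y \le$ `limit`, using for each $y$ the two integers adjacent to $y\sqrt2$.

```python
from math import isqrt, sqrt

def min_scaled_error(limit: int) -> float:
    """Smallest value of y * |y*sqrt(2) - x| over 1 <= y <= limit and all integers x."""
    root2 = sqrt(2)
    best = float("inf")
    for y in range(1, limit + 1):
        x = isqrt(2 * y * y)            # floor(y * sqrt(2)), computed exactly
        for cand in (x, x + 1):         # the best x is one of these two neighbours
            best = min(best, y * abs(y * root2 - cand))
    return best
```

With `limit = 10000` the minimum is about $0.343$, attained at $y = 2$ (with $x = 3$), which is comfortably above $C = 1/4$.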

Many more things can be said regarding Liouville's inequality. For example, one may ask what happens if we make $C$ a function of $y$:

$$\left| \alpha - \frac{x}{y} \right| \ge \frac{C(y)}{y^d}.$$

It turns out that for $d = 2$ one cannot replace the constant $C$ with some monotonically increasing function $C(y)$ that tends to infinity, but for $d \ge 3$ this can be done. The first improvement of this kind was introduced by Thue in 1909, who showed that one can take $C(y) = c_1 y^{\frac{d}{2} - 1 - \varepsilon}$ for any $\varepsilon > 0$ and some constant $c_1$ that depends only on $\alpha$ and $\varepsilon$. This result allowed him to prove Thue's Theorem. Further improvements were developed by Siegel, Gelfond and Dyson, until in 1955 Roth showed that $C(y) = c_1 y^{d - 2 - \varepsilon}$ would do the job as well. In basic terms, his result states that, for an algebraic number $\alpha$ of degree $d \ge 3$, there are only finitely many rational approximations $x/y$ to $\alpha$ that are accurate to more than $2 + \varepsilon$ base-$y$ places.

Exercise 33.7. (a) Prove that, for every integer $n \ge 1$, the number

$$\alpha := \sum_{k=0}^{\infty} \frac{1}{2^{k!}} = \frac12 + \frac12 + \frac14 + \frac{1}{64} + \frac{1}{16777216} + \cdots$$

satisfies the inequality

$$\left| \alpha - \sum_{k=0}^{n} \frac{1}{2^{k!}} \right| < \frac{1}{(2^{n!})^n}. \qquad (9)$$

Hint: Note that

$$\sum_{k=n+1}^{\infty} \frac{1}{2^{k!}} < \sum_{k=(n+1)!}^{\infty} \frac{1}{2^k}.$$

Use the formula for the infinite geometric series afterwards.

(b) Use Liouville's Theorem and the inequality established in Part (a) to prove that the number $\alpha$ is either rational or transcendental.

Hint: Suppose not. Then there exist a fixed integer $d \ge 2$ and a fixed real number $C > 0$ such that

$$\left| \alpha - \frac{x}{y} \right| \ge \frac{C}{y^d}$$

for all integers $x$ and $y > 0$. Why does this inequality contradict the inequality (9)?

34 Elliptic Curves

Let $n$ be a squarefree positive integer. We say that $n$ is congruent if there exists a right triangle with rational sides whose area is $n$. For example, the number 5 is congruent since it is the area of the right triangle with rational sides $20/3$, $3/2$ and $41/6$.


The number 6 is also congruent, since it is the area of the right triangle with rational sides 3, 4 and 5. In contrast, the number 3 is not congruent. Also, note that if $n$ is congruent, then any integer of the form $s^2 n$ also trivially arises as the area of a right triangle with rational sides (multiply each side by $s$). That is why we restrict our attention only to squarefree $n$.

Given a squarefree number $n$, how can we find out whether it is congruent or not? Essentially, what we need to do is to solve the system of equations

$$\begin{cases} a^2 + b^2 = c^2;\\ \frac12 ab = n \end{cases}$$

for $a, b, c \in \mathbb{Q}$. Set

$$x = \frac{n(a+c)}{b}, \qquad y = \frac{2n^2(a+c)}{b^2}.$$

Then

$$y^2 = x^3 - n^2 x,$$

where $y \neq 0$. Thus, instead of the original system of equations we just have to find $x, y \in \mathbb{Q}$ such that $y^2 = x^3 - n^2 x$. If such rational $x$ and $y$ exist with $y \neq 0$, one can easily obtain a solution to the original system of equations by setting

$$a = \frac{x^2 - n^2}{y}, \qquad b = \frac{2nx}{y}, \qquad c = \frac{x^2 + n^2}{y}.$$

Thus we just have to find a rational point $(x, y)$ with $y \neq 0$ on the curve $y^2 = x^3 - n^2 x$. Such a curve is an example of an elliptic curve.

Definition 34.1. Let $\mathbb{F}$ be one of the fields $\mathbb{F}_q$, $\mathbb{Q}$, $\mathbb{R}$ or $\mathbb{C}$, where $q$ is a prime power.$^{48}$ Let $a, b \in \mathbb{F}$ be such that $4a^3 + 27b^2 \neq 0$. The collection

$$E(\mathbb{F}) = \{(x, y) \in \mathbb{F}^2 : y^2 = x^3 + ax + b\} \cup \{\infty\}$$

is called an elliptic curve, defined over the field $\mathbb{F}$. Here $\infty$ denotes the point at infinity. The value

$$\Delta = -16(4a^3 + 27b^2)$$

is called the discriminant of an elliptic curve $E(\mathbb{F})$.

$^{48}$Here $\mathbb{F}_q$ denotes the finite field of order $q$. We will not give a rigorous construction of $\mathbb{F}_q$ here. We remark though that when $q$ is prime the finite field $\mathbb{F}_q$ is the same as $\mathbb{Z}_q$, the ring of residue classes modulo $q$.


Example 34.2. The graph of an elliptic curve $E_1 : y^2 = x^3 - 25x$ over $\mathbb{R}$ is depicted in Figure 5. This elliptic curve, aside from the trivial rational points $(0, 0)$ and $(\pm5, 0)$, contains a rational point $(x, y) = (45, 300)$. This fact implies that the number 5 is congruent. Furthermore, one can show that in the case of $E_1(\mathbb{Q})$ the existence of one non-trivial rational point implies the existence of infinitely many rational points. In contrast, $E_2 : y^2 = x^3 - 9x$ has no non-trivial rational points, so the elliptic curve $E_2(\mathbb{Q})$ contains only four points, namely $(0, 0)$, $(\pm3, 0)$ and the point at infinity. Both curves $E_1(\mathbb{R})$ and $E_2(\mathbb{R})$ contain infinitely many points.

Also, note that the graph of $E_1(\mathbb{R})$ contains two connected components. This is because the discriminant of $E_1$ is equal to $\Delta(E_1) = 10^6$ and is positive. In contrast, the discriminant of $E_3 : y^2 = x^3 - 2$ is equal to $\Delta(E_3) = -1728$ and is negative. The negative sign indicates that the graph of $E_3(\mathbb{R})$ has one connected component.

Figure 5: Elliptic curves $y^2 = x^3 - 25x$ and $y^2 = x^3 - 2$
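The link between the sign of $\Delta$ and the number of connected components can be checked independently of the formula for $\Delta$: a curve $y^2 = x^3 + ax + b$ with $\Delta \neq 0$ has two real components exactly when the cubic $x^3 + ax + b$ has three real roots. The Python sketch below is an illustration (the function names are ours): it decides this from the values of the cubic at its critical points $\pm\sqrt{-a/3}$ and compares the result with the sign of $\Delta$ over a grid of small integers $a, b$.

```python
from math import sqrt

def discriminant(a: int, b: int) -> int:
    """Delta = -16(4a^3 + 27b^2) for the curve y^2 = x^3 + ax + b."""
    return -16 * (4 * a ** 3 + 27 * b ** 2)

def real_roots_of_cubic(a: int, b: int) -> int:
    """Number of real roots of x^3 + ax + b, assuming there is no repeated root."""
    if a >= 0:
        return 1                               # the cubic is strictly increasing
    s = sqrt(-a / 3)                           # critical points at -s and s

    def f(x: float) -> float:
        return x ** 3 + a * x + b

    return 3 if f(-s) > 0 > f(s) else 1        # local max above 0 and local min below 0

def components(a: int, b: int) -> int:
    """Connected components of E(R): two if the cubic has three real roots, else one."""
    return 2 if real_roots_of_cubic(a, b) == 3 else 1
```

For $E_1$ this gives $\Delta = 10^6$ and two components, and for $E_3$ it gives $\Delta = -1728$ and one component.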

Exercise 34.3. Find integers $a$ and $b$ such that the discriminant of a curve $y^2 = x^3 + ax + b$ is equal to zero. What does the graph of such a curve look like?

Many problems in number theory are actually connected to elliptic curves. For example, consider the Fermat equation

$$a^3 + b^3 = c^3.$$

The question of existence of non-trivial solutions to this Diophantine equation is equivalent to solving the equation

$$u^3 + v^3 = 1$$

in rational numbers $u$ and $v$. If we now let

$$x = 12(u^2 - uv + v^2), \qquad y = 36(u - v)(u^2 - uv + v^2),$$

then

$$y^2 = x^3 - 432.$$

If some point $(x, y) \in \mathbb{Q}^2$ lies on the elliptic curve determined by the above equation, then it is straightforward to check that the numbers

$$u = \frac{36 + y}{6x}, \qquad v = \frac{36 - y}{6x}$$

are rational and satisfy $u^3 + v^3 = 1$. So once again the existence of a solution to some Diophantine equation reduces to the question of existence of a non-trivial rational point on some elliptic curve.
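The substitution between the Fermat cubic and the curve $y^2 = x^3 - 432$ can be checked with exact rational arithmetic. In the Python sketch below (function names ours, for illustration), the trivial solutions $(u, v) = (1, 0)$ and $(0, 1)$ are mapped to the points $(12, 36)$ and $(12, -36)$ and back. The forward map relies on $u^2 - uv + v^2 = 1/(u+v)$, which holds because $u^3 + v^3 = (u+v)(u^2 - uv + v^2) = 1$.

```python
from fractions import Fraction

def cubic_to_curve(u, v):
    """(u, v) with u^3 + v^3 = 1  ->  (x, y) with y^2 = x^3 - 432."""
    u, v = Fraction(u), Fraction(v)
    w = u * u - u * v + v * v          # equals 1/(u + v) on the cubic
    return 12 * w, 36 * (u - v) * w

def curve_to_cubic(x, y):
    """(x, y) on y^2 = x^3 - 432 with x != 0  ->  (u, v) with u^3 + v^3 = 1."""
    x, y = Fraction(x), Fraction(y)
    return (36 + y) / (6 * x), (36 - y) / (6 * x)

def on_curve(x, y) -> bool:
    """Does (x, y) satisfy y^2 = x^3 - 432?"""
    x, y = Fraction(x), Fraction(y)
    return y * y == x ** 3 - 432

def on_cubic(u, v) -> bool:
    """Does (u, v) satisfy u^3 + v^3 = 1?"""
    u, v = Fraction(u), Fraction(v)
    return u ** 3 + v ** 3 == 1
```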

The first questions about elliptic curves date back to Diophantus of Alexandria, who looked at the Diophantine equation

$$y(6 - y) = x^3 - x.$$

Fermat claimed that he knew how to solve the Diophantine equation $y^2 = x^3 + 1$, but did not provide a proof. The problem was fully resolved only a century later by Euler. The field of algebraic number theory was essentially born when Euler tried to solve the Diophantine equation $y^2 = x^3 - 2$ by writing

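$$x^3 = y^2 + 2 = (y + \sqrt{-2})(y - \sqrt{-2})$$

and then claiming that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are “coprime”, without rigorously explaining what coprimeness means in this setting. Of course, his intuition was correct: the ring $\mathbb{Z}[\sqrt{-2}]$ is a Unique Factorization Domain, and one can indeed show that $y + \sqrt{-2}$ and $y - \sqrt{-2}$ are coprime in $\mathbb{Z}[\sqrt{-2}]$ whenever $y$ is odd. This is the case for every integer solution: if $y$ were even, then $x^3 = y^2 + 2 \equiv 2 \pmod 4$, which is impossible.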

Elliptic curves have been studied extensively over the past two centuries. The theory of elliptic curves truly blossomed with the prominent work of Weierstrass on elliptic functions, which connects elliptic curves defined over the field of complex numbers $\mathbb{C}$ to lattices in the complex plane. In fact, every elliptic curve over $\mathbb{C}$ arises from a lattice in the complex plane! Elliptic curves are intimately connected to modular forms, and the development of the theories of elliptic curves and modular forms resulted in Andrew Wiles’s proof of Fermat’s Last Theorem (see Section 28 for more details).


Other prominent mathematicians who contributed a lot to the development of the theory of elliptic curves were Abel and Jacobi. By studying so-called elliptic integrals, they realized that one can, in fact, impose arithmetic on the points of an elliptic curve. More precisely, such arithmetic makes sense whenever an elliptic curve $E(F)$ is defined over a field $F$. This is why we restrict our attention to $F = \mathbb{F}_q, \mathbb{Q}, \mathbb{R}, \mathbb{C}$ and not, say, $\mathbb{Z}$ or $\mathbb{Z}/p^k\mathbb{Z}$ for $p$ prime and $k \geq 2$: the latter two are rings but not fields.

To explain what this means, consider for now an elliptic curve $E$ defined over the field of real numbers $\mathbb{R}$. For two distinct points $P, Q \in E(\mathbb{R})$, we draw the line through $P$ and $Q$; of course, this line is uniquely defined. For now, let us assume that this line is neither vertical nor tangent to $E$ at $P$ or at $Q$ (see the first picture in Figure 6).$^{49}$ Then our line intersects $E$ at a third point, say $R'$. Our arithmetic on an elliptic curve is then defined as follows:

$$P + Q + R' = \infty;$$

that is, any three points $P$, $Q$ and $R'$ of $E$ which lie on one line add up to $\infty$ (the point at infinity). Alternatively, if $R' = (x_{R'}, y_{R'})$, we can write

$$P + Q = -R',$$

so by “adding” two points together we were able to produce a third point, namely $-R' = (x_{R'}, -y_{R'})$. In Figure 6, the point at infinity is denoted by $0$. Soon we will see that there is a deep reason for this alternative notation.

Figure 6: Group law

$^{49}$ The picture is taken from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/thumb/7/77/ECClines-2.0.svg/680px-ECClines-2.0.svg.png.


We can formalize the observations made above as follows. Let $P = (x_P, y_P)$ and $Q = (x_Q, y_Q)$, and suppose that $x_P \neq x_Q$. Let

$$s = \frac{y_P - y_Q}{x_P - x_Q}$$

denote the slope of the line passing through the points $P$ and $Q$. Then we define the point $R = (x_R, y_R) = P + Q$ as follows:

$$x_R = s^2 - x_P - x_Q, \qquad y_R = -y_P + s(x_P - x_R).$$

It is straightforward to verify that $R$ indeed belongs to $E(\mathbb{R})$. Furthermore, if we look closer at the expressions for $x_R$ and $y_R$, we notice that they preserve the field of definition. That is, if $P$ and $Q$ are points in $\mathbb{R}^2$, then $R$ is also a point in $\mathbb{R}^2$; if $P$ and $Q$ are points in $\mathbb{Q}^2$, then so is $R$. This applies to any field, so the addition of points is well-defined for any base field $F$. Figure 7 demonstrates that the field of definition remains unchanged: all the points in this example belong to $\mathbb{Z}^2$, and therefore to $\mathbb{Q}^2$ as well. Note, however, that in general the sum of two integer points on an elliptic curve need not be an integer point; it is a rational point or the point at infinity.$^{50}$
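The chord formula can be tried out directly. This Python sketch (helper names hypothetical) adds two points with $x_P \neq x_Q$ using exact fractions; the formulas use only the coordinates of $P$ and $Q$, while the coefficients $a$ and $b$ enter only through the assumption that both points lie on the curve. It reproduces the sum $(1, 0) + (0, 2) = (3, 4)$ from Figure 7 on $y^2 = x^3 - 5x + 4$, and then adds two integer points on the same curve whose sum is rational but not integral.

```python
from fractions import Fraction

def on_curve(P, a, b):
    """Return True if P = (x, y) satisfies y^2 = x^3 + a*x + b."""
    x, y = P
    return y * y == x ** 3 + a * x + b

def chord_add(P, Q):
    """P + Q for two points on the curve with x_P != x_Q (chord through P and Q)."""
    (xp, yp), (xq, yq) = P, Q
    if xp == xq:
        raise ValueError("the chord formula requires x_P != x_Q")
    s = Fraction(yp - yq) / (xp - xq)      # slope of the line through P and Q
    xr = s * s - xp - xq
    yr = -yp + s * (xp - xr)
    return xr, yr

a, b = -5, 4                               # the curve y^2 = x^3 - 5x + 4 of Figure 7
R = chord_add((1, 0), (0, 2))
assert on_curve(R, a, b)
print(f"({R[0]}, {R[1]})")                 # (3, 4)

S = chord_add((0, 2), (3, 4))              # two integer points...
assert on_curve(S, a, b)
print(f"({S[0]}, {S[1]})")                 # (-23/9, -8/27): rational, not integral
```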

We need to consider three special cases separately. For example, the second picture in Figure 6 shows the situation when the line is tangent to $E$ at the point $Q$. This picture corresponds to the following: if instead of distinct points $P$ and $Q$ we pick two identical points, i.e. $P = Q$, then we can think of the tangent line as the line which passes through both $P$ and $Q$. In this case, the slope of the line tangent to $E$ at $P = (x_P, y_P)$ is equal to

$$s = \frac{3x_P^2 + a}{2y_P},$$

and we may compute the point $R = (x_R, y_R) = P + P$ as

$$x_R = s^2 - 2x_P, \qquad y_R = -y_P + s(x_P - x_R).$$

Once again, we can easily verify that $(x_R, y_R)$ indeed lies on $E$ and that the formulas for $x_R$ and $y_R$ above preserve the field of definition. That is, if $P \in E(F)$, then $R = P + P \in E(F)$. For short, we write $R = 2P$, and more generally

$$nP = \underbrace{P + P + \cdots + P}_{n \text{ times}}.$$

$^{50}$ The picture is taken from William Stein’s lecture notes, Chapter 6, Figure 6.3: http://wstein.org/simuw06/ch6.pdf.


Figure 7: The group law: $(1, 0) + (0, 2) = (3, 4)$ on $y^2 = x^3 - 5x + 4$
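Similarly, here is a Python sketch of the tangent (doubling) formula above, under the same assumptions as for the chord case: exact fractions, hypothetical helper names, and $y_P \neq 0$ so that the tangent is not vertical. It doubles the integer point $(0, 2)$ on the curve $y^2 = x^3 - 5x + 4$ of Figure 7 and the point $(45, 300)$ on $E_1 : y^2 = x^3 - 25x$ from Example 34.2, and checks that both results lie on their curves.

```python
from fractions import Fraction

def on_curve(P, a, b):
    """Return True if P = (x, y) satisfies y^2 = x^3 + a*x + b."""
    x, y = P
    return y * y == x ** 3 + a * x + b

def double(P, a):
    """2P on y^2 = x^3 + a*x + b via the tangent line at P (needs y_P != 0)."""
    xp, yp = P
    if yp == 0:
        raise ValueError("vertical tangent: 2P is the point at infinity")
    s = Fraction(3 * xp * xp + a) / (2 * yp)   # slope of the tangent at P
    xr = s * s - 2 * xp
    yr = -yp + s * (xp - xr)
    return xr, yr

R = double((0, 2), -5)                         # on y^2 = x^3 - 5x + 4
assert on_curve(R, -5, 4)
print(f"2(0, 2) = ({R[0]}, {R[1]})")           # 2(0, 2) = (25/16, -3/64)

S = double((45, 300), -25)                     # on E1: y^2 = x^3 - 25x
assert on_curve(S, -25, 0)
print(f"2(45, 300) = ({S[0]}, {S[1]})")        # 2(45, 300) = (1681/144, 62279/1728)
```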

The only two special cases left for us to consider are when $P \neq Q$ with $x_P = x_Q$ (third picture in Figure 6) and when $P = Q$ with $y_P = 0$ (fourth picture in Figure 6). Both cases result in a vertical line, which has an infinite slope. In the former case we write $P + Q = \infty$, and in the latter case we write $2P = \infty$.

At this point, we have covered all four cases that can arise. In this unorthodox way, we were able to define the operation of addition “+” on $E(F)$. We can also define the operation of negation: if $P = (x_P, y_P)$, we write $-P = (x_P, -y_P)$. Also, we can define the operation of subtraction “−” as follows: $P - Q = P + (-Q)$. One can also notice that the point at infinity plays the role of zero, which explains the notation used in Figure 6. We summarize the observations made above (and introduce a few more) in Proposition 34.4.
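Combining all four cases, the addition law over $\mathbb{Q}$ fits in a few lines of Python. This is a minimal sketch under our own representation choices, not prescribed by these notes: a point is a pair of integers or `Fraction`s, the point at infinity is `None`, inputs are assumed to lie on $y^2 = x^3 + ax + b$ (only $a$ is needed), and the names `add`, `neg`, `sub` and `mul` are hypothetical. Scalar multiplication $nP$ uses the double-and-add idea from Section 6 instead of $n - 1$ successive additions.

```python
from fractions import Fraction

INF = None  # the point at infinity, which plays the role of zero

def neg(P):
    """-P = (x_P, -y_P); the negative of the point at infinity is itself."""
    return INF if P is INF else (P[0], -P[1])

def add(P, Q, a):
    """P + Q on y^2 = x^3 + a*x + b; P and Q are assumed to lie on the curve."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and yp == -yq:      # vertical line: Q = -P, including 2P when y_P = 0
        return INF
    if xp == xq:                    # otherwise equal x forces P == Q: tangent line
        s = Fraction(3 * xp * xp + a) / (2 * yp)
    else:                           # chord through two points with distinct x
        s = Fraction(yp - yq) / (xp - xq)
    xr = s * s - xp - xq
    yr = -yp + s * (xp - xr)
    return xr, yr

def sub(P, Q, a):
    """P - Q = P + (-Q)."""
    return add(P, neg(Q), a)

def mul(n, P, a):
    """nP for an integer n >= 0 by double-and-add."""
    result, addend = INF, P
    while n > 0:
        if n & 1:
            result = add(result, addend, a)
        addend = add(addend, addend, a)
        n >>= 1
    return result

# y^2 = x^3 - 5x + 4 (Figure 7): the sum from the figure, subtraction undoing it,
# and one instance each of commutativity and associativity.
P, Q, R = (1, 0), (0, 2), (3, 4)
assert add(P, Q, -5) == R and add(Q, P, -5) == R
assert sub(R, Q, -5) == P
assert add(add(P, Q, -5), R, -5) == add(P, add(Q, R, -5), -5)

# E2: y^2 = x^3 - 9x. Its four rational points are closed under addition.
E2 = [INF, (0, 0), (3, 0), (-3, 0)]
assert all(add(S, T, -9) in E2 for S in E2 for T in E2)
assert mul(2, (3, 0), -9) is INF
```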

Proposition 34.4. (The Group Law) Let F be a field and E(F) be an elliptic curve. The collection of points E(F) forms a group, called a Mordell-Weil group, under the operation of addition. That is, it satisfies the following four group axioms:

1. Closure. For all P,Q ∈ E(F), P+Q ∈ E(F);

2. Associativity. For all P,Q,R ∈ E(F), (P+Q)+R = P+(Q+R);


3. Identity element. For all P in E(F), the element ∞ satisfies

P+∞ = ∞+P = P;

4. Inverse element. For each P in E(F) there exists an element −P in E(F) such that

P+(−P) = (−P)+P = ∞.

Furthermore, the group of points on an elliptic curve E(F) is Abelian:

5. Abelianness. For all P,Q ∈ E(F), P+Q = Q+P.

Theorem 34.5. (Mordell’s Theorem, 1922) For every elliptic curve $E$ defined over the field of rational numbers $\mathbb{Q}$, the group $E(\mathbb{Q})$ is a finitely generated Abelian group. That is,

$$E(\mathbb{Q}) \cong C \times \mathbb{Z}^r,$$

where r is a non-negative integer and C is a finite Abelian group.

The main point of the above theorem is that the number $r$ is finite. It is called the Mordell-Weil rank of the elliptic curve $E(\mathbb{Q})$. Such a nice classification is impossible when the base field is $\mathbb{R}$ or $\mathbb{C}$. In essence, the theorem says that even though there can be infinitely many rational points, there cannot be “too many” of them in a very precise sense.

To better explain the theorem of Mordell, let us recall the notion of the order of a group element. Just like for other groups that we studied, we say that the point $P \in E(F)$ has order $n$ if $n$ is the smallest positive integer such that $nP = \infty$. If such an integer does not exist, we say that $P$ has infinite order. According to Mordell’s Theorem, there exist $r$ elements $P_1, P_2, \ldots, P_r$ of infinite order such that every element $P \in E(\mathbb{Q})$ can be written in the form

$$P = T + \sum_{i=1}^{r} n_i P_i,$$

where $n_1, n_2, \ldots, n_r$ are integers and $T$ is a point of finite order (such points are called torsion points).
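The definition of order translates directly into a bounded search. The Python sketch below (helper names hypothetical) repeats the group law in compact form so that it runs on its own, and returns the smallest $n \le$ `bound` with $nP = \infty$, or `None`. A bounded search by itself cannot prove that a point has infinite order; we choose `bound=12` because, by Mazur’s theorem (not covered in these notes), a rational point of finite order on an elliptic curve over $\mathbb{Q}$ has order at most 12, so over $\mathbb{Q}$ a result of `None` does certify infinite order. The examples use the curve $y^2 = x^3 + 1$ from the discussion of Fermat and Euler above, $E_2$, and $E_1$ from Example 34.2.

```python
from fractions import Fraction

INF = None  # the point at infinity

def add(P, Q, a):
    """P + Q on y^2 = x^3 + a*x + b (points assumed to lie on the curve)."""
    if P is INF:
        return Q
    if Q is INF:
        return P
    (xp, yp), (xq, yq) = P, Q
    if xp == xq and yp == -yq:
        return INF                                   # vertical line
    if xp == xq:
        s = Fraction(3 * xp * xp + a) / (2 * yp)     # tangent line
    else:
        s = Fraction(yp - yq) / (xp - xq)            # chord
    xr = s * s - xp - xq
    return xr, -yp + s * (xp - xr)

def order(P, a, bound=12):
    """Smallest n <= bound with nP = INF, or None if no such n exists."""
    Q = P                      # Q runs through P, 2P, 3P, ...
    for n in range(1, bound + 1):
        if Q is INF:
            return n
        Q = add(Q, P, a)
    return None

print(order((2, 3), 0))        # y^2 = x^3 + 1: prints 6
print(order((0, 0), -9))       # E2: prints 2
print(order((45, 300), -25))   # E1: prints None, so this point has infinite order
```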

An elliptic curve is the first interesting example of what is called an Abelian variety. In 1928, the theorem of Mordell was generalized by the French mathematician André Weil to all Abelian varieties defined over number fields.

We conclude this section with Siegel’s Theorem, which has profound consequences in the analysis of Diophantine equations related to elliptic curves.


Theorem 34.6. (Siegel’s Theorem, 1929) Every elliptic curve $E(\mathbb{C})$ contains only finitely many integer points. That is, for any numbers $a, b \in \mathbb{C}$ such that $4a^3 + 27b^2 \neq 0$, the Diophantine equation

$$y^2 = x^3 + ax + b$$

has only finitely many solutions in integers x and y.
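Siegel’s Theorem guarantees finiteness but, as stated here, gives no bound on the size of the solutions, so a computer search can list the integer points in a range but cannot by itself prove that there are no others. The following Python sketch (function name hypothetical, restricted to integer coefficients $a, b$) finds all integer solutions with $|x| \le$ `x_max` by testing whether $x^3 + ax + b$ is a perfect square with `math.isqrt`. Applied to the equations of Euler and Fermat discussed above, it recovers the known solutions within the search range.

```python
from math import isqrt

def integer_points(a, b, x_max):
    """All integer solutions of y^2 = x^3 + a*x + b with |x| <= x_max (a, b integers)."""
    points = []
    for x in range(-x_max, x_max + 1):
        rhs = x ** 3 + a * x + b
        if rhs < 0:
            continue                      # no real y, let alone an integer one
        y = isqrt(rhs)
        if y * y == rhs:                  # rhs is a perfect square
            points.append((x, y))
            if y != 0:
                points.append((x, -y))
    return points

print(integer_points(0, -2, 10_000))   # [(3, 5), (3, -5)]
print(integer_points(0, 1, 10_000))    # [(-1, 0), (0, 1), (0, -1), (2, 3), (2, -3)]
```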

