SGC 2015 - Mathematical Sciences Extension Studies

St. George’s College 2015 - Mathematical Sciences Exploration Studies

Daniel Xavier Ogburn 1

School of Physics, Field Theory and Quantum Gravity,

University of Western Australia

June 2, 2015

1Electronic address: [email protected]

Contents

1 Introduction
  1.1 Tutor List
  1.2 Exploration Studies and Extension Problems
    1.2.1 People you should know about
    1.2.2 Theorems and Theories you should know about
  1.3 Layout and Conventions

2 Dimensional Analysis and Fundamental Laws
  2.1 Dimensional Analysis
    2.1.1 Preamble: March 9, 2015
    2.1.2 Examples
    2.1.3 Problems
    2.1.4 Moral of the story
  2.2 Dimensionless Constants and Fundamental Laws
    2.2.1 Physical Systems, Fundamental Laws
    2.2.2 Examples and Problems
    2.2.3 Buckingham Pi-Theorem
    2.2.4 Gravity, The Hierarchy Problem and Extra-Dimensional Braneworlds

3 Geometry of Antiquity and The Universe
  3.1 Introduction: Conic Sections
  3.2 Parabolas and Geometric Optics
    3.2.1 Overview
    3.2.2 The Parabola
    3.2.3 Scale Invariance and Transcendality
    3.2.4 Symmetries and Canonical Form
    3.2.5 Optical Properties and Spherical Aberration
  3.3 Ellipses and Planetary / Atomic Orbits
    3.3.1 Overview
    3.3.2 The Ellipse
    3.3.3 Parametric Form
  3.4 The Two Body Problem and Planetary Orbits (Easter Sketch)
    3.4.1 History and Cultural Impact
    3.4.2 Inverse Square Law and Central Potentials
    3.4.3 Symmetries and Jacobi Coordinates
    3.4.4 Kepler’s Laws
    3.4.5 Superintegrability and Constants of Motion
  3.5 Hyperbolae, Comets and Atomic Scattering
  3.6 General Relativistic Corrections

4 Physics in Non-Inertial Frames
  4.1 The Lie Group of Rotations: Design a Death Star
    4.1.1 Notation
    4.1.2 BFF: Linear Maps and Matrices
    4.1.3 SO(3): The Lie Group of Rotations
  4.2 Rigid Bodies and Moments of Inertia
    4.2.1 Rotations: about an arbitrary axis
    4.2.2 Principal Axes and Spectral Decomposition
    4.2.3 Parallel and Perpendicular Axis Theorems
    4.2.4 Precession and Torque: Equinox, Spinning top
  4.3 Accelerating Frames: The Tides
    4.3.1 Isometries of Euclidean Space: Galileo
    4.3.2 Linearly Accelerating Frames
    4.3.3 The Tides
  4.4 Centrifugal and Coriolis Forces
    4.4.1 Rotational Motion and Angular Velocity
    4.4.2 Differential Operators in Rotating Frames: Newton’s Second Law
    4.4.3 Centrifugal Force
    4.4.4 Coriolis Force
  4.5 Foucault’s Pendulum

5 Nature’s Ways: The Calculus of Variations
  5.1 Lagrangian Mechanics
    5.1.1 Background
    5.1.2 The Principle of Stationary Action
    5.1.3 The Euler-Lagrange Equations of Motion
    5.1.4 N-Dimensional Euler-Lagrange Equations
    5.1.5 Examples
    5.1.6 Multiple Independent Parameters
    5.1.7 More Examples
    5.1.8 Closing Remarks

Chapter 1

Introduction

1.1 Tutor List

For the year 2015, here is a list of tutors for mathematics, physics and related areas:

• Myself (Daniel Ogburn)

• Ben Luo

• Theresa Feddersen.

1.2 Exploration Studies and Extension Problems

As a young (or old) individual, one should strive for ‘professionalism’ in one’s pursuits. This means doing things ‘properly’ and diligently, without cutting corners. One aspect of professionalism in the mathematical sciences is to develop a forte for ‘problem solving’. Developing your capacity for problem solving is not something that can really be philosophised or ‘rote-learned’. Certain principles may help you, but at the end of the day the best way to develop this faculty is to attempt many different problems. Your mission, should you choose to accept it, is to attempt the problems in these extension studies. Because they are unique and ‘different’ to the style of problems you will usually get at a university, they will help you to develop in new ways.

Further to improving your ‘professionalism’ and problem solving capacity, these studies should provide a fun and interesting side journey. We will try to explore areas of mathematics and physics which are somehow glossed over. Hopefully you will find that these areas are in fact rich and interesting, with much to be explored. To a large extent, you will develop tools and insights to give you an edge in your university work.

Lastly, note that the idea here is to have fun! Engage your peers and your tutors as you work your way through these explorations. As the prince of mathematics once said:

“It is not knowledge, but the act of learning, not possession but the act of getting there, which grants the greatest enjoyment.” – Carl Friedrich Gauss.


1.2.1 People you should know about

• Euclid

• Archimedes

• Muhammad al-Khwarizmi

• Carl Friedrich Gauss: Gauss referred to mathematics as ‘The Queen of all Sciences’ and is almost universally considered to be the ‘Prince of Mathematics’. Apart from being a prodigy, Gauss is responsible for much of the expansion and development of mathematics and physics in the late 18th and early 19th centuries. Therefore, you will see his name behind many fundamental theorems from all areas of mathematics.

• Bernhard Riemann: Gauss’ best student. Along with Gauss, Riemann is largely responsible for non-Euclidean geometry – the basis for all modern theoretical physics. He is also responsible for the ‘Riemann Hypothesis’, which has long been considered to be the greatest unsolved conjecture in mathematics.

• Leonhard Euler

• Joseph Lagrange

• Joseph Fourier

• Emmy Noether

• Henri Poincaré

• David Hilbert

• Élie Cartan

• John von Neumann

• Srinivasa Ramanujan

• Alexander Grothendieck

• Grigori Perelman

• Terence Tao

• Isaac Newton and Gottfried Wilhelm Leibniz

• Galileo Galilei

• Michael Faraday

• James Maxwell

• Sir William Rowan Hamilton

• Ludwig Boltzmann

• Max Planck

• Paul Dirac

• Albert Einstein

• Richard Feynman

• Lev Landau


• Edward Witten

• Nima Arkani-Hamed

• Jin-Mann Wong (Jenny Wong).

1.2.2 Theorems and Theories you should know about

Here are a few of the ‘major’ results that you should know about by the end of your mathematics / physics degree. Some of them you should understand in detail – i.e. derivations, proofs and conceptual working knowledge. Other items you should at least have heard of or come across, and understand the essence of the result if not the specifics. Please note that the list is far from exhaustive and there are probably many important results that I have omitted at this time.

• Euler’s Formula

• The Fundamental Theorem of Algebra

• Weierstrass Factorization Theorem

• The Generalized Stokes’ Theorem

• The Cauchy Residue Theorem

• The Gaussian Distribution and Hypothesis Testing

• Principle of Least Squares Regression and the $L^2$, $\ell^2$ Hilbert Spaces

• The Multinomial Theorem

• Uniform, Binomial, Poisson and Chi-squared distributions

• Non-parametric statistics

• Fourier’s Theorem and its generalizations

• The Spectral Theorem

• The Shannon-Nyquist Theorem

• Parseval’s Theorem

• Shannon Entropy

• The Riemann Hypothesis and Prime Number Theorem

• The Navier-Stokes Equation and associated Millennium Problem

• Thurston’s Geometrization Theorem (Poincaré Conjecture)

• Information Complexity and the P versus NP Problem

• Fermat’s Last Theorem

• Maxwell’s Equations in 3-dimensional and 4-dimensional form

• The Black-Scholes Equation

• Markov Chains and Monte Carlo Methods

• Game Theory and the Nash Equilibrium

• Measure Theory and the Lebesgue Integral

• Dedekind’s Construction of the Real Numbers


• Proof of the Irrationality of $\sqrt{2}$

• Cantor’s diagonalization argument and classification of infinities

• Gödel’s Incompleteness Theorems

• The Axiom of Choice

• Lagrange’s Theorem (Group Theory)

• The First and Second Isomorphism Theorems

• Splitting Lemma

• The Jordan-Hölder Theorem

• Hamilton’s Quaternion Formula

• Brouwer Fixed Point Theorems

• The Hairy Ball Theorem

• Banach-Tarski Paradox

• The Lyapunov Exponent and Lorenz Attractor

• Noether’s Theorem and Killing’s Equation

• De Rham Cohomology, Closed Forms and Conservation Laws

• Topological Invariants and the Atiyah-Singer Index Theorem

• Newton’s Second Law

• Newton’s Law of Gravity and Derivation of the Kepler Orbits

• The Lorentz Symmetry Group and Special Relativity

• Planck’s Radiation Law

• Boltzmann’s Equipartition Theorem and Proof

• The Ising Model and its solution

• The Bose-Einstein Distribution and Fermi Distribution

• The EPR Paradox

• Boltzmann’s Entropy Formula

• LASERS and Stimulated Emission

• Spherical Harmonics and Quantum Model of the Hydrogen Atom

• General Relativity, Einstein Field Equations and Einstein-Hilbert Action

• The Standard Model of Particle Physics

• Lie Groups and Lie Algebras

• Hilbert Spaces, Metric Spaces and Topological Spaces

• The Heisenberg Uncertainty Principle

• The Schrödinger and Dirac Equations for Quantum Mechanics

• The Klein-Gordon Equation and Feynman Path Integral

• The Hawking-Bekenstein and Unruh Black Hole Thermodynamic Formulas


• The Hawking-Penrose Singularity Theorems and Penrose Black Hole Inequalities

• The Magnetohydrodynamic Equations

• Hamiltonian and Lagrangian Variational Principles

• The Hamilton-Jacobi Equation and Liouville’s Theorem

• Sturm-Liouville Theory and Harmonic Analysis

• Gamma Function, Hypergeometric Function and Orthogonal Polynomial Families

• Eigenvalue/Eigenvector Decomposition and Jordan Normal Form

• The Matrix Exponential

• The Standard Model of Cosmology

• Random Matrices and Random Walks

• The Cosmological Constant Problem and Dark Energy

• Approaches to Quantum Gravity

• Ogburn-Waters-Sciffer method for generating ellipsoidal harmonic functions.

1.3 Layout and Conventions

For these exploration sessions, I will typically include exercises and problems in increasing order of difficulty. Furthermore, I make the (somewhat grey) distinction between ‘exercises’ and ‘problems’ as follows.

Definition 1 (Exercise) ‘Exercises’ are essentially numerical or algorithmic calculations to check whether you understand how to manipulate the mechanics of the mathematics involved.

‘Problems’ are generally more difficult as they require more conceptual understanding – i.e. you have to be able to interpret the problem and formulate it in such a way that it is reduced to an ‘exercise’.

Definition 2 (Problem) A ‘problem’ is something which can be reduced to an exercise, with the appropriate creativity and conceptual faculty.

One essential skill for any true scientist is the ability to reduce real-world scenarios, models and problems into mathematical ones. In this manner, the most powerful sciences (quantitative ones) are ones in which scientists turn problems into exercises.

Chapter 2

Dimensional Analysis and Fundamental Laws

2.1 Dimensional Analysis

2.1.1 Preamble: March 9, 2015

Dimensional analysis is a deceptively simple, but fundamentally powerful tool in the mathematical sciences – one that is often overlooked! Furthermore, dimensional analysis is not just a study tool – it is a research-level tool that can allow one to probe the unknown and construct or discover something new and tangible.

For student purposes, dimensional analysis serves as a fast error-checking algorithm for your calculations. It is also useful for extracting ‘physically meaningful’ information out of your system. In this sense, being ‘dimensionally-aware’ throughout all of your calculations can help you to develop a deeper understanding of the quantities and objects being manipulated.

Given a large set of parameters describing a system, one can often form a smaller number of dimensionless parameters which completely characterize that system – hence removing any redundant information. The precise statement of the last idea is known as the ‘Buckingham Pi Theorem’¹. This has vast practical applications to mathematical modelling, fluid mechanics, thermodynamics, electrodynamics, cosmology and much more. For now, we begin with a few examples, then work through some questions².
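As a rough illustration of the counting behind the Buckingham Pi Theorem, the sketch below computes the number of independent dimensionless groups as the number of parameters minus the rank of their ‘dimension matrix’. The simple-pendulum parameters and the helper names `rank` and `count_dimensionless_groups` are assumptions chosen purely for illustration; they do not come from these notes.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        # Find a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def count_dimensionless_groups(dimension_rows):
    """Number of parameters minus the rank of their dimension matrix."""
    return len(dimension_rows) - rank(dimension_rows)

# Hypothetical example: a simple pendulum. Each row lists the exponents
# of (M, L, T) for one parameter.
pendulum = [
    (0, 0, 1),   # period t:        T
    (0, 1, 0),   # string length l: L
    (1, 0, 0),   # bob mass m:      M
    (0, 1, -2),  # gravity g:       L T^-2
]
print(count_dimensionless_groups(pendulum))  # 1
```

Under these assumptions there is a single dimensionless group, $g t^2 / l$, and the mass cannot appear in it.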

The main idea of the following examples and problems is two-fold: first, inspect an equation and work out the dimensions (or units) of each variable and constant, given some starting information; second, check whether or not the equation is dimensionally consistent. Any equation from any area of science and mathematics must be dimensionally consistent – if it isn’t, then it’s wrong. In this sense, you don’t need to understand the science or theory behind an equation to deduce when it is incorrect on dimensional grounds!

What was said for ‘dimensional consistency’ is also true of ‘structural consistency’ – a concept which we shall briefly review below.

¹ For those of you who have done (or will do) linear algebra, this is just a practical consequence of the ‘rank-nullity’ theorem.

² Thanks to Scott Meyer and Matthew Fernandez for feedback.


Theorem 1 (Dimensional Consistency) Given any equation

$$\mathrm{LHS} = \mathrm{RHS} \tag{2.1}$$

in any set of variables, the equation is wrong if the dimensions of ‘LHS’ and ‘RHS’ are not equivalent.

An easy way to prove this theorem is to note that any equation has an associated ‘auxiliary dimensional equation’. That is, an equation consisting only of the dimensions of LHS and RHS, which we shall denote with the ‘square-bracket notation’ [LHS] and [RHS], respectively:

$$[\mathrm{LHS}] = [\mathrm{RHS}]. \tag{2.2}$$

Hence for an equation to be correct, it must be both numerically correct and dimensionally correct. For more complicated equations, such as tensor or spinor equations, we must also consider ‘structural consistency’.

Notation: Throughout this tutorial we will use ‘logarithmic notation’ for dimensional analysis, in which we record the exponents of the fundamental units, so that a quantity with units $L^3$ is written as having dimensions $3L$. This is often used by cosmologists and particle physicists, but can be easily converted to the more common ‘exponential notation’.

2.1.2 Examples

Example 1 (Structural Consistency) One simple example of structural consistency is matrix multiplication – we can only multiply two matrices $A$ and $B$ if the number of columns of $A$ is equal to the number of rows of $B$.

Any equation that is not structurally consistent is fundamentally nonsensical and therefore wrong. It is thus wise to always be mindful of the structures involved in an equation.
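The matrix rule in Example 1 can be checked mechanically. Here is a minimal sketch, in which the function name `product_shape` and the (rows, columns) tuple representation are my own assumptions for illustration:

```python
def product_shape(shape_a, shape_b):
    """Return the shape of the product AB, or raise if it is not defined.

    Shapes are (rows, columns) tuples.
    """
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    if cols_a != rows_b:
        raise ValueError(
            f"cannot multiply {shape_a} by {shape_b}: "
            f"{cols_a} columns versus {rows_b} rows"
        )
    return (rows_a, cols_b)

print(product_shape((2, 3), (3, 4)))  # (2, 4)

try:
    product_shape((2, 3), (2, 3))
except ValueError as err:
    print("structurally inconsistent:", err)
```

The check never looks at the matrix entries, only at the structure, which is the point of the example.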

Now recall lengths, areas and volumes. The fundamental unit that characterizes these quantities is length: $L$.

Example 2 (Spatial Dimensions) Given a rectangular box with sides of length $a$, $b$, $c$, the volume is $V_B = a \times b \times c$. Since each of the sides has the dimensions of length, $[a] = [b] = [c] = L$, the volume has dimensions

$$\begin{aligned}
{}[V_B] &= [a \times b \times c] \\
&= [a] + [b] + [c] \\
&= L + L + L \\
&= 3L,
\end{aligned}$$

which we interpret as length-cubed: $L^3$. The notation $[\ ]$ is used to denote the dimensions of whatever quantity is inside the brackets. Notice also that when we were looking for the dimensions of a product of variables $[a \times b \times c]$, we added the dimensions of each variable: $[a \times b \times c] = [a] + [b] + [c] = L + L + L = 3L$. Finally, we ended up with $[V_B] = 3L$, which means that the volume $V_B$ has 3 factors of the unit length $L$ – hence the volume has dimensions of length-cubed, $L^3$ in ‘exponential notation’. Of course, we already knew this!
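The multiplication rule of Example 2 can be sketched in a few lines of Python. Representing a dimension as a dictionary of exponents (for instance `{"L": 1}` for a length) is an assumption of this sketch, as is the helper name `product`:

```python
def product(*dims):
    """[a x b x ...] = [a] + [b] + ...: exponents add under multiplication."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    return total

LENGTH = {"L": 1}                # [a] = [b] = [c] = L
sides = [LENGTH, LENGTH, LENGTH]
print(product(*sides))           # {'L': 3}
```

The result `{'L': 3}` is the logarithmic-notation statement $[V_B] = 3L$.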

Exercise 1 Use the rectangular box example to calculate the dimensions of the area of a rectangle with sides of length $a$ and $b$, given the area formula

$$A_R = ab. \tag{2.3}$$


Problem 1 (Hyperbox) Sometimes playing in a 3-dimensional world is boring – which is why 10-dimensional superstring theory was invented³. Consider now an N-dimensional hyperbox, which we shall refer to simply as an ‘N-box’. So, for example, a cube can be considered to be a 3-box. In determining the amount of material required to mass-produce N-boxes, Microsoft comes up with the following equation for the ‘hyper-volume’ of an N-box with sides of equal length $a$:

$$\mathrm{Vol}(N\text{-box}) = a^{N-1}. \tag{2.4}$$

Explain mathematically why this equation is incorrect, or (possibly) correct, on dimensional grounds.

Similarly to the multiplication rule, if we are inverting quantities we invert their units – hence: $[\frac{1}{a}] = -[a]$, $[\frac{1}{a^2}] = -[a^2] = -2[a]$, etc. Combining this with the multiplication rule, we get the division rule: $[\frac{a}{b}] = [a] - [b]$. For example, if $C$ is the concentration of whey protein in milk, it has units $ML^{-3}$ of mass over volume – hence dimensionally: $[C] = M - 3L$.
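The inversion and division rules admit the same kind of sketch. As before, the dictionary-of-exponents representation and the helper names `inverse` and `quotient` are assumptions made for illustration:

```python
def inverse(d):
    """[1/a] = -[a]: inverting a quantity negates every exponent."""
    return {unit: -power for unit, power in d.items()}

def quotient(numerator, denominator):
    """[a/b] = [a] - [b]: exponents subtract under division."""
    result = dict(numerator)
    for unit, power in denominator.items():
        result[unit] = result.get(unit, 0) - power
    # Drop units whose exponents cancelled to zero.
    return {unit: power for unit, power in result.items() if power != 0}

mass = {"M": 1}
volume = {"L": 3}
print(inverse({"L": 2}))       # {'L': -2}
print(quotient(mass, volume))  # {'M': 1, 'L': -3}
```

The last line reproduces $[C] = M - 3L$ for the concentration example.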

Now that we have done some simple exercises, let’s see how dimensional analysis can be used for error checking. Let’s say someone tells us that the volume $V_S$ of a sphere of radius $R$ is given by

$$V_S = \frac{4}{3}\pi R^2. \tag{2.5}$$

Obviously, this is wrong – but if you’ve forgotten the correct formula, there’s an easy way to see why it is wrong using dimensional analysis. First of all $[R] = L$, since radius has dimensions of length. Furthermore, $[\frac{4}{3}\pi] = 0$ since this is just a pure number (so it is dimensionless). Therefore,

$$\begin{aligned}
{}[V_S] &= [\tfrac{4}{3}\pi R^2] \\
&= [\tfrac{4}{3}\pi] + [R \times R] \\
&= [\tfrac{4}{3}\pi] + [R] + [R] \\
&= 0 + L + L \\
&= 2L.
\end{aligned}$$

But wait a minute, volume has units of length cubed, hence $[V_S] = 3L$. We then conclude by dimensional arguments that the formula $V_S = \frac{4}{3}\pi R^2$ is incorrect!
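The same error check can be automated. In the sketch below, the helper `power` and the constants `LENGTH` and `VOLUME` are hypothetical names, and dimensions are again stored as dictionaries of exponents:

```python
def power(d, n):
    """[x^n] = n[x]: raising to a power multiplies every exponent by n."""
    return {unit: p * n for unit, p in d.items()}

LENGTH = {"L": 1}
VOLUME = {"L": 3}

# The pure number 4/3 pi contributes nothing, so [4/3 pi R^2] = 2[R].
claimed = power(LENGTH, 2)
print(claimed)                     # {'L': 2}
print(claimed == VOLUME)           # False: the formula cannot be a volume
print(power(LENGTH, 3) == VOLUME)  # True: only R^3 has the right dimensions
```

Note that the check says nothing about the numerical prefactor; dimensional analysis can rule a formula out, but cannot by itself confirm the constant in front.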

Although the last example was easy, the same principles can be applied to much more complicated formulas in the mathematical sciences – indeed, they are used in research and in practice when doing estimates, checking articles or performing large derivations and calculations. Let’s do one more example.

³ Disclaimer: String Theory may have been invented for other reasons.

Example 3 Newton’s Second Law of Motion:

$$\text{Force} = \text{Mass} \times \text{Acceleration}, \tag{2.6}$$

or more generally,

$$\vec{F} = \frac{d\vec{P}}{dt}, \tag{2.7}$$

where $\vec{P}$ is the momentum of a particle of mass $m$. Newton’s Second Law of motion is the fundamental postulate that governed classical physics between the late 17th and early 20th centuries. It remains vastly important today, as the law defines what the force is for an object of mass $m$ moving with an acceleration $a$. The three fundamental units here are mass $M$, time $T$ and length $L$. Displacement $x$ has dimensions of length $L$, hence velocity $v$ – which is the rate of change of displacement⁴ – has units of length over time:

$$\begin{aligned}
{}[v] &= [\tfrac{dx}{dt}] \\
&= [dx] - [dt] \\
&= [x] - [t] \\
&= L - T,
\end{aligned} \tag{2.8}$$

hence $v$ has units $L/T$. Similarly, acceleration $a$ is the rate of change of velocity, hence

$$\begin{aligned}
{}[a] &= [\tfrac{dv}{dt}] \\
&= [dv] - [dt] \\
&= (L - T) - T \\
&= L - 2T,
\end{aligned} \tag{2.9}$$

which means $a$ has units of length over time-squared: $L/T^2$. Finally, mass $m$ trivially has units of mass: $[m] = M$ (note that here we use the capital $M$ to denote the fundamental unit of mass, whereas the lower-case $m$ is the mass variable that we insert into Newton’s 2nd Law). Therefore, force $F$ has the following dimensions

$$\begin{aligned}
{}[F] &= [m \times a] \\
&= [m] + [a] \\
&= M + L - 2T,
\end{aligned} \tag{2.10}$$

whence $F$ has units of (mass × length) / (time-squared): $ML/T^2$.
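The chain of steps in Example 3, together with the conversion from logarithmic to exponential notation, can be sketched as follows. The helper names `add`, `sub` and `to_exponential` are assumptions of this sketch rather than established notation:

```python
def add(*dims):
    """Dimensions of a product: exponents add."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    return total

def sub(a, b):
    """Dimensions of a quotient (or derivative): exponents subtract."""
    return add(a, {unit: -power for unit, power in b.items()})

def to_exponential(d):
    """Render e.g. {'L': 1, 'T': -2} as 'L T^-2' (exponential notation)."""
    parts = [unit if p == 1 else f"{unit}^{p}" for unit, p in d.items() if p != 0]
    return " ".join(parts) or "1"

x, t, m = {"L": 1}, {"T": 1}, {"M": 1}
v = sub(x, t)   # [v] = [x] - [t] = L - T
a = sub(v, t)   # [a] = [v] - [t] = L - 2T
F = add(m, a)   # [F] = [m] + [a] = M + L - 2T

print(F)                  # {'M': 1, 'L': 1, 'T': -2}
print(to_exponential(F))  # M L T^-2
```

A dimensionless quantity is rendered as `1`, mirroring $[C] = 0$ in logarithmic notation.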

Exercise 2 Use dimensional analysis to determine which of the following formulas are dimensionally inconsistent. Show your working.

1. A triangle has a base $b$ and a vertical height $h$, each with dimensions of length $L$. Check whether the following formula for its area is dimensionally consistent:

$$A = \frac{1}{2} b^2 h. \tag{2.11}$$

2. A circle has a radius $r$ with dimensions of length $L$. Its area is given by

$$A = \frac{1}{2}\pi r^2. \tag{2.12}$$

Is this dimensionally consistent? A stronger question to ask is whether this formula is correct – if not, why not?

3. A Euclidean ellipse, produced by slicing a cone, can be defined as: “the set of points such that the sum of the distances to two fixed points (the foci) is constant.” In a parallel universe, René Descartes decides to write the equation for an ellipse with its major axis coincident with the x-axis of the Cartesian plane as:

$$\frac{x^2}{a^2} + \frac{y}{b^2} = 1, \tag{2.13}$$

where the right-hand side is dimensionless.

Here $a$ is the semi-major axis length and $b$ is the semi-minor axis length. Noting that one parametrisation in polar coordinates is $x = a\cos(\theta)$, $y = b\sin(\theta)$, we can see that $x$ and $y$ have units of length. Therefore, prove that this equation is wrong on dimensional grounds. Now suggest the correct equation.

⁴ For those of you unfamiliar with the definition of velocity and acceleration in terms of calculus, you can think of $\frac{dx}{dt}$ as the change in displacement $x$ over an ‘infinitesimally small amount’ of time $dt$. Then $dx$ carries dimensions of length and $dt$ has dimensions of time: $[dx] = L$, $[dt] = T$. Note that in general, for an arbitrary quantity $y$, the ‘infinitesimal quantity’ $dy$ carries the dimensions $[dy] = [y]$.

There is one more rule of dimensional analysis, which involves analysing equations that include a sum of terms. In particular, given a quantity $A = B + C + D$, to compute the dimensions $[A]$ of $A$, we don’t just add the dimensions of $B$, $C$ and $D$:

$$[A] \neq [B] + [C] + [D], \tag{2.14}$$

but rather, we have the consistency requirement that:

$$[A] = [B] = [C] = [D]. \tag{2.15}$$

This is because $B$, $C$ and $D$ should all separately have the same units. As such, this observation is very useful for determining the dimensions of multiple unknown quantities in an equation that involves a sum of different terms. For example, the area of a toddler’s house drawing is given by $A_{\text{House}} = A_{\text{Triangle}} + A_{\text{Square}} = \frac{1}{2}bh + a^2$, where $b$ is the base length of the triangle, $h$ is its vertical height and $a$ is the length of the sides of the square. Therefore, $[A_{\text{House}}] = [A_{\text{Triangle}}] = [A_{\text{Square}}] = 2L$, hence $[\frac{1}{2}bh] = [a^2]$, which implies $[b] + [h] = 2[a] = 2L$.
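The sum rule can be sketched as a check that every term of a sum carries the same dimensions. The names `product` and `common_dimension` below are hypothetical helpers, and dimensions are once more dictionaries of exponents:

```python
def product(*dims):
    """Dimensions of a product: exponents add."""
    total = {}
    for d in dims:
        for unit, power in d.items():
            total[unit] = total.get(unit, 0) + power
    return total

def common_dimension(**terms):
    """All terms of a sum must share one dimension; return it or raise."""
    first = next(iter(terms.values()))
    for name, d in terms.items():
        if d != first:
            raise ValueError(f"term {name!r} has dimensions {d}, expected {first}")
    return first

L = {"L": 1}
triangle = product(L, L)  # [1/2 b h] = [b] + [h]
square = product(L, L)    # [a^2] = 2[a]
print(common_dimension(triangle=triangle, square=square))  # {'L': 2}

try:
    common_dimension(area=product(L, L), length=L)
except ValueError as err:
    print("inconsistent sum:", err)
```

Adding an area to a length fails the check, exactly as (2.15) demands.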

One last concept: a dimensionless constant, $C$, is defined to be a quantity which has no dimensions – hence $[C] = 0$. These are fundamentally important in the description of a physical system since they do not depend on the units you choose. Thus, in some manner they represent a ‘universal’ quantity or property – indeed, the dimensionless constants of a system describe a universality class⁵. A universality class is essentially a set of theories which have the same ‘critical behaviour’ – i.e. near a critical point (e.g. a phase transition), each theory in the same universality class will possess quantities which obey the same scaling laws.

2.1.3 Problems

To answer the following questions, try not to worry too much about terminology or new and abstract concepts. We are only interested in dimensions – so if you stay focused and don’t get distracted by the extra information, you can finish them quickly with no prerequisite knowledge!

Problem 2 A hypercube living in $d$ dimensions has $d$ sides, each with length $a$ and dimensions of length $L$. Its hyper-volume has units of $L^d$ and is given by the formula

$$V = a^d. \tag{2.16}$$

Verify that this is dimensionally consistent – i.e. show that $[V] = L + \dots + L = d \times L$. What dimensions would its surface area have? Hint: these would be the same as the dimensions of the area of one of its ‘hyper-faces’.

⁵ A more precise meaning of this statement can be found in the theory of ‘Renormalization Groups’.

Exercise 3 (Make Math, Not War) The U.S. Navy invests a significant amount of money into acoustic scattering studies for submarine detection (SONAR). As part of this research, the Dahlgren Naval Academy uses ‘prolate spheroidal harmonics’ (vibrational modes of a ‘stretched sphere’) to do fast, accurate scattering calculations. In this process, a submarine can be approximated by the shape of a ‘prolate spheroid’ or ‘rugby ball’. A prolate spheroid is essentially the surface generated by rotating an ellipse about its major axis. Given a prolate spheroid with a semi-major axis of length $a$ and a semi-minor axis of length $b$, its volume is

$$V = \frac{4\pi}{3} a b^2. \tag{2.17}$$

Is this formula dimensionally consistent? What about the following formula for the surface area (it should have units of length-squared):

$$S = 2\pi b^2 \left(1 + \frac{a}{be}\sin^{-1}(e)\right)? \tag{2.18}$$

Note, $\sin^{-1}$ is the ‘inverse sine’ or ‘arcsine’ function. It necessarily preserves dimensionality, hence $[\sin^{-1}(e)] = [e]$. The variable $e$ is the ‘eccentricity’ of the spheroid. It is a dimensionless quantity, $[e] = 0$, which measures how ‘stretched’ the spheroid is – i.e. how much it deviates from a sphere. It is given by the (dimensionally-consistent!) formula:

$$e^2 = 1 - \frac{b^2}{a^2}. \tag{2.19}$$

A perfect sphere corresponds to $e = 0$, whereas an infinitely stretched sphere corresponds to $e \to 1$.
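Once the exercise has been done by hand, one way to sanity-check the conclusion numerically is a rescaling test: if every length is multiplied by a factor $k$, a quantity with dimensions $nL$ must scale by $k^n$, and a dimensionless quantity must not change. The sketch below applies this to formulas (2.17) to (2.19); the function names and the sample values of `a`, `b` and `k` are arbitrary choices of mine, and the surface-area formula as coded requires $a > b$ so that $e > 0$.

```python
import math

def eccentricity(a, b):
    """Equation (2.19): e^2 = 1 - b^2 / a^2."""
    return math.sqrt(1 - b**2 / a**2)

def volume(a, b):
    """Equation (2.17)."""
    return 4 * math.pi / 3 * a * b**2

def surface_area(a, b):
    """Equation (2.18); assumes a > b so that e > 0."""
    e = eccentricity(a, b)
    return 2 * math.pi * b**2 * (1 + a / (b * e) * math.asin(e))

a, b, k = 2.0, 1.0, 3.0  # arbitrary sample values

# Dimensionless: unchanged when all lengths are rescaled by k.
print(math.isclose(eccentricity(k * a, k * b), eccentricity(a, b)))           # True
# Dimensions 3L: scales as k^3.
print(math.isclose(volume(k * a, k * b), k**3 * volume(a, b)))                # True
# Dimensions 2L: scales as k^2.
print(math.isclose(surface_area(k * a, k * b), k**2 * surface_area(a, b)))    # True
```

A rescaling test of this kind only detects inconsistent length dimensions; like dimensional analysis itself, it cannot confirm numerical prefactors.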

Problem 3 (Twiggy) In a parallel universe, Andrew Forrest has a dungeon with $B_F$ flawless black opals inside it. From a financial point of view, these have dimensions of money $\$$ – i.e. $[B_F] = \$$. A machine recently designed by Ian McArthur, head of physics at UWA, uses quantum fluctuations of the spacetime vacuum to produce black opals at a rate of $R_{UWA}$ black opals per minute. Sensing the loss of his monopoly on the black opal market, Andrew Forrest employs a competing physicist at Curtin University to create a quantum vacuum stabilizer. This reduces the number of black opals that Ian can produce per minute by $R_C$ black opals per minute, where $|R_C| \leq R_{UWA}$. Working on a broad concept problem, a team of first year students at St. George’s College come up with the following model to predict the value $V$ of shares in Forrest BlackOps inc. on the stock market as a function of time $t$ (time has dimensions $T$):

$$V = \frac{\beta D}{B_F} - \lambda (R_{UWA} + R_C)\, \tau D\, e^{-\lambda\left(1 - \frac{t}{\tau}\right)} \tag{2.20}$$

where the constant $\tau$ (having dimensions of time $T$) denotes⁶ the time at which the European Union is predicted to collapse. Furthermore, $D$ is a function that measures the market demand for black opals (with no dimensions) and $\beta$ is an economic constant predicted by game theory with units of money-squared: $\$^2$. Finally, $\lambda$ is a dimensionless parameter (so $[\lambda] = 0$) that depends on the number of avocados served at the college since the establishment of St. George’s Avocadoes Anonymous up to the given time $t$.

Is this model dimensionally consistent – i.e. does $[V] = \$$?

What about the following formula, proposed by students from St. Catherine’s College (who didn’t practice dimensional analysis)?

$$V = \frac{D}{B_F} - D^2 e^{-t} \tag{2.21}$$

On dimensional grounds, list two reasons why this model is incorrect.

⁶ This is the Greek letter tau – not the Roman letter t.

Problem 4 (Spacetime Surfing) Don’t worry about the physics, just keep track of dimensions and rules!

On March 17, 2014 the Harvard-Smithsonian Center for Astrophysics held a press conference indicating the discovery of gravitational waves. Gravitational waves are ripples through spacetime created by large gravitational disturbances in the cosmos – for example, exploding stars and coalescing black holes. These are predicted by Einstein’s theory of General Relativity – a theory in which gravity is a simple consequence of the geometry (shape) of spacetime. In this theory, choosing natural units for the speed of light, $c = 1$, time and spatial length become dimensionally equivalent: $T = L$. Therefore, dimensionally we have $[\text{time}] = [\text{distance}]$ and $[c] = [\text{distance}/\text{time}] = L - T = 0$. A geometry which models gravitational waves is described by the following metric (an abstract object which tells you how gravity and measures of time and length vary at each point in spacetime):

$$g = \eta + \varepsilon h \tag{2.22}$$

where $\eta$ is a flat-space metric (describing an empty universe):

$$\eta := -dt + dx \otimes dx + dy \otimes dy + dz \otimes dz \tag{2.23}$$

and $h$ is a symmetric tensor, given in de Donder gauge by

$$h := \cos(k \cdot r)\, A + \frac{1}{2} \times \mathrm{trace}(h) \times \eta. \tag{2.24}$$

Here $\varepsilon$ is a small ($\varepsilon \ll 1$) dimensionless parameter, $[\varepsilon] = 0$, and $A$ is a symmetric tensor field with dimensions of length-squared: $[A] = 2L$. Note that the trace operation turns tensors into scalars, so it removes the dimensionality of a tensor: $[\mathrm{trace}(h)] = 0$. Furthermore, consider $\cdot$ as another form of multiplication. Since the wave vector $k$ and position vector $r$ have inverse units, we have $[k] = -L$, $[r] = +L$ – hence $[k \cdot r] = 0$. For the purposes of dimensional analysis, we can treat the tensor product $\otimes$ as ordinary multiplication also. The differential quantities have the following dimensions: $[dt] = [dx] = [dy] = [dz] = L$, hence $[dx \otimes dx] = 2[dt] = 2L$ for example. Since $x, y, z, t$ represent coordinates in spacetime, we also have $[x] = [y] = [z] = [t] = L$.

Show that the metric $g$ demonstrates a dimensionally-inconsistent solution to the Einstein field equations. Where is the error? Suggest what could be done to this metric to ‘fix’ it and give a dimensionally-consistent solution.

Remark: If you were certain that the equation for $h$ was correct, it would be unnecessary to tell you the dimensions of $A$ – you could work them out, since you already know $[\cos(k \cdot r)] = 0$ (the function cos(something) is necessarily dimensionless). Therefore, pretending $[A]$ is unknown, prove that $[A] = 2L$ given all the other information.


After completing the last few problems, one should realize that much time can be saved by ignoring most of the information and concentrating only on the dimensions of the variables and constants in the given formulas. This is true in general! Therefore, to do dimensional analysis, one need not necessarily understand the science or mathematics behind an equation – but simply the dimensions of the quantities involved. It is thus an easy way to show when something is wrong without knowing what you are talking about⁷.

Problem 5 (Super-Fluffy Super-Symmetric Tensors) Despite successful ‘solar-system tests’ of Einstein’s theory of gravity, it has severe shortcomings. Onefundamental issue with Einstein’s theory is that it is not consistent with quantumtheory (which we know is extremely accurate on short-distance scales) – leadingto problems such as the ‘cosmological constant problem’. Another problem isthat it predicts singularities where the laws of physics breakdown. To rectifyEinstein’s theory, many physicists have attempted to unify gravity with quantummechanics over the last century. As it turns out, creating a theory of quantumgravity presents immense mathematical and experimental obstacles.

One approach to understanding quantum gravity is to consider supersymmetric theories containing particles of ‘higher spin’. Such theories are conjectured to reduce back to superstring theory when some symmetry is broken. If so, such theories (when constructed) will lead to a greater understanding of the global structure of string theory – for example, dualities. To construct a supersymmetric theory with massive higher spin particles, one must find a geometric object called the ‘Super-Cotton tensor’ – this describes the conformal 8 (‘shapes and angles’) structure of spacetime. To help find this tensor, $W_{\alpha(2s)}$, we know that it has the following dimensional form:

$$[W] = [(D\bar{D})^{2s+1}]\,H, \tag{2.25}$$

where $H_{\alpha(2s)}$ is the gravitational superfield (when s = 1) and $D_\alpha$ and $\bar{D}_\beta$ are ‘spinor-derivative’ operators.

Roughly speaking, along the lines of Roger Penrose, one can think of a (massless) spinor as the ‘square root’ of some vector field. Therefore, the square of a spinor must have the same dimensionality as a partial derivative:

$$[D^2] = [\bar{D}^2] = \left[\frac{\partial}{\partial x}\right]. \tag{2.26}$$

Furthermore, analysis of non-minimal and type (1,1) supergravity actions leads us to conclude that:

$$[H] = -\frac{1}{2}M, \tag{2.27}$$

where M is the unit mass. Note that in natural units with the speed of light c = 1 (together with ħ = 1), the unit of mass and unit of length are inverses of each other:

M = −L, (2.28)

or $M = L^{-1}$ in exponential notation. Finally, note that a differential 1-form dx can be thought of as a differential length element, hence has units of length:

[dx] = L. (2.29)

7 Dimensional analysis would have saved the present author about 100 hours of supergravity calculations – time which was largely lost due to two dimensionally-inconsistent equations in a published journal article.

8 Conformal symmetries are ones that preserve relative angles, but not lengths. For example, a scaling transformation is a conformal transformation.


Since $\frac{\partial}{\partial x}$ can be thought of as a rate of change with respect to length, it must have the inverse units of dx.

I: With this information, deduce both the mass dimensions and length dimensions of the Super-Cotton tensor W.

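II: Using the flat superspace anti-commutation relation

$$\{D_\alpha, \bar{D}_\beta\} := D_\alpha \bar{D}_\beta + \bar{D}_\beta D_\alpha = \partial_{\alpha\beta}, \tag{2.30}$$

where $\partial_{\alpha\beta} \propto \sum_{a=1}^{3} \sigma^a_{\alpha\beta}\,\partial_a$ is the spinor form of the vector derivative $\partial_a := \frac{\partial}{\partial x^a}$, derive the relation:

$$[D] = \frac{1}{2}\left[\frac{\partial}{\partial x^a}\right] \tag{2.31}$$

which was assumed in part I.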

Note that you may assume $[D] = [\bar{D}]$ and that the Clifford algebra generator $\sigma^a$ is dimensionless. Also, note that the superscript ‘a’ in $x^a$ is simply a coordinate label. For a 3-dimensional manifold, a = 1, 2, 3 so we have local coordinates $x^1, x^2, x^3$ to keep track of points in space.

2.1.4 Moral of the story

Dimensional analysis can tell you when an equation is wrong, but it cannot tell you that an equation is correct – an equation with consistent dimensions may still be wrong. As a student, you should make use of dimensional analysis whenever you can – try it on all formulas you get which have dimensionful quantities. This will help you to gain a strong intuition of whether or not statements and equations are sensible and consistent. This helps you to be a fast calculator and it will also help you to pick up errors in your lecture notes ...


2.2 Dimensionless Constants and Fundamental Laws

2.2.1 Physical Systems, Fundamental Laws

One of the key concepts in dimensional analysis is that of dimensionless parameters. Dimensionless parameters are important, because they allow you to characterise both physical and theoretical mathematical systems in a scale-invariant way. Note that mastering the following concepts and exercises requires a good understanding of the material in Session 1. For the more mathematically inclined, one of the examples and exercises illustrates how to mathematically prove the π theorem by using the rank-nullity theorem from linear algebra – this is a good exercise for understanding matrix equations and the correspondence between matrices and simultaneous equations.

In terms of applications, we will use dimensional analysis to reach a deeper understanding of simple harmonic motion, viscous fluids, electromagnetism and Einstein’s theory of gravity.

Definition 3 (Physical Systems) A physical system in the mathematical sciences typically consists of:

1. A set of physical parameters.

2. A set of governing equations (fundamental laws) which describe the behaviour and evolution of the system.

3. A set of fundamental ‘units’ which describe the dimensionality of the system.

Definition 4 (Fundamental Law) A fundamental law is a principle, mathematical statement or an axiom used to describe a system, which cannot be derived from any other principles, equations or axioms.

In this sense, we can view the ‘fundamental laws’ of the natural sciences as a more heuristic notion of ‘mathematical axioms’ (formal assumptions agreed upon by utility and sensibility).

Arguably, the notion of a ‘fundamental law’ is relative to the context of one’s analysis. For example, many classical laws, previously considered to be fundamental, are a macroscopic consequence of quantum dynamics or statistical mechanics. However, if we are only looking for classical effects in our analysis, we may often ignore the quantum mechanical details and treat our classical laws as ‘fundamental’. The goal of the natural sciences – including mathematics – is to reduce the number of ‘fundamental laws’ of nature to a minimum. In this sense, one is able to capture nature in the ‘simplest way possible’ 9. In this manner, perhaps the largest and longest standing goal of theoretical physics is to construct and experimentally test a full theory of quantum gravity.

Fundamental laws often go hand-in-hand with one or more special ‘dimensionless constants’ that capture information and deep insights about the mathematics and physics of systems governed by that law. We shall now investigate such examples.

9 Compare with ‘Occam’s Razor’.


2.2.2 Examples and Problems

Example 4 Let’s take a simple, but profound 10 example – the simple harmonic oscillator. One example of a simple harmonic oscillator is a mass placed on a frictionless tabletop attached to a spring. This spring is either stretched or compressed, then released so that the mass proceeds to undergo simple harmonic motion. This physical system is therefore described by

1. A set of 4 physical parameters: the spring constant κ, the mass m, and the initial position $x_0$ and initial velocity $v_0$ of the mass.

2. An equation of motion called ‘Hooke’s Law’ 11, which says that when you stretch or compress the spring, the force acting to restore the spring to its natural length is given by:

F = −κx (2.32)

where x is the displacement of the mass attached to the spring. Combining this with Newton’s 2nd Law, F = ma, we get the equation of motion for the spring:

$$m\frac{d^2x}{dt^2} = -\kappa x, \tag{2.33}$$

where $a = \frac{d^2x}{dt^2}$ is the acceleration of the mass.

3. A set of 3 physical units: mass M, time T, length L (usually kilograms, seconds, metres).

Ignoring microscopic and non-linear effects, we may naively view ‘Hooke’s law’ as a fundamental law (or definition) concerning systems in simple harmonic motion. Therefore, it must have some special ‘dimensionless constant’ attached to it.

From the 4 parameters and 3 physical units in the simple harmonic oscillator system, the Pi theorem claims that we can form one dimensionless constant. To do this, one needs to know the dimensions of the parameters involved. Clearly initial displacement has dimensions of length and initial velocity has dimensions of length/time: $[x_0] = L$, $[v_0] = L - T$. To work out the dimensions of the spring constant κ, we inspect the equation of motion. Since acceleration has dimensions of length over time-squared, we have $\left[\frac{d^2x}{dt^2}\right] = L - 2T$. Therefore, we have

$$\begin{aligned}
\left[m\frac{d^2x}{dt^2}\right] &= [-\kappa x] \implies \\
[m] + \left[\frac{d^2x}{dt^2}\right] &= [\kappa] + [x] \implies \\
M + L - 2T &= [\kappa] + L \implies \\
[\kappa] &= M - 2T.
\end{aligned} \tag{2.34}$$

10 Despite its simplicity, the (quantum) harmonic oscillator is the cornerstone for modern quantum field theory and particle physics. In this picture, a quantum field is an infinite continuum of simple harmonic oscillators, whose motion is captured by Fourier theory, Lie algebras and Special Relativity.

11 After the famous pirate, Captain Robert Hooke.

Note that the mathematical symbol ‘$\implies$’ means ‘implies’. Now that we have the dimensions of all parameters in this system, we can form a dimensionless product. In particular, we need one inverse mass factor and two factors of time to cancel the dimensions in $[\kappa] = M - 2T$. We can get an inverse unit of mass from $\left[\frac{1}{m}\right] = -M$ and two units of time by combining $[x_0] = L$ and $[v_0] = L - T$. In particular, $\left[\left(\frac{x_0}{v_0}\right)^2\right] = 2[x_0] - 2[v_0] = 2L - 2(L - T) = 2T$. Hence, we get the dimensionless constant:

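$$\begin{aligned}
G &:= \frac{\kappa}{m}\left(\frac{x_0}{v_0}\right)^2 \implies \\
[G] &= \left[\frac{\kappa}{m}\left(\frac{x_0}{v_0}\right)^2\right] \\
&= [\kappa] - [m] + 2\left([x_0] - [v_0]\right) \\
&= M - 2T - M + 2T = 0.
\end{aligned} \tag{2.35}$$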

Since the constant G has no formal name, we will claim it and call it the ‘Georgian Constant’ after St. George – the patron saint of dimensional analysis.
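The bookkeeping in this example can also be checked mechanically. The following is a minimal sketch, not part of the original notes: it stores each dimension in the additive notation as a tuple of (M, L, T) coefficients, so that multiplying quantities adds tuples and taking powers scales them. The helper names `add`, `sub` and `scale` are made up for illustration.

```python
# Dimensions in additive notation: (a, b, c) stands for aM + bL + cT.
M = (1, 0, 0)
L = (0, 1, 0)
T = (0, 0, 1)

def add(p, q):
    """Dimensions of a product: [xy] = [x] + [y]."""
    return tuple(a + b for a, b in zip(p, q))

def sub(p, q):
    """Dimensions of a quotient: [x/y] = [x] - [y]."""
    return tuple(a - b for a, b in zip(p, q))

def scale(n, p):
    """Dimensions of a power: [x^n] = n[x]."""
    return tuple(n * a for a in p)

# Known dimensions of the parameters.
dim_m = M
dim_x0 = L
dim_v0 = sub(L, T)               # L - T
dim_accel = sub(L, scale(2, T))  # L - 2T

# Hooke's law with Newton's 2nd law: [m] + [a] = [kappa] + [x].
dim_kappa = sub(add(dim_m, dim_accel), L)

# Georgian constant G = (kappa / m) * (x0 / v0)^2.
dim_G = add(sub(dim_kappa, dim_m), scale(2, sub(dim_x0, dim_v0)))

print("[kappa] =", dim_kappa)  # coefficients of (M, L, T)
print("[G]     =", dim_G)
```

A tuple of zeros for `dim_G` is the additive-notation statement that G is dimensionless.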

The last example illustrated a few important concepts. First of all, we showed that mathematically all the information about a physical system is given by a set of parameters, a set of physical units (corresponding to the ‘dimensions’) and at least one governing equation. Second, we showed how we can calculate the units of an otherwise unknown constant by using dimensional analysis – this is how we found the dimensions of the spring constant κ.

Finally, we showed in this particular case, having 4 parameters and 3 physical units, we were able to form one dimensionless constant: G. Although we could have taken any multiple or power of this constant and still arrived at a dimensionless quantity, there is essentially only one independent product that we can form out of the parameters in the simple harmonic oscillator. This is because G, $\frac{1}{G}$, $G^2$ or 2G for example, all contain the same ‘information’.

The last observation is one example of the ‘fundamental theorem of dimensional analysis’, also known as the ‘π theorem’.

Theorem 2 (Buckingham Pi Theorem) Given a system specified by n independent parameters and k different physical units, there are exactly n − k independent dimensionless constants which can be formed by taking products of powers of the parameters.

Thus in the last example, we saw that the simple harmonic oscillator was described by 4 parameters and 3 physical units – hence as claimed, there was indeed only 4 − 3 = 1 independent dimensionless constant that we could have formed. Hence, any other dimensionless constant in this system must be some multiple or some power of G. Before doing the exercises, here is one more example from fluid mechanics.

Example 5 In fluid mechanics, the notion of the ‘thickness’ of a fluid is formalized by defining its ‘viscosity’. In particular, the dynamic or shear viscosity of a fluid measures its ability to resist ‘shearing’ – an effect where successive layers of the fluid move in the same direction but with different speeds. For example, relative to water, glass 12 and honey have a very high shear viscosity, whereas superfluid Helium has zero viscosity 13.

Given a fluid trapped between two parallel plates – the bottom plate being stationary and the top plate moving with velocity v parallel to the stationary plate, the magnitude of the force required to keep the top plate moving at constant velocity is given by:

$$F = \eta A \frac{v}{y} \tag{2.36}$$

12 The apparent sagging of old church windows is not due to glass flowing like a viscous liquid, but rather due to the glass-making techniques of past centuries.

13 The transition to the ‘superfluid’ phase of helium-4 occurs below about 2.17 Kelvin – i.e. close to absolute zero temperature.


Here v is the speed (magnitude of the velocity) of the top plate, A is its surface area and y is the separation distance between the plates. The parameter η is defined to be the shear viscosity of the fluid. We can calculate its units using dimensional analysis. First, from Newton’s 2nd law we know that the force has the dimensions: [F] = M + L − 2T. Furthermore, the area A has dimensions of length-squared [A] = 2L, the speed v has dimensions [v] = L − T and the separation y has dimensions [y] = L. Hence

$$\begin{aligned}
[F] &= [\eta] + [A] + [v] - [y] \implies \\
[\eta] &= [F] - [A] - [v] + [y] \\
&= (M + L - 2T) - 2L - (L - T) + L \\
&= M - L - T
\end{aligned} \tag{2.37}$$

whence η has units of $\frac{M}{LT}$. Now, the kinematic viscosity ν 14 of the fluid is defined as the ratio of the dynamic viscosity η and the density ρ (mass per volume) of the fluid:

$$\nu = \frac{\eta}{\rho}. \tag{2.38}$$

Since density has units of mass per length-cubed, we have [ρ] = M − 3L and thus

$$[\nu] = \left[\frac{\eta}{\rho}\right] = [\eta] - [\rho] = M - L - T - (M - 3L) = 2L - T. \tag{2.39}$$

In some set of scenarios, we can think of this fluid system as described by four independent parameters: the density ρ, the shear viscosity η, the fluid speed v (assuming the fluid only travels in the horizontal direction) and a characteristic length scale l. The kinematic viscosity ν = η/ρ is not an independent parameter, since it is fixed by η and ρ. Since we have three different physical units – mass, length and time, the Pi theorem tells us we can form one independent dimensionless constant. This special, widely-used constant is called the ‘Reynolds number’ of the fluid and is defined by:

$$R = \frac{\rho v l}{\eta} = \frac{l v}{\nu} \tag{2.40}$$

where l is the ‘characteristic length scale’ for the fluid system (e.g. for a fluid flowing in a pipe, this length scale would be the diameter of the pipe).

In essence, the Reynolds number expresses the ratio of inertial forces to the viscous forces. In this manner, it describes the relative importance of these two types of forces in different scenarios. Since it is dimensionless, the Reynolds number is scale invariant – meaning it characterises the way a fluid will flow on all length scales (within the valid regime of your theory).

Exercise 4 We defined the Reynolds number R in two ways – one in terms of its dynamic viscosity η and the other in terms of its kinematic viscosity ν. Show that the Reynolds number is dimensionless using both of its definitions.

The Reynolds number also controls the amount of ‘turbulence’ present in a fluid system – with high Reynolds numbers corresponding to turbulent flow. Therefore, by evolving the dimensionless Reynolds number from low values to high values, we will see a laminar flow turn into one with instabilities, vortices and chaos.
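To get a feel for the sizes involved, here is a small numerical sketch of the two forms of the Reynolds number in (2.40). The input values are illustrative assumptions only (roughly water at room temperature flowing through a 5 cm pipe), and the function names are made up for this example.

```python
def reynolds_dynamic(rho, v, l, eta):
    """Reynolds number R = rho * v * l / eta (all inputs in SI units)."""
    return rho * v * l / eta

def reynolds_kinematic(v, l, nu):
    """Equivalent form R = l * v / nu, where nu = eta / rho."""
    return l * v / nu

# Illustrative values only (assumed, not taken from the notes).
rho = 1000.0   # density, kg / m^3
eta = 1.0e-3   # dynamic (shear) viscosity, kg / (m s)
v = 1.0        # flow speed, m / s
l = 0.05       # characteristic length (pipe diameter), m

nu = eta / rho  # kinematic viscosity, m^2 / s

R_dynamic = reynolds_dynamic(rho, v, l, eta)
R_kinematic = reynolds_kinematic(v, l, nu)
print(R_dynamic, R_kinematic)
```

Both forms give the same number, of order 10^4 for these inputs, because ν is defined as η/ρ. Changing the units of all inputs consistently (say, to centimetres and grams) would leave R unchanged, which is what being dimensionless means in practice.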

14 This is the Greek letter ‘nu’ – not the Roman letter ‘v’.


2.2.3 Buckingham Pi-Theorem

Formally, the rank-nullity theorem states that given an m × n matrix A (m rows, n columns), which maps n-dimensional vectors in $\mathbb{R}^n$ to m-dimensional vectors in $\mathbb{R}^m$, the rank and nullity of the matrix A satisfy:

rank(A) + nullity(A) = n (2.41)

where the rank of A is defined as the number of linearly independent row vectors (or column vectors) of A and the nullity of A is defined as the dimension of the kernel of A – i.e. the number of linearly independent n-dimensional vectors which get mapped to 0 by A. Note that in the cases of interest here we have m ≤ n (otherwise the system is over-determined).

In the context of dimensional analysis and the π Theorem, we can think of a mathematical or physical system with n parameters and k different types of fundamental units (dimensions) as a system of n linear equations (one for each parameter) in k variables (the units). In particular, we make use of the additive or ‘logarithmic’ notation which we have been using for dimensional analysis.

Problem 6 (π Day, π theorem) On 14/03/2015, Tibra Ali decides to have a Pi battle at the Perimeter Institute for Theoretical Physics. On the same day, Angela Burvill and Joshua Bailey decide to have a Pi eating contest – where each student has to eat one frozen meat pie for each correct digit of Pi that the other recites. However, realizing that becoming a mathematician requires thousands of hours of diligence, William decides to prove the ‘Buckingham Pi theorem’ using the ‘rank-nullity’ theorem from linear algebra – which he remembers from last semester! To up the stakes, Ben Luo decides to dangle Rowan Seton from the top of the college tower till William proves his theorem.

Assuming Ben has finite strength, save Rowan by proving the Pi theorem with William.

Hint for proof: For the above problem, note that we can view a system with parameters $\chi_1, ..., \chi_n$, fundamental units $u_1, ..., u_k$ and the following dimensions for the parameters:

$$\begin{aligned}
[\chi_1] &= \lambda_{11}u_1 + \dots + \lambda_{1k}u_k \\
[\chi_2] &= \lambda_{21}u_1 + \dots + \lambda_{2k}u_k \\
&\;\;\vdots \\
[\chi_n] &= \lambda_{n1}u_1 + \dots + \lambda_{nk}u_k,
\end{aligned} \tag{2.42}$$

as a system of n linear equations in k variables. One can now apply the rank-nullity theorem to this system to show that the number of dimensionless constants which one can form from the corresponding physical system should be equal to the nullity of the k × n ‘dimensional matrix’ formed by the coefficients $\lambda_{ij}$, with one column for each parameter i = 1, ..., n and one row for each unit j = 1, ..., k.

Example 6 As a simple example of the algebra required, say we have a system with three parameters x, y, z and two fundamental physical units $U_1, U_2$. We can represent the dimensions of our parameters as a matrix by letting each column correspond to different parameters and letting each row correspond to different fundamental units. Therefore, we let the first column correspond to the parameter x, the second column to y and the third column to z.

The first row corresponds to the unit $U_1$ and the second row to the unit $U_2$. Then the entry in the first row and column corresponds to the number of dimensions x has in the unit $U_1$. So if for example, x has the units $U_1^a U_2^b$ then it has dimensions: $[x] = [U_1^a] + [U_2^b] = aU_1 + bU_2$. Similarly, let y have units $U_1^c U_2^d$ and z have units $U_1^e U_2^f$: hence $[y] = cU_1 + dU_2$ and $[z] = eU_1 + fU_2$.

Forming the ‘dimensional matrix’ D for this physical system, we have:

$$D = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \tag{2.43}$$

To see that this makes sense, we can simply act 15 the transpose of the dimensional matrix $D^T$ on the vector $U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}$ containing the physical units to recover all three of our dimensional equations $[x] = aU_1 + bU_2$, $[y] = cU_1 + dU_2$ etc. To find dimensionless constants, we have to solve the ‘nullspace equation’:

$$\begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

for all possible vectors $(\alpha, \beta, \gamma)^T$. In particular, dimensionless constants will be a product of powers of the different physical parameters: $x^\alpha y^\beta z^\gamma$, where the exponents α, β, γ are components of a vector $(\alpha, \beta, \gamma)^T$ which solves the nullspace equation.

The number of linearly independent vectors $(\alpha, \beta, \gamma)^T$ which solve the null-space matrix equation coincides with the ‘nullity’ of the dimensional matrix D – it is precisely equal to the number of dimensionless constants we can form. In particular, we have n = 3 independent physical parameters x, y, z corresponding to three columns of our dimensional matrix D and k = 2 fundamental units $U_1, U_2$ corresponding to the two (linearly-independent 16) rows of D. The rank of D is therefore k = 2, and the rank-nullity theorem tells us that the nullity of D is given by

nullity(D) = n − k = 3 − 2 = 1. (2.44)

Since the nullity of D is precisely equal to the number of dimensionless constants we can form for this physical system, this shows that the π Theorem for dimensional analysis is just a special instance of the rank-nullity theorem for linear algebra.
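The nullspace calculation described in this example can be carried out by a computer. Below is a minimal sketch using only exact fractions from the Python standard library; it is applied to the simple harmonic oscillator of Example 4 rather than to the abstract parameters x, y, z. The function names `rref` and `nullspace` are made up for illustration, and the column ordering (κ, m, x0, v0) is an arbitrary choice.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form with exact fractions; returns (matrix, pivot columns)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(A[0])):
        if r == len(A):
            break
        # Look for a non-zero entry in column c, at or below row r.
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column: it is a free column
        A[r], A[p] = A[p], A[r]
        pivot = A[r][c]
        A[r] = [x / pivot for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots

def nullspace(rows):
    """Basis of the null space: one vector for each free (non-pivot) column."""
    A, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in range(n):
        if free in pivots:
            continue
        vec = [Fraction(0)] * n
        vec[free] = Fraction(1)
        for row, col in enumerate(pivots):
            vec[col] = -A[row][free]
        basis.append(vec)
    return basis

# Dimensional matrix of the simple harmonic oscillator.
# Columns: kappa, m, x0, v0.  Rows: M, L, T.
D = [
    [1, 1, 0, 0],    # mass
    [0, 0, 1, 1],    # length
    [-2, 0, 0, -1],  # time
]

_, pivots = rref(D)
rank = len(pivots)
basis = nullspace(D)
print("rank =", rank, " nullity =", len(basis))

# Any multiple of a null vector is again a null vector; rescale for readability.
exponents = [int(-2 * x) for x in basis[0]]
print("exponents of (kappa, m, x0, v0):", exponents)
```

The single null vector has exponents (1, −1, 2, −2), i.e. the product κ m⁻¹ x0² v0⁻², which is the constant G found by hand in Example 4. The rank 3 and nullity 1 add up to the n = 4 columns, as the rank-nullity theorem requires.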

2.2.4 Gravity, The Hierarchy Problem and Extra-Dimensional Braneworlds

The following is an extended set of exercises which test all the skills the tutorials have elucidated so far in dimensional analysis. It will also introduce you to some concepts which may be new and bizarre, whilst linking them back to everyday reality. The overall goal will be to derive a dimensionless constant that characterises classical gravity on all length scales (no knowledge of relativity is required)! By comparing this constant to another dimensionless constant from electromagnetism, we will see how much weaker gravity is than the other three forces in nature – then investigate a solution to this peculiarity using brane-world models of the universe.

15 By matrix multiplication.

16 These rows are necessarily linearly independent, since we assume our fundamental physical units to be independent – by definition.


As far as we understand, all interactions in nature take place through four fundamental forces. At present, we have a rather ‘successful’ theoretical and experimental quantum description of three of these forces – that is, we have constructed quantum field theories to describe the ‘quanta’ (particles) which mediate these forces. Gravity, despite our everyday experience of it, remains somewhat mysterious and theoretically elusive in several ways – in particular, because it is highly resistant to all attempts to turn it into a quantum theory like the other forces. If there is really any hope of long-distance interstellar space travel and other extreme ‘sci-fi’ technologies, a theory of quantum gravity will be the cornerstone.

As a reminder, the four forces dictating our universe are the

• Electromagnetic Force: Which governs electromagnetic radiation (such as light) as well as interactions between charged particles. In the quantum description (Quantum Electrodynamics), this force is carried by massless particles known as ‘photons’.

• Weak Nuclear Force: In the quantum description, this force is mediated by massive particles known as the Z and W± bosons. It is involved in quark transformations as well as some interactions between charged particles.

• Strong Nuclear Force: In the quantum description (Quantum Chromodynamics), this force is mediated by ‘gluons’ and is responsible for the interactions between quarks, which are the particles making up hadrons such as the proton and neutron. In this manner, it is responsible for processes such as fusion, which is the source of energy for our sun.

• Gravitational Force: In the attempted quantum descriptions, this force is mediated by a massless particle known as the ‘graviton’. It is responsible for the interactions of all particles with mass, but also determines the trajectories of massless particles (e.g. gravitational bending of light) since it warps the spacetime continuum.

At higher energies, these four forces are thought to start to unify into one single force – for example, the electromagnetic and weak nuclear forces unify to make the electroweak force. Attempts to unify the electroweak and strong nuclear forces have been partially successful and fall under ‘The Standard Model’ of particle physics. On the other hand, attempts to unify gravity with the other forces have been largely unsuccessful, with the only real promising candidate being String Theory.

One of the biggest mysteries about the gravitational force is why it is so weak compared to the other forces in nature. In some sense this is ‘unnatural’, hence suggests that on some deeper level, gravity is fundamentally different from the other forces. As the goal of this tute, we will use dimensional analysis to characterise the gravitational and electromagnetic forces with some special dimensionless constants – then compare their strengths to prove this claim. Finally, we will end on some very recent 17 advancements in theoretical physics which propose an explanation of why gravity is the weakest of the four forces.

17 The last 5-10 years.

Exercise 5 (Newton, Einstein and Braneworlds: The Gravitational Coupling Constant) Of the many things that Isaac Newton is famous for, one of them is coming up with multiple mathematical proofs of the fact that the planets orbit the sun in elliptical paths – and that this elliptical motion is a direct consequence of an inverse square law. Thus, by planar geometry and calculus he came up with the following gravitational force law to explain the astronomical observations of Johannes Kepler and Tycho Brahe:

$$\vec{F} = -\frac{G_N m_1 m_2}{r^2}\,\hat{r} \tag{2.45}$$

where $G_N$ is Newton’s gravitational constant, $m_1$ and $m_2$ are the masses of two objects separated by a distance r and $\hat{r}$ is a ‘unit vector’ (vector with magnitude 1) pointing from one object to the other. This tells us the gravitational force that one massive object exerts on another massive object.

QI: Using Newton’s 2nd Law, $\vec{F} = m\vec{a}$, deduce the dimensions or units of $G_N$. Note that you are working with mass, length and time (M, L, T) as your fundamental units, hence $[m_1] = [m_2] = M$. Furthermore, by definition the unit vector 18 $\hat{r} = \frac{\vec{r}_2 - \vec{r}_1}{|\vec{r}_2 - \vec{r}_1|}$ is dimensionless: $[\hat{r}] = 0$. Note that in general, the dimensions or units of a vector quantity are always the same as the units of the magnitude (and components) of that vector – hence $[\vec{r}] = [r]$ for example.

Now that we have the dimensions of $G_N$, we are ready to consider Einstein’s theory of gravitation. Einstein’s theory differs from Newton’s theory in many ways – fundamentally it explains gravity as a consequence of spacetime curving around any object with mass, where the ‘amount’ of curvature is greater for greater masses (e.g. the Sun). On an astrophysical level, it is important as it helps to explain the big bang, solar fusion and the existence of black holes – objects which are necessary for the stability of some galaxies such as the Milky Way. In terms of everyday living, general relativity is essential for the operation of GPS satellites – without the gravitational corrections to the timing (gravitational time-dilation) offered by Einstein’s theory, the GPS system would not be accurate enough to work.

In Einstein’s theory, spacetime is modelled by the following objects 19

• An energy-momentum tensor $\mathbf{T}$ which contains information about ‘sources’ of curvature – matter and energy. Its components have dimensions of an energy-density: $[T_{ab}] = \left[\frac{\text{Energy}}{\text{Volume}}\right] = M - L - 2T$. Since the tensor itself is a second-rank covariant tensor, we have: $[\mathbf{T}] = [T_{ab}\,dx^a \otimes dx^b] = [T_{ab}] + [dx^a \otimes dx^b] = M - L - 2T + 2L = M + L - 2T$.

Note that the dimensionality of energy can be deduced from the relation: Work = Force × Distance and hence [Energy] = [Work] = [Force] + [Distance] = M + L − 2T + L = M + 2L − 2T.

• A metric tensor g describing how gravity distorts measures of length and time. This has units of length-squared: [g] = 2L.

• The Riemann Curvature tensor, Riem, describes how the curvature of spacetime varies in different regions. It also measures how gravity distorts parallel-transport. It is given roughly 20 as the anti-symmetrized second tensor ‘gradient’ of the metric: Riem ∼ ∇ ⊗ ∇ ⊗ g, where ∇ is a type of derivative operator and ⊗ is a type of multiplication for tensors.

18 Here $\vec{r}_1$ and $\vec{r}_2$ are the position vectors describing the location of the masses $m_1$ and $m_2$ with respect to some origin.

19 Note that most physicists do not understand differential geometry, hence when they speak of tensors they usually are talking about components of tensors. This won’t matter here, but for reference, if you ever want to compare: covariant tensors have two extra factors of length compared to their components and contravariant tensors have two factors less than their components – which basically means adding ±2L to the dimensions.

20 Don’t ever show this to a differential geometer. If you want the real definition, see me.


• The Ricci tensor, Ric, is given by taking the trace of the Riemann tensor: Ric = Trace(Riem). It describes how gravity distorts volumes and is also related to how different geometries evolve under the heat equation.

• The Ricci Scalar R – this quantity is a function which measures how gravity locally distorts volumes. Einstein’s theory can be derived by saying that nature minimizes this quantity – an approach due to a mathematician named David Hilbert 21. It is given by taking the trace of the Riemann tensor twice: R = Trace(Trace(Riem)) = Trace(Ric).

QII: Using the above information, derive the dimensions of Newton’s gravitational constant $G_N$ again, this time using Einstein’s law of gravity:

$$\mathrm{Ric} - \frac{1}{2}R\,g = \frac{8\pi G_N}{c^4}\,\mathbf{T}. \tag{2.46}$$

You will need the following facts: the derivative operator ∇ reduces the length dimension of a tensor by one factor, whereas the tensor product ⊗ raises it by one factor (in this case). Hence [Riem] = 2[∇] + 2[⊗] + [g] = −2L + 2L + 2L = 2L. Furthermore, the trace of a (covariant) tensor reduces its length dimension by two factors, hence for example: [Trace(Riem)] = [Riem] − 2L.

Tip: To ease calculations, you may use so-called ‘natural units’ where the speed of light c = 1. In these units length and time have the same dimensionality, hence [c] = [Distance] − [Time] = 0 and T = L. You will then get the dimensions of $G_N$ in natural units which you can compare to your value of $[G_N]$ using Newton’s Law, after you set T = L.

Finally, we are in a position to understand a very special dimensionless constant – the ‘gravitational coupling constant’, $\alpha_G$. Since it is dimensionless, this constant characterises the strength of the gravitational force on all length scales (within the regime of validity of Einstein’s theory). It can be defined in terms of any pair of stable elementary particles – in practice, we use the electron.

In particular, we have:

$$\alpha_G = \frac{G_N m_e^2}{\hbar c} \approx 1.7518 \times 10^{-45} \tag{2.47}$$

where c is the speed of light, $G_N$ is Newton’s gravitational constant and $m_e$ is the mass of an electron. The quantity $\hbar = \frac{h}{2\pi}$ is the reduced Planck constant which characterises the scale at which matter exhibits quantum behaviour such as wave-particle duality 22.
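The numerical value of the gravitational coupling constant can be reproduced directly from tabulated constants. The sketch below assumes the CODATA 2018 recommended values in SI units; the variable names are chosen for this example only.

```python
import math

# CODATA 2018 recommended values, SI units (assumed inputs).
G_N = 6.67430e-11       # Newton's constant, m^3 kg^-1 s^-2
m_e = 9.1093837015e-31  # electron mass, kg
h = 6.62607015e-34      # Planck constant, J s
c = 299792458.0         # speed of light, m / s

hbar = h / (2 * math.pi)  # reduced Planck constant, J s

# Gravitational coupling constant for a pair of electrons.
alpha_G = G_N * m_e**2 / (hbar * c)
print(f"alpha_G = {alpha_G:.4e}")
```

Because the combination is dimensionless, the same number comes out in any consistent system of units, provided all four constants are converted together.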

QIII: Show that the gravitational coupling constant $\alpha_G$ is indeed dimensionless. Note that $[m_e] = M$. To work out the dimensions of $\hbar = \frac{h}{2\pi}$, you will need the Planck-Einstein relation which relates the energy of a photon (particle of light) to its frequency:

E = hf. (2.48)

Then [h] = [E] − [f]. Since the frequency of light is the number of oscillations of the electromagnetic wave per unit time, we have [f] = −T. You can get the dimensions [E] of the energy E from the calculation shown above for the energy-momentum tensor.

21 In retrospect, David Hilbert deserves almost the same level of credit as Einstein for the theory of general relativity.

22 If ħ was really large – say ħ ≈ 1 for example, then we would observe wave-particle duality on a macroscopic scale and the universe would be a scary, crazy place. Bullets would diffract through doorways and Leanora’s fists could quantum tunnel through walls.


Now, for the last part of this problem, we introduce one more fundamental physical unit: the unit of electric charge, Q 23. Similar to the gravitational coupling constant, there is a dimensionless constant which characterises the strength of the electromagnetic interaction (which is responsible for almost all of chemistry) – the ‘fine structure constant’ $\alpha_{EM}$. The value of this constant is (accurately) predicted and measured using the theory of Quantum Electrodynamics, which is a type of quantum field theory largely due to Richard Feynman and Freeman Dyson. It is given by

$$\alpha_{EM} = \frac{1}{4\pi\varepsilon_0}\frac{e^2}{\hbar c} \tag{2.49}$$

where $\varepsilon_0$ is the electric permittivity of the vacuum. It has units $[\varepsilon_0]$ = [Farads/Meter] = [Seconds$^4$ Amps$^2$ Meters$^{-3}$ kg$^{-1}$] = 4T + (2Q − 2T) − 3L − M, since one Amp is one unit of charge per unit time. Hence $[\varepsilon_0]$ = 2T + 2Q − 3L − M. The parameter e is the charge of an electron, with dimensions [e] = Q.

Using ‘natural units’ – a popular convention in particle physics, we set all of our previous parameters to equal 1. Thus, $4\pi G_N = c = \hbar = \varepsilon_0 = 1$. In these units, the fine-structure constant is given by

$$\alpha_{EM} = \frac{e^2}{4\pi} \approx 7.297 \times 10^{-3}. \tag{2.50}$$

QIV: Choosing natural units: $4\pi G_N = c = \hbar = \varepsilon_0 = 1$, is the same as forcing these parameters to be dimensionless. Show that this is equivalent to setting all the fundamental units to be the same T = L = M = Q. Hint: you should get four equations for the dimensions of these parameters.

Note that you can calculate the values of the fine-structure and gravitational coupling constants yourself by looking up (e.g. Googling) the values of the physical constants involved in SI units (or any other consistent set of units you choose). Taking their ratio, we see that (in natural units):

$$\frac{\alpha_{EM}}{\alpha_G} = \left(\frac{e}{m_e}\right)^2 \approx \frac{7.297 \times 10^{-3}}{1.752 \times 10^{-45}} \approx 4.16 \times 10^{42}. \tag{2.51}$$

This says that the electromagnetic force is about 42 orders of magnitude 24 stronger than the gravitational force. In a similar fashion, the weak nuclear force is about 32 orders of magnitude ($10^{32}$ times) stronger than gravity. The challenge to explain why gravity is so weak compared to the other forces is known as ‘the hierarchy problem’.
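The comparison between the two coupling constants can be checked numerically. The following sketch again assumes CODATA 2018 values in SI units and uses variable names invented for this example; it computes both constants from scratch and then their ratio.

```python
import math

# CODATA 2018 recommended values, SI units (assumed inputs).
G_N = 6.67430e-11        # Newton's constant, m^3 kg^-1 s^-2
m_e = 9.1093837015e-31   # electron mass, kg
h = 6.62607015e-34       # Planck constant, J s
c = 299792458.0          # speed of light, m / s
e = 1.602176634e-19      # elementary charge, C
eps0 = 8.8541878128e-12  # vacuum permittivity, F / m

hbar = h / (2 * math.pi)

# Fine structure constant and gravitational coupling constant.
alpha_EM = e**2 / (4 * math.pi * eps0 * hbar * c)
alpha_G = G_N * m_e**2 / (hbar * c)

ratio = alpha_EM / alpha_G
orders_of_magnitude = math.floor(math.log10(ratio))

print(f"alpha_EM = {alpha_EM:.4e}")
print(f"alpha_G  = {alpha_G:.4e}")
print(f"ratio    = {ratio:.3e}  (about 10^{orders_of_magnitude})")
```

Note that ħ and c cancel in the ratio, so the comparison depends only on e, ε0, G_N and the electron mass.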

One class of attempts to solve the hierarchy problem involves the visible universe being confined to a 4-dimensional ‘brane’, which is basically a 4-dimensional slice living in a larger spacetime. Such models are called ‘braneworld models’. In this view, the electromagnetic, weak and strong nuclear forces take place on the 4-dimensional brane – but gravitational interactions (mediated by ‘graviton’ particles) take place in 4 dimensions and in the ‘large extra dimensions’. This then gives a natural explanation of the gravitational coupling constant being so small. In some variations 25, the introduction of large extra dimensions also solves the ‘Dark Energy’ or ‘Cosmological Constant’ problem – where Dark Energy naturally arises as the ‘surface tension’ of the 4-dimensional brane. Using braneworld models, we can derive (!) Newton’s gravitational constant directly from the size (‘hyper-volume’) of the extra dimensions in our universe.

23The SI unit for charge is Coulombs.24Note, 42 is also the meaning of life.25Those investigated in the present author’s masters thesis.


A very special class of braneworld models, known as theories with ‘Supersymmetric Large Extra Dimensions’, envisions spacetime as 6-dimensional (a 4-dimensional brane + 2 large extra dimensions) with some supersymmetry added – this enables bosons and fermions to transform into each other²⁶. In these models, the extra dimensions take the form of some compact hypersurface. Newton’s gravitational constant $G_N$ is then derived from the relation²⁷:

$$G_N = \frac{3\kappa^2}{16\pi S} \tag{2.52}$$

where $S$ is the surface-area of the extra dimensions and $\kappa$ is Einstein’s constant, with dimensions $[\kappa] = [G_N]$.

QV: The above formula for $G_N$ is correct, even though it may look dimensionally incorrect. What units would $S$ need to have for dimensional consistency? In that case, what quantity does the surface-area $S$ actually represent? Hint: Recall the ‘unit vector’ in Newton’s law of gravity.

The last problem illustrates a common theme in engineering, physics and mathematics – normalization. Normalized quantities are typically dimensionless! As such, they are very useful and friendly to work with.

²⁶ Supersymmetry removes the problem of Tachyons in String Theory and also stabilizes the mass of the Higgs boson.

²⁷ First derived in this generality by the present author in 2013.

Chapter 3

Geometry of Antiquity and The Universe

3.1 Introduction: Conic Sections

Our scientific perception of the world today is due largely to the great geometers of antiquity. Pythagoras’ theorem, for example, essentially defines the ‘straight-line’ (Euclidean) distance between two points in space – giving us Euclidean preconceptions of the world. In this manner, one of the most influential developments that the Greeks left us with is the theory of conic sections. Developed to a large extent by Apollonius and Archimedes, conic sections have provided a core staple of the framework for the scientific renaissance instigated by Galileo and Kepler – leading ultimately to Newton’s theory of gravity, the planetary orbits and a heliocentric view of the universe.

Definition 5 A traditional conic section is the curve of intersection obtained by slicing a cone with a plane. Geometrically, a general conic $C$ is a set of points whose distances to a fixed point (focus) $F$ and a fixed line (directrix) $l$ are in a constant ratio (the eccentricity) $\varepsilon$. Algebraically:

$$p \in C \iff \frac{d(p, F)}{d(p, l)} = \varepsilon, \tag{3.1}$$

where $d$ is any metric (measure of distance).

Note that in the special case of a (Euclidean) circle, the focus is at the center of the circle and the directrix is at infinity – hence the eccentricity ε = 0 for a circle.

The following problem should be re-attempted at the end of each (conic) section of this chapter.

Problem 7 (GoPro or Go Home) Frustrated by his attempts to retake Constantinople from the neo-Ottoman empire, the cyborg Emperor Constantine decides to go home. Getting into his skytaxi, which travels on fixed skylanes that permit only perpendicular turns, Constantine realises that he is travelling in an $\ell^1$ metric space – the ‘taxicab geometry’. Here the distance between two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ in $\mathbb{R}^2$ is defined by the so-called taxi-cab metric:

$$d_1(P_1, P_2) := |x_1 - x_2| + |y_1 - y_2|. \tag{3.2}$$

1. Using the geometric definition of a parabola, sketch the graph of a few parabolas with different focal lengths in the taxi-cab metric.


2. Using the geometric definition of a circle, sketch the graph of a unit circle in the taxi-cab metric.

3. Using the geometric definition of an ellipse, sketch the graph of a few ellipses – with varying eccentricity – in the taxi-cab metric.

4. Using the geometric definition of a hyperbola, sketch the graph of a few hyperbolae in the taxi-cab metric.

5. Compare the above ‘taxi-cab’ conic sections to graphs of the corresponding Euclidean conic sections.
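If you want to check a hand sketch, the focus-directrix condition can be tested point by point on a grid. The following is a minimal sketch under our own assumptions: the directrix is taken to be a horizontal line y = c (so that the taxi-cab distance to it is simply the vertical gap), and the function names are illustrative rather than anything defined in these notes.

```python
def taxicab(p, q):
    """Taxi-cab (l1) distance between points p and q in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def distance_to_horizontal_line(p, c):
    """l1 distance from p to the line y = c (closest point is directly above/below p)."""
    return abs(p[1] - c)


def conic_residual(p, focus, c, ecc):
    """d(p, F) - ecc * d(p, l); vanishes exactly when p satisfies the conic condition."""
    return taxicab(p, focus) - ecc * distance_to_horizontal_line(p, c)


def grid_points_on_conic(focus, c, ecc, half_width=4, step=0.25):
    """Grid points (multiples of `step`) that lie on the taxi-cab conic."""
    n = int(half_width / step)
    points = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            p = (i * step, j * step)
            if abs(conic_residual(p, focus, c, ecc)) < 1e-12:
                points.append(p)
    return points


# Taxi-cab parabola (eccentricity 1) with focus (0, 1) and directrix y = -1.
for point in grid_points_on_conic((0.0, 1.0), -1.0, 1.0):
    print(point)
```

Plotting the printed points gives a skeleton to compare with your own sketch; swapping `taxicab` for a Euclidean distance function gives the corresponding Euclidean conic for item 5.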

3.2 Parabolas and Geometric Optics

3.2.1 Overview

The first significant application of parabolas was in Galileo’s revolutionary projectile motion experiments. Subsequently, they served as ‘Victoria’s secret models’ for Isaac Newton’s ‘Principia Mathematica’ – in particular, in his analysis of conic sections and Kepler’s laws of planetary motion.

We shall begin by giving a general geometric definition of the parabola, then deriving the canonical (natural) equation for a Euclidean parabola in cartesian coordinates. Once this is established, we will investigate and derive a few remarkable properties of parabolas – in particular, motivated by the ‘science of light’ (optics). Finally, we will study some fun, practical applications of parabolas in regards to the natural world – geometric optics, projectile motion and parabolic orbits.

3.2.2 The Parabola

Definition 6 (Geometric Definition) A parabola is the set of points which are equidistant from a focus (fixed point) and a directrix (fixed line).

It follows from the definition that a parabola is a conic section with eccentricity ε = 1. In particular, it can be obtained by slicing a cone with a plane parallel to a plane tangent to the cone.

See Whiteboard Diagram

By now, most of you will be familiar with algebraic forms of the parabola. For example, as a rational normal curve with exponent 2 in algebraic geometry, or as the cartesian equation $y = x^2$ from Euclidean geometry. You will now derive the canonical Euclidean equation $y = x^2$ from the geometric definition.

Exercise 6 (Pachelbel’s Parabola and The Canon Equation) Having been ambushed by the ineffable weeping angels, Dr. Who is forced back to the Renaissance Era. In a severe misunderstanding, he accidentally swaps Pachelbel’s Canon for a derivation of the canonical Euclidean parabola equation – which he needs to return to his own timeline. Trying to make sense of the parabola, Pachelbel decides to invent the Cartesian coordinate system so he can graph this technology of the ‘future’.

Help Pachelbel by deriving the canonical parabola equation, while sketching every step clearly.


1. Draw horizontal (x) and vertical (y) coordinate axes. On the vertical axis – the symmetry axis of the parabola – mark the origin, (0, 0) – this is the vertex of the parabola. Upwards from the origin, on the symmetry axis, mark the point F = (0, f) – this is the ‘focal point’ (focus) of the parabola and f is the ‘focal length’. Below the x-axis, draw the line y = −f – this is the ‘directrix’ of the parabola.

Note that the value y = −f for the directrix can be derived from the definition of the parabola, having chosen the origin O = (0, 0) to lie on the parabola. In particular, the length OF is equal to the distance from O to a point perpendicularly below O on the directrix.

2. Pick any point P in the plane – preferably one in the positive (x, y) quadrant. Now draw a line FP between this point and the focus F. Draw another line PD from P to a point D perpendicularly below on the directrix. By the definition of the parabola, the lines FP and PD should have equal length. Therefore, using Pythagoras’ theorem to compute the length of FP, derive the following relation:

$$(y + f)^2 = x^2 + (y - f)^2. \tag{3.3}$$

3. Using the above equation, show that:

$$y = \frac{x^2}{4f}. \tag{3.4}$$

This is the cartesian equation for a Euclidean parabola with focal length f, axis of symmetry along the y-axis and vertex (0, 0). Setting $f = \frac{1}{4}$, we get the ‘canonical parabola equation’:

$$y = x^2. \tag{3.5}$$
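Once you have a candidate equation, a quick numerical sanity check is to confirm that points on the curve really are equidistant from the focus (0, f) and the directrix y = −f. The sketch below does this for a handful of sample points; the helper names and the chosen sample values are ours, and a passing check is of course no substitute for the derivation itself.

```python
import math


def parabola_y(x, f):
    """Height of the parabola with focal length f at horizontal position x."""
    return x * x / (4.0 * f)


def max_mismatch(f, xs):
    """Largest |FP - PD| over the sample points, where F = (0, f) is the
    focus and PD is the vertical distance to the directrix y = -f."""
    worst = 0.0
    for x in xs:
        y = parabola_y(x, f)
        focus_distance = math.hypot(x, y - f)   # Pythagoras
        directrix_distance = abs(y + f)
        worst = max(worst, abs(focus_distance - directrix_distance))
    return worst


samples = [-2.0, -0.5, 0.0, 1.0, 3.0]
print(max_mismatch(0.25, samples))   # canonical parabola y = x^2
print(max_mismatch(3.0, samples))    # a wider parabola
```

Both printed values should be zero up to floating-point rounding.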

Problem 8 (The Doctor’s Cannon) Having finished his derivation, Pachelbel returns to Dr. Who to verify his mathematical construction. At this point, Dr. Who has added skrillex to Pachelbel’s cannon. Furious, Pachelbel demands that Dr. Who remove all new additions to the cannon. Reluctantly, Dr. Who decides that he shall acquiesce provided that Pachelbel removes all redundant steps from his mathematical derivation and justifies its generality.

Help save history from skrillex by helping Pachelbel in his derivation. In particular, some steps in the above derivation provided superfluous, ‘a-priori’ information. Can you identify which ones?

Furthermore, we chose the origin to be the vertex and the y-axis to be the symmetry axis – this made calculations easier. What ‘obvious’ properties of Euclidean space allow us to do this, without losing any generality in our derivation?

Problem 9 (Constantine’s Plasma Cannon) On his way home, cyborg Emperor Constantine’s taxicab is ambushed by the weeping angels who are hunting Dr. Who throughout spacetime. As a result, the emperor is teleported back to Pachelbel’s study in the Renaissance era. Seeing this opportunity, Dr. Who and Pachelbel beg for the emperor’s help – in particular, his plasma cannon should give the angels something real to weep about. To this end, Constantine decides he will help Dr. Who and save the universe from skrillex music ... iff Dr. Who helps him to sketch and derive the parabola equation in the taxi-cab metric.

Help Constantine help Pachelbel help Dr. Who, by writing down the cartesian equation for a parabola of focal length f with distances defined by the taxi-cab metric $d_1$ instead of the Euclidean metric.


Now sketch this parabola.

3.2.3 Scale Invariance and Transcendality

In earlier exploration sessions, we studied several physical systems and laws of the universe which exhibited very special constants – in particular, ‘dimensionless constants’ which characterised such laws or systems on all length scales. The Reynolds number for fluids and the fine-structure constant for quantum electrodynamics were two such constants. Now we present a mathematical constant which characterises the ‘shape’ of all parabolas in a universal, scale-invariant way. Since this constant is dimensionless, it is invariant under conformal transformations.¹

First, we must define the ‘latus rectum’ of the parabola. In particular, the latus rectum of a parabola is the chord perpendicular to the symmetry axis (i.e. parallel to the directrix) which passes through the focus F and intersects the parabola on each side of the symmetry axis.


Exercise 7 (Parabolic Proctology) Using the geometric definition of a parabola, prove that the latus rectum has a length of 4f, where f is the focal length of the parabola.

Hint: For a parabola of the form $y = \frac{x^2}{4f}$, note the y-coordinates of the points at which the latus rectum intersects the parabola.

Hint: Since you’re using the geometric definition of a parabola, you will have to make use of the directrix – which is conveniently located at y = −f if you chose the above parabola.

Definition 7 (Universal Parabolic Constant) The universal parabolic constant P is defined as the ratio (for any parabola) of the arc length S of the parabolic segment formed by the latus rectum to the focal parameter 2f (half the latus-rectum length):

$$P = \frac{S}{2f}. \tag{3.6}$$


Exercise 8 (Who would like to write a Fugue?) Whilst waiting for the cyborg emperor to take care of the angels, Dr. Who picks up a Renaissance ancestor of the guitar and plays ‘While my guitar gently weeps’. Unsatisfied, he decides to write a Fugue. Fugues, interpreted in the right sense, possess (almost) conformal symmetry. One particular conformal symmetry is the ‘dilation/contraction’ operation – which shrinks or expands vectors (and hence objects).

If we let f have units of length, use dimensional analysis to prove that the universal parabolic constant is dimensionless.

Now, for a more serious derivation, we shall calculate the exact value of P and prove a remarkable number-theoretic property – that it is transcendental.

Problem 10 (Transcendence (Hard)) 1. Simplifying the problem: Because of translational and rotational symmetry, it suffices to consider a parabola of the following form: $y = \frac{x^2}{4f}$, with the y-axis as its symmetry axis and the origin (0, 0) as the vertex.

¹ Roughly, transformations that preserve relative angles but not lengths.


2. Calculating parabolic arc-length: To calculate the arc-length of the parabola cut off by the latus rectum, we express the parabola as a parametric curve γ with curve parameter x:

$$\gamma(x) = \left(x, \frac{x^2}{4f}\right), \tag{3.7}$$

hence γ maps the parameter x to the corresponding point $(x, y) = \left(x, \frac{x^2}{4f}\right)$ on the parabola.

Since the tangent vector to this curve represents infinitesimal rates of change along the curve (with respect to the parameter x), it is given by the velocity vector:

$$\frac{d}{dx}\gamma(x) = \frac{d}{dx}\left(x, \frac{x^2}{4f}\right) = \left(1, \frac{x}{2f}\right). \tag{3.8}$$

In particular, an infinitesimal length element along the curve is represented by the vector (differential 1-form):

$$d\gamma = \left(1, \frac{x}{2f}\right)dx, \tag{3.9}$$

which has magnitude:

$$ds = \left\|\left(1, \frac{x}{2f}\right)\right\| dx = \sqrt{1 + \frac{x^2}{4f^2}}\,dx. \tag{3.10}$$

Hence, if we integrate this length element from x = −2f to x = +2f (the end points of the latus rectum), we get the parabolic arc length we desire:

$$S = \int_{-2f}^{2f} \sqrt{1 + \frac{x^2}{4f^2}}\,dx. \tag{3.11}$$

Now, the universal parabolic constant was defined to be $P = \frac{S}{2f}$, hence:

$$P = \frac{1}{2f}\int_{-2f}^{2f} \sqrt{1 + \frac{x^2}{4f^2}}\,dx. \tag{3.12}$$

3. Integration Step I: Use a change of variables to prove that we can simplify the arc-length integral to the following canonical form:

$$P = \int_{-1}^{1} \sqrt{1 + t^2}\,dt. \tag{3.13}$$

This form is ‘canonical’ in the sense that the focal length f doesn’t appear anywhere in the integral.

4. Integration Step II: Use trigonometric substitution (or otherwise) to show that:

$$P = \operatorname{arcsinh}(1) + \sqrt{2}. \tag{3.14}$$

Hint: Recall the hyperbolic trigonometric identities:

$$\cosh^2(\theta) - \sinh^2(\theta) = 1 \implies 1 + \sinh^2(\theta) = \cosh^2(\theta), \tag{3.15}$$

$$\cosh(2\theta) = \cosh^2(\theta) + \sinh^2(\theta) = 2\cosh^2(\theta) - 1. \tag{3.16}$$

5. Algebraic Simplification: Using the definition of hyperbolic sine:

$$\sinh(\theta) = \frac{e^{\theta} - e^{-\theta}}{2}, \tag{3.17}$$

along with the quadratic formula:

$$az^2 + bz + c = 0 \iff z = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \tag{3.18}$$

prove that

$$\operatorname{arcsinh}(1) = \ln(1 + \sqrt{2}). \tag{3.19}$$

Hint: let $z = e^{\theta}$, then solve $\sinh(\theta) = 1$ for θ using the exponential form of sinh given above.

6. Transcendality: Recall that a real number α is transcendental if it is not the root of any non-zero polynomial with rational coefficients. Real numbers which are roots of such polynomials are ‘algebraic’ numbers. Hence if a number is transcendental it cannot be algebraic and vice-versa. Since sums and differences of algebraic numbers are again algebraic, it follows that the sum of a transcendental number and an algebraic number is necessarily transcendental.

To see that the universal parabolic constant $P = \ln(1 + \sqrt{2}) + \sqrt{2}$ is transcendental, it suffices to prove that $\ln(1 + \sqrt{2})$ is transcendental. This is because $\sqrt{2}$ is irrational, but not transcendental: in particular, we can form a quadratic equation with rational coefficients, $x^2 - 2 = 0$, of which $\sqrt{2}$ is a root.

To see that $\ln(1 + \sqrt{2})$ is transcendental, we do a proof by contradiction. In particular, the Lindemann–Weierstrass theorem implies that if λ is algebraic (not transcendental) and non-zero, then $e^{\lambda}$ is necessarily transcendental. Hence, if the non-zero number $\ln(1 + \sqrt{2})$ were algebraic, $e^{\ln(1 + \sqrt{2})} = 1 + \sqrt{2}$ would be transcendental – however, it is clearly not, since it is a root of a quadratic with rational coefficients. Therefore, $\ln(1 + \sqrt{2})$ is transcendental and hence the universal parabolic constant:

$$P = \ln(1 + \sqrt{2}) + \sqrt{2} \approx 2.295587, \tag{3.20}$$

is a transcendental number.
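The closed form can be cross-checked numerically by integrating the arc-length element directly. The sketch below uses composite Simpson's rule, which is simply our choice of quadrature (any standard rule would do), and evaluates the integral for several focal lengths to illustrate that the ratio S/(2f) does not depend on f.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of sub-intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


def parabolic_constant_numeric(f=1.0, n=2000):
    """Arc length of y = x^2/(4f) between x = -2f and x = 2f, divided by 2f."""
    def integrand(x):
        return math.sqrt(1.0 + x * x / (4.0 * f * f))
    return simpson(integrand, -2.0 * f, 2.0 * f, n) / (2.0 * f)


closed_form = math.log(1.0 + math.sqrt(2.0)) + math.sqrt(2.0)

for f in (0.25, 1.0, 7.0):
    print(f, parabolic_constant_numeric(f), closed_form)
```

The numerical values agree with the closed form to many decimal places for every f tried, which is the scale invariance of P seen in practice.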

Problem 11 (Tying loose ends) Prove the assertion that $1 + \sqrt{2}$ is an algebraic number. In particular, find a polynomial with rational coefficients such that one of its roots is equal to $1 + \sqrt{2}$.

Hint: Recalling elementary polynomial theory, roots of the form $\alpha + \sqrt{\beta}$ – where α, β are integers – come in pairs: $\lambda = \alpha \pm \sqrt{\beta}$. Therefore, you should be looking for a quadratic.

Problem 12 (Pi Day) If you didn’t celebrate Pi day, use the Lindemann–Weierstrass theorem to prove that π is transcendental. In particular, recall Euler’s formula:

$$e^{i\pi} + 1 = 0. \tag{3.21}$$

Hint: Try assuming that iπ is algebraic.

Look at what we have achieved so far – we have proved that parabolas are characterised by a transcendental dimensionless constant. Numbers that have actually been proven to be transcendental are hard to come by – e and π being the most famous examples.


3.2.4 Symmetries and Canonical Form

The natural symmetries of Euclidean space are symmetries which preserve the Euclidean metric – that is, transformations of $\mathbb{R}^n$ which leave lengths and relative angles (i.e. angles between vectors) unchanged. In elementary terms, these are symmetries which leave the ‘dot-product’ unchanged. Because of this, parabolas are invariant under rotations and translations – their governing equations in a given coordinate system might change, but the parabola itself will be unaffected. For example, translations simply correspond to a shift in the focus F of the parabola, whilst rotations correspond to a rotation of the directrix D of a parabola. Therefore, we can define a parabola more abstractly in the following way.

Definition 8 (Ogburn’s Definition) Given a metric space (M, d) with set M and metric d, a parabola is the ordered pair (F, D) where F ∈ M and D is a straight line in (M, d), satisfying the following properties:

1. Focal Parameter: The minimum distance between F and D is 2f.

2. Parabolic Property: When (F, D) acts on any subset S of M, the result is the collection of points U ⊂ S which are equidistant from F and D:

d(U,D) = d(U, F ). (3.22)

In this manner, it becomes clear that if d is induced by an inner product – such as the Euclidean metric (dot-product) – then a parabola (F, D) will be preserved by isometries (rotations and translations for Euclidean space), since they preserve the ‘parabolic property’ and ‘focal parameter’.

In Euclidean space, we can take any parabola and apply a sequence of transformations to it so that it becomes a canonical parabola $y = \frac{x^2}{4f}$. In particular, we will need at most 2 translations to move the focus to F = (0, f), followed by at most 1 rotation to rotate the directrix to coincide with the line y = −f. Proving this for parabolas which have only been translated and/or rotated by multiples of 90 degrees is relatively simple – which we shall do now.

Problem 13 (Transformations and Canonical Form) 1. Translations: Given a parabola of the form:

$$ay^2 + bx^2 + cy + dx + e = 0, \tag{3.23}$$

where a, b, c, d, e are real constants and either a or b (but not both) is zero, complete the square to get a parabola of the form:

$$(y - y_0) = \frac{(x - x_0)^2}{4f}, \quad \text{or} \quad (x - x_0) = \frac{(y - y_0)^2}{4f}. \tag{3.24}$$

In particular, find the vertices $(x_0, y_0)$ and focal lengths f for these parabolas in terms of a, b, c, d and e.

2. Vertices: Using the previous equations:

$$(y - y_0) = \frac{(x - x_0)^2}{4f}, \quad \text{or} \quad (x - x_0) = \frac{(y - y_0)^2}{4f}, \tag{3.25}$$

prove that $(x_0, y_0)$ is indeed the vertex of each of these parabolas.

Hint: It suffices to show that $(x_0, y_0)$ is a minimum or maximum critical point (turning point) of each of the curves. Use calculus.


Problem 14 (DIY) For parabolas which have been rotated through some arbitrary angle θ, we note that parabolas can be put into 1-1 correspondence with quadratic forms. Using the quadratic form corresponding to a given parabola, we can then apply change of basis transformations (rotation matrices) to rotate the parabola back into the standard orientation with the y-axis as the symmetry axis. Investigate this when you get the chance!

3.2.5 Optical Properties and Spherical Aberration

Due to their reflective properties, parabolas are the ideal shape for many mirrors and lenses. In reality, parabolic lenses are difficult to construct, so ‘spherical lenses’ are used instead. To this extent, one takes the radius of curvature of such a lens to be large relative to the length of the lens – then one can approximate the portion of the circle traced out by the lens as a parabola. Such an approximation is the basis for a large amount of classical optics – for example, lens making.

Perhaps the most ‘physically’ important mathematical property of the parabola is its ‘parabolic reflection property’. To this extent, in the following, we shall treat parabolas as ‘reflective surfaces’ and take it for granted that light travels in straight lines (geodesics to be precise). Furthermore, we shall assume the law of reflection: that is, that the angle between the normal to a surface and the incident light ray is equal to the angle between the reflected light ray and the normal to the surface. Mathematically:

$$\theta_{\text{incidence}} = \theta_{\text{reflection}}. \tag{3.26}$$

For the purpose of reflection, we look at the tangent plane to a surface at a point – the point where the incident light ray strikes the surface. This allows us to apply the law of reflection to arbitrary differentiable surfaces.

Theorem 3 (Parabolic Reflection) Light rays incident on a reflective parabola, parallel to the axis of symmetry, are reflected back through the focus. Conversely, light rays incident on the parabola which travel through the focus are reflected from the parabola along a line parallel to the symmetry axis.

Problem 15 (Reflective Moments) To prove this theorem we must do the following:

1. Simplify: Since parabolas are characterised by a universal constant, it suffices to prove the reflection property for a simple parabola of the form $y = x^2$ – i.e. $f = \frac{1}{4}$.

2. Diagrams: Draw the focus F, vertex O and point $P = (x_0, y_0)$ on the parabola which the light ray hits. Now draw a line PD from P to the point D perpendicularly below P, lying on the directrix. Draw the line FP – this has the same length as PD, via the geometric definition of a parabola.

3. Bisector = Tangent: Draw a point M as the mid-point of the line connecting F and D. Then, using the law of reflection and some congruent triangles, you should be able to show that MP bisects the angle FPD – in particular, MP is perpendicular to FD. Now locate the x coordinate of the point M – you should be able to prove (again using the geometric definition of the parabola) that $x = \frac{1}{2}x_0$ – i.e. M lies halfway between the symmetry axis and the vertical line PD.

Now use calculus to calculate the slope of the tangent to the parabola at the point of light intersection, P. Prove that the slope of the bisector MP is equal to the slope of the tangent at P – hence identifying the bisector as the tangent to the parabola at P.

4. Fin: At this point, the theorem has been proved. Do the necessary trigonometry and ray diagrams to see why this is so (unless it’s already obvious to you). If you’re still stuck, ask your tutor to draw the diagrams for you!
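The reflection property can also be checked numerically before (or after) proving it. The sketch below sends a ray straight down onto the parabola $y = x^2/(4f)$, reflects it about the surface normal using the standard vector form of the law of reflection, and reports where the reflected ray crosses the symmetry axis. The function names are ours, and the check assumes $x_0 \neq 0$ (the ray along the axis itself is reflected straight back).

```python
import math


def reflect(direction, normal):
    """Reflect a direction vector about a (not necessarily unit) surface normal."""
    nx, ny = normal
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length
    dx, dy = direction
    dot = dx * nx + dy * ny
    return (dx - 2.0 * dot * nx, dy - 2.0 * dot * ny)


def axis_crossing(x0, f=0.25):
    """Height at which a vertical ray hitting y = x^2/(4f) at x = x0
    crosses the y-axis after reflection (requires x0 != 0)."""
    y0 = x0 * x0 / (4.0 * f)
    slope = x0 / (2.0 * f)            # dy/dx of the parabola at P
    normal = (-slope, 1.0)            # perpendicular to the tangent (1, slope)
    rx, ry = reflect((0.0, -1.0), normal)
    t = -x0 / rx                      # solve x0 + t * rx = 0
    return y0 + t * ry


for x0 in (0.3, 1.0, -2.5):
    print(x0, axis_crossing(x0))
```

For the canonical parabola every printed crossing height should equal the focal length 1/4, whatever the value of $x_0$.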

So far, we have demonstrated that (reflective) parabolas have the unique property of reflecting light rays which are parallel to their symmetry axis through the focus of the parabola, and vice-versa. Therefore, for many practical applications where a single focal point is required, parabolic lenses are the ideal lens. In reality however, it is hard to make perfectly parabolic lenses, so spherical or ‘circular’ lenses are used instead. The idealised performance of such lenses depends on the ratio of the tangential length L of the lens to the radius of curvature R of the lens. In particular, for the lens to ‘behave like a parabola’, its length L must be much smaller than the radius of curvature and hence the focal length f (noting that R = 2f). The deviation or ‘error’ arising from this parabolic approximation is the essence of ‘spherical aberration’ – that is, the blurring and loss of resolution of images formed by the lens.

Problem 16 (Parabolic Approximation and Spherical Aberration in Lenses) To quantify the previous statements, we shall now investigate spherical aberration mathematically. Let $y_p$ define a segment of a parabola – i.e. an ideal lens, with focal length f and tangential length L. Now let $y_s$ define the lower segment of a semi-circle whose center lies a distance R = 2f directly above the vertex of the parabola – this represents a circular lens. Therefore, we have

$$y_p = \frac{x^2}{4f}, \qquad y_s = R - R\sqrt{1 - \left(\frac{x}{R}\right)^2}, \quad -L \le x \le L, \qquad \Delta y := y_s - y_p, \tag{3.27}$$

where Δy is the difference between the y coordinate of the lower semi-circle, $y_s$, and the parabola, $y_p$. Approximating a circular lens – a lower semi-circle – by a parabola whose vertex (0, 0) coincides with the edge of the semi-circle and whose focal length $f = \frac{1}{2}R$ is half the radius of curvature R of the circle, we get an error Δy which grows the further away we are from the vertex of the parabola.

1. Taylor Expanding the Semi-Circle: Using a Taylor expansion about zero, in the variable $z := \frac{x}{R}$, show that we can write the semi-circle equation as:

$$y_s = R\sum_{k=1}^{\infty} (-1)^{k+1} \binom{1/2}{k} \left(\frac{x}{R}\right)^{2k} = R\left[\frac{1}{2}\left(\frac{x}{R}\right)^2 + \frac{1}{8}\left(\frac{x}{R}\right)^4 + \frac{1}{16}\left(\frac{x}{R}\right)^6 + \dots\right]. \tag{3.28}$$

Hint: You can use the binomial theorem instead. In particular, this says that for any real constant α and variable z with |z| < 1:

$$(1 + z)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^n, \tag{3.29}$$

where the binomial coefficients are defined by:

$$\binom{\alpha}{n} = \frac{\alpha!}{n!\,(\alpha - n)!}. \tag{3.30}$$

When α is non-integer, the binomial coefficients are generalized by the ‘Gamma function’ Γ – or equivalently, for real-valued α, the ‘Pochhammer symbol’ $(\alpha)_n := \alpha(\alpha - 1)\dots(\alpha - n + 1)$. In particular, we have:

$$\binom{\alpha}{n} := \frac{\Gamma(\alpha + 1)}{\Gamma(n + 1)\,\Gamma(\alpha - n + 1)} = \frac{\alpha(\alpha - 1)\dots(\alpha - n + 1)}{n!}. \tag{3.31}$$

Note that for integer n, Γ(n + 1) = n!.

2. A Parabola: To be or not to be: Using the series expansion of the circle equation, $y_s$, show that the error in the parabolic approximation for a spherical (circular) lens is given by:

$$\Delta y = R - R\sqrt{1 - \left(\frac{x}{R}\right)^2} - \frac{x^2}{2R} = R\sum_{k=2}^{\infty} (-1)^{k+1} \binom{1/2}{k} \left(\frac{x}{R}\right)^{2k} = R\left[\frac{1}{8}\left(\frac{x}{R}\right)^4 + \frac{1}{16}\left(\frac{x}{R}\right)^6 + \dots\right]. \tag{3.32}$$

This shows that the ‘spherical aberration’ that occurs in the parabolic approximation of a circular lens, measured in units of R, is of the order $O\left(\left(\frac{x}{R}\right)^4\right)$ – where x is the distance from the vertex of the lens in the direction parallel to the directrix – i.e. perpendicular to the symmetry axis of the parabola. In particular, the maximum error we have is:

$$\text{Max}[\Delta y] = R\sum_{k=2}^{\infty} (-1)^{k+1} \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k} = R \cdot O\left(\left(\frac{L}{R}\right)^4\right), \tag{3.33}$$

where $x_{max} = L$ is the length of the lens measured by a line tangent to the vertex of the lens. In our case, this is the length of the line tangent to the parabola at (0, 0) when we are trying to approximate the parabola near its vertex by a circular arc. Thus, one way to keep the spherical aberration small is to make the radius of curvature R of the lens large with respect to the length L of the lens.
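A numerical comparison makes the size of the error concrete. With $y_s$ and $y_p$ as defined at the start of this problem and f = R/2, expanding the square root gives a leading difference of $x^4/(8R^3)$. The sketch below compares the exact difference with a truncated binomial series built from the generalized binomial coefficient (product form); the function names and the sample values x = 0.1, R = 1 are our own illustrative choices.

```python
import math


def binom(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1)/n!."""
    result = 1.0
    for j in range(n):
        result *= (alpha - j) / (j + 1)
    return result


def sag_difference_exact(x, R):
    """y_s - y_p for the circle of radius R and the parabola with f = R/2."""
    y_s = R - R * math.sqrt(1.0 - (x / R) ** 2)
    y_p = x * x / (2.0 * R)
    return y_s - y_p


def sag_difference_series(x, R, terms=6):
    """Truncated series for y_s - y_p, starting from the (x/R)^4 term."""
    z2 = (x / R) ** 2
    return R * sum((-1) ** (k + 1) * binom(0.5, k) * z2 ** k
                   for k in range(2, 2 + terms))


x, R = 0.1, 1.0
print(sag_difference_exact(x, R))
print(sag_difference_series(x, R))
print(x ** 4 / (8.0 * R ** 3))   # leading term only
```

For x/R = 0.1 the leading term alone is already within about half a percent of the exact difference, and the error falls off quickly as x/R shrinks.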

We shall now apply the last result to obtain a differential error estimate which quantifies how the spherical aberration of a lens (‘non-parabolicness’) generates a ‘fuzziness’ or spread in the focus. In particular, instead of the focus being a single (ideal) point, it now becomes a small line segment – physically leading to a blurriness of images formed by the lens.

Problem 17 (Losing Focus!) Thanks to Emperor Constantine’s ‘plasma intervention cannon’, the weeping angels are now a thing of the past. However, the ‘past’ is relative! This means that the weeping angels still lurk in one of many universes. Not to fear, ‘The Doctor’² decides it is time to return to the future – leaving Pachelbel’s (musical) cannon unspoiled by The Doctor’s attempted Skrillex additions.

To travel to the future, The Doctor needs to fire up his ‘Alcubierre’ warp drive – this will allow him to generate a faster-than-light warp-bubble which he can travel through spacetime with. However, during the angel attack, one of his synchronising lasers was damaged. In order to fix the laser, he must grind a new optical lens – such that its focal point, F = (0, f), shifts by a maximum of 1 micrometer, $\Delta f = 1\,\mu\text{m} = 10^{-6}\,\text{m}$, under the effect of spherical aberration. Assuming he needs a lens of length $L = 1\,\text{mm} = 10^{-3}\,\text{m}$, we can help The Doctor as follows.

² Thanks to ‘The’ Dr. Ashleigh Punch for noting that ‘Dr. Who’ should always be referred to as ‘The Doctor’. Good luck to her when she finally meets, marries him and has time-travelling babies.

1. Mathematical Constructions: By re-writing the standard parabola equation, we can express the focal length f as a function of the (x, y) coordinates:

$$f = \frac{x^2}{4y}. \tag{3.34}$$

Now, we note that a ‘linear approximation’ to the error in the focal length is given by the ‘total differential’, df. In particular, by viewing f = f(x, y) as a function of two variables x and y, show that its total differential (exterior derivative) is given by:

$$df = \frac{x}{2y}\,dx - \frac{x^2}{4y^2}\,dy. \tag{3.35}$$

Hint: Recall that the total differential of a function f(x, y) of two variables is given by:

$$df(x, y) := \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy. \tag{3.36}$$

2. Physical Estimates: Now, replacing the differentials df, dx and dy by their finite counterparts Δf, Δx, Δy – i.e. the ‘errors’ in f, x and y – we get:

$$\Delta f = \frac{x}{2y}\,\Delta x - \frac{x^2}{4y^2}\,\Delta y. \tag{3.37}$$

To get the maximum error in the focal length however, we need to consider the magnitude of the error contributions from Δx and Δy, hence we re-define Δf as:

$$\Delta f := \left|\frac{x}{2y}\right| |\Delta x| + \left|\frac{x^2}{4y^2}\right| |\Delta y|. \tag{3.38}$$

Therefore, the error Δf is maximized when Δx and Δy are maximized (for a fixed coordinate (x, y)).

Physically, we set Δx = 0 since there is no ‘error’ in the x-coordinate of our lens – the tangent to the circle aligns with the tangent to the parabola at the vertex. Now, we recall from the last problem that the maximum error in our y coordinate occurs when x takes its maximum value x = L – i.e. the ‘spherical aberration’ is maximized at the edges of the lens (away from the vertex):

$$\text{Max}[\Delta y] = R\sum_{k=2}^{\infty} (-1)^{k+1} \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k} = R \cdot O\left(\left(\frac{L}{R}\right)^4\right), \tag{3.39}$$

where R = 2f is the ‘radius of curvature’ of the lens (the radius of the circle).

Using x = L, $y = \frac{L^2}{4f}$, Δx = 0 and the maximum value for Δy, show that:

$$\text{Max}[\Delta f] = \frac{4f^2}{L^2}\,\text{Max}[|\Delta y|] = R\left|\sum_{k=2}^{\infty} (-1)^{k+1} \binom{1/2}{k} \left(\frac{L}{R}\right)^{2k-2}\right| = R \cdot O\left(\left(\frac{L}{R}\right)^2\right). \tag{3.40}$$

3. Experimental Solution: Ignoring higher-order contributions to the error in f, we have:

$$\Delta f \approx \frac{R}{8}\left(\frac{L}{R}\right)^2 = \frac{1}{8}\frac{L^2}{R}. \tag{3.41}$$

Show this by taking the first term in the binomial expansion above.

With this leading-order estimate for Δf, we want $\Delta f \le 10^{-6}\,\text{m}$ to achieve the accuracy desired for The Doctor’s laser. For the given lens length L = 0.001 m, calculate the minimum radius of curvature $R_{min}$ for the lens required to achieve the accuracy $\Delta f \le 10^{-6}\,\text{m}$.

4. Checking Validity: By taking the next term in the binomial expansion, we can compute the next-order contribution to the error in f:

$$\frac{R}{16}\left(\frac{L}{R}\right)^4 = \frac{1}{16}\frac{L^4}{R^3}. \tag{3.42}$$

Rather than adding this to the error Δf, we can instead use the value of $R_{min}$ we calculated (which gave $\Delta f = 10^{-6}\,\text{m}$) to estimate the relative magnitude of the leading error term, $\frac{1}{8}\frac{L^2}{R}$, and the next correction term, $\frac{1}{16}\frac{L^4}{R^3}$. Compute the ratio of these error terms and argue whether or not it was justified to ignore the next correction term when calculating an approximation for $R_{min}$. For example, if the ratio is less than 0.01 (or 1%), we can justify ignoring the correction term.


3.3 Ellipses and Planetary / Atomic Orbits

3.3.1 Overview

In this session we shall review one of the great conic sections from antiquity – the ellipse! Ellipses have played an important role in the scientific and cultural history of human society – perhaps most controversially³ as proof (through Johannes Kepler’s laws of planetary motion) that the earth and other planets orbit the Sun along elliptical trajectories. Here, we shall study the famous ‘two body’ problem – that is, the orbits of two massive objects interacting with each other gravitationally. After constructing the associated differential equations and constants of motion, we shall solve the two body problem to derive Kepler’s laws of planetary motion – in particular, obtaining elliptical trajectories as bound orbits.

Apart from planetary or semi-classical atomic orbits, ellipses play a huge role in modern mathematics and physics. In particular, this includes ellipsoidal harmonic analysis (e.g. MRI scans), elliptic integrals and even, in the generalized sense, elliptic curves used in the proof of Fermat’s Last Theorem. Hence, apart from scholarly and cultured reasons to study ellipses, it is prudent for mathematicians and scientists to have some working knowledge of their geometry.

3.3.2 The Ellipse

We can think of an ellipse as a ‘stretched’ circle in the sense that a circle is a special case of an ellipse – an ellipse that has zero eccentricity. More generally, we can define an ellipse as the closed planar curve which is generated by slicing a cone with a plane non-parallel to the cone’s symmetry axis. Hence, if we slice the cone perpendicular to its symmetry axis, we will get a circle. If we slice the cone at an angle θ to the symmetry axis which is less than $\frac{\pi}{2}$ but greater than the half-angle of the cone, we will get a general ellipse.

See Tutor for Diagrams

A more operationally useful definition is the following geometric one.

Definition 9 (Geometric Ellipse) Given a metric⁴ d on a set M, we choose two fixed points F1 and F2 in M (the foci). An ellipse is then defined to be the subset E of points of M such that the sum of the distances from any point p ∈ E on the ellipse to each of the foci is a constant:

d(p, F1) + d(p, F2) = 2a. (3.43)

The constant 2a is the length of the major axis of the ellipse – the straight line segment through the foci F1 and F2 that joins opposite points of the ellipse. The line segment perpendicular to the major axis and passing through the center of the ellipse is the ‘minor axis’ – conventionally, we label its length as 2b. The distance between the foci is written as d(F1, F2) = 2f, where f is said to be the focal length of the ellipse.

³ Recall that Galileo was persecuted by the church for proposing a heliocentric model of the solar system.

⁴ Recall that a metric is a means of measuring (or defining) distances.


Exercise 9 (Not so eccentric) Using the above geometric definition of an ellipse, prove that when the foci F1 and F2 are located at the same point – i.e. F1 = F2 = F – such an ellipse is simply a circle.

Hint: Recall that a circle B(r;C) with center C and radius r is defined to be the set of all points at a distance r from the central point C:

d(p, C) = r ∀p ∈ B(r;C). (3.44)

The previous definitions, although powerful, are somewhat abstract. Some of you will be more familiar with the ‘cartesian form’ for a Euclidean ellipse – an ellipse with the Pythagorean measure of distance⁵. In the next problem, we shall derive this ‘standard’ ellipse equation.

Problem 18 (A Canonical Western) Having destroyed the weeping angels, back in his home spacetime neighbourhood the cyborg Emperor Constantine decides it is time to relax. In particular, he feels like watching an old ‘western’ style movie – to his surprise, it turns out that his excursions with The Doctor have removed Clint Eastwood from history! Shocked, the Emperor decides to travel back in time to the wild west, taking his plasma cannon with him.

Stepping into Ye Old Town, Southern Mississippi, he comes across a rowdy country girl, ‘Big A. Geller’. Noticing his large cannon and thinking herself Numero Uno as the county sheriff, she challenges the cyborg to a game of cards. Not wanting to be beaten by her classic parlour tricks, the cyborg ups the challenge and calls a duel. The rules of duelling in Ye Old Town are such that each contestant must stand at a fixed location. The crowd then must stand such that the sum of a spectator’s distances from the two contestants is equal to a fixed value – ‘the duelling constant 2a’, which is chosen prior to the duel by the duelling master.

You – the duelling master and member of Ye Old Town – are asked to provide the equation and draw the curve on which the crowd must stand during the duel.

1. Symmetry is your friend Because of the translational and rotational invariance of Euclidean space, it suffices to consider an ellipse whose center is at the origin (0, 0) of some cartesian coordinate system (translational symmetry). Furthermore, we may choose the foci F1 and F2 to lie on the x axis, coinciding with the major axis of the ellipse (rotational symmetry). Such an ellipse is now in ‘canonical form’. Therefore, we have:

F1 = (−f, 0) and F2 = (f, 0), (3.45)

where f is the focal length of the ellipse.

Convince yourself of the arguments above. For example, note that an ellipse is characterised by its focal length and the length of its major axis – or equivalently, by its eccentricity and focal length, or by its major and minor axis lengths. Now consider what happens to these parameters when you rotate or translate the ellipse.

2. Algebra Let the point P = (x, y) be an arbitrary point on the ellipse. The geometric definition of an ellipse says that each point must have a sum of distances to the foci which is constant – equal to the major axis length.

⁵ Recall, this means that $d((x_1, y_1), (x_2, y_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$ – the length of the hypotenuse of the right-angled triangle with sides of length |x1 − x2| and |y1 − y2|.


Hence, we have:

\[
\begin{aligned}
d(P, F_1) + d(P, F_2) &= \|P - F_1\| + \|P - F_2\| \\
&= \sqrt{(x+f)^2 + (y-0)^2} + \sqrt{(x-f)^2 + (y-0)^2} \\
&= \sqrt{(x+f)^2 + y^2} + \sqrt{(x-f)^2 + y^2}.
\end{aligned}
\tag{3.46}
\]

Using the fact that d(P, F1) + d(P, F2) = 2a for all points P on the ellipse, simplify the resulting equation:

\[ \sqrt{(x+f)^2 + y^2} + \sqrt{(x-f)^2 + y^2} = 2a \tag{3.47} \]

to the canonical form:

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \tag{3.48} \]

where $b = \sqrt{a^2 - f^2} = a\sqrt{1 - \varepsilon^2}$ is the semi-minor axis length and $\varepsilon = f/a$ is the eccentricity of the ellipse.

Hint: To get rid of the square roots, take one square root to the other side of the equation, square both sides and then simplify. Using the simplified equation, get rid of the remaining square root by moving all other terms to the other side of the equation and squaring both sides again.

3. Interpretation The eccentricity ε = f/a of the ellipse – the ratio of its focal length to its semi-major axis length – controls how ‘stretched’ the ellipse is in the x and y directions. To get an understanding of this parameter, it helps to draw a few different ellipses corresponding to different eccentricities.

i) Draw an ellipse with eccentricity ε = 0. What curve is this?

ii) Now draw ellipses with eccentricity ε = 0.25 and ε = 0.8. What do you notice?

iii) Try to sketch an ellipse with ε = 0.99. What happens as ε → 1⁻? If you recall the previous section, what special curve is obtained in the limit ε = 1?

Hints: To sketch these curves, you must first fix some value for the focal length f – or equivalently, the semi-major axis length a. For simplicity, set a = 1 to obtain the resulting sketches.
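As a numerical sanity check on the canonical form (3.48), the following minimal Python sketch samples points satisfying x²/a² + y²/b² = 1 and measures how far the sum of their distances to the foci deviates from 2a. The function names and the sampled values of a and ε are illustrative assumptions and are not part of the original problem; the check is no substitute for the algebraic derivation.

```python
import math

def semi_minor(a, eps):
    """Semi-minor axis b = a*sqrt(1 - eps^2) for semi-major axis a and eccentricity eps."""
    return a * math.sqrt(1.0 - eps**2)

def focal_sum(x, y, f):
    """Sum of Euclidean distances from (x, y) to the foci (-f, 0) and (f, 0)."""
    return math.hypot(x + f, y) + math.hypot(x - f, y)

def max_focal_sum_error(a, eps, samples=401):
    """Largest deviation of the focal sum from 2a over sampled points
    on the upper half of the canonical ellipse x^2/a^2 + y^2/b^2 = 1."""
    b = semi_minor(a, eps)
    f = eps * a
    worst = 0.0
    for k in range(samples):
        x = -a + 2.0 * a * k / (samples - 1)
        # Solve the canonical equation for y >= 0 (clamped against rounding).
        y = b * math.sqrt(max(0.0, 1.0 - (x / a) ** 2))
        worst = max(worst, abs(focal_sum(x, y, f) - 2.0 * a))
    return worst

if __name__ == "__main__":
    for eps in (0.0, 0.25, 0.8, 0.99):
        print(eps, semi_minor(1.0, eps), max_focal_sum_error(1.0, eps))
```

Printing the semi-minor axis for each eccentricity also gives the numbers needed for the sketches in part 3 (with a = 1).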

Devil in the Details Using the relations given between the semi-major axis length a and semi-minor axis length b, show that for all ellipses:

a ≥ b. (3.49)

Note that when we derived our ellipse equation, we assumed that the major axis and x axis coincided. What would happen to the equation if the major axis were instead along the y axis?

More generally, what can you say about the denominators of x² and y² appearing in your ellipse equation and the location of the major axis of the ellipse?

Exercise 10 (Dat Metric) Referring back to the previous section on parabolas, derive the cartesian equation for a ‘taxicab ellipse’. That is, derive an equation for an ellipse with major axis length 2a using the ‘taxicab’ (l1) measure of distance.

For simplicity, you may assume the foci are located at (−f, 0) and (f, 0).


3.3.3 Parametric Form

Now that we have studied Euclidean ellipses in cartesian form, it is prudent to study the ellipse in ‘parametric form’. In particular, we shall study the ellipse in ‘polar coordinates’, parameterised by the angle θ between the positive x axis and the position vector r = (x, y). This will assist us in solving the two body problem, which involves solving a second-order inhomogeneous differential equation in polar coordinates for the planetary orbits, given as trajectories (solutions to the DE). Alternatively, the Laplace-Runge-Lenz vector may be used to solve the two-body problem (much simpler!); however, this still requires identifying the ellipse in polar coordinates.

Recall that polar coordinates (r, θ) are related to cartesian coordinates (x, y) by the following equations:

\[ x = r\cos(\theta), \qquad y = r\sin(\theta), \tag{3.50} \]

where r ∈ [0,∞) and θ ∈ [0, 2π) (measured counter-clockwise).

To derive the polar form of the ellipse, parameterised by the polar angle θ, we shall first construct a form of the ellipse with an arbitrary parameter t – then use some geometry to relate this parametrisation to the polar one. This is achieved in the following problem.

Problem 19 (Art thou 580nm?) The ellipse has been formed, the dust settles and the crowd begins to go quiet. Suddenly, the duel is interrupted by the appearance of a wild, green titanoboa⁶ from a pre-historic era! It seems like the cyborg forgot to turn off his time machine... Well well well, rowdy sheriff, Big A. Geller. At this moment, the cyborg and sheriff agree to put their differences aside to take down the gigantic serpent.

Upon taking position behind Ye Olde Tavern, the townspeople notice that Big A. Geller is starting to panic. Rousing her to action, they implore – “R’ you yella?!” Being wildlings of the wild west, the crowd enjoys the spectacle and forms a moving ellipse with the titanoboa at one focus and the sheriff and cyborg at the other focus. In this moment, the sheriff notices a large wolf running the perimeter of the ellipse. In order to estimate the time before the wolf attacks the serpent (providing a perfect distraction to fire his 500nm laser), the cyborg needs to know both the arc-length of the ellipse segment and the angular velocity of the wolf – a calculation most easily performed in polar coordinates.

Help the people of Ye Olde Town by deriving the polar coordinate representation of an ellipse.

1. Arbitrary Parameter t Using an arbitrary parameter t ∈ [0,∞) we can parametrise a standard ellipse (major axis along the x-axis) as the following curve:

γ(t) := (x(t), y(t)) = (a cos(t), b sin(t)), (3.51)

where a and b are the semi-major and semi-minor axis lengths of the ellipse. Note that this parameter t is not the same as the polar angle – i.e. the angle between the position vector and the x-axis!

Prove that the above parametrisation defines the standard ellipse.

⁶ An extinct species of snake from the Paleocene epoch (≈ 60 million years ago). These were the largest snakes to ever exist – up to 12.8m long and ∼ 1,100kg in mass!


Hint: Show that the equation:

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{3.52} \]

is satisfied.

2. Polar Angle Parameter θ Now, switching to polar coordinates (r, θ), use trigonometry and/or algebra to show that

\[ \tan(\theta) = \frac{y}{x} = \frac{b \sin(t)}{a \cos(t)}. \tag{3.53} \]

Hence, we have: $\tan^2(\theta) = \frac{b^2 \sin^2(t)}{a^2 \cos^2(t)}$.

Using the previous relations and the fact that the radial coordinate is given by

\[ r = \sqrt{x^2 + y^2} = \sqrt{a^2 \cos^2(t) + b^2 \sin^2(t)}, \tag{3.54} \]

prove that we have:

\[ r = \frac{ab}{\sqrt{a^2 \sin^2(\theta) + b^2 \cos^2(\theta)}}. \tag{3.55} \]

Hence we can view r = r(θ) as the parametric equation for an ellipse parametrised by the polar angle θ.

Using some trigonometry and the relationship between the eccentricity ε and the semi-axis lengths a, b, simplify our polar equation to the ‘canonical’ form:

\[ r(\theta) = \frac{b}{\sqrt{1 - \varepsilon^2 \cos^2(\theta)}}. \tag{3.56} \]

3. Understanding Plot the points γ(0), γ(π/2), γ(π) and γ(3π/2). What is special about these points?

Now, find the coordinates of the foci F1 and F2 in polar coordinates. Note that the cartesian coordinates for these points are (−f, 0) and (f, 0).

4. Translation of center (challenge) Note that in the above derivation, we have constructed an ellipse whose center is at the origin (0, 0). As it turns out, for celestial mechanics and other problems, a much more useful parametrisation is obtained when we let one focus of the ellipse coincide with the origin (0, 0). This corresponds to translating the ellipse along the major axis by a distance ±f (depending on which focus is now at the origin).

By using a slight modification of our construction, show that the canonical equation for an ellipse in polar coordinates (r, θ) whose center is at (r, θ) = (f, 0) or (f, π), respectively, is given parametrically by:

\[ r = \frac{c}{1 \mp \varepsilon \cos(\theta)}, \tag{3.57} \]

where

\[ c = a(1 - \varepsilon^2). \tag{3.58} \]

Hence, using the relations between c, ε and a, b, show that:

\[ a = \frac{c}{1 - \varepsilon^2}, \qquad b = \frac{c}{\sqrt{1 - \varepsilon^2}}. \tag{3.59} \]


5. Rotation of major axis (challenge) Now, our final challenge. Prove that when we rotate our ellipse so that one focus lies at the origin and the other now lies on a line whose (fixed) polar angle is θ = θ0, our canonical equation takes the following form:

\[ r = \frac{c}{1 - \varepsilon \cos(\theta - \theta_0)}. \tag{3.60} \]

In particular, this is the equation for an ellipse with one focus at the origin (r, θ) = (0, 0), center at (r, θ) = (f, θ0) and second focus at (r, θ) = (2f, θ0).

6. Easy Stretches Using the formula for an ellipse with one focus at the origin (r, θ) = (0, 0), center at (r, θ) = (f, θ0) and second focus at (r, θ) = (2f, θ0), show that the radial coordinate of the ellipse has the following extremal values:

\[ r_{\min} = \frac{c}{1 + \varepsilon}, \qquad r_{\max} = \frac{c}{1 - \varepsilon}. \tag{3.61} \]

In terms of celestial mechanics, with one massive body (such as the Sun) at the focus at the origin and another body (such as the earth) moving along the ellipse, rmin and rmax represent the distance between the bodies at perihelion (θ − θ0 = π) and aphelion (θ = θ0), respectively. When we are dealing with bodies orbiting the earth (such as the Moon), these minima and maxima are called the ‘perigee’ and ‘apogee’.
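The two polar forms in this problem can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the centred form is taken to be r = b/√(1 − ε² cos² θ), and the focus form is taken to be r = c/(1 − ε cos(θ − θ0)) with the second focus in the direction θ0. It tests that points generated from each formula really lie on an ellipse.

```python
import math

def r_centered(theta, a, b):
    """Polar radius of an ellipse centred at the origin (major axis along x)."""
    eps = math.sqrt(1.0 - (b / a) ** 2)
    return b / math.sqrt(1.0 - (eps * math.cos(theta)) ** 2)

def r_focus(theta, a, eps, theta0=0.0):
    """Polar radius with one focus at the origin and the second focus
    at polar position (2f, theta0), where f = eps * a."""
    c = a * (1.0 - eps**2)
    return c / (1.0 - eps * math.cos(theta - theta0))

def canonical_residual(theta, a, b):
    """x^2/a^2 + y^2/b^2 - 1 for the point produced by r_centered."""
    r = r_centered(theta, a, b)
    x, y = r * math.cos(theta), r * math.sin(theta)
    return (x / a) ** 2 + (y / b) ** 2 - 1.0

def focal_sum_residual(theta, a, eps, theta0=0.0):
    """(distance to focus at origin) + (distance to second focus) - 2a
    for the point produced by r_focus."""
    r = r_focus(theta, a, eps, theta0)
    x, y = r * math.cos(theta), r * math.sin(theta)
    f = eps * a
    fx, fy = 2.0 * f * math.cos(theta0), 2.0 * f * math.sin(theta0)
    return r + math.hypot(x - fx, y - fy) - 2.0 * a

if __name__ == "__main__":
    a, eps, theta0 = 2.0, 0.5, 0.3
    print("r at theta0      :", r_focus(theta0, a, eps, theta0))
    print("r at theta0 + pi :", r_focus(theta0 + math.pi, a, eps, theta0))
```

Comparing the two printed radii with c/(1 − ε) and c/(1 + ε) is a quick way to confirm which angle corresponds to the closest and farthest approach.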

Wolf Velocity Let us consider now that the polar angle θ of the wolf circumnavigating the ellipse is a function of time t. We can express this by writing θ = θ(t).

Using the chain rule and the formula for an ellipse with one focus at the origin (we can choose θ0 = 0), compute the radial velocity dr/dt of the wolf in terms of the angular velocity dθ/dt:

\[ \frac{dr}{dt} = \left(\frac{dr}{d\theta}\right)\frac{d\theta}{dt}. \tag{3.62} \]

Therefore, we can get the actual velocity of the wolf – that is, the time-rate of change of its position vector:

\[ \frac{d}{dt}\mathbf{r} = \frac{dx}{dt}\mathbf{e}_1 + \frac{dy}{dt}\mathbf{e}_2. \tag{3.63} \]

One can do this either by switching back to cartesian coordinates, or by noting that the polar coordinate basis vectors will vary with the parameter t – hence we need to differentiate these too! The latter we shall illustrate next week when solving the two body problem.

Arc Length and time taken Note that it is a significant challenge to compute the arc-length of a segment of an ellipse. Although there are many expressions (elliptic integrals or ‘Jacobi functions’), these are all very non-trivial! However, since the cyborg emperor in our story can measure the angular velocity dθ/dt, we can compute the time it takes for the wolf to reach a point on the ellipse where a ray from the polar-coordinate origin intersects the serpent. All we need are the initial polar angle of the wolf, the polar angle of the ray on which the serpent lies and the angular velocity as a function of time.

Using the above argument, derive a simple expression for the time it takes the wolf to travel from θstart to θtitanoboa.
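The chain rule (3.62) and the travel-time argument can both be explored numerically. The following is a rough sketch rather than a solution to the exercise: it assumes the focus-at-origin form r = c/(1 − ε cos θ) with θ0 = 0, a positive angular velocity supplied as a hypothetical Python function omega(t), and a crude midpoint stepping rule chosen purely for simplicity.

```python
import math

def r_of_theta(theta, c, eps):
    """Ellipse with one focus at the origin: r = c / (1 - eps cos(theta))."""
    return c / (1.0 - eps * math.cos(theta))

def dr_dtheta(theta, c, eps):
    """Derivative of r_of_theta with respect to theta."""
    return -c * eps * math.sin(theta) / (1.0 - eps * math.cos(theta)) ** 2

def radial_velocity(theta, omega_now, c, eps):
    """Chain rule: dr/dt = (dr/dtheta) * (dtheta/dt)."""
    return dr_dtheta(theta, c, eps) * omega_now

def travel_time(theta_start, theta_target, omega, dt=1e-4):
    """Time for theta to advance from theta_start to theta_target when
    dtheta/dt = omega(t) > 0, found by accumulating omega * dt step by step."""
    theta, t = theta_start, 0.0
    while theta < theta_target:
        theta += omega(t + 0.5 * dt) * dt  # midpoint rule on each small step
        t += dt
    return t

if __name__ == "__main__":
    # Constant angular velocity of 2 rad per unit time, half a lap of the ellipse.
    print(travel_time(0.0, math.pi, lambda t: 2.0))
    print(radial_velocity(1.0, 2.0, c=1.5, eps=0.5))
```

For a constant angular velocity the printed time should agree, up to the step size dt, with whatever closed-form expression you derive.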

That’s all for this week!

3.4 The Two Body Problem and Planetary Orbits (Easter Sketch)

3.4.1 History and Cultural Impact

The two body problem is arguably one of the most profound and influential problems in Western science since antiquity. As such, we have the following key players in this story:

• Apollonius of Perga – 262 BC to 190 BC.

• Aristarchus of Samos – 3rd Century BC. Heliocentrism.

• Copernicus – 1473 to 1543. Heliocentric theory.

• Tycho Brahe – 1546 to 1601. Data collection.

• Johannes Kepler – 1571 to 1630. Kepler’s laws of planetary motion from Brahe’s measurements.

• Galileo Galilei – 1564 to 1642. Gravitational acceleration. Constant motion, preliminary notions of inertia.

• Isaac Newton – 1642 to 1726. Principia Mathematica, solution to the two-body problem.

In the days of the early astronomy of the ancient Greeks, it was widely believed that the earth was the center of the universe – with the celestial bodies of the heavens revolving about it. To this end, Ptolemy created a way to track the motion of the heavens that fitted this geocentric viewpoint. In contrast, however, Aristarchus of Samos had proposed a heliocentric model – one in which the Sun was the centre of the observable universe. This viewpoint – which we now know to be accurate for our solar system – was strongly opposed and did not resurface till the early Renaissance era.

In the transition between the medieval and renaissance periods, Copernicus proposed a heliocentric model of the solar system. This was radical in the sense that it ran against the conventional philosophies and paradigms of the time – in particular, those of the Vatican church. After Copernicus’ death, Tycho Brahe performed many naked-eye astronomical observations (his work predates the improvements made to telescopes by Galileo). Using the extensive astronomical data he gathered, Brahe developed his own model of planetary motion to match his observations. This was subsequently superseded by the work of Kepler, who demonstrated that the planets orbit the sun along trajectories described by geometries of the ancient Greeks – in particular, the ‘conic sections’ developed by Apollonius of Perga. Although Kepler’s laws of planetary motion were in strong agreement with Brahe’s data, the physical mechanism for producing the elliptical orbits of the planets (or hyperbolic orbits of comets) had not yet been demonstrated.

The problem of finding a physical mechanism (potential or force law) to produce the Kepler orbits, when considering two interacting bodies, became known as the ‘two body problem’⁷. This problem caught the attention of none other than the father of modern mathematics – Sir Isaac Newton. Inspired by this problem, Newton reinvestigated the geometry of antiquity – proving many theorems and lemmas regarding conic sections. Newton used these results to solve the two body problem. In particular, Newton showed that the Kepler orbits arise when two massive bodies interact via an attractive ‘inverse square’ force law – Newton’s ‘universal law of gravitation’. However, Newton did not stop here. He then invented calculus and demonstrated his new mathematics by using it to again prove that his force law gave rise to the Kepler orbits – along with many other results.

⁷ Note that today, we use this terminology for the converse problem – “given two bodies interacting under some potential or force law, what trajectories will their motion follow?”

Newton’s work on the two body problem – and his demonstration of calculus in its solution – formed the basis for his Magnum Opus – the ‘Principia Mathematica’, which is without doubt one of the most important texts in human history. At this point, with combined astronomical data and a powerful mathematical demonstration of a heliocentric solar system governed by an experimentally testable force law (Newtonian gravity), the two body problem had conclusively been solved. Well, not quite...

Since Newton, many other influential mathematicians and physicists have studied the two body problem, providing alternative proofs and solutions. The two most notable methods are that of variational calculus – Lagrangian and Hamiltonian mechanics – as well as more abstract methods pertaining to symmetries and constants of motion – the ‘Laplace-Runge-Lenz’ vector. Such methods have subsequently formed the basis for modern physics – in particular, quantum mechanics, general relativity and (relativistic) quantum field theories.

3.4.2 Inverse Square Law and Central Potentials

Here we shall derive the Kepler orbits as the solutions to the motion of two massive bodies acting on each other via the gravitational force, provided by Newton’s Universal Law of Gravitation⁸. In particular, we have two massive bodies with masses m1 and m2 acting on each other as follows:

\[
\begin{aligned}
\mathbf{F}_{12}(\mathbf{r}_1, \mathbf{r}_2) &:= \frac{G m_1 m_2}{\|\mathbf{r}_1 - \mathbf{r}_2\|^2}\,\hat{\mathbf{r}}_{12} \\
\mathbf{F}_{21} &= -\mathbf{F}_{12} \quad \text{(Newton's 3rd Law)}
\end{aligned}
\tag{3.64}
\]

Here Fij is a vector function of two vectors in ℝ³; that is, Fij maps the displacement vectors ri and rj of two masses mi and mj, respectively, to a force vector Fij(ri, rj). This is the force experienced by mass mi, and its direction is from mass mi to mj – i.e. parallel to (rj − ri), with r̂12 denoting the unit vector pointing from mass 1 to mass 2. Therefore, using Newton’s 2nd Law we get the following coupled second-order differential equations:

\[
\begin{aligned}
\mathbf{F}_{12} &= m_1 \ddot{\mathbf{r}}_1 \\
\mathbf{F}_{21} &= m_2 \ddot{\mathbf{r}}_2,
\end{aligned}
\tag{3.65}
\]

which we want to solve for the trajectories (integral curves) traced out by the displacement vectors r1 and r2 of the massive bodies.

Exercise 11 (Art 101) Sketch the above force laws with a vector diagram for the masses, displacement vectors and force vectors.

As a bonus, sketch what would happen to the force vectors if we replaced the masses with electric charges and the force law with Coulomb’s law. Consider the following cases: two positive charges, two negative charges and oppositely charged particles.

⁸ Historically, the two body problem was posed in reverse – that is, given the Kepler orbits, find a potential (or force law) that gives rise to these trajectories.


Problem 20 (Central Potential) Conservative force fields have the property that the work done in moving an object under a conservative force is independent of the path taken. Furthermore, such forces can be derived as the exterior derivative (or gradient) of some scalar potential:

F = −∇U. (3.66)

Equivalently⁹,

\[ F^{\flat} = -dU. \tag{3.67} \]

In this problem, consider a so-called ‘central potential’:

\[ U(\mathbf{r}_1, \mathbf{r}_2) = -\frac{G m_1 m_2}{\|\mathbf{r}_1 - \mathbf{r}_2\|}. \tag{3.68} \]

Such a potential has the property that it is ‘spherically symmetric’ – meaning it is invariant under rotations. Furthermore, it is invariant under translations. To see this explicitly, note that U does not depend on the directions of r1 or r2, but only on the magnitude of their difference, ‖r1 − r2‖ (which is invariant under said isometries¹⁰). Hence, we can write:

U(r1, r2) = U(‖r1 − r2‖). (3.69)

Such a potential is said to be a ‘central potential’. This is because if we let one mass coincide with the origin, then U only depends on the distance r = ‖r1 − r2‖ from the origin. In particular, the U = constant surfaces are spheres.

Prove that Newton’s Universal Law of Gravitation tells us that gravity is a conservative force field. In particular, show that:

F12 = −∇U(‖r1 − r2‖), (3.70)

where the gradient is taken with respect to r1.

Path independence then follows from noting that the gradient vector field is dual to the exterior derivative – from which one can apply the generalized Stokes’ theorem (or ‘Fundamental theorem of calculus’) to the work done along any path from p1 to p2:

\[ \int_{\mathrm{Path}(p_1, p_2)} F^{\flat} = -\int_{\mathrm{Path}(p_1, p_2)} dU = -\int_{\partial \mathrm{Path}(p_1, p_2)} U = U(p_1) - U(p_2). \tag{3.71} \]

Although this is sufficient, prove path independence by showing that:

Curl(F12) := ∇× F12 = 0. (3.72)

Path independence then follows explicitly from the classical Stokes theorem.
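Before proving these statements, it can be reassuring to test them numerically. The sketch below is a minimal check under stated assumptions: units with G = m1 = m2 = 1 by default, F12 taken as the force on mass 1 (pointing towards mass 2), derivatives taken with respect to r1 using central finite differences, and hypothetical function names throughout.

```python
import math

def potential(r1, r2, m1=1.0, m2=1.0, G=1.0):
    """Central potential U = -G m1 m2 / |r1 - r2|."""
    return -G * m1 * m2 / math.dist(r1, r2)

def force_on_1(r1, r2, m1=1.0, m2=1.0, G=1.0):
    """Inverse-square attraction on mass 1, directed from mass 1 towards mass 2."""
    d = math.dist(r1, r2)
    return tuple(G * m1 * m2 * (b - a) / d**3 for a, b in zip(r1, r2))

def shifted(r, axis, h):
    """Copy of the point r with h added to one coordinate."""
    return tuple(x + (h if i == axis else 0.0) for i, x in enumerate(r))

def minus_gradient(r1, r2, h=1e-5):
    """Central-difference estimate of -grad U with respect to r1."""
    return tuple(
        -(potential(shifted(r1, i, h), r2) - potential(shifted(r1, i, -h), r2)) / (2.0 * h)
        for i in range(3)
    )

def curl_of_force(r1, r2, h=1e-5):
    """Central-difference estimate of curl F12, differentiating with respect to r1."""
    def dF(component, axis):
        plus = force_on_1(shifted(r1, axis, h), r2)[component]
        minus = force_on_1(shifted(r1, axis, -h), r2)[component]
        return (plus - minus) / (2.0 * h)
    return (dF(2, 1) - dF(1, 2), dF(0, 2) - dF(2, 0), dF(1, 0) - dF(0, 1))

if __name__ == "__main__":
    r1, r2 = (1.0, 2.0, 3.0), (-1.0, 0.5, 0.0)
    print("force     :", force_on_1(r1, r2))
    print("-grad U   :", minus_gradient(r1, r2))
    print("curl F12  :", curl_of_force(r1, r2))
```

Agreement of the first two printed vectors, and a curl close to zero, is consistent with (3.70) and (3.72) but of course does not replace the proof.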

3.4.3 Symmetries and Jacobi Coordinates

Clearly, solving the differential equations for the displacement vectors of each mass is a rather primitive and inefficient brute-force approach. However, as we know, most problems involving the momenta of more than one massive body are drastically simplified by switching to the ‘center of mass’ (CM) coordinates. In particular, we make the following simplifying approximations:

1. The bodies are spherical and can therefore be dynamically treated as point-particles of an equivalent mass, located at their centres of mass.

2. There are no external net forces – that is, the bodies are only interacting via the inverse-square force law (gravity or electromagnetism).

⁹ The musical ‘flat’ superscript denotes the differential 1-form (‘covector’ or ‘dual vector’) corresponding to the force vector F. This correspondence is provided by the Euclidean metric. For practical purposes, you can consider dU as the ‘total differential’ from first-year calculus.

¹⁰ Recall that isometries are symmetries that leave the metric unchanged – i.e. they preserve lengths and relative angles between vectors.

We define the following vector and scalar variables:

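\[
\begin{aligned}
M &= m_1 + m_2 \\
\mathbf{r} &:= \mathbf{r}_1 - \mathbf{r}_2 \\
\mathbf{R} &:= \frac{m_1 \mathbf{r}_1 + m_2 \mathbf{r}_2}{M} \\
\mathbf{p}_j &:= m_j \dot{\mathbf{r}}_j, \quad j = 1, 2 \ \text{(no sum)} \\
\mathbf{P} &:= M \dot{\mathbf{R}} = \mathbf{p}_1 + \mathbf{p}_2.
\end{aligned}
\tag{3.73}
\]

These are the Jacobi variables for the center of mass frame. In particular, r is the ‘relative displacement’ (of the two bodies), R is the center-of-mass displacement and P is the center-of-mass momentum.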

Exercise 12 Sketch a vector diagram to illustrate the relation between the CM (center-of-mass) coordinates (r, R) and the original coordinates (r1, r2). For this sketch, investigate the difference when one body is much less massive than the other (m1 ≪ m2), (approximately) equally massive (m1 ≈ m2) or much more massive (m1 ≫ m2).

This may illustrate, for example, the earth-Sun and Earth – 3D-Printed Earth scenarios.

Exercise 13 By employing Newton’s 3rd Law, prove that the center of mass experiences no acceleration (in the absence of external forces):

\[ \ddot{\mathbf{R}} = 0. \tag{3.74} \]

Show that in the presence of external forces, Newton’s Second Law implies that:

\[ \mathbf{F}^{\mathrm{ext}} = \dot{\mathbf{P}} = M \ddot{\mathbf{R}}. \tag{3.75} \]

Therefore, the CM moves as if it were just a single particle of mass M subjected to the net external force on the system. This justifies our ability to represent our extended bodies as point particles, in the sense that their trajectories can be represented by their CM trajectories.

Equation (3.74) shows that, in the absence of external forces, the velocity V = Ṙ of the center of mass is a constant vector. Therefore, it follows that the total momentum P = MV is also constant (i.e. momentum conservation).

Since the phase space for a 2-body system (point particles) is 12 dimensional (3 momenta and 3 position coordinates for each particle/body), it follows that the trajectory R(t) of the center of mass can be uniquely determined from knowledge of the initial displacement and velocity vectors of the masses.

We now note that a consequence of our initial assumptions is that the two-body problem can be reduced to an equivalent 1-dimensional motion. This relies on the fact that the total angular momentum L of our system is constant (conserved) – in reality, due to the oblateness of the earth and other inhomogeneities, the total angular momentum varies slightly. In particular, L precesses.

To see this, we first establish the following results.


Problem 21 Recall that the angular momentum of a particle of mass m, displacement vector r and linear momentum p, with respect to some origin, is given by

l = r× p. (3.76)

Using this definition, the total angular momentum of our system is given by:

L = r1 × p1 + r2 × p2 = r1 ×m1v1 + r2 ×m2v2. (3.77)

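Now define r′j to be the displacement of mass mj relative to the center of mass. Note that rj without the prime is just the displacement vector of mj relative to our original origin, (0, 0, 0). Given these definitions, show that:

\[
\mathbf{L} = \mathbf{R} \times M\dot{\mathbf{R}} + \mathbf{R} \times \sum_j m_j \dot{\mathbf{r}}'_j + \sum_j m_j \mathbf{r}'_j \times \dot{\mathbf{R}} + \sum_j \mathbf{r}'_j \times m_j \dot{\mathbf{r}}'_j. \tag{3.78}
\]

Simplify this by showing that:

\[ \sum_j m_j \mathbf{r}'_j = 0. \tag{3.79} \]

Hint: Identify this sum (divided by M) as the position of the CM relative to the CM; its time derivative then vanishes as well. Hence show that:

\[ \mathbf{L} = \mathbf{R} \times \mathbf{P} + \sum_j \mathbf{r}'_j \times m_j \dot{\mathbf{r}}'_j. \tag{3.80} \]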

It follows that:

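\[ \mathbf{L} = \mathbf{L}_{\text{CM motion}} + \mathbf{L}_{\text{motion relative to CM}} := \mathbf{L}_{\text{orbital}} + \mathbf{L}_{\text{spin}}. \tag{3.81} \]

Now, apart from the precession of the equinoxes¹¹, to a good approximation it holds that for the earth-Sun gravitational interaction:

\[
\begin{aligned}
\dot{\mathbf{L}}_{\text{spin}} &= \dot{\mathbf{L}} - \dot{\mathbf{L}}_{\text{orbital}} \\
&= \sum_j \mathbf{r}'_j \times \mathbf{F}^{\mathrm{ext}}_j \\
&= \boldsymbol{\Gamma}^{\mathrm{ext}}_{\text{about CM}} \\
&\approx 0,
\end{aligned}
\tag{3.82}
\]

where

\[ \dot{\mathbf{L}}_{\text{orbital}} = \dot{\mathbf{R}} \times \mathbf{P} + \mathbf{R} \times \dot{\mathbf{P}} = 0 + \mathbf{R} \times \mathbf{F}^{\mathrm{ext}}. \tag{3.83} \]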

A useful quantity to work with is the reduced mass:

\[ \mu := \frac{m_1 m_2}{M} = \frac{m_1 m_2}{m_1 + m_2}. \tag{3.84} \]

To see that the motion of the two bodies with respect to each other lies in a 2D plane, it suffices to show that the angular momentum vector is constant. In particular, this is because the angular momentum vector is perpendicular to both the relative displacement vector r and the relative velocity ṙ (and hence the momentum) – since the trajectory of the motion is described by these vectors, it follows that they are confined to a plane orthogonal to the angular momentum vector, L. To see that the angular momentum vector is constant, note that it is given by:

\[
\begin{aligned}
\mathbf{L} &= \mathbf{L}_1 + \mathbf{L}_2 \\
&= \mathbf{r}_1 \times \mathbf{p}_1 + \mathbf{r}_2 \times \mathbf{p}_2 \\
&= m_1 \mathbf{r}_1 \times \dot{\mathbf{r}}_1 + m_2 \mathbf{r}_2 \times \dot{\mathbf{r}}_2,
\end{aligned}
\tag{3.85}
\]

¹¹ In astronomy, distant stars provide a roughly ‘fixed’ reference frame with respect to which we may make measurements. In particular, ‘the precession of the equinoxes’ refers to the 50 arcseconds per year rotation of the earth’s axis relative to the ‘fixed’ stars.

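with linear momenta defined as usual by pj = mj ṙj and j = 1, 2. Now, the net torque on our system is given by the rate of change of the angular momentum of the system:

\[
\begin{aligned}
\boldsymbol{\Gamma} &= \frac{d}{dt}\mathbf{L} \\
&= \frac{d}{dt}\left(\mathbf{r}_1 \times \mathbf{p}_1 + \mathbf{r}_2 \times \mathbf{p}_2\right) \\
&= 0 + m_1 \mathbf{r}_1 \times \ddot{\mathbf{r}}_1 + m_2 \mathbf{r}_2 \times \ddot{\mathbf{r}}_2 + 0.
\end{aligned}
\tag{3.86}
\]

In the center-of-mass frame, we take the center of mass to be the origin – hence in this frame, R = 0. Therefore:

\[ 0 = \frac{m_1}{M}\mathbf{r}_1 + \frac{m_2}{M}\mathbf{r}_2. \tag{3.87} \]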

Exercise 14 (Mass Reduction Strategies) In her effort to reduce her mass, Sarah decides to use the following ‘reduced mass’ formula:

\[ \mu := \frac{m_1 m_2}{M} = \frac{m_1 m_2}{m_1 + m_2} \tag{3.88} \]

to simplify her exercise routine. One of her exercises (set by Dogburn) is to prove that we can simplify the angular momentum expression to:

\[ \mathbf{L} = \mathbf{r} \times \mu \dot{\mathbf{r}}, \tag{3.89} \]

where r := (r1 − r2) is the relative displacement vector, as before.

Help Sarah by deriving the above expression for the total angular momentum in terms of the reduced mass µ and relative displacement r.

Hint: Work in the center-of-mass frame and use the relation

\[ 0 = \frac{m_1}{M}\mathbf{r}_1 + \frac{m_2}{M}\mathbf{r}_2 \tag{3.90} \]

along with the earlier expression for L in terms of r1 and r2.

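Using the reduced expression for the total angular momentum in the CM frame, we see that the torque on the system vanishes in the absence of external forces:

\[
\begin{aligned}
\frac{d}{dt}\mathbf{L} &= \dot{\mathbf{r}} \times \mu \dot{\mathbf{r}} + \mathbf{r} \times \mu \ddot{\mathbf{r}} \\
&= 0 + \mathbf{r} \times \mathbf{F}_{\text{grav}} \\
&= 0,
\end{aligned}
\tag{3.91}
\]

where the last step uses the fact that the gravitational force is parallel to r. It follows that the angular momentum L is constant and hence the motion of the bodies occurs in a plane with L as its normal vector.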
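The reduced-mass expression for L is easy to test with numbers. The following minimal sketch assumes we are in the CM frame, so that the individual positions and velocities can be rebuilt from the relative vectors as r1 = (m2/M) r and r2 = −(m1/M) r; the helper names and the sample values are purely illustrative.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def scale(s, u):
    """Scalar multiple s * u."""
    return tuple(s * x for x in u)

def add(u, v):
    """Vector sum u + v."""
    return tuple(a + b for a, b in zip(u, v))

def cm_frame_state(r, v, m1, m2):
    """Positions and velocities of both bodies in the CM frame,
    built from the relative displacement r = r1 - r2 and velocity v = dr/dt."""
    M = m1 + m2
    r1, v1 = scale(m2 / M, r), scale(m2 / M, v)
    r2, v2 = scale(-m1 / M, r), scale(-m1 / M, v)
    return r1, v1, r2, v2

def total_angular_momentum(r, v, m1, m2):
    """L = m1 r1 x v1 + m2 r2 x v2, evaluated in the CM frame."""
    r1, v1, r2, v2 = cm_frame_state(r, v, m1, m2)
    return add(scale(m1, cross(r1, v1)), scale(m2, cross(r2, v2)))

def reduced_angular_momentum(r, v, m1, m2):
    """L = r x (mu v) with reduced mass mu = m1 m2 / (m1 + m2)."""
    mu = m1 * m2 / (m1 + m2)
    return scale(mu, cross(r, v))

if __name__ == "__main__":
    r, v = (1.0, 2.0, -0.5), (0.3, -0.1, 0.7)
    print(total_angular_momentum(r, v, 3.0, 5.0))
    print(reduced_angular_momentum(r, v, 3.0, 5.0))
```

The two printed vectors should agree to rounding error for any choice of masses and relative vectors.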


Exercise 15 (CERN Confirms Existence of The Force) On April 1, 2015, the European Organization for Nuclear Research (CERN) confirmed the existence of ‘the force’. To verify their claim¹², prove that Newton’s universal gravitational force law takes the following form in reduced-mass coordinates:

\[ \mathbf{F}_{\text{grav}} = -\frac{G M \mu}{r^2}\,\hat{\mathbf{r}}, \tag{3.92} \]

where r = ‖r‖ is the magnitude of the relative displacement vector and $\hat{\mathbf{r}} := \frac{1}{r}\mathbf{r}$ is a unit vector in the direction of r. Since r points from mass 2 towards mass 1, the force written above is that experienced by mass 1 (which is attracted towards mass 2).

Now, re-express Newton’s Second Law for mass 1,

\[ \mathbf{F} = \dot{\mathbf{p}}_1 = m_1 \ddot{\mathbf{r}}_1, \tag{3.93} \]

in terms of the relative displacement vector, r.

Hint: Work in the CM frame as before when we proved the angular momentum vector was constant.

Our overall equation of motion for our center of mass R is given by the differential equation:

\[ M \ddot{\mathbf{R}} = 0. \tag{3.94} \]

This just tells us that the center of mass moves with constant linear velocity – i.e. we are working in a translating, non-accelerating frame. Recall that to solve the two body problem, we were required to derive the trajectories (r1, r2), or equivalently (R, r), with respect to some parameter (e.g. time). Since the above differential equation implies that R(t) = R0 + Vt for constant vectors R0 and V (with R(t) = 0 for all time in the CM frame), it remains to solve for the trajectory, r(t), of the relative displacement vector. To this end, we combine Newton’s Law of Universal Gravitation with Newton’s Second Law to obtain the following vector differential equation (a system of scalar differential equations):

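\[ -\frac{G M \mu}{r^2}\,\hat{\mathbf{r}} = \mu \ddot{\mathbf{r}}, \tag{3.95} \]

which simplifies to:

\[ \ddot{\mathbf{r}} = -\frac{G M}{r^2}\,\hat{\mathbf{r}}. \tag{3.96} \]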
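Although we shall solve (3.96) analytically, it is instructive to integrate it numerically and watch the conserved quantities. The sketch below is an illustration only: it assumes units with GM = 1, works in the orbital plane with cartesian components, and uses a velocity Verlet stepping scheme (my choice of integrator, not a method from these notes). The initial conditions are arbitrary values chosen to give a bound orbit.

```python
import math

def acceleration(x, y, GM=1.0):
    """Components of the inverse-square acceleration -GM r_hat / r^2."""
    r = math.hypot(x, y)
    return -GM * x / r**3, -GM * y / r**3

def integrate_orbit(x, y, vx, vy, dt=1e-3, steps=20000, GM=1.0):
    """Velocity Verlet integration of the relative motion.
    Returns the radius and the angular momentum per unit reduced mass
    (x*vy - y*vx) recorded after every step."""
    ax, ay = acceleration(x, y, GM)
    radii, ang_mom = [], []
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = acceleration(x, y, GM)
        vx += 0.5 * dt * ax          # second half kick
        vy += 0.5 * dt * ay
        radii.append(math.hypot(x, y))
        ang_mom.append(x * vy - y * vx)
    return radii, ang_mom

if __name__ == "__main__":
    radii, ang_mom = integrate_orbit(1.0, 0.0, 0.0, 0.8)
    print("r_min, r_max:", min(radii), max(radii))
    print("angular momentum drift:", max(ang_mom) - min(ang_mom))
```

The radius oscillating between two fixed values, together with a negligible drift in x·vy − y·vx, is what we expect from a bound Kepler orbit with constant angular momentum.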

Since we have proved that the motion occurs in a 2D plane, we are working in a 2-dimensional vector space – which is necessarily spanned by two linearly independent vectors. Now, we know that the vectors appearing in the above differential equation are equal iff their components (in some basis) are equal – we can use this to extract a system of scalar (component) differential equations. Although we could work in Cartesian coordinates, the symmetry of our problem suggests a far more natural coordinate system – polar (‘circular’) coordinates. However, not only do we need polar coordinates – we need polar coordinate basis vectors. Therefore, we perform the following change of basis and change of variables:

\[
\begin{aligned}
\hat{\mathbf{r}} &:= \partial_r = \cos(\theta)\,\partial_x + \sin(\theta)\,\partial_y \\
\hat{\boldsymbol{\theta}} &:= \frac{1}{r}\,\partial_\theta = -\sin(\theta)\,\partial_x + \cos(\theta)\,\partial_y \\
x &= r\cos(\theta) \\
y &= r\sin(\theta),
\end{aligned}
\tag{3.97}
\]

where r̂, θ̂ (or ∂r and (1/r)∂θ) are unit vectors in the radial and angular (tangential) directions and ∂x, ∂y are unit vectors in the x and y coordinate directions – i.e. the standard Cartesian basis vectors¹³.

¹² Disclaimer: April Fools

Problem 22 (Bipolar) One therapy for bipolar problems is to derive the change-of-basis relations between a standard (Cartesian) basis and a polar coordinate basis. To help your friend, complete the following problems.

1. Trigonometric Derivation: Derive the change of basis ∂x, ∂y → ∂r, ∂θ, from Cartesian to Polar (circular) coordinates, by drawing the x, y and r, θ coordinate lines on a plane and using trigonometry. Note that at any point in the plane ℝ², we have two Cartesian basis vectors – these are invariant in the sense that we can transport them around without changing them. However, the polar basis is defined at every point except the origin (since θ is singular at the origin) – more importantly, the polar basis vectors change for different θ (but do not vary with respect to r).

2. Chain-Rule and Differential Operator Derivation: We can alternativelyuse the fact that the basis vectors for Cartesian and polar coordinates, arein fact the tangent vectors to the cartesian and polar coordinate curves.Differential geometry tells us that tangent vectors correspond to differen-tial operators in the ‘obvious’ way. To see this, show that the followingoperators (partial derivatives) coincide with the polar basis vectors youconstructed geometrically:

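$$\begin{aligned}
\partial_r &= \left(\frac{\partial x}{\partial r}\right)\partial_x + \left(\frac{\partial y}{\partial r}\right)\partial_y \\
\partial_\theta &= \left(\frac{\partial x}{\partial \theta}\right)\partial_x + \left(\frac{\partial y}{\partial \theta}\right)\partial_y,
\end{aligned} \tag{3.98}$$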

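using the expressions for x and y in polar coordinates. (Computed literally from these expressions, $\partial_\theta$ comes out with length r; dividing by r gives the unit vector $\hat{\boldsymbol{\theta}}$ of (3.97).)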

3. Group-Theoretic Derivation: Some of you will recall the 1−1 correspondence between linear operators and matrices. In this particular case, we note that the polar basis is obtained by rotating the Cartesian basis vectors through an angle θ anti-clockwise. In particular, recall that the 2-by-2 matrix which rotates vectors in R2 in this manner is given (w.r.t. the standard basis) by:

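$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \tag{3.99}$$

Hence, show that the following rotation of the Cartesian basis vectors

$$R(\theta)\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad R(\theta)\begin{pmatrix} 0 \\ 1 \end{pmatrix} \tag{3.101}$$

results in the polar basis vectors derived with earlier methods.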
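As a numerical spot-check of the group-theoretic derivation (not a substitute for the proof), the sketch below rotates the two Cartesian basis vectors by R(θ) and prints the results. The helper names `rotation_2d` and `apply`, and the sample angle, are illustrative assumptions rather than anything defined in these notes.

```python
import math

def rotation_2d(theta):
    """Return the 2x2 anticlockwise rotation matrix R(theta) of (3.99)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a 2-component column vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(2)) for i in range(2)]

theta = math.pi / 3                  # arbitrary sample angle (an assumption)
e_x, e_y = [1.0, 0.0], [0.0, 1.0]    # Cartesian basis vectors as columns

r_hat = apply(rotation_2d(theta), e_x)       # should match (cos θ, sin θ)
theta_hat = apply(rotation_2d(theta), e_y)   # should match (-sin θ, cos θ)

print("r_hat     =", r_hat)
print("theta_hat =", theta_hat)
```

The two rotated vectors are simply the columns of R(θ), which is one way to see why the rotation picture and the trigonometric picture must agree.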

Before we can finish expanding our equation of motion (vector differential equation) in the polar basis, we must make the following critical observation – although the Cartesian basis vectors do not vary with respect to time t, the polar basis vectors do. This is because the polar coordinate basis vectors are functions of the polar coordinates (in fact, only of θ), which are in turn quantities that do vary in time.

¹³ Note that we use partial derivative notation to suggest the correspondence between tangent vectors and differential operators. In particular, this makes it easy to memorize and derive the change of basis formulae.

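Using the chain rule and the expressions for the polar basis vectors (in terms of the Cartesian basis) derived previously, one can derive the following expressions for relative displacement, velocity and acceleration:

$$\begin{aligned}
\mathbf{r} &= r(\cos(\theta)\,\partial_x + \sin(\theta)\,\partial_y) = r\,\partial_r = r\,\hat{\mathbf{r}} \\
\dot{\mathbf{r}} &= \dot{r}\,\partial_r + r\dot{\theta}\,\partial_\theta = \dot{r}\,\hat{\mathbf{r}} + r\dot{\theta}\,\hat{\boldsymbol{\theta}} \\
\ddot{\mathbf{r}} &= (\ddot{r} - r\dot{\theta}^{2})\,\hat{\mathbf{r}} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\boldsymbol{\theta}}.
\end{aligned} \tag{3.102}$$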

Exercise 16 Verify the above statement by filling in the details of this derivation.

Hint: Before differentiating each vector, express it in the Cartesian basis ∂x, ∂y (but keep the polar coordinates r, θ), then differentiate. Use the expressions for the polar coordinate basis vectors to then simplify the resulting vector (expressed in the Cartesian basis) as a linear combination of polar basis vectors.
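Before working through the algebra, it can be reassuring to test the acceleration formula in (3.102) numerically. The sketch below picks an arbitrary trajectory r(t), θ(t) (both choices are assumptions made purely for illustration), differentiates the Cartesian position twice by central differences, projects the result onto the polar basis, and compares with the radial and tangential components that (3.102) predicts.

```python
import math

def r(t):
    return 2.0 + math.sin(t)        # sample radial motion (an assumption)

def theta(t):
    return t * t                    # sample angular motion (an assumption)

def position(t):
    """Cartesian components of r(t) = r (cos θ ∂x + sin θ ∂y)."""
    return (r(t) * math.cos(theta(t)), r(t) * math.sin(theta(t)))

def cartesian_acceleration(t, h=1e-4):
    """Second time derivative of the position by central differences."""
    (x0, y0), (x1, y1), (x2, y2) = position(t - h), position(t), position(t + h)
    return ((x0 - 2 * x1 + x2) / h**2, (y0 - 2 * y1 + y2) / h**2)

def polar_acceleration(t):
    """Radial and tangential components predicted by (3.102)."""
    r_dot, r_ddot = math.cos(t), -math.sin(t)   # derivatives of r(t)
    theta_dot, theta_ddot = 2 * t, 2.0          # derivatives of θ(t)
    radial = r_ddot - r(t) * theta_dot**2
    tangential = r(t) * theta_ddot + 2 * r_dot * theta_dot
    return radial, tangential

t = 0.7
ax, ay = cartesian_acceleration(t)
c, s = math.cos(theta(t)), math.sin(theta(t))
numeric = (ax * c + ay * s, -ax * s + ay * c)   # project onto r̂ and θ̂
predicted = polar_acceleration(t)
print(numeric, predicted)
```

Agreement to several decimal places is only evidence, not a derivation; the exercise still asks for the chain-rule argument.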

Therefore, our relative-displacement equation of motion becomes:

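$$(\ddot{r} - r\dot{\theta}^{2})\,\hat{\mathbf{r}} + (r\ddot{\theta} + 2\dot{r}\dot{\theta})\,\hat{\boldsymbol{\theta}} = \left(-\frac{GM}{r^{2}}\right)\hat{\mathbf{r}} + 0\,\hat{\boldsymbol{\theta}}. \tag{3.103}$$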


We therefore get the two ordinary differential equations:

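$$\begin{aligned}
\ddot{r} - r\dot{\theta}^{2} &= -\frac{GM}{r^{2}} \\
r\ddot{\theta} + 2\dot{r}\dot{\theta} &= 0.
\end{aligned} \tag{3.104}$$

Now note that the magnitude of the angular momentum is given by $L := \|\mathbf{L}\| = \|\mathbf{r} \times \mu\dot{\mathbf{r}}\| = \mu r^{2}\dot{\theta}$, hence:

$$\ddot{\theta} = -\frac{2L\dot{r}}{r^{3}\mu}. \tag{3.105}$$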

Since angular momentum is constant, we can decouple the θ variable from the radial differential equation – in particular, we shall use the substitution

$$\dot{\theta} = \frac{L}{r^{2}\mu}, \tag{3.106}$$

to remove the θ variable from the r-component of the vector differential equation. Now, rather than solving for r and θ as functions of time, we can get the orbit (the solution of the vector differential equation) by parametrising r in terms of the polar angle, θ. To do this, we must make use of the chain rule:

$$\begin{aligned}
\dot{r} &:= \frac{dr}{dt} = \frac{dr}{d\theta}\frac{d\theta}{dt} =: r'\dot{\theta} \\
\ddot{r} &= \frac{d\dot{r}}{dt} = r''\dot{\theta}^{2} + r'\ddot{\theta}.
\end{aligned} \tag{3.107}$$

Exercise 17 Substitute the expressions for $\dot{r}, \ddot{r}$ in terms of $r'$ and $r''$ into the differential equations we have been working with. In particular, use the θ component of the vector differential equation (involving $\ddot{\theta}$) to re-write the $\ddot{\theta}$ which appears, in terms of $\dot{r}$, $\dot{\theta}$ and r.

Show that after these substitutions, we obtain the following second-order non-homogeneous differential equation of a single variable, r:

$$\frac{L^{2}}{r^{4}\mu^{2}}\left[\frac{d^{2}r}{d\theta^{2}} - \frac{2}{r}\left(\frac{dr}{d\theta}\right)^{2} - r\right] = -\frac{GM}{r^{2}}. \tag{3.108}$$

Hint: Do not eliminate $\dot{\theta}$ until the very final step, where you can substitute the expression:

$$\dot{\theta} = \frac{L}{r^{2}\mu}. \tag{3.109}$$

Remark: Note that we have reduced a system of coupled, second-order differential equations into a single differential equation of a single variable. This was achievable because we used a constant of motion – the angular momentum (proportional to $\dot{\theta}$) – to eliminate some degrees of freedom for the motion of our objects. In particular, we reduced the dimensionality of the phase space for the orbit.

Finally, we make the judicious change of variables (to reciprocal radius), r → s, with

$$s := \frac{1}{r}. \tag{3.110}$$

To this end, we make use of the chain rule again:

$$\begin{aligned}
\frac{dr}{d\theta} &= -\frac{1}{s^{2}}\frac{ds}{d\theta} \\
\frac{d^{2}r}{d\theta^{2}} &= \frac{2}{s^{3}}\left(\frac{ds}{d\theta}\right)^{2} - \frac{1}{s^{2}}\frac{d^{2}s}{d\theta^{2}}.
\end{aligned} \tag{3.111}$$

Exercise 18 Verify the above identities by using the chain rule yourself. Substitute the above identities into our simplified differential equation. After some algebra, you should be able to obtain the final form of our original differential equation:

$$\frac{d^{2}s}{d\theta^{2}} + s = \frac{GM\mu^{2}}{L^{2}}. \tag{3.112}$$

Note that the term on the right-hand side of this equation is a constant since the (reduced) mass and angular momentum are conserved.
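A quick way to build confidence in (3.112) is to integrate it numerically and compare with the shifted-cosine form that the following exercises derive. The sketch below uses a hand-written fourth-order Runge-Kutta step; the constant `k` stands in for GMμ²/L², and its value, the eccentricity-like parameter `eps`, and the function name are all illustrative assumptions.

```python
import math

def integrate_orbit_equation(k, s0, ds0, theta_end, steps=2000):
    """Integrate s'' + s = k from θ = 0 to theta_end with classical RK4."""
    h = theta_end / steps
    s, v = s0, ds0                      # v denotes ds/dθ
    for _ in range(steps):
        # First-order system: s' = v, v' = k - s
        k1s, k1v = v, k - s
        k2s, k2v = v + 0.5 * h * k1v, k - (s + 0.5 * h * k1s)
        k3s, k3v = v + 0.5 * h * k2v, k - (s + 0.5 * h * k2s)
        k4s, k4v = v + h * k3v, k - (s + h * k3s)
        s, v = (s + h * (k1s + 2 * k2s + 2 * k3s + k4s) / 6,
                v + h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6)
    return s

k = 1.0          # stands in for GMμ²/L² (hypothetical units)
eps = 0.5        # sample amplitude relative to k (an assumption)
theta_end = 2.0

# Start at the point of closest approach: s maximal, ds/dθ = 0.
numeric = integrate_orbit_equation(k, s0=k * (1 + eps), ds0=0.0, theta_end=theta_end)
closed_form = k * (1 + eps * math.cos(theta_end))
print(numeric, closed_form)
```

The initial conditions were chosen so that θ₀ = 0; other starting points would simply shift the cosine.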

Those of you who have studied ordinary differential equations should realize that this is a linear, second-order ordinary differential equation with a non-homogeneous term: $GM\mu^{2}/L^{2}$. Such ODEs can be solved by obtaining their characteristic equation, finding the homogeneous solution, then adding the particular solution to the homogeneous solution to obtain the general solution.

Exercise 19 Write the homogeneous second-order ODE corresponding to our non-homogeneous ODE. Solve the characteristic equation to obtain a homogeneous solution of the form:

$$s_h(\theta) = A\sin(\theta) + B\cos(\theta), \tag{3.113}$$

where A, B are constants determined by initial conditions.

Show, using trigonometric identities, that the homogeneous solution can be re-written in the form:

$$s_h(\theta) = \lambda\cos(\theta - \theta_0), \tag{3.114}$$

where λ, θ0 are some constants related to A, B (i.e. find the relations).

Now show that we have the following particular solution:

$$s_p = \frac{GM\mu^{2}}{L^{2}}. \tag{3.115}$$

Therefore, our general solution is given by:

$$s(\theta) = s_p(\theta) + s_h(\theta) = \lambda\cos(\theta - \theta_0) + \frac{GM\mu^{2}}{L^{2}}. \tag{3.116}$$

Re-write this solution so that it takes the following form:

$$s(\theta) = C\left[1 + \varepsilon\cos(\theta - \theta_0)\right]. \tag{3.117}$$

In particular, find the constants C and ε in terms of λ and $GM\mu^{2}/L^{2}$.

Since our general trigonometric solution is given by

$$s = \frac{GM\mu^{2}}{L^{2}}\left(1 + \varepsilon\cos(\theta - \theta_0)\right), \tag{3.118}$$

we can choose our coordinates so that θ = 0 when s is maximal (r is minimal), so that θ0 = 0 – this corresponds to rotating our (r, θ) polar coordinate system. Changing s back to our radial coordinate, r = 1/s, we thus have:

$$r(\theta) = \frac{L^{2}}{GM\mu^{2}}\,\frac{1}{1 + \varepsilon\cos(\theta)}, \tag{3.119}$$

where ε is the eccentricity of the orbit (conic section). For 0 < ε < 1 we get elliptical orbits – with two foci. For the special case ε = 0 (which happens only when the amplitude λ of the homogeneous solution vanishes, i.e. for specially chosen initial conditions), we get circular orbits. For the Sun-Earth orbit, we have ε ≈ 0.017 – so to a good approximation, we can treat the orbit as circular.

The bound orbits for the Kepler problem are clearly given by conic sections with ε < 1. For ε > 1, we get hyperbolae and for ε = 1 we get parabolic ‘orbits’ – these orbits are not ‘bound’ and hence not periodic. They may describe incoming comets or asteroids as they are gravitationally slingshotted by the sun out of the solar system.
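The classification by eccentricity and the orbit equation (3.119) translate directly into a few lines of code. In the sketch below, `semi_latus_rectum` stands for the prefactor L²/(GMμ²); the function names and the sample values are illustrative assumptions.

```python
import math

def classify_orbit(eccentricity):
    """Name the conic section described by r(θ) = p / (1 + ε cos θ)."""
    if eccentricity < 0:
        raise ValueError("eccentricity must be non-negative")
    if eccentricity == 0:
        return "circle"
    if eccentricity < 1:
        return "ellipse"
    if eccentricity == 1:
        return "parabola"
    return "hyperbola"

def orbit_radius(theta, semi_latus_rectum, eccentricity):
    """Evaluate (3.119), with semi_latus_rectum playing the role of L²/(GMμ²)."""
    denominator = 1 + eccentricity * math.cos(theta)
    if denominator <= 0:
        # Unbound orbits never reach these directions.
        raise ValueError("no point on the orbit in this direction")
    return semi_latus_rectum / denominator

p, eps = 1.0, 0.5                       # hypothetical orbit parameters
closest = orbit_radius(0.0, p, eps)     # θ = 0: r is minimal
farthest = orbit_radius(math.pi, p, eps)
print(classify_orbit(eps), closest, farthest)
```

Testing a floating-point eccentricity for exact equality with 0 or 1 is fragile in practice; a real implementation would likely use a tolerance, which is omitted here to keep the correspondence with the text plain.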

Implications – Kepler’s laws of Planetary Motion

3.4.4 Kepler’s Laws

Corollaries (K laws)

Derive period of Earth’s orbit

3.4.5 Superintegrability and Constants of Motion

Conservation of Energy

Solution 2: Laplace-Runge-Lenz vector ...

Casimirs and rotation algebra

Hydrogen two-body problem and solution with Pauli vector.


3.5 Hyperbolae, Comets and Atomic Scattering

Hyperbolae

Hyperbolic Trigonometry

Hyperbolic Metric Spaces (special relativity – hyperbolic reverse Cauchy-Schwarz inequality, light cones and foliation) and manifolds (elementary concepts)

3.6 General Relativistic Corrections

No actual ‘GR’ needed to derive/analyse orbit equation. Just state the next-order terms added to the Newtonian potential for an observer in the Schwarzschild spacetime. From this potential, construct the ‘Newtonian force’ or Lagrangian to obtain the resulting differential equations.

Show how the DEs can be solved rather simply by a slight modification to the solution for Newtonian gravity.

Chapter 4

Physics in Non-Inertial Frames

In this series of exploration studies, we will investigate the mathematics of accelerating reference frames and the isometries of Euclidean space – the rotation and translation groups. Such mathematics has been extensively developed throughout the 17th, 18th and 19th Centuries and is elegantly unified in a higher realm – the theory of Lie Groups / Lie Algebras, Clifford Algebras and Differential Geometry. Nonetheless, we shall treat these topics in their most basic form – one accessible to first year tertiary students.

We will apply our mathematical structures to study the dynamics of rigid bodies moving in Euclidean space. In particular, we aim to explore and develop a working knowledge of angular momentum, moments of inertia and rotational symmetry operations. Once these foundations are reviewed, we may move on to the dynamics of objects in linearly and angularly (rotating) accelerating frames. Such dynamics is then applied to understand the ‘Coriolis effect’ – a natural phenomenon. Our journey ends with a brief study and solution of the ingenious ‘Foucault’s Pendulum’ – the first device ever built to measure the rotation of the earth about its axis.

4.1 The Lie Group of Rotations: Design a Death Star

In this section, we investigate how one can apply the theory of Lie groups and Lie algebras to the construction and design of an orbital death star¹ – in particular, an orbital space station equipped with high intensity Bose-Einstein condensate based gamma-ray LASERS, naval anti-missile lasers, electromagnetic rail guns and nuclear warheads.

When it comes to military technology, the most advanced science often takes place in the form of weapons targeting, tracking and detection systems – a recent example is the huge investment in stealth technology and C.I.A. drone reconnaissance by the United States military. This is because target detection and acquisition is paramount – after all, you can’t eliminate something if you can’t detect it and aim at it. Even master Sun Tzu understood the importance of this element of warfare². To this end, we will see how the rotational Lie groups and Lie algebras, realized in matrix form, can be used to orient an orbital space station along with the gun turrets it is equipped with. We conclude by looking at the quaternionic representation of the rotation group – which leads us to the first solid historical example of an abstract algebra (a ‘generalization’ of complex numbers), constructed by the famous Irish polymath – Sir William Rowan Hamilton.

¹ For those of you who haven’t seen Star Wars, a death star is a large spherical-ish spaceship, the size of a small moon, equipped with a beam weapon which can destroy entire planets.

² For those of you who need to read more – Sun Tzu’s “Art of War”. The Giles translation is recommended.

This tutorial will make use of matrices and matrix algebra, abstract algebras and group theory, vectors, rotations and various physical concepts. As such it should be mastered by engineering, physics, computer science and math students alike. Hopefully, it will unify and consolidate various areas of your studies – and maybe convince you to get a job in weapons design/satellite programming.

4.1.1 Notation

For this tutorial, we will be sticking to Einstein notation – this means that whenever we see an index repeated in some quantity, we sum this quantity over all possible values of that index (omitting the summation symbol ∑). So for example, we denote a 3-dimensional real vector v in terms of a standard basis e1, e2, e3 as:

$$\mathbf{v} = v^{i}\mathbf{e}_{i}, \tag{4.1}$$

where the contracted index i ranges across i = 1, 2, 3:

$$v^{i}\mathbf{e}_{i} := v^{1}\mathbf{e}_{1} + v^{2}\mathbf{e}_{2} + v^{3}\mathbf{e}_{3}. \tag{4.2}$$

As before, we keep one index raised and one index lowered for a pair of repeated indices³. Furthermore, components of vectors are raised – hence $v^{j}$ refers to the j-th component of the vector v (not the j-th power), for example. You may be familiar with representing a vector by its components – $\mathbf{v} = (v^{1}, v^{2}, ..., v^{n})$ – this notation is fine, yet elementary as it hides the choice of basis (which is assumed to be the standard basis) by only displaying the components of the vector.
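The contraction in (4.1) is just a loop over the repeated index. The sketch below makes that explicit and also shows the point about hidden bases: the same list of components gives a different vector when paired with a different basis. The function name `contract` and the second basis are illustrative assumptions.

```python
def contract(components, basis):
    """Return v = v^i e_i, summing over the repeated index i."""
    dimension = len(basis[0])
    result = [0.0] * dimension
    for v_i, e_i in zip(components, basis):     # the implied sum over i
        for k in range(dimension):
            result[k] += v_i * e_i[k]
    return result

components = [2.0, -1.0, 3.0]                   # v^1, v^2, v^3

standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
other_basis = [[1, 1, 0], [0, 1, 0], [0, 0, 2]]  # hypothetical non-standard basis

print(contract(components, standard_basis))
print(contract(components, other_basis))
```

With the standard basis the output simply reproduces the components, which is exactly why the tuple notation can get away with not mentioning the basis.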

4.1.2 BFF: Linear Maps and Matrices

As one progresses in the mathematical sciences, one frequents the land of matrix operations – for proofs, problems and simplifying calculations. Perhaps the main reason for their popularity is that, once a basis is chosen, there is a one-to-one correspondence between matrices and linear maps on (finite-dimensional) vector spaces. In particular, a linear map L on a vector space V (e.g. 3-dimensional Euclidean space R3) is defined as follows.

Definition 10 A linear map L : V → V, which maps the vector space V to itself, is one which has the following property:

• Linearity:

L(au + bw) = aL(u) + bL(w) ∀u,w ∈ V, ∀a, b ∈ F (4.3)

where F is some number field (e.g. the real numbers R or the complex numbers C).

³ A convention which matters in non-Euclidean spaces, since it helps to distinguish covariant tensors (e.g. covectors such as the total differential) from contravariant ones (e.g. your usual vectors).


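How does this correspond to matrices? Notice that if we represent a vector $\mathbf{v} = v^{i}\mathbf{e}_{i} := v^{1}\mathbf{e}_{1} + ... + v^{n}\mathbf{e}_{n}$ in an n-dimensional vector space (e.g. Rn) as a column vector:

$$\mathbf{v} = \begin{pmatrix} v^{1} \\ v^{2} \\ \vdots \\ v^{n} \end{pmatrix} \tag{4.4}$$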

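Then we can readily compute the action of some matrix on this vector via matrix multiplication. In particular, the action of an n × n matrix A on an n-dimensional vector v will produce another n-dimensional vector, u = Av – which we call the transformation of the vector v by the matrix A. For example:

$$A\mathbf{v} = \begin{pmatrix} A^{1}{}_{1} & A^{1}{}_{2} & \cdots & A^{1}{}_{n} \\ A^{2}{}_{1} & A^{2}{}_{2} & \cdots & A^{2}{}_{n} \\ \vdots & \vdots & \ddots & \vdots \\ A^{n}{}_{1} & A^{n}{}_{2} & \cdots & A^{n}{}_{n} \end{pmatrix} \begin{pmatrix} v^{1} \\ v^{2} \\ \vdots \\ v^{n} \end{pmatrix} = \begin{pmatrix} A^{1}{}_{1}v^{1} + A^{1}{}_{2}v^{2} + ... + A^{1}{}_{n}v^{n} \\ A^{2}{}_{1}v^{1} + A^{2}{}_{2}v^{2} + ... + A^{2}{}_{n}v^{n} \\ \vdots \\ A^{n}{}_{1}v^{1} + A^{n}{}_{2}v^{2} + ... + A^{n}{}_{n}v^{n} \end{pmatrix} \tag{4.5}$$

Alternatively, in Einstein notation, the action of the matrix A on the vector v is given by:

$$\mathbf{u} = A^{i}{}_{j}\,v^{j}\,\mathbf{e}_{i} \tag{4.6}$$

where the components $A^{i}{}_{j}$ of the matrix A correspond to the entry in the i-th row and j-th column of A⁴. The contracted indices i and j run over 1 to n (the dimension of the vector space in which v lives).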

Now, if one recalls, the action of matrices on vectors is linear – that is, given any scalars λ, γ and any n-dimensional vectors v and u, then for any n × n matrix A we have:

$$A(\lambda\mathbf{v} + \gamma\mathbf{u}) = \lambda A\mathbf{v} + \gamma A\mathbf{u}, \tag{4.7}$$

hence matrices obey the linearity property required by linear maps. In this sense, we can think of the components of a matrix as the components of a linear map in some chosen basis – conversely, by computing the action of a linear map L on a set of basis vectors ej, we can determine its components in that basis – which we can view as entries in some matrix. To make this explicit with some examples, we shall see how rotation maps can be realized in matrix form.

Exercise 20 (Apocalypse Now) Being quite bored of mathematics, physics, swordplay, music and games, Thomas McKenney chooses to partake in a new pastime – world domination. He decides the best way to undertake this is to build his own Star Wars-inspired Orbital Death Star. The St. George’s College Board decides to fund Thomas in this pursuit – agreeing that world domination fits into the cultural expansion program as well as securing funding for building maintenance. To this end, Thomas realizes he must complete the St. George’s College Mathematical Sciences tutorials in order to prepare his laser targeting algorithms. To aid Thomas in this noble enterprise, think of a way to mathematically express the statement – “by computing the action of a linear map L on a set of basis vectors ej, we can determine its components in that basis”.

Hint: Compare the action of a linear map L on a vector v with the action of some matrix A on v – in particular, compare the coefficients of standard basis vectors ej in the resulting transformed vectors: L(v) and Av. Now look at the special case when v is simply equal to one of the standard basis vectors ej.
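The idea in the hint can also be tried out computationally: feed each standard basis vector to a linear map and record the outputs as columns. The sketch below does this for a made-up linear map (`example_map` is purely hypothetical) and then checks that the resulting matrix reproduces the map on another vector. It illustrates the statement rather than proving it.

```python
def matrix_of(linear_map, dimension):
    """Build the matrix whose j-th column is linear_map(e_j)."""
    columns = []
    for j in range(dimension):
        e_j = [1.0 if k == j else 0.0 for k in range(dimension)]
        columns.append(linear_map(e_j))
    # Entry [i][j] is the i-th component of L(e_j): row i, column j.
    return [[columns[j][i] for j in range(dimension)] for i in range(dimension)]

def matvec(matrix, vector):
    """Ordinary matrix-vector multiplication, u^i = A^i_j v^j."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, vector)) for row in matrix]

def example_map(v):
    """A hypothetical linear map on R^3, chosen only for illustration."""
    x, y, z = v
    return [y + 2 * z, x, 3 * z]

A = matrix_of(example_map, 3)
v = [1.0, 2.0, 3.0]
print(A)
print(matvec(A, v), example_map(v))
```

The construction only works because the map is linear; for a non-linear function the matrix built this way would agree with it on the basis vectors and generally nowhere else.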

⁴ Rows have a raised index and columns have a lowered index – taking the transpose of the matrix reverses this.


4.1.3 SO(3): The Lie Group of Rotations

In 3-dimensional Euclidean space, there are three independent axes of rotation in any given coordinate system. Rotations of vectors are linear maps – to see this, complete the following exercise.

Exercise 21 (Microsoft Death Star) Linear operations are nice – firstly because they are relatively simple and secondly because they can be represented by matrices, meaning that they are easy to program and implement in computer algorithms. Therefore, to build a feasible laser targeting system, one would hope that programming the rotation of the laser turret amounts to linear operations. Taking an interest in weapons targeting systems, Emma Krantz decides to program such a system for her programming competition – to assess the feasibility, she has to prove that rotations are linear operations.

Let R represent some 3-dimensional rotation operation and v be some 3-dimensional vector. Argue geometrically that the action of R on the vector v is linear – i.e. show that R satisfies the linearity property required by a linear map.

Hint: Given a 3-dimensional vector v, we can always scale it by some number λ ∈ R. If |λ| > 1 we dilate the length of the vector and if |λ| < 1 we contract it. Furthermore, if λ > 0 we preserve the orientation of the vector and if λ < 0 we reverse it. Argue that scaling first v → λv and then rotating the resulting vector λv is the same as first rotating v and then scaling it by λ – this shows that rotation is a degree 1 homogeneous operation.

Hint XP: Further show that adding two vectors v + u and then rotating the sum of the two vectors is the same as rotating each of the vectors separately (by the same rotation) and then adding the individual rotated vectors. This shows that rotations are additive operations – if you combine this property with the degree 1 homogeneous property, this gives the linearity property which proves that rotations are linear maps.

If one sets up a 3-dimensional Cartesian coordinate system, with coordinates x, y, z (or x1, x2, x3) and standard basis vectors e1, e2, e3 corresponding to unit vectors in the x, y and z directions, respectively, then one has three independent rotation operators R1, R2 and R3 which rotate vectors about each of the corresponding axes (x, y and z). These are linear maps and hence can be represented as 3 × 3 matrices. We can also view them as functions of the angle which they rotate by. Explicitly, these matrices are:

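$$R_1(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} \tag{4.8}$$

$$R_2(\beta) = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} \tag{4.9}$$

$$R_3(\gamma) = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{4.10}$$

Geometrically, R1(θ) rotates any vector v anti-clockwise⁵ by an angle θ about the x-axis – this means it rotates v in a plane perpendicular to the x-axis. Similarly, R2(β) rotates by an angle β anticlockwise about the y-axis and R3(γ) rotates anticlockwise by an angle γ about the z-axis.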
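To get a feel for these matrices, the sketch below codes (4.8)-(4.10) as plain nested lists and applies a quarter turn about each axis to a basis vector. The function names mirror the notation of the text, but the helper `rotate` and the choice of test vectors are illustrative assumptions.

```python
import math

def R1(theta):
    """Rotation by theta about the x-axis, as in (4.8)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def R2(beta):
    """Rotation by beta about the y-axis, as in (4.9)."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def R3(gamma):
    """Rotation by gamma about the z-axis, as in (4.10)."""
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotate(matrix, vector):
    """Apply a 3x3 matrix to a 3-component column vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
quarter_turn = math.pi / 2

about_x = rotate(R1(quarter_turn), e2)   # y-axis is carried onto the z-axis
about_y = rotate(R2(quarter_turn), e3)   # z-axis is carried onto the x-axis
about_z = rotate(R3(quarter_turn), e1)   # x-axis is carried onto the y-axis
print(about_x, about_y, about_z)
```

The cyclic pattern (y to z, z to x, x to y) is what "anticlockwise about each axis" means in a right-handed coordinate system; the printed values agree with the basis vectors only up to floating-point rounding.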

Exercise 22 (Eigenvectors of Rotation) Clearly if you have a vector that lies along the x-axis and you rotate it about the x-axis, nothing happens to the vector. This is because any vector lying along the x-axis is an eigenvector of the x-rotation matrix R1(θ) with eigenvalue 1. More generally, if we rotate any vector $\mathbf{v} = v^{1}\mathbf{e}_{1} + v^{2}\mathbf{e}_{2} + v^{3}\mathbf{e}_{3}$ about the j-th coordinate axis, then its j-th component will not change.

Q: Using matrix multiplication and representing each vector as a column vector, show that:

$$R_j(\theta)\,\mathbf{e}_j = \mathbf{e}_j \quad \text{(no summation)}, \tag{4.11}$$

which means that the standard basis vector ej is an eigenvector of the rotation operator Rj with eigenvalue 1.

Now, using the previous result and the fact that rotations are linear operators, prove that⁶:

$$R_j(\theta)\,\mathbf{v} = \sum_{k \neq j}\left(v^{k}R_j(\theta)\,\mathbf{e}_k\right) + v^{j}\mathbf{e}_j, \tag{4.12}$$

where the summed index k ≠ j means you sum over all values (1, 2, 3) of k not equal to j. Hence, rotations about a given axis preserve the component of any vector along that axis.

Problem 23 (The Proof is Trivial) If you rotate a vector about an axis through an angle of zero degrees, the vector should remain unchanged.

Verify that all three rotation operators Rj(θ) become the 3 × 3 identity matrix (the matrix with 1’s down the main diagonal and zeros everywhere else) when you set the angle θ = 0.

As it turns out, the set of rotation matrices forms a mathematical structure known as a ‘Lie Group’. As such they are used for lying/truth algorithms. Actually that’s a lie – they are a type of ‘continuous’ or rather ‘smooth’ group (as opposed to a discrete group) named after the mathematician Sophus Lie, who developed and pioneered them. Lie groups are of fundamental importance to modern physics and mathematics – in fact, they are the core element underlying major developments in particle physics⁷, high energy physics and gauge theory. We define a Lie group as follows.

Definition 11 A Lie group G is a differentiable manifold which is also a group whose group operations are smooth (infinitely differentiable). This means that G equipped with the operation ⋆ satisfies the group properties

• Closure/Binary Operation: If A, B ∈ G then A ⋆ B ∈ G.

• Associativity: For any A, B, C ∈ G, A ⋆ (B ⋆ C) = (A ⋆ B) ⋆ C.

• Identity Element: ∃I ∈ G such that I ⋆ A = A ⋆ I = A, ∀A ∈ G.

• Inverses: For any A ∈ G, ∃B ∈ G such that A ⋆ B = B ⋆ A = I. If ⋆ is a multiplicative operation, we denote B = A⁻¹, the inverse of A. If ⋆ is additive (or commutative), we denote B by −A.

where ⋆ is a binary operation⁸ (e.g. matrix multiplication) which is smooth.

⁵ Almost always in mathematics, anti-clockwise is considered to be a positive orientation and clockwise is considered to be negative.

⁶ This is not using the Einstein summation convention – so $v^{j}\mathbf{e}_{j}$ is for a fixed value of j, not a sum over all possible values of the index j.

⁷ The gauge symmetries of the Standard Model of particle physics are in fact described by a Lie group – this tells us the symmetries that nature obeys for the electromagnetic, weak and strong nuclear forces.

Exercise 23 (YOLO) Unaware of the on-going ‘Project Death Star’ of St. George’s College, University Hall decides to hold a party to show how awesome they are. After shouting YOLO, a drunken University Hall student jumps into a pit of horny honey badgers and dies a humiliating death. Despite making it into the prestigious Darwin Awards, this is tragic because that student lived a life without ever proving that the real numbers R form a group under addition – and that the non-zero real numbers R \ {0} form a group under multiplication.

Using your wisdom and foresight to avoid a similar fate, prove that the real numbers form a group under the addition operation + with 0 being the additive identity element. Similarly, prove that the set of non-zero real numbers forms a group under the multiplication operation × with 1 being the multiplicative identity element. Together with commutativity and the distributive law, these statements imply that the real numbers form a special mathematical structure called a ‘field’.

Rotations form the Lie Group SO(3), which is the 3-dimensional ‘Special Orthogonal Group’. This group is characterized as the set of 3 × 3 matrices A which have the following properties⁹:

• $\det(A) = 1$

• $A^{T}A = I$

Since the determinant of a linear map tells you how the map distorts volumes, the first condition (the ‘Special’ part) says that rotations preserve volumes – this is a consequence of the more general observation that rotations are isometries of Euclidean space, meaning that they preserve lengths of vectors and relative angles between vectors (rotating any pair of vectors simultaneously leaves the angle between them unchanged). Furthermore, since the second condition (the ‘Orthogonal’ part) can be written as:

$$A^{T} = A^{-1} \tag{4.13}$$

where $A^{-1}$ is the inverse of the rotation matrix A, the second condition says that rotations are orthogonal¹⁰ transformations – meaning they preserve orthogonality of vectors (or that the column vectors in a rotation matrix are mutually orthogonal unit vectors). Hence, the second property comes from the fact that isometries preserve angles between objects.

Note that the group operation for SO(3) is matrix multiplication – which is a smooth operation since it essentially amounts to the multiplication and addition of numbers.
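The two defining conditions are easy to test numerically for any given matrix. The sketch below checks them for a product of two coordinate rotations and for a reflection (which is orthogonal but has determinant −1). The helper names, the tolerance and the sample angles are illustrative assumptions; a numerical check at a few angles is of course no replacement for the algebraic proofs asked for in the exercises.

```python
import math

def R1(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def R3(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_special_orthogonal(A, tol=1e-12):
    """Check A^T A = I and det(A) = 1 up to a small tolerance."""
    product = matmul(transpose(A), A)
    orthogonal = all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(A) - 1.0) <= tol

rotation = matmul(R3(0.4), R1(1.1))                 # sample angles (assumptions)
reflection = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]     # mirror in the xy-plane

print(is_special_orthogonal(rotation), is_special_orthogonal(reflection))
```

The reflection example shows why both conditions are needed: orthogonality alone also admits mirror images, and the determinant condition is what excludes them.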

Exercise 24 (Killing Time) Whilst waiting on the construction of the death star by the St. George’s College engineering, science and mathematics students (as well as legal approvals from Georgian law graduates), Thomas feels the urge to kill – kill time that is. As a member of the St. George’s College Orbital Death Star project, help Thomas kill time by explicitly showing that the rotation matrices Rj(θ) satisfy the two properties which characterize the special orthogonal group, SO(3).

Hint: It helps to show that for any rotation matrix Rj(θ), one has $(R_j(\theta))^{T} = R_j(-\theta) = (R_j(\theta))^{-1}$, which can be argued geometrically and/or algebraically using the fact that cosine is an even function¹¹, cos(θ) = cos(−θ), and that sine is odd: sin(−θ) = − sin(θ).

⁸ A binary operation ⋆ on a set V is one that combines two elements a, b of V to give another element of V: a ⋆ b = c ∈ V. Examples of binary operations include addition of numbers or vectors, multiplication of numbers and cross products of vectors.

⁹ Recall that det means the matrix determinant of A and $A^{T}$ denotes the matrix transpose of A.

¹⁰ Recall that orthogonal is the mathematical term for ‘perpendicular’.

Exercise 25 (Group Project: Project Death Star) In an attempt to understand rotations better for the programming of a weapons targeting system on the Georgian Death Star, the members of the SGC Mathematical Sciences Tutorial sit down and try to prove that the set of rotation matrices, SO(3), forms a group. Since this includes you, complete this proof. This means verifying that SO(3) satisfies the four properties required to be a group, with matrix multiplication being the group operation.

Hint: Recall how the 3 × 3 identity matrix I3 acts on a 3-dimensional vector v – that is, I3v = v. Furthermore, to show that every element of SO(3) has an inverse, consider Ru(θ) – an arbitrary rotation operator which rotates objects anticlockwise through an angle θ about an axis defined by the vector u, then consider how one would undo rotations performed by Ru(θ).

Because the rotations about the coordinate axes generate the Lie Group SO(3), we can write any general rotation as a product of finitely-many rotation matrices. For us, this means that we can write any rotation as a sequence of rotations about the x, y and z axes:

$$R(\alpha, \gamma, \beta) = R_3(\gamma)\,R_2(\beta)\,R_1(\alpha). \tag{4.14}$$

Note that since matrix multiplication is not commutative, the order in which we multiply (hence the order in which we rotate) matters. In particular, when the rotation R(α, γ, β) given by (4.14) acts on a vector v, it rotates it first by an angle α about the x-axis, then by an angle β about the y-axis and finally by an angle γ about the z-axis. In general, we could write down a matrix Ru(θ) which rotates objects anticlockwise about some axis defined by the vector u through an angle θ – indeed, such a matrix is given by the (easy-to-prove) ‘Rodrigues’ rotation formula’, which we will investigate later.
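The claim that the order of rotations matters can be seen with two quarter turns. In the sketch below, the rightmost factor of a matrix product acts first, exactly as in (4.14); the helper names and the choice of quarter turns are illustrative assumptions.

```python
import math

def R1(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def R3(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotate(matrix, vector):
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]

quarter_turn = math.pi / 2
e1 = [1, 0, 0]

# The rightmost factor acts first on the vector.
x_then_z = matmul(R3(quarter_turn), R1(quarter_turn))
z_then_x = matmul(R1(quarter_turn), R3(quarter_turn))

print(rotate(x_then_z, e1))   # x-rotation leaves e1 alone, z-rotation sends it to e2
print(rotate(z_then_x, e1))   # z-rotation sends e1 to e2, x-rotation sends that to e3
```

The same starting vector ends up along the y-axis in one case and along the z-axis in the other, so the two products cannot be the same matrix.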

Exercise 26 (Spring Cleaning) After finally getting building and environmental approvals, as well as successfully subduing Greens Party protesters, St. George’s College sends Project Death Star into its testing phase. Having a particular distaste for Justin Bieber, Thomas decides that he wants to aim and fire the gamma-ray LASER on the death star at Justin Bieber’s hometown – during Christmas when Justin Bieber is home with his family. For shielding reasons, in its inactive state, the Death Star’s cannon is oriented along the x-axis in the following figure.

This is because the cannon portion of the death star has weaker armour. In order to fire the death star at Justin Bieber, Thomas must rotate the death star to point at Ontario, Canada. After the death star is oriented in this way, Emma Krantz’s targeting algorithm will take over and refine the aim to Justin Bieber’s house.

Figure 4.1: Aiming an Orbital Death Star with sequential rotations.

The coordinate system we use is centred with the death star at its origin. In order to shoot JB, the death star must be oriented in the direction of the purple ray in the above diagram. This can be achieved by feeding the correct rotation matrix into the death star targeting systems. There are multiple ways to construct such a matrix – however, for our purposes, it is easiest to construct it by sequential rotations about the three different coordinate axes.

¹¹ Recall that even functions f(x) are symmetric about x = 0 and odd functions are anti-symmetric.

Q: Write down the rotation matrices corresponding to the rotations indicated by each of the angles – α, β, γ – shown in the diagram. Note that these are not necessarily in the order x − y − z! Once you’re confident that you have the correct rotation matrices, multiply these matrices in the correct order to give a rotation matrix which will rotate the death star cannon from the x-axis to Justin Bieber’s home province.

Hint: It helps to keep track of which coordinate stays constant under a certain rotation – recalling the rotation eigenvectors, it then follows that you are performing a rotation about that coordinate axis. For example, the γ rotation corresponds to an anticlockwise rotation about the y coordinate axis through an angle γ.

After pointing the death star at Ontario, the Krantz algorithm takes over and performs a super-accurate shot – killing Justin Bieber with minimal collateral damage. Fearing that the warlike nation of Canada will retaliate with direct line-of-sight missile attacks, Thomas decides it is best to return the death star to its original orientation along the x-axis – the side that faces Canada will thus have more armour as well as an anti-missile system featuring an array of LAWS Naval lasers stolen from the U.S. Military.

Q: Write down a sequence of rotations to rotate the death star to its original orientation. Now write down a single rotation matrix to perform this total rotation.

Hint: Recall the fact that rotation operations form a Lie group – in particular, this means that every rotation has an inverse. Recalling the properties of the rotation group SO(3), in particular the orthogonality property: $A^{T} = A^{-1}$, there is a super-easy way to invert the death star rotation and return it to its original orientation. Alternatively, recall that you showed $R(-\theta) = (R(\theta))^{-1}$ – either algebraically or geometrically. Use this to find the rotation matrix which returns the death star to its original orientation.
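Both routes suggested in the hint can be compared numerically. The sketch below builds a composite rotation in the order of (4.14), which is an assumption here since the exercise's diagram may call for a different order, and then undoes it in two ways: by transposing the composite matrix, and by applying the individual rotations with negated angles in reverse order. All angle values and helper names are hypothetical.

```python
import math

def R1(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def R2(beta):
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def R3(gamma):
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

alpha, beta, gamma = 0.3, -0.8, 1.2      # hypothetical aiming angles

# Aim: rotate about x, then y, then z, as in (4.14).
aim = matmul(R3(gamma), matmul(R2(beta), R1(alpha)))

# Route 1: orthogonality gives the inverse as the transpose.
undo_by_transpose = transpose(aim)

# Route 2: negate each angle and reverse the order of the factors.
undo_by_reversal = matmul(R1(-alpha), matmul(R2(-beta), R3(-gamma)))

round_trip = matmul(undo_by_transpose, aim)   # should be the identity matrix
print(round_trip)
```

Note the reversal of order in the second route: the last rotation applied must be the first one undone, much like taking off shoes before socks.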


4.2 Rigid Bodies and Moments of Inertia

A rigid body is a collection of particles (discrete or continuous) which are ‘fixed’ relative to each other. The dynamics of many objects – such as a cricket ball flying out of Shane Warne’s hand – can be modelled as the motion of a rigid body. In this regard, we can decompose the motion of rigid bodies as a linear motion of the center of mass of the rigid body, accompanied by some rotational motion of the rigid body about its principal axis.

To see why we need to consider rotation about more than one axis, consider the following. In general, the direction of the angular momentum vector of a rotating object does not necessarily coincide with the axis of rotation. The angular momentum and rotation axis coincide when the rotation axis is a principal axis. As we shall see, for any rigid body rotating about some specified point (e.g. the center of mass), there are three unique principal axes – these form an orthogonal system (and hence a basis for some 3-dimensional vector space).

In this tutorial¹², we shall see how to describe the general rotational dynamics of rigid bodies with general concepts such as the ‘inertia tensor’ and eigenvector decompositions for determining the principal rotation axes for a rigid body. As we shall see, for rigid bodies with various geometric symmetries, we obtain the well-known (first year physics / engineering) formulae for simple rigid bodies – e.g. cubes, spheres, cylinders. After deriving some familiar results, we shall use the general theory to analyse the motion of a spinning top – although a child’s toy, a deceptively non-trivial system!

4.2.1 Rotations: about an arbitrary axis

Unlike the circular motion of point particles (whose rotational motion can be described by some scalar angular velocity), the rotational dynamics of extended rigid bodies requires us to treat angular velocity as a 3-dimensional vector quantity:

ω = (ωx, ωy, ωz), (4.15)

where the components ωj denote the angular velocity about the j-th coordinate axis. Note that it is the angular velocity vector, ω, which defines the axis of rotation – that is, ω lies along the axis of rotation. For general rotational dynamics, the angular momentum vector, L, need not lie along the axis of rotation – hence L and ω are not necessarily parallel^13. To construct the angular momentum vector from the angular velocity, we note two scenarios in rigid body rotations.

1. Rotation with a fixed point: If a rigid body has a fixed point then it can only rotate about this fixed point, making it a natural choice for the origin of your coordinate system (vector space). Simple examples include a pendulum swinging from a fixed pivot or a spinning top whose tip is confined to some hole in the surface on which it rotates.

2. Free Rotation: A freely rotating rigid body is one which does not have a fixed point. In this case, the center of mass (CM) makes for the natural choice of origin for a coordinate system (the center-of-mass reference frame). One may then decompose the motion of the object as the motion of the CM along with rotations relative to the CM.

^12 This tutorial is based mostly on chapter 11 of John Taylor’s ‘Classical Mechanics’, a very accessible book for second (or first) year students who are willing to solve many problems.

^13 For special cases, such as circular motion of a point particle, L and ω are parallel – allowing one to treat the angular velocity as a scalar quantity.

The total angular momentum of a rigid body, composed of particles (sub-systems) of mass mj with displacement vectors rj relative to some origin O, with linear (tangential) velocities vj, is given as the sum of the angular momenta of each particle (sub-system):

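\[
\mathbf{L} = \sum_j m_j \mathbf{r}_j \times \mathbf{v}_j = \sum_j m_j \mathbf{r}_j \times (\boldsymbol{\omega} \times \mathbf{r}_j), \tag{4.16}
\]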

where ω is the angular velocity vector for the rigid body^14. One can expand the triple cross product (triple vector product), using the result from the following exercise.

Exercise 27 (BAC-CAB Rule) For any 3-dimensional vectors, A, B, C, prove that:

A× (B×C) = B(A ·C)−C(A ·B). (4.17)

Hint: Expand out both sides of the BAC-CAB equation in components and compare. Keep in mind that the vector product takes two vector inputs and produces a vector output. The scalar product (dot product), on the other hand, takes two vector inputs and produces a scalar output – the length of the projection of one vector onto the other, multiplied by the length of the other vector.
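Before attempting the proof, it can be reassuring to test the identity on concrete vectors. The sketch below is a numerical sanity check only, not a proof; the function names and the sample vectors are arbitrary choices made for illustration.

```python
def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar (dot) product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def bac_cab_residual(A, B, C):
    """Largest component of A x (B x C) - [B (A.C) - C (A.B)]."""
    lhs = cross(A, cross(B, C))
    rhs = tuple(b * dot(A, C) - c * dot(A, B) for b, c in zip(B, C))
    return max(abs(l - r) for l, r in zip(lhs, rhs))

# Integer test vectors, so the arithmetic is exact.
A, B, C = (1, 2, 3), (-4, 5, 6), (7, -8, 9)
residual = bac_cab_residual(A, B, C)
```

A vanishing residual for a handful of vectors is consistent with the identity, but the exercise still asks for the general component-wise argument.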

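Note that an alternative to the above formulae is to treat r and ω as 1-forms (via the ‘flat’ map ♭) and use the exterior product and Hodge dual operations, so that r × (ω × r) corresponds to:

\[
\star\left(\mathbf{r}^\flat \wedge \star\left(\boldsymbol{\omega}^\flat \wedge \mathbf{r}^\flat\right)\right), \tag{4.18}
\]

which is equivalent to using Levi-Civita symbol identities.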

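Returning to the angular momentum construction and using the vector-triple product identity, we have:

\[
\begin{aligned}
\mathbf{r} \times (\boldsymbol{\omega} \times \mathbf{r}) &= \boldsymbol{\omega}(\mathbf{r} \cdot \mathbf{r}) - \mathbf{r}(\mathbf{r} \cdot \boldsymbol{\omega}) \\
&= \|\mathbf{r}\|^2 \boldsymbol{\omega} - (\mathbf{r} \cdot \boldsymbol{\omega})\mathbf{r} \\
&= \left[(y^2 + z^2)\omega_x - xy\,\omega_y - zx\,\omega_z\right]\partial_x \\
&\quad + \left[-xy\,\omega_x + (x^2 + z^2)\omega_y - zy\,\omega_z\right]\partial_y \\
&\quad + \left[-zx\,\omega_x - yz\,\omega_y + (x^2 + y^2)\omega_z\right]\partial_z,
\end{aligned}
\]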

using a coordinate system and basis with

r = x∂x + y∂y + z∂z, ω = ωx∂x + ωy∂y + ωz∂z. (4.19)

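Exercise 28 (How to read an angular momentum vector) If necessary, verify the intermediate steps of the previous calculation. Hence, or otherwise, show that the total angular momentum takes the form:

\[
\begin{aligned}
\mathbf{L} &= \sum_j m_j \mathbf{r}_j \times \mathbf{v}_j \\
&= \sum_j m_j\left[(y_j^2 + z_j^2)\omega_x - x_j y_j\,\omega_y - z_j x_j\,\omega_z\right]\partial_x \\
&\quad + \sum_j m_j\left[-x_j y_j\,\omega_x + (x_j^2 + z_j^2)\omega_y - z_j y_j\,\omega_z\right]\partial_y \\
&\quad + \sum_j m_j\left[-z_j x_j\,\omega_x - y_j z_j\,\omega_y + (x_j^2 + y_j^2)\omega_z\right]\partial_z.
\end{aligned}
\]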

^14 Note that the body has one unique angular velocity vector since it is a rigid body. This is in contrast to a loose collection of particles with (possibly) distinct angular velocity vectors.


Note that we have added the subscript j to denote the position coordinates (xj, yj, zj) of the j-th mass in the rigid body system.

Using the above expansion of the angular momentum vector, we can write down the components of the Inertia Tensor as follows. Since the inertia tensor is a linear operator, we can represent it in our basis ∂x, ∂y, ∂z as a 3-by-3 matrix I, whose entries are read off the angular momentum vector by matching the above expansion with the following important matrix equation:

L := Iω, (4.20)

or equivalently, the three component equations for Ln with n = 1, 2, 3 (representing the directions x, y, z)

\[
L_n = \sum_{k=1}^{3} I_{nk}\,\omega_k. \tag{4.21}
\]

That is, using the vector-triple expansion of the definition of the total angular momentum vector, we can get the components Ink of the inertia matrix by looking at the coefficient of the angular velocity ωk about the k-th coordinate axis, in the n-th component of the angular momentum vector. So for example, Ixx would be the coefficient of ωx in the x-component of the angular momentum vector L:

\[
I_{xx} = \sum_j m_j (y_j^2 + z_j^2). \tag{4.22}
\]

Ixy would be the coefficient of ωy in the x-component of the angular momentum vector:

\[
I_{xy} = -\sum_j m_j x_j y_j. \tag{4.23}
\]

Exercise 29 (Bedtime Reading) Continue the above to read off the other components of the inertia matrix. In particular, determine all components of the inertia matrix and list them.

Hint: Don’t fall asleep.

The inertia matrix then takes the form:

\[
I = \begin{pmatrix}
I_{xx} & I_{xy} & I_{xz} \\
I_{yx} & I_{yy} & I_{yz} \\
I_{zx} & I_{zy} & I_{zz}
\end{pmatrix} \tag{4.24}
\]

and our matrix equation relating the angular momentum vector to the inertia matrix and the angular velocity vector gives us the following components of the angular momentum vector:

Lx =Ixxωx + Ixyωy + Ixzωz

Ly =Iyxωx + Iyyωy + Iyzωz

Lz =Izxωx + Izyωy + Izzωz. (4.26)
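The bookkeeping above can be illustrated with a small numerical experiment. The sketch below builds the inertia matrix for a handful of point masses from the component formulae (Ixx = Σ m (y² + z²), Ixy = −Σ m x y, and so on, written compactly with a Kronecker delta) and compares Iω with the direct sum Σ m r × (ω × r). The masses, positions and angular velocity are made-up test values, and the function names are illustrative only.

```python
def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def inertia_tensor(masses, positions):
    """I_nk = sum_j m_j (|r_j|^2 delta_nk - r_jn r_jk) for point masses."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(c * c for c in r)
        for n in range(3):
            for k in range(3):
                I[n][k] += m * ((r2 if n == k else 0.0) - r[n] * r[k])
    return I

def angular_momentum_direct(masses, positions, omega):
    """L = sum_j m_j r_j x (omega x r_j), evaluated term by term."""
    L = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        term = cross(r, cross(omega, r))
        for n in range(3):
            L[n] += m * term[n]
    return L

def matvec(I, w):
    """Matrix-vector product I w for a 3x3 matrix."""
    return [sum(I[n][k] * w[k] for k in range(3)) for n in range(3)]

# Made-up test data (assumptions, not taken from the notes).
masses = [1.0, 2.0, 3.0]
positions = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (1.0, -1.0, 2.0)]
omega = (0.5, -1.0, 2.0)

I = inertia_tensor(masses, positions)
L_direct = angular_momentum_direct(masses, positions, omega)
L_matrix = matvec(I, omega)
```

The two angular momentum vectors agree to rounding error, which is exactly the statement L = Iω for this toy rigid body.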

As it turns out, the inertia tensor for a rigid body is a symmetric tensor. This means it can be represented by a symmetric matrix I – that is, we have:

I = I^T, (4.27)

where T denotes the ‘transpose’ operation. Visually, this means that the 3-by-3 matrix I has reflection symmetry about its main diagonal. Note that this is an extremely important property of the inertia tensor as the ‘spectral theorem for self-adjoint operators’^15 guarantees that the inertia tensor is diagonalizable. This means we can choose a basis (the eigen-basis) in which the inertia matrix is diagonal – thus guaranteeing the existence of principal axes.

Finally, as our last technical point, we note that since the inertia tensor is symmetric, it corresponds to a ‘quadratic form’ – in particular, one that acts on the 3-dimensional vector space R^3. This means that in principle, we could graphically represent the inertia tensor of different rigid bodies by a ‘quadric surface’, generated by the equation:

r^T I r = 1 ⇐⇒ (Ir) · r = r · (Ir) = 1. (4.28)

In the principal axes basis, I will be diagonal – allowing us to represent the inertia tensor as an ellipsoid whose semi-axis lengths, 1/√λj, are fixed by the diagonal components λj of the inertia tensor.

Problem 24 (Symmetry: To be, or not to be?) In the alternate timeline of Cyborg Emperor Constantine, the cyborg comes across William Shakespeare – a bard-mathematician. In his new play, ‘Hamlet’, he depicts the dramatic life of a physicist who cannot decide on the best construction for his theory of rotational dynamics. In particular, Hamlet has the option of using exterior calculus and defining the inertia-tensor as an anti-symmetric rank 2 tensor (differential 2-form), or using simple matrix/vector algebra and defining the inertia matrix in a symmetric way.

Help Hamlet by proving our definitions lead to a symmetric inertia tensor.

Hint: It suffices to show that Ixy = Iyx, Ixz = Izx, Iyz = Izy.

Now, can you think of any physical reasons that the inertia tensor should be symmetric (with our definitions)?

Because the inertia tensor is in general symmetric, it follows that there are only 3(3 + 1)/2 = 6 independent components at most. Note that this depends on the basis we choose for our vector space – if we choose the principal basis, then the inertia matrix is diagonal and thus has 3 independent components.

The results derived thus far pertain to a rigid body composed of discrete sub-systems of mass mj (hence the sums over j). In the case of continuously distributed rigid bodies – such as a cube, ball or Theresa Feddersen – we must replace these summations with integrations. In particular, the continuum limit is given by:

\[
m_j \to \delta m_j, \qquad \sum_j m_j \to \int_D dm, \tag{4.29}
\]

where dm is an infinitesimal mass element and D is a subset of R^3 representing the rigid body. The infinitesimal mass element, dm, can be related to the density and measure (volume, area or length) of a (3, 2 or 1-dimensional) rigid body. For example, for a 3-dimensional rigid body with mass-density profile ρ(x, y, z), we would have: dm = ρ(x, y, z) dx dy dz, which is the mass density at a point multiplied by the volume of an infinitesimal box at that point. If the density ρ is a constant, then we simply have: dm = ρ × d(Measure), where ‘Measure’ is equal to the length, area or volume (depending on the object). This may seem confusing at first, so it is best to illustrate these points with an example.

^15 Note that a symmetric matrix corresponds to a self-adjoint linear operator over R^n or C^n with respect to the Euclidean (formally, l^2) inner-product.

Example 7 (Constantine’s X-Cube) Having defended the residents of ‘Ye Olde Town’ from the Titanoboa with Big A. Geller, Constantine is rewarded with a mysterious cube. The cube has special functions^16, which can only be activated if it is thrown into the air and undergoes certain rotational sequences. In an attempt to unlock the powers of the cube (and gain X-cube achievement points), Constantine decides to master the rotational dynamics by investigating a cube’s moments of inertia. To help the cyborg Emperor, we now proceed with the necessary mathematics.

First, we are given that the cube has a uniform density profile, with total mass M and side length a. To make life easy, we choose a coordinate system such that our coordinate axes coincide with the edges of the cube. We now construct the inertia tensor for two classes of rotational motion for the cube.

1. Rotation about a corner (rotation with a fixed point): Here we consider the scenario where our cube is rotating about a corner. A natural choice is to let the origin of our coordinate system coincide with a corner of the cube. The cube is then defined as the following domain in R^3

Cube = [0, a]× [0, a]× [0, a]. (4.30)

Taking one corner to be fixed at the origin, we can consider rotations of the cube about different axes. For rotations about an edge (with one corner fixed), WLOG we can take the cube to be rotating about the x-axis – whence our angular velocity vector will have the simple form:

ω = ωx∂x, (4.31)

equivalently: ω = (ωx, 0, 0) in component notation. Since there is only one non-zero component of the angular velocity, we can omit the coordinate subscript – hence: ωx = ω := ‖ω‖.

^16 These include a ‘red ring of death’, which annihilates everything (except the user) within a 300 meter radius.

Taking the continuum limit of our earlier summation expressions for the components of inertia, we are able to derive the inertia matrix for our cube (a rigid body which has continuously distributed mass). To this effect, we have the following diagonal element of our inertia matrix:

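\[
\begin{aligned}
I_{xx} &= \lim_{n\to\infty}\ \lim_{m_j \to \delta m} \sum_{j}^{n} m_j (y_j^2 + z_j^2) \\
&= \int_{\text{cube}} dm\,(y^2 + z^2) \\
&= \int_{\text{cube}} \rho\, dV\,(y^2 + z^2) \\
&= \int_{\text{cube}} \frac{M}{V}\, dV\,(y^2 + z^2) \\
&= \int_{\text{cube}} \frac{M}{V}\, dx\,dy\,dz\,(y^2 + z^2) \\
&= \int_0^a dx \int_0^a dy \int_0^a dz\, \frac{M}{a^3}(y^2 + z^2) \\
&= \frac{M}{a^3} \cdot \frac{2}{3}a^5 \\
&= \frac{2}{3}Ma^2.
\end{aligned} \tag{4.32}
\]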

Similarly, in the continuum limit, we have:

\[
I_{yy} = \int_{\text{cube}} dm\,(z^2 + x^2), \qquad
I_{zz} = \int_{\text{cube}} dm\,(x^2 + y^2). \tag{4.33}
\]

Given the symmetry of the cube, it is clear that Iyy = Izz = Ixx.

Finally, to determine the off-diagonal elements of the inertia tensor we take the continuum limit of our earlier results again:

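\[
\begin{aligned}
I_{xy} &= -\lim_{n\to\infty}\ \lim_{m_j \to \delta m} \sum_{j}^{n} m_j x_j y_j \\
&= -\int_{\text{cube}} dm\,(xy) \\
&= -\int_{\text{cube}} \rho\, dV\,(xy) \\
&= -\rho \int_0^a dx \int_0^a dy \int_0^a dz\,(xy) \\
&= -\rho \int_0^a x\,dx \int_0^a y\,dy \int_0^a dz \\
&= -\rho \left(\frac{a^2}{2}\right)\left(\frac{a^2}{2}\right) a \\
&= -\frac{1}{4}Ma^2.
\end{aligned} \tag{4.34}
\]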


To determine the other off-diagonal elements, the continuum limit of our earlier expressions gives:

\[
I_{xz} = -\int_{\text{cube}} dm\,(xz), \qquad
I_{yz} = -\int_{\text{cube}} dm\,(yz), \tag{4.35}
\]

where Ixy = Iyx, Ixz = Izx, Iyz = Izy by symmetry of the inertia tensor. Furthermore, the geometric symmetry of the cube tells us that Ixy = Ixz = Iyz – hence we have determined all components of the inertia matrix. Explicitly, our inertia matrix in this ∂x, ∂y, ∂z basis is given by:

\[
I = \begin{pmatrix}
\tfrac{2}{3}Ma^2 & -\tfrac{1}{4}Ma^2 & -\tfrac{1}{4}Ma^2 \\
-\tfrac{1}{4}Ma^2 & \tfrac{2}{3}Ma^2 & -\tfrac{1}{4}Ma^2 \\
-\tfrac{1}{4}Ma^2 & -\tfrac{1}{4}Ma^2 & \tfrac{2}{3}Ma^2
\end{pmatrix} \tag{4.36}
\]

which we can simplify by drawing out a common factor:

\[
I = \frac{Ma^2}{12}\begin{pmatrix}
8 & -3 & -3 \\
-3 & 8 & -3 \\
-3 & -3 & 8
\end{pmatrix}. \tag{4.38}
\]

Earlier, we derived a general expression for the angular momentum, L, of a rotating rigid body – in particular, the angular momentum was shown to be generated by the action of the inertia tensor (a linear operator) on the angular velocity vector. In other words, we have:

L = Iω, (4.40)

where I is our inertia matrix. Doing this matrix multiplication explicitly with our inertia tensor and angular velocity vector, we see that:

\[
\mathbf{L} = \frac{Ma^2\omega}{12}(8, -3, -3). \tag{4.41}
\]

Hence, we have demonstrated an explicit scenario where the angular momentum vector L does not point in the same direction as the angular velocity vector, ω.

Similarly, for rotation of the cube about its main diagonal (with one corner fixed), a unit vector in the direction of rotation is (1/√3)(1, 1, 1). It follows that the angular velocity vector is parallel to this, giving ω = (ω/√3)(1, 1, 1). Thus, the angular momentum for this rotation is given by

\[
\mathbf{L} = I\boldsymbol{\omega} = \frac{Ma^2}{6}\boldsymbol{\omega}. \tag{4.42}
\]

In this scenario, the angular momentum points in the same direction as the angular velocity vector.


2. Rotation about the cube’s center (free rotations about the center of mass): If the cube is rotating about its center, the natural choice of coordinate system is to place the origin O at the center of the cube. Therefore, the cube is defined as the following domain in R^3

Cube = [−a/2, a/2]× [−a/2, a/2]× [−a/2, a/2]. (4.43)

To account for this, we must change our limits of integration accordingly.

Using the same definitions as before, it is an easy exercise to show that for this rotational motion, we have:

\[
I_{xx} = \frac{1}{6}Ma^2, \tag{4.44}
\]

with Iyy = Izz = Ixx via geometrical symmetry. Furthermore, noting that our domain of integration for each variable is now symmetric about the origin – running from −a/2 to +a/2 – it is easy to see that the off-diagonal components of the inertia tensor vanish in this basis. In particular, we are integrating odd functions of one variable over symmetric domains.

Therefore, our inertia tensor is diagonal in this basis! This means that the coordinate axes for this class of rotations are principal axes of our cube! Mathematically, we are operating in the eigen-basis of our inertia matrix. Thus,

\[
I = \frac{1}{6}Ma^2\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}. \tag{4.45}
\]

Now, for rotations about the diagonal of the cube, the direction of the angular velocity vector is parallel to the vector (1, 1, 1). In particular, ω = ω(1/√3)(1, 1, 1), where the factor 1/√3 is used to make our direction vector a unit vector. Therefore, the angular momentum vector is given by:

\[
\mathbf{L} = I\boldsymbol{\omega} = \frac{Ma^2}{6}\boldsymbol{\omega}. \tag{4.47}
\]

Hence, for this class of rotations, the angular momentum vector is always parallel to the angular velocity vector.

Note that it can be shown (as an exercise, or argued by symmetry) that the angular momentum for rotation about any axis through the center of the cube, L = (Ma²/6)ω, is the same as the angular momentum for rotation about the main diagonal through the corner of the cube. This is because the main diagonal through the corner also passes through the center of the cube: the two are one and the same axis, hence the angular momenta for rotations about them must coincide.
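Both inertia matrices of this example can be cross-checked by replacing the integrals with finite sums, which is the continuum limit run in reverse. The sketch below is a rough midpoint-rule estimate under assumed values M = a = 1 and an arbitrary grid size n = 20; the function name `cube_inertia` is illustrative. The diagonal entries carry a small discretisation error of order 1/n².

```python
def cube_inertia(M, a, corner, n=20):
    """Midpoint-rule estimate of the inertia matrix of a uniform cube of
    side a whose lowest corner sits at `corner`, taken about the origin.
    The cube is chopped into n^3 cells, each of mass M / n^3."""
    dm = M / n**3
    h = a / n
    I = [[0.0] * 3 for _ in range(3)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # Centre of the (i, j, k)-th cell.
                r = (corner[0] + (i + 0.5) * h,
                     corner[1] + (j + 0.5) * h,
                     corner[2] + (k + 0.5) * h)
                r2 = r[0]**2 + r[1]**2 + r[2]**2
                for p in range(3):
                    for q in range(3):
                        I[p][q] += dm * ((r2 if p == q else 0.0) - r[p] * r[q])
    return I

M, a = 1.0, 1.0  # assumed test values
I_corner = cube_inertia(M, a, (0.0, 0.0, 0.0))        # pivot at a corner
I_centre = cube_inertia(M, a, (-a / 2, -a / 2, -a / 2))  # pivot at the centre

# Closed-form results quoted in the example.
exact_corner = [[M * a * a / 12 * c for c in row]
                for row in ([8, -3, -3], [-3, 8, -3], [-3, -3, 8])]
exact_centre = [[M * a * a / 6 if p == q else 0.0 for q in range(3)]
                for p in range(3)]
```

Increasing n tightens the agreement on the diagonal; the off-diagonal entries are already reproduced to rounding error because the midpoint rule is exact for the linear factors x, y and z.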

As we saw, out of a potential 6 possible independent components for its inertia tensor, the cube exhibited only one or two independent components, depending on the type of rotation (and basis) we chose. The cube was relatively simple due to being a geometric object with a high degree of symmetry – the symmetry allowed us to reduce the number of independent components. In the next guided problem, we shall study the rotational characteristics of another object with a relatively large degree of symmetry – the cone.
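The contrast between the edge and diagonal rotations of the corner-pivoted cube can be made concrete with a few lines of arithmetic. This is a minimal sketch assuming units in which M = a = 1 and a unit angular speed; the helper names are illustrative. Two vectors are treated as parallel when their cross product vanishes.

```python
import math

def matvec(I, w):
    """Matrix-vector product I w for a 3x3 matrix."""
    return [sum(I[n][k] * w[k] for k in range(3)) for n in range(3)]

def cross(a, b):
    """Vector (cross) product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_parallel(u, v, tol=1e-12):
    """True if u x v vanishes up to the tolerance."""
    return all(abs(c) < tol for c in cross(u, v))

mu = 1.0 / 12.0  # M a^2 / 12 with M = a = 1 (assumed units)
I_corner = [[8 * mu, -3 * mu, -3 * mu],
            [-3 * mu, 8 * mu, -3 * mu],
            [-3 * mu, -3 * mu, 8 * mu]]

s = 1.0 / math.sqrt(3.0)
omega_edge = (1.0, 0.0, 0.0)  # rotation about an edge (the x-axis)
omega_diag = (s, s, s)        # rotation about the main diagonal

L_edge = matvec(I_corner, omega_edge)  # proportional to (8, -3, -3)
L_diag = matvec(I_corner, omega_diag)  # proportional to omega_diag itself
```

Here `is_parallel(L_edge, omega_edge)` is false while `is_parallel(L_diag, omega_diag)` is true, in line with equations (4.41) and (4.42).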


Problem 25 (The Cursed Cone) For generations, since the killing of Pelops, the family of Atreus has carried the burden of a curse. In the alternate universe of the cyborg Constantine, the Greek hero Orestes, son of Agamemnon, King of the Greeks, must end the curse upon his family by taking a solid gold cone to the Areopagus. At this Athenian court, Orestes and his sister Electra will stand judgement. To purify the cone and its associated curses (matricide being one of them), Apollo, god of the bow, must derive the rotational characteristics of the cone.

Consider a uniform, solid cone of mass M, height h and base radius R. To consider rotational dynamics of the cone rotating about its vertex (tip)^17, we let the origin O of some coordinate system coincide with the tip of the cone, with the z-axis along the cone’s axis of symmetry. It can be described as the following domain in R^3:

Cone = {(x, y, z) ∈ R^3 : x² + y² ≤ (Rz/h)², 0 ≤ z ≤ h}. (4.48)

Equivalently, in a cylindrical coordinate system (r, φ, z) we can describe the cone as the following subset:

Cone = {(r, φ, z) : 0 ≤ r ≤ Rz/h, 0 ≤ φ < 2π, 0 ≤ z ≤ h}. (4.49)

In this manner, we see that φ is a free variable ranging from 0 to 2π – thus in this coordinate system, the S^1 (rotational) symmetry of the cone is explicit. Recalling that the angular velocity vector ω defines the axis of rotation, we can describe the fixed-point rotational dynamics of the cone by determining its inertia tensor for some arbitrary ω.

1. Let ρ = M/V be the mass density of the cone. To calculate the volume V of the cone, we can derive an expression simply by doing the volume integral:

\[
V_{\text{cone}}(R, h) = \int_{\phi=0}^{2\pi} \int_{z=0}^{h} \int_{r=0}^{Rz/h} dV \tag{4.50}
\]

where dV = r dr dφ dz is the volume form of the cone (‘infinitesimal volume element’) in cylindrical coordinates. Note that when integrating we must use the geometric constraint r/R = z/h (via similar triangles), which describes the slanted surface of the cone and fixes the upper limit of the r integration.

Show that Vcone(R, h) = (1/3)πR²h. It follows that:

\[
\rho = \frac{3M}{\pi R^2 h}. \tag{4.51}
\]

Therefore, the infinitesimal mass element is given by dm = ρ dV = (3M/(πR²h)) dV.

2. Diagonal components: moments of inertia Use the previously derived expressions to calculate the components of the inertia matrix of the cone. In particular, calculate the following diagonal elements (moments of inertia) of the inertia matrix:

\[
I_{zz} := \int_{\text{Cone}} dm\,(x^2 + y^2) = \int_{\text{Cone}} dV\,[\rho(x^2 + y^2)] = \ldots \tag{4.52}
\]

^17 Note that here we are considering the class of rotations with a fixed point – the cone’s vertex.

Due to the rotational symmetry of the cone, one should be able to see that the Ixx and Iyy moments of inertia should be equal (you can do the explicit calculations to show this).

\[
I_{yy} = \int_{\text{Cone}} dV\,[\rho(x^2 + z^2)] = \ldots \tag{4.53}
\]

\[
I_{xx} = \int_{\text{Cone}} dV\,[\rho(y^2 + z^2)] = \ldots \tag{4.54}
\]

You should find that

\[
I_{zz} = \frac{3}{10}MR^2, \qquad I_{xx} = I_{yy} = \frac{3}{20}M(R^2 + 4h^2). \tag{4.55}
\]

Hint: For the Izz integral, it is prudent to change to cylindrical polar coordinates. Recall that in a Cartesian coordinate system, we have dV = dx dy dz and that in a cylindrical polar coordinate system, we have dV = r dr dφ dz, with r = |J| being the Jacobian determinant factor.

3. Off-Diagonal components: products of inertia Now, by either using direct calculation or exploiting the rotational symmetry of the cone, compute the off-diagonal elements of the inertia matrix. That is, compute the ‘products of inertia’:

\[
I_{xz} = -\int_{\text{cone}} dm\,(xz), \qquad
I_{xy} = -\int_{\text{cone}} dm\,(xy), \qquad
I_{yz} = -\int_{\text{cone}} dm\,(yz). \tag{4.56}
\]


Hint: Due to the rotational symmetry of the cone, it should be clear that the products of inertia are zero (but you should prove this!). That is,

Ixy = Ixz = Iyz = 0. (4.57)

Note, it may help to remember the change-of-coordinate relations, x = r cos(φ), y = r sin(φ).

4. Angular Momentum Note that the inertia matrix for this class of rotations of the cone (rotations with the vertex fixed) is diagonal:

\[
I = \frac{3}{20}M\begin{pmatrix}
R^2 + 4h^2 & 0 & 0 \\
0 & R^2 + 4h^2 & 0 \\
0 & 0 & 2R^2
\end{pmatrix}. \tag{4.58}
\]

This means that if the angular velocity ω = (ωx, ωy, ωz) is directed along a coordinate axis (x, y or z) then so is the angular momentum L:

L = Iω = (λ1ωx, λ2ωy, λ3ωz), (4.60)

where λj (j = 1, 2, 3) are the diagonal elements (actually, eigenvalues) of the inertia matrix. For example, if the cone is rotating about the x coordinate axis (i.e. ω = (ωx, 0, 0)), with the vertex fixed at the origin, we will have:

\[
\mathbf{L} = I\boldsymbol{\omega} = \frac{3}{20}M(R^2 + 4h^2)\,\omega_x\,\partial_x, \tag{4.61}
\]

where ∂x is a unit vector in the x direction.
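Once you have attempted the integrals by hand, the quoted moments of inertia can be checked numerically. The sketch below is a rough midpoint-rule estimate in cylindrical coordinates, with the φ integration done analytically beforehand (the integral of sin²φ over a full turn is π, so a thin ring at radius r and height z contributes r² to Izz and r²/2 + z² to Ixx per unit mass). The values M = 1, R = 1, h = 2 and the grid size are arbitrary assumptions, and `cone_moments` is an illustrative name.

```python
import math

def cone_moments(M, R, h, n=200):
    """Midpoint-rule estimates of (Izz, Ixx) for a uniform solid cone with
    its vertex at the origin and its axis along z (radius R*z/h at height z)."""
    rho = 3.0 * M / (math.pi * R**2 * h)  # uniform density, from V = pi R^2 h / 3
    dz = h / n
    Izz = 0.0
    Ixx = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        r_max = R * z / h          # similar triangles: r / R = z / h
        dr = r_max / n
        for j in range(n):
            r = (j + 0.5) * dr
            ring_mass = rho * 2.0 * math.pi * r * dr * dz  # thin ring
            Izz += ring_mass * r**2
            Ixx += ring_mass * (0.5 * r**2 + z**2)
    return Izz, Ixx

M, R, h = 1.0, 1.0, 2.0  # assumed test values
Izz, Ixx = cone_moments(M, R, h)
Izz_exact = 3.0 / 10.0 * M * R**2
Ixx_exact = 3.0 / 20.0 * M * (R**2 + 4.0 * h**2)
```

The numerical and closed-form values agree to several decimal places, which is a useful guard against algebra slips in the limits of integration.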

Problem 26 (Electra’s Ellipsoid) Prove that an ellipsoid of uniform mass density ρ, total mass M and semi-axis lengths a, b, c (along the x, y, z axes respectively) has a diagonal inertia tensor with principal moments of inertia (eigenvalues):

\[
\lambda_1 = \frac{1}{5}M(b^2 + c^2), \qquad \lambda_2 = \frac{1}{5}M(a^2 + c^2), \qquad \lambda_3 = \frac{1}{5}M(a^2 + b^2). \tag{4.62}
\]

Hint: Due to the reflection symmetries of the ellipsoid, the off-diagonal components of the inertia tensor are necessarily zero.

Hint: When doing the integrations, make use of symmetry and switch to spherical coordinates to do the final integrations. The Jacobian factor for spherical coordinates is r² sin(θ), where θ is the polar angle (the angle between the radial direction and the z-axis).

Hint: The volume of an ellipsoid can be derived rather easily by integration: V = ∫_Ellipsoid dV, in spherical coordinates. You should find that V = (4/3)πabc.

Now, compute the inertia matrix for a sphere of radius R. In other words, set a = b = c = R. What do you notice? What does this suggest about the rotational dynamics of a sphere?
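To explore the sphere limit numerically, one can tabulate the principal moments for a few choices of semi-axes. The sketch below assumes the standard closed-form result for a uniform solid ellipsoid with semi-axes a, b, c along x, y, z, namely moments (M/5)(b² + c²), (M/5)(a² + c²) and (M/5)(a² + b²); the function name and the sample values are illustrative.

```python
def ellipsoid_principal_moments(M, a, b, c):
    """Principal moments of a uniform solid ellipsoid with semi-axes a, b, c
    along x, y, z (standard closed-form result, assumed here)."""
    return (M * (b * b + c * c) / 5.0,
            M * (a * a + c * c) / 5.0,
            M * (a * a + b * b) / 5.0)

# A generic ellipsoid: three distinct principal moments.
lam1, lam2, lam3 = ellipsoid_principal_moments(2.0, 1.0, 2.0, 3.0)

# The sphere limit a = b = c = R: all three moments coincide at (2/5) M R^2.
sphere = ellipsoid_principal_moments(2.0, 3.0, 3.0, 3.0)
```

In the sphere limit the inertia matrix is a multiple of the identity, so every axis through the center behaves as a principal axis.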

4.2.2 Principal Axes and Spectral Decomposition

The guaranteed existence of three mutually perpendicular principal axes for any rigid body is a consequence of the fact that the inertia tensor is symmetric – i.e. it is represented by a symmetric matrix: I = I^T. To see this, we now state and illustrate a proof of the following, vastly important (and powerful) theorem.

Theorem 4 (Spectral Theorem (for symmetric linear operators)) Given any symmetric linear operator (matrix) A : R^n → R^n acting on n-dimensional Euclidean space, R^n, we can extract n independent and mutually orthogonal eigenvectors with n corresponding real eigenvalues. That is, we can diagonalize A:

A = UΛU^T, (4.63)

where Λ = diag(λ1, λ2, ..., λn) is a diagonal matrix consisting of the eigenvalues λj of A. The matrix U is an orthogonal matrix (for n = 3 it can be chosen to be a rotation matrix, U ∈ SO(3)) – that is, it is composed of the (mutually orthogonal, unit-length) eigenvectors of A as its columns.

One can make sense of this theorem by observing the following sketch proof.


Proof 1 We sketch the argument, starting with n = 1 – i.e. a 1 × 1 matrix A = [a]. Clearly this has one eigenvalue a, with eigenvector (1). For general n, let 1_n denote the n × n identity matrix and consider the polynomial P(λ) = det(A − λ1_n). By the fundamental theorem of algebra, the eigenvalue equation P(λ) = 0 has n roots (counted with multiplicity) over the field of complex numbers, C.

Since the matrix (A − λ1_n) is non-invertible (zero determinant) for any eigenvalue λ, it follows that the following equation holds:

Au = λu, (4.64)

for some non-zero vector u, which a priori may be complex. We can always divide u by its length ‖u‖. It then follows that:

λ = u†Au, (4.65)

(since u†u = 1). Because A is real and symmetric, the right-hand side equals its own complex conjugate, hence λ is real and u can be chosen to be a real vector in R^n. Doing this for all eigenvalues λj and eigenvectors uj, we can use the Gram-Schmidt procedure (within each eigenspace; eigenvectors belonging to distinct eigenvalues of a symmetric matrix are automatically orthogonal) to create an orthonormal set of n linearly independent eigenvectors. Putting these into the columns of a matrix, U = [u1 ... un], we can then use the matrix U to diagonalize A – that is,

A = UΛU^T, (4.66)

where U is an orthogonal matrix (U^T = U^{-1}) and Λ = diag(λ1, λ2, ..., λn) is a diagonal matrix whose entries are the eigenvalues of A.

Since the inertia tensor is a symmetric linear operator – represented by a symmetric 3-by-3 matrix – the spectral theorem implies the following physical result.

Corollary 1 (Existence of Principal Axes) For any rigid body R and some point O in space, there are three mutually orthogonal (perpendicular) principal axes through O. In such a basis, the inertia tensor I = diag(λ1, λ2, λ3) is diagonal. When the angular velocity ω (rotational axis) points along any of these axes the angular momentum L is parallel to it.

Therefore, the principal axes of a rigid body are the eigenvectors of its inertia tensor. Furthermore, the ‘principal moments of inertia’ are the moments of inertia about each of these axes – i.e. the eigenvalues of the inertia tensor.

We now illustrate the process of ‘principal axes decomposition’ (spectral decomposition) for rotational dynamics. To this extent, let us return to the cube!

Problem 27 (Clytemnestra’s Cunning Cube) Having purified the line of Atreus of the family curse, Orestes and Electra establish a court of justice under Apollo in Athens. All seems well, until a cunning cube rolls into the Athenian court. With it, the cube brings the Furies – called upon by the spirit of Orestes’ mother in vengeance.

To protect Orestes and transform the Furies into Eumenides (Benevolent Ones), Athena must perform a spectral decomposition of the inertia tensor for Clytemnestra’s Cube, rotating about its corner.

We computed the inertia tensor for a cube (of edge length a) rotating about its corner, in our first example:

\[
I = \frac{Ma^2}{12}\begin{pmatrix}
8 & -3 & -3 \\
-3 & 8 & -3 \\
-3 & -3 & 8
\end{pmatrix}. \tag{4.67}
\]

1. Eigenvalues (Principal Moments)

By solving the characteristic equation, det(I − λ1_3) = 0, show that we get the following eigenvalues:

λ1 = 2µ, λ2 = λ3 = 11µ, (4.69)

where µ := Ma²/12.

2. Eigenvectors (Principal axes). By solving the eigenvector equations:

(I − λj 1_3)ωj = 0, (4.70)

for eigenvectors ωj with corresponding eigenvalues λj (with j = 1, 2, 3), show that:

\[
\mathbf{e}_1 = \frac{1}{\sqrt{3}}(1, 1, 1), \tag{4.71}
\]

is a unit eigenvector corresponding to eigenvalue λ1 = 2µ. Therefore, one principal axis of the cube is the main diagonal of the cube (between O and (a, a, a)). The corresponding moment of inertia is equal to λ1 = 2µ.

Solving the second eigenvector equation, with λ2 = λ3 = 11µ, we see that we get the following constraint (with j = 2, 3):

ωjx + ωjy + ωjz = 0. (4.72)

This is precisely equivalent to the orthogonality condition:

e1 · ej = 0, (4.73)

for j = 2, 3. In other words, the first principal axis is perpendicular to the second and third principal axes. The latter are not uniquely determined, since we can choose any two mutually orthogonal vectors in the 2-dimensional subspace (plane through the origin) of R^3 orthogonal to e1.

3. Eigenbasis (Principal axes basis).

Using the principal axes as our basis vectors, the inertia matrix with respect to this basis has the diagonal form:

\[
I = \frac{Ma^2}{12}\begin{pmatrix}
2 & 0 & 0 \\
0 & 11 & 0 \\
0 & 0 & 11
\end{pmatrix}, \tag{4.74}
\]

as guaranteed by the spectral decomposition (principal axes) theorem.

For any rotations about the principal axes, the angular momentum L of the cube is necessarily parallel to the angular velocity ω.
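The spectral decomposition in this problem can be verified with elementary arithmetic. The sketch below works in units where µ = Ma²/12 = 1 and picks one possible pair (e2, e3) in the plane orthogonal to e1; that pair is an arbitrary choice made for illustration, since any orthonormal pair in that plane would do. It checks that each vector is an eigenvector and that UΛU^T rebuilds the original matrix.

```python
import math

MU = 1.0  # stands for M a^2 / 12; set to 1 so the entries are plain numbers
I_corner = [[8 * MU, -3 * MU, -3 * MU],
            [-3 * MU, 8 * MU, -3 * MU],
            [-3 * MU, -3 * MU, 8 * MU]]

e1 = [1 / math.sqrt(3)] * 3
e2 = [1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]             # one possible choice
e3 = [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]  # orthogonal to e1, e2
eigenvectors = [e1, e2, e3]
eigenvalues = [2 * MU, 11 * MU, 11 * MU]

def matvec(I, w):
    """Matrix-vector product I w for a 3x3 matrix."""
    return [sum(I[n][k] * w[k] for k in range(3)) for n in range(3)]

def eigen_residual(I, lam, e):
    """Largest component of I e - lam e; zero for an exact eigenpair."""
    Ie = matvec(I, e)
    return max(abs(Ie[n] - lam * e[n]) for n in range(3))

def reconstruct(vectors, values):
    """U Lambda U^T, built as the sum of lam * e e^T over the eigenpairs."""
    return [[sum(lam * e[p] * e[q] for lam, e in zip(values, vectors))
             for q in range(3)] for p in range(3)]

residuals = [eigen_residual(I_corner, lam, e)
             for lam, e in zip(eigenvalues, eigenvectors)]
I_rebuilt = reconstruct(eigenvectors, eigenvalues)
```

All three residuals vanish to rounding error, and `I_rebuilt` reproduces the entries 8, −3, −3 of the original matrix.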


More generally, for bodies of uniform density, a geometric axis of symmetry will serve as a principal axis for the body. The two remaining principal axes will be in a plane perpendicular to the first principal axis. In the case that the body has rotational symmetry about an axis through its center, then the remaining two principal axes can have any direction perpendicular to the first principal axis.

For rigid bodies with minimal symmetry, it may happen that the principal axes are uniquely determined – i.e. there is no freedom to choose the remaining two principal axes once the first is established. For bodies with maximal symmetry – such as a sphere – it turns out that any axis is a principal axis.

4.2.3 Parallel and Perpendicular Axis Theorems

In the following extended problem, we will see explicitly how our higher-level theory of rotational dynamics returns the traditional results from simple rotational dynamics considered in first-year physics and engineering.

Problem 28 (Special Properties) Having converted the Furies, Athena decides to entertain the redeemed Orestes by asking him to prove the following properties of the inertia tensor. Help Orestes by solving these problems.

1. Elegance

Recalling the definitions of the components of the inertia tensor for a rigid body R, prove that we can compile these into the single elegant form:

\[
I_{jk} = \int_{\mathcal{R}} dm\,\left[(\mathbf{r} \cdot \mathbf{r})\,\delta_{jk} - r_j r_k\right], \tag{4.75}
\]

where r is the displacement vector relative to some origin O, and j, k = 1, 2, 3. Our differential mass element dm = ρ dV is given in terms of the mass density profile ρ and differential volume element dV. Furthermore, the Kronecker delta is defined such that δjk = 1 when j = k and δjk = 0 when j ≠ k.

Hint: If you choose an (x, y, z) Cartesian coordinate system, one has r = (x, y, z).

2. Additivity

Just like moments of inertia, the inertia tensor obeys an additive property in the following sense. Given two rigid bodies A and B – e.g. a tetrahedron (pyramid) stacked on a cube – the combined^18 rigid body, A ⊕ B, has the following inertia tensor (with all tensors taken about the same origin):

I_{A⊕B} = I_A + I_B. (4.75)

Using this property, write down the inertia tensor for a cube stacked on top of an inverted cone rotating about its vertex.

Likewise, we can ‘subtract’ inertia tensors in this way – i.e. if a rigid body A∖B is given by removing a rigid body B from a rigid body A, we have:

I_{A∖B} = I_A − I_B. (4.75)

Using this property, write down the inertia tensor for a spherical shell, of outer radius R2 and inner radius R1, in terms of the inertia tensor for a ball (solid sphere) of radius R2 and the inertia tensor for the ball of radius R1 that is removed to form the hollow.

^18 Here the ‘direct sum’ notation is used to denote the union of two (non-overlapping) subsets of R^3.

3. Triangle Inequalities and Representative Ellipsoid Prove that for rotations of an arbitrary rigid body R about an arbitrary pivot point, O, the principal moments of inertia of the corresponding inertia tensor (i.e. diagonal components in the principal axes basis) obey the triangle inequalities:

Ixx + Iyy ≥ Izz, Izz + Iyy ≥ Ixx, Izz + Ixx ≥ Iyy. (4.75)

Hint: Work from the definition of the components of the inertia tensor. Also, here we let the (x, y, z) coordinate axes coincide with the principal axes – hence Ixx = λ1, Iyy = λ2, Izz = λ3.

It follows that the inertia tensor for the rigid body R is equivalent to the inertia tensor for an ellipsoid (with principal axes lengths 2a, 2b, 2c) with the following principal moments of inertia:

\[
\begin{aligned}
\frac{2}{5}Ma^2 &= \lambda_2 + \lambda_3 - \lambda_1 \geq 0 \\
\frac{2}{5}Mb^2 &= \lambda_3 + \lambda_1 - \lambda_2 \geq 0 \\
\frac{2}{5}Mc^2 &= \lambda_1 + \lambda_2 - \lambda_3 \geq 0.
\end{aligned}
\]

4. Generalized Parallel Axis Theorem The ‘parallel axis’ theorem in your typical first-year physics or engineering textbook will usually say something along the lines of: “The moment of inertia I′ for an object of mass M rotating about an axis parallel to an axis through the center of mass is given by:

$$ I' = I_{CM} + M d^2, \tag{4.73} $$

where ICM is the moment of inertia about the axis through the center of mass and d is the perpendicular distance between that axis and the parallel axis of rotation.” In this simple statement, I′ and ICM are scalars. We now generalize this to the inertia tensor:

$$ \mathbf{I}' = \mathbf{I}_{CM} + M\left[(\mathbf{r}\cdot\mathbf{r})\,g - \mathbf{r}\otimes\mathbf{r}\right], \tag{4.73} $$

where g is the Euclidean metric (identity matrix in Cartesian coordinates) and r is the position vector of the new pivot point relative to the center of mass (origin), O.

To make sense of the above, we shall go back to the component form of the inertia tensor that you are familiar with – i.e. matrices. In particular, let ICM denote the moment of inertia tensor of a rigid body of mass M about its center of mass. Let ICM+∆r denote the inertia tensor about a point, P = rCM + ∆r, displaced from the CM by a vector ∆r = (∆x, ∆y, ∆z). Prove that we then have:

$$ I^{CM+\Delta r}_{xx} = I^{CM}_{xx} + M\left((\Delta y)^2 + (\Delta z)^2\right), \qquad I^{CM+\Delta r}_{yz} = I^{CM}_{yz} - M(\Delta y)(\Delta z). \tag{4.73} $$
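The tensor form of the parallel axis theorem can be sanity-checked numerically. The sketch below assumes a hypothetical body made of point masses and compares the inertia tensor computed directly about the displaced point with the right-hand side of the theorem; the helper `inertia_tensor` and all numerical values are illustrative assumptions:

```python
import numpy as np

def inertia_tensor(masses, positions, pivot):
    """Inertia tensor of point masses about the point `pivot`."""
    rel = positions - pivot                        # displacements from the pivot
    r2 = np.sum(rel**2, axis=1)                    # squared distances
    per_mass = r2[:, None, None] * np.eye(3) - rel[:, :, None] * rel[:, None, :]
    return np.einsum("i,ijk->jk", masses, per_mass)

# Hypothetical rigid body: 20 point masses.
rng = np.random.default_rng(1)
masses = rng.uniform(0.5, 1.5, size=20)
positions = rng.normal(size=(20, 3))

M = masses.sum()
r_cm = masses @ positions / M                      # center of mass
delta = np.array([0.3, -1.2, 0.7])                 # displacement (dx, dy, dz)

I_cm = inertia_tensor(masses, positions, r_cm)
I_direct = inertia_tensor(masses, positions, r_cm + delta)

# Right-hand side of the generalized theorem: I_CM + M[(r.r) 1 - r (x) r].
I_theorem = I_cm + M * (np.dot(delta, delta) * np.eye(3) - np.outer(delta, delta))

print(np.allclose(I_direct, I_theorem))
```

The xx and yz entries of `I_theorem` reproduce the two component identities stated in the problem.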

4.2.4 Precession and Torque: Equinox, Spinning top

We now consider the motion of a spinning top (of total mass M) consisting of a solid rod of length R, passing through an attached uniform circular disk. We consider the class of rotations about the tip of the spinning top (rotations with a fixed point). Let the tip coincide with the origin O of some Cartesian coordinate system and let R be the displacement vector from O to the center of mass, CM. At the CM, the force of gravity acts vertically downward: Fgrav = Mg with g = −g∂z.

DIAGRAM

If we let the top make an angle θ with respect to the z-axis, then due to the spinning top’s axial symmetry, the axis of the rod is a principal axis. If we let e3 be the unit vector in the direction of the rod, then the remaining principal axes are defined by vectors (e1, e2) perpendicular to e3 – in such a basis, the inertia tensor is diagonal,

I = diag(λ1, λ2, λ3). (4.74)

In the absence of gravity, we analyse the motion of the top spinning about its symmetry axis (with basepoint fixed at O). The corresponding angular velocity is given by ω = ωe3, leading to an angular momentum parallel to the symmetry axis:

L = λ3ω = λ3ωe3. (4.75)

In the case of zero gravity (no torque), the angular momentum is constant and so the axis of rotation remains fixed. In the presence of gravity, a gravitational torque is generated:

Γ = R×Mg. (4.76)

This has magnitude RMg sin(θ) and a direction which is perpendicular to both the vertical z-axis and the axis (R) of the top. If we take the (‘reasonable’) approximation that the torque due to gravity is small (by selecting R, M, g small relative to other parameters), then we can show that the magnitude of the angular momentum is approximately constant, while its direction (the rotational axis) precesses about the z-axis. To see this, note that:

$$ \boldsymbol{\Gamma} = \frac{d}{dt}\mathbf{L}. \tag{4.77} $$

The non-zero time-variation of L implies a time-varying angular velocity vector, ω. Due to the small torque (and hence small time-variation), since we have ωt=0 = (0, 0, ω3), it follows that ω1 and ω2 remain small (which can be made precise with a more detailed analysis). Therefore, we can assume that L = λ3ω = λ3ωe3 is satisfied throughout the motion – the only time-varying quantity in this expression is the principal axis, e3. To see this, note that in this regime the torque Γ is orthogonal to L, since Γ is perpendicular to e3. Hence the magnitude of L is constant and only its direction changes:

$$ \lambda_3 \omega\, \dot{\mathbf{e}}_3 = \mathbf{R} \times M\mathbf{g}. \tag{4.78} $$

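Using R = Re3 and g = −g∂z, we have:

$$ \frac{d}{dt}\mathbf{e}_3 = \boldsymbol{\Omega} \times \mathbf{e}_3, \tag{4.79} $$

where:

$$ \boldsymbol{\Omega} = \frac{M g R}{\lambda_3 \omega}\,\partial_z. \tag{4.80} $$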

Recall now that the tangential velocity v of some rotational motion for an object (or center of mass) with angular velocity (rotation axis) Ω and displacement vector r is given by:

v = Ω× r. (4.81)


Therefore, the motion of the top is a rotational motion about the rotational axis R = Re3, with a superimposed precession about the z-axis. In particular, the vector e3 rotates about the z-axis with an angular frequency:

$$ \Omega = \frac{R M g}{\lambda_3 \omega}. \tag{4.82} $$

This makes sense since the torque vector Γ is directed into the page – the direction in which the angular momentum vector changes.
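The precession equation de3/dt = Ω × e3 can be integrated numerically to confirm that e3 sweeps out a cone about the z-axis with period 2π/Ω. The following is a minimal sketch; the parameter values (mass, lever arm, moment of inertia, spin rate, tilt angle) are hypothetical and chosen only so that Ω is much smaller than ω, and the Runge-Kutta integrator is an illustrative choice:

```python
import numpy as np

# Hypothetical top parameters (SI units), chosen so that Omega << omega.
M, g, R = 0.1, 9.81, 0.03           # mass, gravity, distance from tip to CM
lam3, omega = 5.0e-5, 100.0         # moment about symmetry axis, spin rate

Omega = M * g * R / (lam3 * omega)  # precession rate, Eq. (4.82)
z_hat = np.array([0.0, 0.0, 1.0])

def rhs(e3):
    """Right-hand side of d(e3)/dt = Omega z_hat x e3."""
    return np.cross(Omega * z_hat, e3)

def rk4_step(e3, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(e3)
    k2 = rhs(e3 + 0.5 * dt * k1)
    k3 = rhs(e3 + 0.5 * dt * k2)
    k4 = rhs(e3 + dt * k3)
    return e3 + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

theta = np.radians(20.0)                         # tilt from the vertical
e3_start = np.array([np.sin(theta), 0.0, np.cos(theta)])

period = 2.0 * np.pi / Omega                     # expected precession period
n_steps = 2000
dt = period / n_steps

e3 = e3_start.copy()
for _ in range(n_steps):
    e3 = rk4_step(e3, dt)

print("precession rate (rad/s):", Omega)
print("e3 after one period:", e3)
```

After one full period the axis returns to its starting direction, and its z-component (the tilt angle θ) never changes along the way.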

Although the motion discussed here describes a spinning top, the same effect – weak torque and precession – also appears in the dynamics of the solar system. For example, because of the earth’s bulge at the equator (oblate shape), the sun and moon exert small torques on the earth. These torques cause the earth’s rotational axis (for the 24-hour day-night rotation cycle) to precess. At the moment, the earth’s spin axis is inclined at θ = 23° from the normal to the earth’s orbit around the sun. Due to the torques acting on the earth, the axis of spin traces out a cone of half-angle 23 degrees around the normal to the orbital plane – this precession motion is known as the precession of the equinoxes. The period for this motion is:

T = 26,000 years. (4.83)

This means that in 13,000 years the current north pole star will be 46 degrees away from the north celestial pole.

Note that we have only reached the tip of our spinning top 19. To understand the full dynamics of a spinning top, one would have to introduce the concept of Euler angles 20. In the full treatment, one can prove the validity of the approximations following the ‘small torque’ setup. The general dynamics also includes an additional motion – ‘nutation’, which is essentially a nodding motion of the top’s rotational axis towards and away from the vertical axis. One can illustrate nutation combined with precession by inscribing the trajectory of the ‘top end’ of the spinning top onto the surface of a sphere (see John Taylor’s Classical Mechanics).

19 Iceberg.
20 Alternatively, the quaternions, H, or the rotation group, SO(3).


4.3 Accelerating Frames: The Tides

In some manner, it is fair to say that modern ‘physics’ began with the impetus provided by Galileo. The Galilean ‘principle of relativity’ can be summarised as – “The laws of motion are the same in all inertial frames.”21 This is an extremely important statement, from which most of Newtonian mechanics can be derived. In particular, it establishes the fundamental ‘symmetry group’ for the laws of nature to be that of the ‘Euclidean symmetry group: ISO(3)’ – rotations of space coupled with translations of space and time22. This is in contrast to the Poincaré symmetry group (ISO(1, 3)) underlying Einstein’s principle of ‘special relativity’ – i.e. the speed of light in all frames is a constant and ‘inertial frames’ are defined up to 4D Lorentz transformations (instead of 3D rotations).

In this exploration we will study the symmetries underlying classical, non-relativistic physics. In this manner, we will demonstrate how to ‘extend’ Newton’s laws to accelerating reference frames. This will allow us to finish with a mathematical and physical explanation of the two ‘high’ and two ‘low’ tides which are observed daily on the earth. In particular, we will obtain a somewhat accurate estimate of the average height difference between high tides and low tides for oceanic bodies.

4.3.1 Isometries of Euclidean Space: Galileo

Euclidean isometries – Galilean Relativity

Give an explicit representation of ISO(3)

Rotation and translation: $\begin{pmatrix} R & \mathbf{v} \\ 0 & 1 \end{pmatrix}$

Prove that Newton’s 3 Laws are invariant under ISO(3)

4.3.2 Linearly Accelerating Frames

Let S0 be an inertial frame and let S be a frame with acceleration A relative to S0 – for example, a train with some velocity V and acceleration $\mathbf{A} = \dot{\mathbf{V}}$ relative to some station O. If a passenger in the train throws a tennis ball (mass m) with velocity $\dot{\mathbf{r}}_0$ relative to S0, then using Newton’s second law in this inertial frame gives:

$$ \mathbf{F} = m\ddot{\mathbf{r}}_0, \tag{4.84} $$

where r0 is the ball’s displacement relative to S0 and F is the net force on the ball. The ball’s velocity relative to S0 can be decomposed into its velocity $\dot{\mathbf{r}}$ relative to the accelerating frame S (the train) and the velocity V of frame S relative to S0:

$$ \dot{\mathbf{r}}_0 = \dot{\mathbf{r}} + \mathbf{V}. \tag{4.85} $$

Differentiating, we see that:

$$ \ddot{\mathbf{r}} = \ddot{\mathbf{r}}_0 - \mathbf{A}, \tag{4.86} $$

hence we have:

$$ m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A}. \tag{4.87} $$

21 Galileo stated this in his “Dialogue Concerning the Two Chief World Systems”.
22 An extra symmetry.


If we identify Finertial = −mA to be the inertial force, we see clearly how Newton’s law in non-inertial frames is augmented. For example, during take-off in an aircraft, one feels a force that pushes one back into one’s seat – likewise when a bus brakes and one is standing. These fictitious forces (termed ‘inertial forces’) are simply introduced to extend Newton’s laws to non-inertial reference frames.

Exercise 30 (Accelerating Pendulum)

4.3.3 The Tides

Using the non-inertial extension of Newton’s second law derived earlier,

$$ m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A}, \tag{4.88} $$

we can obtain a physical explanation of the high and low tides observed on Earth. The tides are the result of bulges in the earth’s oceans caused by the gravitational attraction of the moon and sun. As the earth rotates, objects on the earth’s surface move past these bulges and are subject to a rising and falling of the sea level. In our analysis, we shall first only consider the effect of a single external body on the earth’s oceans – in particular, the moon.

DIAGRAMS

An inaccurate explanation of the tides is one in which the oceans bulge towards the moon on just the side of the earth facing the moon. In such a scenario, we would only get one high tide per day. The correct explanation is that the moon’s gravitational attraction imparts a small acceleration A to the earth, directed towards the moon – a centripetal acceleration of the earth as the earth and moon orbit a common center of mass (very close to the earth’s center). This centripetal acceleration, shared by the earth and every object on it, corresponds to the pull of the moon that the object would feel at the earth’s center. Any object on the moon side of the earth is pulled by the moon with a force slightly greater than it would be at the earth’s center – hence the ocean surface bulges towards the moon. Objects on the far side from the moon are pulled by the moon with a force that is slightly weaker than that at the center of the earth – relative to the earth’s center, this acts as a slight repulsion, which causes the ocean to bulge on the side away from the moon and accounts for the second high tide of each day.

DIAGRAM

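The forces on any mass m near the earth’s surface include:

• The gravitational pull of the earth: mg.

• The gravitational pull of the moon: $-\frac{G M_m m}{d^2}\hat{\mathbf{d}}$, where $M_m$ is the mass of the moon and d is the object’s position relative to the moon (with unit vector $\hat{\mathbf{d}}$).

• The net non-gravitational force: Fng. This could be the buoyancy force on a drop of water in the ocean.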

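The acceleration of the earth-fixed frame (with origin O at the earth’s center) is given by:

$$ \mathbf{A} = -\frac{G M_m}{d_0^2}\hat{\mathbf{d}}_0, \tag{4.89} $$

where d0 is the position of the earth’s center relative to the moon. Thus, using the non-inertial form of Newton’s second law, we have:

$$ m\ddot{\mathbf{r}} = \mathbf{F} - m\mathbf{A} = \left( m\mathbf{g} - \frac{G M_m m}{d^2}\hat{\mathbf{d}} + \mathbf{F}_{ng} \right) + \frac{G M_m m}{d_0^2}\hat{\mathbf{d}}_0. \tag{4.90} $$

From this, we have:

$$ m\ddot{\mathbf{r}} = m\mathbf{g} + \mathbf{F}_{tid} + \mathbf{F}_{ng}, \tag{4.91} $$

where the tidal force is:

$$ \mathbf{F}_{tid} = -G M_m m \left( \frac{\hat{\mathbf{d}}}{d^2} - \frac{\hat{\mathbf{d}}_0}{d_0^2} \right). \tag{4.92} $$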
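To get a feel for the size and direction of the tidal force, Eq. (4.92) can be evaluated at a few points on the earth’s surface. The sketch below uses rounded textbook values for G, the moon’s mass, the earth-moon distance and the earth’s radius; the coordinate layout (earth’s center at the origin, moon on the positive x-axis) and the function name `tidal_acceleration` are illustrative assumptions:

```python
import numpy as np

# Rounded physical constants (SI units).
G = 6.674e-11          # gravitational constant
M_moon = 7.35e22       # mass of the moon
d0 = 3.84e8            # earth-moon distance
R_earth = 6.37e6       # radius of the earth

# Assumed layout: earth's center at the origin, moon on the +x axis.
moon_pos = np.array([d0, 0.0, 0.0])

def tidal_acceleration(r):
    """Tidal force per unit mass, F_tid / m, at position r (relative to earth's center)."""
    d = r - moon_pos                 # object's position relative to the moon
    d0_vec = -moon_pos               # earth's center relative to the moon
    return -G * M_moon * (d / np.linalg.norm(d)**3
                          - d0_vec / np.linalg.norm(d0_vec)**3)

near = tidal_acceleration(np.array([R_earth, 0.0, 0.0]))    # side facing the moon
far = tidal_acceleration(np.array([-R_earth, 0.0, 0.0]))    # side away from the moon
side = tidal_acceleration(np.array([0.0, R_earth, 0.0]))    # a quarter-turn around

print("near side:", near)
print("far side: ", far)
print("side:     ", side)
```

On both the near and far sides the tidal acceleration points away from the earth’s center (two bulges), while at the quarter-turn points it is directed inwards. Its magnitude is roughly a millionth of a metre per second squared, tiny compared with g.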

Finish derivation of equations

Diagrams

Gravitational Potential energies and derivation of tide height difference

Corrections / higher-order effects


4.4 Centrifugal and Coriolis Forces

In this section23, we shall see how to extend Newtonian mechanics to rotating reference frames. Some examples – the surface of the earth (which rotates about an axis through the earth’s center of mass), a squirrel running on the arm of a spinning ice-skater (Monica Leslie) or a space station rotating about some axis to generate artificial gravity. As we shall find, classical dynamics in rotating frames gives rise to ‘fictitious forces’ when these dynamics are analysed (using Newton’s laws) from the perspective of an observer attached to the rotating frame, rather than a stationary observer (an observer attached to an inertial reference frame). These ‘fictitious forces’ are the well-known ‘Coriolis’ and ‘centrifugal’ forces – despite being ‘fictitious’, their effects are very real. One can feel them – for example, when moving on a turn-table (merry-go-round) or driving around a sharp corner.

In the larger scheme of things, the Coriolis and centrifugal forces affect the trajectories of projectiles in the atmosphere as well as the formation and dynamics of different weather systems.

4.4.1 Rotational Motion and Angular Velocity

Recalling our previous investigation of moments of inertia and rotational dynamics, we characterised the rotational dynamics of a rigid body with an ‘inertia tensor’ and an angular velocity vector ω, defining the axis of rotation and the rate of rotation (rotation speed) ‖ω‖ = ω.

We know that for the rotational motion of a rigid body with some fixed point O of rotation, Euler’s rotation theorem tells us that a general rotation of the rigid body relative to O can be described as a rotation of the body about some axis through O. In other words, to describe the rotational dynamics of a rigid body, we need four pieces of information: (O, ω). Explicitly,

• A fixed point O of the rotational motion. For freely-moving rigid bodies (with no fixed point), we can instead use the center of mass, CM, and analyse dynamics in the center-of-mass frame24. In this frame, the center of mass is a ‘fixed point’.

• A vector ω defining the axis of rotation. Since ω ∈ R3, ω has 3 independent components in general – hence corresponds to 3 ‘pieces of information’.

Originally, Euler gave geometric proofs of his rotation theorem. One such proof shows that any rotation can be constructed from two reflection transformations (linear transformations R with determinant −1 and RT = R−1) – i.e. elements of the 3-dimensional orthogonal group, O(3). Equivalently, Euler’s rotation theorem can be proved via Rodrigues’ rotation formula or by showing that the set of rotations (linear transformations with RT = R−1 and determinant +1) in 3 dimensions forms a Lie group, SO(3). We give a simple proof now, making use of basic linear algebra.

Proof 2 (Euler’s Theorem via eigenvectors) Given any rotation R acting on R3, there exists some vector n invariant under R.

23 Some of this material is based on Chapter 9 of John R. Taylor’s “Classical Mechanics”.
24 Recall the tutorials on the ‘two-body problem’ – here we showed that the motion of any rigid body could be decomposed as the translational motion of its CM and rotations about this CM.


To see the proof of this statement, note that for a vector n to be invariant under the rotation R, we must have:

Rn = n, (4.93)

which means that n is an eigenvector of R with eigenvalue λ = 1. Therefore, we must show that such an eigenvector exists.

Since RT = R−1, it is easy to see that det(R) = ±1. In fact, since R is a rotation, it is an orientation-preserving isometry of Euclidean space – hence det(R) = +1. Furthermore, since R is a linear operator acting on R3, we have det(−R−1) = (−1)3 det(R−1) = −1, where det(R−1) = det(RT) = det(R) = 1 (the transpose preserves the determinant: det(A) = det(AT)). Combining these results, we have:

$$ \begin{aligned} \det(R - I) &= \det\big((R - I)^T\big) \\ &= \det(R^T - I) \\ &= \det(R^{-1} - I) \\ &= \det\big(-R^{-1}(R - I)\big) = \det(-R^{-1})\det(R - I) \\ &= -\det(R - I), \end{aligned} \tag{4.94} $$

hence det(R − I) = 0. This tells us that λ = 1 is an eigenvalue of the rotation R. If R is a trivial rotation, then R = I and the result follows. If R is non-trivial, then we have (R − I) ≠ 0, and since det(R − I) = 0 the operator (R − I) must have a non-trivial kernel – hence ∃ n ∈ R3, n ≠ 0, such that:

(R − I)n = 0. (4.95)

It follows that n is an eigenvector of R with eigenvalue λ = 1, hence a vector invariant under R. This means R represents a rotation about the axis n.
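The argument above can be illustrated numerically: build some rotation matrix, confirm that det(R − I) vanishes, and extract the invariant axis from the kernel of (R − I). The sketch below is only an illustration; the particular rotation (a product of three elementary rotations with arbitrary angles) and the use of a singular value decomposition to find the kernel are assumptions made for the example:

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# An arbitrary (hypothetical) rotation built from three elementary rotations.
R = rot_z(0.7) @ rot_y(-0.4) @ rot_x(1.1)

# det(R - I) should vanish, so (R - I) has a non-trivial kernel.
det_R_minus_I = np.linalg.det(R - np.eye(3))

# The right-singular vector belonging to the smallest singular value
# spans the kernel of (R - I): this is the rotation axis n.
_, singular_values, vt = np.linalg.svd(R - np.eye(3))
n = vt[-1]

print("det(R) =", np.linalg.det(R))
print("det(R - I) =", det_R_minus_I)
print("axis n =", n, " R n =", R @ n)
```

The printed vectors n and Rn agree, which is exactly the statement Rn = n.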

To see Euler’s theorem realized explicitly, we now present the Euler-Rodrigues rotation formula25. In particular, any rotation of the displacement vector r = x∂x + y∂y + z∂z will take the following form:

r′ = r + 2a(ω × r) + 2ω × (ω × r), (4.96)

where ω = (b, c, d) and (a, b, c, d) are some set of parameters lying on the 3-sphere (a higher-dimensional sphere of unit radius):

a2 + b2 + c2 + d2 = 1. (4.97)

In particular, to perform a rotation of r counterclockwise through an angle θ about an axis defined by the unit vector k = (kx, ky, kz), we have:

$$ a = \cos\left(\frac{\theta}{2}\right), \qquad \boldsymbol{\omega} = \sin\left(\frac{\theta}{2}\right)\mathbf{k}. \tag{4.98} $$

Exercise 31 (Rodrigues’ Rotation) In the timeline of the cyborg Constantine, Michael Grebla decides to run Rodrigo’s ‘Concierto de Aranjuez’ for a Sunday concert at St. George’s College. In order to hold this concert, Grebla enlists the help of the cyborg who must travel back in time to fetch Joaquin Rodrigo. In the midst of exam-period confusion, Constantine instead brings back Olinde Rodrigues. As penance for this unnecessary distortion of the spacetime continuum, the cyborg is asked to derive Rodrigues’ rotation formula from Euler’s construction:

$$ \mathbf{r}' = \cos(\theta)\,\mathbf{r} + \sin(\theta)\,(\mathbf{k}\times\mathbf{r}) + (1 - \cos(\theta))\,(\mathbf{k}\cdot\mathbf{r})\,\mathbf{k}. \tag{4.99} $$

25 This can be proven with Euler’s ‘Four-Square identity’ and the Euler rotation parameters.

Help restore balance to the spacetime continuum by completing this task. The following hints will help.

Hint: Recall that the vector-triple formula (BAC-CAB rule) relates the cross-products of three vectors a, b, c to pair-wise dot-products:

a × (b × c) = b(a · c) − c(a · b). (4.100)

Hint: Recall your trigonometric identities: $\sin(2z) = 2\sin(z)\cos(z)$ and $\sin^2(z) = \tfrac{1}{2}(1 - \cos(2z))$.

Note, Rodrigues’ formula can be derived from scratch by using vector projections (parallel and orthogonal projections). The more general form gives a closed-form explicit construction of an arbitrary rotation operator – this makes use of matrix representations of the Lie algebra so(3) of rotations. To this end, if we let K = [k]× be a matrix representing the linear operator k×, then the rotation matrix representing counterclockwise rotations through an angle θ about an axis k (a unit vector) is given by:

$$ R_{\mathbf{k}}(\theta) = I_3 + \sin(\theta)\,K + (1 - \cos(\theta))\,K^2. \tag{4.101} $$
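Before deriving the result by hand, it can be reassuring to confirm numerically that the three forms above (the Euler-Rodrigues parameters, Rodrigues’ vector formula and the matrix form) produce the same rotated vector. A minimal sketch follows; the function names and the test axis, angle and vector are illustrative choices rather than anything fixed by the notes:

```python
import numpy as np

def hat(k):
    """Matrix K = [k]_x representing the linear operator k x ( . )."""
    return np.array([[0.0, -k[2], k[1]],
                     [k[2], 0.0, -k[0]],
                     [-k[1], k[0], 0.0]])

def rotate_euler_rodrigues(r, k, theta):
    """r' = r + 2a (w x r) + 2 w x (w x r), with a = cos(theta/2), w = sin(theta/2) k."""
    a = np.cos(theta / 2.0)
    w = np.sin(theta / 2.0) * k
    return r + 2.0 * a * np.cross(w, r) + 2.0 * np.cross(w, np.cross(w, r))

def rotate_rodrigues(r, k, theta):
    """r' = cos(theta) r + sin(theta) (k x r) + (1 - cos(theta)) (k . r) k."""
    return (np.cos(theta) * r
            + np.sin(theta) * np.cross(k, r)
            + (1.0 - np.cos(theta)) * np.dot(k, r) * k)

def rotation_matrix(k, theta):
    """R_k(theta) = I + sin(theta) K + (1 - cos(theta)) K^2."""
    K = hat(k)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# Arbitrary test data: unit axis, angle and vector.
k = np.array([1.0, 2.0, 2.0]) / 3.0
theta = 0.9
r = np.array([0.5, -1.0, 2.0])

r1 = rotate_euler_rodrigues(r, k, theta)
r2 = rotate_rodrigues(r, k, theta)
r3 = rotation_matrix(k, theta) @ r
print(r1, r2, r3)

# Rotating the x-axis by 90 degrees about the z-axis (counterclockwise).
quarter_turn = rotation_matrix(np.array([0.0, 0.0, 1.0]), np.pi / 2.0) @ np.array([1.0, 0.0, 0.0])
print(quarter_turn)
```

The quarter-turn example sends the x-axis to the y-axis, which fixes the sense of rotation as counterclockwise (right-hand rule).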

Turning back to the physical world, we now consider the rotational motion of a point P relative to some fixed origin O with angular velocity (rotation axis) ω. In particular, we let $\mathbf{r} = \vec{OP}$ be the displacement vector of P and let θ be the co-latitude of r – i.e. the angle between the rotation axis (vertical) and r.

DIAGRAM

Physically, we could be describing the motion of an object on the earth’s surface, with ω defining an axis through the north pole of the earth and O the center of mass of the earth. It follows that the point P undergoes circular motion about ω with angular speed ω and radius ρ = r sin(θ) (with r = ‖r‖). The instantaneous velocity of the particle at any point in its trajectory is tangential to its path – in particular, it is given by:

v = ω × r. (4.102)

However, $\mathbf{v} = \frac{d}{dt}\mathbf{r}$ – thus it follows that the rate of change of a vector r fixed in a rotating body, as viewed from a non-rotating frame, is given by:

$$ \frac{d}{dt}\mathbf{r} = \boldsymbol{\omega} \times \mathbf{r}. \tag{4.103} $$

Hence the time-derivative differential operator $\frac{d}{dt}$ acting on vectors fixed in a rotating body takes the form:

$$ \frac{d}{dt} = \boldsymbol{\omega} \times \tag{4.104} $$

in a non-rotating frame. The derivative operator $\frac{d}{dt}$ is a linear operator. Similarly, the operator on the right-hand side, ω×, is also a linear operator (acting on R3) which can be represented by a 3-by-3 matrix [ω]×:

$$ \boldsymbol{\omega}\times \;\to\; [\boldsymbol{\omega}]_\times = \omega_j E_j = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}, \tag{4.105} $$

an element of the Lie algebra so(3) of rotations. Here ωj are the components of the vector ω in some basis ej (j = 1, 2, 3), with Ej being the generators for rotations about each basis vector, respectively.
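The matrix representation of ω× can be assembled from the generators and compared against the ordinary cross product. The sketch below builds the generators from the Levi-Civita symbol using the convention (E_j)_{ik} = −ε_{jik}, which is an assumption of this example (the notes do not spell out the entries of E_j); the test vectors are arbitrary:

```python
import numpy as np

# Levi-Civita symbol eps[i, j, k].
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0

# Generators of rotations about each basis vector: (E_j)_{ik} = -eps_{jik}.
E = -eps                                   # E[j] is the 3x3 matrix E_j

# Arbitrary test vectors.
omega = np.array([0.3, -1.1, 0.8])
r = np.array([2.0, 0.5, -1.5])

# [omega]_x = omega_j E_j (sum over j).
omega_hat = np.einsum("j,jik->ik", omega, E)

print(omega_hat)
print(omega_hat @ r, np.cross(omega, r))
```

The assembled matrix is antisymmetric and reproduces ω × r when applied to r, matching the layout of Eq. (4.105).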


Exercise 32 (Additive Property of Angular Velocity) Translational velocities are additive in the following way. Given a reference frame 1 and a reference frame 2, moving relative to 1 with velocity v21, it follows that a third object with velocity v32 relative to frame 2 has the following velocity relative to frame 1:

v31 = v32 + v21. (4.107)

Consider now two rotating frames 1 and 2, with frame 2 having an angular velocity ω21 relative to frame 1. Now consider a third rigid object, rotating with an angular velocity ω32 relative to frame 2, about some origin O. Let r be the displacement vector from O to some point fixed on the third object. Using the definition of angular velocity, v = ω × r, as well as the additive property of translational velocities, prove that angular velocities also possess the additive property. That is, show that:

ω31 = ω32 + ω21. (4.108)

Hint: It may help to observe that the vector r is arbitrary.

4.4.2 Differential Operators in Rotating Frames: Newton’s Second Law

For the remainder of this chapter, we take the convention that capitalized vectors denote properties of reference frames. For example, we reserve V and A for the velocity and acceleration of some (non-inertial) reference frame S with respect to some inertial frame S0. Likewise, we use Ω to denote the angular velocity of a rotating frame S with respect to an inertial frame S0.

We now consider a frame attached to our rotating Earth. The Earth rotates once about its axis every 24 hours26. Hence, for a reference frame fixed to the earth, we have a rotation rate of:

$$ \Omega = \frac{2\pi}{24 \times 3600}\ \mathrm{rad/s} \approx 7.3 \times 10^{-5}\ \mathrm{rad/s}. \tag{4.109} $$

We now let O be the origin of some inertial frame S0, with coordinate axes x0, y0, z0. Furthermore, we consider some rotating frame S (with coordinate axes x, y, z) sharing the origin O with S0 and rotating with an angular velocity Ω relative to S0. Let e1, e2, e3 be an orthonormal basis for the rotating frame S – i.e. unit vectors pointing along the coordinate axes x, y, z of the rotating frame. Time derivatives in the frame S0 will take a different form to time derivatives in the frame S, as we shall now demonstrate.

DIAGRAM

Consider an arbitrary vector, r. We shall denote its rate of change relative to the inertial frame S0 by:

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S_0} \tag{4.110} $$

and let

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S} \tag{4.111} $$

26 As Taylor notes, the rotation period of the earth is one sidereal day – a rotation with respect to the ‘fixed’ (distant) stars. This is shorter than the solar day by a factor of 365/366, which is negligible for the calculations we are going to perform.


denote its rate of change relative to the rotating frame, S. In the basis of the rotating frame, we can expand r as follows:

r = r1e1 + r2e2 + r3e3 = rjej, (4.112)

making use of the Einstein summation convention27 in the second equality. Since the vectors ej are fixed in the rotating frame (attached to the coordinate axes x, y, z), the rate of change of r in this frame is given by:

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S} = \left(\frac{dr_j}{dt}\right)\mathbf{e}_j. \tag{4.113} $$

Given that the scalar functions rj (components of r) are invariant under isometries (the same in either frame), we don’t need to worry about specifying which frame these scalar derivatives are taken in. In fact, we should probably denote such ‘scalar derivatives’ by $\partial_t$ or $\frac{\partial}{\partial t}$.

On the flip side, the vectors ej are co-rotating with the frame S – hence relative to the inertial frame S0, they are not fixed. Thus, in the frame S0, the rate of change of r is given by:

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{dr_j}{dt}\right)\mathbf{e}_j + r_j\left(\frac{d\mathbf{e}_j}{dt}\right)_{S_0}. \tag{4.114} $$

Since S is rotating with angular velocity Ω relative to S0, it follows that the velocities of the (co-rotating) basis vectors ej relative to S0 are given by our tangential velocity formula:

$$ \left(\frac{d\mathbf{e}_j}{dt}\right)_{S_0} = \boldsymbol{\Omega} \times \mathbf{e}_j. \tag{4.115} $$

Therefore, it follows that the rate of change of the vector r in the inertial frame S0 is given by:

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{dr_j}{dt}\right)\mathbf{e}_j + r_j\,\boldsymbol{\Omega} \times \mathbf{e}_j. \tag{4.116} $$

Now note that the first term appearing on the right-hand side of this equation is simply the rate of change of r in the frame S, hence we have:

$$ \left(\frac{d\mathbf{r}}{dt}\right)_{S_0} = \left(\frac{d\mathbf{r}}{dt}\right)_{S} + r_j\,\boldsymbol{\Omega} \times \mathbf{e}_j = \left(\frac{d\mathbf{r}}{dt}\right)_{S} + \boldsymbol{\Omega} \times \mathbf{r}. \tag{4.117} $$

We can summarize our results in an elegant fashion by taking the ‘operator’ viewpoint. In particular, ‘vector time derivative’ operators in the rotating frame S take the form

$$ \left(\frac{d}{dt}\right)_{S} = \frac{\partial}{\partial t}. \tag{4.118} $$

On the other hand, vector time derivative operators take the form

$$ \left(\frac{d}{dt}\right)_{S_0} = \frac{\partial}{\partial t} + \boldsymbol{\Omega}\times = \left(\frac{d}{dt}\right)_{S} + \boldsymbol{\Omega}\times \tag{4.119} $$

in the inertial frame, S0. The appearance of the additional linear operator, Ω×, in the inertial frame is extremely important. In particular, it is this term that generates the apparent (fictitious) centrifugal and Coriolis forces!

Mathematically speaking, we could probably formalize our results by considering dynamical systems in the perspective of differential geometry – the ‘vector

27 Repeated indices are summed over the dimension of the vector space. Here we are working in 3 dimensions, hence j = 1, 2, 3. Thus rjej := r1e1 + r2e2 + r3e3.


derivatives’ would then probably correspond to some vector differential operator (e.g. a Lie derivative or covariant derivative). The effect of the rotating frame is then to add the action of the rotation Lie algebra, so(3), on our vector space – explicitly through the Ω× term (recall that this operator can be represented by a matrix in so(3)).

Using our results, we can now investigate how Newton’s second law can be ‘extended’ to non-inertial reference frames. For the rest of this section, we shall use ‘dot’ notation to indicate time derivatives with respect to the rotating frame, S. In other words:

$$ \dot{\mathbf{r}} := \left(\frac{d\mathbf{r}}{dt}\right)_{S}, \tag{4.120} $$

for any vector r.

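Problem 29 (The Newtonian Differential Operator) Recall that Newton’s Second Law is a definition of ‘force’. In particular, an object with a trajectory r(t) (the path traced out by the displacement vector over time) and mass m experiences a force F according to the relation:

$$ m\left(\frac{d^2\mathbf{r}}{dt^2}\right) = \mathbf{F}. \tag{4.121} $$

This relation holds provided that the time derivatives are taken in an inertial frame. In particular, we should write:

$$ m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} = \mathbf{F}. \tag{4.122} $$

Using our earlier relations, we wish to express the Newtonian differential operator:

$$ m\left(\frac{d^2}{dt^2}\right)_{S_0} = m\left(\frac{d}{dt}\right)_{S_0}\left(\frac{d}{dt}\right)_{S_0}, \tag{4.123} $$

in terms of the time-derivative operators in the rotating (non-inertial) frame S. Note (!), for the physics we wish to consider – dynamics of objects co-rotating with the Earth – we may take the angular velocity Ω of the frame S to be constant. Hence,

$$ \left(\frac{d\boldsymbol{\Omega}}{dt}\right)_{S} = 0. \tag{4.124} $$

From this information, show that:

$$ m\left(\frac{d^2}{dt^2}\right)_{S_0} = m\left(\frac{d^2}{dt^2}\right)_{S} + 2m\boldsymbol{\Omega}\times\left(\frac{d}{dt}\right)_{S} + m\boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times). \tag{4.125} $$

In other words, show that:

$$ \begin{aligned} m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} &= m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S} + 2m\boldsymbol{\Omega}\times\left(\frac{d\mathbf{r}}{dt}\right)_{S} + m\boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r}) \\ &= m\ddot{\mathbf{r}} + 2m\boldsymbol{\Omega}\times\dot{\mathbf{r}} + m\boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r}). \end{aligned} \tag{4.126} $$

Hint: First expand the operator $m\left(\frac{d}{dt}\right)_{S_0}$ using our previous results, then apply the operator $\left(\frac{d}{dt}\right)_{S_0}$ to your expansion.

Hint: The cross-product is not an associative operation – for example, (Ω × Ω) × r = 0, whereas Ω × (Ω × r) is non-zero in general. (In terms of so(3) matrices, the cross product of two vectors corresponds to the matrix commutator, which is not associative.) Therefore, the brackets in the triple cross-product term are important.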


In the above problem, we see that the second-order vector time derivative operator in an inertial frame is clearly different from the second-order time derivative operator in the rotating frame (partial derivative, scalar derivative). In particular, we pick up two ‘additional’ terms. Since the second-order vector time derivative operator is used to construct Newton’s second law, it is clear that when using Newton’s second law to describe the motion of an object in a rotating frame – i.e. the trajectory traced out by the vector r(t) – we have to add ‘extra’ force terms.

Reviewing our main result

$$ \mathbf{F} := m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S_0} = m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S} + 2m\boldsymbol{\Omega}\times\dot{\mathbf{r}} + m\boldsymbol{\Omega}\times(\boldsymbol{\Omega}\times\mathbf{r}), \tag{4.127} $$

we see that Newtonian ‘force’ in the rotating frame is given by:

$$ m\left(\frac{d^2\mathbf{r}}{dt^2}\right)_{S} = \mathbf{F} + 2m\dot{\mathbf{r}}\times\boldsymbol{\Omega} + m(\boldsymbol{\Omega}\times\mathbf{r})\times\boldsymbol{\Omega}. \tag{4.128} $$

Exercise 33 Verify the previous statement by re-arranging our force equation and using the antisymmetric property of the vector cross product operator.

The additional terms appearing on the right-hand side of Newton’s second law in our rotating frame are the centrifugal force

$$ \mathbf{F}_{ctf} = m(\boldsymbol{\Omega}\times\mathbf{r})\times\boldsymbol{\Omega}, \tag{4.129} $$

and the Coriolis force

$$ \mathbf{F}_{cor} = 2m\dot{\mathbf{r}}\times\boldsymbol{\Omega}. \tag{4.130} $$

Note that these are ‘apparent’ forces rather than ‘real forces’, in the sense that they have no physical mechanism to generate them. Regardless, we experience these additional forces whenever we are in a non-inertial frame. Gravity, on the other hand, is a physical force and does not depend on the choice of reference frame (it is the curvature of spacetime around massive bodies). In summary, Newton’s second law in a rotating frame can be written as

$$ m\ddot{\mathbf{r}} = \mathbf{F}_{external} + \mathbf{F}_{ctf} + \mathbf{F}_{cor}. \tag{4.131} $$
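The rotating-frame form of Newton’s second law can be tested on the simplest possible case: a free particle (no external force) that moves in a straight line in the inertial frame. Viewed from the rotating frame, its acceleration should be produced entirely by the Coriolis and centrifugal terms. The sketch below checks this with finite differences; the frame’s rotation rate, the particle’s initial data and the step size are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical rotating frame: constant angular velocity about the z-axis.
Omega = np.array([0.0, 0.0, 0.5])            # rad/s

# Free particle in the inertial frame: r0(t) = p0 + u0 t (no external force).
p0 = np.array([1.0, 0.0, 0.0])
u0 = np.array([0.2, 0.3, 0.1])

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def r_rot(t):
    """Components of the particle's position in the rotating basis e_j."""
    return rot_z(Omega[2] * t).T @ (p0 + u0 * t)

# Central finite differences for velocity and acceleration in the rotating frame.
t, h = 2.0, 1.0e-3
r = r_rot(t)
vel = (r_rot(t + h) - r_rot(t - h)) / (2.0 * h)
acc_fd = (r_rot(t + h) - 2.0 * r + r_rot(t - h)) / h**2

# Fictitious forces per unit mass (F_external = 0 here).
coriolis = 2.0 * np.cross(vel, Omega)
centrifugal = np.cross(np.cross(Omega, r), Omega)
acc_pred = coriolis + centrifugal

print("finite-difference acceleration:", acc_fd)
print("Coriolis + centrifugal:        ", acc_pred)
```

The two printed vectors agree to within the finite-difference error, so the curved path seen in the rotating frame is fully accounted for by the two fictitious forces.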

We now proceed to investigate the fictitious forces separately.

4.4.3 Centrifugal Force

The Coriolis force on an object is proportional to the object’s velocity, $\dot{\mathbf{r}}$, relative to the rotating frame. Therefore, it is negligible for objects that are moving sufficiently slowly – or motions occurring over short time-scales. As an order of magnitude comparison of the centrifugal and Coriolis forces, we have:

$$ F_{cor} \sim m v \Omega, \qquad F_{ctf} \sim m r \Omega^2, \tag{4.132} $$

where v is the (translational) speed of the object relative to the rotating frame of the Earth – i.e. the speed measured by an observer on the Earth’s surface. Therefore, we have:

$$ \frac{F_{cor}}{F_{ctf}} \sim \frac{v}{R\Omega} \sim \frac{v}{V}, \tag{4.133} $$

where R is the radius of the earth and V is the tangential speed for a point on the Earth’s equator. For objects near the Earth’s surface, it is valid to make the approximation r ∼ R, since the object’s height above the surface is much smaller than R. Using the rotation rate of the Earth, Ω ∼ 7.3 × 10−5 rad/s, along with the Earth’s radius, R ∼ 6400 km, it follows that V ∼ 465 m/s, or equivalently V ∼ 1674 km/h. Therefore, for objects travelling with speed v << 1674 km/h it may be reasonable to ignore the Coriolis force experienced by the object.
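The numbers quoted above are quick to reproduce. The sketch below computes the Earth’s rotation rate from a 24-hour day, the equatorial speed V = RΩ, and the order-of-magnitude ratio v/V for one sample speed; the sample speed of 50 m/s is a hypothetical value chosen only for illustration:

```python
import math

# Rotation rate of the Earth, using a 24-hour day as in Eq. (4.109).
Omega = 2.0 * math.pi / (24 * 3600)     # rad/s

R = 6.4e6                               # Earth's radius in metres (rounded)
V = R * Omega                           # tangential speed at the equator, m/s

def coriolis_to_centrifugal(v):
    """Order-of-magnitude ratio F_cor / F_ctf ~ v / V for speed v (m/s)."""
    return v / V

v_sample = 50.0                         # hypothetical speed, m/s
print("Omega (rad/s):", Omega)
print("V (m/s):", V, " V (km/h):", V * 3.6)
print("F_cor / F_ctf for v = 50 m/s:", coriolis_to_centrifugal(v_sample))
```

Because this is only an order-of-magnitude estimate, small differences from using a slightly different radius or the sidereal day are immaterial.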

DIAGRAM

Free-Fall Acceleration and the true direction of g.

4.4.4 Coriolis Force

Coriolis Force

Turntables – thought experiment

Weather systems – circulation direction.

Projectile deflection. Snipers in Northern and Southern Hemisphere.

Free-fall and the Coriolis effect.

Object dropped down a mine-shaft to Earth’s center.

4.5 Foucault’s Pendulum

Foucault’s Pendulum and measurement of the rotation rate of the Earth.

Chapter 5

Nature’s Ways: The Calculus of Variations

A section of notes for topics that individuals have requested. Disclaimer: this is written from memory and pen-paper calculations.

5.1 Lagrangian Mechanics

5.1.1 Background

After a while, one begins to realise that using Newton’s laws to solve problems in classical mechanics can get very tedious and annoying. Thankfully, apart from making good cheese, wine and conquering most of Europe, the French were (and still are) also very good at producing world-class mathematicians. One such mathematician was Joseph Lagrange, who amongst a trillion other accomplishments, came up with a revolutionary reformulation of classical mechanics in conjunction with several other mathematicians 1 and physicists. This approach is now known as ‘Lagrangian mechanics’ and is an extremely powerful and vast generalisation of Newtonian mechanics. Today, almost the entirety of modern physics is based on the principles set down by Lagrange and Hamilton. It also has vast applications to optimization problems and many areas of engineering.

5.1.2 The Principle of Stationary Action

The fundamental concept behind Lagrangian mechanics is the ‘principle of stationary action’. It is more commonly referred to as the principle of ‘least action’, which is technically incorrect 2. It basically says that nature is lazy, and will always (classically) take the path of stationary action – which means it makes the following functional stationary:

$$ S = \int L \, dt \tag{5.1} $$

1 Most notably, the Irish mathematician Sir William Rowan Hamilton.
2 Recall that when you are trying to find the critical points of a function, you first find its derivative and then set it to zero. This doesn’t just give you points at which the function is minimized – you also get inflection points and maxima.


Here the quantity S, called the ‘action’, is a functional – an object which acts on functions. The function L is called the ‘Lagrangian’ of your theory – it contains all necessary information about your physical system. Different theories and different systems will have different Lagrangians. Finally, the integral ∫ used here is an integral with respect to time t (taken between fixed initial and final times), which parametrises the system.

For systems in classical mechanics, the Lagrangian sometimes (but not always!) takes the following form:

L = T − U (5.2)

where T is the total kinetic energy of the system and U is its potential energy. If the system is conservative (i.e. no losses due to friction etc.) and the potential energy U is time-independent, then the Lagrangian will take this special form. Note that the minus sign in T − U is important – if this were a plus sign, then the expression would be the total energy (or Hamiltonian in this restricted set of cases).
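To see what ‘stationary’ means in practice, one can evaluate the action numerically for a known physical path and for nearby perturbed paths with the same endpoints. The sketch below does this for a particle in uniform gravity with L = T − U = ½mẋ² − mgx; the parameter values, the sinusoidal perturbation and the trapezoidal integration are assumptions made for this illustration:

```python
import numpy as np

# Hypothetical setup: a particle thrown vertically in uniform gravity.
m, g = 1.0, 9.81
T = 2.0                                   # total time of flight considered
x0, v0 = 0.0, 5.0                         # initial height and velocity
t = np.linspace(0.0, T, 2001)

def action(eps):
    """Action S = int (T - U) dt along the true path plus eps * sin(pi t / T).

    The perturbation vanishes at t = 0 and t = T, so the endpoints stay fixed.
    """
    x = x0 + v0 * t - 0.5 * g * t**2 + eps * np.sin(np.pi * t / T)
    xdot = v0 - g * t + eps * (np.pi / T) * np.cos(np.pi * t / T)
    lagrangian = 0.5 * m * xdot**2 - m * g * x
    # Trapezoidal rule on the time grid.
    return np.sum(0.5 * (lagrangian[1:] + lagrangian[:-1]) * np.diff(t))

S0 = action(0.0)
dS_plus = action(0.1) - S0
dS_minus = action(-0.1) - S0

# For this Lagrangian the change in S is purely second order in eps:
predicted = m * 0.1**2 * np.pi**2 / (4.0 * T)

print("S on the true path:", S0)
print("change in S for eps = +0.1, -0.1:", dS_plus, dS_minus)
print("second-order prediction:", predicted)
```

The change in S is the same for ε = +0.1 and ε = −0.1, which shows there is no first-order variation about the true path. In this particular example the stationary point happens to be a minimum, but as noted above that is not guaranteed in general.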

If the system is non-conservative, then one usually has to add extra terms to the action to account for losses / dissipation (or net gain) of energy.

If the system is constrained – e.g. a bead confined to roll on some surface – then one needs to either use the method of Lagrange multipliers or express the system in terms of unconstrained variables.

5.1.3 The Euler-Lagrange Equations of Motion

The Euler-Lagrange equations of Motion are the equations you have to solveto determine the dynamical time evolution of your system in the Lagrangianformalism. In some subset of cases, these are simply equivalent to the equationsof motion you get using Newton’s Second Law: F = ma. Here I will specifya simple system, then show how to derive the Euler-Lagrange equations for thissystem using the principle of stationary action. Later, I will specify a moregeneral system then re-derive the Euler-Lagrange equations. Finally, I will givean example of the power of the Lagrangian formalism – in particular, a proof ofthe fact that a straight line is the shortest distance between two points in ordinaryEuclidean geometry.

In the Lagrange formalism, a system is specified by a set of generalized coordinates: $q_1(t), ..., q_n(t)$ (parametrised by time $t$) and a set of generalized velocities which are the derivatives of the coordinates with respect to time $t$: $\dot q_1, ..., \dot q_n$. In non-relativistic mechanics, we view the time $t$ as the independent variable and the coordinates $q_i$ and velocities $\dot q_i$ as dependent variables, parametrised by $t$. The configuration space is then taken to be the set of all possible values: $(q_1, ..., q_n, \dot q_1, ..., \dot q_n)$ of the generalized coordinates and the corresponding velocities. Note that generalized coordinates represent points in some space $M$, and the generalized velocities are (tangent) vectors attached to these points (recall velocity is a vector quantity). Hence the configuration space of a physical system naturally takes the form of a ‘tangent bundle’ $^{3}$, denoted $TM$.

Abstraction aside, we now consider the Lagrangian for a simple system (e.g. a point-particle moving with constant acceleration) with a generalized coordinate $q$ and a generalized velocity $\dot q = \frac{dq}{dt}$. The Lagrangian $L = L(q, \dot q)$ for this system is a function of $q$ and $\dot q$, defined on the configuration space $^{4}$ $TM$.

$^{3}$ A collection of points and the tangent spaces attached to those points. If the coordinate space $M$ is $n$-dimensional, then the tangent bundle $TM$ is $2n$-dimensional.

The action S[L] corresponding to this Lagrangian L, is given by:

$$ S[L] = \int L(q, \dot q)\,dt. \tag{5.3} $$

We can compute the variation of this action $\delta S[L]$ by using integration by parts and computing the variation of the Lagrangian: $\delta L$. Note that to compute the variation of the Lagrangian, $\delta L$, we simply use the same rules as we do when computing a total differential (or ‘exterior derivative’). In particular, we have

$$ \delta L(q, \dot q) = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot q}\delta \dot q \tag{5.4} $$

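Note that we have assumed that the Lagrangian $L$ does not explicitly depend on time $t$. It only depends on $t$ implicitly through $q(t)$ and $\dot q(t)$. If it did explicitly depend on $t$, e.g. for a system with a time-varying potential energy $U(t)$, the expression (5.4) would in fact be unchanged: the variation is taken at fixed $t$ (only the path $q(t)$ is varied, not the time), so no $\frac{\partial L}{\partial t}$ term appears in $\delta L$. Such a term only appears in the total time derivative $\frac{dL}{dt}$.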

Therefore, we have:

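$$
\begin{aligned}
\delta S[L] &= \delta \int L\,dt \\
&= \int \delta L\,dt \\
&= \int \left( \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial \dot q}\delta \dot q \right) dt
\end{aligned}
\tag{5.5}
$$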

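Note that the variation ‘operator’ $\delta$ commutes with derivative operators. Hence, for example, $\frac{d}{dt}\delta q = \delta \frac{d}{dt} q = \delta \dot q$. Our goal is to compute the ‘functional derivative’ of the functional $S$ with respect to the generalized coordinate $q$. The functional derivative allows us to differentiate functionals with respect to functions – apart from a few technicalities, it behaves much like the ordinary derivative. This means we want the quantity $\frac{\delta S}{\delta q}$, so we need the term $\delta q$ to the right of both terms in the integrand of (5.5). However, the second term contains $\delta \dot q := \delta \frac{d}{dt} q$. In order to ‘move’ the total derivative $\frac{d}{dt}$ away from the $q$, we use the integration by parts technique $^{5}$:

$$
\int \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q}\delta q \right) dt = \int d\left( \frac{\partial L}{\partial \dot q}\delta q \right)
\implies
\int \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right)\delta q\,dt + \int \left( \frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q \right) dt = \left[ \frac{\partial L}{\partial \dot q}\delta q \right]_{t=t_i}^{t=t_f}
\tag{5.6}
$$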

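where $t_i$ and $t_f$ denote the range of integration over time – we almost always use $t_i = -\infty$ and $t_f = +\infty$ for a classical action. Now, we make the (physically-motivated) assumption that the quantity on the right-hand side vanishes: $\left[ \frac{\partial L}{\partial \dot q}\delta q \right]_{t=t_i}^{t=t_f} = 0$, for instance because the variation $\delta q$ is required to vanish at the endpoints. This is true for almost all physical Lagrangians $L$ $^{6}$.

$^{4}$ In general, $L$ could also be a function of higher derivatives of $q$, for example $L = L(q, \dot q, \ddot q, ...)$; however, for most practical cases we just consider $L = L(q, \dot q)$.

$^{5}$ Or rather, the fundamental theorem of calculus (for 1-dimensional problems) / a special case of the generalized Stokes theorem for higher dimensions.

Therefore, taking this assumption, we get:

$$
\int \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right)\delta q\,dt + \int \left( \frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q \right) dt = 0
\implies
\int \left( \frac{\partial L}{\partial \dot q}\,\delta \frac{d}{dt} q \right) dt = -\int \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right)\delta q\,dt.
\tag{5.7}
$$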

This allows us to write the variation (5.5) of the action as:

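$$
\begin{aligned}
\delta S[L] &= \int dt \left( \frac{\partial L}{\partial q}\delta q \right) - \int dt\, \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right)\delta q \\
&= \int dt \left[ \frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right) \right]\delta q.
\end{aligned}
\tag{5.8}
$$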

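Note that here we’ve made a common (mathematically-motivated $^{7}$) change of notation: $\int (\text{Stuff})\,dt =: \int dt\,(\text{Stuff})$. Finally, we bring the $\delta q$ in the integrand (5.8) to the left-hand side and formally define the functional derivative of the action $S$ to be:

$$ \frac{\delta S[L]}{\delta q} = \frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right). \tag{5.9} $$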

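In this language, the principle of stationary action states that the variation must vanish: $\delta S = 0$, which is equivalent to saying the functional derivative is zero: $\frac{\delta S[L]}{\delta q} = 0$. Therefore

$$ \frac{\partial L}{\partial q} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q} \right) = 0, \tag{5.10} $$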

which are precisely the Euler-Lagrange equations of motion for this dynamical system! Thus we have explicitly demonstrated that the Euler-Lagrange equations are a direct consequence of the principle of stationary action – furthermore, we listed the assumptions made throughout the derivation. In particular, we assumed zero boundary contributions to the action, that the Lagrangian $L$ was not explicitly dependent on time (so $\frac{\partial L}{\partial t} = 0$) and that it only depended on the generalized coordinates and velocities: $L = L(q, \dot q)$. If we relaxed some of these assumptions, we could get extra terms in the Euler-Lagrange equations.

Note, there is another way to view this derivation using Taylor expansions. This method is a bit more suggestive and intuitive in regards to why we call these techniques ‘variational principles’ or ‘variational calculus’. The premise is that we perturb the action by perturbing the function it acts on: $S[L + \delta L] \approx S[L] + \delta S$, then define the variation as the difference between the perturbed action and the original action: $\delta S = S[L + \delta L] - S[L]$.
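The perturbation picture can be tried out numerically. The following is a minimal sketch, not part of the original derivation: it assumes the illustrative Lagrangian $L = \frac{1}{2}m\dot x^2 - mgx$ (a particle in uniform gravity), fixed endpoints $x(0) = x(T) = 0$, and a hypothetical perturbation $\epsilon\sin(\pi t/T)$ that vanishes at both endpoints. The names `action`, `classical`, `bump` and `delta_S` are made up for this example. Because the classical path makes the action stationary, the change in the action should contain no term linear in $\epsilon$.

```python
import math

def action(path, lagrangian, t0, t1, n=2000):
    """Approximate S = integral of L(q, qdot) dt on n sub-intervals."""
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        ta, tb = t0 + k * dt, t0 + (k + 1) * dt
        q_mid = 0.5 * (path(ta) + path(tb))     # average position on the sub-interval
        q_dot = (path(tb) - path(ta)) / dt      # finite-difference velocity
        total += lagrangian(q_mid, q_dot) * dt
    return total

m, g, T = 1.0, 9.81, 1.0    # illustrative values (assumptions)

def lagrangian(x, x_dot):
    return 0.5 * m * x_dot ** 2 - m * g * x

def classical(t):
    # Solves m * x'' = -m * g with x(0) = x(T) = 0
    return 0.5 * g * t * (T - t)

def bump(t):
    # Perturbation shape; vanishes at t = 0 and t = T
    return math.sin(math.pi * t / T)

def delta_S(eps):
    """S[perturbed path] - S[classical path]."""
    perturbed = lambda t: classical(t) + eps * bump(t)
    return (action(perturbed, lagrangian, 0.0, T)
            - action(classical, lagrangian, 0.0, T))

# Shrinking eps by a factor of 10 shrinks delta_S by a factor of about 100:
# the change in the action is second order in eps.
print(delta_S(0.1) / delta_S(0.01))
```

The ratio printed at the end is close to 100, which is the numerical signature of a vanishing first-order variation around the classical path.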

Paths $q(t)$ for which the stationary action condition $\delta S = 0$ holds are called stationary paths (or extremals) of the action. They are the critical points of the action functional: in some cases they correspond to minima or maxima of the action, in others to saddle points (the analogue of inflection points). For this reason, they are fundamental to variational calculus. For example, if the action represented the length of a curve or the surface area of a soap bubble, we could use variational calculus to find a curve with minimal length or the shape of a soap bubble surface with minimal area under some given constraints.

$^{6}$ One rare case where one gets so-called ‘boundary contributions’ to the action integral is in general relativity – in particular, the Gibbons-Hawking-York boundary term, which accounts for the case when spacetime is a manifold with a boundary.

$^{7}$ In this manner, we can think of the integral sign $\int$ and the variables we integrate with respect to ($dt$) as an abstract operator or ‘functional’ called a ‘measure’. Thus $\int dt$ is an operator which acts on functions to give some number – which is the value of the integral of the function it acts on.


Example 8 As an example, take the motion of a point-particle with mass $m$ and position coordinate $x$, moving in one dimension. We view $x$ as a function of time $t$: $x = x(t)$. Then $x$ is our generalized coordinate with corresponding generalized velocity $\dot x$. If the particle is moving due to some conservative force acting on it, then it has some associated potential energy $U$. Assuming $U$ is independent of time $t$, we then have $U = U(x)$ in general (e.g. the particle could be moving vertically and experiencing a gravitational force with potential $U = U(x)$). The Lagrangian is then given by:

$$ L = \text{Kinetic Energy} - \text{Potential Energy} = \frac{1}{2}m\dot x^2 - U(x). \tag{5.11} $$

The Euler-Lagrange equations then tell us that:

$$ \frac{\partial L}{\partial x} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot x} \right) = 0, \tag{5.12} $$

hence we see that

$$ -\frac{\partial U(x)}{\partial x} - \frac{d}{dt}(m\dot x) = 0. \tag{5.13} $$

Since $U$ is only a function of one variable, we write the partial derivative as a total derivative instead, hence:

$$ -\frac{dU(x)}{dx} = m\ddot x \tag{5.14} $$

since the mass $m$ is constant. Recalling that a conservative force $\vec F$ can be defined as the gradient of some potential: $\vec F = -\nabla U$, we then identify $-\frac{dU(x)}{dx}$ as the component $F_x$ of the force acting on this particle in the $x$-direction. Hence we have:

$$ F_x = m\ddot x \tag{5.15} $$

which is precisely Newton’s second law. Note that this is based on the assumption that the Lagrangian $L$ was only dependent on $x$ and $\dot x$. In general, one may have a time-varying acceleration (e.g. a radiating charge or stealth fighter jet) – in such a case, we would modify the Euler-Lagrange equations and therefore modify our statement of Newton’s second law.
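Example 8 can be checked numerically without doing any calculus by hand. The sketch below is an illustration under stated assumptions rather than a general-purpose tool: it estimates the left-hand side of the Euler-Lagrange equation, $\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot x}$, by central finite differences along a given path. The function name `euler_lagrange_residual`, the step size `h`, and the choice $U(x) = mgx$ with sample values of $m$, $g$, $x_0$, $v_0$ are all assumptions made for the example.

```python
import math

def euler_lagrange_residual(L, x, t, h=1e-3):
    """Finite-difference estimate of dL/dx - d/dt(dL/dxdot) along the path x(t)."""
    def x_dot(s):
        return (x(s + h) - x(s - h)) / (2 * h)

    def dL_dx(s):
        return (L(x(s) + h, x_dot(s)) - L(x(s) - h, x_dot(s))) / (2 * h)

    def dL_dxdot(s):
        return (L(x(s), x_dot(s) + h) - L(x(s), x_dot(s) - h)) / (2 * h)

    return dL_dx(t) - (dL_dxdot(t + h) - dL_dxdot(t - h)) / (2 * h)

m, g = 2.0, 9.81            # illustrative values (assumptions)
x0, v0 = 1.0, 3.0

def lagrangian(x, x_dot):
    # L = kinetic energy - potential energy, with U(x) = m * g * x
    return 0.5 * m * x_dot ** 2 - m * g * x

def free_fall(t):
    # Newtonian solution of m * x'' = -dU/dx = -m * g
    return x0 + v0 * t - 0.5 * g * t ** 2

def wrong_path(t):
    # A path that does not obey Newton's second law for this potential
    return math.sin(t)

print(euler_lagrange_residual(lagrangian, free_fall, 0.7))   # essentially zero
print(euler_lagrange_residual(lagrangian, wrong_path, 0.7))  # clearly non-zero
```

The residual vanishes (up to rounding error) only for the path that satisfies $F_x = m\ddot x$, which is the content of (5.12) to (5.15).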

5.1.4 N-Dimensional Euler-Lagrange Equations

To see how this formalism generalizes to higher-dimensional systems, we proceed as follows. Let $q_i$ denote the $i$-th generalized coordinate for a system with $n$ generalized coordinates, $q_1, ..., q_n$. The $n$ corresponding generalized velocities are then given by $\dot q_i$, where $i = 1, ..., n$. Collecting the variables $q_1, ..., q_n$ and $\dot q_1, ..., \dot q_n$ into vectors $\vec q$ and $\dot{\vec q}$, respectively, we can view the Lagrangian as a function of $2n$ variables, parametrised by time $t$:

$$ L = L(\vec q, \dot{\vec q}; t). \tag{5.16} $$

The action functional generated by this Lagrangian is given by:

$$ S[L] = \int L\,dt. \tag{5.17} $$


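To vary the action, we Taylor expand $L(q_1, ..., q_n, \dot q_1, ..., \dot q_n)$ to first order in all its variables. In particular, we have:

$$
\begin{aligned}
S[L + \delta L] &= \int L(\vec q + \delta \vec q, \dot{\vec q} + \delta \dot{\vec q})\,dt \\
&= \int \left[ L(\vec q, \dot{\vec q}) + \frac{\partial L}{\partial q_1}\delta q_1 + ... + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot q_1}\delta \dot q_1 + ... + \frac{\partial L}{\partial \dot q_n}\delta \dot q_n \right] dt \\
&= \int L(\vec q, \dot{\vec q})\,dt + \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + ... + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot q_1}\delta \dot q_1 + ... + \frac{\partial L}{\partial \dot q_n}\delta \dot q_n \right] dt \\
&= S[L(\vec q, \dot{\vec q})] + \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + ... + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot q_1}\delta \dot q_1 + ... + \frac{\partial L}{\partial \dot q_n}\delta \dot q_n \right] dt,
\end{aligned}
\tag{5.18}
$$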

hence

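$$
\begin{aligned}
\delta S &:= S[L + \delta L] - S[L] \\
&= \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + ... + \frac{\partial L}{\partial q_n}\delta q_n + \frac{\partial L}{\partial \dot q_1}\delta \dot q_1 + ... + \frac{\partial L}{\partial \dot q_n}\delta \dot q_n \right] dt \\
&= \int \left[ \frac{\partial L}{\partial q_1}\delta q_1 + ... + \frac{\partial L}{\partial q_n}\delta q_n - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_1} \right)\delta q_1 - ... - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_n} \right)\delta q_n \right] dt \\
&= \int \left\{ \left[ \frac{\partial L}{\partial q_1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_1} \right) \right]\delta q_1 + ... + \left[ \frac{\partial L}{\partial q_n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_n} \right) \right]\delta q_n \right\} dt
\end{aligned}
\tag{5.19}
$$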

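where we have used integration by parts to move the total derivative $\frac{d}{dt}$ from the perturbations, $\delta \dot q_i$, to the corresponding coefficients, $\frac{\partial L}{\partial \dot q_i}$. Again, one makes the assumption of vanishing boundary contributions: $\int d\left( \frac{\partial L}{\partial \dot q_i}\delta q_i \right) = \left[ \frac{\partial L}{\partial \dot q_i}\delta q_i \right]_{-\infty}^{\infty} = 0$.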

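The principle of stationary action says that a physical system classically evolves such that the action is stationary: $\frac{\delta S}{\delta \vec q} = 0$. For this to happen for arbitrary, independent variations $\delta q_i$ of the coordinates, the coefficients of the $\delta q_i$ must vanish in the integral (5.19). This means that we obtain a system of $n$ differential equations, which are the $n$-dimensional Euler-Lagrange equations:

$$
\begin{aligned}
\frac{\partial L}{\partial q_1} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_1} \right) &= 0 \\
\frac{\partial L}{\partial q_2} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_2} \right) &= 0 \\
&\;\;\vdots \\
\frac{\partial L}{\partial q_n} - \frac{d}{dt}\left( \frac{\partial L}{\partial \dot q_n} \right) &= 0.
\end{aligned}
\tag{5.20}
$$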

In this manner, one can now derive Newton’s Second Law in $n$ dimensions by generalizing the 1-dimensional case outlined earlier. In particular, this is done by considering a potential $U = U(x_1, ..., x_n)$ which depends on the $n$ position coordinates $x_1, ..., x_n$. The velocities are given by $\frac{dx_i}{dt}$. Putting these into vector quantities, the kinetic energy of a point particle of mass $m$ with velocity $\dot{\vec x}$ is given by:

$$ K = \frac{1}{2}m\|\dot{\vec x}\|^2. \tag{5.21} $$

Since the potential energy $U$ is time-independent, we can write the Lagrangian for this system as:

$$ L = K - U = \frac{1}{2}m\|\dot{\vec x}\|^2 - U(\vec x). \tag{5.22} $$


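The Euler-Lagrange equations can be found using the system (5.20) earlier, identifying the generalized coordinates with the position coordinates, $q_i := x_i$. In particular, since we have

$$
\begin{aligned}
\frac{\partial}{\partial \dot q_i}\|\dot{\vec x}\|^2 &= \frac{\partial}{\partial \dot q_i}\left[ (\dot q_1)^2 + ... + (\dot q_n)^2 \right] \\
&= 2\dot q_i,
\end{aligned}
\tag{5.23}
$$

the Euler-Lagrange equation for the $i$-th coordinate of the point particle is given by:

$$ m\frac{d}{dt}\dot q_i + \frac{\partial U}{\partial q_i} = 0. \tag{5.24} $$

Re-arranging, this is simply the $i$-th component of the $n$-dimensional version of Newton’s Second Law of motion:

$$ m\ddot q_i = -\frac{\partial U}{\partial q_i}. \tag{5.25} $$

Collecting the $n$ equations into one vector equation, this is made explicit:

$$ \vec F := m\ddot{\vec q} = -\nabla U, \tag{5.26} $$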

where $\nabla U$ is the gradient (vector) of the potential energy function $U$. This statement is, in fact, quite general – that is, a conservative force $\vec F$ arising from a potential $U$ is necessarily given by $\vec F = -\nabla U$. So for example, given a gravitational potential $U = -\frac{GM}{r}$ (the potential energy per unit mass of a test particle), we see that the (conservative) gravitational force per unit mass is given by:

$$ \vec F = -\nabla\left( -\frac{GM}{r} \right) = -\frac{GM}{r^2}\hat r, \tag{5.27} $$

where $G$ is Newton’s gravitational constant and $\hat r$ is a unit vector pointing in the radial direction away from a massive object of mass $M$. The minus sign then accounts for the fact that the gravitational force is directed towards the massive object.
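The relation $\vec F = -\nabla U$ is easy to check numerically for the gravitational case. The sketch below is only illustrative: it assumes the potential $U = -GM/r$ (per unit test mass) with the arbitrary value $GM = 1$, approximates the gradient by central differences, and compares the result with the inverse-square expression $-\frac{GM}{r^2}\hat r$. The function names `U`, `force` and `inverse_square` are hypothetical.

```python
import math

GM = 1.0   # illustrative value (assumption); sets the units

def U(x, y, z):
    """Gravitational potential (per unit test mass) of a point mass at the origin."""
    return -GM / math.sqrt(x * x + y * y + z * z)

def force(x, y, z, h=1e-5):
    """F = -grad U, with each partial derivative estimated by a central difference."""
    return (
        -(U(x + h, y, z) - U(x - h, y, z)) / (2 * h),
        -(U(x, y + h, z) - U(x, y - h, z)) / (2 * h),
        -(U(x, y, z + h) - U(x, y, z - h)) / (2 * h),
    )

def inverse_square(x, y, z):
    """Closed form -GM / r^2 times the radial unit vector (x, y, z) / r."""
    r = math.sqrt(x * x + y * y + z * z)
    return tuple(-GM * c / r ** 3 for c in (x, y, z))

point = (1.0, 2.0, 2.0)     # here r = 3
print(force(*point))
print(inverse_square(*point))
```

Both vectors point from the test particle back towards the origin, in agreement with the sign discussion above.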

5.1.5 Examples

Example 9 (Simple Pendulum) Consider a vertical pendulum of mass $m$ and length $l$. We set up a coordinate system with horizontal (pointing right) coordinate $x$ and vertical (downward) coordinate $y$, where $\theta$ is the angle between the vertical $y$-axis and the arm of the pendulum. We set the origin to be at the pivot of the pendulum arm, with the mass hanging from the opposite end. Since this system is undergoing rotational motion (the mass at the end of the pendulum is moving in a circular arc) with a fixed radius $r = l$ (the length of the pendulum arm), the mass at the end of the pendulum has a tangential velocity of: $v = l\omega = l\dot\theta$. Therefore, the total kinetic energy is given by:

$$ K = \frac{1}{2}m\|\vec v\|^2 = \frac{1}{2}ml^2\dot\theta^2. \tag{5.28} $$

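The potential energy is the gravitational potential energy of the mass: $mg \times \text{height}$. Since the coordinate $y$ points downward, the height of the mass relative to the pivot is $-y = -l\cos(\theta)$, so that:

$$ U = -mgy = -mgl\cos(\theta). \tag{5.29} $$

The Lagrangian is therefore given by

$$ L(\theta, \dot\theta) = K - U = \frac{1}{2}ml^2\dot\theta^2 + mgl\cos(\theta), \tag{5.30} $$

where $\theta$ and $\dot\theta$ are the generalized coordinate and corresponding generalized velocity, respectively. The Euler-Lagrange equation is given by

$$ \frac{\partial L}{\partial \theta} - \frac{d}{dt}\frac{\partial L}{\partial \dot\theta} = 0. \tag{5.31} $$

Here $\frac{\partial L}{\partial \theta} = -mgl\sin(\theta)$ and $\frac{\partial L}{\partial \dot\theta} = ml^2\dot\theta$, so after dividing through by $-ml^2$ the equation simplifies to:

$$ \ddot\theta + \frac{g}{l}\sin(\theta) = 0. \tag{5.32} $$

This differential equation can be solved analytically for $\theta$ using Jacobi elliptic functions. Alternatively, one can make the small angle approximation to linearise this non-linear differential equation: $\sin(\theta) \approx \theta$, for small displacements $\theta \ll 1$ (radians).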
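To get a feel for how good the small angle approximation is, one can integrate the full pendulum equation $\ddot\theta + \frac{g}{l}\sin(\theta) = 0$ numerically and compare the oscillation period with the linearised value $2\pi\sqrt{l/g}$. The sketch below is one possible way of doing this, assuming a classical fourth-order Runge-Kutta step, release from rest at a positive angle `theta0`, and the illustrative values $g = 9.81$ and $l = 1$; the function name `pendulum_period` is made up for the example.

```python
import math

def pendulum_period(theta0, g=9.81, l=1.0, dt=1e-4):
    """Period of a pendulum released from rest at angle theta0 > 0 (radians).

    Integrates theta'' = -(g/l) sin(theta) with RK4 until theta first crosses
    zero, which takes a quarter of a period.
    """
    def accel(theta):
        return -(g / l) * math.sin(theta)

    theta, omega, t = theta0, 0.0, 0.0
    while True:
        # One RK4 step for the first-order system (theta' = omega, omega' = accel)
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        new_omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if new_theta <= 0.0:
            # Linear interpolation for the time of the zero crossing
            frac = theta / (theta - new_theta)
            return 4.0 * (t + frac * dt)
        theta, omega, t = new_theta, new_omega, t + dt

small_angle_period = 2 * math.pi * math.sqrt(1.0 / 9.81)
print(pendulum_period(0.01) / small_angle_period)  # very close to 1
print(pendulum_period(1.0) / small_angle_period)   # noticeably larger than 1
```

For a release angle of 0.01 radians the two periods agree almost exactly, while for 1 radian the true period is several percent longer than the linearised estimate.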

Note that using the Lagrangian approach, one only needs to compute the potential energy and kinetic energy for the pendulum system. This is a rather trivial task (as shown) which avoids the messiness of having to consider forces and ‘tension’, which is required by the Newtonian approach.

Another advantage of the Lagrangian formalism is that one may easily change coordinates without having to worry about introducing ‘fictitious forces’ (e.g. centrifugal, Coriolis) – the principle of ‘generalised coordinates’ essentially bids one to express the Lagrangian in terms of the most ‘natural’ coordinate system for the problem at hand. Here we made use of the rotational nature of the problem to switch from the Cartesian $x, y$ coordinates to the polar coordinates $r, \theta$ (although we didn’t use $r$, since the radial coordinate was fixed at $r = l$).

Example 10 (Harmonic Oscillator) Consider a 3-dimensional harmonic oscillator. Such a system may be envisioned as a mass attached to a spring, whose other end is fixed at some origin. If we let the origin of a 3-dimensional Cartesian coordinate system – $x, y, z$ – coincide with the initial (non-stretched) position of the mass, then stretching the spring in any direction will induce a radial oscillatory motion. Let $k$ denote the spring constant and $m$ denote the mass at the end of the spring. The force on the mass is given by Hooke’s law:

$$ \vec F = -k\vec r \tag{5.33} $$

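where $\vec r$ is the (radial) position vector: $\vec r = x\vec e_1 + y\vec e_2 + z\vec e_3 \sim (x, y, z)$. The potential energy of the spring is equal to the work required to stretch the spring from its rest position, i.e. the work done against the restoring force:

$$ U = -\int_0^r \vec F \cdot d\vec l = \int_0^r kr'\,dr' = \frac{1}{2}kr^2. \tag{5.34} $$

The kinetic energy of the mass is given by

$$ K = \frac{1}{2}m\|\vec v\|^2 = \frac{1}{2}m\dot r^2, \tag{5.35} $$

where, for purely radial motion, $\dot r^2 = \dot x^2 + \dot y^2 + \dot z^2$. We could use Cartesian coordinates; however, radial coordinates are the ‘natural choice’ for this problem (since it is effectively a 1-dimensional problem – the motion only occurs in the radial direction, which is one-dimensional). Therefore we choose $r$ and $\dot r = \frac{d}{dt}r$ to be our generalized coordinate and generalized velocity, respectively. With the Lagrangian $L = K - U = \frac{1}{2}m\dot r^2 - \frac{1}{2}kr^2$, the Euler-Lagrange equation is given by

$$ \frac{\partial L}{\partial r} - \frac{d}{dt}\frac{\partial L}{\partial \dot r} = 0, \tag{5.36} $$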


which reduces to

$$ \ddot r + \frac{k}{m}r = 0. \tag{5.37} $$

This second-order linear differential equation is solved by the usual means. In particular, the characteristic equation is given by:

$$ \lambda^2 + \frac{k}{m} = 0, \tag{5.38} $$

whence the eigenvalues are $\lambda = \pm i\sqrt{\frac{k}{m}}$. Let $\omega := \sqrt{\frac{k}{m}}$ denote the fundamental (angular) frequency. Then the general solution is given by:

$$ r(t) = c_1 e^{i\omega t} + c_2 e^{-i\omega t}, \tag{5.39} $$

where $c_1$ and $c_2$ are constants determined by the initial conditions. This can alternatively be expressed in real form,

$$ r(t) = a_1\cos(\omega t) + a_2\sin(\omega t) \tag{5.40} $$

where $a_1$ and $a_2$ are constants determined by the initial conditions. In particular, $r(0) = a_1$ and $\dot r(0) = a_2\omega$. Hence $a_1$ is the initial displacement and $a_2$ is the initial velocity divided by the fundamental frequency.

Note that if you’ve forgotten how to get from complex form to real form, recall that

$$ \cos(x) = \frac{e^{ix} + e^{-ix}}{2}, \qquad \sin(x) = \frac{e^{ix} - e^{-ix}}{2i} \tag{5.41} $$

where $i^2 := -1$. Comparing coefficients we see that the constants are explicitly related by:

$$ c_1 = \frac{a_1}{2} + \frac{a_2}{2i}, \qquad c_2 = \frac{a_1}{2} - \frac{a_2}{2i}. \tag{5.42} $$
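The relation between the two sets of constants can be sanity-checked with a few lines of code. The sketch below is an illustration only: it assumes the complex solution has the form $c_1 e^{i\omega t} + c_2 e^{-i\omega t}$, builds $c_1$ and $c_2$ from $a_1$ and $a_2$ as in (5.42), and compares the result with the real form (5.40). The function names and the sample values of `a1`, `a2` and `omega` are arbitrary choices for the example.

```python
import cmath
import math

def real_form(t, a1, a2, omega):
    """r(t) = a1 cos(omega t) + a2 sin(omega t)."""
    return a1 * math.cos(omega * t) + a2 * math.sin(omega * t)

def complex_form(t, a1, a2, omega):
    """r(t) = c1 exp(i omega t) + c2 exp(-i omega t), with c1 and c2 from (5.42)."""
    c1 = a1 / 2 + a2 / (2j)
    c2 = a1 / 2 - a2 / (2j)
    return c1 * cmath.exp(1j * omega * t) + c2 * cmath.exp(-1j * omega * t)

a1, a2, omega = 0.3, -1.2, 2.0      # arbitrary sample values (assumptions)
for t in (0.0, 0.5, 1.7):
    z = complex_form(t, a1, a2, omega)
    # The imaginary part cancels and the real part matches the real form
    print(t, z.real - real_form(t, a1, a2, omega), z.imag)
```

Note that $c_2$ is the complex conjugate of $c_1$ whenever $a_1$ and $a_2$ are real, which is why the imaginary parts cancel.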

5.1.6 Multiple Independent Parameters

For the purpose of the (modern and topical) branch of mathematical physics known as ‘minimal surface’ theory, along with relativity and quantum field theory, it is important to extend the Lagrangian formalism to include physical systems – or more specifically, generalized coordinates – which depend on more than one independent parameter. Until now, we have considered systems which were parametrised by one independent variable – time $t$. We now consider systems which are parametrised by $k$ independent variables, which we shall denote $t_1, ..., t_k$ for familiarity.

For simplicity, we shall just consider systems with one generalized coordinate (parametrised by multiple variables) for now. The extension to an arbitrary number of generalised coordinates is done in the obvious way, analogous to our previous extension when we had just one independent parameter $t$.

Let $t_1, ..., t_k$ denote our $k$ independent parameters and let $q := q(t_1, ..., t_k)$ denote our generalized coordinate, dependent on these parameters. The corresponding generalized velocities (with respect to each parameter) are then given by: $\frac{\partial q}{\partial t_1}, ..., \frac{\partial q}{\partial t_k}$. Given some function $L := L(q, \frac{\partial q}{\partial t_1}, ..., \frac{\partial q}{\partial t_k}; t_1, ..., t_k)$ explicitly dependent on the generalized coordinate $q$ and the generalized velocities $\frac{\partial q}{\partial t_i}$, and implicitly dependent on the independent parameters $t_1, ..., t_k$, we now wish to formulate a variational problem. In particular, we consider the following action functional (a $k$-dimensional integral performed over $t_1, ..., t_k$):

$$ S[L] = \int L\,dt_1 dt_2 ... dt_k \tag{5.43} $$

and ask the question – which functions $q(t_1, ..., t_k)$ make this action stationary? To solve the variational problem, we proceed as before to vary the action by Taylor expansion of $L$ in all its variables. In order to do this, some new notation will be handy. Let $v^q_i$ denote the $i$-th generalized velocity corresponding to the generalized coordinate $q$ – in particular, we have: $v^q_1 := \frac{\partial q}{\partial t_1}, ..., v^q_k := \frac{\partial q}{\partial t_k}$. The variation of the Lagrangian is then given using the same rules as the total differential:

$$ \delta L = \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + ... + \frac{\partial L}{\partial v^q_k}\delta v^q_k. \tag{5.44} $$

Therefore, the variation of the action is given by:

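$$
\begin{aligned}
\delta S &= \int \delta L\,dt_1 ... dt_k \\
&= \int \left[ \frac{\partial L}{\partial q}\delta q + \frac{\partial L}{\partial v^q_1}\delta v^q_1 + ... + \frac{\partial L}{\partial v^q_k}\delta v^q_k \right] dt_1 ... dt_k \\
&= \int \left[ \frac{\partial L}{\partial q}\delta q - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right)\delta q - ... - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right)\delta q \right] dt_1 ... dt_k \\
&= \int \left[ \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - ... - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right) \right]\delta q\,dt_1 ... dt_k,
\end{aligned}
\tag{5.45}
$$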

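where we have used integration by parts (or Stokes’ theorem) for multiple variables, again assuming vanishing boundary contributions, to swap the derivatives $\frac{\partial}{\partial t_i}$ from the velocity variations $\delta \frac{\partial q}{\partial t_i}$ to the corresponding coefficients $\frac{\partial L}{\partial v^q_i}$ – which introduces the minus signs. Therefore, we have the functional derivative of the action with respect to the generalized coordinate, given by:

$$ \frac{\delta S}{\delta q} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - ... - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{5.46} $$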

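The principle of stationary action tells us that nature classically selects this functional derivative to be zero, which gives us the Euler-Lagrange equation for a system with one generalized coordinate $q$, parametrised by $k$ independent variables $t_1, ..., t_k$:

$$ 0 = \left. \frac{\delta S}{\delta q} \right|_{\text{Nature}} = \frac{\partial L}{\partial q} - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right) - ... - \frac{\partial}{\partial t_k}\left( \frac{\partial L}{\partial v^q_k} \right). \tag{5.47} $$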
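As a quick illustration of how this multi-parameter equation is used (a sketch added for orientation, with a Lagrangian chosen purely for simplicity rather than taken from the notes), take $k = 2$ and the hypothetical Lagrangian $L = \frac{1}{2}\left[ (v^q_1)^2 + (v^q_2)^2 \right]$, which depends only on the two generalized velocities. Working through the terms of the Euler-Lagrange equation one at a time gives:

```latex
% Assumed (illustrative) Lagrangian with k = 2 parameters t_1, t_2:
%   L = (1/2) [ (v^q_1)^2 + (v^q_2)^2 ],   v^q_i := \partial q / \partial t_i
\begin{aligned}
\frac{\partial L}{\partial q} &= 0, \qquad
\frac{\partial L}{\partial v^q_1} = v^q_1 = \frac{\partial q}{\partial t_1}, \qquad
\frac{\partial L}{\partial v^q_2} = v^q_2 = \frac{\partial q}{\partial t_2}, \\
0 &= \frac{\partial L}{\partial q}
   - \frac{\partial}{\partial t_1}\left( \frac{\partial L}{\partial v^q_1} \right)
   - \frac{\partial}{\partial t_2}\left( \frac{\partial L}{\partial v^q_2} \right)
   = - \frac{\partial^2 q}{\partial t_1^2} - \frac{\partial^2 q}{\partial t_2^2}.
\end{aligned}
```

In other words, for this choice of $L$ the stationary functions $q(t_1, t_2)$ are exactly the solutions of Laplace’s equation in two variables.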

5.1.7 More Examples

We can use variational calculus to derive the (rather famous) minimal surface equation. In particular, we consider the following example.

Example 11 (Minimal Surface Equation) We consider all two-dimensional surfaces parametrised by two independent variables, $z := z(x, y)$, then ask the question – which surface of this general form has the minimal surface area? To answer this question, we can use the Euler-Lagrange equation (5.47) derived earlier. Say that the surface $z := z(x, y)$, parametrised by the two independent variables $t_1 = x$ and $t_2 = y$, has a domain $D$. Then (recall) its surface area is given by the double-integral:

$$ A = \iint_D \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2}\,dx\,dy. \tag{5.48} $$


We can view this as a variational problem by observing that $z$ is a generalised coordinate parametrised by two independent variables $x$ and $y$. The corresponding generalised velocities are given by (various notations) $v^z_1 = z_x := \frac{\partial z}{\partial x}$ and $v^z_2 = z_y := \frac{\partial z}{\partial y}$ – we shall stick with the latter notation. Now, the total surface area $A$ can be viewed as an action functional: $A = A[L]$, whilst our integrand (infinitesimal / area differential) can be viewed as the corresponding Lagrangian: $L(z, z_x, z_y) = \sqrt{1 + \left( \frac{\partial z}{\partial x} \right)^2 + \left( \frac{\partial z}{\partial y} \right)^2} = \sqrt{1 + z_x^2 + z_y^2}$.

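Since we seek to minimize $A$, we need to first find surfaces (parametric functions) $z(x, y)$ which make the functional $A$ stationary. We then need to check that these stationary ‘points’ (functions) correspond to minima, rather than inflection points or maxima. The first task can be achieved by solving the Euler-Lagrange equations (5.47), which (after multiplying through by $-1$ in the second line) take the form:

$$
\begin{aligned}
\frac{\partial L}{\partial z} - \frac{d}{dx}\frac{\partial L}{\partial z_x} - \frac{d}{dy}\frac{\partial L}{\partial z_y} &= 0 \implies \\
0 + \frac{d}{dx}\left( \frac{z_x}{\sqrt{1 + z_x^2 + z_y^2}} \right) + \frac{d}{dy}\left( \frac{z_y}{\sqrt{1 + z_x^2 + z_y^2}} \right) &= 0.
\end{aligned}
\tag{5.49}
$$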

Although the last equation, known as the ‘minimal surface equation’, was derived by Lagrange in 1762, non-trivial (non-planar) solutions were not found till 1776 by the French mathematician and engineer Jean Meusnier. The trivial, planar solution is given by:

$$ Z(x, y) = Ax + By + C \tag{5.50} $$

where $A, B, C$ are constants. Here $Z_x = \frac{\partial Z}{\partial x} = A$, $Z_y = \frac{\partial Z}{\partial y} = B$ and $L = \sqrt{1 + A^2 + B^2}$, so the ratios $Z_x / L$ and $Z_y / L$ are constants and both derivative terms in (5.49) vanish.

Switching to cylindrical coordinates: $(\rho, \theta, z)$, with $x = \rho\cos(\theta)$, $y = \rho\sin(\theta)$ and $z = z$, we have another solution to the minimal surface problem. This is given by the Catenoid – a surface of revolution parametrised by a single independent variable, $z$:

$$ \rho = \lambda\cosh\left( \frac{z}{\lambda} \right) \tag{5.51} $$

where $\lambda$ is a constant. Note that $\rho$ is independent of the second independent variable $\theta$, since the surface is rotationally symmetric (it is produced by rotating a catenary curve about the $z$-axis). To show this is a solution, we can either re-derive the minimal surface equation, starting from the infinitesimal area element $dA = \sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + \frac{1}{\rho^2}\left( \frac{\partial \rho}{\partial \theta} \right)^2}\,\rho\,d\theta\,dz$, or attempt some messy manipulations with the chain rule and the Cartesian coordinate equation. It’s far easier to start from the action principle again, with the Lagrangian: $L(\rho, \frac{\partial \rho}{\partial z}, \frac{\partial \rho}{\partial \theta})$. Since our Catenoid is independent of $\theta$ (symmetry in $\theta$), we have $\frac{\partial \rho}{\partial \theta} = 0$. Therefore, our Lagrangian is the coefficient function (coefficient of $d\theta \wedge dz$) of our area 2-form element:

$$ L = L(\rho, \rho_z) = \rho\sqrt{1 + \left( \frac{\partial \rho}{\partial z} \right)^2 + 0}. \tag{5.52} $$

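Letting $\rho_z := \frac{\partial \rho}{\partial z}$, our Euler-Lagrange equation is given by:

$$ \frac{\partial L}{\partial \rho} - \frac{d}{dz}\frac{\partial L}{\partial \rho_z} = 0, \tag{5.53} $$

which simplifies to:

$$ \sqrt{1 + \rho_z^2} - \frac{d}{dz}\left( \frac{\rho_z\,\rho}{\sqrt{1 + \rho_z^2}} \right) = 0. \tag{5.54} $$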


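With some application of the chain and product rules, along with the hyperbolic trigonometric identities

$$
\begin{aligned}
1 + \sinh^2(x) &= \cosh^2(x) \\
\frac{d}{dx}\cosh(\lambda x) &= \lambda\sinh(\lambda x), \qquad \frac{d}{dx}\sinh(\lambda x) = \lambda\cosh(\lambda x) \\
\frac{d}{dx}\tanh(x) &= \operatorname{sech}^2(x)
\end{aligned}
\tag{5.55}
$$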

one can show that the Catenoid surface, given by $\rho(z) = \lambda\cosh\left( \frac{z}{\lambda} \right)$, solves the Euler-Lagrange equation (5.54). Hence the Catenoid corresponds to a ‘critical-surface’ (cf. ‘critical point’) of the surface area functional $A$ and makes this functional (action) stationary. To see that it is not a maximal surface, note that the Lagrangian is given by the square root of a strictly-positive quantity. Since the Lagrangian is strictly positive, the corresponding area (action) integral is strictly positive and bounded below, but it is not bounded above: any surface can be deformed (e.g. by adding ripples) so as to increase its area. This means that the Catenoid surface (or in fact any surface!) cannot be a maximal surface. Hence the Catenoid is either a saddle point or a minimum of the area action functional. It is in fact a minimal surface.
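Instead of carrying out the chain-rule computation by hand, the claim can be checked numerically. The following sketch evaluates the left-hand side of (5.54), $\sqrt{1 + \rho_z^2} - \frac{d}{dz}\left( \frac{\rho_z \rho}{\sqrt{1 + \rho_z^2}} \right)$, by central finite differences for a given profile $\rho(z)$. The function names `catenoid` and `residual`, the step size `h`, and the sample values of $\lambda$ and $z$ are assumptions made for the illustration; a cylinder of constant radius is included as a profile that should not satisfy the equation.

```python
import math

def catenoid(z, lam):
    """Catenoid profile rho(z) = lam * cosh(z / lam)."""
    return lam * math.cosh(z / lam)

def residual(rho, z, h=1e-4):
    """Left-hand side of the Euler-Lagrange equation for L = rho * sqrt(1 + rho_z^2)."""
    def rho_z(s):
        return (rho(s + h) - rho(s - h)) / (2 * h)

    def dL_drho_z(s):
        # dL/d(rho_z) = rho * rho_z / sqrt(1 + rho_z^2)
        return rho(s) * rho_z(s) / math.sqrt(1.0 + rho_z(s) ** 2)

    dL_drho = math.sqrt(1.0 + rho_z(z) ** 2)
    return dL_drho - (dL_drho_z(z + h) - dL_drho_z(z - h)) / (2 * h)

lam = 1.5    # illustrative value (assumption)
print(residual(lambda s: catenoid(s, lam), 0.4))   # essentially zero
print(residual(lambda s: 1.0, 0.4))                # cylinder: residual of 1
```

The catenoid profile gives a residual that vanishes up to discretisation error for any $\lambda$, while the cylinder does not, so a cylinder is not a stationary surface of the area functional.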

5.1.8 Closing Remarks

The Lagrangian formalism is, for the most part, a second-order formalism. This means that the equations of motion resulting from the Euler-Lagrange equations are usually second-order differential equations. For many different reasons, it is sometimes advantageous or necessary to switch to a first-order formalism – ‘Hamiltonian mechanics’. To do this, one defines the Hamiltonian as the Legendre transform of the Lagrangian:

$$ H(\vec q, \vec p; t) = \vec p \cdot \dot{\vec q} - L(\vec q, \dot{\vec q}; t) \tag{5.56} $$

where $\vec p$ is the conjugate momentum vector (related to the generalized velocities). The components of $\vec p$ are defined as the partial derivatives of the Lagrangian with respect to the generalized velocities:

$$ p_i := \frac{\partial L}{\partial \dot q_i}. \tag{5.57} $$

In this formalism, the natural variables are now the generalized coordinates $\vec q$ and the conjugate momenta $\vec p$. From a practical point of view, the ultimate result is that Hamilton’s equations are coupled first-order differential equations – which are often easier to solve (particularly numerically) than the second-order Euler-Lagrange equations.
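To make the first-order structure concrete, here is a small sketch for the one-dimensional harmonic oscillator. It is an illustration under stated assumptions: with $L = \frac{1}{2}m\dot q^2 - \frac{1}{2}kq^2$, the definitions above give $p = m\dot q$ and $H = \frac{p^2}{2m} + \frac{1}{2}kq^2$, and Hamilton’s equations in their standard form (not written out in these notes) read $\dot q = \frac{\partial H}{\partial p}$, $\dot p = -\frac{\partial H}{\partial q}$. The code integrates them with the leapfrog (kick-drift-kick) scheme; the names `hamiltonian` and `evolve`, the step size and the values $m = 1$, $k = 4$ are arbitrary choices.

```python
import math

def hamiltonian(q, p, m=1.0, k=4.0):
    """H = p^2 / (2m) + k q^2 / 2, the Legendre transform of L = m qdot^2 / 2 - k q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * k * q * q

def evolve(q, p, t_end, m=1.0, k=4.0, dt=1e-3):
    """Integrate qdot = dH/dp = p / m and pdot = -dH/dq = -k q up to time t_end."""
    steps = round(t_end / dt)
    for _ in range(steps):
        p -= 0.5 * dt * k * q    # half step in momentum ("kick")
        q += dt * p / m          # full step in position ("drift")
        p -= 0.5 * dt * k * q    # second half step in momentum
    return q, p

# Start at q = 1 with zero momentum; with m = 1 and k = 4 the angular frequency is 2,
# so the exact solution is q(t) = cos(2t), p(t) = -2 sin(2t).
q, p = evolve(1.0, 0.0, 1.0)
print(q, math.cos(2.0))
print(p, -2.0 * math.sin(2.0))
print(hamiltonian(q, p) - hamiltonian(1.0, 0.0))   # energy drift stays tiny
```

Each update only needs first derivatives of $H$, and the trajectory $(q(t), p(t))$ traced out by the loop is a curve in phase space.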

Although they are essentially equivalent, there are many theoretical motivations for the Hamiltonian formalism – most notably, that it allows a dynamical system to be represented in ‘phase space’. Evolution of the system is then described by trajectories $(\vec q(t), \vec p(t))$ in phase space. With such a structure, the system can be analysed using symplectic geometry and Liouville theory – the key point being that the Hamiltonian $H(\vec q, \vec p)$ defines a ‘flow’ on phase space (a map on the cotangent bundle). This flow preserves a non-degenerate object called the ‘symplectic form’ – the basis for many deep mathematical theorems regarding dynamics.
