Optimization Algorithms on Homogeneous Spaces
WITH APPLICATIONS IN LINEAR SYSTEMS THEORY
Robert Mahony
March 1994
Presented in partial fulfilment of the requirements
for the degree of Doctor of Philosophy
at the Australian National University
Department of Systems Engineering
Research School of Information Sciences and Engineering
Australian National University
Acknowledgements
I would like to thank my supervisors John Moore and Iven Mareels for their support, insight,
technical help and for teaching me to enjoy research. Thanks to Uwe Helmke for his enthusiasm
and support and Wei-Yong Yan for many useful suggestions. I would also like to thank the
other staff and students of the department for providing an enjoyable and exciting environment
for work, especially the students from Lakeview for not working too hard. I reserve a special
thanks for Peter Kootsookos because I owe him one.
I have been lucky enough to visit Unite Auto, Catholic University of Leuven, Louvain-la-Neuve
and the Department of Mathematics, University of Regensburg, for extended periods during
my studies and thank the staff and students of both institutions for their support.
A number of people have made helpful comments and contributions to the results contained in
this thesis. In particular, I would like to thank George Bastin, Guy Campion, Kenneth Driessel,
Ed Henrich, Ian Hiskens, David Hill and David Stewart, as well as several anonymous reviewers.
Apart from the support of the Australian National University I have also received additional
financial support from the following sources:
The Cooperative Research Centre for Robust and Adaptive Systems, funded by the Aus-
tralian Commonwealth Government under the Cooperative Research Centres Program.
Grant I-0184-078.06/91 from the G.I.F., the German-Israeli Foundation for Scientific
Research and Development
Boeing Commercial Aircraft Corporation.
Lastly I thank Pauline Allingham for her support and care throughout my doctorate.
Statement of Originality
The work presented in this thesis is the result of original research done by myself, in col-
laboration with others, while enrolled in the Department of Systems Engineering as a Doctor
of Philosophy student. It has not been submitted for any other degree or award in any other
university or educational institution.
Following is a list of publications in refereed journals and conference proceedings completed
while I was a Doctor of Philosophy student. Much of the technical discussion given in this
thesis is based on work described in papers [1, 2, 5, 6, 10, 11] from the list below.
The remaining papers cover material I chose not to include in this thesis.
Journal Papers:
1. R. E. Mahony and U. Helmke. System assignment and pole placement for symmetric
realisations. Submitted to Journal of Mathematical Systems, Estimation and Control,
1994.
2. R. E. Mahony, U. Helmke, and J. B. Moore. Gradient algorithms for principal component
analysis. Submitted to Journal of the Australian Mathematical Society, 1994.
3. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and
implications for Lyapunov direct stability methods. To appear in Journal of Mathematical
Systems, Estimation and Control, 1994.
4. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Non-linear feedback laws for
output regulation. Draft version, 1994.
5. J. B. Moore, R. E. Mahony, and U. Helmke. Numerical gradient algorithms for eigenvalue
and singular value calculations. SIAM Journal on Matrix Analysis and Applications, 15(3), 1994.
Conference Papers:
6. R. E. Mahony, U. Helmke, and J. B. Moore. Pole placement algorithms for symmetric
realisations. In Proceedings of IEEE Conference on Decision and Control, San Antonio,
U.S.A., 1993.
7. R. E. Mahony and I. M. Mareels. Global solutions for differential/algebraic systems and
implications for Lyapunov stability methods. In Proceedings of the 12th World Congress
of the International Federation of Automatic Control, Sydney, Australia, 1993.
8. R. E. Mahony and I. M. Mareels. Non-linear feedback laws for output stabilization.
Submitted to the IEEE Conference on Decision and Control, 1994.
9. R. E. Mahony, I. M. Mareels, G. Campion, and G. Bastin. Output regulation for systems
linear in the input. In Conference on Mathematical Theory of Networks and Systems,
Regensburg, Germany, 1993.
10. R. E. Mahony and J. B. Moore. Recursive interior-point linear programming algo-
rithm based on Lie-Brockett flows. In Proceedings of the International Conference on
Optimisation: Techniques and Applications, Singapore, 1992.
11. J. B. Moore, R. E. Mahony, and U. Helmke. Recursive gradient algorithms for eigenvalue
and singular value decompositions. In Proceedings of the American Control Conference,
Chicago, U.S.A., 1992.
Robert Mahony
Abstract
Constrained optimization problems are commonplace in linear systems theory. In many cases
the constraint set is a homogeneous space and the additional geometric insight provided by
the Lie-group structure provides a framework in which to tackle the numerical optimization
task. The fundamental advantage of this approach is that algorithms designed and implemented
using the geometry of the homogeneous space explicitly preserve the constraint set.
In this thesis the numerical solution of a number of optimization problems constrained
to homogeneous spaces is considered. The first example studied is the task of determining
the eigenvalues of a symmetric matrix (or the singular values of an arbitrary matrix) by inter-
polating known gradient flow solutions using matrix exponentials. Next the related problem
of determining the principal components of a symmetric matrix is discussed. A continuous-time
gradient flow is derived, along with a discrete exponential interpolation of this flow
which converges to the desired limit. A comparison with classical algorithms for the same
task is given. The third example discussed, this time drawn from the field of linear systems
theory, is the task of arbitrary pole placement using static feedback for a structured class of
linear systems.
The remainder of the thesis provides a review of the underlying theory relevant to the three
examples considered and develops a mathematical framework in which the proposed numerical
algorithms can be understood. This framework leads to a general form for a solution to any
optimization problem on a homogeneous space. An important consequence of the theoretical
review is that it develops the mathematical tools necessary to understand more sophisticated
numerical algorithms. The thesis concludes by proposing a quadratically convergent numerical
optimization method, based on the Newton-Raphson algorithm, which evolves explicitly on a
Lie-group.
Contents
Acknowledgements  i
Statement of Originality  ii
Abstract  v
Glossary of Symbols  xi
1 Introduction  1
1.1 Historical Perspective  6
1.1.1 Dynamical Systems as Numerical Methods  6
1.1.2 Optimization Techniques and Numerical Solutions to Differential Equations  10
1.1.3 Linear Systems Theory and Pole Placement Results  17
1.2 Summary of Results  19
2 Numerical Gradient Algorithms for Eigenvalue Calculations  24
2.1 The Double-Bracket Algorithm  27
2.2 Step-Size Selection  32
2.3 Stability Analysis  37
2.4 Singular Value Computations  41
2.5 Associated Orthogonal Algorithms  47
2.6 Computational Considerations  51
2.7 Open Questions and Further Work  52
2.7.1 Time-Varying Double-Bracket Algorithms  53
3 Gradient Algorithms for Principal Component Analysis  55
3.1 Continuous-Time Gradient Flow  57
3.2 A Gradient Descent Algorithm  65
3.3 Computational Considerations  69
3.3.1 An Equivalent Formulation  69
3.3.2 Padé Approximations of the Exponential  70
3.4 Comparison with Classical Algorithms  71
3.4.1 The Power Method  72
3.4.2 The Steepest Ascent Algorithm  75
3.4.3 The Generalised Power Method  76
3.5 Open Questions and Further Work  79
4 Pole Placement for Symmetric Realisations  81
4.1 Statement of the Problem  83
4.2 Geometry of Output Feedback Orbits  90
4.3 Least Squares System Assignment  93
4.4 Least Squares Pole Placement and Simultaneous System Assignment  102
4.5 Simulations  107
4.6 Numerical Methods for Symmetric Pole Placement  113
4.7 Open Questions and Further Work  118
5 Gradient Flows on Lie-Groups and Homogeneous Spaces  121
5.1 Lie-Groups and Homogeneous Spaces  123
5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits  125
5.3 Riemannian Metrics on Lie-Groups and Homogeneous Spaces  127
5.4 Gradient Flows  130
5.5 Convergence of Gradient Flows  133
5.6 Lie-Algebras, the Exponential Map and the General Linear Group  135
5.7 Affine Connections and Covariant Differentiation  141
5.8 Right Invariant Affine Connections on Lie-Groups  144
5.9 Geodesics  148
6 Numerical Optimization on Lie-Groups and Homogeneous Spaces  155
6.1 Gradient Descent Algorithms on Homogeneous Spaces  157
6.2 Newton-Raphson Algorithm on Lie-Groups  161
6.3 Coordinate Free Newton-Raphson Methods  169
6.4 Symmetric Eigenvalue Problem  172
6.5 Open Questions and Further Work  180
7 Conclusion  182
7.1 Overview  182
7.2 Conclusion  184
Glossary of Symbols
Linear Algebra, Sets and Spaces:
R The real numbers.
C The complex numbers.
N The natural numbers.
R^N  N-dimensional Euclidean space.
R^{N×M}  The set of all real N×M matrices, MN-dimensional Euclidean space.
C^{N×M}  The set of all complex N×M matrices, 2MN-dimensional (real) Euclidean space.
B_ε(x), B_ε  The ball of radius ε > 0 around a point x ∈ R^N or the origin, B_ε(x) = {y ∈ R^N : ||x − y|| < ε}, B_ε = B_ε(0).
Sk(N)  The set of all skew-symmetric matrices {A ∈ R^{N×N} : A^T = −A}.
S(N)  The set of all symmetric matrices {A ∈ R^{N×N} : A^T = A}.
GL(N,R), GL(N)  The general linear group of all real invertible N×N matrices.
GL(N,C)  The general linear group of all complex invertible N×N matrices.
O(N)  The set of N×N orthogonal matrices {U ∈ R^{N×N} : U^T U = I_N}.
St(p,n)  The Stiefel manifold of n×p orthogonal matrices, {X ∈ R^{n×p} : X^T X = I_p}.
Differential Geometry, Sets and Spaces:
C^k(M)  The set of all at least k times differentiable functions from a manifold M to the real numbers.
C^∞(G)  The set of all smooth functions from a manifold G to the real numbers.
C^k, C^∞  The set of at least k times differentiable, respectively smooth, functions from an understood set (usually R^n) to the real numbers.
T_xM  The tangent space of a manifold M at the point x ∈ M.
TM  The tangent bundle of M, the union over all x ∈ M of each T_xM.
T*_xM  The cotangent space of a manifold M at the point x ∈ M. The cotangent space is the dual space of linear functionals on the vector space T_xM.
T*M  The cotangent bundle of M, the union over all x ∈ M of each T*_xM.
D(G)  The algebra of all smooth vector fields on a smooth manifold G.
D*(G)  The set of all smooth 1-forms on a smooth manifold G.
S^n  The n-dimensional sphere in R^{n+1}, {x ∈ R^{n+1} : x^T x = 1}.
RP^n  The n-dimensional real projective space, the set of all vector directions in R^{n+1}.
G/H  The quotient space of a group G by a normal subgroup H.
stab(X)  The subgroup associated with a group action that leaves X fixed.
M(H_0)  The set of matrices orthogonally similar to H_0.
Grass(p,n)  The Grassmannian manifold of p-dimensional subspaces in R^n.
gl(N,R)  The Lie algebra associated with GL(N,R). One has that gl(N,R) = R^{N×N} equipped with the matrix Lie-bracket [A,B] = AB − BA.
g, h  The Lie-algebras associated with arbitrary Lie-groups G and H respectively.
gl(N,C)  The Lie algebra associated with GL(N,C), the set of all complex N×N matrices.
Linear Algebra, Notation and Operators:
I_N  The N×N identity matrix.
0_{N×M}  The N×M matrix with every element zero.
λ_i  Eigenvalues.
σ_i  Singular values.
Re(a)  The real part of a ∈ C.
⟨x, y⟩  The Euclidean inner product of x and y in R^n.
δ_ij  The Kronecker delta function, δ_ij = 0 if i ≠ j, δ_ij = 1 if i = j.
A^T  The transpose of a matrix A.
||A||, ||A||_F  The Frobenius norm of A ∈ R^{n×m}. One has ||A||_F^2 = Σ_{i=1}^n Σ_{j=1}^m A_ij^2.
||x||_2  The Euclidean two-norm of x ∈ R^n.
||A||_2  The matrix two-norm, the supremum of ||Ax||_2 over ||x||_2 = 1.
[A, B]  The Lie-bracket of two matrices, [A,B] = AB − BA.
ad^i_A B  The iterated adjoint operator on matrices, ad^i_A B = [A, ad^{i−1}_A B], where ad^0_A B = B.
{A, B}  The generalised Lie-bracket of two matrices, {A,B} = A^T B − B^T A.
tr(A)  The trace of A ∈ R^{n×n}.
diag(λ_1, …, λ_n)  The matrix with diagonal elements (λ_1, …, λ_n) and all other elements zero.
vec(A)  The vector operation that stacks the columns of a matrix A ∈ R^{m×n} into a vector in R^{nm}.
ker T  The kernel of a linear operator T.
dom T  The domain of a linear operator T, defined as the subspace orthogonal to the kernel of T with respect to a given inner product.
sp{v_1, …, v_n}  The subspace generated by the span of the vectors v_1, …, v_n.
sp(A)  The subspace generated by the span of the columns of A ∈ R^{n×n}.
dim V  The dimension of a subspace V ⊂ R^n.
dist  The distance between two subspaces, the Frobenius norm of the residual projection operator.
O(h)  Big O notation. A function f(h) is order O(h) when there exist B > 0 and δ > 0 such that |f(h)|/h ≤ B for all 0 < h < δ.
o(h)  Little o notation. A function f(h) is order o(h) when f(h) is order O(h) and lim_{h→0} |f(h)|/h = 0.
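As a quick sanity check of this notation, several of the matrix operators above can be verified numerically. The following NumPy sketch is my own illustration (not part of the thesis); it checks that the Lie-bracket of two symmetric matrices is skew-symmetric, that it coincides with the generalised Lie-bracket in that case, that the Frobenius norm satisfies ||A||_F^2 = tr(A^T A), that vec stacks columns, and that the iterated adjoint operator unrolls to nested brackets.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
A = X + X.T                       # symmetric
Y = rng.standard_normal((4, 4))
B = Y + Y.T                       # symmetric

bracket = A @ B - B @ A           # Lie-bracket [A, B]
# for symmetric A, B the bracket is skew-symmetric: [A,B]^T = -[A,B]
assert np.allclose(bracket.T, -bracket)

gen = A.T @ B - B.T @ A           # generalised Lie-bracket {A, B}
assert np.allclose(gen, bracket)  # equals [A,B] when A, B are symmetric

# Frobenius norm: sum of squared entries equals tr(A^T A)
assert np.isclose(np.linalg.norm(A, 'fro') ** 2, np.trace(A.T @ A))

# vec stacks the columns of A; column-major ('F') reshape does exactly this
v = A.reshape(-1, order='F')
assert np.allclose(v[:4], A[:, 0])

def ad(A, B, i):
    # iterated adjoint: ad^0_A B = B, ad^i_A B = [A, ad^{i-1}_A B]
    for _ in range(i):
        B = A @ B - B @ A
    return B

assert np.allclose(ad(A, B, 2), A @ bracket - bracket @ A)
print("all notation checks passed")
```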
Differential Geometry, Notation and Operators:
Dφ|_x (ξ)  The Fréchet derivative of a scalar function φ ∈ C^1(M), M a smooth manifold, evaluated at x ∈ M in the direction ξ.
T_p f  The tangent map associated with a function f : M → N between two manifolds at the point p ∈ M. One has T_p f : T_pM → T_{f(p)}N, T_p f(X) := Df|_p (X), the Fréchet derivative of f.
df  The differential of a function f : M → N between two manifolds. One has df : TM → TN, df(X_p) = T_p f(X_p) where X_p ∈ T_pM.
φ|_J  The restriction of a map φ : M → N between two sets M and N to a map φ|_J : J → N where J ⊂ M.
J̄  The closure of J ⊂ M in some topology on M.
g(ξ, η)  General notation for a Riemannian metric operating on tangent vectors ξ and η.
⟨⟨ξ, η⟩⟩  An explicit Riemannian metric operating on tangent vectors ξ and η.
grad φ  The gradient of a potential φ ∈ C^∞(G) on a Riemannian manifold G.
Zφ  The derivation of φ ∈ C^∞(M) with respect to a smooth vector field Z ∈ D(M) on a manifold M. One has that Zφ(x) = Dφ|_x (Z(x)) for x ∈ M and Z(x) ∈ T_xM the value of Z at x.
[Y, Z]  The Lie-bracket of two smooth vector fields Y, Z ∈ D(M) for M a manifold.
L_Y Z  The Lie-derivative of two vector fields Y, Z ∈ D(M) for M a manifold. One has that L_Y Z = [Y, Z].
∇  An affine connection.
∇_Y Z  The action of an affine connection on (or covariant derivation of) a smooth vector field Z ∈ D(M) with respect to Y ∈ D(M) on a manifold M.
J_x  The Jacobi matrix of φ ∈ C^2(R^n) evaluated at a point x ∈ R^n.
H_φ  The Hessian of φ ∈ C^2(M) on a manifold M evaluated at a critical point of φ.
Chapter 1
Introduction
The aim of the present work is to investigate a particular class of constrained optimization
problems, those where the constraint set is a smooth homogeneous manifold (embedded in
Euclidean space). Rather than rely on standard numerical techniques the approach taken is to
exploit the geometry of Lie-groups and homogeneous spaces. The advantages of this approach
are considerable especially in the areas of stability, robustness and flexibility of the algorithms
developed.
Optimization problems of the class considered are important in many fields of study. The
two areas from which the principal examples in the body of the thesis are drawn are the
fields of numerical linear algebra and linear systems theory. An advantage of considering
questions drawn from the field of numerical linear algebra is the degree of expertise in solving
such problems using classical techniques. This provides an excellent foundation on which to
develop new results as well as ensuring that there is a battery of existing numerical methods to
which proposed algorithms may be compared. In contrast, the field of linear systems theory
contains many important optimization problems for which no satisfactory solution is known.
Presently accepted solution methods tend to be awkward adaptations of numerical linear algebra
methods which do not exploit the natural structure of the problem. Many of these optimization
problems are of a form for which the methods developed in this work are applicable.
As a consequence of the ad-hoc development of many of the existing algorithms in linear
systems theory there has been little or no effort to understand the particular requirements of
numerical methods for engineering problems. The neglect of this aspect of a proper numerical
treatment of optimization problems in engineering is especially important when on-line and
adaptive processes are considered. In such processes, conventional numerical methods must be
augmented with additional check procedures and tests to guarantee robustness of the process.
Indeed, one should even consider whether the principal goals of classical numerical methods
are appropriate for numerical methods in an adaptive or on-line engineering application.
Following this line of reasoning further, it is instructive to consider a set of priorities suitable
for numerical methods which solve on-line and adaptive engineering applications. I believe the
following properties in some sense describe the characteristics desirable for such algorithms. It
should be mentioned that the algorithms considered are all recursive iterations whose limiting
value yields the solution of the problem considered and the properties mentioned below are
phrased with this in mind.
Simplicity: The algorithm should be simple in concept and flexible in implementation. The
relationship between the task performed and the computational method employed should
be easily understood.
Global Convergence: The method should converge to the desired solution for almost any initial
condition. In a sense this can be considered as a robustness and stability requirement
on the algorithm. Thus, algorithms should be highly robust with respect to noisy data
sequences and large deviations. Interestingly, this point is often an argument against
using iterative numerical methods that converge too quickly.
Constraint Stability: The method should explicitly preserve the constraint set on which the
optimization problem is posed. If a numerical algorithm is implemented on-line it will
be running for a considerable period of time and it is imperative that the constraint be
maintained exactly to preserve the qualitative properties of the system.
Classical numerical optimization methods do not necessarily have these properties as
primary goals. Indeed, most classical algorithms are designed primarily to obtain the best
absolute accuracy with the least computational cost when implemented on a digital computer. For
example standard error analysis of a numerical method ensures that the numerical solution
obtained satisfies some absolute error bound. In contrast, the numerical algorithms considered
in later chapters are designed to exactly preserve the constraint while solving an optimization
problem robustly. The questions of computational cost and absolute accuracy are of secondary
importance to the properties mentioned above.
An important aspect of the properties outlined above is that they do not demand fast
convergence or high accuracy (other than to preserve the constraint set). Certainly, constraint
stability requires that only certain errors may occur (those that preserve the structure of the
problem), however, high accuracy within the constraint set is often not a necessity. For
example, if a computed linear feedback gain stabilises a given plant, then any nearby feedback
gain will probably also stabilise the plant. However, if that computation is then used to
initialise a further iterate and introduces modelling errors into a process, these modelling errors
could accumulate and eventually cause more significant problems. The bursting phenomena
observed in early adaptive feedback algorithms provide a useful analogy. Fast convergence
properties within the constraint set are not necessarily desirable either. Indeed, if the algorithm
converges too quickly then it will tend to track input noise disturbances, whereas a scheme that
converges slowly will act somewhat like a low pass filter. In practice one would like to have a
“knob” which adjusts the rate of convergence of a given algorithm (analogous to adjusting the
pass band of a filter). This is impossible for most classical algorithms, however, the algorithms
proposed in this thesis can all be sped up and slowed down to a certain degree.
In this thesis I propose recursive algorithms, satisfying the properties outlined above,
for solving a number of constrained optimization problems. The principal algorithms
considered are based on the classical gradient descent algorithm but modified so that they
explicitly preserve the constraint set. This is achieved by exploiting the geometry of Lie-groups
and homogeneous spaces, though the algorithms proposed can be understood without resorting
to deep theoretical results. The algorithms are closely related to certain continuous-time gra-
dient flows and dynamical systems that have been proposed recently by a number of authors
as potential numerical methods for engineering problems (cf. the recent monograph (Helmke
& Moore 1994b) for an excellent review of these developments). By designing numerical
algorithms based on these methods one brings dynamical systems solutions to engineering
problems one step closer to applications.
The modified gradient descent algorithms proposed display all the properties mentioned
above. In the case where there is a unique local (and global) minimum, the gradient
descent algorithms will always converge to the desired minimum. Gradient flows (and
also gradient algorithms) are robust to variations in initial conditions (Smale 1961). Since
the algorithms proposed are designed to explicitly preserve the constraint, they satisfy the
property of constraint stability by definition. Finally, the basic gradient descent algorithm
is the simplest optimization method available and the modifications considered are relatively
straightforward changes. Of course, there are applications where the linear convergence rate
associated with gradient descent algorithms is not sufficient for the problem considered. In this
case one must look at more sophisticated methods. The most direct quadratically convergent
method is the Newton-Raphson algorithm for determining the zeros of a gradient vector
field. Exploiting the geometry of Lie-groups again one can formulate a Newton-Raphson
algorithm directly on the constraint set. Unfortunately the region in which the Newton-
Raphson method will converge to the desired minimum is only a local neighbourhood of that
point. Thus, the Newton-Raphson method by itself does not satisfy the global convergence
property. Nevertheless, the method is useful in certain situations and can be implemented in
parallel with a modified gradient descent algorithm to guarantee robustness.
The potential applications of the theory expounded in this work are far reaching and
varied. Originating from recent dynamical systems studies of eigenvalue problems (Brockett
1988, Brockett 1991b, Chu & Driessel 1990, Helmke & Moore 1990) one may design iterative
gradient descent algorithms for the symmetric eigenvalue and singular value problems (cf.
Moore et al. (1994) and Chapter 2). These new algorithms do not aim to compete with state
of the art solutions to these problems. Rather, the symmetric eigenvalue and singular value
problems provide an environment in which to understand the new approach used in the context
of a well understood problem and compare the algorithms generated to classical methods.
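To give the flavour of such a scheme, the sketch below is my own minimal illustration in the spirit of the double-bracket iteration studied in Chapter 2, H_{k+1} = exp(−α[H_k, N]) H_k exp(α[H_k, N]): each step conjugates H by an orthogonal matrix, so the iterates stay exactly on the isospectral constraint set, and the diagonal of the limit recovers the eigenvalues. The constant step size, the 3×3 test matrix, and the truncated-series exponential are illustrative choices of mine, not the step-size selection of Chapter 2.

```python
import numpy as np

def expm_series(S, terms=20):
    # truncated power series for the matrix exponential; adequate for small ||S||
    E = np.eye(S.shape[0])
    P = np.eye(S.shape[0])
    for k in range(1, terms):
        P = P @ S / k
        E = E + P
    return E

# symmetric matrix whose eigenvalues we seek, and a target diagonal matrix N
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
N = np.diag([1.0, 2.0, 3.0])
alpha = 0.02                       # illustrative constant step size

for _ in range(3000):
    B = H @ N - N @ H              # [H, N] is skew-symmetric for symmetric H, N
    Q = expm_series(alpha * B)     # orthogonal to machine precision
    H = Q.T @ H @ Q                # conjugation: the spectrum is preserved exactly

# the off-diagonal part has decayed; the sorted diagonal approximates the
# eigenvalues 3 - sqrt(3), 3, 3 + sqrt(3) of the starting matrix
print(np.sort(np.diag(H)))
```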
Having understood and developed the theory necessary to implement these methods in a
simple case, one can confidently tackle problems in linear systems theory which have proved
amenable to dynamical systems solutions.
For example, Helmke (1993) provided a variational characterisation of several different
classes of balanced realisations for linear systems. Dynamical systems which compute balanced
realisations were proposed by several authors (Perkins, Helmke & Moore 1990, Helmke, Moore
& Perkins 1994). The problem of computing balanced realizations can be numerically ill-
conditioned, especially when a plant is nearly uncontrollable/unobservable, and calls for special
numerical methods (Laub, Heath, Paige & Ward 1987, Safanov & Chiang 1989, Tombs &
Postlethwaite 1987) with good numerical properties. Gradient descent methods have good
numerical properties for dealing with ill-conditioned problems and offer an attractive alternative
to modifications of existing numerical linear algebra methods. Yan and Moore (1991) and later
Helmke and Moore (1994a) developed gradient dynamical-systems solutions to computing
balanced realizations and minimising the L2-sensitivity of a state-space realization of a given
transfer function. Yan et al. (1994) have developed a number of recursive algorithms based on
these dynamical systems for L2 sensitivity optimization as well as Euclidean norm balancing.
A generalisation of balancing and sensitivity minimisation for time varying plants, termed
“�-balancing”, is discussed by Imae, Perkins and Moore (1992). An important application of
the dynamical systems approach for balancing and sensitivity minimisation is in the design
of finite word length implementations of controllers. Recent work on designing digital state-
space systems which draws from these ideas is outlined in the articles (Li, Anderson, Gevers
& Perkins 1992, Madievski, Anderson & Gevers 1994). A state of the art discussion of many
of these issues is contained in the recent monograph by Gevers and Li (1993).
The area of balanced realizations and sensitivity minimisation is only one facet of the po-
tential applications of dynamical systems concepts to control theory. Brockett’s (1988) original
work led him to consider a number of applications related to analogue computing. Brockett
(1989b) went on to show that dynamical systems can be used to realize general arithmetical
and logical operations. Least squares matching problems (Brockett 1989a, Brockett 1991a)
are also a natural application of the original development with practical relevance to computer
vision and statistical principal component analysis. The geometry of least squares and prin-
cipal component analysis was developed by a number of authors in the mid eighties (Bloch
1985a, Bloch 1985b, Byrnes & Willems 1986). An interesting application of these ideas to
the dynamical theory of learning in neural-networks was discussed by Brockett (1991a). This
work was based on Brockett’s own research along with recent developments in using the sin-
gular value decomposition to understand learning procedures (Bouland & Karp 1989, Baldi
& Hornik 1989). The work ties in closely with Oja’s (1982, 1989) results on neural-network
learning. Recently, Yan, Helmke and Moore (1994) have provided a rigorous analysis of Oja's
learning equations. Numerical methods related to these problems are presented in Mahony,
Helmke and Moore (1994) (cf. Chapter 3).
More generally, Faybusovich (1992a) has developed dynamical-system solutions for com-
puting Pisarenko frequencies, used in certain signal processing applications. Similar techniques
provide new approaches to realization theory (Brockett & Faybusovich 1991). Another po-
tential application in signal processing is the digital quantization of continuous-time signals
considered in the articles (Brockett 1989b, Brockett & Wong 1991). Yan, Teo and Moore (n.d.)
have also investigated using dynamical systems for computing LQ optimal output feedback
gains. This has motivated a number of authors (Sreeram, Teo, Yan & Li 1994, Mahony &
Helmke 1993) to use similar methods for difficult simultaneous stabilization problems that
have no classical solution. Preliminary results by Ghosh (1988) have tackled the simultaneous
stabilization problem using algebraic geometric methods, though more recent results (Blondel
1992, Blondel, Campion & Gevers 1993) indicate that the problem cannot be solved exactly
using algebraic operations; consequently, recursive methods offer one of the better numerical
approaches to obtain an approximation.
1.1 Historical Perspective
The material presented in this thesis builds primarily on the recent development of dynamical
systems solutions to certain linear algebraic problems. There is also dependence on classical
optimization theory from the last fifty years or so and more recent concepts of numerical stability
for computational integration methods. The pole placement results presented in Chapter 4 relate
to a considerable body of knowledge in linear systems theory developed since the seventies. To
provide a historical background for the work presented in this thesis the present section is split
into three subsections covering the fields of dynamical systems theory, numerical optimization
theory and linear systems theory. There is some overlap between these topics, especially since
the focus is on those developments which led to the new results presented in the body of this
thesis.
1.1.1 Dynamical Systems as Numerical Methods
Much of the work covered in this subsection is relatively recent and I know of only
one book (Helmke & Moore 1994b) that is devoted to the study of this topic with applications
to engineering. Nevertheless, there are several good review articles available which cover the
early application of the Toda lattice to eigenvalue problems (Chu 1984a, Watkins 1984) and an
overview of continuous realization methods for traditionally numerical linear algebra problems
(Chu 1988).
Historically, the idea of solving a numerical problem by computing the limiting solution
of a continuous-time differential equation is not new. The accelerating development of digital
computers in the mid twentieth century tended to obscure the potential of such methods though
the study of analogue circuit design was still of interest for practical applications. In the cases
where dynamical solutions to certain problems were considered (for example Rutishauser
(1954, 1958) proposed dynamical systems for solving the symmetric eigenvalue problem) the
classical algorithms known today developed so quickly that the dynamical systems approach
was forgotten. More recently, digital techniques have improved to the point where many
traditionally analogue tasks are being performed digitally. Interestingly, there has recently
been renewed interest in analogue techniques, brought about perhaps by a feeling that the limits of
digital technology may be approaching.
The particular historical development of dynamical system solutions for numerical linear
algebra problems on which my work is based began with the study of a differential equation
proposed by Toda (1970). Toda’s original idea was to study the evolution of point masses in one
dimension related by an exponential attractive force. The differential equation that he proposed
became known as the Toda lattice and was extensively studied by a number of authors (Henon
1974, Flaschka 1974, Flaschka 1975, Moser 1975, Kostant 1979, Symes 1980a, Symes 1980b).
In Flaschka (1974) a representation of the Toda lattice as an isospectral differential equation
on the set of tridiagonal1 symmetric matrices was developed. By isospectral it is understood
that the eigenvalues of the matrix solution to the Toda lattice remain constant for all time.
Moser (1975) extended this to show that a solution of the Toda lattice converges to a diagonal
matrix and thus provides a way in which to compute the eigenvalues of a tridiagonal symmetric
1 Tridiagonal symmetric matrices are matrices of the form
$$\begin{pmatrix}
\alpha_1 & \beta_1 & 0 & \cdots & 0 \\
\beta_1 & \alpha_2 & \beta_2 & \ddots & \vdots \\
0 & \ddots & \ddots & \ddots & 0 \\
\vdots & \ddots & \beta_{n-2} & \alpha_{n-1} & \beta_{n-1} \\
0 & \cdots & 0 & \beta_{n-1} & \alpha_n
\end{pmatrix}$$
for real numbers $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_{n-1}$.
matrix. Symes (1982) showed that the Toda lattice was in fact related to the classical QR
algorithm for the symmetric eigenvalue problem. This paper generated considerable interest
in dynamical systems solutions of numerical linear algebra problems and was followed by
several papers (Deift, Nanda & Tomei 1983, Watkins 1984, Chu 1984a, Chu 1984b, Nanda
1985, Shub & Vasquez 1987, Watkins & Elsner 1988) which generalise the initial connection
seen by Symes. Present day interest in the Toda flow is considerable with recent work into
developing a VLSI (very large scale integrated circuit) type implementation of the Toda flow
by a nonlinear lossless electrical network (Paul, Hueper & Nossek 1992) as well as its close
connection to the double bracket equation discussed below.
Prompted in part by the potential of the Toda lattice as a theoretical (and potentially
practical) tool in numerical linear algebra, several authors undertook to investigate more general
numerical methods in the context of dynamical systems. Ammar and Martin (1986) investigated
other standard matrix eigenvalue methods and showed strong connections to both the
discrete-time and the continuous-time Riccati equations. Their results were based in part on a
Lie-theoretic interpretation of the Riccati flow developed by Hermann and Martin (1982). A
complete phase portrait of the Riccati equation was given by Shayman (1986) while Helmke
(1991) has related the Riccati flow to Brockett’s double-bracket flow (Brockett 1991b). Articles
by Riddell (1984) on minimax problems for sums of eigenvalues and by Duistermaat, Kolk and
Varadarajan (1983) on flows constrained to evolve on flag manifolds should be mentioned since
both articles have proved useful references for many of the works mentioned below.
The double bracket equation
$$\dot{H}(t) = [H(t), [H(t), N]], \qquad H(0) = H_0, \qquad (1.1.1)$$
and its properties were first studied by Brockett (1988, 1991b) (see also independent work by
Chu and Driessel (1990) and Chu (1991b)). Here $H = H^T \in \mathbb{R}^{n \times n}$ and $N = N^T \in \mathbb{R}^{n \times n}$ are
symmetric matrices. When $H$ is tridiagonal and $N$ is diagonal then (1.1.1) reduces to the Toda
lattice. Brockett showed that (1.1.1) defines an isospectral flow whose solution $H(t)$, under
suitable conditions on $N$, converges to a diagonal matrix. Brockett spoke of using (1.1.1) to
solve various combinatorial optimization tasks such as linear programming problems and the
sorting of lists of real numbers.
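By way of illustration (this sketch is mine, not drawn from the references), the qualitative behaviour of (1.1.1) can be checked numerically. The script below uses the structure-preserving update $H \mapsto e^{-\alpha K} H e^{\alpha K}$ with $K = [H, N]$, which agrees with the flow to first order in $\alpha$; the matrix $H_0$, the target $N = \mathrm{diag}(1,2,3)$ and the step-size are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

def bracket(A, B):
    """Matrix Lie bracket [A, B] = AB - BA."""
    return A @ B - B @ A

# Hand-picked symmetric matrix (illustrative) and diagonal target N
H0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 4.0]])
N = np.diag([1.0, 2.0, 3.0])

H = H0.copy()
alpha = 0.02                      # small constant time-step (an arbitrary choice)
for _ in range(1000):
    K = bracket(H, N)             # skew-symmetric since H and N are symmetric
    G = expm(alpha * K)           # exponential of a skew matrix is orthogonal
    H = G.T @ H @ G               # isospectral update approximating (1.1.1)

# The iterates stay on the isospectral orbit of H0 ...
assert np.allclose(np.linalg.eigvalsh(H), np.linalg.eigvalsh(H0), atol=1e-8)
# ... and approach a diagonal matrix whose entries are ordered like N
assert np.linalg.norm(H - np.diag(np.diag(H))) < 1e-6
```

Because the update conjugates by an orthogonal matrix, the spectrum is preserved exactly at every step, not merely to the order of the integration error.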
The double bracket equation was seen to be a fundamental generalisation of the Toda lattice
with many practical applications. Among the diverse fields to which the double-bracket
equation appears to be relevant one finds applications to the travelling salesman problem and
quantization of continuous signals (Brockett 1989b, Brockett & Wong 1991). Least squares
matching and applications in computer vision are discussed in the paper (Brockett 1989a).
Chu and Driessel (1990) considered continuous-time solutions to structured inverse eigenvalue
problems along with matrix least squares problems. For applications in subspace learning see
the papers (Brockett 1991a, Yan, Helmke & Moore 1994). Stochastic versions of the double
bracket equation are studied in the report (Colonius & Kliemann 1990). An important connection
has been recognised between the double-bracket flow and the modern geometric approach to
linear programming pioneered by Khachiyan (1979) and Karmarkar (1984, 1990). Fundamental
work in this area has been carried out by a number of authors (Bayer & Lagarias 1989, Lagarias
& Todd 1990, Bloch 1990b, Faybusovich 1991a, Faybusovich 1991b, Helmke 1992).
A deep understanding of the double-bracket equation has been developed over the last few
years. The connection between the Toda lattice and the double-bracket flow is thoroughly
described by a series of papers (Bloch 1990a, Bloch, Brockett & Ratiu 1990, Bloch, Flaschka
& Ratiu 1990). Lagarias (1991) shows certain monotonicity properties of sums of eigenvalues
of solutions of the Toda lattice. It is not surprising to find that the double-bracket equation
can be interpreted as a gradient flow on adjoint orbits of a compact Lie-group (Bloch, Brockett
& Ratiu 1990, Bloch, Brockett & Ratiu 1992). Indeed there is now an emerging theory
of completely integrable gradient and Hamiltonian flows associated with the double-bracket
equation (Faybusovich 1989, Bloch et al. 1992, Bloch 1990b). The paper by Faybusovich
(1992b) gives a complete phase portrait of the Toda flow and QR algorithm including a
discussion of structural stability.
The development of the double-bracket equation has been parallelled by a number of papers
which investigate the potential of dynamical systems solutions to numerical linear algebra prob-
lems. Watkins and Elsner (1989a, 1989b) considered both the generalised eigenvalue problem
and the singular value decomposition. The symmetric eigenvalue problem is discussed in the
articles (Brockett 1988, Chu & Driessel 1990, Brockett 1991b). The singular value decompo-
sition has also been studied in detail (Smith 1991, Helmke & Moore 1990, Helmke et al. 1994).
The Jacobi method for minimising the norm of off-diagonal entries of a matrix is discussed by
Chu (1991a) with application to simultaneous diagonalization of multiple matrices. Chu and
Driessel (1991) have also looked at inverse eigenvalue problems which are related to recent
work in pole placement for classes of structured linear systems by Mahony and Helmke (1993)
(cf. Chapter 4).
Numerical methods based on the double-bracket equation have been discussed by Brockett
(1993) and Smith (1993). Numerical methods with connections to the dynamical systems
solutions for inverse singular value problems are discussed by Chu (1992) while numerical
methods for feedback pole placement within a class of symmetric state-space systems are
discussed in the conference paper (Mahony, Helmke & Moore 1993) (cf. Section 4.6).
1.1.2 Optimization Techniques and Numerical Solutions to Differential Equations
An early reference for optimization techniques is the monograph (Aoki 1971) or the book by
Luenberger (1973). More recent material can be found in Dennis and Schnabel (1983) and the
review of state-of-the-art methods (Kumar 1991). For recent developments in numerical
Hamiltonian integration methods see the review article (Sanz-Serna 1991). Relationships of
these developments to classical numerical integration techniques are contained in the review
(Stuart & Humphries 1994).
The problems considered in this thesis are constrained scalar optimization problems on
smooth manifolds without boundary. That is, the problem of minimising (or maximising) a
function $f : M \to \mathbb{R}$ from the constraint set $M$ to the real numbers. There are strong
connections, however, with classical numerical linear algebra problems such as that of computing
the eigenvalues of a symmetric matrix. The tools employed are derived from a geometric un-
derstanding of the problems considered combined with methods from classical unconstrained
optimization theory and Lie-theory. Of course, the geometry of most problems drawn from the
field of numerical linear algebra is well understood. For example, a geometric understanding
of the symmetric eigenvalue problem is not new. Parlett and Poole (1973) first rigorously
analysed the classical QR, LU and power iterations in a geometric framework, though Buurema
(1970) had done preliminary work and the geometric structure of the problem must have
been known to many. A recent survey article is Watkins (1982). A geometric understanding of
the problem of determining a single eigenvector of a symmetric matrix was known long before
the general QR algorithm was understood. Indeed, steepest descent optimization techniques
for dominant eigenvector determination were proposed by Hestenes and Karush (1951). An
excellent discussion of the early optimization techniques for such problems is contained in
Faddeev and Faddeeva (1963).
Far from being a closed field, there is still much interest in methods similar in scope, though
far advanced in technique (Auchmuty 1991, Batterson & Smillie 1989, Batterson & Smillie
1990). For more general numerical linear algebraic techniques such as the QR algorithm it is
necessary to use the language of Grassmannians and Flag manifolds to further develop the early
work of Parlett and Poole (1973). A lot was done to understand these problems in the early
eighties in connection with studying the Toda flow (Symes 1982, Deift et al. 1983, Watkins
1984, Chu 1984a). Later Ammar and Martin (1986) analysed a number of matrix eigenvalue
methods using flows on Grassmannians and Flag manifolds and showed strong connections to
both the discrete-time and the continuous-time Riccati equations. The developing geometric
understanding of classical numerical linear algebra techniques led to minimax style results for
the eigenvalues of matrices (Riddell 1984). These developments have resulted in a number of
elegant new proofs of matrix eigenvalue inequalities, for example the Wielandt-Hoffman inequality
(Chu & Driessel 1990), the Courant-Fischer minimax principle (Helmke & Moore 1994b, pg.
14) and the Eckart-Young theorem (Helmke & Shayman 1992).
Applications of the double-bracket equation and dynamical systems theory to numerical
linear algebraic problems (Brockett 1988, Watkins & Elsner 1989a, Watkins & Elsner 1989b,
Smith 1991, Helmke & Moore 1990, Brockett 1991b, Helmke et al. 1994) have led to the
design of numerical algorithms based explicitly on the dynamical systems developed. Recent
advances in such techniques are discussed in the articles (Chu 1992, Brockett 1993, Moore,
Mahony & Helmke 1994). These methods are essentially based on classical unconstrained
optimization methodologies reformulated on the constraint set.
Unconstrained scalar optimization techniques fall into roughly three categories (Aoki 1971):
i) Methods that use only the cost-function values.
ii) Methods that use first order derivatives of the cost function.
iii) Methods that use second (and higher) order derivatives of the cost function.
Methods of the first type tend not to be useful for anything other than linear search and non-smooth
optimization problems due to computational cost. An excellent survey of early techniques such
as pattern searches, relaxation methods, Rosenbrock and Powell’s methods as well as random
search methods and some other variations of these ideas is contained in Aoki (1971, section
4.7). Other good references for these methods are the books (Luenberger 1973, Minoux 1986).
Recent developments are discussed in the collection of articles (Kumar 1991).
The fundamental method of type ii) is the gradient descent method. For a potential
$f : \mathbb{R}^n \to \mathbb{R}$ with the gradient denoted $Df = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)^T$ the method of gradient descent
is
$$x_{k+1} = x_k - s_k Df(x_k),$$
where $s_k > 0$ is some pre-specified sequence of positive real numbers known as step-sizes. Here
the integer $k$ indexes the iterations of the numerical algorithm, acting like a discrete-time variable
for the solution sequence $\{x_k\}_{k=0}^{\infty}$. A suitable choice of step-size $s_k$ is any sequence such that
$s_k \to 0$ as $k \to \infty$ and $\sum_{k=1}^{\infty} s_k = \infty$. Polyak (1966) showed that provided $f$ satisfies certain
convexity assumptions then the solution sequence of the gradient descent algorithm converges
to the minimum of $f$. The optimal gradient descent method is known as the method of steepest
descent (Cauchy 1847, Curry 1944) where the step-size is chosen at each step by
$$s_k = \arg\min_{s \geq 0} f(x_k - s\, Df(x_k)).$$
Here "arg min" means to find the value of $s$ that minimises $f(x_k - s\, Df(x_k))$. The method of
steepest descent has the advantage of being associated with strong global convergence theory
(Minoux 1986, Theorem 4.4, pg. 86). The step-size selection procedure is usually completed
using a linear search algorithm or using some estimation technique based on approximations of
$f(x_k - s_k Df(x_k))$. Using a linear search technique generally provides a faster but less reliable
algorithm while a good approximation technique will inherit the strong global convergence
theory of the optimal method. The disadvantage of the overall approach is the linear rate
of convergence of the solution sequence $\{x_k\}$ to the desired limit (even for optimal step-size
selection). Nevertheless, when the reliability and not the rate of convergence of an optimization
problem is at issue the steepest descent method or an approximate suboptimal gradient descent
method remains a preferred numerical algorithm.
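For a quadratic cost the arg min step-size has a closed form, which makes steepest descent easy to demonstrate. The following sketch (an illustration of mine, not from the thesis) minimises $f(x) = \frac{1}{2}x^T A x - b^T x$, for which $Df(x) = Ax - b$ and the exact line-search step is $s_k = g^T g / g^T A g$ with $g = Df(x_k)$; the matrix $A$ and vector $b$ are arbitrary choices.

```python
import numpy as np

# Illustrative positive definite quadratic: f(x) = 0.5 x^T A x - b^T x
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

x = np.zeros(2)
for _ in range(100):
    g = A @ x - b                 # gradient Df(x)
    if np.linalg.norm(g) < 1e-12:
        break
    s = (g @ g) / (g @ (A @ g))   # exact arg-min step-size for a quadratic
    x = x - s * g                 # steepest descent update

# Linear convergence to the unique minimiser A^{-1} b
assert np.allclose(x, np.linalg.solve(A, b), atol=1e-8)
```

The per-iteration contraction factor depends on the condition number of $A$, which is the linear-rate behaviour referred to above.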
There are a number of algorithms which improve on the convergence properties of the
steepest descent method. Of these only the Newton-Raphson method is important for the
sequel; however, it is worth mentioning that multi-step methods, combining a series of estimates
$x_{k-1}, \ldots, x_{k-p}$ and derivatives $Df(x_{k-1}), \ldots, Df(x_{k-p})$, can be devised which converge with
superlinear, quadratic and higher orders of convergence, but which have much weaker
convergence results associated with them than the steepest descent methods. The most prominent of
these methods are the accelerated steepest descent methods (Forsythe 1968) and the method of
conjugate gradients (Fletcher & Reeves 1964).
The Newton-Raphson method falls into the third category and relies on the idea of
approximating the scalar function $f(x)$ by its truncated Taylor series
$$f(x) \approx f(x_k) + (x - x_k)^T Df(x_k) + \tfrac{1}{2}(x - x_k)^T D^2 f(x_k)(x - x_k),$$
where $D^2 f(x_k)$ is the square matrix with $ij$'th entry $\frac{\partial^2 f(x_k)}{\partial x_i \partial x_j}$. If $f(x)$ is quadratic then this
approximation is exact and the optimal minimum can be found in a single step
$$x^* = x_k - \left(D^2 f(x_k)\right)^{-1} Df(x_k).$$
Of course, in general this will not be true but if the approximation is fairly good, one would
expect the residual error $\|x^* - x_{k+1}\|$ to be of order2 $O(\|x^* - x_k\|^3)$. Indeed, the Newton-
Raphson algorithm is the most natural algorithm that displays quadratic convergence proper-
ties. A disadvantage of the Newton-Raphson algorithm is the cost of determining the inverse
$\left(D^2 f(x_k)\right)^{-1}$ and a number of methods have been devised to reduce the computational cost of
this calculation. The most common of these are the Davidon-Fletcher-Powell approach (Davi-
don 1959, Fletcher & Powell 1963) and a rank-2 correction formula independently derived by
2 The big O order notation, $\|x^* - x_{k+1}\|$ is of order $O(\|x^* - x_k\|^3)$, means that there exist real numbers
$B > 0$ and $\epsilon > 0$ such that
$$\frac{\|x^* - x_{k+1}\|}{\|x^* - x_k\|^3} \leq B$$
for all $\|x^* - x_k\| \leq \epsilon$. If $\|x^* - x_{k+1}\|$ is of order $O(\|x^* - x_k\|^3)$ then it follows, using the little o order
notation, that $\|x^* - x_{k+1}\|$ is of order $o(\|x^* - x_k\|^2)$,
$$\lim_{k \to \infty} \frac{\|x^* - x_{k+1}\|}{\|x^* - x_k\|^2} = 0.$$
Thus the error bound at each step decreases like a quadratic function around the limit point. Methods with this
convergence behaviour are known as quadratically convergent.
Broyden (1970), Fletcher (1970), Goldfarb (1970) and Shanno (1970). An excellent review of
these methods is provided by Minoux (1986, Chapter 4).
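The Newton-Raphson iteration described above can be sketched on a concrete function. The example below (my own choice, not from the thesis) uses the Rosenbrock function, whose gradient and Hessian are available in closed form; started near the minimiser $(1,1)$, the iteration shows the quadratic convergence discussed above.

```python
import numpy as np

def Df(x):
    """Gradient of the Rosenbrock function f(x) = (1-x0)^2 + 100(x1-x0^2)^2."""
    return np.array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                     200*(x[1] - x[0]**2)])

def D2f(x):
    """Hessian D^2 f of the Rosenbrock function."""
    return np.array([[2 - 400*(x[1] - x[0]**2) + 800*x[0]**2, -400*x[0]],
                     [-400*x[0], 200.0]])

x = np.array([0.9, 0.8])          # starting point near the minimiser (assumption)
for _ in range(30):
    # Newton-Raphson step: solve D^2 f(x_k) d = Df(x_k) rather than forming the inverse
    x = x - np.linalg.solve(D2f(x), Df(x))

assert np.allclose(x, [1.0, 1.0], atol=1e-10)
```

Solving the linear system at each step, rather than explicitly inverting $D^2 f(x_k)$, reflects the cost concern noted above that motivates the quasi-Newton corrections.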
The approach to optimization described above is closely related to the task of numerically
approximating the solution of an ordinary differential equation. Indeed, the gradient descent
method is just the Euler method (Butcher 1987, Section 20) applied to determine the solution
of the gradient differential equation
$$\dot{x} = -Df(x), \qquad x(0) = x_0, \qquad (1.1.2)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ (Euler's original work is republished in the monograph (Euler 1913)). The
Euler method is rarely used in modern numerical analysis since it is only a first order method.
That is, the error between $x_{k+1}$ and $x(h; x_k)$ (the solution of (1.1.2) with $x(0) = x_k$ evaluated
at time $h$) is $o(h)$,
$$\lim_{h \to 0} \frac{\|x_{k+1} - x(h; x_k)\|}{h} = 0.$$
(Naturally, this translates to a linear convergence rate for the gradient descent method.) More
advanced numerical integration methods exist, the most common of which in engineering
applications are the Runge-Kutta methods (Butcher 1987, Section 22) or linear multi-step
methods (Butcher 1987, Section 23).
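The first order character of the Euler method is easy to observe numerically: halving the step-size roughly halves the global error. The sketch below (illustrative, not from the thesis) integrates $\dot{x} = -x$, whose exact solution is $e^{-t}$, over $[0, 1]$.

```python
import math

def euler_error(h):
    """Global error at t = 1 of Euler's method applied to x' = -x, x(0) = 1."""
    n = round(1.0 / h)
    x = 1.0
    for _ in range(n):
        x = x - h * x             # Euler step x_{k+1} = x_k + h f(x_k)
    return abs(x - math.exp(-1.0))

# For a first order method the global error scales linearly with h
ratio = euler_error(0.01) / euler_error(0.005)
assert 1.9 < ratio < 2.1
```

A second order method would give a ratio close to 4 under the same halving of the step-size.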
The idea of stability for a numerical approximation of the solution to an initial value
problem is usually described in terms of the ability of the numerical method to accurately
reproduce the behaviour of the continuous-time solution. Thus, if one is considering the scalar
linear differential equation
$$\dot{x} = qx, \qquad x(0) = x_0 \in \mathbb{C}, \qquad (1.1.3)$$
where $q \in \mathbb{C}$ is a fixed complex number with real part $\mathrm{Re}(q) < 0$, then the solution $x(t) \to 0$
as $t \to \infty$. A numerical approximation to this problem is loosely said to be stable if the
approximation also converges to zero. A Runge-Kutta method, with step size $h$, is said to be A-
stable if the numerical solution of the scalar linear differential equation given above converges
to zero for any $z = hq$ lying in the complex left half plane. Thus, for any real positive step-size
selection $h > 0$ and any linear system with $\mathrm{Re}(q) < 0$ an A-stable Runge-Kutta method
solution of (1.1.3) will converge to zero. The concept of AN-stability captures the same
qualitative behaviour for non-autonomous linear systems (Burrage 1978). A strengthening of
the concept of A-stability for contractive numerical problems (cf. the review article (Stuart
& Humphries 1994)) termed B-stability was proposed by Butcher (1975) which can also be
generalised to non-autonomous systems (BN-stability) (Burrage & Butcher 1979). In this
paper Burrage and Butcher also introduced the important concept of "algebraic stability"
which they showed implied both B- and BN-stability. Algebraic stability is a condition on the
parameters that define a Runge-Kutta method which has relevance to many different stability
problems (Stuart & Humphries 1994) and even to questions of existence and uniqueness of
solutions to implicit Runge-Kutta methods (Cooper 1986). For systems with $\mathrm{Re}(q) \ll 0$
the continuous-time solution to (1.1.3) will converge very quickly to zero and one would
like this behaviour to be replicated in the numerical solution. A definition of L-stability
due to Ehle (1973) which captures this idea is also a strengthening of standard A-stability.
There are a number of useful numerical schemes that do not satisfy A-stability and weaker
definitions of stability are available, the most common of which are "A($\alpha$)-stability" (Widlund
1967, Dahlquist 1978) (the numerical solution of (1.1.3) must converge to zero for any $z = hq$
in the sector $\{z \in \mathbb{C} \mid |\arg(-z)| < \alpha\}$, for some $0 < \alpha < \pi/2$) and "stiff stability" (Gear
1968) (the numerical solution must converge to zero for any $z = hq$ with $\mathrm{Re}(z) \leq -\delta$, where
$\delta > 0$ is a real number).
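The practical content of A-stability can be seen by comparing the explicit Euler method (stable only for $|1 + z| < 1$) with the implicit, or backward, Euler method (A-stable, with amplification factor $1/(1-z)$) on the test equation (1.1.3). The values $q = -50$ and $h = 0.1$ below are arbitrary illustrative choices, giving $z = hq = -5$ outside the explicit method's stability region.

```python
q, h = -50.0, 0.1
z = h * q                          # z = -5, in the left half plane

x_explicit, x_implicit = 1.0, 1.0
for _ in range(50):
    x_explicit = (1.0 + z) * x_explicit    # explicit Euler: amplification 1 + z
    x_implicit = x_implicit / (1.0 - z)    # implicit Euler: amplification 1/(1 - z)

# The true solution decays, yet the explicit approximation blows up,
# while the A-stable implicit method decays for any h > 0.
assert abs(x_explicit) > 1e10
assert abs(x_implicit) < 1e-10
```

Reducing $h$ until $|1 + z| < 1$ would restore stability of the explicit method, which is precisely the step-size restriction that A-stable methods avoid.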
The unifying idea behind each of these stability definitions is the ability of the numeri-
cal method to replicate the properties of the continuous-time solution that is being approxi-
mated. The classical definitions of stability discussed above consider only simple convergence
behaviour of systems (A- and AN-stability for linear decay problems, L-stability for fast
convergence rates, B- and BN-stability for contractive problems). Another important class
of differential equations are those which preserve certain quantities, for example energy or a
Hamiltonian. Numerical methods for these two classes of problems (conservative and Hamilto-
nian systems) have been the subject of considerable research recently. Methods for conservative
systems are discussed in the articles (Greenspan 1974, Greenspan 1984). Methods for Hamilto-
nian systems are of more relevance to the present work. These methods can be divided roughly
into two types (Sanz-Serna 1991), firstly methods that are classical numerical differential equa-
tion solvers which happen also to preserve a Hamiltonian, and secondly methods which are
constructed explicitly from generating functions for solving Hamiltonian systems. The earlier
methods were based on generating functions (Ruth 1983, Channell 1983, Menyuk 1984, Feng
1985, Zhong & Marsden 1988). When it was observed that these methods could often be
interpreted as Runge-Kutta methods with particular properties, people became interested in
exactly which Runge-Kutta methods would have the property of preserving
a Hamiltonian. This question was answered independently by a number of authors (Sanz-Serna
1988, Suris 1989, Lasagni 1988). Application of these ideas to engineering problems associ-
ated with equations of motion of a rigid body has been undertaken by Crouch, Grossman and
Yan (1992). Crouch, Grossman and Yan are also working on related integration techniques
for engineering problems (Crouch & Grossman 1994, Crouch, Grossman & Yan 1994). A
recent review article for Hamiltonian integration methods is Sanz-Serna (1991). Interestingly,
the characterisation of Runge-Kutta methods that preserve Hamiltonians is related to the alge-
braic construction first described when defining algebraic stability (Burrage & Butcher 1979).
Indeed, Stuart and Humphries (1994) describe a number of connections between early
stability theory and modern numerical methods for Hamiltonian and conservative systems. In
Stuart and Humphries (1994) the concept of numerical stability, the question of whether, and
in what sense, the dynamical properties of a continuous-time flow are inherited by a discrete
numerical approximation, is defined. This concept is sometimes termed “practical stability”
and is closely related to the definition of constraint stability given on page 2. I have opted
not to use the term numerical stability to describe the algorithms proposed in the sequel since
the optimization tasks considered require two types of numerical stability, preservation of a
constraint and convergence to a limit.
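The distinction between a classical integrator and a structure-preserving one can be illustrated (an example of my own, not from the references) on the harmonic oscillator with Hamiltonian $H(q,p) = \frac{1}{2}(p^2 + q^2)$: explicit Euler steadily injects energy, while the first order symplectic (semi-implicit) Euler method keeps the energy bounded near its true value.

```python
h, n = 0.1, 1000
q_e, p_e = 1.0, 0.0                # explicit Euler state (position, momentum)
q_s, p_s = 1.0, 0.0                # symplectic Euler state
E0 = 0.5 * (p_s**2 + q_s**2)       # true (conserved) energy

for _ in range(n):
    # Explicit Euler: both updates use the old state
    q_e, p_e = q_e + h * p_e, p_e - h * q_e
    # Symplectic Euler: update p first, then q with the new p
    p_s = p_s - h * q_s
    q_s = q_s + h * p_s

E_explicit = 0.5 * (p_e**2 + q_e**2)
E_symplectic = 0.5 * (p_s**2 + q_s**2)

assert E_explicit > 100 * E0         # explicit Euler grows energy by (1 + h^2) per step
assert abs(E_symplectic - E0) < 0.1  # symplectic Euler stays within O(h) of E0
```

The symplectic method exactly conserves a nearby "modified" Hamiltonian, which is why its energy error stays bounded over arbitrarily long integrations instead of accumulating.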
In certain cases the Toda lattice, double-bracket flow and related dynamical systems can be
interpreted as a completely integrable Hamiltonian flow (Bloch 1985a, Bloch et al. 1992, Bloch
1990b). In these cases one could think to apply the modern Hamiltonian integration techniques
discussed by Sanz-Serna (1991). To do this however, one would have to consider the various
differential equations as Hamiltonian flows on Rn and the insight gained by considering the
solution in matrix space would be lost.
Several authors have looked directly at discretizing flows on Lie-groups and homogeneous
spaces. Moser and Veselov (1991) considered discrete versions of classical mechanical systems.
Chu (1992) considered discrete methods for inverse singular value problems based on dynamical
systems insights while Brockett (1993), Smith (1993) and Moore et al. (1994) have studied
more deliberate discretizations of gradient flows on Lie-groups and homogeneous spaces.
1.1.3 Linear Systems Theory and Pole Placement Results
Textbooks on feedback control and linear systems theory are those of Kailath (1980), Wonham
(1985) and Sontag (1990). An excellent reference for classical linear quadratic methods is the
book (Anderson & Moore 1971) or the more recent book (Anderson & Moore 1990). A recent
review article on developments in pole placement theory is Byrnes (1989).
The field of systems engineering during the mid seventies was the scene of a developing
understanding of the mathematical and geometric foundation of linear systems theory. Sem-
inal work by Kalman (1963) among others set a foundation of mathematical systems theory
which led people naturally to use algebraic geometric tools to solve some of the fundamental
questions that arose. This led to a strong geometric framework for linear systems theory
being developed in the late seventies and early eighties (Bitmead & Anderson 1977, Martin
& Hermann 1979, Hazewinkel 1979, Byrnes, Hazewinkel, Martin & Rouchaleau 1980, Helmke
1984, Falb 1990). See also the conference proceedings (Martin & Hermann 1977b, Byrnes &
Martin 1980). The development of the Toda lattice was of considerable interest to researchers
working in linear systems theory in the late seventies and led to several new developments in
scaling actions on spaces of rational functions in system theory (Byrnes 1978, Krishnaprasad
1979, Brockett & Krishnaprasad 1980). More recently Nakamura (1989) has shown a
connection between the Toda lattice and the study of moduli spaces of controllable linear systems.
Also Brockett and Faybusovich (1991) have made connections with realization theory.
One of the principal questions in linear systems theory that remained unanswered until
recently was the question of how the natural frequencies or poles of a multi-input multi-output
system are affected by changing feedback gain. In the case where the full state of a multi-input
multi-output state space system is available as output, Wonham (1967) showed that arbitrary
pole placement is equivalent to complete controllability of the system. The case for output
feedback (when only part of the state is available directly from the output) was found to be
far more difficult. Indeed, even after the theory of optimal linear quadratic methods was far
advanced (Anderson & Moore 1971) an understanding of the output feedback pole placement
problem remained elusive. A few preliminary results on pole shifting were obtained in the
early seventies (for example Davison and Wang (1973)) which led to the first important result,
obtained independently by Davison and Wang (1975) and Kimura (1975). Given a linear
system with n states, m inputs and p outputs, the result stated that for almost all controllable
and observable linear state-space systems for which
$$m + p - 1 \geq n,$$
the poles of that system could be almost arbitrarily changed using output feedback.
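The full-state feedback case settled by Wonham (1967) is easy to demonstrate numerically: for a controllable single-input pair $(A, B)$, Ackermann's formula places the closed-loop poles of $A - BK$ anywhere. The double-integrator example and desired poles below are illustrative choices of mine, not from the thesis.

```python
import numpy as np

# Double integrator: a controllable single-input pair (A, B)
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])

# Desired closed-loop poles -1 and -2, i.e. p(s) = s^2 + 3s + 2
pA = A @ A + 3.0 * A + 2.0 * np.eye(2)      # p(A)

C = np.hstack([B, A @ B])                   # controllability matrix [B, AB]
assert np.linalg.matrix_rank(C) == 2        # controllability <=> arbitrary placement

# Ackermann's formula: K = e_n^T C^{-1} p(A)
K = np.linalg.solve(C.T, np.array([0.0, 1.0])) @ pA

poles = np.linalg.eigvals(A - B @ K.reshape(1, 2))
assert np.allclose(np.sort(poles.real), [-2.0, -1.0], atol=1e-8)
assert np.allclose(poles.imag, 0.0, atol=1e-8)
```

It is exactly this clean state-feedback picture that fails to carry over to the output feedback case discussed next.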
In 1977 Hermann and Martin published a pair of articles (Hermann & Martin 1977, Martin
& Hermann 1977a) which used the dominant morphism theorem to show that $mp \geq n$ is a
necessary and sufficient condition for output feedback pole placement if one allows complex
gain matrices. Observe that if $m, p \geq 1$ then $mp \geq m + p - 1$ and thus the results obtained
by Hermann and Martin are stronger than those obtained earlier, apart from the disadvantage
of requiring complex feedback. Unfortunately, their results do not generalise to real feedback
gains, though it was hoped that the condition $mp \geq n$ would also be necessary and sufficient
for real output feedback pole placement. However, Willems and Hesselink (1978) soon gave a
counterexample ($m = 2$, $p = 2$, $n = 4$) showing that the condition $mp \geq n$ is not sufficient
for arbitrary pole placement using real feedback.
The case $mp = n$ was studied by Brockett and Byrnes (1979, 1981) using tools from
algebraic geometry and constructions on Grassmannian manifolds. Using these ideas
Brockett and Byrnes generalised Nyquist and root locus plots to multi-input multi-output
systems; however, though useful, their results only applied in the case $mp = n$ and fell short
of completely characterising the pole placement map even in this case. In Byrnes (1983)
the Ljusternik-Schnirelmann category of real Grassmannians is used to improve on Kimura's
original result. There were no other significant advances on this problem
during the mid eighties. A recent review article (Byrnes 1989) outlines the early results as well
as describing the state of the art towards the end of the eighties.
Recently Wang and Rosenthal have made new contributions to the problem of output
feedback pole placement (Wang 1989, Rosenthal 1989, Rosenthal 1992). Most recently Wang
(1992) has given a necessary and sufficient condition for pole placement using the central
projection model. Given a linear system with $n$ states, $m$ inputs and $p$ outputs, Wang has shown
that arbitrary output feedback pole placement is possible for any strictly proper controllable
and observable plant with $mp > n$. If the plant is only proper then almost arbitrary pole
placement is possible. The case $mp = n$ is still not fully understood.
Little has been done to study classes of linear systems and the pole placement map. In
Martin and Hermann (1977a) pole placement for linear Hamiltonian systems was considered.
More recently Mahony et al. (1993) (cf. Chapter 4) studied pole placement for symmetric
state-space systems. Simultaneous pole placement for multiple systems is also a problem that
has had little study. Ghosh (1988) has written a paper on this topic using algebro-geometric
techniques and recently Blondel (1992) and Blondel, Campion and Gevers (1993) have also
contributed. Such problems can also be tackled using the ideas outlined by Mahony and Helmke
(1993) (cf. Chapter 4). The development of efficient numerical methods for pole placement by
output feedback is a challenge. Methods from matrix calculus have been applied by Godbout
and Jordan (1989) and more recently, gradient descent methods have been proposed (Mahony
et al. 1993) (cf. Section 4.6).
1.2 Summary of Results
The thesis is divided into seven chapters. Chapter 1 provides an overview of the subject matter
considered. Chapters 2 to 4 consider three example optimization problems in detail. The first
problem discussed is a smooth optimization problem which can be used to solve the symmetric
eigenvalue problem. A considerable amount is known about the continuous-time gradient
dynamical systems associated with this optimization problem and the development builds on
this knowledge to generate a recursive numerical algorithm. The next problem considered is an
optimization problem related to principal component analysis. A discussion of the continuous-
time gradient flow is given before a numerical algorithm is developed. The connection of
the numerical method proposed and classical numerical linear algebraic algorithms for the
same task is investigated. The third example, drawn from the field of linear systems theory,
is the task of pole placement for the class of symmetric linear systems. A discussion of the
geometry of the task is undertaken yielding results with the flavour of traditional pole placement
results. Continuous-time gradient flows are derived and used to investigate the structure of
the optimization problem. A numerical method is also proposed based on the continuous-time
gradient flow.
The latter chapters approach the subject from a theoretical perspective. In Chapter 5
a theoretical foundation is laid in which the algorithms proposed in Chapters 2 to 4 may be
understood. Chapter 6 goes on to consider the particular numerical algorithms proposed in detail
and provides a template for designing numerical optimization algorithms for any constrained
optimization problem on a homogeneous space. Later in Chapter 6 a more sophisticated
numerical algorithm based on the Newton-Raphson algorithm is developed in a general context.
The algorithm is applied to a specific problem (the symmetric eigenvalue problem) to provide
an example of how to use the theory in practice. Concluding remarks are contained in Chapter 7.
The principal results contained in Chapters 2 to 6 are summarised below.
Chapter 2: In this chapter a numerical algorithm, termed the double-bracket algorithm,
$$H_{k+1} = e^{-\alpha_k [H_k, N]} H_k e^{\alpha_k [H_k, N]},$$
is proposed for computing the eigenvalues of an arbitrary symmetric matrix. For suitably small
α_k, termed time-steps, the algorithm is an approximation of the solution to the continuous-
time double-bracket equation. Since the matrix exponential of a skew-symmetric matrix is
orthogonal, it follows that this iteration has the important property of preserving the spectrum
of the iterates. That is, the eigenvalues of H_k remain constant for all k. By choosing a suitable
diagonal target matrix N the sequence H_k will converge to a diagonal matrix from which the
eigenvalues of H_0 can be directly determined. To ensure that the algorithm converges a suitable
step-size α_k must be chosen at each step. Two possible choices of schemes are presented
along with analysis showing that the algorithm converges to the desired matrix for almost all
initial conditions. A related algorithm for determining the singular values of an arbitrary (not
necessarily square) matrix is proposed and is shown to be equivalent to the double-bracket
equation applied to an augmented symmetric system. An analysis of convergence behaviour
showing linear convergence to the desired limit points is presented. Associated with the main
algorithms presented for the computation of the eigenvalues or singular values of matrices are
algorithms evolving on Lie-groups of orthogonal matrices which compute the full eigenspace
decompositions of given matrices.
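As an illustrative aside (not part of the thesis itself), the double-bracket iteration can be sketched in a few lines of Python. The example matrices, the step-size (the constant scheme of Chapter 2), and the iteration count below are all assumed for the purposes of illustration:

```python
import numpy as np

def bracket(x, y):
    """Lie bracket [X, Y] = XY - YX."""
    return x @ y - y @ x

def expm(a, terms=25):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(a)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    b = a / 2.0 ** s
    out, term = np.eye(a.shape[0]), np.eye(a.shape[0])
    for i in range(1, terms):
        term = term @ b / i
        out = out + term
    for _ in range(s):
        out = out @ out
    return out

def double_bracket_step(h, n, alpha):
    """One step H_{k+1} = exp(-a [H_k, N]) H_k exp(a [H_k, N])."""
    b = bracket(h, n)
    return expm(-alpha * b) @ h @ expm(alpha * b)

# Assumed example data: a symmetric matrix with known spectrum {1, 2, 3, 4}.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
h = q.T @ np.diag([4.0, 3.0, 2.0, 1.0]) @ q
n = np.diag([4.0, 3.0, 2.0, 1.0])          # diagonal target with distinct entries
alpha = 1.0 / (4 * np.linalg.norm(h) * np.linalg.norm(n))   # constant step-size
for _ in range(5000):
    h = double_bracket_step(h, n, alpha)
print(np.round(np.diag(h), 3))             # diagonal approximates the eigenvalues of H0
```

Since each step conjugates H_k by an orthogonal matrix, the spectrum is preserved exactly (up to floating-point error) regardless of how the step-size is chosen.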
The material presented in this chapter was first published in the conference article (Moore,
Mahony & Helmke 1992). A journal paper based on an expanded version of the conference
paper is to appear this year (Moore et al. 1994).
Chapter 3: In this chapter an investigation is undertaken of the properties of Oja's learning
equation
$$\dot X = NX - XX^T N X, \qquad N = N^T \in \mathbb{R}^{n \times n},$$
evolving on the set of matrices $\{X \in \mathbb{R}^{n \times m} \mid X^T X = I_m\}$, the Stiefel manifold of real
n × m matrices, where n ≥ m are integers. This differential equation was proposed by Oja
(1982, 1989) as a model for learning in certain neural networks. Explicit proofs of convergence
for the flow are presented which extend the results in Yan et al. (1994) so that no genericity
assumption is required on the eigenvalues of N . The homogeneous nature of the Stiefel
manifold allows one to develop an explicit numerical method (a discrete-time system evolving
on the Stiefel manifold) for principal component analysis. The method is based on a modified
gradient ascent algorithm for maximising the scalar potential
$$R_N(X) = \mathrm{tr}(X^T N X),$$
known as the generalised Rayleigh quotient. Proofs of convergence for the numerical algorithm
proposed are given as well as some modifications and observations aimed at reducing the
computational cost of implementing the algorithm on a digital computer. The discrete method
proposed is similar to the classical power method and steepest ascent methods for determining
the dominant p-eigenspace of a matrix N. Indeed, in the case where p = 1 (for a particular
choice of time-step) the discretization is shown to be equivalent to the power method. When
p > 1, however, there are subtle differences between the power method and the proposed
method.
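As an aside, the p = 1 case is easy to observe numerically. The following sketch (purely illustrative, not the thesis algorithm; the matrix and starting vector are assumed) iterates the classical power method and tracks the generalised Rayleigh quotient, which for the positive definite N chosen here increases monotonically towards the dominant eigenvalue:

```python
import numpy as np

# Assumed illustrative data: a positive definite diagonal N with dominant eigenvalue 5.
n = np.diag([5.0, 2.0, 1.0])
rng = np.random.default_rng(1)
x = rng.standard_normal(3)
x /= np.linalg.norm(x)

rayleigh = [float(x @ n @ x)]      # generalised Rayleigh quotient for p = 1
for _ in range(100):
    x = n @ x                      # power method step
    x /= np.linalg.norm(x)         # renormalise onto the sphere (Stiefel manifold, m = 1)
    rayleigh.append(float(x @ n @ x))
print(round(rayleigh[-1], 6))      # approaches the dominant eigenvalue 5
```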
The chapter is based on the journal paper (Mahony, Helmke & Moore 1994). Applications
of the same ideas have also been considered in the field of linear programming (Mahony &
Moore 1992).
Chapter 4: In this chapter, the task of pole placement is considered for a structured class
of systems (those with symmetric state space realisations) for which, to my knowledge, no
previous pole placement results are available. The assumption of symmetry of the realisation,
besides having a natural network theoretic interpretation, simplifies the geometric analysis
considerably. It is shown that a symmetric state space realisation can be assigned arbitrary
(real) poles via symmetric output feedback if and only if there are at least as many system inputs
as states. This result is surprising since a naive counting argument (comparing the number of
free variables $\frac{1}{2}m(m+1)$ of a symmetric output feedback gain to the number of poles n of a
symmetric realization having m inputs and n states) would suggest that $\frac{1}{2}m(m+1) \ge n$ is
sufficient for pole placement. To investigate the problem further gradient flows of least squares
cost criteria (functions of the matrix entries of realisations) are derived on smooth manifolds
of output feedback equivalent symmetric realisations. Limiting solutions to these flows occur
at minima of the cost criteria and relate directly to finding optimal feedback gains for system
assignment and pole placement problems. Cost criteria are proposed for solving the tasks of
system assignment, pole placement, and simultaneous multiple system assignment.
The theoretical material contained in Sections 4.1 to 4.4 along with the simulations in
Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the numerical
method proposed in Section 4.6 was presented at the 1993 Conference on Decision and Control
(Mahony et al. 1993). Much of the material presented in this chapter was developed in
conjunction with the results contained in the monograph (Helmke & Moore 1994b, Section 5.3),
which focusses on general linear systems.
Chapter 5: In this chapter a brief review of the relevant theory associated with developing
numerical methods on homogeneous spaces is presented. The focus of the development is
on classes of homogeneous spaces encountered in engineering applications and the simplest
theoretical constructions which provide a mathematical foundation for the numerical methods
proposed. A discussion is given of the relationship between gradient flows on Lie-groups and
homogeneous spaces (related by a group action) which motivates the choice of a particular
Riemannian structure for a homogeneous space. Convergence behaviour of gradient flows
is also considered. The curves used in constructing numerical methods in Chapters 2 to 4
are all based on matrix exponentials and the theory of the exponential map as a Lie-group
homomorphism is reviewed to provide a theoretical foundation for this choice. Moreover, a
characterisation of the geodesics associated with the Levi-Civita connection (derived from a
given Riemannian metric) is discussed and conditions are given on when the matrix exponential
maps to a geodesic curve on a Lie-group. Finally, an explicit discussion of the relationship
between geodesics on Lie-groups and homogeneous spaces is given.
Much of the material presented is standard or at least easily accessible to people working
in the fields of Riemannian geometry and Lie-groups. However, this material is not standard
knowledge for researchers in the field of systems engineering. Moreover, the development
strongly emphasizes the aspects of the general theory that are relevant to problems in linear
systems theory.
Chapter 6: In this chapter the gradient descent methods developed in Chapters 2 to 4 are
reviewed in the context of the theoretical developments of Chapter 5. The conclusion is that
the proposed algorithms are modified gradient descent algorithms where geodesics are used to
replace the straight line interpolation of the classical gradient descent method. This provides a
template for a simple numerical approach suitable for solving any scalar optimization problem
on a homogeneous space. Later in Chapter 6 a coordinate free Newton-Raphson method is
proposed which evolves explicitly on a Lie-group. This method is proposed in a general form
with convergence analysis and then used to generate a quadratically convergent numerical
method for the symmetric eigenvalue problem. A comparison is made to the QR algorithm
applied to an example taken from Golub and Van Loan (1989, pg. 424) which shows that the
Newton-Raphson method proposed converges in the same number of iterations as the classical
QR method.
Chapter 2
Numerical Gradient Algorithms for
Eigenvalue Calculations
A traditional algebraic approach to determining the eigenvalue and eigenvector structure of an
arbitrary matrix is the QR algorithm. In the early eighties it was observed that the QR algorithm
is closely related to a continuous-time differential equation which had become known through
study of the Toda lattice. Symes (1982), and Deift et al. (1983) showed that for tridiagonal
real symmetric matrices, the QR algorithm is a discrete-time sampling of the solution to a
continuous-time differential equation. This result was generalised to full complex matrices by
Chu (1984a), and Watkins and Elsner (1989b) provided further insight in the late eighties.
Brockett (1988) studied dynamical matrix flows generated by the double Lie-bracket1
equation,
$$\dot H = [H, [H, N]], \qquad H(0) = H_0,$$
for constant symmetric matrices N and H0. This differential equation is termed the double-
bracket equation, and solutions of this equation are termed double-bracket flows. Similar matrix
differential equations appear earlier than those references given above in the physics literature. An
1The Lie-bracket of two square matrices $X, Y \in \mathbb{R}^{n \times n}$ is
$$[X, Y] = XY - YX.$$
If $X = X^T$ and $Y = Y^T$ are symmetric matrices then $[X, Y]^T = -[X, Y]$ is a skew-symmetric matrix.
example is the Landau-Lifschitz-Gilbert equation of micromagnetics,
$$\frac{d\vec m}{dt} = \frac{\gamma}{1 + \sigma^2}\Big(\vec m \times H + \sigma\, \vec m \times (\vec m \times H)\Big), \qquad |\vec m|^2 = 1,$$
as $\gamma \to \infty$ and $\sigma\gamma \to k$, a constant. In this equation $\vec m, H \in \mathbb{R}^3$ and the cross-product
is equivalent to a Lie-bracket operation. The relationship between this type of differential
equation and certain problems in linear algebra, however, has only recently been investigated.
An important property of the double-bracket equation is that its solutions have constant
spectrum (i.e. the eigenvalues of a solution remain the same for all time) (Chu & Driessel
1990, Helmke & Moore 1994b). By suitable choice of the matrix parameter N, Brockett (1988)
showed that the double-bracket flow can be used to diagonalise real symmetric matrices (and
hence compute their eigenvalues), sort lists, and even to solve linear programming problems.
In independent work by Driessel (1986), Chu and Driessel (1990), Smith (1991) and Helmke
and Moore (1990), a similar gradient flow approach was developed for the task of computing
the singular values of a general non-symmetric, non-square matrix. The differential equation
obtained in these approaches is almost identical to the double-bracket equation. In Helmke
and Moore (1990), it is shown that these flows can also be derived as special cases of the
double-bracket equation for a non-symmetric matrix, suitably augmented to be symmetric.
When the double-bracket equation is viewed as a dynamical solution to linear algebra
problems (Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b) one is led naturally
to consider numerical methods based on the insight provided by the double-bracket flow. In
particular, the double-bracket flow evolves on a smooth submanifold of matrix space, the
set of all symmetric matrices with a given spectrum (Helmke & Moore 1994b, pg. 50). A
numerical method with such a property is termed constraint stable (cf. page 2). Such methods
are particularly of interest when accuracy or robustness of a given computation is an important
consideration. Robustness is of particular interest for engineering applications where input data
will usually come with added noise and uncertainty. As a consequence when one considers
numerical approximation of solutions to the double-bracket equation it is important to study
those methods which preserve the important structure of the double-bracket flow.
For the particular problem of determining the eigenvalues of a symmetric matrix, there
are many well tested and fast numerical methods available. It is not so much to challenge
established algorithms in speed or efficiency that one would study numerical methods based
on the double-bracket equation. Rather, with the developing theoretical understanding of a
number of related differential matrix equations (many of which have important applications in
linear systems theory, for example the area of balanced realizations (Imae, Perkins & Moore
1992, Perkins et al. 1990)), one may look upon a detailed study of numerical methods based
on the double-bracket flow as providing a stepping stone to a new set of robust and adaptive
computational methods in linear systems theory.
The material presented in this chapter was first published in the conference article (Moore
et al. 1992). A journal paper based on an expanded version of the conference paper is to appear
this year (Moore et al. 1994).
In this chapter, I propose a numerical algorithm, termed the double-bracket algorithm, for
computing the eigenvalues of an arbitrary symmetric matrix,
$$H_{k+1} = e^{-\alpha_k [H_k, N]} H_k e^{\alpha_k [H_k, N]}.$$
For suitably small α_k, termed time-steps, the algorithm is an approximation of the solution to the
continuous-time double-bracket equation. Since the matrix exponential of a skew-symmetric
matrix is orthogonal, it is seen that this iteration has the important property of preserving the
spectrum of the iterates. It is shown that for suitable choices of time-steps the double-bracket
algorithm inherits the same equilibria and limit points as the double-bracket flow and displays
linear convergence to its limit. A related algorithm for determining the singular values of an
arbitrary (not necessarily square) matrix is proposed and is shown to be equivalent to the double-
bracket equation applied to an augmented symmetric system. An analysis of convergence
behaviour showing linear convergence to the desired limit points is presented. Associated with
the main algorithms presented for the computation of the eigenvalues or singular values of
matrices are algorithms which compute the full eigenspace decompositions of given matrices.
These algorithms also display linear convergence to the desired limit points.
The chapter is divided into seven sections. In Section 2.1 the double-bracket algorithm is
introduced and the basic convergence results are presented. Section 2.2 deals with choosing
step-size selection schemes, and proposes two valid methods for generating the time-steps
α_k. Section 2.3 discusses the question of stability and proves that the double-bracket algorithm
has a unique attractive fixed point under assumptions that both the step-size selection schemes
proposed in Section 2.2 satisfy. The remainder of the chapter deals with computing the singular
values of an arbitrary matrix (Section 2.4) and computing the full spectral decomposition of
symmetric (or arbitrary) matrices (Section 2.5). A number of computational issues are briefly
mentioned in Section 2.6 and Section 2.7 considers some remaining open issues.
2.1 The Double-Bracket Algorithm
In this section a brief review of the continuous-time double-bracket equation is given with
emphasis on its interpretation as a gradient flow. The double-bracket algorithm is introduced
and conditions are given which guarantee convergence of the algorithm to the desired limit
point.
Let N and H be real symmetric matrices, and consider the potential function
$$\psi(H) := \|H - N\|^2 = \|H\|^2 + \|N\|^2 - 2\,\mathrm{tr}(NH), \qquad (2.1.1)$$
where the norm used is the Frobenius norm
$$\|X\|^2 := \mathrm{tr}(X^T X) = \sum_{i,j} x_{ij}^2,$$
with x_ij the elements of X. Note that ψ(H) measures the least squares difference between the
elements of H and the elements of N. Let M(H0) be the set of orthogonally similar matrices,
generated by some symmetric initial condition $H_0 = H_0^T \in \mathbb{R}^{n \times n}$. Then
$$M(H_0) = \{U^T H_0 U \mid U \in O(n)\}, \qquad (2.1.2)$$
where O(n) denotes the group of all n × n real orthogonal matrices. It is shown in Helmke and
Moore (1994b, pg. 48) that M(H0) is a smooth compact Riemannian manifold with explicit
forms given for its tangent space and Riemannian metric. Furthermore, in the articles (Bloch,
Brockett & Ratiu 1990, Chu & Driessel 1990) the gradient of ψ(H), with respect to the
normal Riemannian metric2 on M(H0) (Helmke & Moore 1994b, pg. 50), is shown to be
grad ψ(H) = −[H, [H, N]]. Consider the gradient flow given by the solution of
$$\dot H = -\operatorname{grad} \psi(H) = [H, [H, N]], \qquad H(0) = H_0, \qquad (2.1.3)$$
which is termed the double-bracket flow (Brockett 1988, Chu & Driessel 1990). Thus, the
double-bracket flow is a gradient flow which acts to decrease, or minimise, the least squares
potential ψ on the manifold M(H0). Note that from (2.1.1), this is equivalent to increasing,
or maximising, tr(NH). The matrix H0 is termed the initial condition, and the matrix N is
referred to as the target matrix.
The double-bracket algorithm proposed in this chapter is
$$H_{k+1} = e^{-\alpha_k [H_k, N]} H_k e^{\alpha_k [H_k, N]}, \qquad (2.1.4)$$
for arbitrary symmetric n × n matrices H0 and N, and some suitably small scalars α_k, termed
time-steps. Consider the curve $H_{k+1}(t) = e^{-t[H_k, N]} H_k e^{t[H_k, N]}$, where $H_{k+1}(0) = H_k$ and
$H_{k+1} = H_{k+1}(\alpha_k)$ is the (k+1)'th iteration of (2.1.4). Observe that
$$\frac{d}{dt}\Big(e^{-t[H_k, N]} H_k e^{t[H_k, N]}\Big)\Big|_{t=0} = [H_k, [H_k, N]],$$
and thus $e^{-t[H_k, N]} H_k e^{t[H_k, N]}$ is a first order approximation of the double-bracket flow at $H_k \in M(H_0)$.
It follows that for small α_k, the solution to (2.1.3) evaluated at time t = α_k with
H(0) = H_k is approximately $H_{k+1} = H_{k+1}(\alpha_k)$.
It is easily seen from above that stationary points of (2.1.3) will be fixed points of (2.1.4). In
general, (2.1.4) may have more fixed points than just the stationary points of (2.1.3), however,
Proposition 2.1.5 shows that this is not the case for suitable choice of time-step α_k. The term
equilibrium point is used to refer to fixed points of the algorithm which are also stationary
points of (2.1.3).
To implement (2.1.4) it is necessary to specify the time-steps α_k. This is accomplished by
considering functions $\gamma_N : M(H_0) \to \mathbb{R}_+$ and setting $\alpha_k := \gamma_N(H_k)$. The function γ_N is
termed the step-size selection scheme.
2A brief discussion of the derivation of gradient flows on Riemannian manifolds is given in Sections 5.3 and 5.4.
Condition 2.1.1 Let $\gamma_N : M(H_0) \to \mathbb{R}_+$ be a step-size selection scheme for the double-
bracket algorithm on M(H0). Then γ_N is well defined and continuous on all of M(H0),
except possibly those points H ∈ M(H0) where HN = NH. Furthermore, there exist real
numbers B ≥ δ > 0 such that B ≥ γ_N(H) ≥ δ for all H ∈ M(H0) where γ_N is well defined.
Remark 2.1.2 The variable step-size selection scheme proposed in this chapter is discontinu-
ous at all the points H ∈ M(H0) such that [H, N] = 0. □
Remark 2.1.3 Observe that the definition of a step-size selection scheme depends implicitly
on the matrix parameter N. Indeed, γ_N may be thought of as a function in two matrix variables,
N and H. □
Condition 2.1.4 Let N be a diagonal n × n matrix with distinct diagonal entries $\mu_1 > \mu_2 > \cdots > \mu_n$.
Let $\lambda_1 > \lambda_2 > \cdots > \lambda_r$ be the eigenvalues of H0 with associated algebraic multiplicities
$n_1, \ldots, n_r$ satisfying $\sum_{i=1}^r n_i = n$. Since H0 is symmetric, the eigenvalues of H0 are all real
and the diagonalisation of H0 is
$$\Lambda := \begin{pmatrix} \lambda_1 I_{n_1} & & 0 \\ & \ddots & \\ 0 & & \lambda_r I_{n_r} \end{pmatrix}, \qquad (2.1.5)$$
where $I_{n_i}$ is the $n_i \times n_i$ identity matrix. For generic initial condition H0 and a target matrix
N that satisfies Condition 2.1.4, the continuous-time equation (2.1.3) converges exponentially
fast to Λ (Brockett 1988, Helmke & Moore 1994b). Thus, the eigenvalues of H0 are the
diagonal entries of the limiting value of the solution to (2.1.3). The double-bracket algorithm
behaves similarly to (2.1.3) for small α_k and, given a suitable step-size selection scheme,
should converge to the same equilibrium as the continuous-time equation.
Proposition 2.1.5 Let H0 and N be n × n real symmetric matrices where N satisfies Condition
2.1.4. Let ψ(H) be given by (2.1.1) and let $\gamma_N : M(H_0) \to \mathbb{R}_+$ be a step-size selection scheme
that satisfies Condition 2.1.1. For $H_k \in M(H_0)$, let $\alpha_k = \gamma_N(H_k)$ and define
$$\Delta\psi(H_k, \alpha_k) := \psi(H_{k+1}) - \psi(H_k), \qquad (2.1.6)$$
where H_{k+1} is given by (2.1.4). Suppose
$$\Delta\psi(H_k, \alpha_k) < 0 \quad \text{when } [H_k, N] \ne 0. \qquad (2.1.7)$$
Then:
a) The iterative equation (2.1.4) defines an isospectral (eigenvalue preserving) recursion
on the manifold M(H0).
b) The fixed points of (2.1.4) are characterised by matrices H ∈ M(H0) satisfying
$$[H, N] = 0. \qquad (2.1.8)$$
c) Every solution H_k, for k = 1, 2, …, of (2.1.4) converges as k → ∞ to some $H_\infty \in M(H_0)$ where $[H_\infty, N] = 0$.
Proof To prove part a), note that the Lie-bracket $[H, N]^T = -[H, N]$ is skew-symmetric. As
the exponential of a skew-symmetric matrix is orthogonal, (2.1.4) is an orthogonal similarity
transformation of H_k and hence is isospectral.
For part b) note that if $[H_k, N] = 0$, then by direct substitution into (2.1.4), $H_{k+1} = H_k$.
Thus $H_{k+l} = H_k$ for l ≥ 1, and H_k is a fixed point of (2.1.4). Conversely, if $[H_k, N] \ne 0$,
then from (2.1.7), $\Delta\psi(H_k, \alpha_k) \ne 0$, and thus $H_{k+1} \ne H_k$. By inspection, points satisfying
(2.1.8) are stationary points of (2.1.3), and indeed are known to be the only stationary points
of (2.1.3) (Helmke & Moore 1994b, pg. 50). Thus, the fixed points of (2.1.4) are equilibrium
points, in the sense that they are all stationary points of (2.1.3). In order to prove part c) the
following lemma is required.
Lemma 2.1.6 Let N satisfy Condition 2.1.4 and γ_N satisfy Condition 2.1.1 such that the
double-bracket algorithm satisfies (2.1.7). The double-bracket algorithm (2.1.4) has exactly
$n!/\prod_{i=1}^r (n_i!)$ distinct equilibrium points in M(H0). These equilibrium points are charac-
terised by the matrices $\pi^T \Lambda \pi$, where π is an n × n permutation matrix (a rearrangement of
the rows of the identity matrix) and Λ is given by (2.1.5).
Proof Note that part b) of Proposition 2.1.5 characterises equilibrium points of (2.1.4) as
H ∈ M(H0) such that [H, N] = 0. Evaluating this condition componentwise, for $H = \{h_{ij}\}$,
gives
$$h_{ij}(\mu_j - \mu_i) = 0,$$
and hence by Condition 2.1.4, $h_{ij} = 0$ for $i \ne j$. Using the fact that (2.1.4) is isospectral, it
follows that equilibrium points are diagonal matrices which have the same eigenvalues as H0.
Such matrices are distinct, and can be written in the form $\pi^T \Lambda \pi$, for π an n × n permutation
matrix. A simple counting argument yields the number of matrices which satisfy this condition
to be $n!/\prod_{i=1}^r (n_i!)$. □
Consider, for a fixed initial condition H0, the sequence {H_k} generated by the double-
bracket algorithm. Observe that condition (2.1.7) implies that ψ(H_k) is strictly monotonically
decreasing for all k where $[H_k, N] \ne 0$. Also, since ψ is a continuous function on the compact
set M(H0), ψ is bounded from below, and ψ(H_k) will converge to some non-negative
value ψ_∞. As ψ(H_k) → ψ_∞ then Δψ(H_k, α_k) → 0.
For an arbitrary positive number ε, define the open set $D_\varepsilon \subset M(H_0)$, consisting of all
points of M(H0) within an ε neighbourhood of some equilibrium point of (2.1.4). The set
$M(H_0) - D_\varepsilon$ is a closed, compact subset of M(H0) on which the matrix function $H \mapsto [H, N]$
does not vanish. As a consequence, the difference function (2.1.6) is continuous and strictly
negative on $M(H_0) - D_\varepsilon$, and thus can be bounded above by some strictly negative number
$-\varepsilon_1 < 0$. Moreover, as Δψ(H_k, α_k) → 0, there exists a $K = K(\varepsilon_1)$ such that for all k > K,
$0 \ge \Delta\psi(H_k, \alpha_k) > -\varepsilon_1$. This ensures that $H_k \in D_\varepsilon$ for all k > K. In other words, H_k is
converging to some subset of possible equilibrium points.
Imposing the upper bound B on the step-size selection scheme γ_N (Condition 2.1.1),
it follows that $\gamma_N(H_k)[H_k, N] \to 0$ as k → ∞. Thus $e^{\gamma_N(H_k)[H_k, N]} \to I$, the identity
matrix, and hence $e^{-\gamma_N(H_k)[H_k, N]} H_k e^{\gamma_N(H_k)[H_k, N]} \to H_k$ as k → ∞. As a consequence
$\|H_{k+1} - H_k\| \to 0$ as k → ∞, and this, combined with the distinct nature of the fixed points
(Lemma 2.1.6) and the partial convergence already shown, completes the proof. □
2.2 Step-Size Selection
The double-bracket algorithm (2.1.4) requires a suitable step-size selection scheme before it
can be implemented. To generate such a scheme, one can use the potential (2.1.1) as a measure
of the convergence of (2.1.4) at each iteration. Thus, one chooses each time-step to maximise
the absolute change in potential |Δψ| of (2.1.6), such that Δψ < 0. Optimal time-steps can be
determined at each step of the iteration by completing a line search to maximise the absolute
change in potential as the time-step is increased. Line search methods, however, involve high
computational overheads and it is preferable to obtain a step-size selection scheme in the
form of a scalar relation depending on known values.
Using the Taylor expansion, Δψ(H_k, τ) is expressed as a linear term plus a higher order
error term in a general time-step τ. By estimating the error term one obtains a mathe-
matically simple function Δψ_U(H_k, τ) which is an upper bound on Δψ(H_k, τ) for all τ.
Choosing a suitable time-step α_k based on minimising Δψ_U, the actual change in potential,
$\Delta\psi(H_k, \alpha_k) \le \Delta\psi_U(H_k, \alpha_k) < 0$, will satisfy (2.1.7). Due to the simple nature of the
function Δψ_U, there is an explicit form for the time-step α_k depending only on H_k and N.
Lemma 2.2.1 For the k'th step of the recursion (2.1.4), the change in potential
Δψ(H_k, τ) of (2.1.6), for a time-step τ, is
$$\Delta\psi(H_k, \tau) = -2\tau\, \|[H_k, N]\|^2 - 2\tau^2\, \mathrm{tr}(N R_2(\tau)), \qquad (2.2.1)$$
with
$$R_2(\tau) := \int_0^1 (1 - s)\, H''_{k+1}(s\tau)\, ds, \qquad (2.2.2)$$
where $H''_{k+1}(\tau)$ is the second derivative of $H_{k+1}(\tau)$ with respect to τ.
Proof Let $H_{k+1}(\tau)$ be the (k+1)'th recursive estimate for an arbitrary time-step τ. Thus
$H_{k+1}(\tau) = e^{-\tau[H_k, N]} H_k e^{\tau[H_k, N]}$. It is easy to verify that the first and second time derivatives
of H_{k+1} are exactly
$$H'_{k+1}(\tau) = [H_{k+1}(\tau), [H_k, N]],$$
$$H''_{k+1}(\tau) = [[H_{k+1}(\tau), [H_k, N]], [H_k, N]].$$
Applying Taylor's theorem,
$$H_{k+1}(\tau) = H_{k+1}(0) + \tau \frac{d}{d\tau}H_{k+1}(0) + \tau^2 \int_0^1 (1 - s)\, H''_{k+1}(s\tau)\, ds \qquad (2.2.3)$$
$$= H_k + \tau\, [H_k, [H_k, N]] + \tau^2 R_2(\tau).$$
Consider the change in the potential ψ(H) between the points H_k and H_{k+1}(τ),
$$\Delta\psi(H_k, \tau) = \psi(H_{k+1}(\tau)) - \psi(H_k) \qquad (2.2.4)$$
$$= -2\,\mathrm{tr}\big(N (H_{k+1}(\tau) - H_k)\big)$$
$$= -2\,\mathrm{tr}\big(N (\tau [H_k, [H_k, N]] + \tau^2 R_2(\tau))\big)$$
$$= -2\tau\, \|[H_k, N]\|^2 - 2\tau^2\, \mathrm{tr}(N R_2(\tau)).$$
Observe that for τ = 0, Δψ(H_k, 0) = 0, and also that $\frac{d}{d\tau}\Delta\psi(H_k, \tau)\big|_{\tau=0} = -2\|[H_k, N]\|^2$.
Thus, for sufficiently small τ the error term $-2\tau^2\,\mathrm{tr}(N R_2(\tau))$ becomes neg-
ligible, and Δψ(H_k, τ) is strictly negative. Let $\alpha_{\mathrm{opt}} > 0$ be the first time for which
$\frac{d}{d\tau}\Delta\psi(H_k, \tau)\big|_{\tau=\alpha_{\mathrm{opt}}} = 0$; then $\Delta\psi(H_k, \alpha_{\mathrm{opt}}) \le \Delta\psi(H_k, \tau) < 0$ for all strictly positive
$\tau \le \alpha_{\mathrm{opt}}$. It is not possible, however, to estimate $\alpha_{\mathrm{opt}}$ directly from (2.2.4) due to the transcen-
dental nature of the error term $R_2(\tau)$. Approximating the error term by a quadratic function in
τ allows one to compute an explicit step-size selection scheme based on this estimate. □
Lemma 2.2.2 (Constant Step-Size Selection Scheme) The constant time-step
$$\gamma_N^c := \frac{1}{4\|H_0\|\, \|N\|} \qquad (2.2.5)$$
satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the step-
size selection scheme $\gamma_N^c$, satisfies (2.1.7).
Figure 2.2.1: The upper bound on Δψ(H_k, τ), viz. Δψ_U(H_k, τ) (with $\Delta\psi_U(H_k, \tau) \ge \Delta\psi(H_k, \tau)$ and $\Delta\psi_U(H_k, \alpha_k) < 0$)
Proof Recall that for the Frobenius norm, $|\mathrm{tr}(XY)| \le \|X\|\, \|Y\|$ (this follows from the Schwartz
inequality). Then
$$\Delta\psi(H_k, \tau) \le -2\tau\|[H_k, N]\|^2 + 2\tau^2 |\mathrm{tr}(N R_2(\tau))|$$
$$\le -2\tau\|[H_k, N]\|^2 + 2\tau^2 \|N\|\, \|R_2(\tau)\|$$
$$\le -2\tau\|[H_k, N]\|^2 + 2\tau^2 \|N\| \int_0^1 (1 - s)\, \|[[H_{k+1}(s\tau), [H_k, N]], [H_k, N]]\|\, ds$$
$$\le -2\tau\|[H_k, N]\|^2 + 4\tau^2 \|N\|\, \|H_0\|\, \|[H_k, N]\|^2$$
$$=: \Delta\psi_U(H_k, \tau). \qquad (2.2.6)$$
Thus Δψ_U(H_k, τ) is an upper bound for Δψ(H_k, τ) and has the property that for sufficiently
small τ it is strictly negative; see Figure 2.2.1. Due to the quadratic form of Δψ_U(H_k, τ) in
τ, it is immediately clear that $\alpha_k^c = \gamma_N^c(H_k) = 1/(4\|H_0\|\,\|N\|)$ is the minimum of (2.2.6). □
A direct norm bound of the integral error term is unlikely to be a tight estimate of the
error, and the function Δψ_U is a fairly crude bound for Δψ. The following more sophisticated
estimate results in an improved step-size selection scheme.
Lemma 2.2.3 (An Improved Bound for Δψ(H_k, τ)) The difference function Δψ(H_k, τ) can
be bounded above by
$$\Delta\psi(H_k, \tau) \le -2\tau\|[H_k, N]\|^2 + \frac{\|H_0\|\, \|[N, [H_k, N]]\|}{\|[H_k, N]\|}\Big(e^{2\tau\|[H_k, N]\|} - 1 - 2\tau\|[H_k, N]\|\Big) =: \widetilde{\Delta\psi}_U(H_k, \tau). \qquad (2.2.7)$$
Proof Consider the infinite series expansion for the matrix exponential,
$$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots$$
It is easily verified that
$$e^A B e^{-A} = B + [A, B] + \frac{1}{2!}[A, [A, B]] + \frac{1}{3!}[A, [A, [A, B]]] + \cdots = \sum_{i=0}^\infty \frac{1}{i!}\, \mathrm{ad}_A^i B. \qquad (2.2.8)$$
Here $\mathrm{ad}_A^i B = \mathrm{ad}_A(\mathrm{ad}_A^{i-1} B)$, $\mathrm{ad}_A^0 B = B$, where $\mathrm{ad}_A : \mathbb{R}^{n \times n} \to \mathbb{R}^{n \times n}$ is the linear map
$X \mapsto AX - XA$. Substituting $-\tau[H_k, N]$ and $H_k$ for A and B in (2.2.8) and comparing with
(2.2.3) gives
$$\tau^2 R_2(\tau) = \sum_{j=2}^\infty \frac{1}{j!}\, \mathrm{ad}^j_{-\tau[H_k, N]} H_k.$$
Considering $|\mathrm{tr}(N R_2(\tau))|$ and using the readily established identity $\mathrm{tr}(N\, \mathrm{ad}_A^j B) = (-1)^j\, \mathrm{tr}\big((\mathrm{ad}_A^j N) B\big)$ gives
$$|\tau^2\, \mathrm{tr}(N R_2(\tau))| = \Big|\sum_{j=2}^\infty \frac{1}{j!}\, \mathrm{tr}\big((\mathrm{ad}^j_{\tau[H_k, N]} N)\, H_k\big)\Big|$$
$$\le \sum_{j=2}^\infty \frac{1}{j!}\, \|\mathrm{ad}^j_{\tau[H_k, N]} N\\|\, \|H_0\|$$
$$\le \sum_{j=2}^\infty \frac{1}{j!}\, \big(2\|\tau[H_k, N]\|\big)^{j-1}\, \|\mathrm{ad}_{\tau[H_k, N]} N\|\, \|H_0\|$$
$$= \frac{\|H_0\|\, \|\mathrm{ad}_{[H_k, N]} N\|}{2\|[H_k, N]\|} \sum_{j=2}^\infty \frac{1}{j!}\, \big(2\tau\|[H_k, N]\|\big)^j$$
$$= \frac{\|H_0\|\, \|[N, [H_k, N]]\|}{2\|[H_k, N]\|}\Big(e^{2\tau\|[H_k, N]\|} - 1 - 2\tau\|[H_k, N]\|\Big).$$
Thus combining this with the first line of (2.2.6) gives (2.2.7). □
The variable step-size selection scheme is derived from this estimate of the error term in
the same manner as the constant step-size selection scheme was derived in Lemma 2.2.2.
Lemma 2.2.4 (Variable Step-Size Selection Scheme) The step-size selection scheme $\gamma_N^v : M(H_0) \to \mathbb{R}_+$,
$$\gamma_N^v(H) = \frac{1}{2\|[H, N]\|} \log\left(\frac{\|[H, N]\|^2}{\|H_0\|\, \|[N, [H, N]]\|} + 1\right), \qquad (2.2.9)$$
satisfies Condition 2.1.1. Furthermore, the double-bracket algorithm, equipped with the step-
size selection scheme $\gamma_N^v$, satisfies (2.1.7).
Proof I first show that $\gamma_N^v$ satisfies the requirements of Condition 2.1.1. As the Frobenius
norm is a continuous function, $\gamma_N^v$ is well defined and continuous at all points H ∈ M(H0)
for which $[H, N] \ne 0$. When [H, N] = 0 then $\gamma_N^v$ is not well defined. To show that there
exists a positive constant δ such that $\gamma_N^v(H) \ge \delta$, consider the following lower bound,
$$L_N := \frac{1}{2\|[H_k, N]\|} \log\left(\frac{\|[H_k, N]\|}{2\|H_0\|\, \|N\|} + 1\right) \qquad (2.2.10)$$
$$= \frac{1}{2\|[H_k, N]\|} \log\left(\frac{\|[H_k, N]\|^2}{2\|H_0\|\, \|N\|\, \|[H_k, N]\|} + 1\right)$$
$$\le \frac{1}{2\|[H_k, N]\|} \log\left(\frac{\|[H_k, N]\|^2}{\|H_0\|\, \|[N, [H_k, N]]\|} + 1\right),$$
which is just $\gamma_N^v$, since $\|[N, [H_k, N]]\| \le 2\|N\|\, \|[H_k, N]\|$. Using L'Hôpital's rule it can be seen that the limit of $L_N$ at a point
H ∈ M(H0) where [H, N] = 0 is $1/(4\|H_0\|\, \|N\|)$. Including these points in the definition
of $L_N$ gives that $L_N$ is a continuous, strictly positive, well defined function for all H ∈ M(H0).
Thus, since M(H0) is compact, there exists a real number δ > 0 such that
$$\gamma_N^v \ge L_N \ge \delta > 0$$
on $M(H_0) - \{H_* \mid [H_*, N] = 0\}$.
To show that there exists a real number $B > 0$, such that $\tau_N(H) \le B$ for $H \in M(H_0)$, set $[H,N] = X = (x_{ij})$. For $N$ given by Condition 2.1.4, then $\|[N,X]\|^2 = \sum_{i \ne j}(\mu_i - \mu_j)^2 x_{ij}^2$, where $x_{ii} = 0$ since $[H,N]$ is skew symmetric. Observe that
$$\frac{\|X\|^2}{\|[N,X]\|^2} = \frac{\sum_{i\ne j} x_{ij}^2}{\sum_{i\ne j}(\mu_i - \mu_j)^2 x_{ij}^2} \le \max_{i\ne j}(\mu_i - \mu_j)^{-2} =: b^2$$
for all choices of $X = -X^T$. It follows that
$$\tau_N(H) = \frac{1}{2\|X\|}\log\left(\frac{\|X\|^2}{\|H_0\|\,\|[N,X]\|} + 1\right) \le \frac{1}{2\|X\|}\log\left(\frac{\|X\|\,b}{\|H_0\|} + 1\right) \le \frac{b}{2\|H_0\|} =: B,$$
since $\log(x+1) \le x$ for $x \ge 0$.
Finally, for a matrix $H_k \in M(H_0)$, $[H_k, N] \ne 0$, the time-step $\tau_N(H_k) = \tau_k > 0$ minimises the upper bound (2.2.7), and from Lemma 2.2.3 it follows that the guaranteed decrease in the potential is strictly positive, $0 < \Delta_U(H_k, \tau_k) \le \Delta(H_k, \tau_k)$. Thus, the double-bracket algorithm, equipped with the step-size selection scheme $\tau_N$, satisfies (2.1.7) and the proof is complete.
2.3 Stability Analysis
In this section the stability properties of equilibria of the double-bracket algorithm (2.1.4) are investigated. It is shown that for generic initial conditions, and any step-size selection scheme that satisfies Condition 2.1.1 and (2.1.7), a solution $\{H_k\}$ of the double-bracket algorithm converges to the unique equilibrium point $\Lambda$, given by (2.1.5). The algorithm is shown to converge at least linearly in a neighbourhood of $\Lambda$.
Lemma 2.3.1 Let $N$ satisfy Condition 2.1.4 and $\tau$ be some selection scheme that satisfies Condition 2.1.1 and (2.1.7). The double-bracket algorithm (2.1.4) has a unique locally asymptotically stable equilibrium point $\Lambda$, given by (2.1.5). All other equilibrium points of (2.1.4) are unstable.
Proof It has been shown that the Hessian of the potential function $\Phi$ (at an equilibrium point in $M(H_0)$) is always non-singular and is positive definite only at the point $\Lambda$ (cf. Duistermaat et al. (1983) or Helmke and Moore (1994b, pg. 53)). Since $M(H_0)$ is compact, the local minimum at $\Lambda$ is also a global minimum. By assumptions on $N$ and $\tau$, $\Phi(H_k)$ is monotonically decreasing. Thus the domain of attraction of $\Lambda$ contains an open neighbourhood of $\Lambda$, and hence, $\Lambda$ is a locally asymptotically stable equilibrium point of (2.1.4).
All other equilibrium points $H_\infty$ are either saddle points or maxima of $\Phi$ (Helmke & Moore 1994b, pg. 53). Thus for any neighbourhood $D$ of an equilibrium point $H_\infty \ne \Lambda$, there exists some $H_0' \in D$ such that $\Phi(H_0') < \Phi(H_\infty)$. It follows that the solution to the double-bracket algorithm, with initial condition $H_0'$, will not converge to $H_\infty$ and thus $H_\infty$ is unstable.
Lemma 2.3.1 is sufficient to conclude that for generic initial conditions the double-bracket algorithm will converge to the unique matrix $\Lambda$. It is difficult to characterise the set of initial conditions for which the algorithm converges to some unstable equilibrium point $H_\infty \ne \Lambda$. For the continuous-time double-bracket flow, however, it is known that the unstable basins of attraction of such points are of zero measure in $M(H_0)$ (Helmke & Moore 1994b, pg. 53).
Lemma 2.3.2 Let $N$ satisfy Condition 2.1.4. Let $d \in \mathbb{R}^+$ be a constant such that $0 < d < 1/(2\|H_0\|_2\|N\|_2)$ and consider the constant step-size selection scheme $\tau_d : M(H_0) \to \mathbb{R}^+$,
$$\tau_d(H) = d.$$
The double-bracket algorithm (2.1.4), equipped with the step-size selection scheme $\tau_d$, has a unique locally asymptotically stable equilibrium point $\Lambda$, given by (2.1.5). The double-bracket algorithm converges at least linearly in a neighbourhood of $\Lambda$.
Proof Since $\tau_d$ is a constant function, the time-step $\tau_k = \tau_d(H_k) = d$ is constant. Thus, the map
$$H_k \mapsto e^{-d[H_k,N]} H_k e^{d[H_k,N]}$$
is a differentiable map on all of $M(H_0)$, and one may consider the linearisation of this map at the equilibrium point $\Lambda$, given by (2.1.5). The tangent space $T_\Lambda M(H_0)$ at $\Lambda$ consists of those matrices $\Xi = [\Lambda, \Omega]$ where $\Omega \in \mathrm{Skew}(n)$, the class of skew symmetric matrices (Helmke & Moore 1994b, pg. 49). It is easily verified that the matrices $\Xi \in T_\Lambda M(H_0)$ are independently parameterised by their components $\xi_{ij}$, where $i > j$, and $\lambda_i \ne \lambda_j$. Thus, computing the linearisation of the double-bracket algorithm at the point $\Lambda$ one obtains
$$\Xi_{k+1} = \Xi_k - d\big((\Xi_k N - N\Xi_k)\Lambda - \Lambda(\Xi_k N - N\Xi_k)\big), \qquad (2.3.1)$$
for $\Xi_k \in T_\Lambda M(H_0)$. Expressing this in terms of the linearly independent parameters $\xi_{ij}$, where $i > j$, and $\lambda_i \ne \lambda_j$ one has
$$(\xi_{ij})_{k+1} = \big(1 - d(\lambda_i - \lambda_j)(\mu_i - \mu_j)\big)(\xi_{ij})_k, \quad \text{for } i, j = 1, \ldots, n. \qquad (2.3.2)$$
The eigenvalues of the linearisation (2.3.1) can be read directly from (2.3.2) as $1 - d(\lambda_i - \lambda_j)(\mu_i - \mu_j)$, for $i > j$ and $\lambda_i \ne \lambda_j$. Since $(\lambda_i - \lambda_j)(\mu_i - \mu_j) > 0$ when $\lambda_i \ne \lambda_j$, then if $d < 1/(2\|H_0\|_2\|N\|_2)$ (where $\|X\|_2$ is the induced matrix 2-norm, the maximum singular value of $X$) it is easily verified that $|1 - d(\lambda_i - \lambda_j)(\mu_i - \mu_j)| < 1$. It follows that $\Lambda$ is asymptotically stable with rate of convergence at least linear. The linear scaling factor for the convergence error is $\max_{i > j,\ \lambda_i \ne \lambda_j}\{1 - d(\lambda_i - \lambda_j)(\mu_i - \mu_j)\}$.
Remark 2.3.3 As $\|N\|_2\|H_0\|_2 \le 2\|N\|\,\|H_0\|$, the constant step-size selection scheme (2.2.5) is an example of such a selection scheme. $\square$
Remark 2.3.4 Let $\tau : M(H_0) \to \mathbb{R}^+$ be a step-size selection scheme that satisfies Condition 2.1.1 and (2.1.7), and is also continuous on all of $M(H_0)$. Let $\Lambda$ be the locally asymptotically stable equilibrium point given by (2.1.5). Set $\delta = \tau(\Lambda)$ and observe that the linearisation of the double-bracket algorithm at $\Lambda$ is given by (2.3.1) with $d$ replaced by $\delta$. Recall that the $L_N$ scheme, defined in (2.2.10), is continuous, with limit $L_N(H_\infty) = 1/(4\|H_0\|\,\|N\|)$ at points where $[H_\infty, N] = 0$. Thus, $\Lambda$ is an exponentially asymptotically stable equilibrium point for the double-bracket algorithm equipped with the step-size selection scheme $L_N$. $\square$
To show that the double-bracket algorithm is exponentially stable at $\Lambda$ for the $\tau_N$ step-size selection scheme is technically difficult due to the discontinuous nature of $\tau_N$ at equilibrium points. A full proof of the following proposition is given in Moore et al. (1994).
Proposition 2.3.5 Let $N$ satisfy Condition 2.1.4 and $\tau_N$ be the step-size selection scheme given by Lemma 2.2.4. The iterative algorithm (2.1.4) has a unique linearly attractive equilibrium point $\Lambda$, given by (2.1.5).
To give an indication of the behaviour of the double-bracket algorithm two plots of a
simulation have been included, Figures 2.3.1 and 2.3.2. The simulation was run on a random
Figure 2.3.1 ("Simulation 1"; x-axis: Iterations, y-axis: Diagonal elements of estimate): A plot of the diagonal elements of each iteration, $H_k$, of the double-bracket algorithm (2.1.4) run on a $7 \times 7$ initial condition $H_0$ with eigenvalues $\{1, \ldots, 7\}$. The target matrix $N$ was chosen to be $\mathrm{diag}(1, \ldots, 7)$.
Figure 2.3.2 ("Simulation 1"; x-axis: Iterations, y-axis: Potential): The potential $\Phi(H_k) = \|H_k - N\|^2$ for the double-bracket algorithm (2.1.4).
$7 \times 7$ symmetric initial value matrix with eigenvalues $\{1, \ldots, 7\}$. The target matrix $N$ is chosen as $\mathrm{diag}(1, \ldots, 7)$ and as a consequence the minimum potential is $\Phi(\Lambda) = 0$. Figure 2.3.1 is a plot of the diagonal entries of the recursive estimate $H_k$. The off-diagonal entries converge to zero as the diagonal entries converge to the eigenvalues of $H_0$. Figure 2.3.2 is a plot of the potential $\|H_k - N\|^2$ versus the iteration $k$. This plot clearly shows the monotonically decreasing nature of the potential at each step of the algorithm.
The results of Sections 2.1, 2.2 and 2.3 are summarised in the following theorem.
Theorem 2.3.6 Let $H_0 = H_0^T$ be a real symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$. Let $N \in \mathbb{R}^{n \times n}$ satisfy Condition 2.1.4, and let $\tau_N$ be either the constant step-size selection (2.2.5) or the variable step-size selection (2.2.9). The double-bracket algorithm
$$H_{k+1} = e^{-\tau_k[H_k,N]} H_k e^{\tau_k[H_k,N]}, \qquad \tau_k = \tau_N(H_k),$$
with initial condition $H_0$, has the following properties:
i) The recursion is isospectral.
ii) If $H_k$ is a solution of the double-bracket algorithm, then $\Phi(H_k) = \|H_k - N\|^2$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $[H_k, N] \ne 0$.
iii) Fixed points of the recursive equation are characterised by matrices $H \in M(H_0)$ such that
$$[H, N] = 0.$$
iv) Fixed points of the recursion are exactly the stationary points of the double-bracket equation. These points are termed equilibrium points.
v) Let $H_k$, $k = 1, 2, \ldots$, be a solution to the double-bracket algorithm, then $H_k$ converges to a matrix $H_\infty \in M(H_0)$, $[H_\infty, N] = 0$, an equilibrium point of the recursion.
vi) All equilibrium points of the double-bracket algorithm are strictly unstable, except $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ which is locally asymptotically stable.
vii) The rate of convergence of the double-bracket algorithm to the unique asymptotically stable equilibrium point $\Lambda$ is at least linear in a neighbourhood of $\Lambda$.
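To make the summary concrete, the following Python sketch implements the recursion of Theorem 2.3.6 with the variable step-size selection (2.2.9). It is an illustration only, not the numerical tool discussed in this chapter: the function names are my own, Frobenius norms are used throughout, and the exponential of the skew-symmetric bracket is evaluated through a Hermitian eigendecomposition.

```python
import numpy as np

def expm_skew(S):
    # For real skew-symmetric S, the matrix 1j*S is Hermitian, so its
    # eigendecomposition yields the exact matrix exponential of S.
    w, Q = np.linalg.eigh(1j * S)
    return (Q @ np.diag(np.exp(-1j * w)) @ Q.conj().T).real

def bracket(A, B):
    return A @ B - B @ A

def tau_variable(H, N, norm_H0):
    # Variable step-size selection scheme (2.2.9), Frobenius norms.
    X = bracket(H, N)
    nX = np.linalg.norm(X)
    return np.log(nX**2 / (norm_H0 * np.linalg.norm(bracket(N, X))) + 1.0) / (2.0 * nX)

def double_bracket(H0, N, iters=3000, tol=1e-13):
    # H_{k+1} = exp(-tau_k [H_k, N]) H_k exp(tau_k [H_k, N])
    H, norm_H0 = H0.copy(), np.linalg.norm(H0)
    for _ in range(iters):
        X = bracket(H, N)
        if np.linalg.norm(X) < tol:      # equilibrium: [H, N] = 0
            break
        G = expm_skew(tau_variable(H, N, norm_H0) * X)
        H = G.T @ H @ G                  # orthogonal conjugation: isospectral
    return H
```

Run on a symmetric $H_0$ with $N = \mathrm{diag}(1, \ldots, n)$, the iterates approach the diagonal matrix whose entries are the eigenvalues of $H_0$ ordered to match the diagonal of $N$, mirroring the behaviour of Figures 2.3.1 and 2.3.2.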
2.4 Singular Value Computations
In this section the task of determining the singular values of an arbitrary matrix is considered.
A singular value decomposition of a matrix $H_0 \in \mathbb{R}^{m \times n}$, $m \ge n$, is a matrix decomposition
$$H_0 = V^T \Sigma U, \qquad (2.4.1)$$
where $V \in O(m)$, $U \in O(n)$ and
$$\Sigma = \begin{pmatrix} \mathrm{diag}(\sigma_1 I_{n_1}, \ldots, \sigma_r I_{n_r}) \\ 0_{(m-n)\times n} \end{pmatrix}. \qquad (2.4.2)$$
Here $\sigma_1 > \sigma_2 > \cdots > \sigma_r \ge 0$ are the distinct singular values of $H_0$, occurring with multiplicities $n_1, \ldots, n_r$ such that $\sum_{i=1}^r n_i = n$. By convention the singular values of a matrix are chosen to be non-negative. It should be noted that though such a decomposition always exists and $\Sigma$ is unique, there is no unique choice of orthogonal matrices $V$ and $U$.
Let $S(H_0)$ be the set of all matrices orthogonally equivalent to $H_0$,
$$S(H_0) = \{ V^T H_0 U \in \mathbb{R}^{m \times n} \mid V \in O(m),\ U \in O(n) \}. \qquad (2.4.3)$$
It is shown in Helmke and Moore (1994b, pg. 89) that $S(H_0)$ is a smooth compact Riemannian manifold with explicit forms given for its tangent space and Riemannian metric. Following the articles (Chu 1986, Chu & Driessel 1990, Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991) consider the task of calculating the singular values of a matrix $H_0$, by minimising the least squares cost function $\Phi : S(H_0) \to \mathbb{R}^+$, $\Phi(H) = \|H - N\|^2$. It is shown in Helmke and Moore (1990, 1994b) that $\Phi$ achieves a unique local and global minimum at the point $\Sigma \in S(H_0)$. Moreover, in the articles (Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991) the explicit form for the gradient $\mathrm{grad}\,\Phi$ is calculated. The minimising gradient flow is
$$\dot H = -\mathrm{grad}\,\Phi(H) = H\{H, N\} - \{H^T, N^T\}H, \qquad (2.4.4)$$
with $H(0) = H_0$ the initial condition. Here the generalised Lie-bracket
$$\{X, Y\} := X^T Y - Y^T X = -\{X, Y\}^T$$
is used.
Condition 2.4.1 Let $N$ be an $m \times n$ matrix, with $m \ge n$,
$$N = \begin{pmatrix} \mathrm{diag}(\mu_1, \ldots, \mu_n) \\ 0_{(m-n)\times n} \end{pmatrix},$$
where $\mu_1 > \mu_2 > \cdots > \mu_n > 0$ are strictly positive, distinct real numbers.
For generic initial conditions, and a target matrix $N$ that satisfies Condition 2.4.1, it is known that (2.4.4) converges exponentially fast to $\Sigma \in S(H_0)$ (Helmke & Moore 1990, Smith 1991). For $H_0$ and $N$ constant $m \times n$ matrices, the singular value algorithm proposed is
$$H_{k+1} = e^{-\tau_k\{H_k^T, N^T\}} H_k\, e^{\tau_k\{H_k, N\}}. \qquad (2.4.5)$$
This algorithm is analogous to the double-bracket algorithm (2.1.4).
Lemma 2.4.2 Let $H_0$, $N$ be $m \times n$ matrices. For any $H \in \mathbb{R}^{m \times n}$ define a map $H \mapsto \hat H \in \mathbb{R}^{(m+n) \times (m+n)}$, where
$$\hat H = \begin{pmatrix} 0_{m \times m} & H \\ H^T & 0_{n \times n} \end{pmatrix}. \qquad (2.4.6)$$
For any sequence of real numbers $\tau_k$, $k = 1, \ldots, \infty$, the iterations
$$H_{k+1} = e^{-\tau_k\{H_k^T, N^T\}} H_k\, e^{\tau_k\{H_k, N\}}, \qquad (2.4.7)$$
with initial condition $H_0$, and
$$\hat H_{k+1} = e^{-\tau_k[\hat H_k, \hat N]}\, \hat H_k\, e^{\tau_k[\hat H_k, \hat N]}, \qquad (2.4.8)$$
with initial condition $\hat H_0$, are equivalent.
Proof Consider the iterative solution to (2.4.8), and evaluate the multiplication in the block form of (2.4.6). This gives two equivalent iterative solutions, one the transpose of the other, both of which are equivalent to the iterative solution to (2.4.7).
Remark 2.4.3 Note that $\hat H_0$ and $\hat N$ are symmetric $(m+n) \times (m+n)$ matrices, and that as a result, the iteration (2.4.8) is just the double-bracket algorithm (2.1.4). $\square$
Remark 2.4.4 The equivalence given by this lemma is complete in every way. In particular, $H_\infty$ is an equilibrium point of (2.4.7) if and only if $\hat H_\infty$ is an equilibrium point of (2.4.8). Similarly, $H_k \to H_\infty$ if and only if $\hat H_k \to \hat H_\infty$ as $k \to \infty$. $\square$
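The equivalence of Lemma 2.4.2 is easy to check numerically. The sketch below (the helper names are my own, not code from the thesis) performs one step of the singular value recursion (2.4.7) and one step of the embedded double-bracket recursion (2.4.8); the two agree exactly through the map $H \mapsto \hat H$.

```python
import numpy as np

def gbracket(X, Y):
    # generalised Lie-bracket {X, Y} = X^T Y - Y^T X (skew symmetric)
    return X.T @ Y - Y.T @ X

def hat(H):
    # symmetric embedding (2.4.6)
    m, n = H.shape
    return np.block([[np.zeros((m, m)), H], [H.T, np.zeros((n, n))]])

def expm_skew(S):
    # exact exponential of a skew-symmetric matrix via the Hermitian matrix 1j*S
    w, Q = np.linalg.eigh(1j * S)
    return (Q @ np.diag(np.exp(-1j * w)) @ Q.conj().T).real

def sv_step(H, N, tau):
    # one step of the singular value algorithm (2.4.7)
    return expm_skew(-tau * gbracket(H.T, N.T)) @ H @ expm_skew(tau * gbracket(H, N))

def db_step(Hh, Nh, tau):
    # one step of the double-bracket algorithm (2.4.8) on the embedded matrices
    X = Hh @ Nh - Nh @ Hh
    return expm_skew(-tau * X) @ Hh @ expm_skew(tau * X)
```

Since $[\hat H, \hat N]$ is block diagonal with blocks $\{H^T, N^T\}$ and $\{H, N\}$, the $(1,2)$ block of the embedded step reproduces (2.4.7); each step is an orthogonal equivalence transformation, so the singular values of $H$ are preserved.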
This leads one to consider step-size selection schemes for the singular value algorithm induced by selection schemes which were derived in Section 2.2 for the double-bracket algorithm. Indeed, if $\hat\tau : M(\hat H_0) \to \mathbb{R}^+$ is a step-size selection scheme for (2.1.4) on $M(\hat H_0)$, and $H_k \in S(H_0)$, then one can define a time-step $\tau_k$ for the singular value algorithm by
$$\tau_k = \hat\tau(\hat H_k). \qquad (2.4.9)$$
Thus, if (2.4.8), equipped with a step-size selection scheme $\hat\tau$, satisfies Condition 2.1.1 and (2.1.7), then from Lemma 2.4.2, (2.4.7) will satisfy similar conditions. For the sake of simplicity the following development considers only the constant step-size selection scheme (2.2.5) and the variable step-size selection (2.2.9).
Theorem 2.4.5 Let $H_0$, $N$ be $m \times n$ matrices where $m \ge n$ and $N$ satisfies Condition 2.4.1. Let $\hat\tau : M(\hat H_0) \to \mathbb{R}^+$ be either the constant step-size selection (2.2.5), or the variable step-size selection (2.2.9). The singular value algorithm
$$H_{k+1} = e^{-\tau_k\{H_k^T, N^T\}} H_k\, e^{\tau_k\{H_k, N\}}, \qquad \tau_k = \hat\tau(\hat H_k),$$
with initial condition $H_0$, has the following properties:
i) The singular value algorithm is a self-equivalent (singular value preserving) recursion on the manifold $S(H_0)$.
ii) If $H_k$ is a solution of the singular value algorithm, then $\Phi(H_k) = \|H_k - N\|^2$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $\{H_k, N\} \ne 0$ and $\{H_k^T, N^T\} \ne 0$.
iii) Fixed points of the recursive equation are characterised by matrices $H \in S(H_0)$ such that
$$\{H, N\} = 0 \quad \text{and} \quad \{H^T, N^T\} = 0. \qquad (2.4.10)$$
Fixed points of the recursion are exactly the stationary points of the singular value gradient flow (2.4.4) and are termed equilibrium points.
iv) Let $H_k$, $k = 1, 2, \ldots$, be a solution to the singular value algorithm, then $H_k$ converges to a matrix $H_\infty \in S(H_0)$, an equilibrium point of the recursion.
v) All equilibrium points of the singular value algorithm are strictly unstable except $\Sigma$, given by (2.4.2), which is locally asymptotically stable with at least linear rate of convergence.
Proof To prove part i) note that the generalised Lie-bracket, $\{X, Y\} = -\{X, Y\}^T$, is skew symmetric, and thus (2.4.5) is an orthogonal equivalence transformation and preserves the singular values of $H_k$. Also note that the potential $\Phi(H_k) = \frac{1}{2}\Phi(\hat H_k)$. Moreover, Lemma 2.4.2 shows that the sequence $\hat H_k$ is a solution to the double-bracket algorithm and thus, from Proposition 2.1.5, $\frac{1}{2}\Phi(\hat H_k)$ must be monotonically decreasing for all $k \in \mathbb{N}$ such that $[\hat H_k, \hat N] \ne 0$, which is equivalent to (2.4.10). This proves part ii), and part iii) follows by noting that if $\{H_k^T, N^T\} = 0$ and $\{H_k, N\} = 0$, then $H_{k+l} = H_k$ for $l = 1, 2, \ldots$, and $H_k$ is a fixed point of (2.4.5). Moreover, since $\Phi(H_k)$ is strictly monotonically decreasing for all $\{H_k, N\} \ne 0$ and $\{H_k^T, N^T\} \ne 0$, then these points can be the only fixed points. It is known that these are the only stationary points of (2.4.4) (Helmke & Moore 1990, Helmke & Moore 1994b, Smith 1991).
In order to prove iv) one needs the following characterisation of equilibria of the singular value algorithm.
Lemma 2.4.6 Let $N$ satisfy Condition 2.4.1 and $\hat\tau$ be either the constant step-size selection (2.2.5), or the variable step-size selection (2.2.9). The singular value algorithm (2.4.5), equipped with time-steps $\tau_k = \hat\tau(\hat H_k)$, has exactly $2^n n!/\prod_{i=1}^r (n_i!)$ distinct equilibrium points in $S(H_0)$. Furthermore, these equilibrium points are characterised by the matrices
$$\begin{pmatrix} \pi^T & 0_{n \times (m-n)} \\ 0_{(m-n) \times n} & 0_{(m-n) \times (m-n)} \end{pmatrix} \Sigma\, S\, \pi,$$
where $\pi$ is an $n \times n$ permutation matrix, and $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$ a sign matrix.
Proof Equilibrium points of (2.4.5) are characterised by the two conditions (2.4.10). Since $N$ satisfies Condition 2.4.1, then setting $H = (h_{ij})$ one has that $\{H, N\} = 0$ is equivalent to
$$\mu_j h_{ji} - \mu_i h_{ij} = 0, \quad \text{for } i = 1, \ldots, n,\ j = 1, \ldots, n.$$
Similarly, the condition $\{H^T, N^T\} = 0$ is equivalent to
$$\mu_j h_{ij} - \mu_i h_{ji} = 0, \quad \text{for } i = 1, \ldots, n,\ j = 1, \ldots, n,$$
$$h_{ij}\mu_j = 0, \quad \text{for } i = n+1, \ldots, m,\ j = 1, \ldots, n.$$
By manipulating these relationships, and using the distinct, positive nature of the $\mu_i$, it is easily shown that $h_{ij} = 0$ for $i \ne j$. Using the fact that (2.4.5) is self-equivalent, the only possible matrices of this form which have the same singular values as $H_0$ are characterised as above. A simple counting argument shows that the number of distinct equilibrium points is $2^n n!/\prod_{i=1}^r (n_i!)$.
The proof of part iv) is now directly analogous to the proof of part c) of Proposition 2.1.5. It remains only to prove part v), which involves the stability analysis of the equilibrium points characterised by (2.4.10). It is not possible to directly apply the results obtained in Section 2.3 to the double-bracket algorithm $\hat H_k$, since $\hat N$ does not satisfy Condition 2.1.4. However, for the constant step-size selection scheme induced by (2.2.5), and using analogous arguments to those used in Lemmas 2.3.1 and 2.3.2, it follows that $\Sigma$ is the unique locally attractive equilibrium point of the singular value algorithm. Similarly, by linearising (2.4.5) for continuous step-size selection schemes at the point $\Sigma$, it can be shown that the rate of convergence is at least linear in a neighbourhood of $\Sigma$. Thus, using Lemma 2.4.2 it follows that $\hat\Sigma$ is the unique exponentially attractive equilibrium point of the double-bracket algorithm on $M(\hat H_0)$. To obtain the same results for the variable step-size selection scheme (2.2.9) one applies Proposition 2.3.5 to the double-bracket algorithm on $M(\hat H_0)$ and uses the equivalence given by Lemma 2.4.2 to obtain the same result for the singular value algorithm (2.4.5). This completes the proof.
Remark 2.4.7 The above theorem holds true for any time-steps $\tau_k = \hat\tau(\hat H_k)$ induced by a step-size selection scheme, $\hat\tau$, which satisfies Condition 2.1.1, such that Theorem 2.3.6 holds. $\square$
2.5 Associated Orthogonal Algorithms
In addition to finding eigenvalues or singular values of a matrix it is often desired to determine the full eigen-decomposition of a matrix, i.e. the eigenvectors associated with each eigenvalue. Associated with the double-bracket algorithm and singular value algorithm there are algorithms evolving on the set of orthogonal matrices which converge to the matrix of orthonormal eigenvectors (for the double-bracket algorithm) and separate matrices of left and right orthonormal singular directions (for the singular value algorithm). To simplify the subsequent analysis one imposes a genericity condition on the initial condition $H_0$.
Condition 2.5.1 If $H_0 = H_0^T \in \mathbb{R}^{n \times n}$ is a real symmetric matrix then assume that $H_0$ has distinct eigenvalues $\lambda_1 > \cdots > \lambda_n$. If $H_0 \in \mathbb{R}^{m \times n}$, where $m \ge n$, then assume that $H_0$ has distinct singular values $\sigma_1 > \cdots > \sigma_n > 0$.
For a sequence of positive real numbers $\tau_k$, for $k = 1, 2, \ldots$, the associated orthogonal double-bracket algorithm is
$$U_{k+1} = U_k\, e^{\tau_k[U_k^T H_0 U_k, N]}, \qquad U_0 \in O(n), \qquad (2.5.1)$$
where $H_0 = H_0^T \in \mathbb{R}^{n \times n}$ is symmetric. For an arbitrary initial condition $H_0 \in \mathbb{R}^{m \times n}$ the associated orthogonal singular value algorithm is
$$V_{k+1} = V_k\, e^{\tau_k\{U_k^T H_0^T V_k, N^T\}}, \qquad V_0 \in O(m), \qquad (2.5.2)$$
$$U_{k+1} = U_k\, e^{\tau_k\{V_k^T H_0 U_k, N\}}, \qquad U_0 \in O(n).$$
Note that in each case the exponents of the exponential terms are skew symmetric and thus, the
recursions will remain orthogonal.
Let $H_0 = H_0^T \in \mathbb{R}^{n \times n}$ and consider the map $g : O(n) \to M(H_0)$, $U \mapsto U^T H_0 U$, which is a smooth surjection. If $U_k$ is a solution to (2.5.1) observe that
$$g(U_{k+1}) = e^{-\tau_k[g(U_k), N]}\, g(U_k)\, e^{\tau_k[g(U_k), N]}$$
is the double-bracket algorithm (2.1.4). Thus, $g$ maps the associated orthogonal double-bracket algorithm with initial condition $U_0$, to the double-bracket algorithm with initial condition $U_0^T H_0 U_0$, on $M(U_0^T H_0 U_0) = M(H_0)$.
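A minimal sketch of the associated orthogonal recursion (2.5.1) follows, using a constant step size chosen well inside the bound of Lemma 2.3.2 (the helper names are my own, and the constant-step variant is only one admissible selection scheme):

```python
import numpy as np

def expm_skew(S):
    # exact exponential of a skew-symmetric matrix via the Hermitian matrix 1j*S
    w, Q = np.linalg.eigh(1j * S)
    return (Q @ np.diag(np.exp(-1j * w)) @ Q.conj().T).real

def orthogonal_double_bracket(H0, N, tau, iters):
    # U_{k+1} = U_k exp(tau [U_k^T H0 U_k, N]); U_k stays orthogonal
    # because the exponent is skew symmetric.
    U = np.eye(H0.shape[0])
    for _ in range(iters):
        H = U.T @ H0 @ U
        U = U @ expm_skew(tau * (H @ N - N @ H))
    return U
```

At convergence the columns of $U_\infty$ are orthonormal eigenvectors of $H_0$ and $U_\infty^T H_0 U_\infty \approx \Lambda$, as described by Theorem 2.5.3 below.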
Remark 2.5.2 Consider the potential function $\psi : O(n) \to \mathbb{R}^+$, $\psi(U) = \|U^T H_0 U - N\|^2$ on the set of orthogonal $n \times n$ matrices. Using the standard induced Riemannian metric from $\mathbb{R}^{n \times n}$ on $O(n)$, the associated orthogonal gradient flow is (Brockett 1988, Chu 1984a, Chu & Driessel 1990, Helmke & Moore 1994b)
$$\dot U = -\mathrm{grad}\,\psi(U) = U[U^T H_0 U, N].$$
$\square$
Theorem 2.5.3 Let $H_0 = H_0^T$ be a real symmetric $n \times n$ matrix that satisfies Condition 2.5.1. Let $N \in \mathbb{R}^{n \times n}$ satisfy Condition 2.1.4, and let $\tau_N$ be either the constant step-size selection (2.2.5) or the variable step-size selection (2.2.9). The recursion
$$U_{k+1} = U_k\, e^{\tau_k[U_k^T H_0 U_k, N]}, \qquad U_0 \in O(n), \qquad \tau_k = \tau_N(U_k^T H_0 U_k),$$
referred to as the associated orthogonal double-bracket algorithm, has the following properties:
i) A solution $U_k$, $k = 1, 2, \ldots$, to the associated orthogonal double-bracket algorithm remains orthogonal.
ii) Let $\psi(U) = \|U^T H_0 U - N\|^2$ be a map from $O(n)$ to the set of non-negative reals $\mathbb{R}^+$. Let $U_k$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal double-bracket algorithm. Then $\psi(U_k)$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $[U_k^T H_0 U_k, N] \ne 0$.
iii) Fixed points of the algorithm are characterised by matrices $U \in O(n)$ such that
$$[U^T H_0 U, N] = 0.$$
There are exactly $2^n n!$ distinct fixed points.
iv) Let $U_k$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal double-bracket algorithm, then $U_k$ converges to an orthogonal matrix $U_\infty$, a fixed point of the algorithm.
v) All fixed points of the associated orthogonal double-bracket algorithm are strictly unstable, except those $2^n$ points $U_\infty \in O(n)$ such that
$$U_\infty^T H_0 U_\infty = \Lambda,$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Such points $U_\infty$ are locally asymptotically stable with at least linear rate of convergence and $H_0 = U_\infty \Lambda U_\infty^T$ is an eigenspace decomposition of $H_0$.
Proof Part i) follows directly from the orthogonal nature of $e^{\tau_k[U_k^T H_0 U_k, N]}$. Note that in part ii) the definition of $\psi$ can be expressed in terms of the map $g(U) = U^T H_0 U$ from $O(n)$ to $M(H_0)$ and the double-bracket potential $\Phi(H) = \|H - N\|^2$ of (2.1.1), i.e.
$$\psi(U_k) = \Phi(g(U_k)).$$
Observe that $g(U_0) = U_0^T H_0 U_0$, and thus, $g(U_k)$ is the solution of the double-bracket algorithm with initial condition $U_0^T H_0 U_0$. As the step-size selection scheme $\tau_N$ is either (2.2.5) or (2.2.9), then $g(U_k)$ satisfies (2.1.7). This ensures that part ii) holds.
If $U_k$ is a fixed point of the associated orthogonal double-bracket algorithm with initial condition $U_0^T H_0 U_0$, then $g(U_k)$ is a fixed point of the double-bracket algorithm. Thus, from Proposition 2.1.5, $[g(U_k), N] = [U_k^T H_0 U_k, N] = 0$. Moreover, if $[U_k^T H_0 U_k, N] = 0$ for some given $k \in \mathbb{N}$, then by inspection $U_{k+l} = U_k$ for $l = 1, 2, \ldots$, and $U_k$ is a fixed point of the associated orthogonal double-bracket algorithm. From Lemma 2.1.6 it follows
that if $U$ is a fixed point of the algorithm then $U^T H_0 U = \pi^T \Lambda \pi$ for some permutation matrix $\pi$. By inspection any orthogonal matrix $W = U\pi S$, where $\pi$ is a permutation matrix and $S$ a sign matrix $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$, is also a fixed point of the recursion, and indeed, any two fixed points are related in this manner. A simple counting argument shows that there are exactly $2^n n!$ distinct matrices of this form.
To prove iv), note that since $g(U_k)$ is a solution to the double-bracket algorithm, it converges to a limit point $H_\infty \in M(H_0)$, $[H_\infty, N] = 0$ (Proposition 2.1.5). Thus $U_k$ must converge to the preimage set of $H_\infty$ via the map $g$. Condition 2.5.1 ensures that the set generated by the preimage of $H_\infty$ is a finite distinct set, any two elements $U_\infty^1$ and $U_\infty^2$ of which are related by $U_\infty^1 = U_\infty^2 S$, $S = \mathrm{diag}(\pm 1, \ldots, \pm 1)$. Convergence to a particular element of this preimage follows since $\tau_k[U_k^T H_0 U_k, N] \to 0$ as in Proposition 2.1.5.
To prove part v), observe that the dimension of $O(n)$ is the same as the dimension of $M(H_0)$, due to the genericity Condition 2.5.1. Thus $g$ is locally a diffeomorphism on $O(n)$, which forms an exact equivalence between the double-bracket algorithm and the associated orthogonal double-bracket algorithm. Restricting $g$ to a local region, the stability structure of equilibria is preserved under the map $g^{-1}$. Thus, all fixed points of the associated orthogonal double-bracket algorithm are locally unstable except those that map via $g$ to the unique locally asymptotically stable equilibrium of the double-bracket algorithm. Observe that due to the monotonicity of $\psi(U_k)$ a locally unstable equilibrium is also globally unstable.
The proof of the equivalent result for the singular value algorithm is completely analogous
to the above proof.
Theorem 2.5.4 Let $H_0 \in \mathbb{R}^{m \times n}$, where $m \ge n$, satisfy Condition 2.5.1. Let $N \in \mathbb{R}^{m \times n}$ satisfy Condition 2.4.1. Let the time-step $\tau_k$ be given by
$$\tau_k = \hat\tau(\hat H_k),$$
where $\hat\tau$ is either the constant step-size selection (2.2.5), or the variable step-size selection scheme (2.2.9), on $M(\hat H_0)$. The recursion
$$V_{k+1} = V_k\, e^{\tau_k\{U_k^T H_0^T V_k, N^T\}}, \qquad V_0 \in O(m),$$
$$U_{k+1} = U_k\, e^{\tau_k\{V_k^T H_0 U_k, N\}}, \qquad U_0 \in O(n),$$
referred to as the associated orthogonal singular value algorithm, has the following properties:
i) Let $(V_k, U_k)$ be a solution to the associated orthogonal singular value algorithm, then both $V_k$ and $U_k$ remain orthogonal.
ii) Let $\psi(V, U) = \|V^T H_0 U - N\|^2$ be a map from $O(m) \times O(n)$ to the set of non-negative reals $\mathbb{R}^+$, then $\psi(V_k, U_k)$ is strictly monotonically decreasing for every $k \in \mathbb{N}$ where $\{V_k^T H_0 U_k, N\} \ne 0$ and $\{U_k^T H_0^T V_k, N^T\} \ne 0$. Moreover, fixed points of the algorithm are characterised by matrix pairs $(V, U) \in O(m) \times O(n)$ such that
$$\{V^T H_0 U, N\} = 0 \quad \text{and} \quad \{U^T H_0^T V, N^T\} = 0.$$
iii) Let $(V_k, U_k)$, $k = 1, 2, \ldots$, be a solution to the associated orthogonal singular value algorithm, then $(V_k, U_k)$ converges to a pair of orthogonal matrices $(V_\infty, U_\infty)$, a fixed point of the algorithm.
iv) All fixed points of the associated orthogonal singular value algorithm are strictly unstable, except those points $(V_\infty, U_\infty) \in O(m) \times O(n)$ such that
$$V_\infty^T H_0 U_\infty = \Sigma',$$
where $\Sigma' = \mathrm{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{m \times n}$. Each such point $(V_\infty, U_\infty)$ is locally exponentially asymptotically stable and $H_0 = V_\infty \Sigma' U_\infty^T$ is a singular value decomposition of $H_0$.
2.6 Computational Considerations
There are several issues involved in the implementation of the double-bracket algorithm as a numerical tool which have not been dealt with in the body of this chapter. The design and implementation of efficient code has not been considered and would depend heavily on the nature of the hardware on which such a recursion would be run. As each iteration requires the calculation of a time-step, a matrix exponential and the $(k+1)$st estimate, it is likely that it would be best to consider applications in parallel processing environments. Certainly in a standard computational environment the exponential calculation would limit the possible areas of useful application of the algorithms proposed.
It is also possible to consider approximations of the double-bracket algorithm which have good computational properties. For example, consider a (1,1) Pad&eacute; approximation to the matrix exponential
$$e^{\tau_k[H_k, N]} \approx (2I_n + \tau_k[H_k, N])(2I_n - \tau_k[H_k, N])^{-1}.$$
Such an approach has the advantage that, as $[H_k, N]$ is skew symmetric, the Pad&eacute; approximation will be orthogonal, and will preserve the isospectral nature of the double-bracket algorithm. Similarly an $(n, n)$ Pad&eacute; approximation of the exponential for any $n$ will also be orthogonal. There are difficulties involved in obtaining direct step-size selection schemes based on Pad&eacute; approximate double-bracket algorithms. Trying to guarantee that the potential $\Phi$ is monotonically decreasing for such schemes by choosing step-size selection schemes based on the estimation techniques developed in Section 2.2 yields time-steps which are prohibitively small. A good heuristic choice of step-size selection scheme, however, can be made based on the selection schemes given in this chapter, and simulations indicate that the Pad&eacute; approximate double-bracket algorithm is viable when this is done.
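For skew-symmetric $X = [H_k, N]$ the (1,1) Pad&eacute; factor $(2I + \tau X)(2I - \tau X)^{-1}$ is the Cayley transform of $\tau X$ and is exactly orthogonal, so the approximate recursion remains isospectral even though no matrix exponential is ever formed. A sketch (illustrative only; the small constant step used below is an unvalidated heuristic, in the spirit of the discussion above):

```python
import numpy as np

def cayley_step(H, N, tau):
    # (1,1) Pade / Cayley approximation of the double-bracket step.
    # C is orthogonal for skew-symmetric X, so H -> C^T H C is isospectral.
    n = H.shape[0]
    X = H @ N - N @ H
    C = (2.0 * np.eye(n) + tau * X) @ np.linalg.inv(2.0 * np.eye(n) - tau * X)
    return C.T @ H @ C
```

Note that $2I - \tau X$ is always invertible, since the eigenvalues of a real skew-symmetric $X$ are purely imaginary.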
2.7 Open Questions and Further Work
One of the fundamental problems tackled in this chapter is the task of step-size selection. The best step-size selection scheme developed, (2.2.9), is unsatisfactory in several ways; it is not continuous at critical points of the cost function and it is computationally expensive to evaluate. A better general understanding of the step-size selection task would be desirable. In particular, it may be possible to develop line search techniques that are guaranteed to converge to the optimal step-size, obviating the need for approximations.
One of the primary motivations for studying the symmetric eigenvalue problem from a dy-
namical systems perspective is the potential for applications to on-line and adaptive processes.
It is instructive to consider how the double-bracket algorithm can be modified to deal with
time-varying data. Subsection 2.7.1 is by no means a comprehensive treatment of this issue,
nevertheless, it provides an indication of how such a task may be approached. To go beyond
the treatment of Subsection 2.7.1 it would be desirable to consider a particular application and
refine the algorithm to provide a useful numerical technique.
2.7.1 Time-Varying Double-Bracket Algorithms
Consider a sequence of 'input' matrices $A_k = A_k^T$ for which an estimate of the eigenvalues of $A_k$ at each time $k$ is required. One assumes that the spectrum of each $A_k$ is related, for example the sequence $A_k$ is slowly varying with $k$. If the sequence $A_k$ is a noisy observation
of some time-varying process or contains occasional large deviations then a sensible algorithm
for estimating the spectrum of Ak�1 would exploit the full data sequence A0� � � � � Ak along
with the new data Ak�1 to generate a new estimate. A gradient descent algorithm achieves this
in a fundamental manner since each new estimate is based on a small change in the previous
estimate which in turn is based on the data sequence up to time k. The issue of constraint
stability is of importance in such situations since the presence of small errors in the constraint
at each step may eventually lead the estimates to stray some distance from the true spectrum.
Given a symmetric matrix $H_0 = H_0^T$ and a diagonal matrix $N = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ satisfying Condition 2.1.4, consider the potential
$$\psi(U) = \|U^T H_0 U - N\|^2.$$
In Section 2.5 the relationship $U^T H_0 U \in M(H_0)$ was exploited to display the connections between the double-bracket algorithm and the associated orthogonal algorithm. However, it is also possible to rewrite the potential as
$$\psi(U) = \|U N U^T - H_0\|^2_F.$$
Similarly, the associated orthogonal algorithm itself can be rewritten with the matrix $N$ modified by an orthogonal congruency transformation,
$$U_{k+1} = U_k\, e^{\tau_k[U_k^T H_0 U_k, N]} = e^{\tau_k[H_0, U_k N U_k^T]}\, U_k.$$
The advantage of this formulation is the fact that the matrix $H_0$ appears explicitly in the algorithm. The time-varying associated orthogonal double-bracket algorithm is defined to be
$$U_{k+1} = e^{\tau_k[A_k, U_k N U_k^T]}\, U_k, \qquad U_0 = I. \qquad (2.7.1)$$
An estimate of the spectral decomposition of $A_{k+1}$ is given by $H_{k+1} = U_{k+1}^T A_{k+1} U_{k+1}$. Observe that the eigenvector decomposition of $A_{k+1}$ is derived from the data sequence up to time $k$ and is applied to $A_{k+1}$ to approximate a spectral decomposition $A_{k+1} = U_{k+1} H_{k+1} U_{k+1}^T$, where it is hoped that $H_{k+1}$ is nearly diagonal.
If $A_k \equiv H_0$ is constant it is easily seen that the time-varying associated orthogonal algorithm reduces to the standard associated orthogonal algorithm. Also each step of the time-varying algorithm will reduce the potential,
$$\|A_k - U_{k+1} N U_{k+1}^T\| \le \|A_k - U_k N U_k^T\|.$$
Thus, as long as the sequence of matrices $A_k$ does not vary too quickly with time, the proposed algorithm should converge to and track the spectral decomposition of $A_k$.
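The tracking behaviour described above can be sketched as follows. This is a toy illustration only, not the refined algorithm left as further work; the slowly rotating model for $A_k$ in the usage example and all function names are my own.

```python
import numpy as np

def expm_skew(S):
    # exact exponential of a skew-symmetric matrix via the Hermitian matrix 1j*S
    w, Q = np.linalg.eigh(1j * S)
    return (Q @ np.diag(np.exp(-1j * w)) @ Q.conj().T).real

def track_spectrum(A_seq, N, tau):
    # Time-varying recursion (2.7.1): U_{k+1} = exp(tau [A_k, U_k N U_k^T]) U_k.
    U = np.eye(N.shape[0])
    for A in A_seq:
        B = U @ N @ U.T
        U = expm_skew(tau * (A @ B - B @ A)) @ U
    return U
```

For a slowly rotating family $A_k = R_k^T D R_k$ the estimate $H_k = U_k^T A_k U_k$ stays nearly diagonal, its diagonal tracking the (constant) spectrum of $D$.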
Remark 2.7.1 A time-varying dual singular value decomposition algorithm is fully analogous
to the development given above. �
Remark 2.7.2 If the sequence $A_k$ is a stationary stochastic process it may be sensible to
replace the driving term $A_k$ in (2.7.1) by the running average $B_k = \frac{1}{n} \sum_{i=k-n+1}^{k} A_i$. $\Box$
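As a numerical sanity check, the following sketch runs (2.7.1) with a constant data matrix $A_k \equiv H_0$, where it should reduce to the associated orthogonal algorithm and drive $U_k^T H_0 U_k$ toward diagonal form. NumPy and SciPy are assumed; the matrix sizes, seed and the fixed step-size $\alpha_k = 0.02$ are illustrative choices only, not the step-size selection analysed in this chapter.

```python
import numpy as np
from scipy.linalg import expm

def assoc_orthogonal_step(U, A, N, alpha):
    """One step of (2.7.1): U_{k+1} = expm(alpha_k [A_k, U_k N U_k^T]) U_k."""
    H = U @ N @ U.T
    return expm(alpha * (A @ H - H @ A)) @ U

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([4.0, 3.0, 2.0, 1.0]) @ Q.T   # constant data matrix A_k = H0
N = np.diag([4.0, 3.0, 2.0, 1.0])             # diagonal target with distinct entries

U = np.eye(n)
pot0 = np.linalg.norm(U @ N @ U.T - A) ** 2   # initial potential ||U N U^T - A||^2
for _ in range(4000):
    U = assoc_orthogonal_step(U, A, N, alpha=0.02)

H = U.T @ A @ U                               # estimated spectral decomposition
offdiag = np.linalg.norm(H - np.diag(np.diag(H)))
pot1 = np.linalg.norm(U @ N @ U.T - A) ** 2
```

Each factor is a matrix exponential of a skew-symmetric matrix, so $U_k$ stays orthogonal by construction and the constraint is never violated by the iteration itself.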
Chapter 3
Gradient Algorithms for Principal
Component Analysis
The problem of principal component analysis of a symmetric matrix $N = N^T$ is that of finding
an eigenspace of specified dimension $p \geq 1$ which corresponds to the maximal $p$ eigenvalues of
N . There are a number of classical algorithms available for computing dominant eigenspaces
(principal components) of a symmetric matrix. A good reference for standard numerical
methods is Golub and Van Loan (1989).
There has been considerable interest in the last decade in using dynamical systems to solve
linear algebra problems (cf. the review (Chu 1988) and the recent monograph (Helmke &
Moore 1994b)). It is desirable to consider the relationship between such methods and classical
algebraic methods. For example, Deift et al. (1983) investigated a matrix differential equation
based on the Toda flow, the solution of which (evaluated at integer times) is exactly the sequence
of iterates generated by the standard QR algorithm. In general, dynamical system solutions of
linear algebra problems do not interpolate classical methods exactly. Discrete computational
methods based on dynamical system solutions to a given problem provide a way of comparing
classical algorithms with dynamical system methods. Recent work on developing numerical
methods based on dynamical systems insight is contained in Brockett (1993) and Moore et al.
(1994).
Concentrating on the problem of principal component analysis, Ammar and Martin (1986)
have studied the power method (for determining the dominant p-dimensional eigenspace of
a symmetric matrix) as a recursion on the Grassmannian manifold $G_p(\mathbb{R}^n)$, the set of all $p$-dimensional subspaces of $\mathbb{R}^n$. Using local coordinate charts on $G_p(\mathbb{R}^n)$, Ammar and Martin
(1986) show that the power method is closely related to the solution of a matrix Riccati
differential equation. Unfortunately, the solution to a matrix Riccati equation may diverge to
infinity in finite time. Such solutions correspond to solutions that do not remain in the original
local coordinate chart. Principal component analysis has also been studied by Oja (1982, 1989)
in relation to understanding the learning performance of a single layer neural network with n
inputs and p neurons. Oja’s analysis involves computing the limiting solution of an explicit
matrix differential equation evolving on $\mathbb{R}^{n \times p}$ (there is no requirement for local coordinate
representations). The evolution of the system corresponds to the ‘learning’ procedure of the
neural network while the columns of the limiting solution span the principal component of the
covariance matrix $N = E\{u_k u_k^T\}$ (where $E\{u_k u_k^T\}$ is the expectation of $u_k u_k^T$) of the vector
random process $u_k \in \mathbb{R}^n$, $k = 1, 2, \ldots$, with which the network was ‘trained’. Recent work by
Yan et al. (1994) has provided a rigorous analysis of the learning equation proposed by Oja.
Not surprisingly, it is seen that the solution to Oja’s learning equations is closely related to the
solution of a Riccati differential equation.
In this chapter I investigate the properties of Oja’s learning equation restricted to the Stiefel
manifold (the set of all $n \times p$ real matrices with orthonormal columns). Explicit proofs of
convergence for the flow are presented which extend the results of Yan et al. (1994) and Helmke
and Moore (1994b, pg. 26) so that no genericity assumption is required on the eigenvalues of $N$.
The homogeneous nature of the Stiefel manifold is exploited to develop an explicit numerical
method (a discrete-time system evolving on the Stiefel manifold) for principal component
analysis. The method proposed is a gradient descent algorithm modified to evolve explicitly
on $\mathrm{St}(p, n)$. A step-size must be selected for each iteration and a suitable selection scheme is
proposed. Proofs of convergence for the proposed algorithm are given as well as modifications
and observations aimed at reducing the computational cost of implementing the algorithm on
a digital computer. The discrete method proposed is similar to the classical power method and
steepest ascent methods for determining the dominant p-eigenspace of a matrix N . Indeed, in
the case where $p = 1$ (for a particular choice of time-step) the discretization is shown to be the
power method. When $p > 1$, however, there are subtle differences between the methods.
This chapter is based on the journal paper (Mahony et al. 1994). Applications of the same
ideas have also been considered in the field of linear programming (Mahony & Moore 1992).
The chapter is organised into five sections including the introduction. Section 3.1 reviews
the derivation of the continuous-time matrix differential equation considered and gives a general
proof of convergence. In Section 3.2 a discrete-time iteration based on the results in Section 3.1
is proposed along with a suitable choice of time-step. Section 3.3 considers two modifications of
the scheme to reduce the computational cost of implementing the proposed numerical algorithm.
Section 3.4 considers the relationship of the proposed algorithm to classical methods while
Section 3.5 indicates further possibilities arising from this development.
3.1 Continuous-Time Gradient Flow
In this section a dynamical systems solution to the problem of finding the principal component
of a matrix is developed. The approach is based on computing the gradient flow associated
with a generalised Rayleigh quotient function. The reader is referred to Warner (1983) for
technical details on Lie-groups and homogeneous spaces.
Let $N = N^T$ be a real symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ and an
associated set of orthonormal eigenvectors $v_1, \ldots, v_n$. A maximal $p$-dimensional eigenspace,
or maximal $p$-eigenspace, of $N$ is $\mathrm{sp}\{v_1, \ldots, v_p\}$, the subspace of $\mathbb{R}^n$ spanned by $\{v_1, \ldots, v_p\}$.
If $\lambda_p > \lambda_{p+1}$ then the maximal $p$-eigenspace of $N$ is unique. If $\lambda_p = \lambda_{p+1} = \cdots = \lambda_{p+r}$
for some $r > 0$, then any subspace $\mathrm{sp}\{v_1, \ldots, v_{p-1}, w\}$, where $w \in \mathrm{sp}\{v_p, v_{p+1}, \ldots, v_{p+r}\}$, is
a maximal $p$-eigenspace of $N$.
For $p$ an integer with $1 \leq p \leq n$, let
$\mathrm{St}(p, n) = \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\},$  (3.1.1)
where $I_p$ is the $p \times p$ identity matrix, denote the Stiefel manifold of real orthogonal $n \times p$ matrices.
For $X \in \mathrm{St}(p, n)$, the columns of $X$ are orthonormal basis vectors for a $p$-dimensional subspace
of $\mathbb{R}^n$.
Lemma 3.1.1 The Stiefel manifold $\mathrm{St}(p, n)$ is a smooth, compact, $np - \frac{1}{2}p(p+1)$-dimensional
submanifold of $\mathbb{R}^{n \times p}$. The tangent space of $\mathrm{St}(p, n)$ at a point $X$ is given by
$T_X \mathrm{St}(p, n) = \{\Omega X - X \Psi \mid \Omega \in \mathrm{Sk}(n),\ \Psi \in \mathrm{Sk}(p)\},$  (3.1.2)
where $\mathrm{Sk}(n)$, $\mathrm{Sk}(p)$ are the sets of $n \times n$ (respectively $p \times p$) skew symmetric matrices $A = -A^T$.
Proof It can be shown that $\mathrm{St}(p, n)$ is a regular^1 level set of the function $X \mapsto X^T X - I_p$
(Helmke & Moore 1994b, pg. 25). In this chapter, however, the homogeneous structure of
$\mathrm{St}(p, n)$ is important and it is best to introduce this structure here. Define $G = O(n) \times O(p)$
to be the topological product of the sets of $n \times n$ and $p \times p$ real orthogonal matrices, $O(n) =
\{U \in \mathbb{R}^{n \times n} \mid U^T U = U U^T = I_n\}$. Observe that $G$ is a compact Lie group (Helmke & Moore
1994b, pg. 348) with group multiplication given by matrix multiplication, $(U_1, V_1) \cdot (U_2, V_2) =
(U_1 U_2, V_1 V_2)$. Define a map $\sigma : G \times \mathrm{St}(p, n) \to \mathrm{St}(p, n)$,
$\sigma((U, V), X) := U X V^T.$  (3.1.3)
It is easily verified that $\sigma$ is a smooth, transitive group action on $\mathrm{St}(p, n)$. Since $G$ is compact
it follows that $\mathrm{St}(p, n)$ is a compact embedded submanifold of $\mathbb{R}^{n \times p}$ (Helmke & Moore 1994b,
pg. 352). The tangent space of $\mathrm{St}(p, n)$ at a point $X \in \mathrm{St}(p, n)$ is given by the image of the
linearization of $\sigma_X : G \to \mathrm{St}(p, n)$, $\sigma_X(U) := \sigma(U, X)$, at the identity element of $G$ (Gibson
1979, pg. 75). Recall that the tangent space of $O(n)$ at the identity is $T_{I_n} O(n) = \mathrm{Sk}(n)$
(Helmke & Moore 1994b, pg. 349) and consequently that the tangent space at the identity of
$G$ is $T_{(I_n, I_p)} G = \mathrm{Sk}(n) \times \mathrm{Sk}(p)$. Computing the linearization of $\sigma_X$ gives
$D\sigma_X|_{(I_n, I_p)}(\Omega, \Psi) = \Omega X - X \Psi,$
where $D\sigma_X|_{(I_n, I_p)}(\Omega, \Psi)$ is the Frechet derivative of $\sigma_X$ at $(I_n, I_p)$ in direction $(\Omega, \Psi) \in T_{(I_n, I_p)} G$.

1. Given a function $f : M \to N$ between two smooth manifolds, a regular point $p \in M$ is a point where the tangent map $T_p f : T_p M \to T_{f(p)} N$ is surjective. Given $q \in N$, let $U = \{p \in M \mid f(p) = q\}$; then $U$ is known as a regular level set if $T_p f$ is surjective for each $p \in U$. It can be shown (using the inverse function theorem) that regular level sets are embedded submanifolds of $M$ (Hirsch 1976, pg. 22).
The Euclidean inner product on $\mathbb{R}^{n \times n} \times \mathbb{R}^{p \times p}$ is
$\langle (\Omega_1, \Psi_1), (\Omega_2, \Psi_2) \rangle = \mathrm{tr}(\Omega_1^T \Omega_2) + \mathrm{tr}(\Psi_1^T \Psi_2).$
This induces a non-degenerate inner product on $T_{(I_n, I_p)} G$. Given $X \in \mathrm{St}(p, n)$, the
linearization $T_{(I_n, I_p)} \sigma_X$ decomposes the identity tangent space into
$T_{(I_n, I_p)} G = \ker T_{(I_n, I_p)} \sigma_X \oplus \mathrm{dom}\, T_{(I_n, I_p)} \sigma_X,$
where $\ker T_{(I_n, I_p)} \sigma_X$ is the kernel of $T_{(I_n, I_p)} \sigma_X$ and
$\mathrm{dom}\, T_{(I_n, I_p)} \sigma_X = \{(\Omega_1, \Psi_1) \in T_{(I_n, I_p)} G \mid \langle (\Omega_1, \Psi_1), (\Omega, \Psi) \rangle = 0,\ \forall (\Omega, \Psi) \in \ker T_{(I_n, I_p)} \sigma_X\}$
is the domain of $T_{(I_n, I_p)} \sigma_X$ (the subspace orthogonal to $\ker T_{(I_n, I_p)} \sigma_X$ with respect to the Euclidean
inner product provided on $T_{(I_n, I_p)} G$). By construction, $T_{(I_n, I_p)} \sigma_X$ restricts to a vector space
isomorphism $T^{\perp}_{(I_n, I_p)} \sigma_X$,
$T^{\perp}_{(I_n, I_p)} \sigma_X : \mathrm{dom}\, T_{(I_n, I_p)} \sigma_X \to T_X \mathrm{St}(p, n), \qquad T^{\perp}_{(I_n, I_p)} \sigma_X(\Omega, \Psi) := T_{(I_n, I_p)} \sigma_X(\Omega, \Psi).$
The normal Riemannian metric (cf. Section 5.3 or Helmke and Moore (1994b, pg. 52)) on
$\mathrm{St}(p, n)$ is the non-degenerate bilinear map on each tangent space
$\langle\langle \Omega_1 X - X \Psi_1,\ \Omega_2 X - X \Psi_2 \rangle\rangle = \mathrm{tr}((\Omega_1^{\perp})^T \Omega_2^{\perp}) + \mathrm{tr}((\Psi_1^{\perp})^T \Psi_2^{\perp}),$  (3.1.4)
where $\Omega_i X - X \Psi_i \in T_X \mathrm{St}(p, n)$ for $i = 1, 2$ and
$(\Omega_i, \Psi_i) = (\Omega_i^{\ker}, \Psi_i^{\ker}) + (\Omega_i^{\perp}, \Psi_i^{\perp})$
is the decomposition of $(\Omega_i, \Psi_i)$ into components in $\ker T_{(I_n, I_p)} \sigma_X$ and $\mathrm{dom}\, T_{(I_n, I_p)} \sigma_X$ respectively. It is easily verified that $\langle\langle \cdot, \cdot \rangle\rangle$ varies smoothly with $X$ and defines a Riemannian
metric.
Remark 3.1.2 It can be shown that for $p = 1$, $\mathrm{St}(1, n) = \{x \in \mathbb{R}^n \mid \|x\| = 1\} = S^{n-1}$, the
$(n-1)$-dimensional sphere in $\mathbb{R}^n$. The tangent space of $S^{n-1}$ is $T_x S^{n-1} = \{\xi \in \mathbb{R}^n \mid \xi^T x = 0\}$,
and the normal metric is $\langle\langle \xi, \eta \rangle\rangle = 2 \xi^T \eta$, for $\xi$, $\eta$ in $T_x S^{n-1}$ (Helmke & Moore 1994b, pg.
25). $\Box$
The classical Rayleigh quotient is the map $r_N : \mathbb{R}^n - \{0\} \to \mathbb{R}$,
$r_N(x) = \frac{x^T N x}{x^T x}.$
This is generalised to the Stiefel manifold $\mathrm{St}(p, n)$ as a function termed the generalised Rayleigh
quotient
$R_N : \mathrm{St}(p, n) \to \mathbb{R}, \qquad R_N(X) = \mathrm{tr}(X^T N X).$  (3.1.5)
The Ky Fan minimax principle states (Horn & Johnson 1985, pg. 191)
$\max_{X \in \mathrm{St}(p, n)} R_N(X) = \lambda_1 + \cdots + \lambda_p,$
$\min_{X \in \mathrm{St}(p, n)} R_N(X) = \lambda_{n+1-p} + \cdots + \lambda_n.$
Moreover, if $X \in \mathrm{St}(p, n)$ is such that $R_N(X) = \sum_{i=1}^{p} \lambda_i$, then the columns of $X$ will generate
a basis for a maximal $p$-dimensional eigenspace of $N$.
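The Ky Fan bounds are easy to check numerically. A minimal NumPy sketch (matrix size, subspace dimension and random seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2
M = rng.standard_normal((n, n))
N = (M + M.T) / 2
lam, V = np.linalg.eigh(N)                 # eigenvalues in ascending order

def R(X):
    # Generalised Rayleigh quotient (3.1.5)
    return np.trace(X.T @ N @ X)

X_max = V[:, -p:]                          # orthonormal basis of the maximal p-eigenspace
X_min = V[:, :p]                           # orthonormal basis of the minimal p-eigenspace
X_rand = np.linalg.qr(rng.standard_normal((n, p)))[0]   # arbitrary point of St(p, n)
```

The maximiser attains the sum of the $p$ largest eigenvalues, the minimiser the sum of the $p$ smallest, and every other point of the Stiefel manifold lies between the two bounds.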
Theorem 3.1.3 Given $N = N^T$ a real symmetric $n \times n$ matrix and $p$ an integer with
$1 \leq p \leq n$. Denote the eigenvalues of $N$ by $\lambda_1 > \cdots > \lambda_q$ with algebraic multiplicities
$n_1, \ldots, n_q$ such that $\sum_{i=1}^{q} n_i = n$. For $X \in \mathrm{St}(p, n)$, define the generalised Rayleigh quotient
$R_N : \mathrm{St}(p, n) \to \mathbb{R}$, $R_N(X) = \mathrm{tr}(X^T N X)$. Then:
i) The gradient of $R_N(X)$ on the Stiefel manifold $\mathrm{St}(p, n)$, with respect to the normal
Riemannian metric (3.1.4), is
$\mathrm{grad}\, R_N(X) = (I_n - X X^T) N X = -[X X^T, N] X.$  (3.1.6)
ii) The critical points of $R_N(X)$ on $\mathrm{St}(p, n)$ are characterised by
$[X X^T, N] = 0$
and correspond to points $X \in \mathrm{St}(p, n)$ such that the columns of $X$ span a $p$-dimensional
eigenspace of $N$.
iii) For all initial conditions $X_0 \in \mathrm{St}(p, n)$, the solution $X(t) \in \mathrm{St}(p, n)$ of
$\frac{d}{dt} X = \mathrm{grad}\, R_N(X) = (I_n - X X^T) N X, \qquad X(0) = X_0,$  (3.1.7)
exists for all $t \in \mathbb{R}$ and converges to some matrix $X_\infty \in \mathrm{St}(p, n)$ as $t \to \infty$. For almost
all initial conditions the solution $X(t)$ of (3.1.7) converges exponentially fast to a matrix
whose columns form a basis for the maximal $p$-eigenspace of $N$.
iv) When $p = 1$ the exact solution to (3.1.7) is given by
$x(t) = \frac{e^{tN} x_0}{\|e^{tN} x_0\|},$  (3.1.8)
where $x_0 \in S^{n-1} = \mathrm{St}(1, n)$.
Proof The gradient of $R_N$ is computed using the identities
(i) $D R_N|_X(\xi) = \langle\langle \mathrm{grad}\, R_N(X), \xi \rangle\rangle$ for all $\xi \in T_X \mathrm{St}(p, n)$,
(ii) $\mathrm{grad}\, R_N(X) \in T_X \mathrm{St}(p, n)$,
where $D R_N|_X(\xi)$ is the Frechet derivative of $R_N(X)$ in direction $\xi \in T_X \mathrm{St}(p, n)$ evaluated
at the point $X \in \mathrm{St}(p, n)$. Computing the Frechet derivative of $R_N$ in direction $\Omega X - X \Psi \in T_X \mathrm{St}(p, n)$ gives
$D R_N|_X(\Omega X - X \Psi) = 2 \mathrm{tr}(X^T N (\Omega X - X \Psi))$
$= 2 \mathrm{tr}(X X^T N \Omega) - 2 \mathrm{tr}(X^T N X \Psi).$
Observe that $\mathrm{tr}(X^T N X \Psi) = 0$ since $X^T N X$ is symmetric and $\Psi$ is skew symmetric. Similarly, only the skew symmetric part of $X X^T N$ contributes to $\mathrm{tr}(X X^T N \Omega)$. Thus,
$D R_N|_X(\Omega X - X \Psi) = \mathrm{tr}([X X^T, N] \Omega)$
$= \langle\langle [N, X X^T] X,\ \Omega X - X \Psi \rangle\rangle,$
using the Riemannian metric (3.1.4). The second line follows since any component of
$[N, X X^T]$ that lies in $\ker T_{(I_n, I_p)} \sigma_X$ does not contribute to the value of $\mathrm{tr}([X X^T, N] \Omega)$,
since one may choose $\Omega \in \mathrm{dom}\, T_{(I_n, I_p)} \sigma_X$, and of course $[N, X X^T] \in \mathrm{Sk}(n)$, which ensures
$[N, X X^T] X \in T_X \mathrm{St}(p, n)$. This proves part i).
At critical points of $R_N$ the gradient $\mathrm{grad}\, R_N(X)$ is zero, $[N, X X^T] X = 0$. Consider the
orthogonal change of coordinates, $U \in O(n)$,
$X' = U X = \begin{pmatrix} I_p \\ 0_{(n-p) \times p} \end{pmatrix} \quad \text{and} \quad N' = U N U^T = \begin{pmatrix} N'_{11} & N'_{12} \\ N'_{21} & N'_{22} \end{pmatrix},$
where $N'_{11} \in \mathbb{R}^{p \times p}$, $N'_{12} = (N'_{21})^T \in \mathbb{R}^{p \times (n-p)}$ and $N'_{22} \in \mathbb{R}^{(n-p) \times (n-p)}$. Observe that
$U [N, X X^T] X = [N', X'(X')^T] X' = \begin{pmatrix} 0 & -N'_{12} \\ N'_{21} & 0 \end{pmatrix} X' = \begin{pmatrix} 0 \\ N'_{21} \end{pmatrix} = 0,$
since $[N, X X^T] X = 0$. Consequently, $N'_{21} = 0$ and thus $[N', X'(X')^T] = 0$. It follows that
$[N, X X^T] = 0$. Observe that $\mathrm{grad}\, R_N(X) = 0 \iff N X = X (X^T N X)$, and the columns of
$X$ form a basis for a $p$-eigenspace of $N$.
Infinite time existence of solutions to (3.1.7) follows from the compact nature of $\mathrm{St}(p, n)$.
By applying LaSalle's invariance principle, it is easily verified that $X(t)$ converges to a level
set of $R_N$ on which $\mathrm{grad}\, R_N(X) = 0$. These sets are termed critical level sets and denoted
$L_r$.
Lemma 3.1.4 Given $N$ and $R_N$ as above. The critical level sets of $R_N$ in $\mathrm{St}(p, n)$ are the sets
$L_r = \{X \in \mathrm{St}(p, n) \mid R_N(X) = \textstyle\sum_{i=1}^{q} r_i \lambda_i,\ \mathrm{grad}\, R_N(X) = 0\},$
which are indexed by vectors $r = (r_1, r_2, \ldots, r_q)$, such that each $r_i$ is an integer between zero
and $n_i$ inclusive ($0 \leq r_i \leq n_i$) and the sum $\sum_{i=1}^{q} r_i = p$. For any $X \in L_r$, the columns
of $X$ span an eigenspace of $N$ associated with $r_i$ of the eigenvalues $\lambda_i$ for each $i = 1, \ldots, q$.
Each $L_r$ is an embedded submanifold of $\mathrm{St}(p, n)$. The tangent space of $L_r$ is given by
$T_X L_r = \{\Omega X - X \Psi \mid \Omega \in \mathrm{Sk}(n),\ \Psi \in \mathrm{Sk}(p),\ \text{and}\ [N, [\Omega, X X^T]] = 0\}.$  (3.1.9)
Proof The columns of a critical point $X \in \mathrm{St}(p, n)$ of $R_N$ span a $p$-eigenspace of $N$ and
thus the eigenvalues of $X^T N X$ are a subset of $p$ eigenvalues of $N$. Index this subset by
a vector $r = (r_1, r_2, \ldots, r_q)$, such that each $r_i$ is an integer between zero and $n_i$ inclusive
($0 \leq r_i \leq n_i$), the sum $\sum_{i=1}^{q} r_i = p$, and each $r_i$ represents the algebraic multiplicity of $\lambda_i$ as
an eigenvalue of $X^T N X$. It follows directly that $R_N(X) = \sum_{i=1}^{q} r_i \lambda_i$ and thus the collection
of sets $L_r$ are the critical level sets of $R_N$.
Since $N$ is symmetric there exists $U_\Delta \in O(n)$ such that $N = U_\Delta^T \Delta U_\Delta$ where
$\Delta = \begin{pmatrix} \lambda_1 I_{n_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_q I_{n_q} \end{pmatrix},$
with $I_{n_i}$ the $n_i \times n_i$ identity matrix. To show that the critical level sets $L_r$ are embedded
submanifolds of $\mathrm{St}(p, n)$ it is convenient to consider the problem where $N$ is replaced by $\Delta$
directly. In this case the critical level sets $L_r^\Delta$ of $R_\Delta$ on $\mathrm{St}(p, n)$ are exactly $L_r^\Delta = U_\Delta L_r$.
The map $X \mapsto U_\Delta X$ is a diffeomorphism of $\mathrm{St}(p, n)$ into itself which preserves submanifold
structure.
Let $H = O(n_1) \times O(n_2) \times \cdots \times O(n_q) \times O(p)$ and observe that $H$ is a compact Lie group.
Given an arbitrary index $r$, consider the map $\phi : H \times L_r^\Delta \to L_r^\Delta$,
$\phi((U_1, U_2, \ldots, U_q, V), X) = U X V^T, \quad \text{where} \quad U = \begin{pmatrix} U_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & U_q \end{pmatrix}.$
Observe that $U^T U = I_n$ and consequently $R_\Delta(\phi((U, V), X)) = R_\Delta(X)$. Moreover,
$\mathrm{grad}\, R_\Delta(\phi((U, V), X)) = [\Delta, U X X^T U^T] U X V^T = U [\Delta, X X^T] X V^T = 0,$
since it is assumed that $\mathrm{grad}\, R_\Delta(X) = [\Delta, X X^T] X = 0$. It follows that $\phi$ is a group action of $H$
on $L_r^\Delta$. If $X$ and $Y$ are both elements of $L_r^\Delta$ then $X^T \Delta X$ and $Y^T \Delta Y$ have the same eigenvalues
and are orthogonally similar, i.e. there exists $V \in O(p)$ such that $V^T X^T \Delta X V = Y^T \Delta Y$. By
inspection one can find $U = (U_1, U_2, \ldots, U_q)$ such that $U X V = Y$, which shows that $\phi$ is a
transitive group action on $L_r^\Delta$. It follows that $L_r^\Delta$ is itself a homogeneous space (with compact
Lie transformation group) and hence is an embedded submanifold of $\mathrm{St}(p, n)$ (Helmke &
Moore 1994b, pg. 352). Since $X \mapsto U_\Delta X$ is a diffeomorphism of $\mathrm{St}(p, n)$ into itself, this
shows that $L_r = U_\Delta^T L_r^\Delta$ is also an embedded submanifold of $\mathrm{St}(p, n)$.
Observe that any curve $Y(t) = U(t) X V^T(t)$, $Y(0) = X$, lying in $L_r$ will satisfy
$[N, Y(t) Y(t)^T] Y(t) = 0$. Similarly, it is easily verified that any curve (passing through
$Y(0) = X \in L_r$) satisfying this equality must lie in $L_r$. Thus, the tangent space $T_X L_r$ is
given by the equivalence classes of the derivatives at time $t = 0$ of curves $Y(t)$ such that
$[N, Y(t) Y(t)^T] Y(t) = 0$. Let $\dot{U}(0) = \Omega \in \mathrm{Sk}(n)$ and $\dot{V}(0) = \Psi \in \mathrm{Sk}(p)$; then
$\frac{d}{dt} [N, Y(t) Y(t)^T] Y(t) \Big|_{t=0} = [N, \Omega X X^T - X X^T \Omega] X + [N, X X^T](\Omega X - X \Psi)$
$= [N, [\Omega, X X^T]] X,$
since $[N, X X^T] = 0$ (cf. part ii) of Theorem 3.1.3). But this is just the definition (3.1.9) and the
result is proved.
Now at a critical point of $R_N$ the Hessian $\mathcal{H} R_N$ is a well defined bilinear map from
$T_X \mathrm{St}(p, n)$ to the reals (Helmke & Moore 1994b, pg. 344). Let $\Omega_1 X - X \Psi_1 \in T_X \mathrm{St}(p, n)$
and $\Omega_2 X - X \Psi_2 \in T_X \mathrm{St}(p, n)$ be arbitrary; then
$\mathcal{H} R_N(\Omega_1 X - X \Psi_1,\ \Omega_2 X - X \Psi_2) = D_{\Omega_1 X - X \Psi_1} (D_{\Omega_2 X - X \Psi_2} R_N)(X)$
$= D_{\Omega_1 X - X \Psi_1} \mathrm{tr}([X X^T, N] \Omega_2)$
$= \mathrm{tr}([[\Omega_1, X X^T], N] \Omega_2).$
Observe that $[\Omega_1, X X^T]$ is symmetric since $X X^T$ is symmetric and $\Omega_1$ is skew symmetric. Consequently, $[[\Omega_1, X X^T], N]$ is skew symmetric. Since $\Omega_1$ and $\Omega_2$ are arbitrary,
$\mathcal{H} R_N$ is degenerate in exactly those tangent directions $\Omega X - X \Psi \in T_X \mathrm{St}(p, n)$ for which
$[[\Omega, X X^T], N] = 0$. But this corresponds exactly to (3.1.9) and one concludes that the Hessian
$\mathcal{H} R_N$ degenerates only on the tangent space of $L_r$. It is now possible to apply Lemma 5.5.2 to
complete the proof of part iii).
Part iv) of the theorem is verified by explicitly evaluating the derivative of (3.1.8).
Remark 3.1.5 In the case $1 < p < n$ no exact solution to (3.1.7) is known; however, for $X(t)$
a solution to (3.1.7), the solution for $H(t) = X(t) X(t)^T$ is known, since
$\dot{H}(t) = \dot{X} X^T + X \dot{X}^T$
$= N X X^T + X X^T N - 2 X X^T N X X^T$
$= N H(t) + H(t) N - 2 H(t) N H(t),$  (3.1.10)
with $H(0) = X_0 X_0^T$, and this equation is a Riccati differential equation (Yan, Helmke & Moore
1994). $\Box$
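The rank-1 closed-form solution (3.1.8) can be evaluated through an eigendecomposition of $N$, and both claims of Theorem 3.1.3 iv) checked numerically: the curve satisfies the flow (3.1.7), and it converges to the dominant eigenvector. A sketch assuming NumPy (the test matrix with spectrum $1, \ldots, 5$ is an illustrative choice; the spectral shift below is only an overflow safeguard and cancels under normalisation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.arange(1.0, n + 1)                # known spectrum 1..5
N = Q @ np.diag(lam) @ Q.T
v_max = Q[:, -1]                           # eigenvector of the largest eigenvalue

def x_t(t, x0):
    # Exact solution (3.1.8); e^{tN} applied via the eigendecomposition of N
    w = np.exp(t * (lam - lam.max())) * (Q.T @ x0)
    z = Q @ w
    return z / np.linalg.norm(z)

x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)

# Finite-difference check that x(t) satisfies dx/dt = (I - x x^T) N x
t, h = 1.0, 1e-5
x = x_t(t, x0)
xdot_fd = (x_t(t + h, x0) - x_t(t - h, x0)) / (2 * h)
xdot = N @ x - (x @ N @ x) * x
```

For large $t$ the normalised curve aligns with the maximal eigenvector, reflecting the exponential convergence asserted in part iii).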
3.2 A Gradient Descent Algorithm
In this section a numerical algorithm for solving (3.1.7) is proposed. The algorithm is based
on a gradient descent algorithm modified to ensure that each iteration lies in St(p� n).
Let $X_0 \in \mathrm{St}(p, n)$ and consider the recursive algorithm generated by
$X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k,$  (3.2.1)
for a sequence of positive real numbers $\alpha_k$, termed time-steps. The algorithm generated by
(3.2.1) is referred to as the Rayleigh gradient algorithm. The Lie-bracket $[X_k X_k^T, N]$ is skew
symmetric and consequently $e^{-\alpha_k [X_k X_k^T, N]}$ is orthogonal and $X_{k+1} \in \mathrm{St}(p, n)$. Observe also
that
$\frac{d}{d\tau} e^{-\tau [X_k X_k^T, N]} X_k \Big|_{\tau = 0} = (I_n - X_k X_k^T) N X_k = \mathrm{grad}\, R_N(X_k),$
the gradient of $R_N$ at $X_k$. Thus, $e^{-\tau [X_k X_k^T, N]} X_k$ represents a curve in $\mathrm{St}(p, n)$, passing
through $X_k$ at time $\tau = 0$, and with first derivative equal to $\mathrm{grad}\, R_N(X_k)$. The linearization of
$X_{k+1}(\tau) = e^{-\tau [X_k X_k^T, N]} X_k$ around $\tau = 0$ is
$X_{k+1}(\tau) = X_k + \tau\, \mathrm{grad}\, R_N(X_k) + (\text{higher order terms}).$
The higher order terms modify the basic gradient descent algorithm on $\mathbb{R}^{n \times p}$ to ensure that the
interpolation occurs along curves in $\mathrm{St}(p, n)$. For suitably small time-steps $\alpha_k$, it is clear that
(3.2.1) will closely approximate the gradient descent algorithm on $\mathbb{R}^{n \times p}$.
To implement the Rayleigh gradient algorithm it is necessary to choose a time-step $\alpha_k$
for each step of the recursion. A convenient criterion for determining suitable time-steps is to
maximise the change in potential,
$\Delta R_N(X_k, \alpha_k) = R_N(X_{k+1}) - R_N(X_k).$  (3.2.2)
It is possible to use line search techniques to determine the optimal time-step for each iteration
of the algorithm. Completing a line search at each step of the iteration, however, is computationally expensive and often results in worse stability properties for the overall algorithm.
Instead, a simple deterministic formula for the time-step, based on maximising a lower bound
$\Delta R_N^l(X_k, \tau)$ for (3.2.2), is provided.

Lemma 3.2.1 For any $X_k \in \mathrm{St}(p, n)$ such that $\mathrm{grad}\, R_N(X_k) \neq 0$, the recursive estimate
$X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k$, where
$\alpha_k = \frac{\|[X_k X_k^T, N]\|^2}{2 \sqrt{p}\, \|N [X_k X_k^T, N]^2\|},$  (3.2.3)
satisfies $\Delta R_N(X_k, \alpha_k) = R_N(X_{k+1}) - R_N(X_k) > 0$.
Proof Denote $X_{k+1}(\tau) = e^{-\tau [X_k X_k^T, N]} X_k$ for an arbitrary time-step $\tau$. Direct calculations
show
$\Delta R_N'(X_k, \tau) = -2\, \mathrm{tr}(X_{k+1}(\tau)^T N [X_k X_k^T, N] X_{k+1}(\tau)),$
$\Delta R_N''(X_k, \tau) = -4\, \mathrm{tr}(X_{k+1}(\tau)^T N [X_k X_k^T, N]^2 X_{k+1}(\tau)).$
Taylor's formula for $\Delta R_N(X_k, \tau)$ gives
$\Delta R_N(X_k, \tau) = -2 \tau\, \mathrm{tr}(X_k^T N [X_k X_k^T, N] X_k)$
$\qquad - 4 \tau^2 \int_0^1 \mathrm{tr}(X_{k+1}(s\tau)^T N [X_k X_k^T, N]^2 X_{k+1}(s\tau))(1 - s)\, ds$
$\geq 2 \tau\, \|[X_k X_k^T, N]\|^2 - 4 \tau^2 \int_0^1 \|X_{k+1}(s\tau) X_{k+1}(s\tau)^T\|\, \|N [X_k X_k^T, N]^2\| (1 - s)\, ds$
$\geq 2 \tau\, \|[X_k X_k^T, N]\|^2 - 2 \tau^2 \sqrt{p}\, \|N [X_k X_k^T, N]^2\| =: \Delta R_N^l(X_k, \tau).$
The quadratic nature of $\Delta R_N^l(X_k, \tau)$ yields a unique maximum occurring at $\tau = \alpha_k$ given by
(3.2.3). Observe that if $\mathrm{grad}\, R_N(X_k) \neq 0$ then $\|[X_k X_k^T, N]\|^2 \neq 0$ and thus $\Delta R_N^l(X_k, \alpha_k) > 0$.
The result follows since $\Delta R_N(X_k, \alpha_k) \geq \Delta R_N^l(X_k, \alpha_k) > 0$.
Theorem 3.2.2 Given $N = N^T$ a real symmetric $n \times n$ matrix and $p$ an integer with
$1 \leq p \leq n$. Denote the eigenvalues of $N$ by $\lambda_1 \geq \cdots \geq \lambda_n$. For a given estimate
$X_k \in \mathrm{St}(p, n)$, let $\alpha_k$ be given by (3.2.3). The Rayleigh gradient algorithm
$X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k$
has the following properties.
i) The algorithm defines an iteration on $\mathrm{St}(p, n)$.
ii) Fixed points of the algorithm are critical points of $R_N$, i.e. points $X \in \mathrm{St}(p, n)$ such that
$[X X^T, N] = 0$. The columns of a fixed point of (3.2.1) form a basis for a $p$-dimensional
eigenspace of $N$.
iii) If $X_k$, for $k = 1, 2, \ldots$, is a solution to the algorithm, then the real sequence $R_N(X_k)$
is strictly monotonic increasing unless there is some $k \in \mathbb{N}$ with $X_k$ a fixed point of the
algorithm.
iv) Let $X_k$, for $k = 1, 2, \ldots$, be a solution to the algorithm; then $X_k$ converges to a critical
level set of $R_N$ on $\mathrm{St}(p, n)$.
v) All critical level sets of $R_N$ are unstable except the set on which the Rayleigh quotient
is maximised. The columns of an element of the maximal critical level set form a basis
for the maximal eigenspace of $N$.
Proof Part i) follows from the observation that $e^{-\alpha_k [X_k X_k^T, N]}$ is orthogonal. Part ii) is a direct
consequence of Lemma 3.2.1 (since $\Delta R_N(X_k, \alpha_k) = 0$ if and only if $X_k$ is a fixed point) and
Theorem 3.1.3. Part iii) also follows directly from Lemma 3.2.1.
To prove part iv) observe that since $\mathrm{St}(p, n)$ is a compact set, $R_N(X_k)$ is a bounded
monotonically increasing sequence which must converge. As a consequence $X_k$ converges
to some level set of $R_N$ such that for any $X$ in this set $\Delta R_N(X, \alpha(X)) = 0$. Lemma 3.2.1
ensures that any $X$ in this set is a fixed point of the recursion.
If $X$ is a fixed point of the recursion whose columns do not span the maximal $p$-dimensional
eigenspace of $N$, then it is clear that there exists an orthogonal matrix $U \in O(n)$, with $\|U - I_n\|$
arbitrarily small, such that $R_N(UX) > R_N(X)$. As a consequence, the initial condition
$X_0 = UX$ (with $\|X_0 - X\|$ small) will give rise to a sequence of matrices $X_k$ that diverges from the
level set containing $X$ (Lemma 3.2.1). This proves the first statement of v), while the attractive
nature of the remaining fixed points follows from LaSalle's principle of invariance along with
the Lyapunov function $V(X) = \sum_{i=1}^{p} \lambda_i - R_N(X)$.
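A minimal NumPy/SciPy implementation of the Rayleigh gradient algorithm (3.2.1) with the time-step (3.2.3) can be used to observe the ascent and convergence properties just proved. Frobenius norms are assumed for $\|\cdot\|$, and the test matrix with spectrum $1, \ldots, 6$ and the seed are illustrative choices:

```python
import numpy as np
from scipy.linalg import expm

def rayleigh_gradient_step(X, N):
    """One step of (3.2.1) with the time-step (3.2.3)."""
    p = X.shape[1]
    B = X @ X.T @ N - N @ X @ X.T                  # [X X^T, N], skew symmetric
    den = 2.0 * np.sqrt(p) * np.linalg.norm(N @ B @ B)
    if den == 0.0:                                 # critical point: fixed point of the algorithm
        return X
    alpha = np.linalg.norm(B) ** 2 / den
    return expm(-alpha * B) @ X

rng = np.random.default_rng(3)
n, p = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
N = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T       # known spectrum 1..6

X = np.linalg.qr(rng.standard_normal((n, p)))[0]   # X_0 on St(p, n)
values = [np.trace(X.T @ N @ X)]
for _ in range(1000):
    X = rayleigh_gradient_step(X, N)
    values.append(np.trace(X.T @ N @ X))
```

For a generic initial condition the sequence $R_N(X_k)$ increases monotonically toward $\lambda_1 + \lambda_2 = 11$, the Ky Fan maximum for this spectrum.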
Remark 3.2.3 It is difficult to characterise the exact basin of attraction for the set of matrices
whose columns span the maximal $p$-eigenspace of $N$. It is conjectured that the basin of attraction
for this set is all of $\mathrm{St}(p, n)$ except for the other critical points. $\Box$
Remark 3.2.4 For a fixed initial condition $X_0 \in \mathrm{St}(p, n)$ let $X_k$ be the solution to (3.2.1).
Define $H_k = X_k X_k^T$ and observe
$H_{k+1} = e^{-\alpha_k [H_k, N]} H_k e^{\alpha_k [H_k, N]}.$  (3.2.4)
Thus, $H_k$ can be written as a recursion on the set of symmetric rank $p$ projection matrices
$\{H \in \mathbb{R}^{n \times n} \mid H = H^T,\ H^2 = H,\ \mathrm{rank}\, H = p\}$. The algorithm generated in this manner is
known as the double-bracket algorithm (cf. Chapter 2), a discretization of the continuous-time
double-bracket equation (3.1.10). $\Box$
3.3 Computational Considerations
In this section two issues related to implementing (3.2.1) in a digital environment are discussed.
Results in both of the following subsections are aimed at reducing the computational cost associated with estimating the matrix exponential $e^{-\alpha_k [X_k X_k^T, N]}$, a transcendental $n \times n$ matrix
function. The result presented in Subsection 3.3.1 is also important in Section 3.4.
3.3.1 An Equivalent Formulation
To implement (3.2.1) on conventional computer architecture the main computational cost for
each step of the algorithm lies in computing the $n \times n$ matrix exponential $e^{-\alpha_k [X_k X_k^T, N]}$. The
following result provides an equivalent formulation of the algorithm which involves the related
$p \times p$ transcendental matrix functions “cos” and “sinc”.
Define the matrix function $\mathrm{sinc} : \mathbb{R}^{p \times p} \to \mathbb{R}^{p \times p}$ by the convergent infinite sum
$\mathrm{sinc}(A) = I_p - \frac{A^2}{3!} + \frac{A^4}{5!} - \frac{A^6}{7!} + \cdots.$
Observe that $A\, \mathrm{sinc}(A) = \sin(A)$ and thus, if $A$ is invertible, $\mathrm{sinc}(A) = A^{-1} \sin(A)$. Define
the matrix function $\cos(A)$ by the analogous power series expansion. The matrix functions $\cos$
and $\mathrm{sinc}$ are related by $\cos^2(A) = I_p - A^2\, \mathrm{sinc}^2(A)$.
Lemma 3.3.1 Given $N = N^T$ a real symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$,
let $\alpha_k$, for $k = 1, 2, \ldots$, be a sequence of real positive numbers. For $X_0 \in \mathrm{St}(p, n)$ an initial
condition that is not a critical point of $R_N(X)$, then
$X_{k+1} = e^{-\alpha_k [X_k X_k^T, N]} X_k$
$= X_k \left( \cos(\alpha_k Y_k) - \alpha_k X_k^T N X_k\, \mathrm{sinc}(\alpha_k Y_k) \right) + \alpha_k N X_k\, \mathrm{sinc}(\alpha_k Y_k),$  (3.3.1)
where the power series expansions for $\cos(\alpha_k Y_k)$ and $\mathrm{sinc}(\alpha_k Y_k)$ are determined by the positive
semi-definite matrix $Y_k^2 \in \mathbb{R}^{p \times p}$,
$Y_k^2 = X_k^T N^2 X_k - (X_k^T N X_k)^2.$  (3.3.2)
Remark 3.3.2 The matrix $Y_k$ need not be explicitly calculated as the power series expansions
of $\mathrm{sinc}$ and $\cos$ depend only on $Y_k^2$. $\Box$
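Lemma 3.3.1 can be checked numerically: the $n \times n$ exponential of (3.2.1) and the $p \times p$ formulation (3.3.1) should agree to machine precision. In the sketch below (NumPy/SciPy assumed; sizes, seed and $\alpha_k$ are illustrative) the functions $\cos$ and $\mathrm{sinc}$ are evaluated through an eigendecomposition of the positive semi-definite matrix $Y_k^2$; note that NumPy's `np.sinc(z)` is $\sin(\pi z)/(\pi z)$, hence the rescaling:

```python
import numpy as np
from scipy.linalg import expm

def step_via_pxp(X, N, alpha):
    """Rayleigh gradient step evaluated through the p x p functions of (3.3.1)."""
    S = X.T @ N @ X
    Y2 = X.T @ N @ N @ X - S @ S                   # (3.3.2), positive semi-definite
    w, V = np.linalg.eigh(Y2)
    y = np.sqrt(np.clip(w, 0.0, None))             # eigenvalues of Y_k
    cosY = (V * np.cos(alpha * y)) @ V.T
    # np.sinc(z) = sin(pi z)/(pi z), so sinc(alpha*y) = np.sinc(alpha*y/pi)
    sincY = (V * np.sinc(alpha * y / np.pi)) @ V.T
    return X @ (cosY - alpha * S @ sincY) + alpha * (N @ X) @ sincY

rng = np.random.default_rng(4)
n, p, alpha = 7, 3, 0.1
M = rng.standard_normal((n, n))
N = (M + M.T) / 2
X = np.linalg.qr(rng.standard_normal((n, p)))[0]

B = X @ X.T @ N - N @ X @ X.T
X_full = expm(-alpha * B) @ X                      # n x n exponential, as in (3.2.1)
X_small = step_via_pxp(X, N, alpha)
```

When $p \ll n$ this reduces the transcendental computation from an $n \times n$ to a $p \times p$ problem, which is the point of the lemma.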
Proof The proof follows from a power series expansion of $e^{-\alpha_k [X_k X_k^T, N]} X_k$,
$X_{k+1} = \left( \sum_{l=0}^{\infty} \frac{1}{l!} \left( -\alpha_k [X_k X_k^T, N] \right)^l \right) X_k.$  (3.3.3)
Simple algebraic manipulations lead to the relation
$[X_k X_k^T, N]^2 X_k = -X_k Y_k^2,$  (3.3.4)
where $Y_k^2$ is defined by (3.3.2). Pre-multiplying (3.3.4) by $-X_k^T$ provides an alternative formula
for $Y_k^2$,
$Y_k^2 = X_k^T [X_k X_k^T, N]^T [X_k X_k^T, N] X_k,$
which is positive semi-definite.
Using (3.3.4) it is possible to rewrite (3.3.3) as a power series in $(-Y_k^2)$,
$X_{k+1} = \sum_{m=0}^{\infty} \left( \frac{\alpha_k^{2m}}{(2m)!} X_k (-Y_k^2)^m - \frac{\alpha_k^{2m+1}}{(2m+1)!} \left( X_k (X_k^T N X_k) - N X_k \right) (-Y_k^2)^m \right),$  (3.3.5)
where the first and second terms in the summation follow from the even and odd powers
of $[X_k X_k^T, N]^l X_k$ respectively. Rewriting this as two separate power series in $(-Y_k^2)$,
$X_{k+1} = X_k \sum_{m=0}^{\infty} \frac{\alpha_k^{2m}}{(2m)!} (-Y_k^2)^m - \alpha_k \left( X_k (X_k^T N X_k) - N X_k \right) \sum_{m=0}^{\infty} \frac{\alpha_k^{2m}}{(2m+1)!} (-Y_k^2)^m$
$= X_k \cos(\alpha_k Y_k) - \alpha_k \left( X_k (X_k^T N X_k) - N X_k \right) \mathrm{sinc}(\alpha_k Y_k),$
and the result follows by rearranging terms.
3.3.2 Padé Approximations of the Exponential
It is also of interest to consider approximate methods for calculating matrix exponentials. In
particular, one is interested in methods that will not violate the constraint $X_{k+1} \in \mathrm{St}(p, n)$. A
standard approximation used for calculating the exponential function is a Padé approximation
of order $(n, m)$, where $n \geq 0$ and $m \geq 0$ are integers (Golub & Van Loan 1989, pg. 557). For
example, a (1,1) Padé approximation of the exponential is
$e^{-\alpha_k [X_k X_k^T, N]} \approx \left( I_n + \frac{\alpha_k}{2} [X_k X_k^T, N] \right)^{-1} \left( I_n - \frac{\alpha_k}{2} [X_k X_k^T, N] \right).$
A key observation is that when $n = m$ and the exponent is skew symmetric the resulting Padé
approximant is orthogonal. Thus,
$X_{k+1} = \left( I_n + \frac{\alpha_k}{2} [X_k X_k^T, N] \right)^{-1} \left( I_n - \frac{\alpha_k}{2} [X_k X_k^T, N] \right) X_k,$  (3.3.6)
with initial condition $X_0 \in \mathrm{St}(p, n)$, defines an iteration on $\mathrm{St}(p, n)$ which approximates the
Rayleigh gradient algorithm (3.2.1). Of course, in practice one would use an algorithm such as
Gaussian elimination (Golub & Van Loan 1989, pg. 92) to solve the linear system of equations
$\left( I_n + \frac{\alpha_k}{2} [X_k X_k^T, N] \right) X_{k+1} = \left( I_n - \frac{\alpha_k}{2} [X_k X_k^T, N] \right) X_k$
for $X_{k+1}$ rather than computing the inverse explicitly.
The algorithm defined by (3.3.6) can also be rewritten in a form similar to that obtained in
Lemma 3.3.1. Consider the power series expansion
$\left( I_n + \frac{\alpha_k}{2} [X_k X_k^T, N] \right)^{-1} = \sum_{i=0}^{\infty} \left( -\frac{\alpha_k}{2} [X_k X_k^T, N] \right)^i.$
From here it is easily shown that
$X_{k+1} = -X_k + \left( 2 X_k - \alpha_k \left( X_k (X_k^T N X_k) - N X_k \right) \right) \left( I_p + \frac{\alpha_k^2}{4} Y_k^2 \right)^{-1},$  (3.3.7)
where $Y_k^2 \in \mathbb{R}^{p \times p}$ is given by (3.3.2).
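A sketch of the (1,1) Padé (Cayley) iteration: (3.3.6) is computed by a linear solve, as recommended above, and compared against the $p \times p$ form (3.3.7). NumPy is assumed; sizes, seed and $\alpha_k$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, alpha = 6, 2, 0.2
M = rng.standard_normal((n, n))
N = (M + M.T) / 2
X = np.linalg.qr(rng.standard_normal((n, p)))[0]

B = X @ X.T @ N - N @ X @ X.T                      # [X X^T, N], skew symmetric

# (3.3.6) via a linear solve rather than an explicit inverse
X_cayley = np.linalg.solve(np.eye(n) + 0.5 * alpha * B,
                           (np.eye(n) - 0.5 * alpha * B) @ X)

# (3.3.7): the same iterate from p x p quantities only
S = X.T @ N @ X
Y2 = X.T @ N @ N @ X - S @ S
X_small = -X + (2 * X - alpha * (X @ S - N @ X)) @ np.linalg.inv(
    np.eye(p) + 0.25 * alpha ** 2 * Y2)
```

Because the exponent is skew symmetric, the Cayley factor is exactly orthogonal, so the approximate iterate stays on the Stiefel manifold even though the exponential itself is only approximated.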
3.4 Comparison with Classical Algorithms
In this section the relationship between the Rayleigh gradient algorithm (3.2.1) and some classi-
cal algorithms for determining the maximal eigenspace of a symmetric matrix are investigated.
A good discussion of the power method and the steepest ascent method for determining a single
maximal eigenvalue of a symmetric matrix is given by Faddeev and Faddeeva (1963). Practical
issues arising in implementing these algorithms along with direct generalizations to eigenspace
methods are covered by Golub and Van Loan (1989).
3.4.1 The Power Method
In this subsection the algorithm (3.2.1) in the case where $p = 1$ is considered. It is shown that
for a certain choice of time-step $\alpha_k$ the algorithm (3.2.1) is the classical power method.
Recall that $\mathrm{St}(1, n) = S^{n-1}$, the $(n-1)$-dimensional sphere in $\mathbb{R}^n$.
Theorem 3.4.1 Given $N = N^T$ a real symmetric $n \times n$ matrix with eigenvalues $\lambda_1 \geq \cdots \geq \lambda_n$.
For $x_k \in S^{n-1}$ let $\alpha_k$ be given by
$\alpha_k = \frac{y_k^2}{2\sqrt{2}\, \|N [x_k x_k^T, N]^2\|},$  (3.4.1)
where $y_k \in \mathbb{R}$ is given by
$y_k = \left( x_k^T N^2 x_k - (x_k^T N x_k)^2 \right)^{\frac{1}{2}}.$  (3.4.2)
For $x_0 \in \mathrm{St}(1, n) = S^{n-1}$ an arbitrary initial condition, then:
i) The formula
$x_{k+1} = e^{-\alpha_k [x_k x_k^T, N]} x_k$
defines a recursive algorithm on $S^{n-1}$.
ii) Fixed points of the rank-1 Rayleigh gradient algorithm are the critical points of $r_N$ on
$S^{n-1}$, and are exactly the eigenvectors of $N$.
iii) If $x_k$, for $k = 1, 2, \ldots$, is a solution to the Rayleigh gradient algorithm, then the real
sequence $r_N(x_k)$ is strictly monotonic increasing, unless $x_k$ is an eigenvector of $N$.
iv) For a given $x_k \in S^{n-1}$ which is not an eigenvector of $N$, then $y_k \neq 0$ and
$x_{k+1} = \left( \cos(\alpha_k y_k) - x_k^T N x_k \frac{\sin(\alpha_k y_k)}{y_k} \right) x_k + \frac{\sin(\alpha_k y_k)}{y_k} N x_k.$  (3.4.3)
v) Let $x_k$, for $k = 1, 2, \ldots$, be a solution to the rank-1 Rayleigh gradient algorithm; then $x_k$
converges to an eigenvector of $N$.
vi) All eigenvectors of $N$, considered as fixed points of (3.4.3), are unstable, except the
eigenvector corresponding to the maximal eigenvalue $\lambda_1$, which is exponentially stable.
Proof Parts i)-iii) follow directly from Theorem 3.2.2. To see part iv) observe that $y_k =
\|\mathrm{grad}\, r_N(x_k)\|$, and $y_k = 0$ if and only if $\mathrm{grad}\, r_N(x_k) = 0$ and $x_k$ is an eigenvector of $N$.
The recursive iteration (3.4.3) now follows directly from Lemma 3.3.1, with the substitution
$\mathrm{sinc}(\alpha_k y_k) = \frac{\sin(\alpha_k y_k)}{\alpha_k y_k}$. Parts v) and vi) again follow directly from Theorem 3.2.2.
Remark 3.4.2 Equation (3.4.3) involves only the vector computations $N x_k$, $x_k^T N x_k$ and
$(N x_k)^T (N x_k)$. This structure is especially of interest when sparse or structured matrices $N$ are
considered. $\Box$
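A sketch of the rank-1 iteration (3.4.3) in NumPy (test matrix, seed and iteration count are illustrative). For simplicity the step-size (3.4.1) is evaluated directly here, although it involves the matrix $[x_k x_k^T, N]$; in a sparse setting one would replace $\|N [x_k x_k^T, N]^2\|$ with a cheaper upper bound, which only shortens the step and so preserves ascent:

```python
import numpy as np

def rank1_step(x, N):
    """One rank-1 Rayleigh gradient step (3.4.3) with time-step (3.4.1)."""
    Nx = N @ x
    rho = x @ Nx                                  # x^T N x
    y2 = Nx @ Nx - rho ** 2                       # y_k^2 = ||grad r_N(x)||^2
    if y2 <= 1e-30:                               # numerically an eigenvector already
        return x
    y = np.sqrt(y2)
    B = np.outer(x, Nx) - np.outer(Nx, x)         # [x x^T, N]
    alpha = y2 / (2.0 * np.sqrt(2.0) * np.linalg.norm(N @ B @ B))
    c, s = np.cos(alpha * y), np.sin(alpha * y) / y
    return (c - rho * s) * x + s * Nx             # geodesic update (3.4.3)

rng = np.random.default_rng(6)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
N = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T      # known spectrum 1..5

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
for _ in range(500):
    x = rank1_step(x, N)
```

The update is a rotation in the plane $\mathrm{sp}\{x_k, N x_k\}$, so $\|x_k\| = 1$ is preserved exactly, and for a generic initial condition the iterates align with the dominant eigenvector.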
A geodesic (or great circle) on $S^{n-1}$, passing through $x$ at time $t = 0$, can be written
$\gamma(t) = \cos(t)\, x + \sin(t)\, V,$  (3.4.4)
where $V = \dot{\gamma}(0)$ is a unit vector orthogonal to $x$. Choosing $V(x_k) = \frac{\mathrm{grad}\, r_N(x_k)}{\|\mathrm{grad}\, r_N(x_k)\|}$, $x = x_k$,
and evaluating $\gamma(t)$ at time $t = \alpha_k \|\mathrm{grad}\, r_N(x_k)\|$ gives (3.4.3). Thus, (3.4.3) is a geodesic
interpolation of (3.1.8), the solution to the rank-1 Rayleigh gradient flow (3.1.7).
For a symmetric $n \times n$ matrix $N = N^T$ the classical power method is computed using the
recursive formula (Golub & Van Loan 1989, pg. 351)
$z_k = N x_k, \qquad x_{k+1} = \frac{z_k}{\|z_k\|}.$  (3.4.5)
The renormalisation operation is necessary if the algorithm is to be numerically stable. The
following lemma shows that for $N$ positive semi-definite and a particular choice of $\alpha_k$ the
rank-1 Rayleigh gradient algorithm (3.4.3) is exactly the power method (3.4.5).
Lemma 3.4.3 Given N = N^T a positive semi-definite n × n matrix. For x_k ∈ S^{n-1} (not an eigenvector of N) then ||grad r_N(x_k)|| ≤ ||N x_k||. Let α_k be given by

$$\alpha_k = \frac{1}{\|\mathrm{grad}\, r_N(x_k)\|} \sin^{-1}\!\left( \frac{\|\mathrm{grad}\, r_N(x_k)\|}{\|N x_k\|} \right) \tag{3.4.6}$$

where sin^{-1}(||grad r_N(x_k)||/||N x_k||) ∈ (0, π/2]. Then

$$\frac{N x_k}{\|N x_k\|} = \left( \cos(\alpha_k y_k) - x_k^T N x_k \, \frac{\sin(\alpha_k y_k)}{y_k} \right) x_k + \frac{\sin(\alpha_k y_k)}{y_k} N x_k,$$

where y_k is given by (3.4.2).

[Figure 3.4.1: The geometric relationship between the power method iterate and the iterate generated by (3.4.3); both lie in the intersection of the plane sp{x_k, N x_k} with S^{n-1}.]
Proof Observe that ||grad r_N(x_k)||² = y_k² = ||N x_k||² − (x_k^T N x_k)² ≥ 0 and thus ||grad r_N(x_k)|| ≤ ||N x_k||. Consider the 2-dimensional linear subspace sp{x_k, N x_k} of R^n. The new estimate x_{k+1} generated using either (3.4.3) or (3.4.5) will lie in sp{x_k, N x_k} (cf. Figure 3.4.1). Setting

$$\frac{N x_k}{\|N x_k\|} = \left( \cos(\alpha y_k) - x_k^T N x_k \, \frac{\sin(\alpha y_k)}{y_k} \right) x_k + \frac{\sin(\alpha y_k)}{y_k} N x_k$$

for α > 0, and observing that x_k and N x_k are linearly independent, then

$$\cos(\alpha y_k) - x_k^T N x_k \, \frac{\sin(\alpha y_k)}{y_k} = 0 \qquad \text{and} \qquad \frac{\sin(\alpha y_k)}{y_k} = \frac{1}{\|N x_k\|}.$$

Since N ≥ 0 is positive semi-definite a real solution to the first relation exists for which α y_k ∈ (0, π/2]. The time-step value (3.4.6) is now obtained by computing the smallest positive root of the second relation. □
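The content of Lemma 3.4.3 is easily checked numerically (a sanity-check sketch under the lemma's hypotheses, not part of the original text):

```python
import numpy as np

# Numerical check of Lemma 3.4.3: for N positive semi-definite and the
# time-step (3.4.6), the geodesic step (3.4.3) equals the power method
# iterate N x / ||N x||.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
N = M @ M.T                                    # positive semi-definite
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

Nx = N @ x
rho = x @ Nx
y = np.sqrt(Nx @ Nx - rho ** 2)                # y_k = ||grad r_N(x_k)||
alpha = np.arcsin(y / np.linalg.norm(Nx)) / y  # time-step (3.4.6)

s = np.sin(alpha * y) / y
x_geodesic = (np.cos(alpha * y) - rho * s) * x + s * Nx
x_power = Nx / np.linalg.norm(Nx)
assert np.allclose(x_geodesic, x_power)
```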
Choosing N > 0 positive definite in Lemma 3.4.3 ensures that (3.4.3) and (3.4.5) converge 'generically' to the same eigenvector. Conversely, if N is symmetric with eigenvalues λ_1 ≥ ··· ≥ λ_n, λ_n < 0 and |λ_n| > |λ_i| for i = 1, ..., n−1, then the power method will converge to the eigenvector associated with λ_n while (3.4.3) (equipped with time-step (3.4.1)) will converge to the eigenvector associated with λ_1. Nevertheless, one may still choose α_k using (3.4.6), with the inverse sin operation chosen to lie in the interval

$$\sin^{-1}\!\left( \frac{\|\mathrm{grad}\, r_N(x_k)\|}{\|N x_k\|} \right) \in [\pi/2, \pi),$$

such that (3.4.3) and (3.4.5) are equivalent. In this case the geodesics corresponding to each iteration of (3.4.3) describe great circles travelling almost from pole to pole of the sphere.
3.4.2 The Steepest Ascent Algorithm
The gradient ascent algorithm for the Rayleigh quotient r_N is the recursion (Faddeev & Faddeeva 1963, pg. 430)

$$z_k = x_k + s_k \, \mathrm{grad}\, r_N(x_k), \qquad x_{k+1} = \frac{z_k}{\|z_k\|}, \tag{3.4.7}$$

where s_k > 0 is a real number termed the step-size. It is easily verified that the (k+1)'th iterate of (3.4.7) will also lie in the 2-dimensional linear subspace sp{x_k, N x_k} of R^n. Indeed, for x_k not an eigenvector of N, (3.4.3) and (3.4.7) are equivalent when

$$s_k = \frac{\tan(\alpha_k y_k)}{y_k}. \tag{3.4.8}$$

The optimal step-size for the steepest ascent algorithm (i.e. r_N(x_{k+1}(s_k^{opt})) ≥ r_N(x_{k+1}(s_k)) for any s_k ∈ R) is (Faddeev & Faddeeva 1963, pg. 433)

$$s_k^{opt} = 2 \left[ \left( r_N(x_k) - r_N(\mathrm{grad}\, r_N(x_k)) \right) + \left\{ \left( r_N(x_k) - r_N(\mathrm{grad}\, r_N(x_k)) \right)^2 + 4 \, \|\mathrm{grad}\, r_N(x_k)\|^2 \right\}^{\frac{1}{2}} \right]^{-1}. \tag{3.4.9}$$
It follows directly that the optimal time-step selection for (3.4.3) is given by

$$\alpha_k = \frac{1}{y_k} \cos^{-1}\!\left( \frac{1}{\sqrt{1 + (s_k^{opt})^2 y_k^2}} \right).$$

Substituting directly into (3.4.3) and analytically computing the composition of cos and sin with cos^{-1} gives

$$x_{k+1} = \frac{1}{\sqrt{1 + (s_k^{opt})^2 y_k^2}} \left[ \left( 1 - s_k^{opt} \, x_k^T N x_k \right) x_k + s_k^{opt} \, N x_k \right] \tag{3.4.10}$$

with s_k^{opt} given by (3.4.9). This recursion provides an optimal steepest ascent algorithm with scaling factor 1/√(1 + (s_k^{opt})² y_k²), which converges to one as x_k converges to an eigenvector of N.
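The optimally-scaled recursion (3.4.9)-(3.4.10) translates directly into code (an illustrative sketch only; the function name is ours):

```python
import numpy as np

def optimal_ascent_step(N, x):
    """One optimally-scaled steepest ascent step, eqs. (3.4.9)-(3.4.10)."""
    Nx = N @ x
    rho = x @ Nx
    g = Nx - rho * x                       # grad r_N(x) at x on the sphere
    y2 = g @ g                             # y_k^2
    if y2 == 0.0:                          # x is an eigenvector of N
        return x
    delta = rho - (g @ N @ g) / y2         # r_N(x_k) - r_N(grad r_N(x_k))
    s_opt = 2.0 / (delta + np.sqrt(delta ** 2 + 4.0 * y2))   # (3.4.9)
    z = (1.0 - s_opt * rho) * x + s_opt * Nx                 # (3.4.10)
    return z / np.sqrt(1.0 + s_opt ** 2 * y2)
```

Since s_k^{opt} maximises r_N along the line x_k + s grad r_N(x_k), the returned iterate dominates any other choice of step-size.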
3.4.3 The Generalised Power Method
In both the power method and the steepest ascent algorithm the rescaling operation preserves the computational stability of the calculation. To generalise these classical methods to the case where p > 1 (i.e. X_k ∈ St(p, n)), one must decide on a procedure to renormalise new estimates to lie on St(p, n). Thus, a generalised power method may be written abstractly

$$Z_k = N X_k, \qquad X_{k+1} = \mathrm{rescale}(Z_k). \tag{3.4.11}$$

Since the span of the columns of X_k (denoted sp(X_k)) is the quantity in which one is interested, the rescaling operation is usually computed by generating an orthonormal basis for sp(Z_k) (e.g. using the Gram-Schmidt algorithm (Golub & Van Loan 1989, pg. 218)). Thus, X_{k+1} = Z_k G with X_{k+1}^T X_{k+1} = I_p, where G ∈ R^{p×p} contains the coefficients which orthonormalise the columns of Z_k. When Z_k is full rank then G is invertible and the factorization Z_k = X_{k+1} G^{-1} can be computed as a QR factorisation of Z_k (Golub & Van Loan 1989, pg. 211). The matrix G depends on the particular algorithm employed in computing an orthonormal basis for Z_k.
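With the QR factorisation as the rescaling operation, one step of (3.4.11) may be sketched as follows (an illustrative fragment; `numpy.linalg.qr` returns the factors of Z_k = QR, so X_{k+1} = Q and G = R^{-1}):

```python
import numpy as np

def generalized_power_step(N, X):
    """One step of the generalised power method (3.4.11), QR rescaling.

    X has orthonormal columns (a point of St(p, n)); for Z = N X the
    Q factor of Z = Q R is the rescaled update X_{k+1}.
    """
    Z = N @ X
    Q, _ = np.linalg.qr(Z)
    return Q
```

Iterating this map is classical subspace iteration; for N with a spectral gap it converges to a basis of the dominant p-dimensional invariant subspace.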
When N > 0 is positive definite the power method will act to maximise the generalised Rayleigh quotient R_N (3.1.5). Different choices of G in the rescaling operation, however, will affect the performance of the power method with respect to the relative change in R_N at each iteration. The optimal choice of G (for maximising the increase in the Rayleigh quotient) at the k'th step of (3.4.11) is given by a solution of the optimization problem

$$\max_{\{ G \in \mathbb{R}^{p \times p} \,\mid\, G^T Z_k^T Z_k G = I_p \}} \mathrm{tr}\!\left( G^T Z_k^T N Z_k G \right),$$

where Z_k = N X_k. The cost criterion tr(G^T Z_k^T N Z_k G) = R_{Z_k^T N Z_k}(G) is a Rayleigh quotient, while the constraint set is similar in structure to St(p, n). Indeed, it appears that this optimization problem is qualitatively the same as explicitly solving for the principal components of N.
One may still hope to obtain a result similar to Lemma 3.4.3 relating the generalised power method to the Rayleigh gradient algorithm (3.2.1). Unfortunately, this is not the case except in non-generic cases.
Lemma 3.4.4 Given N = N^T a symmetric n × n matrix. For any X_k ∈ St(p, n) let Y_k be the unique symmetric, positive semi-definite square root of Y_k² = X_k^T N² X_k − (X_k^T N X_k)². There exists a matrix G ∈ R^{p×p} and a scalar α_k > 0 such that

$$N X_k G = e^{-\alpha_k [X_k X_k^T, N]} X_k \tag{3.4.12}$$

if and only if one can solve

$$\sin^2(\alpha_k Y_k) \, X_k^T N X_k = \cos(\alpha_k Y_k) \sin(\alpha_k Y_k) \, Y_k \tag{3.4.13}$$

for α_k.
Proof Assume that there exists a matrix G and a scalar α_k > 0 such that (3.4.12) holds. Observe that rank(e^{−α_k [X_k X_k^T, N]} X_k) = p and thus rank(N X_k) = p. Similarly G ∈ R^{p×p} is non-singular.

Pre-multiplying (3.4.12) by G^T X_k^T N and using the constraint relation G^T X_k^T N² X_k G = I_p gives

$$I_p = G^T X_k^T N e^{-\alpha_k [X_k X_k^T, N]} X_k.$$

Since one need only consider the case where G is invertible it follows that

$$G^{-1} = X_k^T e^{\alpha_k [X_k X_k^T, N]} N X_k.$$

Lengthy matrix manipulations yield

$$X_k^T [X_k X_k^T, N]^{2l} N X_k = (-1)^l Y_k^{2l} X_k^T N X_k, \qquad l = 0, 1, \ldots,$$

and

$$X_k^T [X_k X_k^T, N]^{2l+1} N X_k = (-1)^l Y_k^{2l+2}, \qquad l = 0, 1, \ldots.$$

Expanding e^{α_k [X_k X_k^T, N]} as a power series and grouping terms in powers of Y_k² suitably (cf. Subsection 3.3.1), one obtains

$$G^{-1} = \cos(\alpha_k Y_k) \, X_k^T N X_k + \sin(\alpha_k Y_k) \, Y_k.$$

Using (3.3.1) for e^{−α_k [X_k X_k^T, N]} X_k, (3.4.12) becomes

$$N X_k = e^{-\alpha_k [X_k X_k^T, N]} X_k G^{-1} = \left( X_k \cos(\alpha_k Y_k) - \alpha_k [X_k X_k^T, N] X_k \, \mathrm{sinc}(\alpha_k Y_k) \right) \left( \cos(\alpha_k Y_k) \, X_k^T N X_k + \sin(\alpha_k Y_k) \, Y_k \right).$$

Pre-multiplying this by X_k^T yields

$$X_k^T N X_k = \cos^2(\alpha_k Y_k) \, X_k^T N X_k + \cos(\alpha_k Y_k) \sin(\alpha_k Y_k) \, Y_k,$$

and thus

$$\sin^2(\alpha_k Y_k) \, X_k^T N X_k = \cos(\alpha_k Y_k) \sin(\alpha_k Y_k) \, Y_k.$$

This shows that (3.4.12) implies (3.4.13). Conversely, if α_k solves (3.4.13) then defining G^{-1} = X_k^T e^{α_k [X_k X_k^T, N]} N X_k ensures that (3.4.12) also holds, which completes the proof. □
Writing Y_k = Σ_{i=1}^p λ_i y_i y_i^T, where {y_1, ..., y_p} is a set of orthonormal eigenvectors of Y_k whose eigenvalues are denoted λ_i ≥ 0 for i = 1, ..., p, then (3.4.13) becomes

$$\sum_{i=1}^p \sin^2(\alpha_k \lambda_i) \, y_i y_i^T X_k^T N X_k = \sum_{i=1}^p \lambda_i \cos(\alpha_k \lambda_i) \sin(\alpha_k \lambda_i) \, y_i y_i^T.$$

Fixing i and pre-multiplying by y_i^T while post-multiplying by y_i gives the following p equations for α_k:

$$\text{either} \quad \sin(\alpha_k \lambda_i) = 0 \quad \text{or} \quad \cot(\alpha_k \lambda_i) = \frac{1}{\lambda_i} \, y_i^T X_k^T N X_k y_i,$$

for i = 1, ..., p. It follows either from the first relation that α_k λ_i = mπ for some integer m, or from the second relation that

$$\lambda_i \cot(\alpha_k \lambda_i) = y_i^T X_k^T N X_k y_i,$$

for each i = 1, ..., p. One can easily confirm from this that the p equations will fail to have a consistent solution for arbitrary choices of X_k and N. Thus, generically the Rayleigh gradient algorithm (3.2.1) does not correspond to the generalised power method (3.4.11) for any choice of rescaling operation or time-step selection.
3.5 Open Questions and Further Work

There remains the issue of characterising the basin of attraction for the Rayleigh gradient algorithm. Simulations indicate that the only points not contained in this set are the non-minimal critical points of the generalised Rayleigh quotient; however, proving this is likely to be very difficult. Another area where further insight would be desirable is the implementation of the (1,1) Padé approximant algorithm (3.3.6). It seems likely that for the time-steps generated by (3.2.3) the (1,1) Padé approximant algorithm would inherit all the properties of the gradient descent algorithm. This appears to be the case in the simulation studies undertaken.
In the earlier comparison between the Rayleigh gradient algorithm and classical numerical
linear algebra algorithms no account was taken of the various inverse shift algorithms which
tend to be the accepted computational methods. Incorporating the idea of origin shifts into dynamical systems solutions of such linear algebra problems is an important question that has not yet been satisfactorily resolved.
In Subsection 3.4.1 it was shown that the rank-1 Rayleigh gradient algorithm is closely
related to the power method. Also related to the power method is an inverse shift algorithm
known as the Rayleigh iteration. Let N = N^T ∈ R^{n×n} be a symmetric matrix and x_k ∈ R^n be some vector which is not an eigenvector of N; then a single step of the Rayleigh iteration is

$$\rho_k = \frac{x_k^T N x_k}{x_k^T x_k}, \qquad z_{k+1} = (N - \rho_k I_n)^{-1} x_k, \qquad x_{k+1} = \frac{z_{k+1}}{\|z_{k+1}\|}.$$
The Rayleigh iteration converges cubically in a local neighbourhood of any eigenvector of N (Parlett 1974). By comparing the Rayleigh iteration to the power method and the Rayleigh gradient algorithm, one is led to consider an ordinary differential equation of the form

$$\dot{x} = (N - r_N(x) I_n)^{-1} x - \left( x^T (N - r_N(x) I_n)^{-1} x \right) x,$$

where r_N(x) = x^T N x/(x^T x) is the Rayleigh quotient. In the vicinity of an eigenvector of N this differential equation becomes singular and displays finite-time convergence to the eigenvector of N corresponding to the eigenvalue λ_{i*}, the smallest eigenvalue of N such that λ_{i*} ≥ r_N(x_0). The connection between singularly perturbed dynamical systems and shifted numerical linear algebra methods is of considerable interest. There is also a connection to the theory of differential/algebraic systems. For example, the ordinary differential equation mentioned above is equivalent to the differential/algebraic system

$$\dot{x} = z - (x^T z)\, x, \qquad 0 = x - (N - r_N(x) I_n)\, z.$$
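A minimal sketch of the Rayleigh iteration described above (illustrative only, not the thesis's implementation; the `try/except` crudely guards the singular case where the shift hits an eigenvalue exactly):

```python
import numpy as np

def rayleigh_iteration(N, x, steps=10):
    """Rayleigh quotient iteration: shift by r_N(x_k), invert, renormalise."""
    n = len(x)
    for _ in range(steps):
        rho = (x @ N @ x) / (x @ x)
        try:
            z = np.linalg.solve(N - rho * np.eye(n), x)
        except np.linalg.LinAlgError:
            break                        # shift equals an eigenvalue exactly
        x = z / np.linalg.norm(z)
    return x, (x @ N @ x) / (x @ x)
```

Because of the cubic local convergence, a handful of steps typically suffices (Parlett 1974).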
Chapter 4
Pole Placement for Symmetric
Realisations
A classical problem in systems theory is that of pole placement or eigenvalue assignment of linear systems via constant gain output feedback. This is clearly a difficult task and despite a number of important results (cf. Byrnes (1989) for an excellent survey), a complete solution giving necessary and sufficient conditions for a solution to exist has not been developed. It has recently been shown that (strictly proper) linear systems with mp > n can be assigned arbitrary poles using real output feedback (Wang 1992). Here n denotes the McMillan degree of a system having m inputs and p outputs. Of course, if mp < n for a given linear system then generic pole assignment is impossible, even when a complex feedback gain is allowed (Hermann & Martin 1977). The case mp = n remains unresolved, though a number of interesting results are available (Hermann & Martin 1977, Willems & Hesselink 1978, Brockett & Byrnes 1981). Present results do not apply to output feedback systems with symmetries or structured feedback systems. More generally, one is also interested in situations where an optimal feedback gain is sought such that the closed loop response of the system is a best approximation of a desired response, though the exact response may be unobtainable. In such cases one would still hope to find a constructive method to compute the optimal feedback that achieves the best approximation. The problem appears to be too difficult to tackle directly, however, and algorithmic solutions are an attractive alternative.
The development given in this chapter is loosely related to a number of recent articles. In
particular, Brockett (1989a) considers a least squares matching task, motivated by problems
in computer vision algorithms, that is related to the system approximation problem, though
his work does not include the effects of feedback. There is also an article by Chu (1992) in
which dynamical system methods are developed for solving inverse singular value problems,
a topic that is closely related to the pole placement question. The simultaneous multiple
system assignment problem considered is a generalisation of the single system problem and is
reminiscent of Chu’s (1991a) approach to simultaneous reduction of several real matrices.
In this chapter I consider a structured class of systems (those with symmetric state space
realisations) for which, to my knowledge, no previous pole placement results are available.
The assumption of symmetry of the realisation, besides having a natural network theoretic
interpretation, simplifies the geometric analysis considerably. It is shown that a symmetric
state space realisation can be assigned arbitrary (real) poles via output feedback if and only if
there are at least as many system inputs as states. This result is surprising since a naive counting
argument (comparing the number of free variables, ½m(m+1), of a symmetric output feedback gain to the number of poles, n, of a symmetric realization having m inputs and n states) would suggest that ½m(m+1) ≥ n is sufficient for pole placement. To investigate the problem further
gradient flows of least squares cost criteria (functions of the matrix entries of realisations) are
derived on smooth manifolds of output feedback equivalent symmetric realisations. Limiting
solutions to these flows occur at minima of the cost criteria and relate directly to finding optimal
feedback gains for system assignment and pole placement problems. Cost criteria are proposed
for solving the tasks of system assignment, pole placement, and simultaneous multiple system
assignment.
The material presented in this chapter is based on the articles (Mahony & Helmke 1993,
Mahony et al. 1993). The theoretical material contained in Sections 4.1 to 4.4 along with the
simulations in Section 4.5 are based on the journal paper (Mahony & Helmke 1993) while the
numerical method proposed in Section 4.6 was presented at the 1993 Conference on Decision
and Control (Mahony et al. 1993). Much of the material presented in this chapter was developed
in conjunction with the results in the monograph (Helmke & Moore 1994b, Section 5.3), which
focusses on general linear systems.
The chapter is divided into seven sections. In Section 4.1 the specific problems considered
in the sequel are formulated and necessary conditions for generic pole placement and system
assignment are given. Section 4.2 develops the geometry of the set of symmetric state space
systems necessary for the development in later sections. In Section 4.3, a dynamical systems
approach to computing systems assignment problems for the class of symmetric state space
realizations is proposed while Section 4.4 applies the previous results to the pole placement and
the simultaneous multiple system assignment problems. A number of numerical investigations
are given in Section 4.5 which substantiate the theory presented in Sections 4.1 to 4.4. In Section
4.6 a numerical algorithm for computing feedback gains for the pole placement problem is
presented. The chapter concludes with a discussion of open questions and future work in
Section 4.7.
4.1 Statement of the Problem
In this section a brief review of symmetric systems is presented before the precise formulations
of the problems considered in the sequel are given and a pole placement result for symmetric
state space realizations is proved. The reader is referred to Anderson and Vongpanitlerd (1973)
for background material on network theory.
A symmetric transfer function is an m × m proper rational matrix function G(s) such that

$$G(s) = G(s)^T.$$

For any such transfer function there exists a minimal signature symmetric realisation (Anderson & Vongpanitlerd 1973, pg. 324)

$$\dot{x} = Ax + Bu, \qquad y = Cx,$$

of G(s) such that (A I_{pq})^T = A I_{pq} and C^T = I_{pq} B, with I_{pq} = diag(I_p, −I_q), a diagonal matrix with its first p diagonal entries 1 and the remaining q diagonal entries −1. A signature symmetric realisation is a dynamical model of an electrical network constructed from p capacitors and q inductors and any number of resistors.
Static linear symmetric output feedback is introduced to a state space model via a feedback law

$$u = Ky + v, \qquad K = K^T,$$

leading to the "closed loop" system

$$\dot{x} = (A + BKC)x + Bv, \qquad y = Cx. \tag{4.1.1}$$

In particular, symmetric output feedback, where K = K^T ∈ R^{m×m}, preserves the structure of signature symmetric realisations and is the only output feedback transformation that has this property.
A symmetric state space system (also symmetric realisation) is a linear dynamical system

$$\dot{x} = Ax + Bu, \qquad A = A^T, \tag{4.1.2}$$
$$y = B^T x, \tag{4.1.3}$$

with x ∈ R^n, u, y ∈ R^m, A ∈ R^{n×n}, B ∈ R^{n×m}. Without loss of generality assume that m ≤ n, B is full rank and B^T B = I_m, the m × m identity matrix. Symmetric state space systems correspond to linear models of electrical RC-networks, constructed entirely of capacitors and resistors. The networks are characterised by the property that the Cauchy-Maslov¹ index coincides with the McMillan degree. The matrix pair (A, B) ∈ S(n) × O(n, m), where S(n) = {X ∈ R^{n×n} | X = X^T} is the set of symmetric n × n matrices and O(n, m) = {Y ∈ R^{n×m} | Y^T Y = I_m}, is used to represent a linear system of the form (4.1.2)-(4.1.3). The set O(n, m) is the Stiefel manifold (a smooth nm − ½m(m+1) dimensional submanifold of R^{n×m}) of n × m matrices with orthonormal columns (cf. Lemma 3.1.1).
Two symmetric state space systems (A_1, B_1) and (A_2, B_2) are called output feedback equivalent if

$$(A_2, B_2) = \left( \Theta (A_1 + B_1 K B_1^T) \Theta^T, \; \Theta B_1 \right) \tag{4.1.4}$$

holds for Θ ∈ O(n) = {U ∈ R^{n×n} | U^T U = I_n}, the set of n × n orthogonal matrices, and K ∈ S(m), the set of symmetric m × m matrices. Thus the system (A_2, B_2) is obtained from (A_1, B_1) using an orthogonal change of basis Θ ∈ O(n) in the state space R^n and a symmetric feedback transformation K ∈ S(m). It is easily verified that output feedback equivalence is an equivalence relation on the set of symmetric state space systems.

¹ The Cauchy-Maslov index of a real rational transfer function z(s) is defined as the number of jumps of z(s) from −∞ to +∞ less the number of jumps from +∞ to −∞. Bitmead and Anderson (1977) generalise the Cauchy-Maslov index to real symmetric rational matrix transfer functions and show that it is equal to p − q, the signature of I_{pq} (Bitmead & Anderson 1977, Corollary 3.3).
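A quick numerical illustration (not from the thesis) that the transformation (4.1.4) indeed maps the class of symmetric state space systems to itself:

```python
import numpy as np

# The output feedback action (4.1.4) preserves the class of symmetric
# state space systems: A_2 stays symmetric, B_2 keeps orthonormal columns.
rng = np.random.default_rng(5)
n, m = 5, 2
A = rng.standard_normal((n, n)); A = A + A.T            # A in S(n)
B, _ = np.linalg.qr(rng.standard_normal((n, m)))        # B in O(n, m)
Theta, _ = np.linalg.qr(rng.standard_normal((n, n)))    # Theta in O(n)
K = rng.standard_normal((m, m)); K = K + K.T            # K in S(m)

A2 = Theta @ (A + B @ K @ B.T) @ Theta.T
B2 = Theta @ B
assert np.allclose(A2, A2.T)
assert np.allclose(B2.T @ B2, np.eye(m))
```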
Consider the following problem for the class of symmetric state space systems.

Problem A Given a symmetric state space system (A, B) ∈ S(n) × O(n, m), let (F, G) ∈ S(n) × O(n, m) be a symmetric state space system which possesses the desired system structure. Consider the potential

$$\Phi : \mathbb{R}^{n \times n} \times O(n, m) \to \mathbb{R}, \qquad \Phi(A, B) := \|A - F\|^2 + 2\|B - G\|^2,$$

where ||X||² = tr(X^T X) is the squared Frobenius matrix norm. Find a symmetric state space system (A_min, B_min) which minimises Φ over the set of all systems output feedback equivalent to (A, B). Equivalently, find a pair of matrices (Θ_min, K_min) ∈ O(n) × S(m) such that

$$\psi(\Theta, K) := \|\Theta (A + BKB^T) \Theta^T - F\|^2 + 2\|\Theta B - G\|^2$$

is minimised over O(n) × S(m). □
Such a formulation is of particular interest when structural properties of the desired realisations are specified. For example, one may wish to choose the "target system" (F, G) with certain structural zeros. If an exact solution to the system assignment problem exists (i.e. Φ(A_min, B_min) = 0) it is easily seen that (A_min, B_min) will have the same structural zeros as (F, G). For general linear systems it is known that the system assignment problem (for general feedback) is generically solvable only if there are as many inputs and as many outputs as states. It is not surprising that this is the case for symmetric systems also.
Lemma 4.1.1 Let n and m be integers, n ≥ m, and let (F, G) ∈ S(n) × O(n, m). Consider matrix pairs (A, B) ∈ S(n) × O(n, m).

a) If m = n then for any matrix pair (A, B) of the above form, there exist matrices Θ ∈ O(n) and K ∈ S(m) such that

$$\Theta (A + BKB^T) \Theta^T = F, \qquad \Theta B = G.$$

b) If m < n then the set of (A, B) ∈ S(n) × O(n, m) for which an exact solution to the system assignment problem exists has measure zero in S(n) × O(n, m). (I.e. for almost all systems (A, B) ∈ S(n) × O(n, m) no exact solution to the system assignment problem exists.)

Proof If m = n then O(n, m) = O(n) and B^T = B^{-1}. For any (A, B) ∈ S(n) × O(n) choose (Θ, K) = (G B^T, G^T F G − B^T A B). Thus,

$$\Theta (A + BKB^T) \Theta^T = G B^T A B G^T + G B^T B \left( G^T F G - B^T A B \right) B^T B G^T = F,$$

and Θ B = G B^T B = G.
To prove part b) observe that since output feedback equivalence is an equivalence relation, the set of systems for which the system assignment problem is solvable is exactly the set of systems which are output feedback equivalent to (F, G). Consider the set

$$\mathcal{F}(F, G) = \left\{ \left( \Theta (F + G K G^T) \Theta^T, \; \Theta G \right) \;\middle|\; (\Theta, K) \in O(n) \times S(m) \right\} \subset S(n) \times O(n, m).$$

It is shown in Section 4.2 (Lemma 4.2.1) that F(F, G) is a smooth submanifold of S(n) × O(n, m). But F(F, G) is the image of O(n) × S(m) via the continuous map (Θ, K) ↦ (Θ(F + G K G^T)Θ^T, ΘG) and necessarily has dimension at most dim O(n) × S(m) = ½n(n−1) + ½m(m+1). The dimension of S(n) × O(n, m), however, is ½n(n+1) + (nm − ½m(m+1)) (Helmke & Moore 1994b, pg. 24). Thus,

$$\dim S(n) \times O(n, m) - \dim O(n) \times S(m) = (n - m)(m + 1),$$

which is strictly positive for 0 < m < n. Thus, for m < n the set F(F, G) is a submanifold of S(n) × O(n, m) of non-zero co-dimension and therefore has measure zero. □
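The explicit construction used in part a) is easy to verify numerically (a sanity-check sketch, not part of the thesis, for the case m = n):

```python
import numpy as np

# Lemma 4.1.1 a): for m = n, Theta = G B^T and K = G^T F G - B^T A B
# achieve exact system assignment.
rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T
F = rng.standard_normal((n, n)); F = F + F.T
B, _ = np.linalg.qr(rng.standard_normal((n, n)))   # O(n, n) = O(n)
G, _ = np.linalg.qr(rng.standard_normal((n, n)))

Theta = G @ B.T
K = G.T @ F @ G - B.T @ A @ B
assert np.allclose(Theta @ (A + B @ K @ B.T) @ Theta.T, F)
assert np.allclose(Theta @ B, G)
```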
A similar task to Problem A is that of pole placement for symmetric state space realizations. The pole placement task for symmetric systems is: given an arbitrary set of numbers s_1 ≤ ··· ≤ s_n in R and an initial m × m symmetric transfer function G(s) = G^T(s) with a symmetric realisation, find a symmetric matrix K ∈ S(m) such that the poles of

$$G_K(s) = (I_m - G(s) K)^{-1} G(s)$$

are exactly s_1, ..., s_n. Rather than tackle this problem directly, consider the following variant of the problem.

Problem B Given a symmetric state space system (A, B) ∈ S(n) × O(n, m), let F ∈ S(n) be a symmetric matrix. Define

$$\Phi(A, B) := \|A - F\|^2, \qquad \psi(\Theta, K) := \|\Theta (A + BKB^T) \Theta^T - F\|^2.$$

Find a symmetric state space system (A_min, B_min) which minimises Φ over the set of all systems output feedback equivalent to (A, B). Respectively, find a pair of matrices (Θ_min, K_min) ∈ O(n) × S(m) which minimises ψ over O(n) × S(m). □
Problem B minimises a cost criterion that assigns the full eigenstructure of the closed loop
system. Two symmetric matrices have the same eigenstructure (up to orthogonal similarity
transformation) if and only if they have the same eigenvalues (since any symmetric matrix may
be diagonalised via an orthogonal similarity transformation). Thus, Problem B is equivalent
to solving the pole placement problem for symmetric systems (assigning the eigenvalues of
the closed loop system). The advantage of considering Problem B rather than a standard
formulation of the pole placement task lies in the smooth nature of the optimization problem
obtained.
It is of interest to consider generic conditions on symmetric state space systems for the existence of an exact solution to Problem B (i.e. the existence of (Θ_min, K_min) such that ψ(Θ_min, K_min) = 0). This is exactly the classical pole placement question, about which much is known for general linear systems (Byrnes 1989, Wang 1992). The following result answers (at least in part) this question for symmetric state space systems. It is interesting to note that the necessary conditions for "generic" pole placement for symmetric state space systems are much stronger than those for general linear systems.
Lemma 4.1.2 Let n and m be integers, n ≥ m, and let F ∈ S(n) be a real symmetric matrix. Consider matrix pairs (A, B) ∈ S(n) × O(n, m).

a) If m = n then for any matrix pair (A, B) of the above form, there exist matrices Θ ∈ O(n) and K = K^T ∈ R^{m×m} such that

$$\Theta (A + BKB^T) \Theta^T = F. \tag{4.1.5}$$

b) If m < n then there exists an open set of matrix pairs (A, B) ∈ S(n) × O(n, m) of the above form such that eigenstructure assignment (to the matrix F) is impossible.
Proof Part a) follows directly from Lemma 4.1.1.

Observe that the set of matrix pairs {(A, B) | A = B B^T A B B^T} ⊊ S(n) × O(n, m) is Zariski closed in S(n) × O(n, m) and consequently of measure zero (cf. Martin and Hermann (1977a) for a basic discussion of Zariski closed sets). There exists a matrix pair (A, B) ∈ S(n) × O(n, m) and matrices Θ ∈ O(n) and K = K^T ∈ R^{m×m} such that (4.1.5) is satisfied and A ≠ B B^T A B B^T, or else part b) is trivially true. Direct manipulations of (4.1.5), recalling that B^T B = I_m, yield

$$K = B^T \left( \Theta^T F \Theta - A \right) B.$$

Substituting this back into (4.1.5) gives

$$\Theta^T F \Theta = \left( A - B B^T A B B^T \right) + B B^T \Theta^T F \Theta B B^T.$$

Observe that

$$\mathrm{tr}\!\left( (A - B B^T A B B^T)^T \, B B^T \Theta^T F \Theta B B^T \right) = \mathrm{tr}\!\left( B B^T (A - B B^T A B B^T) B B^T \, \Theta^T F \Theta \right) = 0,$$

and taking the squared Frobenius norm of Θ^T F Θ gives

$$\|F\|^2 = \|A - B B^T A B B^T\|^2 + \|B B^T \Theta^T F \Theta B B^T\|^2,$$

recalling that the Frobenius norm is invariant under orthogonal transformations. It follows directly that ||F||² ≥ ||A − B B^T A B B^T||².

Since (A, B) was chosen deliberately such that A ≠ B B^T A B B^T, one may consider the related matrix pair (A', B') = (μA, B), where

$$\mu = \left( \frac{\|F\|^2 + 1}{\|A - B B^T A B B^T\|^2} \right)^{\frac{1}{2}}.$$

By construction

$$\|A' - B' B'^T A' B' B'^T\|^2 = \|F\|^2 + 1 > \|F\|^2,$$

and no solution to the eigenstructure assignment problem exists for the system (A', B'). Moreover, the map (A, B) ↦ ||A − B B^T A B B^T||² is continuous and it follows that there is an open neighbourhood of systems around (A', B') for which the eigenstructure assignment task cannot be solved. □
Remark 4.1.3 It follows directly from the proof of Lemma 4.1.2 that eigenstructure assignment of a symmetric state space system (A, B) ∈ S(n) × O(n, m) to an arbitrary closed loop matrix F ∈ S(n) is possible only if

$$\|F\|^2 \geq \|A - B B^T A B B^T\|^2. \qquad \square$$
Remark 4.1.4 One may weaken the hypothesis of Lemma 4.1.2 considerably to deal with matrix pairs (A, B) ∈ S(n) × R^{n×m}, for which B is not constrained to satisfy B^T B = I_m and for which m may be greater than n. The analogous statement is that eigenstructure assignment is generically possible if and only if rank B ≥ n. The proof is similar to that given above, observing that the projection operator B B^T (for B^T B = I_m) is related to the general projection operator B (B^T B)^† B^T, where † represents the pseudo-inverse of a matrix. For example, the feedback matrix yielding exact system assignment for rank B ≥ n is

$$K = (B^T B)^\dagger B^T (F - A) B (B^T B)^\dagger. \qquad \square$$
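The closing formula of Remark 4.1.4 can be checked directly (an illustrative sketch; any full-row-rank B will do):

```python
import numpy as np

# Remark 4.1.4: for rank B = n (with m >= n and B not orthonormal) the
# feedback K = (B^T B)^+ B^T (F - A) B (B^T B)^+ gives A + B K B^T = F,
# since B (B^T B)^+ B^T = I_n in this case.
rng = np.random.default_rng(7)
n, m = 3, 5
A = rng.standard_normal((n, n)); A = A + A.T
F = rng.standard_normal((n, n)); F = F + F.T
B = rng.standard_normal((n, m))          # full row rank, almost surely

P = np.linalg.pinv(B.T @ B)
K = P @ B.T @ (F - A) @ B @ P
assert np.allclose(A + B @ K @ B.T, F)
```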
A further problem considered is that of simultaneous multiple system assignment. This is a difficult problem about which very little is presently known. The approach taken is to consider a generalisation of the cost criterion ψ for a single system.

Problem C For any integer N ∈ N let (A_1, B_1), ..., (A_N, B_N) and (F_1, G_1), ..., (F_N, G_N) be two sets of N symmetric state space systems. Define

$$\psi_N(\Theta, K) := \sum_{i=1}^N \|\Theta (A_i + B_i K B_i^T) \Theta^T - F_i\|^2 + 2 \sum_{i=1}^N \|\Theta B_i - G_i\|^2.$$

Find a pair of matrices (Θ_min, K_min) ∈ O(n) × S(m) which minimises ψ_N over O(n) × S(m). □
4.2 Geometry of Output Feedback Orbits
It is necessary to briefly review the Riemannian geometry of the spaces on which the optimization problems stated in Section 4.1 are posed. The reader is referred to Helgason (1978) and the development in Chapter 5 for technical details on Lie groups and homogeneous spaces, and to Helmke and Moore (1994b) for a development of dynamical systems methods for optimization along with applications in linear systems theory.

The set O(n) × S(m) forms a Lie group under the group operation (Θ_1, K_1) · (Θ_2, K_2) = (Θ_1 Θ_2, K_1 + K_2). It is known as the output feedback group for symmetric state space systems. The tangent spaces of O(n) × S(m) are

$$T_{(\Theta, K)}\!\left( O(n) \times S(m) \right) = \left\{ (\Omega \Theta, \Psi) \;\middle|\; \Omega \in Sk(n), \; \Psi \in S(m) \right\},$$

where Sk(n) = {Ω ∈ R^{n×n} | Ω = −Ω^T} is the set of n × n skew symmetric matrices. The Euclidean inner product on R^{n×n} × R^{n×m} is given by

$$\langle (A, B), (X, Y) \rangle = \mathrm{tr}(A^T X) + \mathrm{tr}(B^T Y). \tag{4.2.1}$$

By restriction, this induces a non-degenerate inner product on the tangent space T_{(I_n, 0)}(O(n) × S(m)) = Sk(n) × S(m). The Riemannian metric considered on O(n) × S(m) is the right invariant group metric

$$\langle (\Omega_1 \Theta, \Psi_1), (\Omega_2 \Theta, \Psi_2) \rangle = 2 \, \mathrm{tr}(\Omega_1^T \Omega_2) + 2 \, \mathrm{tr}(\Psi_1^T \Psi_2).$$

The right invariant group metric is generated by the induced inner product on T_{(I_n, 0)}(O(n) × S(m)), mapped to each tangent space by the linearization of the diffeomorphism (θ, k) ↦ (θΘ, k + K) for (Θ, K) ∈ O(n) × S(m). It is readily verified that this defines a Riemannian metric which corresponds, up to a scaling factor, to the induced Riemannian metric on O(n) × S(m) considered as a submanifold of R^{n×n} × R^{n×m}. The scaling factor 2 serves to simplify the algebraic expressions obtained in the sequel.
Let (A, B) ∈ S(n) × O(n, m) be a symmetric state space system. The symmetric output feedback orbit of (A, B) is the set

$$\mathcal{F}(A, B) = \left\{ \left( \Theta (A + BKB^T) \Theta^T, \; \Theta B \right) \;\middle|\; \Theta \in O(n), \; K \in S(m) \right\} \tag{4.2.2}$$

of all symmetric realisations that are output feedback equivalent to (A, B). Observe that no assumption on the controllability of the matrix pair (A, B) is made.
Lemma 4.2.1 The symmetric output feedback orbit F(A, B) is a smooth submanifold of S(n) × O(n, m) with tangent space at the point (A, B) given by

$$T_{(A, B)} \mathcal{F}(A, B) = \left\{ \left( [\Omega, A] + B \Psi B^T, \; \Omega B \right) \;\middle|\; \Omega \in Sk(n), \; \Psi \in S(m) \right\}. \tag{4.2.3}$$

Proof The set F(A, B) is an orbit of the smooth semi-algebraic group action

$$\sigma : (O(n) \times S(m)) \times (S(n) \times O(n, m)) \to S(n) \times O(n, m), \qquad \sigma((\Theta, K), (A, B)) := \left( \Theta (A + BKB^T) \Theta^T, \; \Theta B \right). \tag{4.2.4}$$

It follows that F(A, B) is a smooth submanifold of S(n) × R^{n×m} (cf. Proposition 5.2.2 or Gibson (1979, Appendix B)). For an arbitrary matrix pair (A, B) the map

$$f(\Theta, K) := \left( \Theta (A + BKB^T) \Theta^T, \; \Theta B \right)$$

is a smooth submersion of O(n) × S(m) onto F(A, B) (Gibson 1979, pg. 74). The tangent space of F(A, B) at (A, B) is the range of the linearization of f at (I_n, 0),

$$T_{(I_n, 0)} f : T_{(I_n, 0)}(O(n) \times S(m)) \to T_{(A, B)} \mathcal{F}(A, B), \qquad (\Omega, \Psi) \mapsto \left( [\Omega, A] + B \Psi B^T, \; \Omega B \right). \qquad \square$$
The space F(A, B) is also a Riemannian manifold when equipped with the so-called normal metric (cf. Section 5.3). Fix a symmetric state space system (A, B) ∈ S(n) × O(n, m) and consider the map f(Θ, K) := (Θ(A + BKB^T)Θ^T, ΘB). The tangent map T_{(I_n, 0)}f induces a decomposition

$$T_{(I_n, 0)}(O(n) \times S(m)) = \ker T_{(I_n, 0)} f \oplus \mathrm{dom}\, T_{(I_n, 0)} f,$$

where

$$\ker T_{(I_n, 0)} f = \left\{ (\Omega, \Psi) \in Sk(n) \times S(m) \;\middle|\; [\Omega, A] + B \Psi B^T = 0, \; \Omega B = 0 \right\}$$

is the kernel of T_{(I_n, 0)}f and

$$\mathrm{dom}\, T_{(I_n, 0)} f = \left\{ (\Omega, \Psi) \in Sk(n) \times S(m) \;\middle|\; \mathrm{tr}(\Omega \tilde{\Omega}^T) = 0, \; \mathrm{tr}(\Psi \tilde{\Psi}^T) = 0 \text{ for all } (\tilde{\Omega}, \tilde{\Psi}) \in \ker T_{(I_n, 0)} f \right\}$$

is the orthogonal complement of the kernel with respect to the Euclidean inner product (4.2.1). Formally, the normal Riemannian metric on F(A, B) is the inner product (4.2.1) on T_{(I_n, 0)}(O(n) × S(m)) restricted to dom T_{(I_n, 0)}f and induced on T_{(A, B)}F(A, B) via the isomorphism T_{(I_n, 0)}f| : dom T_{(I_n, 0)}f → T_{(A, B)}F(A, B), the restriction of T_{(I_n, 0)}f to dom T_{(I_n, 0)}f. Thus, for two tangent vectors ([Ω_i, A] + B Ψ_i B^T, Ω_i B) ∈ T_{(A, B)}F(A, B), i = 1, 2, the normal Riemannian metric is computed as

$$\langle\langle ([\Omega_1, A] + B \Psi_1 B^T, \, \Omega_1 B), \; ([\Omega_2, A] + B \Psi_2 B^T, \, \Omega_2 B) \rangle\rangle = 2 \, \mathrm{tr}\!\left( (\Omega_1^\perp)^T \Omega_2^\perp \right) + 2 \, \mathrm{tr}\!\left( (\Psi_1^\perp)^T \Psi_2^\perp \right).$$

Here (Ω_i, Ψ_i) = (Ω_i^{ker}, Ψ_i^{ker}) + (Ω_i^⊥, Ψ_i^⊥) ∈ ker T_{(I_n, 0)}f ⊕ dom T_{(I_n, 0)}f for i = 1, 2. It is readily verified that this construction defines a Riemannian metric on F(A, B).
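The tangent space formula (4.2.3) can be sanity-checked by differentiating a curve through (A, B) numerically (our own check, not in the thesis; the Cayley transform is used as one convenient orthogonal parametrisation with velocity Ω at t = 0):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 2
A = rng.standard_normal((n, n)); A = A + A.T          # A in S(n)
B, _ = np.linalg.qr(rng.standard_normal((n, m)))      # B in O(n, m)
W = rng.standard_normal((n, n)); W = W - W.T          # Omega in Sk(n)
Psi = rng.standard_normal((m, m)); Psi = Psi + Psi.T  # Psi in S(m)

def curve(t):
    # Cayley transform: Q(t) is orthogonal with Q(0) = I and Q'(0) = W.
    Q = np.linalg.solve(np.eye(n) - 0.5 * t * W, np.eye(n) + 0.5 * t * W)
    return Q @ (A + t * B @ Psi @ B.T) @ Q.T, Q @ B

h = 1e-6
(A_p, B_p), (A_m, B_m) = curve(h), curve(-h)
dA, dB = (A_p - A_m) / (2 * h), (B_p - B_m) / (2 * h)
assert np.allclose(dA, W @ A - A @ W + B @ Psi @ B.T, atol=1e-6)
assert np.allclose(dB, W @ B, atol=1e-6)
```

The central difference recovers ([Ω, A] + BΨB^T, ΩB), as the lemma predicts.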
4.3 Least Squares System Assignment

In this section Problem A is considered, i.e. the question of computing a symmetric state space linear system in a given orbit F(A, B) that most closely approximates a given "target" system in a least squares sense. A brief analysis of the cost functions ψ and Φ is given, leading to existence results for global minima. Gradient flows of the cost functions are derived and existence results for their solutions are given.
Lemma 4.3.1 Let �F�G�� �A�B� � S�n��O�n�m� be symmetric state space linear systems.
a) The function � : O�n�� S�m�� R,
�� � K� :� jj �A� BKBT � T � F jj2 � 2jj B � Gjj2�
has compact sublevel sets. I.e. the sets f� � K� � O�n� � S�m� j �� � K� � g for
any 0, are compact subsets of O�n�� S�m�.
b) The function� : F�A�B�� R,
��A�B� :� jjA� F jj2 � 2jjB � Gjj2�
has compact sublevel sets.
Proof The triangle inequality yields both

||K||² = ||BKB^T||² ≤ 2(||A + BKB^T||² + ||A||²)

and

||A + BKB^T||² = ||Θ(A + BKB^T)Θ^T||² ≤ 2(||Θ(A + BKB^T)Θ^T − F||² + ||F||²).

Thus, for (Θ, K) ∈ O(n) × S(m) one has

||K||² ≤ 2(2(||Θ(A + BKB^T)Θ^T − F||² + ||F||²) + ||A||²)
≤ 4(||Θ(A + BKB^T)Θ^T − F||² + 2||ΘB − G||²) + 4||F||² + 2||A||²
= 4ψ(Θ, K) + 4||F||² + 2||A||²,

where a factor of 8||ΘB − G||² is added to the middle line to give the correct terms for the cost
ψ. Thus, for (Θ, K) ∈ O(n) × S(m) satisfying ψ(Θ, K) ≤ γ, one has

||(Θ, K)||² = ||Θ||² + ||K||²
≤ tr(Θ^TΘ) + 4ψ(Θ, K) + 4||F||² + 2||A||²
≤ n + 4γ + 4||F||² + 2||A||²,

and the sublevel sets of ψ are bounded. Since ψ is continuous the sublevel sets are closed
and compactness follows directly (Munkres 1975, pg. 174). Part b) follows by observing that
ψ = Ψ ∘ f, where f(Θ, K) = (Θ(A + BKB^T)Θ^T, ΘB) for given (A, B). Thus, the sublevel
sets of Ψ are exactly the images of the corresponding sublevel sets of ψ via the continuous map
f. Since continuous images of compact sets are themselves compact (Munkres 1975, pg. 167)
the proof is complete. □
Corollary 4.3.2 Let (F, G), (A, B) ∈ S(n) × O(n, m) be symmetric state space linear systems.

a) There exists a global minimum (Θ_min, K_min) ∈ O(n) × S(m) of ψ,

ψ(Θ_min, K_min) = inf{ψ(Θ, K) | (Θ, K) ∈ O(n) × S(m)}.

b) There exists a global minimum (A_min, B_min) ∈ F(A, B) of Ψ,

Ψ(A_min, B_min) = inf{Ψ(A, B) | (A, B) ∈ F(A, B)}.

c) The submanifold F(A, B) ⊂ S(n) × O(n, m) is closed in S(n) × R^{n×m}.

Proof To prove part a), choose γ ≥ 0 such that the sublevel set J = {(Θ, K) | ψ(Θ, K) ≤ γ}
is non-empty. Then ψ|_J : J → [0, ∞) is a continuous map² from a compact space into the

²Let f : M → N be a map between two sets M and N. Let U ⊂ M be a subset of M; then f|_U : U → N is
the restriction of f to the set U.
reals and the minimum value theorem (Munkres 1975, pg. 175) ensures the existence of
(Θ_min, K_min). The proof of part b) is analogous to that for part a).

To prove c), assume that F(A, B) is not closed. Choose a boundary point (F, G) ∈
cl(F(A, B)) − F(A, B) in the closure³ of F(A, B). By part b) there exists a minimum
(A_min, B_min) ∈ F(A, B) such that

Ψ(A_min, B_min) = inf{Ψ(A, B) | (A, B) ∈ F(A, B)} = 0,

since (F, G) is in the closure of F(A, B). But this implies ||A_min − F||² + 2||B_min − G||² = 0 and
consequently (A_min, B_min) = (F, G). This contradicts the assumption that (F, G) ∉ F(A, B). □
Having determined the existence of a solution to the system assignment problem, one may
consider the problem of computing the global minima of the cost functions ψ and Ψ.
Theorem 4.3.3 Let (A, B), (F, G) ∈ S(n) × O(n, m) be symmetric state space systems. Let

Ψ : F(A, B) → R,  Ψ(A, B) := ||A − F||² + 2||B − G||²,  (4.3.1)

measure the Euclidean distance between two symmetric realisations. Then

a) The gradient of Ψ with respect to the normal metric is

grad Ψ(A, B) = ( [[A, F] + BG^T − GB^T, A] + BB^T(A − F)BB^T, ([A, F] + BG^T − GB^T)B ).  (4.3.2)

b) The critical points of Ψ are characterised by

[A, F] = GB^T − BG^T,
0 = B^T(A − F)B.  (4.3.3)

³Let U ⊂ M be a subset of a topological space M. The closure of U, denoted cl(U), is the intersection of all closed
sets in the topology which contain the set U.
c) Solutions of the gradient flow (Ȧ, Ḃ) = −grad Ψ(A, B),

Ȧ = [A, [A, F] + BG^T − GB^T] − BB^T(A − F)BB^T,
Ḃ = −([A, F] + BG^T − GB^T)B,  (4.3.4)

exist for all time t ≥ 0 and remain in F(A, B).

d) Any solution to (4.3.4) converges as t → ∞ to a connected set of matrix pairs (A, B) ∈
F(A, B) which satisfy (4.3.3) and lie in a single level set of Ψ.

Proof The gradient is computed using the identities⁴

(i) DΨ(A,B)(ξ) = ⟨⟨grad Ψ(A, B), ξ⟩⟩ for all ξ = ([Ω, A] + BΨB^T, ΩB) ∈ T_{(A,B)}F(A, B),
(ii) grad Ψ(A, B) ∈ T_{(A,B)}F(A, B).

Computing the Fréchet derivative of Ψ in direction ([Ω, A] + BΨB^T, ΩB) gives

DΨ(A,B)([Ω, A] + BΨB^T, ΩB)
= 2tr((A − F)^T([Ω, A] + BΨB^T)) + 4tr((B − G)^TΩB)
= 2tr(Ω([F, A] + GB^T − BG^T)) + 2tr(ΨB^T(A − F)B)  (4.3.5)
= ⟨⟨([[A, F] + BG^T − GB^T, A] + BB^T(A − F)BB^T, ([A, F] + BG^T − GB^T)B),
([Ω, A] + BΨB^T, ΩB)⟩⟩.

When deriving the above relations it is useful to recall that

([Ω, A] + BΨB^T, ΩB) = ([Ω^d, A] + BΨ^dB^T, Ω^dB),

where (Ω, Ψ) = (Ω^k, Ψ^k) + (Ω^d, Ψ^d) (cf. the discussion of normal metrics at the end of
Section 4.2). Observing that [A, F] + BG^T − GB^T ∈ Sk(n) while B^T(A − F)B ∈ S(m)
completes the proof of part a).

⁴DΨ(A,B)(ξ) is the Fréchet derivative of Ψ in direction ξ (Helmke & Moore 1994b, pg. 334).
To prove b), observe that the first identity ensures that the Fréchet derivative at a critical
point (grad Ψ = 0) is zero in all tangent directions. Setting (4.3.5) to zero yields

2tr(Ω([F, A] + GB^T − BG^T)) + 2tr(ΨB^T(A − F)B) = 0

for arbitrary (Ω, Ψ) ∈ Sk(n) × S(m), and (4.3.3) follows for (A, B) a critical point of Ψ.

For given initial conditions (A(0), B(0)), solutions of (4.3.4) will remain in the sublevel
set {(A, B) ∈ F(A, B) | Ψ(A, B) ≤ Ψ(A(0), B(0))}. Since this set is compact (Lemma
4.3.1), infinite time existence of the solution follows. This proves c), while d) follows from an
application of LaSalle's invariance principle. □
Remark 4.3.4 Let N(s)D(s)⁻¹ be a coprime factorisation of the symmetric transfer function
G(s) = B^T(sI_n − A)⁻¹B. Then the coefficients of the polynomial matrix N(s) ∈ R[s]^{m×m}
are invariants of the flow (4.3.4). In particular the zeros of the system (A, B, B^T) are invariant
under the flow (4.3.4) (Kailath 1980, Exercise 6.5-4, pg. 464). □
Remark 4.3.5 It would be desirable to interpret the equilibrium condition (4.3.3) in terms of
the properties of the linear system (A, B). Unfortunately, this appears to be a difficult task. □
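As a concrete illustration, the flow (4.3.4) can be integrated numerically. The sketch below is an assumption-laden stand-in for the MATLAB simulations of Section 4.5: it uses SciPy's adaptive Runge-Kutta integrator on randomly generated data (all names are illustrative) and checks that the potential decreases and that B(t)^TB(t) = I_m is preserved along the flow.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: integrate the gradient flow (4.3.4) for random data and verify
# that Psi(A,B) = ||A - F||^2 + 2||B - G||^2 decreases.  Illustrative only.
rng = np.random.default_rng(1)
n, m = 4, 2
sym = lambda M: 0.5 * (M + M.T)
A0 = sym(rng.standard_normal((n, n)))              # (A0, B0) in S(n) x O(n, m)
B0, _ = np.linalg.qr(rng.standard_normal((n, m)))
F = sym(rng.standard_normal((n, n)))               # target pair (F, G)
G, _ = np.linalg.qr(rng.standard_normal((n, m)))

def Psi(A, B):
    return np.linalg.norm(A - F) ** 2 + 2 * np.linalg.norm(B - G) ** 2

def rhs(t, x):
    A, B = x[:n * n].reshape(n, n), x[n * n:].reshape(n, m)
    Om = (A @ F - F @ A) + B @ G.T - G @ B.T       # [A,F] + BG^T - GB^T (skew)
    Adot = A @ Om - Om @ A - B @ B.T @ (A - F) @ B @ B.T
    Bdot = -Om @ B
    return np.concatenate([Adot.ravel(), Bdot.ravel()])

x0 = np.concatenate([A0.ravel(), B0.ravel()])
sol = solve_ivp(rhs, (0.0, 20.0), x0, rtol=1e-8, atol=1e-10)   # adaptive RK45
A_end = sol.y[:n * n, -1].reshape(n, n)
B_end = sol.y[n * n:, -1].reshape(n, m)
```

Since the matrix multiplying B in the Ḃ equation is skew symmetric, B^TB = I_m is an exact invariant of the flow, and the numerical solution respects it to within the integration tolerance.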
The above theorem provides a method of investigating best approximations to a given
"target system" within a symmetric output feedback orbit. However, it does not provide
any explicit information on the changing feedback transformations (Θ(t), K(t)). To obtain
such information a related flow on the output feedback group O(n) × S(m) is proposed.
The following result generalises work by Brockett (1989a) on matching problems. Brockett
considers similar cost functions but allows only state space transformations rather than output
feedback transformations.
Theorem 4.3.6 Let (A, B), (F, G) ∈ S(n) × O(n, m) be symmetric state space linear systems.
Define

ψ : O(n) × S(m) → R,
ψ(Θ, K) := ||Θ(A + BKB^T)Θ^T − F||² + 2||ΘB − G||²,  (4.3.6)

then:

a) The gradient of ψ with respect to the right invariant group metric is

grad ψ(Θ, K) = ( ([Θ(A + BKB^T)Θ^T, F] + ΘBG^T − GB^TΘ^T)Θ,
B^T(A + BKB^T − Θ^TFΘ)B ).  (4.3.7)

b) The critical points of ψ are characterised by

[F, Θ(A + BKB^T)Θ^T] = ΘBG^T − GB^TΘ^T,
K = B^T(Θ^TFΘ − A)B.  (4.3.8)

c) Solutions of the gradient flow (Θ̇, K̇) = −grad ψ(Θ, K),

Θ̇ = −([Θ(A + BKB^T)Θ^T, F] + ΘBG^T − GB^TΘ^T)Θ,
K̇ = −B^T(A + BKB^T − Θ^TFΘ)B,  (4.3.9)

exist for all time t ≥ 0 and remain in a bounded subset of O(n) × S(m). Moreover,
as t → ∞ any solution of (4.3.9) converges to a connected subset of critical points in
O(n) × S(m) which are contained in a single level set of ψ.

d) If (Θ(t), K(t)) is a solution to (4.3.9) then

(A(t), B(t)) = (Θ(t)(A + BK(t)B^T)Θ(t)^T, Θ(t)B)

is a solution of (4.3.4).
Proof The computation of the gradient is analogous to that undertaken in the proof of Theorem
4.3.3, while the characterisation of the critical points follows directly from setting (4.3.7) to
zero. The proof of c) is also analogous to the proof of parts c) and d) in Theorem 4.3.3.

The linearization of f(Θ, K) := (Θ(A + BKB^T)Θ^T, ΘB) is readily computed to be

T_{(Θ,K)}f(ΩΘ, Ψ) = ([Ω, A′] + B′ΨB′^T, ΩB′),

where A′ = Θ(A + BKB^T)Θ^T and B′ = ΘB. The image of (Θ̇, K̇) via this linearization is

T_{(Θ,K)}f(Θ̇, K̇) = ( [A′, [A′, F] + B′G^T − GB′^T] − B′B′^T(A′ − F)B′B′^T,
−([A′, F] + B′G^T − GB′^T)B′ ).

Consequently (Ȧ, Ḃ) = −grad Ψ(A′, B′). Classical O.D.E. uniqueness results complete the
proof. □
The following lemma provides an alternative approach (to that given in Lemma 4.3.1) for
determining a bound on the feedback gain K(t). The method of proof for the following result
is of interest and the bound obtained is somewhat tighter than that obtained in Lemma 4.3.1.

Lemma 4.3.7 Let (Θ(t), K(t)) be a solution of (4.3.9). Then the bound

||K(t) − K(0)||² ≤ ½ ψ(Θ(0), K(0))

holds for all time.

Proof Integrating out (4.3.9) for initial conditions (Θ₀, K₀) and then taking norms gives the
integral bound

||Θ(t) − Θ₀||² + ||K(t) − K₀||² = || ∫₀ᵗ grad ψ(Θ(τ), K(τ)) dτ ||²
≤ ∫₀ᵗ ||grad ψ(Θ(τ), K(τ))||² dτ
= ½ ∫₀ᵗ ⟨grad ψ(Θ(τ), K(τ)), grad ψ(Θ(τ), K(τ))⟩ dτ.

Also

d/dt ψ(Θ(t), K(t)) = −⟨grad ψ(Θ(t), K(t)), grad ψ(Θ(t), K(t))⟩,

and thus, integrating between 0 and t and recalling that 0 ≤ ψ(Θ(t), K(t)) ≤ ψ(Θ(0), K(0))
for all t ≥ 0, one obtains

∫₀ᵗ ⟨grad ψ(Θ(τ), K(τ)), grad ψ(Θ(τ), K(τ))⟩ dτ
= ψ(Θ(0), K(0)) − ψ(Θ(t), K(t)) ≤ ψ(Θ(0), K(0)),

and consequently

||Θ(t) − Θ₀||² + ||K(t) − K₀||² ≤ ½ ψ(Θ(0), K(0)).

The result follows directly. □
It is advantageous to consider a closely related flow that evolves only on O(n) rather
than the full output feedback group O(n) × S(m). The following development uses similar
techniques to those proposed by Chu (1992).

Let (A, B) ∈ S(n) × O(n, m) be a given symmetric state space system and define

L = {BKB^T | K ∈ S(m)}

to be the linear subspace of S(n) corresponding to the range of the linear map K ↦ BKB^T.
Define L^⊥ to be the orthogonal complement of L with respect to the Euclidean inner product
on R^{n×n}. The projection operators

P : S(n) → L,  P(X) := BB^TXBB^T,  (4.3.10)

and

Q : S(n) → L^⊥,  Q(X) := (I − P)(X) = X − BB^TXBB^T,  (4.3.11)

are well defined. Here I represents the identity operator and B^TB = I_m by assumption. The
tangent space of O(n) at a point Θ is T_ΘO(n) = {ΩΘ | Ω ∈ Sk(n)}, with Riemannian metric

⟨Ω₁Θ, Ω₂Θ⟩ = 2tr(Ω₁^TΩ₂),

corresponding to the right invariant group metric on O(n).
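The projections (4.3.10)-(4.3.11) are elementary to implement; the following minimal sketch (illustrative names, random data) verifies that P and Q are complementary, idempotent and mutually trace-orthogonal.

```python
import numpy as np

# Sketch of the projections (4.3.10)-(4.3.11); names are illustrative.
rng = np.random.default_rng(2)
n, m = 5, 2
B, _ = np.linalg.qr(rng.standard_normal((n, m)))   # B^T B = I_m

def P(X):                     # P : S(n) -> L,  P(X) = B B^T X B B^T
    return B @ B.T @ X @ B @ B.T

def Q(X):                     # Q : S(n) -> L-perp,  Q = I - P
    return X - P(X)

X = rng.standard_normal((n, n)); X = 0.5 * (X + X.T)   # test point in S(n)
```

Note that P(X) = BKB^T with K = B^TXB, so the range of P is exactly the subspace L; idempotence of P follows from B^TB = I_m.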
Theorem 4.3.8 Let (A, B), (F, G) ∈ S(n) × O(n, m) be symmetric state space systems.
Define

ψ_* : O(n) → R,
ψ_*(Θ) := ||Q(A − Θ^TFΘ)||² + 2||ΘB − G||²,  (4.3.12)

then,

a) The gradient of ψ_* with respect to the right invariant group metric is

grad ψ_*(Θ) = ([ΘQ(A − Θ^TFΘ)Θ^T, F] + ΘBG^T − GB^TΘ^T)Θ.

b) The critical points Θ ∈ O(n) of ψ_* are characterised by

[F, ΘQ(A − Θ^TFΘ)Θ^T] = ΘBG^T − GB^TΘ^T,

and correspond exactly to the orthogonal matrix component of the critical points (4.3.8)
of ψ.

c) The negative gradient flow minimising ψ_* is

Θ̇ = ([F, ΘQ(A − Θ^TFΘ)Θ^T] + GB^TΘ^T − ΘBG^T)Θ,  Θ(0) = Θ₀.  (4.3.13)

Solutions to this flow exist for all time t ≥ 0 and converge as t → ∞ to a connected set
of critical points contained in a level set of ψ_*.
Proof The gradient and the critical point characterisation are proved as for Theorem 4.3.3.
The equivalence of the critical points is easily seen by solving (4.3.8) for Θ independently
of K. Part c) follows from the observation that (4.3.13) is a gradient flow on a compact
manifold. □

Fixing Θ constant in the second line of (4.3.9) yields a linear differential equation in K
with solution

K(t) = e^{−t}(K(0) + B^T(A − Θ^TFΘ)B) − B^T(A − Θ^TFΘ)B.

It follows that K(t) → −B^T(A − Θ^TFΘ)B as t → ∞. Observe that

ψ_*(Θ) = ||Q(A − Θ^TFΘ)||² + 2||ΘB − G||²
= ||Θ(A − BB^T(A − Θ^TFΘ)BB^T)Θ^T − F||² + 2||ΘB − G||²
= ψ(Θ, −B^T(A − Θ^TFΘ)B).

Recall also that for exact system assignment it has been shown that K = B^T(Θ^TFΘ −
A)B, Lemma 4.1.2. Thus, it is reasonable to consider solutions Θ(t) of (4.3.13) along with the
continuously changing feedback gain

K(t) = B^T(Θ(t)^TFΘ(t) − A)B,  (4.3.14)

as an approach to solving least squares system assignment problems. A numerical scheme
based on this approach is presented in Section 4.6.
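A minimal numerical sketch of this approach, integrating the flow (4.3.13) on O(n) and reading off the gain from (4.3.14), follows; the data are random and all names are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch: integrate (4.3.13) on O(n) and read off K(t) via (4.3.14).
rng = np.random.default_rng(3)
n, m = 5, 2
sym = lambda M: 0.5 * (M + M.T)
A = sym(rng.standard_normal((n, n)))
F = sym(rng.standard_normal((n, n)))
B, _ = np.linalg.qr(rng.standard_normal((n, m)))   # B^T B = I_m
G, _ = np.linalg.qr(rng.standard_normal((n, m)))

def Q(X):                                          # Q(X) = X - BB^T X BB^T
    return X - B @ B.T @ X @ B @ B.T

def psi_star(Th):                                  # the cost (4.3.12)
    return np.linalg.norm(Q(A - Th.T @ F @ Th)) ** 2 \
        + 2 * np.linalg.norm(Th @ B - G) ** 2

def rhs(t, x):
    Th = x.reshape(n, n)
    M = Th @ Q(A - Th.T @ F @ Th) @ Th.T
    Om = (F @ M - M @ F) + G @ B.T @ Th.T - Th @ B @ G.T   # skew symmetric
    return (Om @ Th).ravel()                       # flow (4.3.13)

sol = solve_ivp(rhs, (0.0, 10.0), np.eye(n).ravel(), rtol=1e-9, atol=1e-11)
Th_end = sol.y[:, -1].reshape(n, n)
K_end = B.T @ (Th_end.T @ F @ Th_end - A) @ B      # gain readout (4.3.14)
```

Because the matrix multiplying Θ in the flow is skew symmetric, Θ(t) remains (numerically) orthogonal, and the potential ψ_* decreases along the solution.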
4.4 Least Squares Pole Placement and Simultaneous System Assignment
Having developed the necessary tools it is a simple matter to derive gradient flow solutions to
Problem B and Problem C described in Section 4.1.
Corollary 4.4.1 (Pole Placement) Let (A, B) ∈ S(n) × O(n, m) be a symmetric state space
system and let F ∈ S(n) be a given symmetric matrix. Define

Φ : F(A, B) → R,  Φ(A, B) = ||A − F||²,
φ : O(n) × S(m) → R,  φ(Θ, K) = ||Θ(A + BKB^T)Θ^T − F||²,

then

a) The gradients of Φ and φ with respect to the normal and the right invariant group metric
respectively are

grad Φ(A, B) = ( [[A, F], A] + BB^T(A − F)BB^T, [A, F]B ),  (4.4.1)

and

grad φ(Θ, K) = ( [Θ(A + BKB^T)Θ^T, F]Θ, B^T(A + BKB^T − Θ^TFΘ)B ).  (4.4.2)

b) The critical points of Φ and φ are characterised by

[A, F] = 0,
B^T(A − F)B = 0,  (4.4.3)

and

[Θ(A + BKB^T)Θ^T, F] = 0,
B^T(Θ^TFΘ − A)B = K,  (4.4.4)

respectively.

c) Solutions of the gradient flow (Ȧ, Ḃ) = −grad Φ(A, B),

Ȧ = [A, [A, F]] − BB^T(A − F)BB^T,
Ḃ = −[A, F]B,  (4.4.5)

exist for all time t ≥ 0 and remain in F(A, B). Moreover, any solution of (4.4.5)
converges as t → ∞ to a connected set of matrix pairs (A, B) ∈ F(A, B) which satisfy
(4.4.3) and lie in a single level set of Φ.

d) Solutions of the gradient flow (Θ̇, K̇) = −grad φ(Θ, K),

Θ̇ = −[Θ(A + BKB^T)Θ^T, F]Θ,
K̇ = −B^T(A + BKB^T − Θ^TFΘ)B,  (4.4.6)

exist for all time t ≥ 0 and remain in a bounded subset of O(n) × S(m). Moreover,
as t → ∞ any solution of (4.4.6) converges to a connected subset of critical points in
O(n) × S(m) which are contained in a single level set of φ.

e) If (Θ(t), K(t)) is a solution to (4.4.6) then (Θ(t)(A + BK(t)B^T)Θ(t)^T, Θ(t)B) is a
solution of (4.4.5).
Proof Consider the symmetric state space system (A, B) ∈ S(n) × O(n, m) and the matrix
pair (F, G₀) ∈ S(n) × R^{n×m} where G₀ is the n × m zero matrix. Observe that Ψ(A, B) =
Φ(A, B) + 2||B||² and similarly ψ(Θ, K) = φ(Θ, K) + 2||B||², where Ψ and ψ are given by
(4.3.1) and (4.3.6) respectively. Since the norm ||B||² is constant on F(A, B) the structure
of the above optimization problems is exactly that considered in Theorem 4.3.3 and Theorem
4.3.6. The results follow as direct corollaries. □
Similar to the discussion at the end of Section 4.3, the pole placement problem can be solved
by a gradient flow evolving on the orthogonal group O(n) alone.

Corollary 4.4.2 Let (A, B) ∈ S(n) × O(n, m) be a symmetric state space system and let
F ∈ S(n) be a symmetric matrix. Define

φ_* : O(n) → R,
φ_*(Θ) := ||Q(A − Θ^TFΘ)||²,

where Q(X) = (I − P)(X) = X − BB^TXBB^T (4.3.11). Then,

a) The gradient of φ_* with respect to the right invariant group metric is

grad φ_*(Θ) = [ΘQ(A − Θ^TFΘ)Θ^T, F]Θ.

b) The critical points Θ ∈ O(n) of φ_* are characterised by

[F, ΘQ(A − Θ^TFΘ)Θ^T] = 0,

and correspond exactly to the orthogonal matrix component of the critical points (4.4.4)
of φ.

c) The negative gradient flow minimising φ_* is

Θ̇ = [F, ΘQ(A − Θ^TFΘ)Θ^T]Θ,  Θ(0) = Θ₀.  (4.4.7)

Solutions to this flow exist for all time t ≥ 0 and converge as t → ∞ to a connected set
of critical points contained in a level set of φ_*.

Proof Consider the matrix pair (F, G₀) ∈ S(n) × R^{n×m} where G₀ is the n × m zero matrix.
It is easily verified that ψ_*(Θ) = φ_*(Θ) + 2||B||², where ψ_* is given by (4.3.12). The corollary
follows as a direct consequence of Theorem 4.3.8. □
Simultaneous system assignment is known to be a hard problem which generically does
not have an exact solution. The best that can be hoped for is an approximate solution provided
by a suitable numerical technique. The following discussion is a direct generalisation of the
development given in Section 4.3. The generalisation is similar to that employed by Chu
(1991a) when considering the simultaneous reduction of real matrices.

For any integer N ∈ ℕ let (A₁, B₁), …, (A_N, B_N) ∈ S(n) × O(n, m) be given symmetric
state space systems. The output feedback orbit for the multiple system case is

F((A₁, B₁), …, (A_N, B_N)) :=
{((Θ(A₁ + B₁KB₁^T)Θ^T, ΘB₁), …, (Θ(A_N + B_NKB_N^T)Θ^T, ΘB_N)) | Θ ∈ O(n), K ∈ S(m)}.

An analogous argument to Lemma 4.2.1 shows that F((A₁, B₁), …, (A_N, B_N)) is a smooth
manifold. Moreover, the tangent space at ((A₁, B₁), …, (A_N, B_N)) is given by

T F((A₁, B₁), …, (A_N, B_N)) =
{(([Ω, A₁] + B₁ΨB₁^T, ΩB₁), …, ([Ω, A_N] + B_NΨB_N^T, ΩB_N)) | Ω ∈ Sk(n), Ψ ∈ S(m)}.
Indeed, F((A₁, B₁), …, (A_N, B_N)) is a Riemannian manifold when equipped with the normal
metric, defined analogously to the normal metric on F(A, B).

Corollary 4.4.3 For any integer N ∈ ℕ let (A₁, B₁), …, (A_N, B_N) and (F₁, G₁), …, (F_N, G_N)
be two sets of N symmetric state space systems. Define

Ψ_N : F((A₁, B₁), …, (A_N, B_N)) → R,
Ψ_N((A₁, B₁), …, (A_N, B_N)) := Σ_{i=1}^N ( ||A_i − F_i||² + 2||B_i − G_i||² ),

and

ψ_N : O(n) × S(m) → R,
ψ_N(Θ, K) := Σ_{i=1}^N ( ||Θ(A_i + B_iKB_i^T)Θ^T − F_i||² + 2||ΘB_i − G_i||² ).

Then,
a) The negative gradient flows of Ψ_N and ψ_N with respect to the normal and the right
invariant group metric are

Ȧ_i = [A_i, Σ_{j=1}^N ([A_j, F_j] + B_jG_j^T − G_jB_j^T)] − Σ_{j=1}^N B_iB_j^T(A_j − F_j)B_jB_i^T,
Ḃ_i = −Σ_{j=1}^N ([A_j, F_j] + B_jG_j^T − G_jB_j^T)B_i,  (4.4.8)

for i = 1, …, N, and

Θ̇ = −Σ_{j=1}^N ([Θ(A_j + B_jKB_j^T)Θ^T, F_j] + ΘB_jG_j^T − G_jB_j^TΘ^T)Θ,
K̇ = −Σ_{j=1}^N B_j^T(A_j + B_jKB_j^T − Θ^TF_jΘ)B_j,  (4.4.9)

respectively.

b) The critical points of Ψ_N and ψ_N are characterised by

Σ_{j=1}^N [A_j, F_j] = Σ_{j=1}^N (G_jB_j^T − B_jG_j^T),
Σ_{j=1}^N B_j^T(A_j − F_j)B_j = 0,  (4.4.10)

and

Σ_{j=1}^N [Θ(A_j + B_jKB_j^T)Θ^T, F_j] = Σ_{j=1}^N (G_jB_j^TΘ^T − ΘB_jG_j^T),
K = (1/N) Σ_{j=1}^N B_j^T(Θ^TF_jΘ − A_j)B_j,  (4.4.11)

respectively.

c) Solutions of the gradient flow (4.4.8) exist for all time t ≥ 0 and remain in F((A₁, B₁),
…, (A_N, B_N)). Moreover, any solution of (4.4.8) converges as t → ∞ to a connected
set of matrix pairs ((A₁, B₁), …, (A_N, B_N)) ∈ F((A₁, B₁), …, (A_N, B_N)) which
satisfy (4.4.10) and lie in a single level set of Ψ_N.
d) Solutions of the gradient flow (4.4.9) exist for all time t ≥ 0 and remain in a bounded
subset of O(n) × S(m). Moreover, as t → ∞ any solution of (4.4.9) converges to a
connected subset of critical points in O(n) × S(m) which are contained in a single level
set of ψ_N.

e) If (Θ(t), K(t)) is a solution to (4.4.9) then (A_i(t), B_i(t)) = (Θ(A_i + B_iKB_i^T)Θ^T, ΘB_i),
for i = 1, …, N, is a solution of (4.4.8).
Proof Observe that the potentials Ψ_N and ψ_N are linear sums of potentials of the form Ψ and
ψ considered in Theorem 4.3.3 and Theorem 4.3.6. The proof is then a simple generalisation
of the arguments employed in the proofs of these theorems. □
4.5 Simulations

A number of simulation studies have been completed to investigate the properties of the
gradient flows presented and to obtain general information about the system assignment and pole
placement problems⁵.

In the following simulations the solutions of the ordinary differential equations considered
were numerically estimated using the MATLAB function ODE45. This function integrates
ordinary differential equations using the Runge-Kutta-Fehlberg method with automatic step
size selection. Numerical integration is undertaken using a fourth order Runge-Kutta method
while the accuracy of each iteration over the step length is checked against a fifth order method.
At each step of the integration the step length is reduced until the error between the fourth
and fifth order method estimates is less than a pre-specified constant E > 0. In the simulations
undertaken the error bound was set to E = 1 × 10⁻⁷; this allowed for reasonable accuracy
without excessive computational cost.

Due to Lemma 4.1.1 one does not expect to see convergence of the solution of (4.3.4) to an
exact solution of the System Assignment problem for arbitrary initial conditions (unless n = m,
in which case a solution can be computed algebraically). The typical behaviour of solutions to

⁵Indeed, computing the gradient flows (4.3.4) and (4.4.1) has already helped in understanding the pole
placement and system assignment tasks, since it was the non-convergence of the original simulations that led to a
further investigation of the existence of exact solutions to the problems, and eventually to Lemmas 4.1.1 and 4.1.2.
Figure 4.5.1: Plot of Ψ(A(t), B(t)) versus t for (A(t), B(t)) a typical solution to (4.3.4).
(4.3.4) is shown in Figure 4.5.1, where the potential Ψ(A(t), B(t)), for (A(t), B(t)) a solution
to (4.3.4), is plotted versus time. The potential is plotted on a log₁₀ scaled axis in all the plots
presented, to display the linear convergence properties of the solutions. The initial conditions
(A₀, B₀) ∈ S(5) × O(5, 4) and the target system (F, G) ∈ S(5) × O(5, 4) were randomly
generated apart from symmetry and orthogonality requirements. The state dimension, n = 5,
and the input and output dimension, m = 4, were arbitrarily chosen. Similar behaviour is
obtained in all simulations for any choice of n and m for which m < n. In Figure 4.5.1,
observe that the potential converges to a non-zero constant lim_{t→∞} Ψ(A(t), B(t)) ≈ 9.3. For
the limiting value of the solution to be an exact solution to the system assignment problem one
would require lim_{t→∞} Ψ(A(t), B(t)) = 0.
In contrast, Lemma 4.1.2 ensures only that the pole placement task is not solvable on some
open set of symmetric state space systems, but leaves open the question of whether other open
sets of systems exist for which the pole placement problem is solvable. Simulations show that
the pole placement problem is indeed solvable for some open sets of symmetric state space
systems. Figure 4.5.2 shows a plot of the potential Φ(A(t), B(t)) (cf. Corollary 4.4.1) versus
time for (A(t), B(t)) a solution to (4.4.5). The initial conditions and target matrix here were
the initial conditions (A₀, B₀) and the state matrix F, from (F, G), used to generate Figure
4.5.1. The plot clearly shows that the potential converges exponentially (linearly in the log₁₀
versus unscaled-time plot) to zero. Consequently, the solution (A(t), B(t)) converges to an exact
solution of the pole placement problem, lim_{t→∞} A(t) = F. Comparing Figures 4.5.1 and 4.5.2
and recalling that they were generated using the same initial conditions, one sees explicitly that
Simulation   Φ(A(40), B(40))
1            2.63 × 10⁻¹⁰
2            2.09 × 10⁻⁹
3            5.65 × 10⁻⁹
4            3.35 × 10⁻¹⁰
5            3.16 × 10⁻¹¹
6            1.62 × 10⁻¹¹
7            1.05 × 10⁻¹⁰
8            3.68 × 10⁻¹⁰
9            1.20 × 10⁻⁸
10           2.72 × 10⁻⁸

Table 4.5.1: Potentials Φ(A_i(40), B_i(40)) for experiments i = 1, …, 10, where (A_i(t), B_i(t))
is a solution to (4.4.5) with initial conditions (A_i(0), B_i(0)) = (A₀ + N_i, U_iB₀) ∈ S(n) ×
O(n, m). Here N_i = N_i^T is a randomly generated symmetric matrix with ||N_i|| ≤ 0.25 and
U_i ∈ O(n) is a randomly generated orthogonal matrix with ||U_i − I_n|| ≤ 0.25.
the system assignment problem is strictly more difficult than the pole placement problem.

One may ask whether the particular initial condition (A₀, B₀) lies in an open set of initial
conditions for which the pole placement problem can be exactly solved. A series of ten
simulations was completed, integrating (4.4.5) for initial conditions (A_i, B_i) close to (A₀, B₀),
||A₀ − A_i||, ||B₀ − B_i|| ≤ 0.5. Each integration was carried out over a time interval of forty
seconds and the final potential Φ(A(40), B(40)) for each simulation is given in Table 4.5.1. The
plot of Φ versus time for each simulation was qualitatively the same as Figure 4.5.2. It is my
conclusion from this that the pole placement problem can be exactly solved for all initial
conditions in a neighbourhood of (A₀, B₀).
Remark 4.5.1 It may appear reasonable that the pole placement problem could be solved for
all initial conditions with state matrix A₀ in a neighbourhood of the desired structure F. In
fact simulations have shown this to be false.

Let C ∈ O(n, n − m) be a matrix orthogonal to B (i.e. B^TC = 0). Observe that a solution
to the pole placement problem requires Θ^TFΘ = A + BKB^T and thus

Θ^TFΘC − AC = 0 ⟺ FΘC − ΘAC = 0.

Since A and C are specified by the initial condition (the span of C is the important object)
Figure 4.5.2: Plot of Φ(A(t), B(t)) versus t for (A(t), B(t)) a solution to (4.4.5) with initial
conditions (A₀, B₀) for which the solution (A(t), B(t)) converges to a global minimum of Φ.

then Θ ∈ R^{n×n} must lie in the linear subspace defined by the kernel of the linear map
Θ ↦ FΘC − ΘAC. Of course Θ must also lie in the set of orthogonal matrices, and the
intersection of the kernel of Θ ↦ FΘC − ΘAC with the set of orthogonal matrices provides
an exact criterion for the existence of a solution to the pole placement problem.

The difficulty for initial conditions where ||A₀ − F|| is small is related to the fact that the
solution to the pole placement problem for initial conditions (A₀, B₀) = (F, B₀) (i.e. the state
matrix already has the desired structure) is given by the matrix pair (I_n, 0) ∈ O(n) × S(m)
in the output feedback group. The matrix I_n lies at an extremity of O(n) in R^{n×n} and
it is reasonable that small perturbations of (A₀, B₀) may shift the kernel of the linear map
Θ ↦ FΘC − ΘA₀C such that it no longer intersects with O(n). □
An advantage, mentioned in Section 4.3, of computing the limiting solution of (4.4.7)
(Figure 4.5.3) compared to computing the full gradient flow (4.4.5) (Figure 4.5.2) is the
associated reduction in the order of the O.D.E. that must be solved. Interestingly, it appears
that the solutions of the projected flow (4.4.7) also converge more quickly than those
of (4.4.5). Figure 4.5.3 shows the potential φ_*(Θ(t)) (cf. Corollary 4.4.2) versus time for
Θ(t) a solution to (4.4.7). The initial condition for this simulation was Θ₀ = I_n, while the
specified symmetric state space system used for computing the norm φ_* was (A₀, B₀), the
initial conditions for Figures 4.5.1 and 4.5.2. Observe that from time t ≈ 1.2 to t = 2, Figure
4.5.3 displays unexpected behaviour, which I interpret to be numerical error. The presence
of this error is not surprising since the potential (and consequently the gradient) is of order
Simulation   λ      λ*     λ*/λ
1            2.05   53     25.85
2            1.73   43.5   25.14
3            2.03   27.75  13.66
4            0.52   20     38.46
5            1.6    44     27.5

Table 4.5.2: Linear rates of convergence for the solutions of (4.4.5), given by λ, and of (4.4.7),
given by λ*. The final column shows the ratio between the rates of convergence for the two
differential equations.

10⁻¹² ∼ E², where E is the error bound chosen for the ODE routine in MATLAB. The
relationship of the numerical error to the order of the potential was checked by adjusting the error
bound E in a number of early simulations.
The exponential (linear) convergence rates of the solutions to (4.4.7) and to (4.4.5) are
computed by reading off the slopes of the linear sections of the plots in Figures 4.5.2 and 4.5.3. For the
example shown in Figures 4.5.2 and 4.5.3, convergence of the solutions is characterised by

Φ(A(t), B(t)) ≈ e^{−λt},  λ = 2.05,
φ_*(Θ(t)) ≈ e^{−λ*t},  λ* = 53,

where (A(t), B(t)) is a solution to (4.4.5) and Θ(t) is a solution to (4.4.7). Five separate
experiments were completed in which the two flows were computed for randomly generated
target matrices and initial conditions with n = 5 and m = 4. The linear convergence rates
computed from these five experiments are given in Table 4.5.2. I deduce that solutions of (4.4.7)
converge around twenty times faster than solutions of (4.4.5) when the systems considered
have five states and four inputs and outputs. A brief study of the behaviour of systems with
other numbers of states and inputs indicates that the ratio between the convergence rates is around
an order of magnitude.
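Reading a rate off a log-scaled plot amounts to a least squares line fit to the log-potential. The following sketch illustrates the fit on an ideal linearly convergent signal e^{−λt} with λ = 2.05 (synthetic data, not actual simulation output):

```python
import numpy as np

# Sketch: estimate a linear convergence rate from a log-potential slope.
t = np.linspace(0.0, 5.0, 200)
lam_true = 2.05
pot = np.exp(-lam_true * t)                  # ideal potential decay
slope, intercept = np.polyfit(t, np.log(pot), 1)   # line fit in log scale
lam_est = -slope                             # recovered rate
```

On real simulation data only the linear (late-time) section of the log-potential should be fitted, since the early transient and the numerical-error floor both bias the slope.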
In the system assignment problem Lemma 4.1.1 ensures that an exact solution to the
system assignment problem does not generically exist. The gradient flow (4.3.4), however, will
certainly converge to a connected set of local minima of the potential Ψ, Theorem 4.3.3. An
important question to ask concerns the structure that the critical level set associated with the local
minima of Ψ may have. In particular, one may ask whether the level set is a single point or is
a submanifold (at least locally) of F(A, B).

Figure 4.5.3: Plot of φ_*(Θ(t)) versus t for Θ(t) a solution to (4.4.7) with initial condition
Θ(0) = I_n, the identity matrix. The potential φ_*(Θ) := ||(A₀ − Θ^TFΘ) − B₀B₀^T(A₀ −
Θ^TFΘ)B₀B₀^T||² is computed with respect to the initial conditions (A₀, B₀) used in Figures
4.5.1 and 4.5.2.
Remark 4.5.2 Observe that critical level sets of Ψ are given by two algebraic conditions,
||grad Ψ(A, B)|| = 0 and Ψ(A, B) = Ψ₀, for some fixed Ψ₀; thus they are algebraic varieties
of the closed submanifold F(A, B) ⊂ R^{n×n} × R^{n×m}. It follows, apart from a set of measure
zero in F(A, B) (singularities of the algebraic conditions), that the critical sets will locally
have submanifold structure in F(A, B). □
Rather than consider the computationally huge task of mapping out the local minima of
Ψ by integrating out (4.3.4) for many different initial conditions in F(A, B), it is possible
to obtain some qualitative information in the vicinity of a given local minimum. Choosing
any initial condition and integrating (4.3.4) for a suitable time interval, an estimate of a local
minimum (A*, B*) is obtained. If this point is an isolated minimum then it should be locally
attractive. By choosing a number of initial conditions (A_i, B_i) in the vicinity of (A*, B*)
and integrating (4.3.4) a second time one obtains new estimates of local minima (A_i*, B_i*).
If (A*, B*) approximates an isolated local minimum then the ratio

r_i = ||(A_i*, B_i*) − (A*, B*)|| / ||(A_i, B_i) − (A*, B*)||  (4.5.1)

should be approximately zero. If (A*, B*) is not isolated then one expects the ratio r_i to
be significantly non-zero. Of course r_i should be less than one on average since the flow is
Figure 4.5.4: Frequency distribution of the ratios r_i given by (4.5.1), computed from the limiting
values of 100 simulations with initial conditions close to (A*, B*).

convergent. The difficulty in this approach is deciding on suitable time intervals for the various
integrations. The first time interval was determined by repeatedly integrating over longer and
longer time intervals (for the same initial conditions) until the norm difference between the
final values was less than 1 × 10⁻⁸. An initial time interval of two hundred seconds was
found to be suitable. Each subsequent simulation was integrated over a time interval of fifty
seconds. The results of one hundred measurements of the ratio r_i for a given estimated local
minimum (A*, B*) are plotted as a frequency plot, Figure 4.5.4. The frequency divisions for
this plot are 0.05; thus, in the one hundred experiments undertaken, eleven experiments yielded
an estimate of r_i between 0.325 and 0.375. It is obvious from Figure 4.5.4 that the probability
of r_i being zero is small and one concludes that the critical level sets of Ψ have a local
submanifold structure of non-zero dimension. In particular, the local minima of Ψ are not
isolated.
4.6 Numerical Methods for Symmetric Pole Placement
In this section a numerical algorithm, based on the continuous-time flow (4.4.7) coupled with
the feedback gain (4.3.14) is proposed. The algorithm is analogous to those discussed in
Chapters 2 and 3.
Let (A, B) be a symmetric output feedback system and let F ∈ S(n) possess the desired closed-loop eigenstructure. For Θ_0 ∈ O(n) consider the iterative algorithm generated by

Θ_{i+1} = Θ_i e^{−α_i [Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]},   (4.6.1)
K_i = B^T (Θ_i^T F Θ_i − A) B,   (4.6.2)

for i ∈ N and {α_i} a sequence of positive real numbers termed time-steps. Observe that the Lie-bracket [Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)] is skew-symmetric; hence, e^{−α_i [Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]} is orthogonal and Θ_{i+1} lies in O(n).
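The update (4.6.1) can be sketched numerically. The operator Q is problem-specific (it is defined earlier in the chapter); the identity map below is a stand-in for it, used only to illustrate the structural point: the commutator of two symmetric matrices is skew-symmetric, so the matrix exponential is orthogonal and the iterate stays on O(n).

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by scaling (2^6) and a truncated Taylor series;
    adequate for the small, moderately scaled matrices used here."""
    A = np.asarray(M, dtype=float) / 64.0
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(6):
        E = E @ E
    return E

def step(Theta, F, A, alpha, Q=lambda M: M):
    """One update Theta_{i+1} = Theta_i expm(-alpha [X, Q(X - A)]), X = Theta^T F Theta.
    Q is the identity here, a stand-in for the problem-specific operator of the text."""
    X = Theta.T @ F @ Theta
    Y = Q(X - A)
    bracket = X @ Y - Y @ X      # commutator of symmetric matrices: skew-symmetric
    return Theta @ expm(-alpha * bracket)

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4)); F = F + F.T   # symmetric target eigenstructure
A = rng.standard_normal((4, 4)); A = A + A.T   # symmetric state matrix
Theta1 = step(np.eye(4), F, A, alpha=0.01)
orthogonality_error = np.linalg.norm(Theta1.T @ Theta1 - np.eye(4))
```

The orthogonality error remains at round-off level, confirming that the recursion is well defined on O(n) regardless of the step size.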
To motivate the algorithm observe that

d/dτ (Θ_i e^{−τ[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]}) |_{τ=0}
 = −Θ_i [Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]
 = −[F, Θ_i Q(Θ_i^T F Θ_i − A)Θ_i^T] Θ_i,

the negative gradient of φ̃ at Θ_i (cf. Corollary 4.4.2). Thus, τ ↦ Θ_i e^{−τ[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]} represents a curve in O(n), passing through Θ_i at time τ = 0, and with first derivative equal to −grad φ̃(Θ_i). Indeed, the algorithm proposed can be thought of as a modified gradient descent algorithm where, instead of straight line interpolation, the curves Θ_i e^{−τ[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]} are used.
To implement (4.6.1) it is necessary to choose a time-step α_i for each step of the recursion. A convenient criterion for determining a suitable time-step is to minimise the smooth function

Δφ̃(Θ_i, α_i) := φ̃(Θ_{i+1}) − φ̃(Θ_i).   (4.6.3)

In particular, one would like to ensure that Δφ̃(Θ_i, τ) is strictly negative unless Θ_i is an equilibrium point of φ̃. The following argument is analogous to the derivation of step-size selection schemes given in Section 2.2.
Lemma 4.6.1 Let (A, B) be a controllable⁶ symmetric output feedback system and F ∈ S(n), F ≠ 0, possess the desired closed-loop eigenstructure. For any Θ_i ∈ O(n) such that grad φ̃(Θ_i) ≠ 0, the recursive estimate Θ_{i+1} = Θ_i e^{−α_i[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]}, where

α_i = 1 / ( 4‖F‖( ‖P(Θ_i^T F Θ_i)‖ + ‖Q(A)‖ ) ),   (4.6.4)

satisfies Δφ̃(Θ_i, α_i) = φ̃(Θ_{i+1}) − φ̃(Θ_i) < 0.

⁶I.e. the controllability matrix [B  AB  A²B ⋯ A^{n−1}B] is full rank. It is easily shown that controllability of (A, B) ensures that Q(A) ≠ 0.
Proof Let Θ_{i+1}(τ) = Θ_i e^{−τ[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]} for an arbitrary time-step τ and define X_i = Θ_i^T F Θ_i and X_{i+1}(τ) = Θ_{i+1}(τ)^T F Θ_{i+1}(τ). The Taylor expansion of X_{i+1}(τ) is

X_{i+1}(τ) = X_i − τ[X_i, [X_i, Q(X_i − A)]] + τ² R_2(τ),

where

R_2(τ) = ∫₀¹ [[X_{i+1}(sτ), [X_i, Q(X_i − A)]], [X_i, Q(X_i − A)]] (1 − s) ds.

Substituting into (4.6.3) and bounding yields (after some algebraic manipulation)

Δφ̃(Θ_i, τ) = ‖Q(X_{i+1} − X_i)‖² − 2τ tr( Q(X_i − A)[X_i, [X_i, Q(X_i − A)]] ) + 2τ² tr( Q(X_i − A) R_2(τ) )
 ≤ −2τ ‖[X_i, Q(X_i − A)]‖² + 4τ²( ‖P(X_i)‖ + ‖Q(A)‖ ) ‖[X_i, Q(X_i − A)]‖² ‖F‖
 =: Δφ̃_u(Θ_i, τ).

The controllability of (A, B) along with the assumptions grad φ̃(Θ_i) ≠ 0 and F ≠ 0 ensures that the quadratic coefficient of Δφ̃_u(Θ_i, τ) does not vanish, and it is easily seen that its unique minimum is strictly negative and occurs at τ = α_i of (4.6.4). The result follows since Δφ̃(Θ_i, α_i) ≤ Δφ̃_u(Θ_i, α_i) < 0.
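The step-size argument above reduces to minimising a quadratic upper bound of the form u(τ) = aτ² − bτ with a, b > 0: the minimiser τ* = b/(2a) gives the strictly negative value −b²/(4a). A minimal numerical check of this pattern (the coefficients below are arbitrary stand-ins, not the bound's actual coefficients):

```python
def quadratic_bound_step(a, b):
    """Minimise u(tau) = a*tau**2 - b*tau for a, b > 0.

    The minimiser is tau* = b / (2a) with u(tau*) = -b**2 / (4a) < 0,
    mirroring the step-size selection in the proof of Lemma 4.6.1."""
    tau_star = b / (2.0 * a)
    return tau_star, a * tau_star**2 - b * tau_star

tau_star, u_star = quadratic_bound_step(a=4.0, b=2.0)  # stand-in coefficients
```

Since the true decrease is bounded above by u, a strictly negative u(τ*) guarantees strict descent at the selected step.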
Theorem 4.6.2 Let (A, B) be a controllable symmetric output feedback system and let F ∈ S(n), F ≠ 0, possess the desired eigenstructure. For a given estimate Θ_i ∈ O(n), let α_i be given by (4.6.4). The algorithm (4.6.1),

Θ_{i+1} = Θ_i e^{−α_i[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]},   (4.6.5)

has the following properties.

a) The algorithm defines an iteration on O(n).

b) Fixed points of the algorithm are the equilibrium points of (4.4.7).

c) If {Θ_i} is a solution to (4.6.5) then the real sequence φ̃(Θ_i) is strictly monotonically decreasing unless there is some i ∈ N with Θ_i a fixed point of the algorithm.

d) Any solution {Θ_i} of (4.6.5) converges as i → ∞ to a set of equilibrium points contained in a level set of φ̃.
Proof Part a) follows from the observation that e^{−α_i[Θ_i^T F Θ_i, Q(Θ_i^T F Θ_i − A)]} is orthogonal. Fixed points of the recursion are those for which the first derivative of Θ_{i+1}(τ) vanishes (Lemma 4.6.1) and correspond exactly to the equilibrium points of (4.4.7). This proves part b), while part c) is a corollary of Lemma 4.6.1.

To prove part d), observe that O(n) is a compact set, and thus φ̃(Θ_i), a bounded monotonically decreasing sequence, must converge. This implies that Δφ̃(Θ_i, α_i) → 0 as i → ∞. It follows that Θ_i converges to a level set of φ̃ such that, for any Θ in this set, Δφ̃(Θ, α) = 0. Lemma 4.6.1 ensures that any point in this set is an equilibrium point of (4.4.7).
Remark 4.6.3 Observe that there is an associated sequence of realisations

A_i = Θ_i (A + B K_i B^T) Θ_i^T,
B_i = Θ_i B,

for any solution (Θ_i, K_i) of (4.6.1) and (4.6.2). □
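Since A_i is an orthogonal conjugation of the closed-loop matrix A + BK_iB^T, its eigenvalues coincide with the closed-loop poles at every iteration. A quick check with random stand-in data, taking Θ as an arbitrary orthogonal factor from a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 2
A = rng.standard_normal((n, n)); A = A + A.T          # symmetric state matrix
B = rng.standard_normal((n, m))
K = rng.standard_normal((m, m)); K = K + K.T          # symmetric feedback gain
Theta, _ = np.linalg.qr(rng.standard_normal((n, n)))  # an arbitrary orthogonal Theta_i

A_i = Theta @ (A + B @ K @ B.T) @ Theta.T             # realisation of Remark 4.6.3
B_i = Theta @ B

closed_loop_poles = np.linalg.eigvalsh(A + B @ K @ B.T)   # sorted ascending
realisation_poles = np.linalg.eigvalsh(A_i)
spectrum_gap = np.max(np.abs(closed_loop_poles - realisation_poles))
```

The spectrum gap is at round-off level, reflecting that the realisation sequence changes coordinates without moving the closed-loop poles.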
A primary aim in developing the algorithm (4.6.1) is to provide a reliable numerical tool
with which to investigate the structure of the pole placement (system assignment) problem
for symmetric realisations. Figure 4.6.1 is a simulation for a fifth order symmetric state space
system with four inputs. The initial condition is Θ_0 = I_n, the identity matrix, and the algorithm is run for 1000 steps. The linear convergence properties of the algorithm are shown by the linear appearance of the log versus iteration plot, Figure 4.6.1. The time-step selection for this simulation is displayed in Figure 4.6.2 and indicates both the non-linear nature of the selection
Figure 4.6.1: Iteration versus φ̃(Θ_i) (log of potential plotted against iteration), showing linear convergence properties.
Figure 4.6.2: Iteration versus time-step selection α_i.
scheme as well as its limiting behaviour. The existence of a limit to the time-step selection
scheme (4.6.4) as i → ∞, ensures that the linearization of (4.6.1) around a critical point exists.
By computing this linearization the linear convergence properties displayed in Figure 4.6.1 can
be confirmed theoretically.
Simulation studies have shown the presence of many local minima in the cost potential φ̃. Figure 4.6.3 is a plot of both the cost φ̃ and the norm of the gradient ‖grad φ̃(Θ_i)‖² for a
simulation of a seventh order symmetric state space system with four inputs. The system was
chosen such that an exact solution to the pole placement problem existed. Thus, the global
minimum of φ̃ was known to be zero; however, Figure 4.6.3 shows the cost φ̃ converging to a constant while the gradient converges to zero. The algorithm (4.6.1) provides a reliable
numerical method to investigate the presence and position of such local minima.
Figure 4.6.3: Iteration versus both the potential φ̃(Θ_i) and the norm of the gradient ‖[F, ΘQ(A − Θ^T F Θ)Θ^T]‖².
4.7 Open Questions and Further Work
An important question that has not been addressed in this chapter is that of understanding the
equilibrium conditions for the various dynamical systems in the context of classical systems
theory. It would be nice to relate conditions such as (4.4.5) to properties such as the frequency
response of the achieved system. Unfortunately, even finding a relationship between the desired
and the achieved pole positions appears to be difficult. The discussion of Problem C, multiple
systems assignment, is another area that would benefit from further work. The results presented
in this chapter are far from comprehensive.
A natural extension of the theory presented in this chapter is to consider more general systems. For example, a class of systems (A, B, C) with a given Cauchy-Maslov index (i.e. (A I_{pq})^T = A I_{pq} and C^T = I_{pq} B, where I_{pq} = diag(I_p, −I_q)) could be approached using the same techniques developed earlier. The Lie transformation group associated with the set of such systems is

G = { T ∈ R^{n×n} | T^T I_{pq} T = I_{pq}, det(T) ≠ 0 },

which has identity tangent space (or Lie-algebra)

g = { X ∈ R^{n×n} | (X I_{pq})^T = −X I_{pq} },

the set of signature skew-symmetric matrices.
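The claim that g exponentiates into G can be checked numerically: if X I_pq is skew-symmetric then e^{tX} preserves I_pq. The signature (p, q) and the random generator below are arbitrary stand-ins; the Taylor-series exponential is a sketch, not a production implementation.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by scaling (2^6) and a truncated Taylor series."""
    A = np.asarray(M, dtype=float) / 64.0
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(6):
        E = E @ E
    return E

p, q = 2, 3                                            # arbitrary signature
Ipq = np.diag([1.0] * p + [-1.0] * q)

rng = np.random.default_rng(2)
S = rng.standard_normal((p + q, p + q)); S = S - S.T   # skew-symmetric
X = S @ Ipq                                            # X @ Ipq = S is skew, so X lies in g

T = expm(X)                                            # candidate element of G
invariance_error = np.linalg.norm(T.T @ Ipq @ T - Ipq)
```

The invariance error stays at round-off level, consistent with T^T I_pq T = I_pq for exponentials of signature skew-symmetric generators.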
Related to the general construction for systems with an arbitrary Cauchy-Maslov index is the problem for Hamiltonian linear systems. These are systems (A, B, C) where (AJ)^T = AJ and C^T = JB, where

J = [  0    −I_n
      I_n    0  ].

The set of Hamiltonian linear systems is a homogeneous space with Lie transformation group

Sp(n, R) = { T ∈ R^{2n×2n} | T^T J T = J, det(T) ≠ 0 },

termed the symplectic group. The Lie-algebra associated with the symplectic group is the set of 2n × 2n Hamiltonian matrices

Ham(n, R) = { X ∈ R^{2n×2n} | (XJ)^T − XJ = 0 }.

Hamiltonian systems are important for modelling mechanical systems.
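Both defining conditions can be verified numerically: a block matrix X = [A S₁; S₂ −Aᵀ] with S₁, S₂ symmetric satisfies (XJ)^T = XJ, and its exponential lies in Sp(n, R). The random data below are stand-ins.

```python
import numpy as np

def expm(M, terms=40):
    """Matrix exponential by scaling (2^6) and a truncated Taylor series."""
    A = np.asarray(M, dtype=float) / 64.0
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        E = E + term
    for _ in range(6):
        E = E @ E
    return E

n = 3
J = np.block([[np.zeros((n, n)), -np.eye(n)],
              [np.eye(n), np.zeros((n, n))]])

rng = np.random.default_rng(3)
A = rng.standard_normal((n, n))
S1 = rng.standard_normal((n, n)); S1 = S1 + S1.T
S2 = rng.standard_normal((n, n)); S2 = S2 + S2.T
X = np.block([[A, S1], [S2, -A.T]])                   # a Hamiltonian matrix

hamiltonian_error = np.linalg.norm((X @ J).T - X @ J)  # (XJ)^T = XJ
T = expm(X)
symplectic_error = np.linalg.norm(T.T @ J @ T - J)     # T in Sp(n, R)
```

This mirrors the signature-group case above: the Lie-algebra condition is linear and easy to test, and the exponential carries it to the group-level quadratic constraint.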
One may also consider pole placement problems on the set of general linear systems. A
discussion of some basic results is contained in the monograph (Helmke & Moore 1994b,
Section 5.3). One area in which these results could be extended is to consider dynamic output
feedback. Assume that one knows the degree d of a dynamic compensator applied to a given
linear state space system. The dynamics of the closed loop system can be modelled by the
differential equation

ẋ = Ax + Bu,
y = Cx,
ẇ = Gw + Cx,
u = Fw + Ky,
where the feedback law u is allowed to depend both on the dynamic compensator state w and
the direct output y. This system can be rewritten as an augmented system with static feedback:

d/dt [ x ; w ] = [ A  0 ; C  G ] [ x ; w ] + [ B ; 0 ] u,

[ y ; w ] = [ C  0 ; 0  I_d ] [ x ; w ],

u = [ K  F ] [ y ; w ].
Once the system is written in this form it is amenable to analysis via the general linear theory
presented in Helmke and Moore (1994b, Section 5.3). Of course, one could also exploit the
structure of the augmented problem itself to reduce computational cost and ensure that the roles
of system and compensator states do not become confused.
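The augmented static-feedback form can be assembled directly; multiplying out Ā + B̄K̄C̄ recovers the closed-loop dynamics ẋ = (A + BKC)x + BFw, ẇ = Gw + Cx. A quick check with random stand-in matrices, taking the output dimension equal to the compensator degree d since the compensator above is driven directly by Cx:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, d = 4, 2, 2        # state, input and compensator dimensions (output dim = d here)
A = rng.standard_normal((n, n)); B = rng.standard_normal((n, m))
C = rng.standard_normal((d, n)); G = rng.standard_normal((d, d))
K = rng.standard_normal((m, d)); F = rng.standard_normal((m, d))

# Augmented static output-feedback data.
A_bar = np.block([[A, np.zeros((n, d))], [C, G]])
B_bar = np.vstack([B, np.zeros((d, m))])
C_bar = np.block([[C, np.zeros((d, d))], [np.zeros((d, n)), np.eye(d)]])
K_bar = np.hstack([K, F])

closed_augmented = A_bar + B_bar @ K_bar @ C_bar
closed_direct = np.block([[A + B @ K @ C, B @ F], [C, G]])
mismatch = np.linalg.norm(closed_augmented - closed_direct)
```

The two closed-loop matrices agree to round-off, confirming that dynamic output feedback of degree d is a static output feedback problem for the augmented system.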
Gradient descent methods could also be used to compute canonical forms for system
realizations. For example, to compute the companion form of a given state matrix A consider
the smooth cost function

Φ(A) = Σ_{i=2}^{n} Σ_{j ≠ i−1} A_{ij}² + Σ_{i=2}^{n} (A_{i,i−1} − 1)²

on the homogeneous space

S(A) = { TAT^{−1} | T ∈ R^{n×n}, det(T) ≠ 0 }.

Given that computing canonical forms is often an ill-conditioned numerical problem, dynamical system techniques and related numerical gradient descent algorithms, with their strong stability properties, may prove to be an important numerical tool in certain situations.
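The cost Φ vanishes exactly on matrices whose rows 2, …, n carry the shifted-identity pattern of a companion form, the first row being left free. A direct transcription (0-based indexing; the convention that the first row is the free one is read off the formula above):

```python
import numpy as np

def companion_cost(A):
    """Phi(A) = sum_{i=2..n} [ sum_{j != i-1} A_{ij}^2 + (A_{i,i-1} - 1)^2 ]
    in the 1-based indexing of the text; rows 2..n are penalised toward the
    shifted-identity pattern, while the first row is unconstrained."""
    n = A.shape[0]
    cost = 0.0
    for i in range(1, n):                # 0-based row i is 1-based row i+1 >= 2
        for j in range(n):
            if j == i - 1:
                cost += (A[i, j] - 1.0) ** 2
            else:
                cost += A[i, j] ** 2
    return cost

coeffs = np.array([2.0, -1.0, 0.5, 3.0])          # free first row
A_companion = np.vstack([coeffs, np.eye(4)[:3]])  # rows 2..4 are e_1, e_2, e_3
```

On the companion matrix the cost is exactly zero, while any matrix violating the pattern (the identity, say) is penalised.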
Chapter 5
Gradient Flows on Lie-Groups and
Homogeneous Spaces
The optimization problems considered in Chapters 2, 3 and 4 are all problems where the
constraint set is a homogeneous space. In each case the approach taken is to consider a suitable
Riemannian metric on the homogeneous space and compute the maximising (or minimising)
gradient flow. The limiting value of a solution of the gradient flow (for arbitrary initial condition)
then provides an estimate of the maximum (or minimum). The numerical methods discussed
in Chapters 2 to 4 are closely related to each other. They each rely on using a ‘standard’ curve
lying within the homogeneous space, which can be assigned an arbitrary initial condition and
arbitrary initial tangent vector, to interpolate the solution of the continuous-time gradient flow.
Thus, for an arbitrary point in the constraint set one estimates the solution of the gradient flow
by travelling a short distance along the ‘standard’ curve starting from the present estimate with
initial tangent vector equal to the gradient at that point. It is natural to ask whether there is an underlying structure on which the numerical solutions proposed in Chapters 2 to 4 are based and, if there is, to what degree such an approach can be applied to a generic optimization problem on a homogeneous space.
With the developing interest in dynamical system solutions to linear algebraic problems
(Symes 1982, Deift et al. 1983, Brockett 1988, Chu & Driessel 1990, Helmke & Moore 1994b)
during the eighties there came an interest in the potential of continuous realizations of classical
problems as efficient numerical methods (Chu 1988). Interestingly, it has taken several years for the connection between dynamical systems and linear algebraic problems to be examined in the other direction: namely, can one use the insights and understanding developed by studying problems via the dynamical systems approach to design efficient numerical algorithms for problems in linear algebra? Recently Chu (1992) has shown that, by using the insight provided by a geometric understanding of a structured inverse eigenvalue problem, a better understanding of a quadratically convergent algorithm first proposed by Friedland et al. (1987) is obtained.
Perhaps more directly based on the dynamical systems literature is the work by Brockett (1993)
that looks at the design of gradient algorithms on the adjoint orbits of compact Lie-groups. The
methods proposed in Chapters 2 to 4 are gradient descent algorithms constructed explicitly on
the homogeneous space (Moore et al. 1992, Mahony et al. 1993, Moore et al. 1994, Mahony et
al. 1994).
Certainly the numerical methods proposed satisfy the broad requirements of simplicity,
global convergence and constraint stability discussed on page 2. Moreover, the numerical
methods described in each chapter have strong similarities, for example the Riemannian metrics
used are all of a similar form and the ‘standard’ curves used to generate the numerical methods
are all based on matrix exponentials. To develop a general understanding of these methods,
however, it is apparent that one must develop a better understanding of the Riemannian geometry
of the homogeneous constraint sets on which the algorithms are constructed.
In this chapter I attempt to provide a rigorous but brief review of the relevant theory
associated with developing numerical methods on homogeneous spaces. The focus of the
development is on the classes of homogeneous spaces encountered in engineering applications
and the simplest theoretical constructions which provide a rigorous basis for the numerical
methods developed. A careful development is given of the relationship between gradient flows
on Lie-groups and homogeneous spaces (related by a group action), which motivates the choice of
a particular Riemannian structure for a homogeneous space. Convergence behaviour of gradient
flows is also considered. The curves used in constructing numerical methods in Chapters 2
to 4 were all based on matrix exponentials and the well understood theory of the exponential
map as a Lie-group homomorphism is reviewed to provide a basis for this choice. Moreover,
the geodesic structure of the spaces considered (following from the Levi-Civita connection) is
developed and conditions are given on when the matrix exponential maps to a geodesic curve
on a Lie-group. Finally, an explicit discussion of the relationship between geodesics on Lie-
groups and homogeneous spaces is given. The conclusion is that the algorithms proposed in
Chapters 2 to 4 are modified gradient descent algorithms with geodesic curves used to replace
the straight line interpolation of the classical gradient descent algorithm.
Much of the material presented is standard or at least accessible to people working in the
fields of Riemannian geometry and Lie-groups, however, this material would not be standard
knowledge for researchers in an engineering field. Moreover, the development strongly emphasizes the aspects of the general theory that are relevant to problems in linear systems theory. Due to the focus of the work, explicit proofs are given for a number of results which do
not appear to be standard in the literature. In particular, I have not seen the results concerning
the interrelation of gradient flows on Lie-groups and homogeneous spaces nor a careful presentation of the relationship between geodesics on Lie-groups and homogeneous spaces in any
existing reference.
The chapter is divided into nine sections. Section 5.1 presents a brief review of Lie-groups
and homogeneous spaces, while Section 5.2 considers a certain class of homogeneous spaces,
orbits of semi-algebraic Lie-groups, which includes all the constraint sets considered in this
thesis. Section 5.3 describes a natural choice of Riemannian metric for a given homogeneous
space, while Section 5.4 discusses the derivation of gradient flows on Lie-groups and homogeneous spaces and shows why the choice of Riemannian metric made in Section 5.3 is the most
natural. Section 5.5 discusses the convergence properties of gradient flows. Sections 5.6 to
5.9 develop the geometry of Lie-groups and homogeneous spaces concentrating on providing
a basis for understanding the exponential map and geodesics.
5.1 Lie-groups and Homogeneous Spaces
In this section a brief review of Lie-groups and homogeneous spaces is presented. The reader
is referred to Helgason (1978) and Warner (1983) for further technical details.
A Lie-group G is an abstract group which is also a smooth manifold, on which the operations of group multiplication ((σ, τ) ↦ στ, for σ, τ ∈ G) and inversion (σ ↦ σ^{−1}, for σ ∈ G) are smooth. For σ ∈ G one defines diffeomorphisms of G associated with right and left multiplication by a constant,

r_σ : G → G,  r_σ(τ) := τσ,   (5.1.1)
l_σ : G → G,  l_σ(τ) := στ.

Observe that r_σ and l_σ are diffeomorphisms of G with smooth inverses given by r_{σ^{−1}} and l_{σ^{−1}} respectively.
Let M be a manifold and G be a Lie-group. A smooth group action of G on M is a smooth mapping

φ : G × M → M

which satisfies

φ(στ, q) = φ(σ, φ(τ, q)),  for all q ∈ M and σ, τ ∈ G,
φ(e, q) = q,  for all q ∈ M, where e is the identity of G.

The action is known as transitive if for any q and r in M there exists σ ∈ G such that φ(σ, q) = r. Observe that φ(σ, ·) : M → M is a diffeomorphism of M onto M, since φ(σ, ·) is smooth, surjective (for any q ∈ M, φ(σ, φ(σ^{−1}, q)) = q) and has smooth inverse φ(σ, ·)^{−1} = φ(σ^{−1}, ·). A smooth manifold M with a transitive group of diffeomorphisms (φ(σ, ·) : M → M) is known as a smooth homogeneous space.
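The two action axioms can be checked numerically on a concrete example — here O(n) acting on symmetric matrices by congruence, φ(σ, q) = σqσᵀ, a stand-in of the kind used in Chapter 4 rather than a construction from this section:

```python
import numpy as np

def act(sigma, q):
    """phi(sigma, q) = sigma q sigma^T: O(n) acting on symmetric matrices."""
    return sigma @ q @ sigma.T

rng = np.random.default_rng(5)
q = rng.standard_normal((4, 4)); q = q + q.T           # a symmetric "point" of M
sigma, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthogonal matrices
tau, _ = np.linalg.qr(rng.standard_normal((4, 4)))

compatibility_error = np.linalg.norm(act(sigma @ tau, q) - act(sigma, act(tau, q)))
identity_error = np.linalg.norm(act(np.eye(4), q) - q)
```

Both errors are at round-off level: φ(στ, q) = φ(σ, φ(τ, q)) and φ(e, q) = q hold for this action.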
Let p ∈ M and define the stabiliser of p by

stab(p) = { σ ∈ G | φ(σ, p) = p }.

By construction stab(p) ⊂ G is an abstract subgroup of G. By inspection the map

φ_p : G → M,
φ_p(σ) := φ(σ, p),   (5.1.2)

is a smooth map which is onto if and only if φ is transitive. As a consequence, if φ is a smooth transitive group action of G on M one has that dim G ≥ dim M. The stabiliser, stab(p) = φ_p^{−1}(p), is the inverse image of a single point under a continuous map and is a closed set in the manifold topology on G. Consequently, stab(p) is a closed abstract subgroup of G and is a Lie-subgroup of G with the relative topology inherited from G (Warner 1983, pg. 110). The left coset space G/stab(p) = { σ stab(p) | σ ∈ G } has a natural topology such that the surjective mapping π : G → G/stab(p), σ ↦ σ stab(p), is a continuous, open mapping. Similarly, equipping G/stab(p) with the unique differential structure that makes π smooth (Warner 1983, pg. 120), it is easily verified that π is a submersion.
The coset space G/stab(p) is itself a homogeneous space under the group action

φ̄ : G × G/stab(p) → G/stab(p),
φ̄(τ, σ stab(p)) := τσ stab(p).

Consider the smooth map

β_p : G/stab(p) → M,   (5.1.3)
β_p(σ stab(p)) := φ(σ, p).

It is a standard result that β_p : G/stab(p) → M is a diffeomorphism (Helgason 1978, Proposition 4.3, pg. 124). By construction, the diagram

G ──π──> G/stab(p) ──β_p──> M,   φ_p = β_p ∘ π,

commutes. In particular, φ_p = β_p ∘ π is the composition of a submersion and a diffeomorphism and is itself a submersion.
5.2 Semi-Algebraic Lie-Groups, Actions and their Orbits
A set G ⊂ R^s is known as semi-algebraic when it can be obtained by finitely many applications of the operations of intersection, union and set difference, starting from sets of the form { x ∈ R^s | f(x) ≥ 0 } with f a polynomial function on R^s. A semi-algebraic Lie-group is a Lie-group which is also a semi-algebraic subset of R^s. The following two sets are examples of semi-algebraic Lie-groups.

Example 5.2.1 a) The general linear group

GL(N, R) = { T ∈ R^{N×N} | det(T) ≠ 0 }.

b) The orthogonal group

O(N) = O(N, R) = { T ∈ R^{N×N} | T T^T = I_N },

where I_N is the N × N identity matrix. □
Let G be a Lie-group and φ be a smooth group action φ : G × R^r → R^r. Fix p ∈ R^r and define the orbit of the action to be the set

O(p) = { φ(σ, p) | σ ∈ G }.

The set O(p) is an immersed¹ submanifold of R^r in the sense that it is a subset of R^r with a differential structure given by that induced by the diffeomorphism β_p : G/stab(p) → O(p) (cf. (5.1.3)). The map φ is a smooth transitive group action of G acting on O(p) and thus O(p) is given the structure of a homogeneous space. It is certainly not clear that the differential structure induced by the immersion is compatible with the Euclidean differential structure on R^r. In the case where the two differential structures are compatible, O(p) is an embedded submanifold of R^r.

Let G be a subset of R^s. A map f : G → R^r is semi-algebraic when the graph of f, { (x, f(x)) | x ∈ G } ⊂ R^s × R^r, is semi-algebraic. In particular, if G is a semi-algebraic

¹An immersion is a one-to-one map f : M → N between two manifolds M and N for which the differential df is full rank at all points. An immersed submanifold is a subset U ⊂ N such that U = f(M) is the image of some manifold M via an immersion f. The set U ⊂ N inherits the differential structure on M via the map f; however, this need not correspond to the differential structure associated with the manifold N. An embedding is an immersion f : M → N such that the image U = f(M) is a manifold with the subspace differential structure inherited from N (Warner 1983, pg. 22).
subset of R^s and f : R^s → R^r is a rational map (i.e. the i'th component of f is a ratio of two polynomial maps), then the map f : G → R^r is semi-algebraic (Gibson 1979, pg. 223).
The following result shows that for semi-algebraic Lie-groups and semi-algebraic group actions the orbit of a point p ∈ R^r is always an embedded submanifold of R^r. The result is standard where G is a compact Lie-group (Varadarajan 1984, pg. 81). For G semi-algebraic the reader is referred to Gibson (1979, pg. 224).

Proposition 5.2.2 Let G be a Lie-group and φ : G × R^r → R^r be a smooth group action of G on R^r. Let p ∈ R^r be an arbitrary point and denote the orbit of p by O(p) = { φ(σ, p) | σ ∈ G }. Then O(p) is an embedded submanifold of R^r, with the embedding

β_p : G/stab(p) → O(p)

given by (5.1.3), if either:

a) G is a compact Lie-group, or

b) G is a semi-algebraic Lie-group and φ : G × R^r → R^r is a semi-algebraic group action.
5.3 Riemannian Metrics on Lie-groups and Homogeneous Spaces
Let G be a Lie-group, M be a smooth manifold and φ : G × M → M be a smooth transitive group action of G on M. Denote the tangent space of G at the identity e by T_eG. Let g_e : T_eG × T_eG → R be an inner product on T_eG, i.e. a positive definite, symmetric bilinear map. The inner product chosen in the sequel is always a Euclidean inner product computed by choosing an arbitrary fixed basis {E_1, …, E_n} for T_eG, expressing given tangent vectors X = Σ_{i=1}^n x_iE_i and Y = Σ_{i=1}^n y_iE_i in terms of this basis, and setting

g_e(X, Y) = Σ_{i=1}^n x_i y_i.
Of course, this construction depends on the basis vectors used. One could also consider other inner products; for example, when G is compact and semi-simple the negative of the Killing form (Helgason 1978, pg. 131) is a positive definite inner product. A number of authors have used the Killing form in related work (Faybusovich 1989, Bloch et al. 1992, Brockett 1993); however, the choice of a particular inner product is immaterial to the following development.
Let g_e be an inner product on T_eG and let r_σ, of (5.1.1), be right translation by σ ∈ G. Since r_σ is a diffeomorphism, its differential² at the identity e of G, T_e r_σ : T_eG → T_σG, is a vector space isomorphism. Using T_e r_σ one can define an inner product on each tangent space of G,

g_σ : T_σG × T_σG → R,
g_σ(ξ, ν) := g_e( (T_e r_σ)^{−1}(ξ), (T_e r_σ)^{−1}(ν) ),

where ξ and ν are elements of T_σG. It is easily verified that g_σ varies smoothly on G and consequently defines a Riemannian metric,

g(ξ, ν) := g_σ(ξ, ν),   (5.3.1)

for σ ∈ G and ξ, ν in T_σG. This Riemannian metric is termed the right invariant group metric for G. Observe that for any two smooth vector fields X and Y on G one has

g(dr_σX, dr_σY) = g(X, Y).   (5.3.2)
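For G = GL(n, R) with g_e the Euclidean (trace) inner product on matrices, the construction reads g_σ(ξ, ν) = tr((ξσ^{−1})(νσ^{−1})ᵀ), and the right invariance property (5.3.2) can be checked directly. The trace inner product is an assumption standing in for the basis construction above.

```python
import numpy as np

def right_invariant_metric(sigma, xi, nu):
    """g_sigma(xi, nu): translate xi, nu in T_sigma G back to the identity by
    right multiplication with sigma^{-1}, then apply the Euclidean (trace) product."""
    s_inv = np.linalg.inv(sigma)
    return np.trace((xi @ s_inv) @ (nu @ s_inv).T)

rng = np.random.default_rng(6)
sigma = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # comfortably invertible
tau = rng.standard_normal((3, 3)) + 3 * np.eye(3)
xi, nu = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

# Right translation by tau maps T_sigma G to T_{sigma tau} G via xi -> xi @ tau.
lhs = right_invariant_metric(sigma @ tau, xi @ tau, nu @ tau)
rhs = right_invariant_metric(sigma, xi, nu)
invariance_gap = abs(lhs - rhs)
```

The gap vanishes to round-off because (ξτ)(στ)^{−1} = ξσ^{−1}: right translation cancels inside the transport back to the identity.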
Let p ∈ M be arbitrary and recall that φ_p of (5.1.2) is a submersion of G onto M (since the group action is transitive). Thus, the differential of φ_p at the identity, T_eφ_p : T_eG → T_pM, is a linear surjection of vector spaces. Decompose T_eG into the topological product

T_eG = ker T_eφ_p ⊕ dom T_eφ_p.

²Let f : M → N be a smooth map between smooth manifolds M and N, and let p ∈ M be an arbitrary point; then the differential of f at p (or the tangent map of f at p) is the linear map

T_pf : T_pM → T_{f(p)}N,
T_pf(X) := Df|_p(X),

where Df|_p(X) is the Fréchet derivative (Helmke & Moore 1994b, pg. 334) of f in direction X ∈ T_pM. The full differential of f is the map from the tangent bundle of M, TM = ∪_{p∈M} T_pM, to the tangent bundle of N,

df : TM → TN,
df(X_p) := T_pf(X_p),

where X_p is an element of the tangent space T_pM for arbitrary p ∈ M.
Here ker T_eφ_p is the kernel of T_eφ_p and

dom T_eφ_p = { X ∈ T_eG | g_e(X, Y) = 0 for all Y ∈ ker T_eφ_p }   (5.3.3)

is termed the domain (the subspace orthogonal to ker T_eφ_p with respect to the inner product provided on T_eG). By construction, T_eφ_p restricts to a vector space isomorphism T*_eφ_p,

T*_eφ_p : dom T_eφ_p → T_pM,
T*_eφ_p(X) := T_eφ_p(X).

Thus, one may define an inner product on the tangent space T_pM by

g^M_p(X, Y) = g_e( (T*_eφ_p)^{−1}(X), (T*_eφ_p)^{−1}(Y) ),

where (T*_eφ_p)^{−1}(X) ∈ T_eG via the natural inclusion dom T_eφ_p ⊂ T_eG. It is easily verified that this construction defines a smooth inner product on the tangent bundle TM. Thus, one defines a Riemannian metric,

g^M(X, Y) := g^M_q(X, Y),   (5.3.4)

for q ∈ M and X, Y in T_qM. This is termed the normal metric on M.
Let q ∈ M be arbitrary; then the normal Riemannian metric on M and the right invariant group metric on G are related by the differential of φ_p : G → M. To see this, observe that for any p ∈ M there exists σ ∈ G such that φ(σ, p) = q. Thus,

φ_q(τ) = φ(τ, φ(σ, p)) = φ(τσ, p) = φ_p ∘ r_σ(τ).

Differentiating at the identity gives the commuting diagram of vector space homomorphisms

T_eG ──T_e r_σ──> T_σG ──T_σφ_p──> T_qM,   T_eφ_q = T_σφ_p ∘ T_e r_σ.

In particular, the normal Riemannian metric can also be defined by

g^M(X, Y) := g( (T*_σφ_p)^{−1}(X), (T*_σφ_p)^{−1}(Y) ),   (5.3.5)

where g(·, ·) is the right invariant group metric on G, X and Y are in T_qM, and T*_σφ_p is the restriction of T_σφ_p to

dom T_σφ_p = { Y ∈ T_σG | g(Y, X) = 0 for all X ∈ ker T_σφ_p },   (5.3.6)

the domain of T_σφ_p. Observe that dom T_σφ_p = dr_σ(dom T_eφ_q).
5.4 Gradient Flows
Let M be a Riemannian manifold (with Riemannian metric g^M) and let ψ : M → R be a smooth potential function. The gradient of ψ on M is defined pointwise on M by the relationships

Dψ|_p(ξ) = g^M(grad ψ(p), ξ) for all ξ ∈ T_pM,   (5.4.1)
grad ψ(p) ∈ T_pM,   (5.4.2)

where Dψ|_p(ξ) is the Fréchet derivative of ψ in direction ξ at the point p ∈ M (Helmke & Moore 1994b, pg. 334). Existence follows from the positive definiteness and bilinearity of the inner product along with linearity of the Fréchet derivative.

Observe that grad ψ is a smooth vector field on M which vanishes at local maxima and minima of ψ. Consider the ordinary differential equation on M, termed the gradient flow of ψ,

ṗ = grad ψ(p),

whose solutions are integral curves³ of grad ψ. Let p_0 ∈ M be some initial condition; then the solution of the gradient flow with initial condition p_0 exists and is unique (apply classical O.D.E. theory to the local coordinate representation of the differential equation).
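A concrete gradient flow of the kind referred to here is Brockett's double-bracket flow Ḣ = [H, [H, N]] on the isospectral orbit of a symmetric matrix (Brockett 1988), which increases tr(NH) monotonically. A fixed-step Euler sketch — the matrices, step size and horizon below are arbitrary stand-ins; note that along each Euler update the increase of tr(NH) is exactly h‖[H, N]‖²_F ≥ 0, since the trace is linear in the update:

```python
import numpy as np

def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

def double_bracket_flow(H0, N, h=1e-3, steps=3000):
    """Explicit-Euler integration of H' = [H, [H, N]].

    Along each Euler step tr(NH) increases by exactly h*||[H, N]||_F^2,
    so the recorded trace history is nondecreasing."""
    H, history = H0.copy(), []
    for _ in range(steps):
        history.append(np.trace(N @ H))
        H = H + h * bracket(H, bracket(H, N))
    return H, np.array(history)

rng = np.random.default_rng(7)
H0 = rng.standard_normal((4, 4)); H0 = (H0 + H0.T) / 2   # symmetric initial condition
N = np.diag([4.0, 3.0, 2.0, 1.0])                        # diagonal target matrix
H_inf, trace_history = double_bracket_flow(H0, N)
```

The monotone history illustrates the defining property of gradient flows used throughout this chapter: the potential is nondecreasing (or nonincreasing, for descent flows) along solutions.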
Let G be a Lie-group and φ : G × M → M be a smooth transitive group action of G on M. Fix p ∈ M and consider the 'lifted' potential ψ̃ : G → R,

ψ̃(σ) := ψ ∘ φ_p(σ),   (5.4.3)

where φ_p is given by (5.1.2). Let g_e(·, ·) be an inner product on T_eG and define the right invariant group metric g on G and the normal metric g^M on M as described in Section 5.3. The smooth potential ψ : M → R and the 'lifted' potential ψ̃ give rise to the two gradient flows

q̇ = grad ψ(q), q ∈ M,   (5.4.4)
σ̇ = grad ψ̃(σ), σ ∈ G,   (5.4.5)

defined with respect to the normal metric and the right invariant group metric respectively.
Lemma 5.4.1 Let p ∈ M be some fixed element of M. Let q_0 = φ_p(σ_0) (where φ_p is given by (5.1.2)) be an arbitrary initial condition in M. Let q(t) denote the solution of (5.4.4) with initial condition q_0 and let σ(t) denote a solution of (5.4.5) with initial condition σ_0; then

q(t) = φ_p(σ(t)).

Proof By construction q_0 = φ_p(σ_0). Consider the time derivative of φ_p(σ(t)),

d/dt φ_p(σ(t)) = dφ_p( d/dt σ(t) ) = T_σφ_p ∘ grad ψ̃(σ(t)).

³Let X be a smooth vector field on a manifold M. An integral curve of X is a smooth map γ : R → M with γ̇(t) = X(γ(t)), where γ̇(t) := dγ( d/dt|_t ) and d/dt|_t denotes the unit tangent vector to R at the point t ∈ R.
Thus, it is sufficient to show that

T_σφ_p ∘ grad ψ̃(σ) = grad ψ(φ_p(σ)),

and use the uniqueness of solutions to (5.4.4) and (5.4.5) to complete the proof.

Let grad ψ̃ = grad ψ̃₀ + grad ψ̃⊥ be the unique decomposition of grad ψ̃ into grad ψ̃₀ ∈ ker T_σφ_p and grad ψ̃⊥ ∈ dom T_σφ_p (cf. (5.3.6)). Observe that

g(grad ψ̃₀(σ), grad ψ̃₀(σ)) = g(grad ψ̃(σ), grad ψ̃₀(σ))
 = Dψ̃|_σ(grad ψ̃₀(σ))
 = Dψ|_{φ_p(σ)}( T_σφ_p ∘ grad ψ̃₀(σ) ) = 0,

since T_σφ_p ∘ grad ψ̃₀(σ) = 0. Since the metric g is positive definite, grad ψ̃₀ = 0 and it follows that grad ψ̃ ∈ dom T_σφ_p.

Let q ∈ M be arbitrary and choose σ ∈ G such that φ_p(σ) = q. Let X ∈ T_qM be an arbitrary tangent vector and observe that

g^M( T_σφ_p ∘ grad ψ̃(σ), X ) = g( (T*_σφ_p)^{−1} ∘ T_σφ_p ∘ grad ψ̃(σ), (T*_σφ_p)^{−1}(X) ),

using (5.3.5). Of course (T*_σφ_p)^{−1} ∘ T_σφ_p(grad ψ̃(σ)) = grad ψ̃(σ) since grad ψ̃ ∈ dom T_σφ_p. It follows that

g^M( T_σφ_p ∘ grad ψ̃(σ), X ) = g( grad ψ̃(σ), (T*_σφ_p)^{−1}(X) )
 = Dψ̃|_σ( (T*_σφ_p)^{−1}(X) )
 = Dψ|_q( T_σφ_p ∘ (T*_σφ_p)^{−1}(X) )
 = Dψ|_q(X) = g^M( grad ψ(q), X ).

Since X is arbitrary and g^M is positive definite, T_σφ_p ∘ grad ψ̃(σ) = grad ψ(φ_p(σ)) and the proof is completed.
5.5 Convergence of Gradient Flows
Let M be a Riemannian manifold and let ψ : M → R be a smooth function. Let grad ψ denote the gradient vector field with respect to the Riemannian metric on M. The critical points of ψ : M → R coincide with the equilibria of the gradient flow on M,

q̇(t) = −grad ψ(q(t)).   (5.5.1)

For any solution q(t) of the gradient flow,

d/dt ψ(q(t)) = g(grad ψ(q(t)), q̇(t)) = −g(grad ψ(q(t)), grad ψ(q(t))) ≤ 0,

and therefore ψ(q(t)) is a monotonically decreasing function of t. The following proposition is discussed in Helmke and Moore (1994b, pg. 360).
Proposition 5.5.1 Let ψ : M → R be a smooth function on a Riemannian manifold with compact sublevel sets⁴. Then every solution q(t) ∈ M of the gradient flow (5.5.1) on M exists for all t ≥ 0. Furthermore, q(t) converges to a connected component of the set of critical points of ψ as t → +∞.

Note that the condition of the proposition is automatically satisfied if M is compact. Solutions of a gradient flow (5.5.1) display no periodic solutions or strange attractors and there is no chaotic behaviour. If ψ has isolated critical points in any level set {q ∈ M | ψ(q) = c}, c ∈ R, then every solution of the gradient flow (5.5.1) converges to one of these critical points as t → +∞. This is also the case where the critical level sets are smooth submanifolds of M. In general, however, it is possible that the solution of a gradient flow converges to a connected level set of critical points of the function ψ. Such 'non-generic' behaviour is undesirable when gradient flows are being used as a numerical tool. For problems of the type considered in this thesis the following lemma is generally applicable.

Lemma 5.5.2 Let ψ : M → R be a smooth function with compact sublevel sets, such that

⁴Let c ∈ R; then the sublevel set of ψ associated with the value c is {q ∈ M | ψ(q) ≤ c}. If ψ has compact sublevel sets then each such set (possibly empty) is a compact subset of M.
134 Gradient Flows on Lie-Groups and Homogeneous Spaces Chapter 5
(i) The set of critical points of Φ is the union of closed, disjoint, critical level sets, each of
which is a submanifold of M.
(ii) The Hessian⁵ HΦ at a critical point degenerates exactly on the tangent space of the
critical level sets of Φ. Thus, for q ∈ M a critical point of Φ and ξ ∈ T_qM, one has
HΦ|_q(ξ, ξ) = 0 if and only if ξ is in the tangent space of the critical level set of Φ.
Then every solution of the gradient flow
q̇ = −grad Φ(q)
converges exponentially fast to a critical point of Φ.
Proof Denote the separate connected components of the critical level sets of Φ by N_i for
i = 1, 2, …, K, where K is the number of disjoint critical level sets. Thus, the limit set of a
solution to the gradient flow q̇ = −grad Φ is fully contained in some N_j for j ∈ {1, 2, …, K}. Let
a ∈ N_j be an element of this limit set. Condition (ii) ensures that each N_j is a non-degenerate
critical set. It may be assumed without loss of generality that the value of Φ constrained to N_j
is zero. The generalised Morse lemma (Hirsch 1976, pg. 149) gives an open neighbourhood U_a
of a in M and a diffeomorphism f : U_a → R^n, n = dim M, n_j = dim N_j, such that
(i) f(U_a ∩ N_j) = R^{n_j} × {0},
(ii) Φ ∘ f⁻¹(x₁, x₂, x₃) = ½(‖x₂‖² − ‖x₃‖²),
with x₁ ∈ R^{n_j}, x₂ ∈ R^{n₊}, x₃ ∈ R^{n₋} and n_j + n₊ + n₋ = n. Let W = f(U_a) ⊆ R^n; then the
gradient flow of Φ ∘ f⁻¹ on W is
ẋ₁ = 0,  ẋ₂ = −x₂,  ẋ₃ = x₃.   (5.5.2)
⁵Let M be a smooth manifold. The Hessian of a smooth function Φ : M → R at a critical point q is the symmetric bilinear map HΦ|_q : T_qM × T_qM → R given by
HΦ|_q(ξ, η) = Σ_{i,j} (∂²Φ̃/∂x_i∂x_j) ξ̃_i η̃_j,
where x = {x₁, …, x_n} is a local coordinate chart on M, ξ̃ and η̃ are the local coordinate representations of ξ, η ∈ T_qM, and Φ̃ is the local coordinate representation of Φ.
[Figure 5.5.1: Flow around a saddle point, showing the regions W₊ and W₋ in the (x₂, x₃) plane.]
Let W₊ := {(x₁, x₂, x₃) | ‖x₂‖ < ‖x₃‖} and W₋ := {(x₁, x₂, x₃) | ‖x₂‖ ≥ ‖x₃‖}. Using
the convergence properties of (5.5.2) it follows that every solution of the original gradient flow
starting in f⁻¹(W₋ − {(x₁, x₂, x₃) | x₃ = 0}) will enter the region f⁻¹(W₊), on which Φ < 0
(cf. Figure 5.5.1). On the other hand, every solution starting in {f⁻¹(x₁, x₂, x₃) | x₃ = 0} will
converge to the point f⁻¹(x₁, 0, 0) ∈ N_j, x₁ ∈ R^{n_j}. As Φ is strictly negative on f⁻¹(W₊),
all solutions starting in f⁻¹(W₊ ∪ W₋) − {f⁻¹(x₁, x₂, x₃) | x₃ = 0} will eventually leave U_a and
converge to some N_i ≠ N_j. By repeating this analysis for each N_i and recalling that any
solution must converge to a connected subset of some N_i (Proposition 5.5.1) the proof is
completed.
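The local behaviour described by (5.5.2) can be verified directly; the following sketch (an illustration, not part of the thesis) uses the exact solution of the linearized flow to check that solutions on the stable slice {x₃ = 0} converge into N_j, while any solution with x₃(0) ≠ 0 eventually enters W₊, where the normal form of the cost is negative.

```python
import math

def flow(x1, x2, x3, t):
    # Exact solution of the linearized gradient flow (5.5.2):
    # x1 is constant, x2 decays exponentially, x3 grows exponentially.
    return x1, x2 * math.exp(-t), x3 * math.exp(t)

def phi(x1, x2, x3):
    # Local normal form of the cost near the critical set N_j.
    return 0.5 * (x2 * x2 - x3 * x3)

# A solution starting on the slice {x3 = 0} converges to (x1, 0, 0) in N_j.
_, x2, x3 = flow(0.7, 1.0, 0.0, 10.0)
assert abs(x2) < 1e-4 and x3 == 0.0

# A solution with x3(0) != 0, however small, eventually enters
# W+ = {|x2| < |x3|}, on which phi < 0, and so leaves U_a.
_, x2, x3 = flow(0.7, 1.0, 1e-6, 20.0)
assert abs(x2) < abs(x3) and phi(0.7, x2, x3) < 0.0
```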
5.6 Lie-Algebras, The Exponential Map and the General Linear Group
Let G be a Lie-group. The Lie-algebra of G, denoted g, is the set of all left invariant smooth
vector fields on G, i.e. smooth vector fields X(θ) ∈ T_θG such that
X ∘ l_σ(θ) = dl_σ X(θ),
where l_σ(θ) := σθ is left multiplication by σ. In particular,
X(θ) = dl_θ X(e).   (5.6.1)
Let X(θ) and Y(θ) be two smooth vector fields on G and think of them as derivations⁶ of
C^∞(G) (the set of smooth functions which map G → R). The Lie-bracket of X(θ) and Y(θ)
is defined with respect to the action of X and Y as derivations,
[X(θ), Y(θ)]f = X(θ)Y(θ)f − Y(θ)X(θ)f,
where f ∈ C^∞(G). By checking the linearity of this map it follows that the Lie-bracket of
two vector fields is itself a derivation and corresponds to a vector field, denoted [X, Y](θ).
The set of smooth vector fields on G, denoted D(G), is a vector space over R under pointwise
addition of vector fields and scalar multiplication. Considering the Lie-bracket operation as a
multiplication rule, D(G) is given the structure of an algebra. Assume that X(θ) = X and
Y(θ) = Y are left invariant vector fields on G; then [X, Y] = [X, Y](θ) is also a left invariant
vector field on G, since
(dl_σ[X, Y])f = [X, Y](f ∘ l_σ)   (5.6.3)
  = X D(f ∘ l_σ)|_θ (Y) − Y D(f ∘ l_σ)|_θ (X)
  = X Df|_{l_σ(θ)} (dl_σ Y) − Y Df|_{l_σ(θ)} (dl_σ X)
  = X Df|_{l_σ(θ)} (Y ∘ l_σ) − Y Df|_{l_σ(θ)} (X ∘ l_σ) = ([X, Y] ∘ l_σ)f.
Thus g forms a subalgebra of the algebra of derivations. Note that there is a one-to-one
correspondence between g and T_eG, the tangent space of G at the identity, given by (5.6.1).
Thus, g is a finite dimensional algebra of the same dimension as the Lie-group G. Indeed,
an alternative way of thinking about g is as the tangent space T_eG equipped with the bracket
operation
[X(e), Y(e)] := [dl_θ X(e), dl_θ Y(e)](e).
⁶Let C^∞(G) be the set of all smooth maps from G into R. The set C^∞(G) acquires a vector space structure under scalar multiplication and pointwise addition of functions. A derivation of C^∞(G) is a linear map C^∞(G) → C^∞(G). The set of all derivations of C^∞(G), denoted D(G), itself forms a vector space under scalar multiplication and pointwise addition of functions. A smooth vector field is a smooth map X : G → TG which assigns a vector X(θ) ∈ T_θG to each element θ ∈ G. Any smooth vector field X defines a derivation X(f) = Xf(θ) := Df|_θ (X(θ)), the Fréchet derivative of f in direction X at the point θ. Indeed, this correspondence is an isomorphism
D(G) ≅ {the set of smooth vector fields on G}   (5.6.2)
between D(G) and the vector space of smooth vector fields on G (Varadarajan 1984, pg. 5).
Example 5.6.1 Let N be a positive integer and consider the set of all real non-singular N × N
matrices
GL(N, R) = {Θ ∈ R^{N×N} | det(Θ) ≠ 0},
where det(Θ) is the determinant of Θ. The set GL(N, R) is known as the general linear
group and is a Lie-group under the group operation of matrix multiplication. Since GL(N, R)
is an open subset of R^{N×N} it inherits the relative Euclidean topology and differential struc-
ture. The tangent space at the identity I_N (the N × N identity matrix) of GL(N, R) is
T_{I_N}GL(N, R) = R^{N×N}, the set of all real N × N matrices. Consequently, the dimension of
the Lie-group GL(N, R) is n = N². The tangent space of GL(N, R) at a point Θ ∈ GL(N, R)
can be represented by the image of T_{I_N}GL(N, R) = R^{N×N} via the linearization of the
diffeomorphism generated by left multiplication l_Θ,
T_Θ GL(N, R) = T_{I_N} l_Θ (T_{I_N}GL(N, R)) = {ΘA | A ∈ R^{N×N}}.
The Lie-algebra of GL(N, R), denoted gl(N, R), is the set of all left invariant vector fields of
GL(N, R). From (5.6.1) it follows that
gl(N, R) = {X(Θ) = ΘA | Θ ∈ G, A ∈ R^{N×N}}.
Let f ∈ C^∞(GL(N, R)) be any smooth real function; then the Lie-bracket of two elements ΘA,
ΘB ∈ gl(N, R) acting on f is
[ΘA, ΘB]f = (ΘA)(ΘB)f − (ΘB)(ΘA)f
  = (ΘA)(Df|_Θ(ΘB)) − (ΘB)(Df|_Θ(ΘA))
  = D(Df|_Θ(ΘB))|_Θ (ΘA) − D(Df|_Θ(ΘA))|_Θ (ΘB).
Now since GL(N, R) inherits the Euclidean differential structure from R^{N×N}, the Fréchet
derivative Df|_Θ(X), X ∈ R^{N×N}, can be written
Df|_Θ(X) = Σ_{i,j=1}^{N} (∂f/∂Θ_{ij}) X_{ij},
where X_{ij} is the (i, j)'th entry of X ∈ R^{N×N}. Writing (ΘB)_{ij} = Σ_{s=1}^{N} Θ_{is}B_{sj} and applying
the product rule of differentiation gives
D(Df|_Θ(ΘB))|_Θ (ΘA) = Σ_{i,j=1}^{N} Σ_{p,k=1}^{N} (∂²f/∂Θ_{ij}∂Θ_{pk}) (ΘB)_{ij}(ΘA)_{pk}
  + Σ_{i,j=1}^{N} (∂f/∂Θ_{ij}) Σ_{p,k=1}^{N} (∂/∂Θ_{pk})(Σ_{s=1}^{N} Θ_{is}B_{sj}) (ΘA)_{pk}
= Σ_{i,j=1}^{N} Σ_{p,k=1}^{N} (∂²f/∂Θ_{ij}∂Θ_{pk}) (ΘB)_{ij}(ΘA)_{pk} + Σ_{i,j=1}^{N} (∂f/∂Θ_{ij}) Σ_{k=1}^{N} B_{kj}(ΘA)_{ik},
since (∂/∂Θ_{pk})(Σ_{s=1}^{N} Θ_{is}B_{sj}) = 0 unless p = i and s = k. It follows that
[ΘA, ΘB]f = Σ_{i,j=1}^{N} (∂f/∂Θ_{ij}) ( Σ_{k=1}^{N} (ΘA)_{ik}B_{kj} − Σ_{k=1}^{N} (ΘB)_{ik}A_{kj} )
  = Σ_{i,j=1}^{N} (∂f/∂Θ_{ij}) (Θ(AB − BA))_{ij} = (Θ(AB − BA))f,
where Θ(AB − BA) is a smooth left invariant vector field on GL(N, R). For any two
matrices A, B ∈ R^{N×N} define the matrix Lie-bracket by [A, B] = AB − BA. The bracket
operation on the Lie-algebra can now be written in terms of the matrix Lie-bracket operation
on T_eGL(N, R) = R^{N×N},
[ΘA, ΘB] = Θ[A, B].   (5.6.4)
Indeed, it is usual to think of gl(N, R) as the set
gl(N, R) = {A | A ∈ R^{N×N}}   (5.6.5)
with the matrix Lie-bracket operation [A, B] = AB − BA. □
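The matrix Lie-bracket of (5.6.5) satisfies the defining algebraic identities of a Lie-algebra, which is easy to check numerically; the sketch below (illustrative only) verifies antisymmetry and the Jacobi identity on the standard sl(2) generators.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def bracket(A, B):
    # matrix Lie-bracket [A, B] = AB - BA, cf. (5.6.4)/(5.6.5)
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(AB, BA)]

# standard generators of sl(2): e, h, f
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[1.0, 0.0], [0.0, -1.0]]
C = [[0.0, 0.0], [1.0, 0.0]]

# antisymmetry: [A, B] = -[B, A]
assert bracket(A, B) == [[-x for x in row] for row in bracket(B, A)]

# Jacobi identity: [A,[B,C]] + [B,[C,A]] + [C,[A,B]] = 0
J = add(add(bracket(A, bracket(B, C)),
            bracket(B, bracket(C, A))),
        bracket(C, bracket(A, B)))
assert J == [[0.0, 0.0], [0.0, 0.0]]
```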
Let G and H be two Lie-groups and let g and h be their associated Lie-algebras. A
map ψ : G → H is called a Lie-group homomorphism (or just homomorphism) from G to
H if ψ is smooth and is a group homomorphism (i.e. ψ(g₁g₂⁻¹) = ψ(g₁)ψ(g₂)⁻¹). A map
π : g → h is called a Lie-algebra homomorphism (or just homomorphism) from g to h if π
is linear and preserves the bracket operation, π([X, Y]) = [π(X), π(Y)]. The tangent map
T_eψ : T_eG → T_eH induces a map π : g → h, π(X) = dl_θ T_eψ(X(e)), which is a Lie-algebra
homomorphism (Warner 1983, pg. 90). Abusing notation slightly it is standard to identify g
with T_eG, h with T_eH (cf. (5.6.1)) and write T_eψ : g → h as the Lie-algebra homomorphism
induced by a Lie-group homomorphism ψ : G → H. The following result is fundamental in
the theory of Lie-groups. A typical proof is given in Warner (1983, Theorem 3.27).
Proposition 5.6.2 Let G and H be Lie-groups with Lie-algebras g and h respectively and
assume that G is simply connected. Let π : g → h be a Lie-algebra homomorphism; then there
exists a unique Lie-group homomorphism ψ : G → H such that T_eψ = π.
Let G be any Lie-group and denote its Lie-algebra by g. Denote the identity component
of G by G_e, the set of all points in G path connected to the identity e. Observe that R is a
Lie-group under addition. The Lie-algebra of R is the one-dimensional vector space r = {α d/dr |
α ∈ R}, where d/dr denotes the derivative in R. Let X ∈ g be arbitrary and consider the map
π : r → g,
π(α d/dr) := αX.
It is easily seen that π is a Lie-algebra homomorphism, and using Proposition 5.6.2 there exists
a unique Lie-group homomorphism
exp_X : R → G_e   (5.6.6)
such that T_e exp_X = π. Since exp_X is a Lie-group homomorphism, then exp_X(t₁ + t₂) =
exp_X(t₁) exp_X(t₂), and the set
⋃_{t∈R} exp_X(t) ⊆ G
is known as a one-parameter subgroup of G.
One may define the full exponential map by
exp : g → G   (5.6.7)
exp(X) := exp_X(1).
The exponential map is a local diffeomorphism from an open neighbourhood N₀ of 0 ∈ g into
an open neighbourhood M_e of e ∈ G (Helgason 1978, Proposition 1.6).
Lemma 5.6.3 Let exp : gl(N, R) → GL(N, R) denote the exponential map (5.6.7) and let
e^X = I_N + X + X²/2! + X³/3! + ⋯
be the standard matrix exponential. Let X ∈ gl(N, R) = R^{N×N}; then
exp(X) = e^X.
Proof Recall that the matrix exponential e^X is well defined for all X ∈ R^{N×N} (Horn &
Johnson 1985, pg. 300). Let X ∈ gl(N, R) (thought of as the set of N × N matrices equipped
with the matrix Lie-bracket) and define ψ_X : R → GL(N, R) by
ψ_X(t) = e^{tX}.
Observe that ψ_X is a well defined smooth map since the matrix exponential is itself smooth and
always non-singular (det(e^{tX}) = e^{t tr(X)} ≠ 0). Indeed, ψ_X is a group homomorphism (R is a
Lie-group under addition) since ψ_X(t₁ + t₂) = ψ_X(t₁)ψ_X(t₂) and ψ_X(−t) = ψ_X(t)⁻¹. The
tangent space of R at 0 is the set {α d/dr | α ∈ R}, where d/dr denotes normal derivation. Observe
that
T_eψ_X(α d/dr) = Dψ_X|₀ (α d/dr) = α (d/dt e^{tX})|_{t=0} = αX.
But this is exactly the Lie-algebra homomorphism π that induces the Lie-group homomorphism
exp_X (5.6.6). Since exp_X is the unique Lie-group homomorphism that has the property
T_e exp_X(α d/dr) = αX, it follows that ψ_X(t) = exp_X(t). The full result follows from the
definition of exp (5.6.7).
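The identities used in the proof are easy to confirm numerically. The following sketch (illustrative, not from the thesis) sums the truncated series for e^X with a 2 × 2 skew-symmetric X and checks the one-parameter subgroup property e^{(s+t)X} = e^{sX}e^{tX} and the determinant identity det(e^X) = e^{tr(X)}.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def smul(X, s):
    return [[s * x for x in row] for row in X]

def expm(X, terms=30):
    # Truncated series e^X = I + X + X^2/2! + ... (adequate for small 2x2 X).
    E = [[1.0, 0.0], [0.0, 1.0]]
    T = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        T = smul(matmul(T, X), 1.0 / k)
        E = [[e + t for e, t in zip(re, rt)] for re, rt in zip(E, T)]
    return E

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

X = [[0.0, 1.0], [-1.0, 0.0]]    # skew-symmetric, tr(X) = 0

# det(e^X) = e^{tr(X)} = 1
assert abs(det2(expm(X)) - 1.0) < 1e-12

# one-parameter subgroup property: e^{(s+t)X} = e^{sX} e^{tX}
s, t = 0.3, 0.4
lhs = expm(smul(X, s + t))
rhs = matmul(expm(smul(X, s)), expm(smul(X, t)))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))

# for this X, e^{tX} is the planar rotation by angle t
assert abs(expm(X)[0][0] - math.cos(1.0)) < 1e-12
```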
5.7 Affine Connections and Covariant Differentiation
Let G be a smooth manifold. An affine connection is a rule ∇ which assigns to each smooth
vector field X ∈ D(G) a linear mapping ∇_X : D(G) → D(G), ∇_X(Y) := ∇_X Y, satisfying
∇_{fX+gY} = f∇_X + g∇_Y   (5.7.1)
∇_X(fY) = f∇_X Y + (Xf)Y,   (5.7.2)
where f, g ∈ C^∞(G) and X, Y ∈ D(G).
An affine connection naturally defines parallel transport on a manifold. Let X and Y ∈
D(G) be smooth vector fields and let γ(t) be a smooth integral curve of X on some time
interval [0, τ], τ > 0; then the family of tangent vectors t ↦ Y(γ(t)) is said to be transported
parallel to γ(t) if
(∇_X Y)(γ(t)) = 0.   (5.7.3)
Expressing (5.7.3) in local coordinates one can show that the relationship depends only on the
values of the vector fields X and Y along the curve γ(t) (Helgason 1978, pg. 29). Thus,
given a curve γ(t) on [0, τ] and a smooth assignment [0, τ] ∋ t ↦ Y(t) ∈ T_{γ(t)}G, then Y(t) is
transported parallel to γ if and only if any smooth extensions X, Y ∈ D(G), X(γ(t)) = γ̇(t),
Y(γ(t)) = Y(t), satisfy (5.7.3).
A geodesic is a curve γ(t) for which the family of tangent vectors γ̇(t) is transported
parallel to the curve γ(t). It is usual to write the parallel transport equation for a geodesic
γ : R → G as
∇_{γ̇}γ̇ = 0,   (5.7.4)
where by this one means that any smooth extension X ∈ D(G) of γ̇ satisfies ∇_X X(γ(t)) = 0.
Given a point θ ∈ G and a tangent vector X ∈ T_θG, there exists a maximal open interval I ⊆
R containing zero and a unique geodesic γ_X : I → G with γ_X(0) = θ and γ̇_X(0) = X (Helgason
1978, pg. 30).
Given a fixed curve between two points θ, σ ∈ G (γ : [0, τ] → G, γ(0) = θ, γ(τ) = σ),
there exists a set of n linearly independent smooth assignments t ↦ Y_i(t) ∈ T_{γ(t)}G, i =
1, …, n, t ∈ [0, τ] (where each Y_i(t) is transported parallel to γ) which span the set of all
smooth assignments t ↦ Y(t) (Y(t) transported parallel to γ) (Helgason 1978, pg. 30). These
solutions correspond to choosing n linearly independent vectors in T_θG as initial conditions
and solving (5.7.3) for Y_i(t). The construction induces an isomorphism
P_{γ(0,τ)} : T_θG → T_σG,  P_{γ(0,τ)}(Z) = Σ_{i=1}^{n} z_i Y_i(τ),   (5.7.5)
where Z = Σ_{i=1}^{n} z_i Y_i(0) ∈ T_θG. Of course, this isomorphism will normally depend on
the curve γ.
Parallel transport of a smooth covector field w : G → T*G is defined in terms of its action
on an arbitrary vector field X ∈ D(G),
(P_{γ(0,τ)} w)(X) = w(P_{γ(τ,0)} X),
where P_{γ(τ,0)} is parallel transport from γ(τ) backwards to γ(0) along the curve γ(t). Parallel
transport of an arbitrary tensor field T : G → T*G ⊗ ⋯ ⊗ T*G ⊗ TG ⊗ ⋯ ⊗ TG of type
(r, s) is given by its action on arbitrary covector and vector fields
(P_{γ(0,τ)} T)(w₁, …, w_r, W₁, …, W_s) = T(P_{γ(τ,0)} w₁, …, P_{γ(τ,0)} w_r, P_{γ(τ,0)} W₁, …, P_{γ(τ,0)} W_s).
Parallel transport of a function f ∈ C^∞(G) is just
P_{γ(0,τ)} f = f(γ(τ)).
An affine connection on a manifold G induces a unique differentiation on tensor fields
known as covariant differentiation (Helgason 1978, pg. 40). It is usual to denote the covariant
differentiation associated with a given affine connection by the same symbol ∇. One may think
of covariant differentiation of a tensor T (with respect to a vector field X ∈ D(G)) evaluated
at a point θ ∈ G as the limit
(∇_X T)(θ) = lim_{s→0} (1/s)( P_{γ(s,0)} T(γ(s)) − T(θ) ),   (5.7.6)
where γ(t) is the integral curve associated with X, γ(0) = θ (Helgason 1978, pg. 42). In
particular, if T is a tensor of type (r, s) then ∇_X T is also a tensor of type (r, s). Considering
the above definition applied to a function f ∈ C^∞(G), one has
(∇_X f)(θ) = lim_{s→0} (1/s)( f(γ(s)) − f(θ) ) = Df|_θ (X(θ)) = (Xf)(θ).
Thus, as expected, covariant differentiation on C^∞(G) corresponds to derivation with respect
to the vector field X.
It is easily seen that covariant differentiation inherits property (5.7.1) from the affine
connection. To see that it satisfies the Leibniz formula (∇_Z(T ⊗ R) = (∇_Z T) ⊗ R + T ⊗ (∇_Z R)),
one observes that any operation defined by a limit of the form (5.7.6) has the properties of a
classical derivative. A rigorous proof is given in Mishchenko and Fomenko (1980, pg. 329).
In particular, given a (0, 2) tensor g(·, ·) contracted with two vector fields X and Y, then
∇_Z(g(X, Y)) = (∇_Z g)(X, Y) + g(∇_Z X, Y) + g(X, ∇_Z Y).
Given G a Riemannian manifold (with Riemannian metric g : TG × TG → R), there exists
a unique covariant differentiation satisfying
∇_X Y − ∇_Y X = [X, Y]   (5.7.7)
∇_Z g = 0,   (5.7.8)
for any smooth vector fields X, Y, Z ∈ D(G) (Helgason 1978, pg. 48). The affine connection
associated with this covariant differentiation is known as the Levi-Civita connection. Consider
the action of the Levi-Civita connection on g(X, Y) for arbitrary vector fields X, Y, Z ∈ D(G),
Z g(X, Y) = ∇_Z(g(X, Y)) = g(∇_Z X, Y) + g(X, ∇_Z Y)
  = g(∇_X Z, Y) + g(X, ∇_Z Y) + g([Z, X], Y).
By permuting the vector fields X, Y and Z, and then eliminating ∇_X and ∇_Y from the resulting
equations, one obtains
2g(X, ∇_Z Y) = Z g(X, Y) + g(Z, [X, Y]) + Y g(X, Z) + g(Y, [X, Z]) − X g(Y, Z) − g(X, [Y, Z]).   (5.7.9)
Since X, Y and Z are arbitrary, this equation uniquely determines the Levi-Civita connection
in terms of the metric g.
5.8 Right Invariant Affine Connections on Lie-Groups
Let G be a smooth manifold and let ψ : G → G be a smooth map from G into itself. An affine
connection ∇ on G is invariant under ψ if
dψ(∇_X Y) = ∇_{dψX} dψY.
If G is a Lie-group, then ∇ is termed right invariant if ∇ is invariant under each map r_σ(θ) := θσ,
σ ∈ G.
Lemma 5.8.1 Let G be a Lie-group. There is a one-to-one correspondence between right
invariant affine connections on G and bilinear maps
β : T_eG × T_eG → T_eG,
given by
β(Y, Z) = (∇_{dr Y} dr Z)(e),   (5.8.1)
for Y, Z ∈ T_eG, where dr Y denotes the right invariant vector field θ ↦ dr_θ Y.
Proof If ∇ is an affine connection, then (5.8.1) certainly defines a bilinear map from T_eG ×
T_eG → T_eG.
Conversely, given a bilinear map β : T_eG × T_eG → T_eG, let {E₁, …, E_n} be a linearly
independent basis for T_eG. Define the n smooth right invariant vector fields Ẽ_i = dr E_i,
i = 1, …, n. Thus, for arbitrary vector fields Y, Z ∈ D(G) there exist functions y_i ∈ C^∞(G),
for i = 1, …, n, and z_j ∈ C^∞(G), for j = 1, …, n, such that Y = Σ_{i=1}^{n} y_i Ẽ_i and Z =
Σ_{j=1}^{n} z_j Ẽ_j. One defines ∇_Y : D(G) → D(G),
∇_Y Z = Σ_{i=1}^{n} y_i Σ_{j=1}^{n} ( z_j dr β(E_i, E_j) + (Ẽ_i z_j) Ẽ_j ).   (5.8.2)
To see that ∇ is well defined, observe that both (Ẽ_i z_j)Ẽ_j and β are bilinear in Ẽ_i and Ẽ_j,
and thus the definition is independent of the choice of {E₁, …, E_n}. To see that ∇ is an
affine connection, one observes that linearity in Ẽ_i ensures that (5.7.1) holds; while for any
f ∈ C^∞(G)
∇_Y(fZ) = Σ_{i=1}^{n} y_i Σ_{j=1}^{n} ( f z_j dr β(E_i, E_j) + f(Ẽ_i z_j)Ẽ_j ) + Σ_{i=1}^{n} y_i Σ_{j=1}^{n} z_j (Ẽ_i f) Ẽ_j
  = f∇_Y Z + (Y f)Z,
and (5.7.2) also holds.
Consider two arbitrary vector fields Y and Z and observe that
∇_{dr_σ Y} dr_σ Z = Σ_{i=1}^{n} y_i Σ_{j=1}^{n} ( z_j dr β(E_i, E_j) + ((dr_σ Ẽ_i) z_j) dr_σ Ẽ_j )
  = dr_σ ( Σ_{i=1}^{n} y_i Σ_{j=1}^{n} ( z_j dr β(E_i, E_j) + (Ẽ_i z_j) Ẽ_j ) )
  = dr_σ ∇_Y Z,
since for any θ ∈ G
((dr_σ Ẽ_i) z_j)(θ) = Dz_j|_θ (dr_σ Ẽ_i)   (5.8.3)
  = D(z_j ∘ r_σ)|_{θσ⁻¹} (Ẽ_i)
  = Dz_j|_θ (Ẽ_i) = (Ẽ_i z_j)(θ).
Thus, ∇ is a right invariant affine connection. Moreover, for any two right invariant vector
fields Y and Z,
∇_Y Z(e) = ∇_{dr Y_e} dr Z_e (e) = dr_e β(Y_e, Z_e) = β(Y_e, Z_e),
and thus ∇ satisfies (5.8.1). This completes the proof.
The following result provides an important relationship between the exponential map on
G (5.6.6) and geodesics with respect to right invariant affine connections. A proof for left
invariant connections is given in Helgason (1978, pg. 102).
Proposition 5.8.2 Let ∇ be a right invariant affine connection and let β be given by (5.8.1).
Then for any X ∈ T_eG,
β(X, X) = 0
if and only if the geodesic γ_X : R → G with γ̇_X(0) = X is an analytic Lie-group homomorphism
of R into G.
In particular, if γ_X is a group homomorphism then γ_X must be the unique group
homomorphism with γ̇_X(0) = X (cf. Proposition 5.6.2). Thus, if β(X, X) = 0 then the
geodesic γ_X is just
γ_X = exp_X,   (5.8.4)
the exponential map (5.6.6).
Let G be a Lie-group with an inner product g_e : T_eG × T_eG → R on the tangent space at the
identity. Let g be the right invariant group metric (cf. (5.3.1)); then the Levi-Civita connection
defined by g is also right invariant. To see this, one computes ∇_{dr_σ Z} dr_σ Y for arbitrary vector
fields X, Y, Z ∈ D(G). Using (5.7.9) it follows that
2g(dr_σ X, ∇_{dr_σ Z} dr_σ Y) = (dr_σ Z)g(X, Y) + g(dr_σ Z, dr_σ[X, Y]) + (dr_σ Y)g(X, Z)
  + g(dr_σ Y, dr_σ[X, Z]) − (dr_σ X)g(Y, Z) − g(dr_σ X, dr_σ[Y, Z]),
since g is right invariant (cf. (5.3.2)) and dψ[X, Y] = [dψX, dψY] (Helgason 1978, pg. 24).
Recalling (5.8.3) one obtains
2g(dr_σ X, ∇_{dr_σ Z} dr_σ Y) = Z g(X, Y) + g(Z, [X, Y]) + Y g(X, Z) + g(Y, [X, Z]) − X g(Y, Z) − g(X, [Y, Z])
  = 2g(X, ∇_Z Y).
But g is right invariant, and thus 2g(dr_σ X, ∇_{dr_σ Z} dr_σ Y) = 2g(dr_σ X, dr_σ ∇_Z Y), which shows
that dr_σ ∇_Z Y = ∇_{dr_σ Z} dr_σ Y.
Example 5.8.3 Consider the general linear group GL(N, R) (cf. Example 5.6.1). The
tangent space of GL(N, R) at the identity is T_{I_N}GL(N, R) = R^{N×N} since GL(N, R) is an
open subset of R^{N×N}. Consider the Euclidean inner product on T_{I_N}GL(N, R),
⟨X, Y⟩ = tr(XᵀY).
The tangent space of GL(N, R) at a point Θ ∈ G is represented as T_ΘGL(N, R) = {XΘ | X ∈
R^{N×N}}, the image of T_{I_N}GL(N, R) via dr_Θ. The right invariant metric for GL(N, R) generated
by ⟨·, ·⟩ is just
g : T_ΘGL(N, R) × T_ΘGL(N, R) → R,
g(Y, Z) = tr((Θ⁻¹)ᵀYᵀZΘ⁻¹).
The Levi-Civita connection ∇ associated with g can be explicitly computed on the set of
right invariant vector fields on GL(N, R). Let X, Y, Z ∈ R^{N×N}; then XΘ, YΘ and ZΘ are
the unique right invariant vector fields associated with X, Y and Z. Using (5.7.9) one has
2g(XΘ, ∇_{ZΘ}YΘ) = (ZΘ)g(XΘ, YΘ) + g(ZΘ, [XΘ, YΘ]) + (YΘ)g(XΘ, ZΘ)
  + g(YΘ, [XΘ, ZΘ]) − (XΘ)g(YΘ, ZΘ) − g(XΘ, [YΘ, ZΘ]).
Now [YΘ, ZΘ] is certainly right invariant (since dψ[X, Y] = [dψX, dψY] (Helgason 1978, pg.
24)). In particular, observe that (ZΘ)g(XΘ, YΘ) = 0 = (YΘ)g(XΘ, ZΘ) = (XΘ)g(YΘ, ZΘ), since in each
case the metric computation is independent of Θ. Paralleling the argument leading to (5.6.4)
given in Example 5.6.1, for right invariant vector fields one obtains
[AΘ, BΘ] = (BA − AB)Θ = −[A, B]Θ.
Using this it follows that
2g(XΘ, ∇_{ZΘ}YΘ) = −g(ZΘ, [X, Y]Θ) − g(YΘ, [X, Z]Θ) + g(XΘ, [Y, Z]Θ)
  = tr(Zᵀ[Y, X] + Yᵀ[Z, X] + Xᵀ[Y, Z]).
Evaluating the left hand side of this equation at Θ = I_N and writing ∇_{ZΘ}YΘ(e) = β(Z, Y),
one obtains⁷
2 tr(Xᵀβ(Z, Y)) = tr([Zᵀ, Y]X + [Yᵀ, Z]X + Xᵀ[Y, Z])
  = tr(Xᵀ[Yᵀ, Z] + Xᵀ[Zᵀ, Y] + Xᵀ[Y, Z]).
Consequently the bilinear map β for the Levi-Civita connection is given by
β(Z, Y) = ½( [Yᵀ, Z] + [Zᵀ, Y] + [Y, Z] ).   (5.8.5)
Note that β(X, X) = [Xᵀ, X] is zero if and only if X is a normal matrix (i.e. commutes with
its transpose). Consequently, the exponential exp(tX) = e^{tX} on GL(N, R) is a geodesic if
and only if X is normal. □
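The normality condition β(X, X) = [Xᵀ, X] = 0 is easy to test; the sketch below (illustrative only) checks it for a skew-symmetric matrix (which is normal, so e^{tX} is a geodesic) and for a non-normal nilpotent shear (for which e^{tY} is not a geodesic).

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def bracket(A, B):
    # matrix Lie-bracket [A, B] = AB - BA
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

ZERO = [[0.0, 0.0], [0.0, 0.0]]

# skew-symmetric => normal => beta(X, X) = [X^T, X] = 0
X = [[0.0, 1.0], [-1.0, 0.0]]
assert bracket(transpose(X), X) == ZERO

# nilpotent shear is not normal: [Y^T, Y] != 0
Y = [[0.0, 1.0], [0.0, 0.0]]
assert bracket(transpose(Y), Y) != ZERO
```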
5.9 Geodesics
In this section the relationship between geodesics on a Lie-group and geodesics on a homogeneous
space equipped with the normal metric is outlined. Though intuitively natural, the result
is difficult to prove. The approach taken is to construct a coordinate basis which block
decomposes the Riemannian metric on the Lie-group into two parts, one of which is related
to the homogeneous space and the other of which lies in the kernel of the group action. This
construction is of interest in itself and justifies the somewhat long proof.
Let G be a Riemannian Lie-group with metric denoted g, and let ∇ denote the Levi-Civita
connection. If σ ∈ G is arbitrary, then the geodesics through σ are just the curves γ(t) :=
r_σ(γ_X(t)) = γ_X(t)σ, where γ_X is a geodesic of G passing through e, the identity of G. To see
this one computes (cf. (5.7.4))
∇_{γ̇}γ̇ = ∇_{dr_σ γ̇_X} dr_σ γ̇_X = dr_σ ∇_{γ̇_X} γ̇_X = 0.
⁷One also needs the easily verified results tr(A[B, C]) = tr([A, B]C), [Aᵀ, B]ᵀ = [Bᵀ, A] and tr(A) = tr(Aᵀ) for arbitrary matrices A, B, C ∈ R^{N×N}.
When dealing with a Riemannian manifold (equipped with the Levi-Civita connection)
there is an equivalent characterisation of geodesics using variational arguments. Loosely,
geodesics are curves of minimal (or maximal) length between two given points on a manifold.
The following result is proved in Mishchenko and Fomenko (1980, pg. 417).
Proposition 5.9.1 Let G be a Riemannian manifold with metric denoted g. Consider the cost
functional
E(γ) = ∫₀¹ g(γ̇(τ), γ̇(τ)) dτ
on the set of all smooth curves γ : [0, 1] → G. Then the extremals of E(γ) are geodesics on
G.
The cost functional E(γ) measures the action of a curve γ. The length of a curve γ is
measured by the related cost functional
L(γ) = ∫₀¹ √(g(γ̇(τ), γ̇(τ))) dτ.
Extremals of L(γ) correspond to curves that minimise (or maximise) the curve length between
γ(0) and γ(1) on G. Extremals of E(γ) are also extremals of L(γ) (Mishchenko & Fomenko
1980, Theorem 2, pg. 417); however, the converse is not true. The reason for this is that the
uniqueness of geodesics ensures that a geodesic γ : [0, 1] → G is uniquely parametrized by
t ∈ [0, 1], whereas any reparametrized curve γ ∘ T, for T : [0, 1] → [0, 1] a smooth map, will have
the same length and consequently is also an extremal of L.
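The distinction between extremals of E and of L can be seen numerically; the following sketch (an illustration in the Euclidean plane, not part of the thesis) compares a unit-speed straight line with its reparametrization by T(t) = t². Both curves have the same length, but only the unit-speed parametrization attains the minimal action.

```python
def action(curve, n=2000):
    # Discrete approximation of E(gamma): integral of |dgamma/dtau|^2.
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x0, y0 = curve(k * h)
        x1, y1 = curve((k + 1) * h)
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) / h
    return total

def length(curve, n=2000):
    # Discrete approximation of L(gamma): integral of |dgamma/dtau|.
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x0, y0 = curve(k * h)
        x1, y1 = curve((k + 1) * h)
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total

def straight(t):
    return (t, 0.0)          # unit-speed straight line (a geodesic)

def repar(t):
    return (t * t, 0.0)      # the same path reparametrized by T(t) = t^2

# both curves have the same length ...
assert abs(length(straight) - 1.0) < 1e-9
assert abs(length(repar) - 1.0) < 1e-9
# ... but only the unit-speed parametrization minimises the action
assert abs(action(straight) - 1.0) < 1e-9
assert abs(action(repar) - 4.0 / 3.0) < 1e-3
```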
Theorem 5.9.2 Let G be a Lie-group, M be a smooth manifold and φ : G × M → M be a
smooth transitive group action of G on M. Let g denote a right invariant Riemannian metric
on G and g_M denote the induced normal metric on M. If γ : R → G is a geodesic (with
respect to the Levi-Civita connection) on G, then the curve α : R → M,
α(t) := φ(γ(t), p),
is a geodesic (with respect to the Levi-Civita connection generated by the induced normal
metric) on M.
Proof It is necessary to develop some preliminary theory before proving the main result.
Denote the dimension of G by n and the dimension of M by m. Let p ∈ M be arbitrary and
recall that the stabilizer of p, H = stab(p) = {θ ∈ G | φ(θ, p) = p}, is a Lie-subgroup
of dimension n − m of G. In particular, h, the Lie-algebra of H, is a Lie-subalgebra of g, the
Lie-algebra of G. Let X ∈ h and let exp be the exponential map on G; then t ↦ exp(tX) is
a smooth curve lying in H (Warner 1983, pg. 104). Moreover, let {E_{m+1}, …, E_n} be a basis
for T_eH; then one can choose local coordinates for H around e,
(x_{m+1}, …, x_n) ↦ exp(x_{m+1}E_{m+1}) exp(x_{m+2}E_{m+2}) ⋯ exp(x_nE_n).
These coordinates are known as canonical coordinates of the second kind for H and are
described in Varadarajan (Varadarajan 1984, pg. 89).
Extend the partial basis {E_{m+1}, …, E_n} of T_eG to a full basis of T_eG, choosing the
remaining tangent vectors {E₁, …, E_m} to satisfy
g(E_i, E_j) = 0,  i = 1, …, m,  j = m + 1, …, n.
Let σ ∈ G be an arbitrary point in G and define canonical coordinates of the second kind on
G, centred at σ,
x : R^n → G,
x(x₁, …, x_n) := exp(x₁E₁) exp(x₂E₂) ⋯ exp(x_nE_n)σ.
Identify R^n ≅ R^m × R^{n−m} as a canonical decomposition into the first m coordinates and the
remaining n − m coordinates. Define the two inclusion maps
i₁ : R^m → R^n,  i₁(x₁, …, x_m) := (x₁, …, x_m, 0, …, 0)
i₂ : R^{n−m} → R^n,  i₂(x_{m+1}, …, x_n) := (0, …, 0, x_{m+1}, …, x_n).
One now has maps
x₁ : R^m → G,  x₁ := x ∘ i₁ = exp(x₁E₁) ⋯ exp(x_mE_m)σ,
and
x₂ : R^{n−m} → G,  x₂ := x ∘ i₂ = exp(x_{m+1}E_{m+1}) ⋯ exp(x_nE_n)σ.
The map x₂ is just the canonical coordinates of the second kind for the embedded submanifold
r_σ(H) = Hσ. The relationship of these maps is shown in the commutative diagram, Figure
5.9.1.
Observe that the range of dx₁ is exactly dr_σ(sp{E₁, …, E_m}), since the map x_i ↦ r_σ(exp(x_iE_i)),
which is exactly x(0, …, 0, x_i, 0, …, 0), has differential
dx(∂/∂x_i) = dr_σ E_i = Ẽ_i(σ),   (5.9.1)
where Ẽ_i is the unique right invariant vector field on G associated with E_i ∈ T_eG. In addition,
one has dφ_p Ẽ_i = 0 for i = m + 1, …, n, since each such Ẽ_i is tangent to the cosets of the
stabilizer stab(p). Recall the definition of dom dφ_p (cf. (5.3.6)). It follows directly that
dom dφ_p = sp{Ẽ₁, …, Ẽ_m}.
Consider the map
y : R^m → M,
y(y₁, …, y_m) := φ_p ∘ x ∘ i₁(y₁, …, y_m).
Observe from the above discussion that the differential dy = dφ_p ∘ dx₁ is a bijection. Thus,
the map y forms local coordinates for the manifold M centred at φ_p(σ). This completes the
construction of the local coordinate charts shown in the commutative diagram, Figure 5.9.1.
[Figure 5.9.1: A commutative diagram showing the various coordinate charts (x, x₁, x₂, y, i₁, i₂ and φ_p) and the smooth curves on G and M constructed in the proof of Theorem 5.9.2.]
Consider the local coordinate representation of the Riemannian metric g in the coordinates
x. Canonically associating a tangent vector Z ∈ T_xR^n at a point x with the full space R^n,
Z = Σ_{i=1}^{n} z_i ∂/∂x_i ≅ (z₁, …, z_n),
the local coordinate representation of the metric g, denoted g̃, can be written in matrix
form
g̃(Y, Z) = YᵀG(x)Z,
where G(x) ∈ R^{n×n} is a positive definite, symmetric matrix. Now consider arbitrary vector
fields Y = (y₁, …, y_m, 0, …, 0) and Z = (0, …, 0, z_{m+1}, …, z_n); then
g̃(Y, Z) = g(dxY, dxZ) = Σ_{i=1}^{m} Σ_{j=m+1}^{n} y_i z_j g(E_i, E_j) = 0.
Thus, the matrix G(x) is block diagonal of the form
G(x) = [ G₁₁(x)   0
         0        G₂₂(x) ].
Moreover, since the maps shown in Figure 5.9.1 commute and the metric g_M on M is
induced by the action of g on dom dφ_p = sp{Ẽ₁, …, Ẽ_m}, it is easily shown that the local
coordinate representation of g_M on R^m is
g̃_M(Y, Z) = YᵀG₁₁(i₁(y))Z = (di₁Y)ᵀG(i₁(y))(di₁Z).
I proceed now to prove the main result. Let γ : R → G be a geodesic and define
α : R → M,
α(t) := φ_p ∘ γ(t).
Let ε ∈ R be a parameter and consider any one-parameter smooth variation α_ε of the curve α
on M. Assume that α₀ = α and that α_ε(t) is a smooth map R × R → M. Both α and α_ε have
local coordinate representations on R^m in the coordinates described above. Denote the local
coordinate representations by ᾱ := y⁻¹ ∘ α and ᾱ_ε := y⁻¹ ∘ α_ε. Let Δ_ε : R → R^m be the
smooth curve
Δ_ε := ᾱ_ε − ᾱ,
since subtraction of vectors is well defined in R^m. The curves γ, α, α_ε and Δ_ε are shown
on the commutative diagram, Figure 5.9.1. Denote the local coordinate representation of γ by
γ̄ := x⁻¹ ∘ γ. Observe that since γ is a geodesic of G, then γ̄ is a geodesic of R^n equipped
with the metric g̃ (Mishchenko & Fomenko 1980, Lemma 3, pg. 345). Consider the following
one-parameter smooth variation of γ̄,
γ̄_ε := γ̄ + i₁ ∘ Δ_ε = (γ̄₁ + (Δ_ε)₁, …, γ̄_m + (Δ_ε)_m, γ̄_{m+1}, …, γ̄_n).
The action E(γ̄_ε) on R^n with respect to the Riemannian metric g̃ is
E(γ̄_ε) = ∫₀¹ g̃(γ̄̇_ε(τ), γ̄̇_ε(τ)) dτ
  = ∫₀¹ [ ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m, 0, …, 0)ᵀ G(x) ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m, 0, …, 0)
    + 2((γ̄̇_ε)₁, …, (γ̄̇_ε)_m, 0, …, 0)ᵀ G(x) (0, …, 0, γ̄̇_{m+1}, …, γ̄̇_n)
    + (0, …, 0, γ̄̇_{m+1}, …, γ̄̇_n)ᵀ G(x) (0, …, 0, γ̄̇_{m+1}, …, γ̄̇_n) ] dτ.
The middle term of this expansion is zero due to the block diagonal structure of G(x), while the
last term is independent of ε since the perturbation i₁ ∘ Δ_ε only enters the first m coordinates.
Thus, recalling the construction of γ̄_ε and noting that ((γ̄_ε)₁, …, (γ̄_ε)_m) = ᾱ_ε by the commutativity
of Figure 5.9.1, one has
(d/dε)E(γ̄_ε)|_{ε=0} = (d/dε) ∫₀¹ ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m)ᵀ G₁₁(x) ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m) dτ |_{ε=0}
  = (d/dε) ∫₀¹ g̃_M(ᾱ̇_ε(τ), ᾱ̇_ε(τ)) dτ |_{ε=0}
  = (d/dε)E(ᾱ_ε)|_{ε=0}.
However, since γ̄ is a geodesic, it follows that γ̄ is an extremal of E and
(d/dε)E(γ̄_ε)|_{ε=0} = 0,
which means that the derivative (d/dε)E(ᾱ_ε)|_{ε=0} = 0. Thus, (d/dε)E(α_ε)|_{ε=0} = 0 on M and since
α_ε is an arbitrary smooth one-parameter perturbation, it follows that α is an extremal of the
action E on M. From Proposition 5.9.1 one now concludes that α is a geodesic and the proof
is complete.
Remark 5.9.3 It is also possible to construct geodesics on G from geodesics on M. Let
α : R → M be a geodesic and define γ : R → G by γ := x ∘ i₁ ∘ y⁻¹ ∘ α. As above,
define ᾱ := y⁻¹ ∘ α and γ̄ := x⁻¹ ∘ γ = i₁ ∘ ᾱ. Then let γ_ε : R → G be any one-parameter
perturbation of γ, with local representation γ̄_ε := x⁻¹ ∘ γ_ε, and define Δ_ε := γ̄_ε − γ̄. This
construction induces a perturbation of ᾱ given by
ᾱ_ε := ((γ̄_ε)₁, …, (γ̄_ε)_m) = (ᾱ₁ + (Δ_ε)₁, …, ᾱ_m + (Δ_ε)_m).
Furthermore, one has by construction that the i'th component of γ̄ is zero for i = m + 1, …, n.
It follows that
(d/dε)E(γ_ε)|_{ε=0} = (d/dε)E(γ̄_ε)|_{ε=0}
  = (d/dε) ∫₀¹ [ ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m, 0, …, 0)ᵀ G(x) ((γ̄̇_ε)₁, …, (γ̄̇_ε)_m, 0, …, 0)
    + (0, …, 0, (γ̄̇_ε)_{m+1}, …, (γ̄̇_ε)_n)ᵀ G(x) (0, …, 0, (γ̄̇_ε)_{m+1}, …, (γ̄̇_ε)_n) ] dτ |_{ε=0}
  = (d/dε)E(ᾱ_ε)|_{ε=0} + (d/dε)E(Δ²_ε)|_{ε=0}.
Here Δ²_ε denotes the curve in R^n with first m components zero and the remaining n − m
components given by the corresponding components of Δ_ε. Observe that Δ²₀ = 0 since Δ₀ = 0,
and thus ε = 0 is a local minimum of E(Δ²_ε) since E is positive definite. It follows that
(d/dε)E(Δ²_ε)|_{ε=0} = 0, while (d/dε)E(ᾱ_ε)|_{ε=0} = 0 since ᾱ is a geodesic. This
shows that
(d/dε)E(γ_ε)|_{ε=0} = 0
for any one-parameter perturbation γ_ε of γ, which proves that γ is a geodesic. □
Chapter 6
Numerical Optimization on
Lie-Groups and Homogeneous Spaces
The numerical algorithms proposed in Chapters 2 to 4 are based on a single idea, that of
interpolating the integral solution of a gradient flow via a series of curves lying wholly within
the constraint set. For each iteration, the particular curve chosen is tangent to the gradient flow at
the present estimate and the next estimate is evaluated using a time-step chosen to ensure the cost
function is monotonically decreasing (for minimisation problems) on the sequence of estimates
generated. Algorithms of this type are related to the classical gradient descent algorithms
on Euclidean space, for which the interpolation curves are straight lines. Consequently, the
algorithms proposed in the preceding chapters are termed modified gradient descent algorithms
where the modification is the use of a curve rather than a straight line to interpolate the gradient
flow.
The property of preserving the constraint while solving the optimization problem is a
fundamental property of the algorithms proposed. This property, termed constraint stability
(cf. page 2) is conceptually related to recent work in developing numerical integration schemes
that preserve invariants of the solution of an ordinary differential equation. Results on numerical
integration methods for Hamiltonian systems are particularly relevant to the general class of
problems considered in this thesis. Early work in this area is contained in the articles (Ruth
1983, Channell 1983, Menyuk 1984, Feng 1985, Zhong & Marsden 1988). A recent review
article of this work is Sanz-Serna (1991). Following from this approach is a general concept
of numerical stability (Stuart & Humphries 1994) which is loosely defined as the ability of
a numerical integration method to reproduce the qualitative behaviour of the continuous-time
solution of a differential equation. The development given in Stuart and Humphries (1994) is
not directly applicable to the solution of optimization problems since it is primarily focussed on
integration methods and considers only a single qualitative behaviour at any one time, either the
preservation of an invariant of a flow (Hamiltonian systems) or the convergence of the solution
to a limit point (contractive problems). In contrast, the optimization problems discussed in
this thesis require two simultaneous forms of numerical stability, namely preservation of the
constraint relation and convergence to a limit point within the constraint set.
This leads one to consider what properties a numerical method for optimization on a
homogeneous space should display. In Chapter 1 the three properties of simplicity, global
convergence and constraint stability were defined (page 2) in the context of numerical methods
for on-line and adaptive processes. The modified gradient descent algorithms proposed in the
early part of this thesis all displayed these properties. It is natural to ask whether the proposed
algorithms are in fact closely related. In particular, since the only difference between the
proposed algorithms is in the curves used to interpolate the gradient flow it is important to
investigate the properties of these curves more carefully. Indeed, one may ask whether the
choice of curves can be justified or whether there may be more suitable choices available.
In this chapter I begin by reviewing the gradient descent algorithms proposed in Chapters 2
to 4 and using the theoretical results of Chapter 5 to develop a mathematical framework which
explains each algorithm as an example of the same concept. This provides a design procedure
for deriving numerical methods suitable for solving any constrained optimization problem on
a homogeneous space.
The remainder of the chapter is devoted to developing a more sophisticated constrained
optimization algorithm exploiting the general theoretical framework provided by Chapter 5.
The method considered is based on the Newton-Raphson method reformulated (in coordinate
free form) to evolve explicitly on a Lie-group. Local quadratic convergence behaviour is proved
though the method is not globally convergent. To provide an interesting example the symmetric
eigenvalue problem is considered (first discussed in Chapter 2) and a Newton-Raphson method
derived for this case. It is interesting to compare the behaviour of this example with the classical
shifted QR algorithm, however, it is not envisaged that the proposed method is competitive for
solving traditional problems. The interest in such methods is for solving numerical problems
for on-line and adaptive processes.
The chapter is divided into five sections. Section 6.1 discusses the theoretical foundation
of the modified gradient descent algorithms proposed in Chapters 2 to 4 and develops a general
template for generating such methods. Section 6.2 develops the general form of the Newton-
Raphson iteration on a Lie-group and proves quadratic convergence of the algorithm in a
neighbourhood of a given critical point. Section 6.3 provides a coordinate free formulation of
the Newton-Raphson algorithm. The theory is applied to the symmetric eigenvalue problem in
Section 6.4 and a comparison is made to the performance of the QR algorithm.
6.1 Gradient Descent Algorithms on Homogeneous Spaces
In this section the numerical algorithms proposed in Chapters 2 to 4 are discussed in the context
of the theoretical discussion of Chapter 5.
Recall the constrained optimization problem posed in Chapter 2 for computing the spectral
decomposition of a matrix H0. The algorithm proposed for this task was the double-bracket
algorithm (2.1.4)

\[ H_{k+1} = e^{-\alpha_k [H_k, D]}\, H_k\, e^{\alpha_k [H_k, D]}, \]

where¹ D = diag(μ₁, …, μ_N). The algorithm has the property of explicitly evolving on the
set

\[ M(H_0) = \{ U^T H_0 U \mid U \in O(N) \} \]

of all orthogonal congruency transformations of H₀. The set of orthogonal matrices O(N) is
certainly an abstract group and indeed is a Lie-subgroup of GL(N, R) (Warner 1983, pg. 107).

¹In Chapter 2 the diagonal target matrix was denoted N; however, to avoid confusion with the notation of
Chapter 5, the target matrix is now denoted D and the dimension of the matrices is denoted N.

The orthogonal group O(N) features in all of the numerical algorithms considered and it
seems a good opportunity to review its geometric structure.

1. The identity tangent space of O(N) is the set of skew-symmetric matrices (Warner 1983,
pg. 107)

\[ T_{I_N} O(N) = Sk(N) = \{ \Omega \in \mathbb{R}^{N\times N} \mid \Omega = -\Omega^T \}. \]
2. The tangent space at a point U ∈ O(N) is the image of T_{I_N}O(N) under the
linearization of r_U : O(N) → O(N), r_U(W) := WU (right translation by U),

\[ T_U O(N) = \{ \Omega U \in \mathbb{R}^{N\times N} \mid \Omega \in Sk(N) \}. \qquad (6.1.1) \]

3. By inclusion Sk(N) ⊂ R^{N×N} is a Lie-subalgebra of the Lie-algebra gl(N, R) of
GL(N, R). In particular, Sk(N) is closed under the matrix Lie-bracket operation:
[X, Y] ∈ Sk(N) if X and Y are skew symmetric.
4. The scaled Euclidean inner product on Sk(N),

\[ \langle \Omega_1, \Omega_2 \rangle = 2\,\mathrm{tr}(\Omega_1^T \Omega_2), \]

generates a right invariant group metric on O(N),

\[ g(\Omega_1 U, \Omega_2 U) = 2\,\mathrm{tr}(\Omega_1^T \Omega_2). \qquad (6.1.2) \]

Observe that g(Ω₁U, Ω₂U) = 2 tr(Uᵀ Ω₁ᵀ Ω₂ U) = ⟨Ω₁U, Ω₂U⟩ since UᵀU = I_N. Thus
the right invariant group metric on O(N) is the scaled Euclidean inner product restricted
to each individual tangent space.

5. The Levi-Civita connection generated by the right invariant group metric (6.1.2) (cf.
Example 5.8.3) is associated with the bilinear map ω : Sk(N) × Sk(N) → Sk(N),

\[ \omega(\Omega_1, \Omega_2) = [\Omega_1, \Omega_2]. \]

This follows directly from (5.8.5) while observing that Ω ∈ Sk(N) implies Ωᵀ + Ω = 0.
The extra factor of 2 in (6.1.2) cancels the factor of 1/2 in (5.8.5).
6. The value of ω(Ω, Ω) = 0 for any Ω ∈ Sk(N), and thus all curves

\[ \gamma(t) = \exp(t\Omega) \]

are geodesics on O(N) passing through I_N at time t = 0. By uniqueness this includes
all the possible geodesics on O(N) passing through I_N.

7. Geodesics on O(N) passing through U ∈ O(N) with tangent vector γ̇(0) = ΩU ∈
T_U O(N) at time t = 0 are given by (cf. Section 5.9)

\[ \gamma(t) = \exp(t\Omega)\, U. \]
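Points 6 and 7 are easy to check numerically. The sketch below is my own illustration (not taken from the thesis): it builds a random Ω ∈ Sk(N) and U ∈ O(N) and verifies that the geodesic γ(t) = exp(tΩ)U remains orthogonal. The expm helper is a plain scaling-and-squaring Taylor sum, adequate for illustration only.

```python
import numpy as np

def expm(A, terms=20):
    # Scaling-and-squaring Taylor approximation of the matrix exponential
    # (illustrative only; production code would use a Pade-based method).
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A), 1e-16)))) + 1)
    B = A / 2.0**s
    X, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        X = X + term
    for _ in range(s):
        X = X @ X
    return X

rng = np.random.default_rng(0)
N = 4
W = rng.standard_normal((N, N))
Omega = W - W.T                                    # an element of Sk(N)
U, _ = np.linalg.qr(rng.standard_normal((N, N)))   # a point of O(N)

# gamma(t) = exp(t Omega) U is the geodesic through U with gamma'(0) = Omega U;
# it never leaves O(N).
for t in (0.0, 0.3, 1.7):
    g = expm(t * Omega) @ U
    assert np.allclose(g.T @ g, np.eye(N), atol=1e-10)
print("exp(t Omega) U stays on O(N) for every tested t")
```

The same check works for any skew-symmetric Ω, which is exactly the content of the characterisation of T_U O(N) in (6.1.1).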
Recall once more the double-bracket algorithm (2.1.4), H_{k+1} = e^{-α_k[H_k,D]} H_k e^{α_k[H_k,D]},
mentioned above. In Section 2.5 the associated orthogonal algorithm

\[ U_{k+1} = U_k\, e^{\alpha_k [U_k^T H_0 U_k,\, D]} \]

was discussed and shown to be related to the double-bracket equation via the algebraic rela-
tionship

\[ H_k = U_k^T H_0 U_k. \]

Unfortunately, U_k e^{α_k[U_kᵀH₀U_k, D]} does not appear to be in the correct form for a geodesic
exp(tΩ)U on O(N). The reason for this lies in the characterisation of M(H₀) = {UᵀH₀U |
U ∈ O(N)}. In particular, (U, H) ↦ UᵀHU is not a group action of O(N) on M(H₀). The use
of this awkward definition for M(H₀) is historical (cf. Brockett (1988) and the development
in Helmke and Moore (1994b, Chapter 2)). By considering the related characterisation

\[ M(H_0) = \{ W H_0 W^T \mid W \in O(N) \}, \]

M(H₀) is seen to be a homogeneous space with transformation group O(N) and group action
(W, H) ↦ WHWᵀ. Of course, all that has been done is to take the transpose of the orthogonal
matrices. It is easily shown that the associated orthogonal iteration for the new characterisation
of M(H₀) is

\[ W_{k+1} = e^{-\alpha_k [W_k H_0 W_k^T,\, D]}\, W_k. \]

Observe that this iteration is constructed from geodesics on O(N). Thus, the associated
orthogonal iteration for the double-bracket algorithm is a geodesic interpolation of the flow
Ẇ = −[WH₀Wᵀ, D]W. Using Lemma 5.9.2, geodesics on O(N) will map to geodesics on
M(H₀) and one concludes that the double-bracket algorithm itself is a geodesic interpolation
Recall that geodesics are curves of minimum length between two points on a curved surface
and are the natural generalization of straight lines to non-Euclidean geometry. Then, at least
for the double-bracket algorithm, the question posed in the introduction to this chapter, whether
the choice of interpolating curves in the proposed numerical algorithms is justified, is answered
in the affirmative.
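The geodesic interpolation can be made concrete with a few lines of numerics. The sketch below is my own illustration, not the thesis's implementation: it runs the double-bracket iteration H_{k+1} = e^{-α_k[H_k,D]} H_k e^{α_k[H_k,D]} with a small constant step size (the thesis selects α_k more carefully), on an H₀ with a hand-picked well-separated spectrum, and checks that the iterates stay isospectral to H₀ while converging to a diagonal matrix.

```python
import numpy as np

def expm(A, terms=20):
    # Scaling-and-squaring Taylor approximation of the matrix exponential.
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A), 1e-16)))) + 1)
    B = A / 2.0**s
    X, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        X = X + term
    for _ in range(s):
        X = X @ X
    return X

def bracket(A, B):
    return A @ B - B @ A

rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
H = V @ np.diag([3.5, 2.0, 0.5]) @ V.T        # symmetric H0, known spectrum
D = np.diag([3.0, 2.0, 1.0])                  # target diagonal matrix
spectrum = np.sort(np.linalg.eigvalsh(H))

alpha = 0.05                                  # constant step, illustration only
for _ in range(500):
    T = expm(alpha * bracket(H, D))           # e^{alpha [H_k, D]}
    H = T.T @ H @ T                           # T^T = e^{-alpha [H_k, D]}

# The iterates are orthogonal congruences of H0 (isospectral) and the
# limit is diagonal, i.e. the spectral decomposition has been computed.
assert np.allclose(np.sort(np.linalg.eigvalsh(H)), spectrum, atol=1e-8)
assert np.linalg.norm(H - np.diag(np.diag(H))) < 1e-6
print("diagonal of the limit:", np.round(np.diag(H), 6))
```

Since the entries of D are in descending order, the (generic) limit pairs the largest eigenvalue of H₀ with the largest entry of D, so the diagonal comes out sorted in descending order.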
It should not come as a surprise that the other algorithms proposed in Chapters 2 to 4 are
also geodesic interpolations of continuous-time flows. The algorithm proposed in Section 2.4 is
based directly on the double-bracket equation and can be analysed in exactly the same manner.
In Chapter 3 the Rayleigh gradient algorithm (3.2.1) is immediately in the correct form to
observe its geodesic nature. Indeed, for the rank-1 case (cf. Subsection 3.4.1) the geodesic
nature of the recursion has already been observed explicitly. Finally, the pole placement
algorithm (4.6.1) proposed in Chapter 4

\[
\Theta_{i+1} = \Theta_i\, e^{-\alpha_i [\Theta_i^T F \Theta_i,\; Q(\Theta_i^T F \Theta_i - A)]}
= e^{-\alpha_i [F,\; \Theta_i Q(\Theta_i^T F \Theta_i - A)\Theta_i^T]}\, \Theta_i
\]

is explicitly a geodesic interpolation of the gradient flow (4.4.7)

\[ \dot\Theta = [F,\; \Theta\, Q(A - \Theta^T F \Theta)\, \Theta^T]\, \Theta, \]

evolving directly on the Lie-group O(N).
Thus, the algorithms proposed in Chapters 2 to 4 form a template for a generic numerical
approach to solving optimization problems on homogeneous spaces associated with the orthog-
onal group. In every case considered exponential interpolation of the relevant continuous-time
flow is equivalent to geodesic interpolation of the flow due to the specific structure of O�N�.
Care should be taken before the same approach is used for more abstract Lie-groups (the easily
constructed exponential interpolation curves may no longer be geodesics), nevertheless, the
basic structure of the algorithms presented is extremely simple and could be applied to almost
any optimization problem on a homogeneous space. Of course, step-size selection schemes
must be determined for each new situation and the stability analysis depends on the step-size
selection. The basic properties of the algorithms will remain consistent, however, and provide
a useful technique for practical problems where the properties of constraint stability and global
convergence (cf. page 1) are more important than computational cost.
6.2 Newton-Raphson Algorithm on Lie-Groups
In this section a general formulation of the Newton-Raphson algorithm is proposed which
evolves explicitly on a Lie-group. Interestingly, the iteration can be expressed in terms of
Lie-derivatives and the exponential map. In practice, one still has to solve a linear system of
equations to determine the regression vector.
The Newton-Raphson algorithm is a classical (quadratically convergent) optimization tech-
nique for determining the stationary points of a smooth vector field (Kincaid & Cheney 1991,
pg. 64). Given Z : Rⁿ → Rⁿ a smooth vector field² on Rⁿ, let p ∈ Rⁿ be a stationary
point of Z (i.e. Z(p) = 0) and let q ∈ Rⁿ be an estimate of the stationary point p. Let
k = (k₁, k₂, …, kₙ), with k₁, …, kₙ non-negative integers, be a multi-index and denote its
size by |k| = k₁ + k₂ + ⋯ + kₙ. Expanding Z as a Taylor series around q one obtains, for
each element of Z = (Z₁, Z₂, …, Zₙ),

\[
Z_i(x) = Z_i(q) + \sum_{|k| \geq 1} \frac{(h_1)^{k_1} \cdots (h_n)^{k_n}}{k_1! \cdots k_n!}\,
\frac{\partial^{|k|} Z_i}{(\partial x_1)^{k_1} \cdots (\partial x_n)^{k_n}}(q),
\]

where h = x − q ∈ Rⁿ and h_j is the j'th element of h, and the sum is taken over all
multi-indices k with |k| = j for j = 1, 2, …. The Taylor series of an analytic³ function is
uniformly and absolutely convergent in a neighbourhood of q (Fleming 1977, pg. 97). Indeed,
if q is a good estimate of p one expects that only the first few terms of the Taylor series are
sufficient to provide a good approximation of Z_i. Assume that p is known and consider setting

²When dealing with Euclidean space one naturally associates the element ∂/∂x_i of the basis of T_xRⁿ with the
basis element e_i of Rⁿ (the unit vector with a 1 in the i'th position). This induces an isomorphism T_xRⁿ ≅ Rⁿ
(Warner 1983, pg. 86) and one writes a vector field as a map Z : Rⁿ → Rⁿ rather than the technically more correct
Z : Rⁿ → TRⁿ, Z(x) ∈ T_xRⁿ.
³In fact a smooth function f ∈ C^∞(M) on a smooth manifold M is defined to be analytic at a point p ∈ M if the
Taylor series of f̃, the expression of f in local coordinates centred at p, is uniformly and absolutely convergent in
a neighbourhood of 0.
x = p = h + q. Ignoring all terms with |k| ≥ 2 one obtains the approximation

\[ 0 = Z_i(p) \approx Z_i(q) + \sum_{j=1}^{n} \frac{\partial Z_i}{\partial x_j}(q)\, h_j. \]

The Jacobi matrix is defined as the n × n matrix with (i, j)'th element (J_qZ)_{ij} = ∂Z_i/∂x_j(q)
(Mishchenko & Fomenko 1980, pg. 16). Thus, the above equation can be rewritten in matrix
form as 0 = Z(q) + J_qZ h. When J_qZ is non-singular one can solve this relation uniquely
for h, an estimate of the residual error between q and p. Thus, one obtains a new estimate q⁺ of
p based on the previous estimate q and the correction h,

\[ q^{+} = q + h. \]
This estimate is the next estimate of the Newton-Raphson algorithm. Given an initial estimate
q₀ ∈ Rⁿ, the Newton-Raphson algorithm is:

Algorithm 6.2.1 [Newton-Raphson Algorithm on Rⁿ]
Given q_k ∈ Rⁿ compute Z(q_k).
Compute the Jacobi matrix J_{q_k}Z given by (J_{q_k}Z)_{ij} = ∂Z_i/∂x_j(q_k).
Set h = −(J_{q_k}Z)⁻¹ Z(q_k).
Set q_{k+1} = q_k + h.
Set k = k + 1 and repeat. □
The convergence properties of the Newton-Raphson algorithm are given by the following
proposition (Kincaid & Cheney 1991, pg. 68).
Proposition 6.2.2 Let Z : Rⁿ → Rⁿ be an analytic vector field on Rⁿ and p ∈ Rⁿ be a
stationary point of Z. Then there is a neighbourhood U of p and a constant C such that the
Newton-Raphson method (Algorithm 6.2.1) converges to p for any initial estimate q₀ ∈ U and
the error decreases quadratically,

\[ \| q_{k+1} - p \| \leq C\, \| q_k - p \|^{2}. \]
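A quick numerical sanity check of Algorithm 6.2.1 and Proposition 6.2.2 (my own illustration, with a hand-picked analytic vector field whose stationary point is the origin):

```python
import numpy as np

def Z(q):
    # A hand-picked analytic vector field with stationary point p = 0.
    x, y = q
    return np.array([np.sin(x) + 0.5 * y, x - y + y**3])

def JZ(q):
    # Jacobi matrix (J_q Z)_{ij} = dZ_i/dx_j evaluated at q.
    x, y = q
    return np.array([[np.cos(x), 0.5],
                     [1.0, -1.0 + 3.0 * y**2]])

q = np.array([0.4, -0.3])               # initial estimate q_0
errors = []
for _ in range(6):
    h = -np.linalg.solve(JZ(q), Z(q))   # h = -(J_q Z)^{-1} Z(q)
    q = q + h                           # q_{k+1} = q_k + h
    errors.append(np.linalg.norm(q))    # p = 0 here, so the error is ||q_k||

# Quadratic convergence: each error is of the order of the previous one squared,
# until machine precision is reached.
assert errors[-1] < 1e-12
print(["%.1e" % e for e in errors])
```

The doubling of correct digits per iteration visible in the printed error sequence is exactly the behaviour promised by Proposition 6.2.2.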
It is not clear how best to go about reformulating the Newton-Raphson algorithm on an
arbitrary Lie-group G. One could use the Euclidean Newton-Raphson algorithm in separate
local coordinate charts on G. Care must be taken, however, since local coordinate charts may
display extreme sensitivity to perturbation in the Euclidean coordinates, leading to numerically
ill conditioned algorithms.
Given a Lie-group G, let φ ∈ C^∞(G) be an analytic real function on G. Denote the identity
element of G by e and associate the tangent space T_eG with the Lie-algebra g of G in the
canonical manner (cf. Section 5.6). For X ∈ T_eG arbitrary, define a right invariant vector
field X̃ ∈ D(G) by X̃_σ := dr_σ X, where r_σ(τ) := τσ (cf. (5.1.1) and the
analogous definition for left invariant vector fields (5.6.1)). Recall that the map t ↦ exp(tX)
(where the exponential is the unique Lie-group homomorphism associated with the Lie-algebra
homomorphism λ(d/dt) = X, cf. (5.6.7)) is an integral curve of X̃ passing through e at
time zero. Given σ ∈ G arbitrary, the map t ↦ exp(tX)σ is an integral curve of the right
invariant vector field X̃ passing through the point σ ∈ G at time zero. It follows directly from
this observation that

\[ \tilde X \varphi(\exp(tX)\sigma) = \frac{d}{dt}\, \varphi(\exp(tX)\sigma). \]
Indeed, there is a natural extension of this idea which generalizes to higher order derivatives.
These derivatives can be combined into a Taylor theorem for analytic real functions on a Lie-
group. Proposition 6.2.3 is proved in Varadarajan (1984, pg. 96) and formalises
this concept. Before this result can be stated it is necessary to introduce some notation.
Notation: Let k = (k₁, k₂, …, kₙ), with k₁, k₂, … non-negative integers, represent a multi-
index and denote its size by |k| = k₁ + k₂ + ⋯ + kₙ. Let Z₁, …, Zₙ be n objects and
let t = (t₁, …, tₙ) be any set of n real numbers. The set of objects (in Proposition 6.2.3
the objects will be vector fields) of the form t₁Z₁ + ⋯ + tₙZₙ forms a vector space under
addition and scalar multiplication. One also considers formal products of elements, for example
(t₁Z₁)(t₂Z₂)(t₁Z₁) = t₁²t₂(Z₁Z₂Z₁), where the scalar multiplication is commutative but
multiplication between elements Z₁ and Z₂ is non-commutative. One defines an additional
element 1 = Z⁰ which acts as a multiplicative identity, Z⁰(t₁Z₁) = (t₁Z₁) = (t₁Z₁)Z⁰. Given
a multi-index k = (k₁, k₂, …, kₙ) consider a second multi-index (i₁, …, i_{|k|}) with |k| entries
i_p ∈ {1, …, n} such that the number of occurrences where i_p = j for 1 ≤ j ≤ n is exactly
k_j. Let Z = t₁Z₁ + ⋯ + tₙZₙ; then the formal power Z^k is defined by

\[
(t_1 Z_1 + \cdots + t_n Z_n)^k = \frac{1}{|k|!} \sum_{(i_1, i_2, \ldots, i_{|k|})}
(t_1^{k_1} \cdots t_n^{k_n})\, (Z_{i_1} Z_{i_2} \cdots Z_{i_{|k|}}).
\]

In other words, the sum is taken over all permutations of elements of the form
(t_{i₁}Z_{i₁})(t_{i₂}Z_{i₂}) ⋯ (t_{i_{|k|}}Z_{i_{|k|}}) such that there are exactly k₁ occurrences of t₁Z₁,
k₂ occurrences of t₂Z₂, etc. Of course, if the size |k| is equal to either zero or one then the
situation is particularly simple:

\[ (t_1 Z_1 + \cdots + t_n Z_n)^k = 1 \quad \text{for } |k| = 0, \]
\[ (t_1 Z_1 + \cdots + t_n Z_n)^k = t_j Z_j \quad \text{for } |k| = 1, \text{ where } k_j = 1 \text{ is the only nonzero element of } k. \]
Proposition 6.2.3 Given G a Lie-group and φ ∈ C^∞(G) an analytic real function in a neigh-
bourhood of a point σ ∈ G, let X₁, …, Xₙ ∈ T_eG be a basis for the identity tangent space of
G. Define the associated right invariant vector fields X̃_i = dr X_i, for i = 1, …, n, and let k
represent a multi-index with n entries. The asymptotic expansion

\[
\varphi(\exp(t_1 X_1 + \cdots + t_n X_n)\sigma)
= \sum_{|k| = 0}^{\infty} \frac{t_1^{k_1} \cdots t_n^{k_n}}{k_1! \cdots k_n!}\,
\big( (\tilde X_1 + \cdots + \tilde X_n)^k \varphi \big)(\sigma) \qquad (6.2.1)
\]

converges absolutely and uniformly in a neighbourhood of σ.
Let G be a Lie-group and φ ∈ C^∞(G) be an analytic map on G. Choose a basis
X₁, …, Xₙ ∈ T_eG for the identity tangent space of G and define the associated right in-
variant vector fields X̃_i = dr X_i, for i = 1, …, n. Expressing φ as a Taylor series around a
point σ ∈ G one has

\[
\varphi(\exp(t_1 X_1 + \cdots + t_n X_n)\sigma)
= \varphi(\sigma) + \sum_{j=1}^{n} t_j\, \tilde X_j \varphi(\sigma) + O(\|t\|^2), \qquad (6.2.2)
\]

where O(‖t‖²) represents the remainder of the Taylor expansion, all terms for which |k| ≥ 2.
By taking the derivative of this relation with respect to the vector fields X̃_i and discarding the
higher order terms one obtains the approximation

\[
\tilde X_i \varphi(\exp(t_1 X_1 + \cdots + t_n X_n)\sigma)
\approx \tilde X_i \varphi(\sigma) + \sum_{j=1}^{n} \tilde X_i \tilde X_j \varphi(\sigma)\, t_j. \qquad (6.2.3)
\]
Define the Jacobi matrix of φ to be the n × n matrix with (i, j)'th element

\[ (J\varphi_\sigma)_{ij} = \tilde X_i \tilde X_j \varphi(\sigma), \qquad (6.2.4) \]

which is dependent on the choice of basis X₁, …, Xₙ for T_eG. Define the two column vectors
t = (t₁, …, tₙ)ᵀ and Dφ(σ) = (X̃₁φ(σ), …, X̃ₙφ(σ))ᵀ. Recalling the discussion of the
Newton-Raphson method on Rⁿ, it is natural to consider the following iteration defined for
σ ∈ G:

\[ t = -(J\varphi_\sigma)^{-1} D\varphi(\sigma), \]
\[ \sigma^{+} = \exp(t_1 X_1 + \cdots + t_n X_n)\sigma. \]

The motivation for considering this algorithm parallels that given above for the Newton-
Raphson method on Rⁿ. If β is a critical point of φ then X̃_iφ(β) = 0 for each X̃_i. Thus,
assuming that exp(t₁X₁ + ⋯ + tₙXₙ)σ = β and solving the approximate relation (6.2.3) for
(t₁, …, tₙ) gives a new estimate σ⁺ = exp(t₁X₁ + ⋯ + tₙXₙ)σ. It follows that if σ was a
good estimate of β then the difference between φ and the approximate Taylor expansion should
be of order O(‖t‖²), and consequently the new estimate σ⁺ will be a correspondingly better
estimate of β. Given an initial point σ₀ ∈ G and a choice of n basis elements {X₁, …, Xₙ}
for T_eG the Newton-Raphson algorithm on G is:
Algorithm 6.2.4 [Newton-Raphson Algorithm on a Lie-group G]
Given σ_k ∈ G compute Dφ(σ_k).
Compute the Jacobi matrix Jφ_{σ_k} given by (Jφ_{σ_k})_{ij} = X̃_iX̃_jφ(σ_k).
Set t = −(Jφ_{σ_k})⁻¹ Dφ(σ_k).
Set σ_{k+1} = exp(t₁X₁ + ⋯ + tₙXₙ)σ_k.
Set k = k + 1 and repeat. □
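A minimal numerical sketch of Algorithm 6.2.4 (my own illustration, not from the thesis): the Lie derivatives X̃_iφ and X̃_iX̃_jφ are approximated here by nested central differences along t ↦ exp(tX_i)σ rather than computed in closed form, and the potential φ(U) = −tr(DUH₀Uᵀ) on O(3) anticipates the symmetric eigenvalue problem of Section 6.4.

```python
import numpy as np

def expm(A, terms=20):
    # Scaling-and-squaring Taylor approximation of the matrix exponential.
    s = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A), 1e-16)))) + 1)
    B = A / 2.0**s
    X, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ B / k
        X = X + term
    for _ in range(s):
        X = X @ X
    return X

def basis_sk(n):
    # Basis E_ab - E_ba (a < b) of the identity tangent space Sk(n).
    Bs = []
    for a in range(n):
        for b in range(a + 1, n):
            E = np.zeros((n, n)); E[a, b], E[b, a] = 1.0, -1.0
            Bs.append(E)
    return Bs

rng = np.random.default_rng(2)
S = rng.standard_normal((3, 3))
H0 = np.diag([3.0, 2.0, 1.0]) + 0.1 * (S + S.T)  # nearly diagonal, so a
D = np.diag([3.0, 2.0, 1.0])                     # critical point lies near I

def phi(U):
    return -np.trace(D @ U @ H0 @ U.T)

Xs = basis_sk(3)
h = 1e-5

def lie_d(f, U, X):
    # Central-difference approximation of d/dt f(exp(tX) U) at t = 0.
    return (f(expm(h * X) @ U) - f(expm(-h * X) @ U)) / (2.0 * h)

U = np.eye(3)
for _ in range(8):
    grad = np.array([lie_d(phi, U, X) for X in Xs])
    J = np.array([[lie_d(lambda V, Xj=Xj: lie_d(phi, V, Xj), U, Xi)
                   for Xj in Xs] for Xi in Xs])       # J_ij ~ X_i X_j phi
    t = -np.linalg.solve(J, grad)
    U = expm(sum(tk * Xk for tk, Xk in zip(t, Xs))) @ U

grad = np.array([lie_d(phi, U, X) for X in Xs])
assert np.linalg.norm(grad) < 1e-4   # critical point, to difference accuracy
assert np.allclose(U.T @ U, np.eye(3), atol=1e-8)
print("phi at the computed critical point:", phi(U))
```

Note that the iterate never leaves the group: every update is a multiplication by the exponential of a skew-symmetric matrix, which is the constraint stability property discussed in Section 6.1.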
Lemma 6.2.5 Given G a Lie-group and φ ∈ C^∞(G) an analytic real function with a critical
point β ∈ G, let σ ∈ G be arbitrary and define f : Rⁿ → G, f(t) := exp(t₁X₁ + ⋯ + tₙXₙ)σ,
to be canonical coordinates of the first kind on G centred at σ (Varadarajan 1984, pg. 88).
Define a smooth vector field Z on Rⁿ by Z(t) = (X̃₁φ(f(t)), …, X̃ₙφ(f(t))). An iteration
of the Newton-Raphson algorithm (Algorithm 6.2.4) on G with initial condition σ is the image
of a single iteration of the Newton-Raphson algorithm (Algorithm 6.2.1) on Rⁿ with initial
condition 0 via the canonical coordinates f.
Proof Observe that

\[ Z(0) = (\tilde X_1 \varphi(f(0)), \ldots, \tilde X_n \varphi(f(0))) = D\varphi(\sigma). \]

Also, for 1 ≤ i, j ≤ n one finds

\[
\frac{\partial}{\partial t_i} Z_j \Big|_{t=0}
= \frac{\partial}{\partial t_i} \tilde X_j \varphi(f(t)) \Big|_{t=0}
= \frac{\partial}{\partial t_i} \tilde X_j \varphi(\exp(t_i X_i)\sigma) \Big|_{t_i = 0}
= \tilde X_i \tilde X_j \varphi(\sigma),
\]

since (d/dr) g(exp(rX)σ)|_{r=0} = X̃g(σ) for any g ∈ C^∞(G) and X ∈ g. Thus, the two Jacobi
matrices J₀Z = Jφ_σ are equal. The Newton-Raphson algorithm on Rⁿ is just

\[ t = 0 - (J_0 Z)^{-1} Z(0) = -(J\varphi_\sigma)^{-1} D\varphi(\sigma), \]

and the image of t is exactly σ⁺ = exp(t₁X₁ + ⋯ + tₙXₙ)σ, the Newton-Raphson algorithm
on G. □
It is desirable to prove a similar result to Proposition 6.2.2 for the Newton-Raphson method
on a Lie-group. To compute the rate of convergence one needs to define a measure of distance
in a neighbourhood of the critical point considered. Let Δ ⊂ G be a neighbourhood of a critical
point β ∈ G of an analytic function φ ∈ C^∞(G) and let {X₁, …, Xₙ} be a basis for T_eG as
above. There exists a subset U ⊂ Δ such that the canonical coordinates of the first kind on
G centred at β, (t₁, …, tₙ) ↦ exp(t₁X₁ + ⋯ + tₙXₙ)β, are a local diffeomorphism onto U
(Helgason 1978, pg. 104). One defines distance within U by the distance induced on canonical
coordinates centred at β by the Euclidean norm in Rⁿ,

\[ \| \exp(t_1 X_1 + \cdots + t_n X_n)\beta \| := \Big( \sum_{i=1}^{n} (t_i)^2 \Big)^{1/2}. \]
Lemma 6.2.6 Given φ ∈ C^∞(G) an analytic real function on a Lie-group G, let β ∈ G be a
critical point of φ. There exists a neighbourhood W ⊂ G of β and a constant C > 0 such that
the Newton-Raphson algorithm on G (Algorithm 6.2.4) converges to β for any initial estimate
σ₀ ∈ W and the error, measured with respect to distance induced by canonical coordinates of
the first kind, decreases quadratically,

\[ \| \sigma_{k+1} - \beta \| \leq C\, \| \sigma_k - \beta \|^{2}. \]
Proof Let U₁ ⊂ Rⁿ be an open neighbourhood of 0 in Rⁿ and define a smooth vector field by
Z(x) = (X̃₁φ(f(x)), …, X̃ₙφ(f(x))), where f : Rⁿ → G, f(x) := exp(x₁X₁ + ⋯ + xₙXₙ)β,
are canonical coordinates of the first kind. Since β is a critical point of φ, then 0 is a stationary
point of Z, i.e. Z(0) = 0. Applying Proposition 6.2.2 one obtains an open neighbourhood
U₂ ⊂ U₁ of 0 and a constant C₁ such that the Newton-Raphson algorithm on Rⁿ (Algorithm
6.2.1) converges quadratically to zero for any initial condition in U₂.

A standard result concerning the exponential of the sum of two elements of a Lie-algebra
is (Helgason 1978, pg. 106)

\[ \exp(X)\exp(Y) = \exp\big( X + Y + O(\|X\|\,\|Y\|) \big), \]

for X and Y sufficiently small. By this one means there exists an open set Θ ⊂ g containing
0 and a real number C₂ > 0 such that for X, Y ∈ Θ ⊂ g there exists Z' ∈ g such that
exp(Z') = exp(X)exp(Y) and ‖Z' − (X + Y)‖ ≤ C₂‖X‖‖Y‖. Of course Rⁿ ≅ g via the
isomorphism x ↦ x₁X₁ + ⋯ + xₙXₙ, and Θ corresponds to an open set U₃ ⊂ Rⁿ. Let r > 0
be such that the open ball B_r = {x ∈ Rⁿ | ‖x‖ < r} is fully contained in U₃ ∩ U₂.

Let

\[ U = \{ x \in \mathbb{R}^n \mid x \in B_\delta \}, \qquad \delta = \min\Big\{ \frac{r}{2},\ \frac{1}{4(C_1 + C_2)} \Big\}, \]

and define W = f(U) ⊂ G as the image of U via the canonical coordinates of the first kind
centred at β.

The proof proceeds by induction. Assume σ_k ∈ W and q_k ∈ U such that f(q_k) = σ_k. Let
t'_{k+1} denote the next iterate of the Newton-Raphson algorithm on Rⁿ (Algorithm 6.2.1) with
initial condition q_k. From above one has

\[ \| t'_{k+1} \| \leq C_1 \| q_k \|^2 \leq \tfrac{1}{2} \| q_k \|, \]
where the first inequality follows from the fact that U ⊂ U₂ and the second follows from
the fact that ‖q_k‖ ≤ 1/(4(C₁ + C₂)). Define t_{k+1} = t'_{k+1} − q_k and observe that the affine change
of basis x ↦ x − q_k =: x' preserves the form of the Newton-Raphson algorithm (Algorithm
6.2.1) applied to the transformed vector field Z'(x') = Z(x' + q_k) = Z(x). Thus, t_{k+1} is the
next iterate of the Newton-Raphson algorithm (Algorithm 6.2.1) for the vector field Z' and the
initial condition 0. Moreover,

\[ x' \mapsto \exp((x')_1 X_1 + \cdots + (x')_n X_n)\sigma_k \]

are the canonical coordinates of the first kind centred at σ_k. In particular, applying Lemma
6.2.5, it follows that the next iterate σ_{k+1} of the Newton-Raphson algorithm on G (Algorithm
6.2.4) is

\[ \sigma_{k+1} = \exp(t^1_{k+1} X_1 + \cdots + t^n_{k+1} X_n)\sigma_k. \]

Substituting σ_k = exp(q¹_k X₁ + ⋯ + qⁿ_k Xₙ)β one has

\[ \sigma_{k+1} = \exp\Big( \sum_{i=1}^{n} t^i_{k+1} X_i \Big) \exp\Big( \sum_{i=1}^{n} q^i_k X_i \Big) \beta. \]

But

\[ \| t_{k+1} \| \leq \| t'_{k+1} \| + \| q_k \| \leq 2\| q_k \| \leq r, \]

so t_{k+1} ∈ B_r ⊂ U₃, and thus there exists q_{k+1} such that

\[ \exp\Big( \sum_{i=1}^{n} q^i_{k+1} X_i \Big) = \exp\Big( \sum_{i=1}^{n} t^i_{k+1} X_i \Big) \exp\Big( \sum_{i=1}^{n} q^i_k X_i \Big), \]

\[ \| q_{k+1} - (t_{k+1} + q_k) \| = \| q_{k+1} - t'_{k+1} \| \leq C_2 \| q_k \|\, \| t_{k+1} \| \leq 2 C_2 \| q_k \|^2. \]

By construction σ_{k+1} = exp(Σᵢ qⁱ_{k+1} Xᵢ)β and

\[ \| q_{k+1} \| \leq \| q_{k+1} - t'_{k+1} \| + \| t'_{k+1} \| \leq 2 C_2 \| q_k \|^2 + C_1 \| q_k \|^2. \]

To see that the sequence q_{k+1} does in fact converge to zero, one observes that ‖q_{k+1}‖ ≤ ½‖q_k‖
since ‖q_k‖ ≤ 1/(4(C₁ + C₂)). Observing that q_{k+1} is just the representation of the next iterate
σ_{k+1} of the Newton-Raphson algorithm on G (Algorithm 6.2.4) in local coordinates, one has

\[ \| \sigma_{k+1} - \beta \| \leq C\, \| \sigma_k - \beta \|^{2}, \]

where C = 2C₂ + C₁, and the proof is complete. □
Remark 6.2.7 An interesting observation is that though each single iteration of the Newton-
Raphson algorithm (Algorithm 6.2.4) on G is equivalent to an iteration of the Euclidean
Newton-Raphson algorithm (Algorithm 6.2.1) in a certain set of local coordinates, this is not
true of multiple iterations of the algorithm in the same coordinate chart. □
6.3 Coordinate Free Newton-Raphson Methods
The construction presented in the previous section for computing the Newton-Raphson method
on a Lie-group G depends on the construction of the Jacobi matrix J� (cf. (6.2.4)) which is
explicitly defined in terms of an arbitrary choice of n basis vectors fX1� � � � � Xng for TeG. In
this section the Newton-Raphson algorithm on an arbitrary Lie-group equipped with a right
invariant Riemannian metric is formulated in a coordinate free manner.
Let G be a Lie-group with an inner product g_e(·, ·) defined on T_eG. Denote the right invari-
ant group metric that g_e generates on G by g (cf. Section 5.3). Choose a basis {X₁, …, Xₙ}
for T_eG which is orthonormal with respect to the inner product g_e(·, ·) (i.e. g_e(X_i, X_j) = δ_ij,
where δ_ij is the Kronecker delta function, δ_ij = 0 unless i = j in which case δ_ij = 1). Define
the right invariant vector fields

\[ \tilde X_i = dr\, X_i \]
associated with the basis vectors {X₁, …, Xₙ}. Since the basis {X₁, …, Xₙ} was chosen to
be orthonormal, it follows that the decomposition of an arbitrary smooth vector field Z ∈ D(G)
can be written

\[ Z = \sum_{j=1}^{n} z_j \tilde X_j = \sum_{j=1}^{n} g(\tilde X_j, Z)\, \tilde X_j. \]

In particular, let φ ∈ C^∞(G) be an analytic real map on G and grad φ be defined with respect
to the metric g (cf. Section 5.4),

\[ \mathrm{grad}\,\varphi = \sum_{j=1}^{n} g(\tilde X_j, \mathrm{grad}\,\varphi)\, \tilde X_j = \sum_{j=1}^{n} (\tilde X_j \varphi)\, \tilde X_j. \qquad (6.3.1) \]
Let t = (t₁, …, tₙ) ∈ Rⁿ and define the vector field X̃ ∈ D(G) by X̃ = Σⱼ tⱼ X̃ⱼ, which
is the right invariant vector field associated with the unique element X = Σⱼ tⱼ Xⱼ ∈ T_eG.
Observe that Σⱼ X̃ⱼφ(σ) tⱼ = X̃φ(σ), and consequently, post-multiplying (6.2.3) by X̃_i
and summing over i = 1, …, n, one obtains the approximation

\[
\sum_{i=1}^{n} \tilde X_i \varphi(\exp(X)\sigma)\, \tilde X_i
\approx \sum_{i=1}^{n} \tilde X_i \varphi(\sigma)\, \tilde X_i
+ \sum_{i=1}^{n} \Big( \tilde X_i \sum_{j=1}^{n} t_j \tilde X_j \varphi(\sigma) \Big)\, \tilde X_i
= \mathrm{grad}\,\varphi(\sigma) + \mathrm{grad}(\tilde X \varphi)(\sigma).
\]

Now, assuming that exp(X)σ is a critical point of φ, then computing the regression vector for
the Newton-Raphson algorithm is equivalent to solving the coordinate free equation

\[ 0 = \mathrm{grad}\,\varphi(\sigma) + \mathrm{grad}(\tilde X \varphi)(\sigma) \qquad (6.3.2) \]

for the vector field X̃ (or equivalently the tangent vector X ∈ T_eG that uniquely defines X̃). In
Algorithm 6.2.4 the choice of {X₁, …, Xₙ} was arbitrary, and it follows that solving directly
for X̃ using (6.3.2) is equivalent to setting X = t₁X₁ + ⋯ + tₙXₙ, where t = (t₁, …, tₙ)
is the error estimate t = −(Jφ_σ)⁻¹ Dφ(σ). Given an initial point σ₀ ∈ G the Newton-Raphson
algorithm on a Lie-group G can be written in a coordinate free form as:
Algorithm 6.3.1 [Coordinate Free Newton-Raphson Algorithm]
Find X_k ∈ T_eG such that X̃_k = dr X_k solves

\[ 0 = \mathrm{grad}\,\varphi(\sigma_k) + \mathrm{grad}(\tilde X_k \varphi)(\sigma_k). \]

Set σ_{k+1} = exp(X_k)σ_k.
Set k = k + 1 and repeat. □
To compute grad(X̃φ) one may use the identity

\[
g(\mathrm{grad}(\tilde X \varphi), \tilde Y)(\sigma) = \tilde Y \tilde X \varphi(\sigma)
= \frac{\partial^2}{\partial t_1 \partial t_2}\, \varphi(\exp(t_1 Y)\exp(t_2 X)\sigma) \Big|_{t_1 = t_2 = 0},
\]

where Ỹ is an arbitrary right invariant vector field. Explicitly computing the derivatives on the
right hand side for arbitrary Ỹ completely determines grad(X̃φ) since the metric g is positive
definite. An example of the nature of this computation is given in the next section.
Remark 6.3.2 Without the insight provided by the Taylor expansion (Proposition 6.2.3) one
may guess that the Newton-Raphson algorithm would be given by solving

\[ 0 = \mathrm{grad}\,\varphi + \nabla_{\tilde X}\, \mathrm{grad}\,\varphi \]

for the right invariant vector field X̃, where ∇ is the Levi-Civita connection. However,
∇_{X̃} grad φ ≠ grad(X̃φ) except in particular cases. Let Ỹ ∈ D(G) be an arbitrary right invariant
vector field; then from (5.7.9) one has ∇_{X̃} g(grad φ, Ỹ) = g(∇_{X̃} grad φ, Ỹ) + g(grad φ, ∇_{X̃} Ỹ).
Now ∇_{X̃} g(grad φ, Ỹ) = ∇_{X̃}(Ỹφ) = X̃Ỹφ, while ∇_{X̃} Ỹ = ∇_{Ỹ} X̃ + [X̃, Ỹ] since the Levi-Civita
connection is symmetric. Thus, one obtains

\[
0 = \tilde X \tilde Y \varphi - g(\nabla_{\tilde X}\, \mathrm{grad}\,\varphi, \tilde Y) - g(\mathrm{grad}\,\varphi, \nabla_{\tilde Y} \tilde X) - g(\mathrm{grad}\,\varphi, [\tilde X, \tilde Y])
\]
\[
= \tilde Y \tilde X \varphi - g(\nabla_{\tilde X}\, \mathrm{grad}\,\varphi, \tilde Y) - g(\mathrm{grad}\,\varphi, \nabla_{\tilde Y} \tilde X),
\]

and consequently

\[ g(\mathrm{grad}(\tilde X \varphi), \tilde Y) = g(\nabla_{\tilde X}\, \mathrm{grad}\,\varphi, \tilde Y) + g(\mathrm{grad}\,\varphi, \nabla_{\tilde Y} \tilde X). \]

The value of ∇_{Ỹ} X̃ is given by the unique bilinear map associated with the right invari-
ant affine connection ∇ (cf. Section 5.8). One has ∇_{X̃} grad φ = grad(X̃φ) if and only if
g(grad φ, ∇_{Ỹ} X̃) = 0 for all Ỹ. The most likely situation for this to occur is when the bilinear
map associated with ∇ is identically zero. For the examples considered in this thesis this will
not be true. □
6.4 Symmetric Eigenvalue Problem
In this section the general structure developed in the previous two sections is used to derive a
coordinate free Newton-Raphson method for the symmetric eigenvalue problem. An advantage
of considering the symmetric eigenvalue problem is that one can compare the Newton-Raphson
algorithm to classical methods such as the shifted QR algorithm. This provides a good
perspective on the performance of the Newton-Raphson algorithm; however, I stress that the
method is not proposed as a competitor to state of the art numerical linear algebra methods for
solving the classical symmetric eigenvalue problem. Rather, the focus is still on adaptive and
on-line applications.
Recall the constrained optimization problem that was posed in Chapter 2 for computing
the spectral decomposition of a matrix H. It was shown that minimising the functional

Ψ(H) := ||H − D||² = ||H||² + ||D||² − 2 tr(DH),

on the set⁴

M(H0) = {U H0 U^T | U ∈ O(N)},  H0 = H0^T,   (6.4.1)

where D = diag(λ1, …, λN) (a diagonal matrix with distinct eigenvalues) is equivalent

⁴The original definition (2.1.2) is slightly different, M(H0) = {U^T H0 U | U ∈ O(N)}, to the definition used
here. The map U ↦ U^T H0 U, however, is not a group action, and the definition given above is equivalent to (2.1.2).
to computing the eigenvalues of H (Brockett 1988, Helmke & Moore 1994b). To apply the
theory developed in the previous section one must reformulate this optimization problem on
O(N), the Lie-group associated with the homogeneous space M(H0). The new optimization
problem considered is:

Problem A Let H0 = H0^T ∈ S(N) be a symmetric matrix and let D = diag(λ1, …, λN) be
a diagonal matrix with real eigenvalues λ1 > ⋯ > λN. Consider the potential

φ : O(N) → R,
φ(U) := −tr(D U H0 U^T).

Find an orthogonal matrix which minimises φ over O(N). □
It is easily seen that if one computes a minimum U∞ of Problem A then U∞ H0 U∞^T is a
minimum of Ψ. Recalling Section 5.4, one easily verifies that the minimising gradient flow
solutions to Problem A map via the group action to the minimising gradient flow associated
with Ψ (Helmke & Moore 1994b, pg. 50).
Computing a single iteration of the Newton-Raphson method (Algorithm 6.3.1) relies on
computing both grad φ and grad^(1)φ(X̃) for an arbitrary right invariant vector field X̃. Recall the
discussion in Section 6.1 regarding the Riemannian geometry of O(N).
Lemma 6.4.1 Let H0 = H0^T ∈ S(N) be a symmetric matrix, let D = diag(λ1, …, λN)
be a diagonal matrix with real eigenvalues λ1 > ⋯ > λN, and let

φ : O(N) → R,
φ(U) := −tr(D U H0 U^T).

Express the tangent spaces of O(N) by (6.1.1) and consider the right invariant group metric
(6.1.2).

a) The gradient of φ on O(N) is

grad φ = [U H0 U^T, D] U.
b) Let X ∈ Sk(N) be arbitrary and set X̃ = XU = dr_U X, the right invariant vector field
on O(N) generated by X. Then

grad^(1)φ(X̃) = [[X, D], U H0 U^T] U.

Proof Recall the definition of the gradient, (5.4.1) and (5.4.2). The Frechet derivative of φ in a
direction δU ∈ T_U O(N) is

Dφ|_U(δU) = −tr(D δU H0 U^T) − tr(D U H0 δU^T)
          = tr(−[D, U H0 U^T] U δU^T) = g(−[D, U H0 U^T] U, δU).

Observing that −[D, U H0 U^T] U = [U H0 U^T, D] U ∈ T_U O(N) completes the proof of part a).

For part b) observe that

X̃φ = Dφ|_U(XU) = tr([X, D] U H0 U^T).

Taking a second derivative of this in an arbitrary direction δU one obtains

D tr([X, D] U H0 U^T)|_U (δU) = tr([X, D](δU H0 U^T + U H0 δU^T)) = g([[X, D], U H0 U^T] U, δU),

and thus grad^(1)φ(X̃) = [[X, D], U H0 U^T] U. □
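The closed-form expressions of Lemma 6.4.1 can be checked against the defining derivative identities by finite differences. The following sketch is a hypothetical numerical verification (Python with NumPy/SciPy); the matrices H0, D and U, the step h and the tolerances are illustrative assumptions and not part of the thesis:

```python
import numpy as np
from scipy.linalg import expm

# Finite-difference check of Lemma 6.4.1 (illustrative sketch).
rng = np.random.default_rng(0)
N = 4
S = rng.standard_normal((N, N))
H0 = S + S.T                                      # symmetric H0
D = np.diag([4.0, 3.0, 2.0, 1.0])                 # distinct diagonal entries
U, _ = np.linalg.qr(rng.standard_normal((N, N)))  # a point of O(N)

phi = lambda V: -np.trace(D @ V @ H0 @ V.T)
comm = lambda A, B: A @ B - B @ A

H = U @ H0 @ U.T
grad = comm(H, D) @ U                   # part a): grad phi = [U H0 U^T, D] U

Sx = rng.standard_normal((N, N)); X = Sx - Sx.T   # X, Y in Sk(N)
Sy = rng.standard_normal((N, N)); Y = Sy - Sy.T
grad1 = comm(comm(X, D), H) @ U         # part b): grad^(1)phi = [[X, D], U H0 U^T] U

h = 1e-4
# d/dt phi(e^{tY} U)|_0 should equal g(grad phi, YU) = tr((grad U^T)^T Y)
num1 = (phi(expm(h * Y) @ U) - phi(expm(-h * Y) @ U)) / (2 * h)
assert abs(num1 - np.trace((grad @ U.T).T @ Y)) < 1e-3

# d^2/dt1 dt2 phi(e^{t2 X} e^{t1 Y} U)|_0 should equal g(grad^(1)phi(X~), YU)
f = lambda t1, t2: phi(expm(t2 * X) @ expm(t1 * Y) @ U)
num2 = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)
assert abs(num2 - np.trace((grad1 @ U.T).T @ Y)) < 1e-2
print("Lemma 6.4.1 formulas agree with finite differences")
```

Both checks compare an analytic inner product against a central-difference approximation of a derivative along one-parameter subgroups, mirroring the identity used to compute grad^(1) in Section 6.3.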
Recall the equation for the coordinate free Newton-Raphson method (6.3.2). Rewriting
this in terms of the expressions derived in Lemma 6.4.1 gives the algebraic equation

0 = [U H0 U^T, D] U + [[X, D], U H0 U^T] U,

which one wishes to solve for X ∈ Sk(N).
Remark 6.4.2 To see that a solution to this equation exists observe that given a general linear
solution X ∈ R^{N×N} (which always exists since the equation is a linear system of N² equations
in N² unknowns) then

[[−X^T, D], U H0 U^T] = [[X, D]^T, U H0 U^T] = −[[X, D], U H0 U^T]^T
                      = [U H0 U^T, D]^T = −[U H0 U^T, D].

Thus, −X^T is also a solution and by linearity so is (X − X^T)/2. The question of uniqueness of
the solution X ∈ Sk(N) obtained is unclear. In the case where U H0 U^T = D is diagonal
with distinct eigenvalues it is clear that [[X, D], D] = 0 ⟺ [X, D] = 0 ⟺ X = 0 and the
solution is unique. As a consequence a genericity assumption on the eigenvalues of H0 would
need to be made to obtain a general uniqueness result. I expect that once such an assumption
is made on the eigenvalues of H0 the skew solution of the linear system would be unique.
Unfortunately I have no proof of this result at the present time. □
Given an initial matrix H0 and choosing U0 = I_N, the Newton-Raphson solution to
Problem A is:

Algorithm 6.4.3 [Newton-Raphson Algorithm for Spectral Decomposition]
Find X_k ∈ Sk(N) such that

[[X_k, D], U_k H0 U_k^T] = −[U_k H0 U_k^T, D].   (6.4.2)

Set U_{k+1} = e^{X_k} U_k, where e^{X_k} is the matrix exponential of X_k.
Set k = k + 1 and repeat. □
Remark 6.4.4 To solve (6.4.2) one can reformulate the matrix system of linear equations as a
constrained vector linear system. Denote by vec(A) the vector generated by taking the columns
of A ∈ R^{l×m} (for l and m arbitrary integers) one on top of the other. Taking the vec of both
sides of (6.4.2) gives⁵

[(D U_k H0 U_k^T)^T ⊗ I_N − (U_k H0 U_k^T) ⊗ D − D ⊗ (U_k H0 U_k^T) + I_N ⊗ (U_k H0 U_k^T D)] vec(X_k)
    = vec(−[U_k H0 U_k^T, D]).   (6.4.3)

⁵Let A, B and C be real N × N matrices and let A_ij denote the ij'th entry of the matrix A. The Kronecker
product of two matrices is defined by

A ⊗ B = ( A_11 B  …  A_1N B ; … ; A_N1 B  …  A_NN B ) ∈ R^{N²×N²}.

A readily verified identity relating the vec operation and Kronecker products is (Helmke & Moore 1994b, pg. 314)

vec(ABC) = (C^T ⊗ A) vec(B).
Figure 6.4.1: Plot of ||H_k − D|| against iteration k, for the gradient descent iteration and the
Newton-Raphson iteration, where H_k = U_k H0 U_k^T and U_k is a solution to both (6.4.4) and
Algorithm 6.4.3. The eigenvalues of H0 are chosen to be λ1, …, λN, the eigenvalues of D,
though H0 is not diagonal. Thus, the minimum Euclidean distance between H_k ∈ M(H0)
and D is zero. By plotting the Euclidean norm distance ||H_k − D|| on a logarithmic scale the
quadratic convergence characteristics of Algorithm 6.4.3 are displayed.
The constraint X_k ∈ Sk(N) can be written as a vector equation

(I_{N²} + P) vec(X_k) = 0,

where P is the N² × N² permutation matrix such that vec(A) = P vec(A^T), A ∈ R^{N×N}.
In practice, it is known that a skew symmetric solution to (6.4.3) exists and one proceeds
by extracting the (1/2)N(N−1) × (1/2)N(N−1) submatrix of the N² × N² Kronecker product and
using Gaussian elimination to solve for the free variables X_ij, i > j. □
Of course a Newton-Raphson algorithm cannot be expected to converge globally on O(N),
and for an arbitrary choice of H0 one must couple the Newton-Raphson algorithm with some
other globally convergent method to obtain a practical numerical method. In the following
simulations the associated orthogonal iteration described in Section 2.5 is used. In fact the
algorithm implemented is a slight variation of (2.5.1),

U_{k+1} = e^{−α_k [U_k H0 U_k^T, D]} U_k,   (6.4.4)

where the modification is due to the new definition (6.4.1) of M(H0). The step size selection
method used is that given in Lemma 2.2.4,

α_k = (1 / (2 ||[H_k, D]||)) log( ||[H_k, D]||² / (||H0|| ||[D, [H_k, D]]||) + 1 ),
where H_k = U_k H0 U_k^T and U_k is a solution to (6.4.4). The minor difference between (6.4.4)
and the associated orthogonal double-bracket algorithm (2.5.1) does not affect the convergence
results proved in Chapter 2. It follows that (6.4.4) is globally convergent to an orthogonal
matrix U∞ such that U∞ H0 U∞^T is a diagonal matrix with diagonal entries in descending order.
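A minimal sketch of iteration (6.4.4) in Python/NumPy, run here on the 3 × 3 example discussed below; for simplicity a small fixed step size is assumed in place of the Lemma 2.2.4 selection scheme, so this is illustrative rather than the scheme analysed in Chapter 2:

```python
import numpy as np
from scipy.linalg import expm

def comm(A, B):
    return A @ B - B @ A

D = np.diag([3.0, 2.0, 1.0])                  # diagonal target, descending
H0 = np.array([[ 2.1974, -0.8465, -0.2401],
               [-0.8465,  2.0890, -0.4016],
               [-0.2401, -0.4016,  1.7136]])  # eigenvalues 1, 2, 3
U = np.eye(3)
alpha = 0.1                                   # hypothetical fixed step size
for k in range(200):
    H = U @ H0 @ U.T
    U = expm(-alpha * comm(H, D)) @ U         # iteration (6.4.4)
H = U @ H0 @ U.T
print(np.round(np.diag(H), 6))                # approaches diag(3, 2, 1)
print(np.linalg.norm(H - np.diag(np.diag(H))))
```

The iterates remain exactly orthogonal because each update is the exponential of a skew symmetric matrix; this is the constraint stability property emphasised throughout the thesis.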
Figure 6.4.1 is an example of (6.4.4) combined with the Newton-Raphson algorithm
(Algorithm 6.4.3). The aim of the simulation is to display the quadratic convergence behaviour
of the Newton-Raphson algorithm. The initial condition used was generated via a random
orthogonal congruency transformation of the matrix D = diag(1, 2, 3),

H0 = (  2.1974  −0.8465  −0.2401
       −0.8465   2.0890  −0.4016
       −0.2401  −0.4016   1.7136 ).
Thus, the eigenvalues of H0 are 1, 2 and 3, and the minimum distance between D and M(H0) is
zero. In Figure 6.4.1 the distance ||H_k − D|| is plotted for H_k = U_k H0 U_k^T, where U_k is a solution
to both (6.4.4) and Algorithm 6.4.3. In this example the modified gradient descent method
(6.4.4) was used for the first six iterations and the Newton-Raphson algorithm was used for the
remaining three iterations. The plot of ||H_k − D|| measures the absolute Euclidean distance
between H_k and D. Naturally, there is some distortion involved in measuring distance along the
surface of M(H0); however, for limiting behaviour, ||H_k − D|| is a reasonable approximation
of distance measured along M(H0). The distance ||H_k − D|| is expressed on a log scale to
show the linear and quadratic convergence behaviour. In particular, the quadratic convergence
behaviour of the Newton-Raphson algorithm is displayed by iterations seven, eight and nine in
Figure 6.4.1.
Iteration  (Hk)21    (Hk)31    (Hk)41    (Hk)32    (Hk)42    (Hk)43
0          2         zero      zero      4         zero      6
1          1.6817                        3.2344              0.8649
2          1.6142                        2.5755              0.0006
3          1.6245                        1.6965              10^-13
4          1.6245                        0.0150              converg.
5          1.5117                        10^-9
6          1.1195                        converg.
7          0.7071
8          converg.

Table 6.4.1: The evolution of the lower off-diagonal entries of the shifted QR method described
by Golub and Van Loan (1989, Algorithm 8.2.3, pg. 423). The initial condition used is H′0
(6.4.5).
To provide a comparison of the coordinate free Newton-Raphson method to classical
algorithms the following simulation is completed for both the Newton-Raphson algorithm and
the shifted QR algorithm (Golub & Van Loan 1989, Section 8.2). The example chosen is
taken from page 424 of Golub and Van Loan (1989) and rather than simulate the symmetric
QR algorithm again the results used are taken directly from the book. The initial condition
considered is the tridiagonal matrix

H′0 = ( 1 2 0 0
        2 3 4 0
        0 4 5 6
        0 0 6 7 ).   (6.4.5)
To display the convergence properties of the QR algorithm, Golub and Van Loan (1989) give
a table in which they list the values of the off-diagonal elements of each iterate generated
for the example considered. This table is included (in a slightly modified format) as Table 6.4.1.
Each element (H_k)_ij is said to have converged when it has norm of order 10^-12 or smaller.
The initial condition H′0 is tridiagonal and the QR algorithm preserves tridiagonal structure;
consequently the elements (H_k)31, (H_k)41 and (H_k)42 remain zero for all iterates. The
convergence behaviour of the symmetric QR algorithm is cubic in successive off-diagonal
entries. Thus, (H_k)43 converges cubically to zero, then (H_k)32 converges cubically and so on
(Wilkinson 1968). The algorithm as a whole, however, does not converge cubically since each
off-diagonal entry must converge in turn.
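For reference, the flavour of the shifted QR method can be sketched in a few lines of Python/NumPy. This is an explicit full-matrix variant with Wilkinson shifts and deflation, assumed here purely for illustration; it is not the implicit tridiagonal implementation of Golub and Van Loan (1989, Algorithm 8.2.3):

```python
import numpy as np

def wilkinson_shift(T):
    # shift computed from the trailing 2x2 block of the active submatrix
    a, b, c = T[-2, -2], T[-1, -2], T[-1, -1]
    d = 0.5 * (a - c)
    s = 1.0 if d >= 0 else -1.0
    return c - s * b * b / (abs(d) + np.hypot(d, b))

def shifted_qr_eigs(H, tol=1e-12, max_iter=200):
    H = H.astype(float).copy()
    m = H.shape[0]                 # size of the active (undeflated) block
    for _ in range(max_iter):
        if m == 1:
            break
        if abs(H[m-1, m-2]) < tol * (abs(H[m-1, m-1]) + abs(H[m-2, m-2])):
            m -= 1                 # subdiagonal entry converged: deflate
            continue
        mu = wilkinson_shift(H[:m, :m])
        Q, R = np.linalg.qr(H[:m, :m] - mu * np.eye(m))
        H[:m, :m] = R @ Q + mu * np.eye(m)   # orthogonal similarity transform
    return np.sort(np.diag(H))

H = np.array([[1., 2., 0., 0.],
              [2., 3., 4., 0.],
              [0., 4., 5., 6.],
              [0., 0., 6., 7.]])   # the initial condition (6.4.5)
print(shifted_qr_eigs(H))          # eigenvalues of (6.4.5), ascending
```

The one-eigenvalue-at-a-time deflation in the sketch is exactly the behaviour recorded in Table 6.4.1: the trailing subdiagonal entry converges first, then the next, and so on.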
It is interesting to also display the results in a graphical format (Figure 6.4.2). Here the
norm

||H_k − diag(H_k)|| = ( (H_k)21² + (H_k)31² + (H_k)41² + (H_k)32² + (H_k)42² + (H_k)43² )^{1/2}
is plotted versus iteration. This would seem to be an important quantity, indicating the
robustness and stability margins of the numerical methods considered when the values of H_k
are uncertain or subject to noise in an on-line or adaptive environment. The dotted line shows
the behaviour of the QR algorithm. The plot displays the property of the QR algorithm that it
must be run to completion to obtain a solution.
Figure 6.4.2 also shows the plot of ||H_k − diag(H_k)|| for a sequence generated initially by
the modified gradient descent algorithm (6.4.4) (the first five iterations) and then the Newton-
Raphson algorithm (for the remaining three iterations). Since the aim of this simulation is to
show the potential of the Newton-Raphson algorithm, the parameters were optimized to provide
good convergence properties. The step-size for (6.4.4) was chosen as a constant α_k = 0.1,
which is somewhat larger than the variable step-size used in the first simulation. This ensures
slightly faster convergence in this example, although in general there are initial conditions H0
for which the modified gradient descent algorithm is unstable with step-size selection fixed at
0.1. The point at which the modified gradient descent algorithm was halted and the Newton-
Raphson algorithm was begun was also chosen by experiment. Note that the Newton-Raphson
method acts directly to decrease the cost ||H_k − diag(H_k)||, at least in a local neighbourhood
of the critical point. It is this aspect of the algorithm that suggests it would be useful in an
on-line or adaptive environment.
Remark 6.4.5 It is interesting to note that in this example the combination of the modified gra-
dient descent algorithm (6.4.4) and the Newton-Raphson method (Algorithm 6.4.3) converges
in the same number of iterations as the QR algorithm. □
Figure 6.4.2: A comparison of ||H_k − diag(H_k)|| (against iteration k) where H_k is a solution
to the symmetric QR algorithm (dotted line) and H_k = U_k H′0 U_k^T for U_k a solution to both
(6.4.4) and Algorithm 6.4.3 (solid line). The initial condition is H′0 (6.4.5).
Iteration  (Hk)21    (Hk)31    (Hk)41    (Hk)32    (Hk)42    (Hk)43
0           2         0         0         4         0         6
1           2.5709   -0.0117   -0.0233    4.9252   -0.4733    4.0717
2           3.7163   -0.2994    0.2498    4.3369   -0.2838    1.4798
3           4.7566   -0.7252   -0.1088    2.5257   -0.0176    0.8643
4           1.1572   -0.2222   -0.8584    1.1514   -0.1216    0.2822
5          -0.0690   -0.0362    0.0199   -0.1112    0.0649    0.0075
6           0.0011    10^-6     10^-5     10^-5     10^-6     0.0011
7           converg.  10^-9     10^-10    10^-10    10^-9     10^-11
8                     converg.  converg.  converg.  converg.  converg.

Table 6.4.2: The evolution of the lower off-diagonal entries of H_k = U_k H′0 U_k^T where U_k is a
solution to Algorithm 6.4.3. The initial condition is H′0 (6.4.5).
6.5 Open Questions and Further Work
There are several issues that have not been resolved in the present chapter. In Section 6.1 it
is concluded that the modified gradient descent algorithms proposed in Chapters 2 to 4 can
be interpreted as geodesic interpolations of gradient algorithms. This provides a template for
generating numerical algorithms that solve optimization problems on homogeneous spaces
which have Lie transformation group O(N); however, things are somewhat more complicated
if one considers general matrix Lie-groups. Certainly, it is a simple matter to derive exponential
interpolation algorithms based on the same ideas and it would be interesting to investigate the
relationship between exponential and geodesic interpolations for GL(N, R).
The full Newton-Raphson algorithm could also benefit from further study. In particular,
issues relating to rank degeneracy in the Jacobi matrix need to be addressed. These issues are
important since many relevant optimization problems are defined on a homogeneous space of
lower dimension than its Lie transformation group. In this situation the lifted potential on the
Lie-group will certainly have level sets of non-zero dimension and there will be directions in
which the Jacobi matrix is degenerate. This issue is related to the difficulties encountered in
determining whether a unique solution to (6.4.2) exists.
Once the Newton-Raphson method on a Lie-group is fully understood it should be a simple
matter to generalize the theory to an arbitrary homogeneous space. If there is an associated drop
in dimension this may result in computational advantages and the development of algorithms
that do not suffer from the degeneracy problems discussed above.
It is also interesting to consider the computational cost of the Newton-Raphson method
relative to classical algorithms such as the QR method. One would hope that the total
computational cost of a single step of the Newton-Raphson method would be comparable
to that of a step of the QR method, especially if the matrix linear systems can be solved using
parallel algorithms.
The relationship between the modified gradient descent algorithms, the Newton-Raphson
algorithm and modern integration techniques that preserve a Hamiltonian function (Sanz-Serna
1991, Stuart & Humphries 1994) is worth investigating. It is hoped that the insights provided
by Hamiltonian integration techniques along with the perspective given by the present work can
be combined to design efficient optimization methods that preserve homogeneous constraints.
Chapter 7
Conclusion
7.1 Overview
The following summary outlines the contributions of this thesis.
Chapter 2: Two numerical algorithms are proposed for the related tasks of estimating the
eigenvalues of a symmetric matrix and estimating the singular values of an arbitrary matrix.
Associated algorithms which compute the eigenvectors and singular vectors associated with
the spectral decomposition of a matrix are also presented. The algorithms are based on gradient
descent methods and evolve explicitly on a homogeneous constraint set. Step-size selection
criteria are developed which ensure good numerical properties and strong stability results are
proved for the proposed algorithms. To reduce computational cost on conventional machines
a Pade approximation of the matrix exponential is proposed which also explicitly preserves
the homogeneous constraint. An indication is given of the manner in which a time-varying
symmetric eigenvalue problem could be solved using the proposed algorithms.
Chapter 3: The problem of principal component analysis of a symmetric matrix is considered
as a smooth optimization problem on a homogeneous space. A solution in terms of the limiting
solution of a gradient dynamical system is proposed. It is shown that solutions to the dynamical
system considered do indeed converge to the desired limit for almost all initial conditions.
A modified gradient descent algorithm, based on the gradient dynamical system solution,
is proposed which explicitly preserves the homogeneous constraint set. A step-size selection
scheme is given along with a stability analysis that shows the numerical algorithm proposed
converges for almost all initial conditions.
Comparisons are made between the proposed algorithm and classical methods. It is shown
that in the rank-1 case the modified gradient descent algorithm is equivalent to the classical
power method and steepest ascent method for computing a single dominant eigenvector of
a symmetric matrix. However, this equivalence does not hold for higher-dimensional power
methods and orthogonal iterations.
Chapter 4: The problems of system assignment and pole placement are considered for the set
of symmetric linear state space systems. A major contribution of the chapter is the observation
that the additional structure inherent in symmetric linear systems forces the solution to the
“classical” pole placement question to be considerably different from that expected based on
intuition obtained from the general linear case. In particular, generic pole placement cannot
be achieved unless the system considered has as many inputs (and outputs) as states.
To compute feedback gains which assign poles as close as possible to desired poles (in a
least squares sense) a number of ordinary differential equations are proposed. By computing
the limiting solution to these equations for arbitrary initial conditions, estimates of locally
optimal feedback gains are obtained. A gradient descent numerical method, based on the
dynamical systems developed, is presented along with a step-size selection scheme and full
stability analysis.
Chapter 5: A review of the mathematical theory underlying the numerical methods proposed
in Chapters 2 to 4 is given. A brief review of Lie-groups and homogeneous spaces is given,
with emphasis on the class of homogeneous space most common in linear systems theory:
orbits of semi-algebraic Lie-groups. A detailed discussion of Riemannian metrics on Lie-
groups and homogeneous spaces is provided along with the motivation for the choice of the
metrics used elsewhere in this thesis. The derivation of gradient flows and the relationship
between gradient flows on a homogeneous space and its Lie transformation group is covered.
The convergence properties of gradient flows are discussed and a theorem is proved which is
useful for proving convergence of gradient flows in many practical situations.
The remainder of the chapter works towards developing a practical understanding of geodesics
on Lie-groups and homogeneous spaces. The theory of Lie-algebras is discussed and
the exponential map is introduced. Affine connections are discussed and the Levi-Civita
connection associated with a given Riemannian metric is introduced. Geodesics are defined
and the theory of right invariant affine connections is used to derive conditions under which the
exponential map generates a geodesic curve on a Lie-group. Finally, it is shown that geodesics
on a Lie-group G are related to geodesics on a homogeneous space (with Lie transformation
group G) via the group action.
Chapter 6: The numerical algorithms proposed in Chapters 2 to 4 are reconsidered in the
context of the theory developed in Chapter 5. The proposed algorithms are seen to be specific
examples of general gradient descent methods using geodesic interpolation.
The remainder of the chapter is devoted to developing a Newton-Raphson algorithm which
evolves explicitly on an arbitrary Lie-group. The iteration is derived in canonical coordinates
of the first kind and then generalised into a coordinate free form. A theorem is given proving
quadratic convergence in a local neighbourhood of a critical point. An explicit Newton-
Raphson algorithm, based on the general theory developed, is derived for the symmetric
eigenvalue problem.
7.2 Conclusion
A primary motivation for considering the problems posed in this thesis is the recognition of
the advantages of numerical algorithms that exploit the natural geometry of the constrained
optimization problem that they attempt to solve. This idea is especially important for on-line
and adaptive engineering applications where the properties of simplicity, global convergence
and constraint stability (cf. page 2) become the principal goals.
The starting point for the new results proposed in this work is the consideration of con-
strained optimization problems on homogeneous spaces and Lie-groups. The regular geometry
associated with these sets is suitable for the constructions necessary to develop practical numer-
ical methods. Moreover, there are numerous examples of constrained optimization problems
arising in linear systems theory where the constraint set is a homogeneous space or Lie-group.
The early results presented in Chapters 2 to 4 of this thesis do not rely heavily on abstract
Lie theory. Nevertheless, the algorithms proposed are specific examples of a more general
construction outlined in Chapter 6. For any smooth optimization problem on an orbit (cf.
Section 5.2) of the orthogonal group O(N), this construction can be summarised as follows:

1. Given a smooth homogeneous space M embedded in Euclidean space with transitive Lie
transformation group O(N) and group action σ : O(N) × M → M, let ψ : M → R be
a smooth cost function.

2. Equip O(N) with the right invariant group metric induced by the Euclidean metric
acting on the identity tangent space. I.e. for Ω1, Ω2 ∈ Sk(N) and U ∈ O(N), then Ω1 U,
Ω2 U ∈ T_U O(N) and

⟨Ω1 U, Ω2 U⟩ = tr(Ω1^T Ω2).

3. Fix q ∈ M and define the lifted potential

φ : O(N) → R,
φ(U) := ψ(σ(U, q)).

4. Compute the gradient descent flow

U̇ = −grad φ(U),

on O(N) with respect to the metric ⟨ , ⟩.

5. The modified gradient descent algorithm for φ using geodesic interpolation on O(N) is

U_{k+1} = e^{−s_k grad φ(U_k) U_k^T} U_k,

where s_k > 0 is a small positive number.

6. The modified gradient descent algorithm using geodesic interpolation for ψ on M is

p_k = σ(U_k, q).

7. Determine a step-size selection scheme f : M → R,

s_k := f(p_k),

which guarantees ψ(p_{k+1}) < ψ(p_k) except in the case where p_k is a critical point of ψ.
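The seven steps above can be sketched generically in Python/NumPy; here ψ enters through its Euclidean gradient, the group action is taken to be σ(U, q) = U q U^T, and a hypothetical fixed step size stands in for the selection scheme of step 7:

```python
import numpy as np
from scipy.linalg import expm

def descent_on_orbit(q, euclid_grad, steps, s):
    """Geodesic gradient descent for psi on M = {U q U^T : U in O(N)}."""
    U = np.eye(q.shape[0])
    for _ in range(steps):
        p = U @ q @ U.T            # step 6: p_k = sigma(U_k, q)
        G = euclid_grad(p)         # Euclidean gradient of psi at p_k
        Xi = G @ p - p @ G         # lifted gradient: grad phi(U) = [G, p] U
        U = expm(-s * Xi) @ U      # step 5: geodesic interpolation
    return U

# demo: psi(p) = ||p - D||^2 recovers the double-bracket descent of Chapter 2
D = np.diag([4.0, 3.0, 2.0, 1.0])
q = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.5, 1.0, 0.0],
              [0.0, 1.0, -0.5, 1.0],
              [0.0, 0.0, 1.0, 0.0]])   # an arbitrary symmetric q
U = descent_on_orbit(q, lambda p: 2.0 * (p - D), steps=3000, s=0.02)
p = U @ q @ U.T
offdiag = np.linalg.norm(p - np.diag(np.diag(p)))
print(np.round(np.diag(p), 5), offdiag)   # q diagonalised on its orbit
```

With the right invariant metric of step 2, the lifted gradient of φ(U) = ψ(U q U^T) is the commutator [G, p] U for G the Euclidean gradient of ψ at p; the commutator is automatically skew symmetric, so each exponential step stays on O(N).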
In Chapter 6 a general construction is outlined for computing a Newton-Raphson algorithm
on a Lie-group (Algorithm 6.3.1). The advantage of this construction is that the algorithm
generated converges quadratically in a neighbourhood of the desired equilibrium. There are
two main disadvantages; firstly, convergence can only be guaranteed in a local neighbourhood
of the equilibrium and secondly, the theoretical construction is complicated and relies on
abstract geometric constructions.
A comparison between the gradient descent algorithm and the Newton-Raphson algorithm
nicely displays the trade-off between a simple linearly convergent numerical method associated
with strong convergence theory and a numerical method designed to converge quadratically
(or better) but associated with weaker convergence theory. The stability and robustness of the
first approach suggests that it would be of use in on-line and adaptive engineering applications
where reliability is more important than computational cost of implementation. The second
approach may also have applications in adaptive processes where accurate estimates are needed
and the uncertainties are small.
Bibliography
Ammar, G. & Martin, C. (1986). The geometry of matrix eigenvalue methods,Acta Applicandae
Mathematicae 5: 239–278.
Anderson, B. D. O. & Moore, J. B. (1971). Linear Optimal Control, Electrical Engineering
Network Series, Prentice-Hall Inc., Englewood Cliffs, N.J., U.S.A.
Anderson, B. D. O. & Moore, J. B. (1990). Optimal Control: Linear Quadratic Methods,
Prentice-Hall Inc., Englewood Cliffs, N.J., U.S.A.
Anderson, B. D. O. & Vongpanitlerd, S. (1973). Network Analysis and Synthesis: A Modern
Systems Theory Approach, Electrical Engineering, Prentice-Hall, Englewood Cliffs, N.J.,
U.S.A.
Aoki, M. (1971). Introduction to Optimization Techniques: Fundamentals and Applications of
Nonlinear Programming, Macmillan Co., New York, U.S.A.
Auchmuty, G. (1991). Globally and rapidly convergent algorithms for symmetric eigenprob-
lems, SIAM Journal of Matrix Analysis and Applications 12(4): 690–706.
Baldi, P. & Hornik, K. (1989). Neural networks and principal component analysis: Learning
from examples without local minima, Neural Networks 2: 53–58.
Batterson, S. & Smillie, J. (1989). The dynamics of the Rayleigh quotient iteration, SIAM
Journal of Numerical Analysis 26(3): 624–636.
Batterson, S. & Smillie, J. (1990). Rayleigh quotient iteration for nonsymmetric matrices,
Math. Comp. 55(191): 169–178.
Bayer, D. A. & Lagarias, J. C. (1989). The nonlinear geometry of linear programming I, II,
Transactions of the American Mathematical Society 314: 499–580.
Bitmead, R. R. & Anderson, B. D. O. (1977). The matrix Cauchy index: properties and
applications, SIAM Journal of Applied Mathematics 33(4): 655–672.
Bloch, A. M. (1985a). A completely integrable Hamiltonian system associated with line fitting
in complex vector spaces, Bulletin of the American Mathematical Society 12(2): 250–254.
Bloch, A. M. (1985b). Estimation, principal components and Hamiltonian systems, Systems
and Control Letters 6: 103–108.
Bloch, A. M. (1990a). The Kaehler structure of the total least squares problem, Brockett's
steepest descent equations and constrained flows, in M. A. Kaashoek, J. H. van Schuppen
& A. C. M. Ran (eds), Realization and Modelling in System Theory, Birkhauser Verlag,
Boston.
Bloch, A. M. (1990b). Steepest descent, linear programming and Hamiltonian flows, Contem-
porary Math. 114: 77–88.
Bloch, A. M., Brockett, R. W. & Ratiu, T. (1990). A new formulation of the generalised Toda
lattice equations and their fixed point analysis via the momentum map, Bulletin American
Mathematical Society 23(2): 477–485.
Bloch, A. M., Brockett, R. W. & Ratiu, T. (1992). Completely integrable gradient flows,
Communications in Mathematical Physics 23: 447–456.
Bloch, A. M., Flaschka, H. & Ratiu, T. (1990). A convexity theorem for isospectral sets of
Jacobi matrices in a compact Lie algebra, Duke Math. J. 61: 41–65.
Blondel, V. (1992). Simultaneous Stabilization of Linear Systems, PhD thesis, Faculte des
Sciences Appliquee, Universite Catholique de Louvain.
Blondel, V., Campion, G. & Gevers, M. (1993). A sufficient condition for simultaneous
stabilization, IEEE Transactions on Automatic Control 38: 1264–1266.
Bourlard, H. & Kamp, Y. (1989). Auto-association by multilayer perceptrons and the singular
value decomposition, Biological Cybernetics 59: 291–294.
Brockett, R. W. (1988). Dynamical systems that sort lists, diagonalise matrices and solve
linear programming problems, Proceedings IEEE Conference on Decision and Control,
pp. 799–803.
Brockett, R. W. (1989a). Least squares matching problems, Linear Algebra and its Applications
122-124: 761–777.
Brockett, R. W. (1989b). Smooth dynamical systems which realize arithmetical and logical
operations, Three Decades of Mathematical Systems Theory, number 135 in Lecture Notes in
Control and Information Sciences, Springer-Verlag, pp. 19–30.
Brockett, R. W. (1991a). Dynamical systems that learn subspaces, in A. C. Antoulas (ed.),
Mathematical systems theory - The Influence of Kalman.
Brockett, R. W. (1991b). Dynamical systems that sort lists, diagonalise matrices and solve
linear programming problems, Linear Algebra and its Applications 146: 79–91.
Brockett, R. W. (1993). Differential geometry and the design of gradient algorithms, Proceed-
ings of Symposia in Pure Mathematics, Vol. 54, pp. 69–92.
Brockett, R. W. & Byrnes, C. B. (1979). On the algebraic geometry of the output feedback pole
placement map, IEEE Conference on Decisions and Control, Fort Lauderdale, Florida,
U.S.A., pp. 754–757.
Brockett, R. W. & Byrnes, C. I. (1981). Multivariable Nyquist criteria, root loci and pole
placement: A geometric viewpoint, IEEE Transactions on Automatic Control 26(1): 271–
283.
Brockett, R. W. & Faybusovich, L. E. (1991). Toda flows, inverse spectral transform and
realisation theory, Systems and Control Letters 16: 79–88.
Brockett, R. W. & Krishnaprasad, P. S. (1980). A scaling theory for linear systems, IEEE
Transactions of Automatic Control 25: 197–207.
Brockett, R. W. & Wong, W. S. (1991). A gradient flow for the assignment problem, in
G. Conte, A. M. Perdon & B. Wyman (eds), Progress in System and Control Theory,
Birkhauser, pp. 170–177.
Broyden, C. G. (1970). The convergence of a class of double-rank minimization algorithms,
II: The new algorithm, Journal Institute of Mathematics and its Applications 6: 222–231.
Burrage, K. (1978). High order algebraically stable Runge-Kutta methods, B.I.T. 18: 373–383.
Burrage, K. & Butcher, J. C. (1979). Stability criteria for implicit Runge-Kutta methods, SIAM
Journal of Numerical Analysis 16(1): 46–57.
Butcher, J. (1987). The Numerical Analysis of Ordinary Differential Equations: Runge-Kutta
and General Linear Methods, John Wiley and Sons, Chichester, U.K.
Butcher, J. C. (1975). A stability property of implicit Runge-Kutta methods, B.I.T. 15: 358–361.
Buurema, H. J. (1970). A geometric proof of convergence for the QR method, PhD thesis,
Rijksuniversiteit Te Groningen.
Byrnes, C. I. (1978). On certain families of rational functions arising in dynamics, Proceedings
of the IEEE pp. 1002–1006.
Byrnes, C. I. (1983). On the stability of multivariable systems and the Ljusternik-Schnirelmann
category of real Grassmannians, Systems and Control Letters 3: 255–262.
Byrnes, C. I. (1989). Pole placement by output feedback, Three Decades of Mathematical
Systems Theory, Vol. 135 of Lecture Notes in Control and Information Sciences, Springer-
Verlag, pp. 31–78.
Byrnes, C. I. & Martin, C. F. (eds) (1980). Algebraic and geometric methods in linear systems
theory, Vol. 18 of Lectures in Applied Mathematics, American Mathematical Society,
Providence, Rhode Island, U.S.A.
Byrnes, C. I. & Willems, J. C. (1986). Least-squares estimation, linear programming and
momentum: A geometric parametrization of local minima, IMA Journal of Mathematical
Control and Information 3: 103–118.
Byrnes, C. I., Hazewinkel, M., Martin, C. & Rouchaleau, Y. (1980). Introduction to geometrical
methods for the theory of linear systems, Geometrical Methods for the Theory of Linear
Systems, D. Reidel Publ. Comp. see also Reprint Series, 273, Erasmus University,
Rotterdam.
Cauchy, A. L. (1847). Méthode générale pour la résolution des systèmes d'équations simultanées,
Comptes Rendus Académie Science Paris XXV: 536–538.
Channell, P. J. (1983). Symplectic integration algorithms, Technical Report 83-9, Los Alamos
National Laboratory.
Chu, M. T. (1984a). The generalized Toda flow, the QR-algorithm and the center manifold
theory, SIAM Journal on Algebraic and Discrete Methods 5(2): 187–201.
Chu, M. T. (1984b). On the global convergence of the Toda lattice for real normal matrices
and its application to the eigenvalue problem, SIAM Journal on Mathematical Analysis
15: 98–104.
Chu, M. T. (1986). A differential equation approach to the singular value decomposition of
bidiagonal matrices, Linear Algebra and its Applications 80: 71–80.
Chu, M. T. (1988). On the continuous realization of iterative processes, SIAM Review
30(3): 375–387.
Chu, M. T. (1991a). A continuous Jacobi-like approach to the simultaneous reduction of real
matrices, Linear Algebra and its Applications 147: 75–96.
Chu, M. T. (1991b). Least squares approximation by real normal matrices with specified
spectrum, SIAM Journal of Matrix Analysis and Applications 12(1): 115–127.
Chu, M. T. (1992). Numerical methods for inverse singular value problems, SIAM Journal on
Numerical Analysis 29(3): 885.
Chu, M. T. & Driessel, K. R. (1990). The projected gradient method for least squares matrix ap-
proximations with spectral constraints, SIAM Journal of Numerical Analysis 27(4): 1050–
1060.
Chu, M. T. & Driessel, K. R. (1991). Constructing symmetric non-negative matrices with
prescribed eigenvalues by differential equations, SIAM Journal on Mathematical Analysis
22(5): 1372–1387.
Colonius, F. & Kliemann, W. (1990). Linear control semigroups acting on projective space,
Technical Report 224, Universität Augsburg, Germany.
Cooper, G. J. (1986). On the existence of solutions for algebraically stable Runge-Kutta methods,
IMA Journal of Numerical Analysis 6: 325–330.
Crouch, P. E., Grossman, R. & Yan, Y. (1992). On the numerical integration of the dynamic
attitude equations, Proceedings of the IEEE Conference on Decision and Control, Tucson,
Arizona, pp. 1497–1501.
Crouch, P. E., Grossman, R. & Yan, Y. (1994). A third order Runge-Kutta algorithm on a
manifold, preprint.
Crouch, P. E. & Grossman, R. (1994). Numerical integration of ordinary differential equations
on manifolds, to appear in Journal of Nonlinear Science.
Curry, H. (1944). The method of steepest descent for nonlinear minimization problems,
Quarterly Applied Mathematics 2: 258–261.
Dahlquist, G. (1978). G-stability is equivalent to A-stability, B.I.T. 18: 384–401.
Davidon, W. C. (1959). Variable metric method for minimization, Technical Report ANL-5990,
Argonne National Laboratory.
Davison, E. J. & Wang, S.-H. (1973). Properties of linear time-invariant multivariable systems
subject to arbitrary output and state feedback, IEEE Transactions on Automatic Control
pp. 24–32.
Davison, E. J. & Wang, S.-H. (1975). On pole assignment in linear multivariable systems using
output feedback, IEEE Transactions on Automatic Control pp. 516–518.
Deift, P., Nanda, T. & Tomei, C. (1983). Ordinary differential equations for the symmetric
eigenvalue problem, SIAM Journal of Numerical Analysis 20(1): 1–22.
Driessel, K. R. (1986). On isospectral gradient flows - solving matrix eigenproblems using
differential equations, in J. R. Cannon & U. Hornung (eds), Inverse Problems, Birkhauser
Verlag, pp. 69–90.
Duistermaat, J. J., Kolk, J. A. C. & Varadarajan, V. S. (1983). Functions, flows and oscilla-
tory integrals on flag manifolds and conjugacy classes in real semi-simple Lie groups,
Compositio Mathematica 49: 309–398.
Ehle, B. L. (1973). A-stable methods and Padé approximations to the exponential, SIAM
Journal of Mathematical Analysis 4: 671–680.
Euler, L. (1913). De integratione aequationum differentialium per approximationem, Opera
Omnia, 1’st series, Vol. 11, Institutiones Calculi Integralis, Teubner, Leipzig, Germany,
pp. 424–434.
Faddeev, D. K. & Faddeeva, V. N. (1963). Computational Methods of Linear Algebra, W. H.
Freeman and Co., San Francisco.
Falb, P. (1990). Methods of Algebraic Geometry in Control Theory: Part I, Vol. 4 of Systems
and Control: Foundations and Applications, Birkhauser, Boston, U.S.A.
Faybusovich, L. (1992a). Inverse problems for orthogonal matrices, Toda flows, and signal
processing, Proceedings of the IEEE Conference on Decision and Control.
Faybusovich, L. E. (1989). QR-type factorisations, the Yang-Baxter equation and eigenvalue
problem of control theory, Linear Algebra and its Applications 122-124: 943–971.
Faybusovich, L. E. (1991a). Dynamical systems which solve optimisation problems with linear
constraints, IMA Journal of Mathematical Control and Information 8: 135–149.
Faybusovich, L. E. (1991b). Hamiltonian structure of dynamical systems which solve linear
programming problems, Physica D.
Faybusovich, L. E. (1992b). Toda flows and isospectral manifolds, Proceedings of the American
Mathematical Society 115(3): 837–847.
Feng, K. (1985). Difference schemes for Hamiltonian formalism and symplectic geometry, Journal
of Computational Mathematics 4: 279–289.
Flaschka, H. (1974). The Toda lattice, II. Existence of integrals, Physical Review B 9(4): 1924–
1925.
Flaschka, H. (1975). Discrete and periodic illustrations of some aspects of the inverse methods,
in J. Moser (ed.), Dynamical Systems, Theory and Applications, Vol. 38 of Lecture Notes
in Physics, Springer-Verlag, Berlin.
Fleming, W. (1977). Functions of Several Variables, Undergraduate texts in Mathematics,
Springer-Verlag, New York, U.S.A.
Fletcher, R. (1970). A new approach to variable metric algorithms, Computer Journal
13(3): 317–322.
Fletcher, R. & Powell, M. J. D. (1963). A rapidly convergent descent method for minimization,
Computer Journal 6: 163–168.
Fletcher, R. & Reeves, C. M. (1964). Function minimization by conjugate gradients, Computer
Journal 7: 149–154.
Forsythe, G. E. (1968). On the asymptotic directions of the s-dimensional optimum gradient
method, Numerische Mathematik 11: 57–76.
Friedland, S., Nocedal, J. & Overton, M. L. (1987). The formulation and analysis of numerical
methods for inverse eigenvalue problems, SIAM Journal of Numerical Analysis 24: 634–
667.
Gear, C. W. (1968). The automatic integration of stiff ordinary differential equations, in
A. J. Morrell (ed.), Information Processing 68: Proceedings IFIP Congress, Edinburgh,
pp. 187–193.
Gevers, M. R. & Li, G. (1993). Parametrizations in Control, Estimation and Filtering Problems:
Accuracy Aspects, Communications in Control Engineering, Springer Verlag, London,
United Kingdom.
Ghosh, B. K. (1988). An approach to simultaneous system design, part II: Nonswitching gain
and dynamic feedback compensation by algebraic geometric methods, SIAM Journal of
Control and Optimization 26(4): 919–963.
Gibson, C. G. (1979). Singular Points of Smooth Mappings, Vol. 25 of Research Notes in
Mathematics, Pitman, London, United Kingdom.
Godbout, L. F. & Jordan, D. (1989). Gradient matrices for output feedback systems, Interna-
tional Journal of Control 32(5): 411–433.
Goldfarb, D. (1970). A family of variable metric methods derived by variational means,
Mathematics of Computation 24: 23–26.
Golub, G. H. & Van Loan, C. F. (1989). Matrix Computations, The Johns Hopkins University
Press, Baltimore, Maryland U.S.A.
Greenspan, D. (1974). Discrete numerical methods in physics and engineering, Academic
Press, New York, U.S.A.
Greenspan, D. (1984). Discrete numerical methods in physics and engineering, Journal of
Computational Physics 56: 21.
Hazewinkel, M. (1979). On identification and the geometry of the space of linear systems,
Lecture Notes in Control and Information Science, Vol. 16, Springer-Verlag, Berlin. see
also Reprint Series, No. 245, Erasmus University, Rotterdam.
Helgason, S. (1978). Differential Geometry, Lie Groups and Symmetric Spaces, Academic
Press, New York.
Helmke, U. (1984). Topology of the moduli space for reachable linear dynamical systems:
The complex case, Technical Report 122, Forschungsschwerpunkt Dynamische Systeme,
University of Bremen.
Helmke, U. (1991). Isospectral flows on symmetric matrices and the Riccati equation, Systems
and Control Letters 16: 159–165.
Helmke, U. (1992). Isospectral flows and linear programming, Journal of the Australian
Mathematical Society 34(3).
Helmke, U. (1993). Balanced realisations for linear systems: A variational approach, SIAM
Journal of Control and Optimization 31: 1–15.
Helmke, U. & Moore, J. B. (1990). Singular value decomposition via gradient flows, Systems
and Control Letters 14: 369–377.
Helmke, U. & Moore, J. B. (1994a). L2-sensitivity minimization of linear system representa-
tions via gradient flows, to appear in Journal of Mathematical Systems, Estimation and
Control.
Helmke, U. & Moore, J. B. (1994b). Optimization and Dynamical Systems, Communications
and Control Engineering, Springer-Verlag, London.
Helmke, U. & Shayman, M. A. (1992). Critical points of matrix least square distance functions,
2nd IFAC workshop on System Structure and Control, Prague, Czechoslovakia, pp. 116–
118. to appear in Linear Algebra and its Applications.
Helmke, U., Moore, J. B. & Perkins, J. E. (1994). Dynamical systems that compute bal-
anced realizations and the singular value decomposition, to appear SIAM Journal Matrix
Analysis.
Henon, M. (1974). Integrals of the Toda lattice, Physical Review B 9(4): 1921–1923.
Hermann, R. & Martin, C. F. (1977). Applications of algebraic geometry to systems theory -
part I, IEEE Transactions on Automatic Control 22: 19–25.
Hermann, R. & Martin, C. F. (1982). Lie and Morse theory of periodic orbits of vector fields
and matrix Riccati equations, I: General Lie theoretic methods, Math. Systems Theory
15: 277–284.
Hestenes, M. R. & Karush, W. (1951). A method of gradients for the calculation of the
characteristic roots and vectors of a real symmetric matrix, Journal of Research of the
National Bureau of Standards 47: 45–61.
Hirsch, M. W. (1976). Differential Topology, number 33 in Graduate Texts in Mathematics,
Springer-Verlag, New York.
Horn, R. A. & Johnson, C. R. (1985). Matrix Analysis, Cambridge University Press, Cambridge,
U.K.
Imae, J., Perkins, J. E. & Moore, J. B. (1992). Towards time-varying balanced realisation via
Riccati equations, Mathematics of Control, Signals and Systems 5: 313–326.
J. E. Dennis, Jr & Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization
and Nonlinear Equations, Computational Mathematics, Prentice-Hall Inc., New Jersey,
U.S.A.
Kailath, T. (1980). Linear Systems, Prentice-Hall, Englewood Cliffs, N.J., U.S.A.
Kalman, R. E. (1963). Mathematical description of linear systems, J.S.I.A.M. Control 1(2): 152–
192.
Karmarkar, N. (1984). A new polynomial time algorithm for linear programming, Combinatorica
4: 373–395.
Karmarkar, N. (1990). Riemannian geometry underlying interior-point methods for linear
programming, Contemp. Math. 114: 51–75.
Khachian, L. G. (1979). A polynomial algorithm in linear programming, Soviet Mathematics
Doklady 20: 191–194.
Kimura, H. (1975). Pole assignment by gain output feedback, IEEE Transactions on Automatic
Control pp. 509–516.
Kincaid, D. & Cheney, W. (1991). Numerical Analysis: Mathematics of Scientific Computing,
Brooks/Cole Publishing Company, Pacific Grove, California, U.S.A.
Kostant, B. (1979). The solution to a generalized Toda lattice and representation theory,
Advances in Mathematics 34: 195–338.
Krishnaprasad, P. S. (1979). Symplectic mechanics and rational functions, Ricerche di Automatica
10: 107–135.
Kumar, S. (ed.) (1991). Recent developments in Mathematical Programming, Gordon and
Breach Science Publishers, Philadelphia, U.S.A.
Lagarias, J. C. (1991). Monotonicity properties of the Toda flow, the QR-flow, and subspace
iteration, SIAM Journal of Matrix Analysis and Applications 12(3): 449–462.
Lagarias, J. & Todd, M. J. (eds) (1990). Mathematical Developments Arising from Linear
Programming, Vol. 114 of Contemporary Mathematics, American Mathematical Society,
Providence, R.I., U.S.A.
Lasagni, F. (1988). Canonical Runge-Kutta methods, ZAMP 39: 952–953.
Laub, A. J., Heath, M. T., Paige, C. C. & Ward, R. C. (1987). Computation of system balancing
transformations and other applications of simultaneous diagonalization algorithms, IEEE
Transactions on Automatic Control 32(2): 115–121.
Li, G., Anderson, B. D. O., Gevers, M. & Perkins, J. E. (1992). Optimal FWL design of state-
space digital systems with weighted sensitivity minimization and sparseness consideration,
IEEE Transactions on Circuits and Systems - I: Fundamental theory and applications
39: 365–377.
Luenberger, D. G. (1973). Introduction to Linear and Nonlinear Programming, Addison-
Wesley, Reading, Massachusetts, U.S.A.
Madievski, A. G., Anderson, B. D. O. & Gevers, M. R. (1994). Optimum realizations of
sampled data controllers for FWL sensitivity minimization, submitted to Automatica.
Mahony, R. E. & Helmke, U. (1993). System assignment and pole placement for symmetric
realisations, Submitted to Journal of Mathematical Systems, Estimation and Control.
Mahony, R. E. & Moore, J. B. (1992). Recursive interior-point linear programming algorithm
based on Lie-Brockett flows, Proceedings of the International Conference on Optimisa-
tion: Techniques and Applications, Singapore.
Mahony, R. E., Helmke, U. & Moore, J. B. (1993). Pole placement algorithms for symmetric
realisations, Proceedings of IEEE Conference on Decision and Control, San Antonio,
U.S.A.
Mahony, R. E., Helmke, U. & Moore, J. B. (1994). Gradient algorithms for principal component
analysis, Submitted to Journal of the Australian Mathematical Society.
Martin, C. F. & Hermann, R. (1977a). Applications of algebraic geometry to systems theory -
part II: Feedback and pole placement for linear Hamiltonian systems, Proceedings of the
IEEE 65: 841–848.
Martin, C. F. & Hermann, R. (eds) (1977b). The 1976 Ames Research Centre (NASA) Conference
on Geometric Control Theory, Vol. VII of Lie Groups: History Frontiers and Applications,
Math Sci Press, Brookline, Massachusetts, U.S.A.
Martin, C. F. & Hermann, R. (1979). Applications of algebraic geometry to systems theory:
The McMillan degree and Kroneker indices of transfer functions as topological and
holomorphic system invariants, SIAM Journal of Control and Optimization 16(5): 743–
755.
Menyuk, C. R. (1984). Some properties of the discrete Hamiltonian method, Physica D
11: 109–129.
Minoux, M. (1986). Mathematical Programming: Theory and Algorithms, John Wiley and
Sons, Chichester, U.K.
Mishchenko, A. & Fomenko, A. (1980). A Course in Differential Geometry and Topology, Mir
publishers, Moscow, Russia.
Moore, J. B., Mahony, R. E. & Helmke, U. (1992). Recursive gradient algorithms for eigenvalue
and singular value decompositions, Proceedings of the American Control Conference,
Chicago, U.S.A.
Moore, J. B., Mahony, R. E. & Helmke, U. (1994). Numerical gradient algorithms for eigenvalue
and singular value calculations, SIAM Journal of Matrix Analysis 15(3).
Moser, J. (1975). Finitely many mass points on the line under the influence of an exponen-
tial potential - an integrable system, in J. Moser (ed.), Dynamical Systems, Theory and
Applications, Springer-Verlag, New York, pp. 467–497.
Moser, J. & Veselov, A. P. (1991). Discrete versions of some classical integrable systems and
factorization of matrix polynomials, Communications in Mathematical Physics 139: 217–
243.
Munkres, J. R. (1975). Topology: A First Course, Prentice-Hall, Englewood Cliffs, N.J., U.S.A.
Nakamura, Y. (1989). Moduli spaces of controllable linear dynamical systems and nonlinear
integrable systems of Toda type, in J. M. J. Harnad (ed.), Proceedings CRM workshop on
Hamiltonian Systems, Transformation Groups and Spectral Transform Methods, pp. 103–
112.
Nanda, T. (1985). Differential equations and the QR algorithm, SIAM Journal of Numerical
Analysis 22(2): 310–321.
Oja, E. (1982). A simplified neuron model as a principal component analyzer, Journal of
Mathematical Biology 15: 267–273.
Oja, E. (1989). Neural networks, principal components, and subspaces, International Journal
on Neural Systems 1: 61–68.
Parlett, B. N. (1974). The Rayleigh quotient iteration and some generalisations for nonnormal
matrices, Mathematics of Computation 28(127): 679–693.
Parlett, B. N. & Poole, W. G. (1973). A geometric theory for the QR, LU, and power iterations,
SIAM Journal of Numerical Analysis 10(2): 389–412.
Paul, S., Hueper, K. & Nossek, J. A. (1992). A class of non-linear lossless dynamical systems,
Archiv für Elektronik und Übertragungstechnik 46: 219–227.
Perkins, J. E., Helmke, U. & Moore, J. B. (1990). Balanced realizations via gradient flow
techniques, Systems and Control Letters 14: 369–380.
Polyak, B. T. (1966). A general method for solving extremum problems, Soviet Mathematics
Doklady 8: 593–597.
Riddell, R. C. (1984). Minimax problems on Grassmann manifolds: Sums of eigenvalues,
Advances in Mathematics 54: 107–199.
Rosenthal, J. (1989). Tuning natural frequencies by output feedback, in K. Bowers & J. Lund
(eds), Computation and Control, Proceedings of the Bozeman Conference, Bozeman,
Montana, Vol. 1 of Progress in Systems and Control Theory, Birkhauser, pp. 277–282.
Rosenthal, J. (1992). New results in pole assignment by real output feedback, SIAM Journal
of Control and Optimization 30(1): 203–211.
Ruth, R. D. (1983). A canonical integration technique, IEEE Transactions on Nuclear Science
30: 2669–2671.
Rutishauser, H. (1954). Ein infinitesimales Analogon zum Quotienten-Differenzen-Algorithmus,
Archiv der Mathematik 5: 132–137.
Rutishauser, H. (1958). Solution of eigenvalue problems with the LR-transformation, National
Bureau of Standards: Applied Mathematics Series 49: 47–81.
Safonov, M. G. & Chiang, R. Y. (1989). A Schur method for balanced truncations, IEEE
Transactions on Automatic Control 34: 729–733.
Sanz-Serna, J. M. (1988). Runge-Kutta schemes for Hamiltonian systems, B.I.T. 28: 877–883.
Sanz-Serna, J. M. (1991). Symplectic integrators for Hamiltonian problems: An overview,
ACTA Numerica 1: 243–286.
Shanno, D. (1970). Conditioning of quasi-Newton methods for function minimization, Mathe-
matics of Computation 24: 641–656.
Shayman, M. A. (1986). Phase portrait of the matrix Riccati equation, SIAM Journal of Control
and Optimization 24(1): 1–65.
Shub, M. & Vasquez, A. T. (1987). Some linearly induced Morse-Smale systems, the QR
algorithm and the Toda lattice, in L. Keen (ed.), The Legacy of Sonya Kovalevskaya,
Vol. 64 of Contemporary Mathematics, American Mathematical Society, Providence,
U.S.A., pp. 181–194.
Smale, S. (1961). On gradient dynamical systems, Annals of Mathematics 74(1): 199–206.
Smith, S. T. (1991). Dynamical systems that perform the singular value decomposition, Systems
and Control Letters 16(5): 319–327.
Smith, S. T. (1993). Geometric optimization methods for adaptive filtering, PhD thesis,
Division of Applied Sciences, Harvard University.
Sontag, E. D. (1990). Mathematical Control Theory, Springer-Verlag, New York, U.S.A.
Sreeram, V., Teo, K. L., Yan, W.-Y. & Li, C. (1994). A gradient flow approach to simultaneous
stabilization problem, submitted to 1994 Conference on Decision and Control.
Stuart, A. M. & Humphries, A. R. (1994). Model problems in numerical stability for initial
value problems, to appear SIAM Review.
Suris, Y. B. (1989). Hamiltonian Runge-Kutta type methods and their variational formulation,
Math. Sim. 2: 78–87 (in Russian).
Symes, W. W. (1980a). Hamiltonian group actions and integrable systems, Physica D 1: 339–
374.
Symes, W. W. (1980b). Systems of the Toda type, inverse spectral problems, and representation
theory, Inventiones Mathematicae 59: 13–51.
Symes, W. W. (1982). The QR algorithm and scattering for the finite nonperiodic Toda lattice,
Physica D 4: 275–280.
Toda, M. (1970). Waves in nonlinear lattice, Supplement of the Progress in Theoretical Physics
45: 174–200.
Tombs, M. S. & Postlethwaite, I. (1987). Truncated balanced realization of a stable non-minimal
state-space system, International Journal of Control 46: 1319–1330.
Varadarajan, V. S. (1984). Lie Groups, Lie Algebras, and their Representations, Vol. 102 of
Graduate texts in Mathematics, Springer-Verlag, New York, U.S.A.
Wang, X. (1989). Geometric inverse eigenvalue problem, in K. Bowers & J. Lund (eds),
Computation and Control, Proceedings of the Bozeman Conference, Bozeman, Montana,
Vol. 1 of Progress in Systems and Control Theory, Birkhauser, pp. 375–383.
Wang, X. (1992). Pole placement by static output feedback, Journal of Mathematical Systems,
Estimation and Control 2(2): 205–218.
Warner, F. W. (1983). Foundations of Differentiable Manifolds and Lie Groups, Graduate Texts
in Mathematics, Springer-Verlag, New York, U.S.A.
Watkins, D. S. (1982). Understanding the QR algorithm, SIAM Review 24(4): 427–440.
Watkins, D. S. (1984). Isospectral flows, SIAM Review 26(3): 379–391.
Watkins, D. S. & Elsner, L. (1988). Self similar flows, Linear Algebra and its Applications
110: 213–242.
Watkins, D. S. & Elsner, L. (1989a). Self-equivalent flows associated with the generalized
eigenvalue problem, Linear Algebra and its Applications 118: 107–127.
Watkins, D. S. & Elsner, L. (1989b). Self-equivalent flows associated with the singular value
decomposition, SIAM Journal Matrix Analysis Applications 10: 244–258.
Widlund, O. (1967). A note on unconditionally stable linear multistep methods, B.I.T. 7: 65–70.
Wilkinson, J. H. (1968). Global convergence of the QR algorithm, Linear Algebra and its
Applications 1.
Willems, J. C. & Hesselink, W. H. (1978). Generic properties of the pole placement map,
Proceedings of the 7th IFAC Congress, pp. 1725–1729.
Wonham, W. M. (1967). On pole assignment in multi-input controllable linear systems, IEEE
Transactions on Automatic Control 12: 660–665.
Wonham, W. M. (1985). Linear Multivariable Control, third edn, Springer-Verlag,
New York, U.S.A.
Yan, W.-Y. & Moore, J. B. (1991). On L2-sensitivity minimization of linear state-space systems,
preprint.
Yan, W.-Y., Helmke, U. & Moore, J. B. (1994). Global analysis of Oja’s flow for neural
networks, To appear IEEE Transactions on Neural Networks.
Yan, W.-Y., Moore, J. B. & Helmke, U. (1994). Recursive algorithms for solving a class of
nonlinear matrix equations with applications to certain sensitivity optimization problems,
to appear SIAM Journal of Control and Optimization.
Yan, W.-Y., Teo, K. L. & Moore, J. B. (n.d.). A gradient flow approach to computing LQ optimal
output feedback gains, Submitted to Journal of Optimal Control and Applications.
Zhong, G. & Marsden, J. E. (1988). Lie-Poisson Hamilton-Jacobi theory and Lie-Poisson
integrators, Physics Letters A 133(3): 134–139.