Introduction to the Calculus of Variations and Control with Modern Applications


CHAPMAN & HALL/CRC APPLIED MATHEMATICS AND NONLINEAR SCIENCE SERIES

Series Editor H. T. Banks

Published Titles

Advanced Differential Quadrature Methods, Zhi Zong and Yingyan Zhang
Computing with hp-ADAPTIVE FINITE ELEMENTS, Volume 1, One and Two Dimensional Elliptic and Maxwell Problems, Leszek Demkowicz
Computing with hp-ADAPTIVE FINITE ELEMENTS, Volume 2, Frontiers: Three Dimensional Elliptic and Maxwell Problems with Applications, Leszek Demkowicz, Jason Kurtz, David Pardo, Maciej Paszynski, Waldemar Rachowicz, and Adam Zdunek
CRC Standard Curves and Surfaces with Mathematica®: Second Edition, David H. von Seggern
Discovering Evolution Equations with Applications: Volume 1-Deterministic Equations, Mark A. McKibben
Discovering Evolution Equations with Applications: Volume 2-Stochastic Equations, Mark A. McKibben
Exact Solutions and Invariant Subspaces of Nonlinear Partial Differential Equations in Mechanics and Physics, Victor A. Galaktionov and Sergey R. Svirshchevskii
Fourier Series in Several Variables with Applications to Partial Differential Equations, Victor L. Shapiro
Geometric Sturmian Theory of Nonlinear Parabolic Equations and Applications, Victor A. Galaktionov
Green’s Functions and Linear Differential Equations: Theory, Applications, and Computation, Prem K. Kythe
Group Inverses of M-Matrices and Their Applications, Stephen J. Kirkland and Michael Neumann
Introduction to Fuzzy Systems, Guanrong Chen and Trung Tat Pham
Introduction to non-Kerr Law Optical Solitons, Anjan Biswas and Swapan Konar
Introduction to Partial Differential Equations with MATLAB®, Matthew P. Coleman
Introduction to Quantum Control and Dynamics, Domenico D’Alessandro
Introduction to the Calculus of Variations and Control with Modern Applications, John A. Burns
Mathematical Methods in Physics and Engineering with Mathematica, Ferdinand F. Cap
Mathematical Theory of Quantum Computation, Goong Chen and Zijian Diao
Mathematics of Quantum Computation and Quantum Technology, Goong Chen, Louis Kauffman, and Samuel J. Lomonaco
Mixed Boundary Value Problems, Dean G. Duffy
Modeling and Control in Vibrational and Structural Dynamics, Peng-Fei Yao
Multi-Resolution Methods for Modeling and Control of Dynamical Systems, Puneet Singla and John L. Junkins
Nonlinear Optimal Control Theory, Leonard D. Berkovitz and Negash G. Medhin
Optimal Estimation of Dynamic Systems, Second Edition, John L. Crassidis and John L. Junkins
Quantum Computing Devices: Principles, Designs, and Analysis, Goong Chen, David A. Church, Berthold-Georg Englert, Carsten Henkel, Bernd Rohwedder, Marlan O. Scully, and M. Suhail Zubairy
A Shock-Fitting Primer, Manuel D. Salas
Stochastic Partial Differential Equations, Pao-Liu Chow


CHAPMAN & HALL/CRC APPLIED MATHEMATICS AND NONLINEAR SCIENCE SERIES

Introduction to

The Calculus of Variations

and Control

with Modern Applications

John A. Burns
Virginia Tech

Blacksburg, Virginia, USA


CRC Press, Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

© 2014 by Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, an Informa business.

No claim to original U.S. Government works

Printed on acid-free paper. Version Date: 20130715.

International Standard Book Number-13: 978-1-4665-7139-6 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

and the CRC Press Web site at http://www.crcpress.com


Contents

Preface

Acknowledgments

I Calculus of Variations

1 Historical Notes on the Calculus of Variations
   1.1 Some Typical Problems
      1.1.1 Queen Dido’s Problem
      1.1.2 The Brachistochrone Problem
      1.1.3 Shape Optimization
   1.2 Some Important Dates and People

2 Introduction and Preliminaries
   2.1 Motivating Problems
      2.1.1 Problem 1: The Brachistochrone Problem
      2.1.2 Problem 2: The River Crossing Problem
      2.1.3 Problem 3: The Double Pendulum
      2.1.4 Problem 4: The Rocket Sled Problem
      2.1.5 Problem 5: Optimal Control in the Life Sciences
      2.1.6 Problem 6: Numerical Solutions of Boundary Value Problems
   2.2 Mathematical Background
      2.2.1 A Short Review and Some Notation
      2.2.2 A Review of One Dimensional Optimization
      2.2.3 Lagrange Multiplier Theorems
   2.3 Function Spaces
      2.3.1 Distances between Functions
      2.3.2 An Introduction to the First Variation
   2.4 Mathematical Formulation of Problems
      2.4.1 The Brachistochrone Problem
      2.4.2 The Minimal Surface of Revolution Problem
      2.4.3 The River Crossing Problem
      2.4.4 The Rocket Sled Problem
      2.4.5 The Finite Element Method
   2.5 Problem Set for Chapter 2

3 The Simplest Problem in the Calculus of Variations
   3.1 The Mathematical Formulation of the SPCV
   3.2 The Fundamental Lemma of the Calculus of Variations
   3.3 The First Necessary Condition for a Global Minimizer
      3.3.1 Examples
   3.4 Implications and Applications of the FLCV
      3.4.1 Weak and Generalized Derivatives
      3.4.2 Weak Solutions to Differential Equations
   3.5 Problem Set for Chapter 3

4 Necessary Conditions for Local Minima
   4.1 Weak and Strong Local Minimizers
   4.2 The Euler Necessary Condition - (I)
   4.3 The Legendre Necessary Condition - (III)
   4.4 Jacobi Necessary Condition - (IV)
      4.4.1 Proof of the Jacobi Necessary Condition
   4.5 Weierstrass Necessary Condition - (II)
      4.5.1 Proof of the Weierstrass Necessary Condition
      4.5.2 Weierstrass Necessary Condition for a Weak Local Minimum
      4.5.3 A Proof of Legendre’s Necessary Condition
   4.6 Applying the Four Necessary Conditions
   4.7 Problem Set for Chapter 4

5 Sufficient Conditions for the Simplest Problem
   5.1 A Field of Extremals
   5.2 The Hilbert Integral
   5.3 Fundamental Sufficient Results
   5.4 Problem Set for Chapter 5

6 Summary for the Simplest Problem

7 Extensions and Generalizations
   7.1 Properties of the First Variation
   7.2 The Free Endpoint Problem
      7.2.1 The Euler Necessary Condition
      7.2.2 Examples of Free Endpoint Problems
   7.3 The Simplest Point to Curve Problem
   7.4 Vector Formulations and Higher Order Problems
      7.4.1 Extensions of Some Basic Lemmas
      7.4.2 The Simplest Problem in Vector Form
      7.4.3 The Simplest Problem in Higher Order Form
   7.5 Problems with Constraints: Isoperimetric Problem
      7.5.1 Proof of the Lagrange Multiplier Theorem
   7.6 Problems with Constraints: Finite Constraints
   7.7 An Introduction to Abstract Optimization Problems
      7.7.1 The General Optimization Problem
      7.7.2 General Necessary Conditions
      7.7.3 Abstract Variations
      7.7.4 Application to the SPCV
      7.7.5 Variational Approach to Linear Quadratic Optimal Control
      7.7.6 An Abstract Sufficient Condition
   7.8 Problem Set for Chapter 7

8 Applications
   8.1 Solution of the Brachistochrone Problem
   8.2 Classical Mechanics and Hamilton’s Principle
      8.2.1 Conservation of Energy
   8.3 A Finite Element Method for the Heat Equation
   8.4 Problem Set for Chapter 8

II Optimal Control

9 Optimal Control Problems
   9.1 An Introduction to Optimal Control Problems
   9.2 The Rocket Sled Problem
   9.3 Problems in the Calculus of Variations
      9.3.1 The Simplest Problem in the Calculus of Variations
      9.3.2 Free End-Point Problem
   9.4 Time Optimal Control
      9.4.1 Time Optimal Control for the Rocket Sled Problem
      9.4.2 The Bushaw Problem
   9.5 Problem Set for Chapter 9

10 Simplest Problem in Optimal Control
   10.1 SPOC: Problem Formulation
   10.2 The Fundamental Maximum Principle
   10.3 Application of the Maximum Principle to Some Simple Problems
      10.3.1 The Bushaw Problem
      10.3.2 The Bushaw Problem: Special Case γ = 0 and κ = 1
      10.3.3 A Simple Scalar Optimal Control Problem
   10.4 Problem Set for Chapter 10

11 Extensions of the Maximum Principle
   11.1 A Fixed-Time Optimal Control Problem
      11.1.1 The Maximum Principle for Fixed t1
   11.2 Application to Problems in the Calculus of Variations
      11.2.1 The Simplest Problem in the Calculus of Variations
      11.2.2 Free End-Point Problems
      11.2.3 Point-to-Curve Problems
   11.3 Application to the Farmer’s Allocation Problem
   11.4 Application to a Forced Oscillator Control Problem
   11.5 Application to the Linear Quadratic Control Problem
      11.5.1 Examples of LQ Optimal Control Problems
      11.5.2 The Time Independent Riccati Differential Equation
   11.6 The Maximum Principle for a Problem of Bolza
   11.7 The Maximum Principle for Nonautonomous Systems
   11.8 Application to the Nonautonomous LQ Control Problem
   11.9 Problem Set for Chapter 11

12 Linear Control Systems
   12.1 Introduction to Linear Control Systems
   12.2 Linear Control Systems Arising from Nonlinear Problems
      12.2.1 Linearized Systems
      12.2.2 Sensitivity Systems
   12.3 Linear Quadratic Optimal Control
   12.4 The Riccati Differential Equation for a Problem of Bolza
   12.5 Estimation and Observers
      12.5.1 The Luenberger Observer
      12.5.2 An Optimal Observer: The Kalman Filter
   12.6 The Time Invariant Infinite Interval Problem
   12.7 The Time Invariant Min-Max Controller
   12.8 Problem Set for Chapter 12

Bibliography

Index


Preface

It is fair to say that variational calculus had its beginnings in the 17th century when many of the mathematical giants of that time were focused on solving “calculus of variations” problems. In modern terminology, these early problems in the calculus of variations may be formulated as optimization problems over infinite dimensional spaces of functions. Although this might seem to be a very specialized area, many of the mathematical ideas that were developed to analyze such optimization problems provided the foundations of many areas of modern mathematics. The roots of functional analysis, optimal control, mechanics and the modern theory of partial differential equations can all be traced back to the classical calculus of variations. In addition to its historical connections to many branches of modern mathematics, variational calculus has applications to a wide range of current problems in engineering and science. In particular, variational calculus provides the mathematical framework for developing and analyzing finite element methods. Thus, variational calculus plays a central role in modern scientific computing.

Note that the word “modern” appears five times in the previous paragraph. This is no accident. Too often the calculus of variations is thought of as an old area of classical mathematics with little or no relevance to modern mathematics and applications. This is far from true. However, during the first half of the 20th century, most mathematicians in the United States focused on the intricacies of the mathematics and ignored many of the exciting new (modern) applications of variational calculus. This point was not lost on E. J. McShane, who made many fundamental contributions to this area.



In a 1978 lecture on the history of the calculus of variations and control theory (see [133] and [134]), McShane explained why his fundamental papers on the classical Bolza problem in the calculus of variations “... burst on the mathematical world with the éclat of a butterfly’s hiccough.” McShane observed:

The problem of Bolza was the most general of the single-integral problems of the calculus of variations. Its mastery gave us the power to answer many deep and complicated questions that no one was asking. The whole subject was introverted. We who were working in it were striving to advance the theory of the calculus of variations as an end in itself, without attention to its relation with other fields of activity.

In the same lecture, McShane provided one reason why Pontryagin and his followers led the development of optimal control theory:

In my mind, the greatest difference between the Russian approach and ours was in mental attitude. Pontryagin and his students encountered some problems in engineering and in economics that urgently asked for answers. They answered the questions, and in the process they incidentally introduced new and important ideas into the calculus of variations. I think it is excusable that none of us in this room found answers in the 1930’s for questions that were not asked until the 1950’s. But I for one am regretful that when the questions arose, I did not notice them. Like most mathematicians in the United States, I was not paying attention to the problems of engineers.

The importance of applications as noted by McShane is still valid today. Optimal control is a vibrant and important offshoot of the classical calculus of variations. A modern framework for the analysis and control of partial differential equations was developed by J. L. Lions and is based on variational theory (see [123], [128], [127] and [125]).


Moreover, variational approaches are essential in dealing with problems in stochastic control [21] and differential games [20]. Perhaps one of the most important current applications of variational theory is to modern computational science. The 1972 book by Aubin [12] was in some sense ahead of its time. This book uses variational theory to develop a very general framework for constructing numerical approximations of elliptic boundary value problems. Finite element methods produce numerical algorithms that are based on variational (weak) solutions to partial differential equations and provide a powerful approach to simulating a wide variety of physical systems. It is interesting to note that recent advances in computational science have come about because many people in this research community have “paid attention to applications”. Thus, this is a case where focusing on a good application can lead to exciting new mathematics and pave the way for major breakthroughs in computational algorithms.

The main goal of this book is to provide an introduction to the calculus of variations and (finite dimensional) optimal control theory with modern applications. The book is based on lecture notes that provided the raw reading material for a course I have taught at Virginia Tech for the past thirty-five years. However, the examples and motivating applications have changed and evolved over the years. The object of the course is to introduce the main ideas in a completely rigorous fashion, but to keep the content at a level that is accessible to first year graduate students in engineering and science. For example, we focus mostly on function spaces of piecewise continuous and piecewise smooth functions and thereby avoid measure theory. This is sufficient for the variational calculus and the simplest problem in optimal control. In Part I we develop the calculus of variations and provide complete proofs of the main results. Detailed proofs are given, not for the sake of proving theorems, but because the ideas behind these proofs are fundamental to the development of modern optimization and control theory. Indeed, many of these ideas provide the foundations of all of modern applied and computational mathematics, including functional analysis, distribution theory, the theory of partial differential equations, optimization and finite element methods.


In Part II we focus on optimal control problems and show how optimal control is a natural extension of the classical calculus of variations to more complex problems. Although the proof of the Maximum Principle was a tour de force in applied mathematics, the basic Maximum Principle for the simplest problem is given without proof. A complete (rigorous) proof is not given for two reasons. First, from the author’s teaching experience, the time spent to develop and present a proof of the Maximum Principle adds very little to the understanding of the key ideas. Basically there are two approaches to proving the Maximum Principle. One approach is based on functional analysis techniques and would require that the student have a working knowledge of Lebesgue integration and measure theory. This approach is typical of the more mathematical treatments such as those found in [100] and [119]. Although there is a second approach that only uses advanced calculus and geometric ideas, the complete proof is rather lengthy. This approach can be found in several excellent references such as [18], [101] and [120] and will not be repeated here. However, the basic Maximum Principle is used to rigorously develop necessary conditions for more complex problems and for optimal control of linear systems. The author feels that going through a rigorous development for extensions of the simplest optimal control problem provides the student with the basic mathematical tools to attack new problems of this type in their particular application area.

During the past fifty years a huge number of texts have been published on the calculus of variations, optimization, design and control. Clearly this book cannot capture this entire body of work. A major objective of this book is to provide the fundamental background required to develop necessary conditions that are the starting points for theoretical and numerical approaches to variational and control problems. Although we focus on necessary conditions, we present some classical sufficient conditions and discuss the importance of distinguishing between the two. In all cases the emphasis is on understanding the basic ideas and their mathematical development so that students leave the course with mathematical tools that allow them to attack new problems of this type.


After a thorough treatment of the simplest problems in the Calculus of Variations and Optimal Control, we walk through the process of moving from the simplest problem to more complex problems to help the student see how one might begin to modify the basic optimality conditions to address more difficult problems. This is important since we stress the point:

It is impossible to solve all problems in a course, and when the student moves on to the “real working world” there will be new applications and problems to be solved. This book provides the ideas and methodologies that might be used as a starting point to address yet unsolved problems.

It is assumed that the reader has a basic background in differential equations and advanced calculus. The notes focus on the fundamental ideas that are needed to rigorously develop necessary and sufficient conditions and to present cases where these ideas have impact on other mathematical areas and applications. In Part I we provide complete proofs of the main theorems on necessary and sufficient conditions. One goal is to make sure the student has a clear understanding of the difference between necessary conditions and sufficient conditions and when to use them. Although this may seem like a trivial issue to mathematicians, the author has found that some students in other disciplines have trouble distinguishing between the two, which can lead to mistakes. Moreover, since very little “advanced mathematics” is assumed, the initial proofs are very detailed and in a few cases these details are repeated to emphasize the important ideas which have applications to a variety of problems.

In order to keep the book at a reasonable length and to keep the mathematical requirements at the advanced calculus level, we have clearly omitted many important topics. For example, multi-integral problems are not discussed and “direct methods” that require more advanced mathematics such as Sobolev space theory are also missing. The interested reader is referred to [86] for an elementary introduction to these topics; more advanced treatments may be found in [64], [65], [89] and [186].


Suggested References and Texts

The best textbooks to supplement this book are George Ewing’s book, Calculus of Variations with Applications [77], and Leitmann’s book, The Calculus of Variations and Optimal Control [120]. Also, the books by Joseph Z. Ben-Asher [18] and Lee and Markus [119] provide nice introductions to optimal control. Other books that the reader might find useful are [65], [122] and [135]. Finally, the following texts are excellent and will be cited often so that the interested reader can dive more deeply into individual topics.

B. Anderson and J. Moore [6]
A. Bensoussan, G. Da Prato, M. Delfour and S. Mitter [22] and [23]
L. D. Berkovitz [24]
G. Bliss [29]
O. Bolza [31]
A. Bryson and Y. Ho [46]
C. Caratheodory [54]
F. Clarke [57]
R. Courant [59]
R. Curtain and H. Zwart [63]
A. Forsyth [80]
I. Gelfand and S. Fomin [87]
L. Graves [95]
H. Hermes and J. LaSalle [100]
M. Hestenes [101] and [102]
A. Isidori [105]
T. Kailath [110]
H. Knobloch, A. Isidori and D. Flockerzi [114]
H. Kwakernaak and R. Sivan [115]
B. Lee and L. Markus [119]
J. L. Lions [123] and [125]
D. Luenberger [131]
L. Neustadt [144]
D. Russell [157]
H. Sagan [158]
D. Smith [166] and [167]
J. Troutman [180]
F. Wan [183]
L. Young [186]

Disclaimer: This book is based on “raw” lecture notes developed over the years, and the notes were often updated to include new applications or eliminate old ones. Although this process helped the author find typos and errors, it also meant the introduction of new typos and errors. Please feel free to send the author a list of typos, corrections and any suggestions that might improve the book for future classes.


Acknowledgments

I would like to acknowledge all my students who for the past thirty-five years have provided feedback on the raw notes that were the basis of this book. Also, I thank my longtime colleagues Gene Cliff and Terry Herdman, who have provided valuable input, guidance and inspiration over the past three decades. This constant feedback helped me to update the subject matter as the years progressed and to find both typos and technical errors in the material. However, as noted above, the constant updating of the material means that even this version is sure to have some typos. This is not the fault of my students and colleagues, so I wish to apologize in advance for not catching all these errors. I especially wish to thank my student Mr. Boris Kramer for a careful reading of the current version. I also thank the reviewers for their comments and valuable suggestions for improving the manuscript, the acquisitions editor Bob Stern at Taylor & Francis for his help and support, and the series editor H. T. Banks for encouraging this project. Most importantly, I wish to thank my family for their support and understanding, particularly my wife Gail. They graciously gave up many nights of family time while this manuscript was being written.



Part I

Calculus of Variations


Chapter 1

Historical Notes on the Calculus of Variations

It is widely quoted that the calculus of variations (as a mathematical subject) had its beginnings in 1686 when Newton proposed and solved a problem of finding a body of revolution that produces minimum drag when placed in a flow. The problem and its solution were given in his 1687 Philosophiae Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy).

A second milestone occurred in 1696 when Johann (John) Bernoulli proposed the brachistochrone problem as a mathematical challenge problem. In 1697 his brother Jacob (James) Bernoulli published his solution and proposed a more general isoperimetric problem. In addition to the Bernoulli brothers, Newton, Leibniz and L’Hopital also gave correct solutions to the brachistochrone problem. This is clearly a precise time in history where a “new” field of mathematics was born.

Between 1696 and 1900 a large core of mathematical giants worked in this area and the book by Herman H. Goldstine [91] provides a detailed treatment of this body of work. In particular, John and James Bernoulli, Leonhard Euler, Isaac Newton, Joseph-Louis Lagrange, Gottfried Wilhelm von Leibniz, Adrien Marie Legendre, Carl G. J. Jacobi and Karl Wilhelm Theodor Weierstrass were among the main contributors in this field. Other important contributions during this period were made by Paul du Bois-Reymond, Johann Peter Gustav Lejeune Dirichlet, William Rowan Hamilton and Pierre-Louis Moreau de Maupertuis.


At the end of the 19th century and into the first half of the 20th century, the single integral problem in the calculus of variations was expanded and refined by David Hilbert (Germany), Leonida Tonelli (Italy), Oskar Bolza (Germany and U.S.A.) and the “Chicago school” including G. A. Bliss, L. M. Graves, M. R. Hestenes, E. J. McShane, and W. T. Reid. Around 1950 the basic problem of minimizing an integral subject to differential equation constraints became a major problem of interest because of various military applications in the USA and the USSR. These problems required the treatment of “hard constraints” which were basically ignored in the classical calculus of variations and led to the theory of optimal control.

The history of the development of optimal control is less precise and the subject of varying opinions. The paper “300 Years of Optimal Control: From the Brachystochrone to the Maximum Principle” by Sussmann and Willems [172] clearly states that optimal control was born in 1697. Although everyone agrees that optimal control is an extension of the classical calculus of variations, others ([149], [47]) suggest that optimal control theory had its beginning around 1950 with the “discovery” of the Maximum Principle by various groups.

The road from the “classical” calculus of variations to “modern” optimal control theory is certainly not linear and it can honestly be argued that optimal control theory had its beginnings with the solution to the brachistochrone problem in 1697 as suggested in [172]. However, two important steps in moving from classical variational approaches to modern control theory occurred between 1924 and 1933. In L. M. Graves’ 1924 dissertation [93] he treated the derivative as an independent function and hence distinguished between state and control variables. In 1926 C. Caratheodory gave the first formulation of the classical Weierstrass necessary condition in terms of a Hamiltonian [53] which, as noted in [172], is the “first fork in the road” towards modern control theory. Finally, in 1933 L. M. Graves [94] gave a control formulation of the classical Weierstrass condition for a Bolza type problem. These ideas are key to understanding the power of modern optimal control methods.


The papers [149], [154] and [172] provide a nice historical summary of these results and their impact on modern optimal control.

Clearly, everyone agrees to some level that the classical calculus of variations is a starting point for modern optimal control theory. However, what is often lost in this historical discussion is that the calculus of variations also laid the foundations for the creation of other “new” fields in both applied and pure mathematics. Modern functional analysis, the theory of distributions, Hamiltonian mechanics, infinite (and finite) dimensional optimization and the modern theory of partial differential equations all trace their roots to the classical calculus of variations. Perhaps even more relevant today is the role that variational theory plays in modern scientific computing.

A key theme in this book is that all these problems fall within the purview of optimization. Although the theoretical issues first appear to be no more difficult than those that occur in finite dimensional optimization problems, there are major differences between infinite and finite dimensional optimization. Moreover, the computational challenges are different and a direct reduction to a finite dimensional problem through approximation is not always the best approach. For example, information about the form and properties of a solution to an infinite dimensional optimization problem can be lost if one introduces approximation too early in the problem solution. The paper [113] by C. T. Kelley and E. W. Sachs provides an excellent example of how the theory of infinite dimensional optimization can yield improved numerical algorithms that could not be obtained from finite dimensional theory alone.

We specifically mention that Lucien W. Neustadt’s book on optimization [144] contains the most complete presentation of general necessary conditions to date. This book provides necessary conditions for optimization problems in topological vector spaces. Although a rather deep and broad mathematical background is required to follow the development in this book, anyone seriously considering research in infinite dimensional optimization, control theory and their applications should be aware of this body of work.


Most early problems in the calculus of variations were motivated by applications in the physical sciences. During the past century, the calculus of variations was key to the development of game theory, existence theory for partial differential equations and convergence theory for finite element methods for the numerical approximation of partial differential equations. Modern applications in optimal control, especially of systems governed by partial differential equations (PDEs), and in computational science have sparked a renewed interest in variational theory and infinite dimensional optimization. The finite element method is based on minimizing certain functionals over spaces of functions, and PDE optimization and control problems have applications ranging from fluid flow control to large space structures to the design and control of energy efficient buildings. All these applications lead to problems that require infinite dimensional optimization. In this book we focus on necessary conditions for the classical calculus of variations and then provide a short introduction to modern optimal control and the Maximum Principle.

1.1 Some Typical Problems

The term “calculus of variations” originally referred to problems involving the minimization of a functional defined by an integral

$$J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot{x}(s))\, ds \tag{1.1}$$

over a suitable function space. Most of the problems we consider will have an integral cost function of the form (1.1). However, because of significant developments during the past century, we will expand this classical definition of the calculus of variations to include problems that now fall under the topic of optimal control theory. To set the stage, we begin with some standard problems.
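For instance, taking $f(s, x, \dot{x}) = \sqrt{1 + \dot{x}^2}$ in (1.1) gives the arc length functional

$$J(x(\cdot)) = \int_{t_0}^{t_1} \sqrt{1 + \dot{x}(s)^2}\, ds,$$

which measures the length of the graph of $x(\cdot)$; among all curves joining two fixed endpoints, it is minimized by the straight line segment.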

1.1.1 Queen Dido’s Problem

As noted above, the calculus of variations as a mathematical subject has its beginnings in 1696, when John Bernoulli suggested the brachistochrone problem.


Although folklore presents Queen Dido’s problem as one of the first problems in the calculus of variations, real scientific and mathematical investigations into such problems probably started with Galileo and Newton. In any case, Queen Dido’s problem provides a nice illustrative example to start the discussion.

In Roman mythology, Dido was the Queen of Carthage (modern-day Tunisia). She was the daughter of a king of Tyre. After her brother Pygmalion murdered her husband, she fled to Libya, where she founded and ruled Carthage. The legend has it that she was told that she could rule all the land around the coast that she could “cover with the hide of a cow”. Being very clever, she cut the hide into one continuous thin string and used the string to outline the area she would rule. Thus, Queen Dido’s problem is to find the maximum area that can be encompassed with a string of fixed length.
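In modern terms this is the classical isoperimetric problem: among all closed curves of a fixed length $L$, find one that maximizes the enclosed area $A$. The solution is the circle, for which

$$A = \frac{L^2}{4\pi},$$

and the isoperimetric inequality $4\pi A \leq L^2$ shows that every other closed curve of the same length encloses strictly less area.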

1.1.2 The Brachistochrone Problem

Galileo studied the brachistochrone curve, that is to say, the curve between two points that is covered in the least time by a body rolling down the curve under the action of constant gravity. In 1602 Galileo provided a geometrical demonstration, arguing (incorrectly, as it turned out) that the arc of a circle is the brachistochrone path. The apparatus is a wooden frame, mounted on two feet with adjusting screws and carrying a cycloidal groove. Parallel to this is a straight groove, the inclination of which may be varied by inserting wooden pins in the brass-reinforced openings made just underneath the cycloid. A lever with two small clamps allows the release of two balls along the two channels simultaneously. The ball that slides down the cycloidal groove arrives at the point of intersection of the two channels in less time than the ball rolling along an inclined plane. This device can still be seen in Museo Galileo at the Institute and Museum of the History of Science in Florence, Italy. The mathematical formulation of the problem of finding the curve of least descent time, the brachistochrone, was proposed by John Bernoulli in 1696.


In 1697 the problem was solved by John, his brother James, Newton and others. The solution curve was shown to be a cycloid, and James’s solution contained the basic ideas leading to the theory of the calculus of variations.

1.1.3 Shape Optimization

As noted above, in 1686 Newton proposed the problem of finding a body of revolution (nose cone) that produces minimum drag when placed in a flow. In modern terms, the problem is to find a shape of the nose cone to minimize the drag with the constraint that this shape is defined by a surface of revolution. From a historical point of view this is one of the first mathematical formulations of a “shape optimization” problem. It is also interesting to note that because of Newton’s choice of model for the aerodynamic forces, his solution is not accurate for subsonic flows (see [136], [137] and page 52 in [46]). On the other hand, Newton’s assumption is fairly accurate for a hypersonic flow, which is important today. In any case, this problem was an important milestone in the calculus of variations.
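To indicate the flavor of Newton’s formulation: under his impact model of air resistance, one common modern statement (the precise normalization varies from source to source) is that the profile $y(r)$ of the body of revolution, $0 \leq r \leq R$, is chosen to minimize a resistance integral proportional to

$$J(y(\cdot)) = \int_0^R \frac{r}{1 + y'(r)^2}\, dr,$$

where $R$ is the radius of the base of the nose cone.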

Today Newton might have considered the problem of finding a shape of a body attached to the front of a jet engine in order to produce a flow that matches as well as possible a given flow into the jet (a forebody simulator). The idea is to use a smaller forebody so that it can be placed in a wind tunnel and the engine tested. This problem had its beginnings in 1995 and is based on a joint research effort between the Air Force’s Arnold Engineering Design Center (AEDC) and The Interdisciplinary Center for Applied Mathematics (ICAM) at Virginia Tech. The goal of the initial project was to help develop a practical computational algorithm for designing test facilities needed in the free-jet test program. At the start of the project, the main bottleneck was the time required to compute cost function gradients used in an optimization loop. Researchers at ICAM attacked this problem by using the appropriate variational equations to guide the development of efficient computational algorithms. This initial idea has since been refined and has now evolved into a practical methodology known as the Continuous Sensitivity Equation Method (CSEM) for optimal design.

Figure 1.1: The AEDC Test Section

The wind tunnel is large enough to hold the engine and a smaller “forebody simulator” (FBS), but not large enough to hold the entire front of the airplane. The objective is to determine the shape of the forebody simulator and the inflow conditions (Mach number, angle, etc.) so that the flow going into the engine inlet matches (as well as possible) a given flow. This given flow is the flow that would be present if the jet were in free flight. This data can be generated by flight test or full 3D simulation.

Consider a 2D version of the problem. The green sheet represents a cut through the engine reference plane and leads to the following 2D problem. The goal is to find a shape Γ (that is constrained to be 1/2 the length of the long forebody) and an inflow Mach number $M_0$ to match the flow entering the engine inlet generated by the long forebody. This shorter curve is called the “forebody simulator”. This problem is much more complex than Newton’s minimum drag problem because it is not assumed that the forebody is a surface of revolution. Although both problems are concerned with finding optimal shapes (or curves), the forebody simulator problem is still a challenge for mathematics and numerical computations.


Figure 1.2: The 2D Forebody Simulator Problem

Given data generated over the long forebody, the goal is to find a shorter (maybe “fatter”) forebody simulator to optimally match the “real data” at the engine inlet.

Modern “optimal design” is one area that has its roots in the calculus of variations. However, the computer has changed both the range of problems that are possible to solve and the type of methods used to solve these problems. For more on the analysis, design and optimization of modern engineering systems see [8], [9], [13], [19], [38], [34], [32], [33], [35], [40], [41], [36], [37], [43], [42], [67], [68], [82], [83], [90], [98], [99], [108], [109], [111], [142], [150], [163], [168], [174], [177] and [178].


1.2 Some Important Dates and People

The following dates provide some insight into the history and the people who contributed to the development of the calculus of variations and its modern realization in Optimal Control.

1600-1900

• 1630 - Galileo Galilei (1564-1642) formulated the brachistochrone problem.

• 1686 - Isaac Newton (1642-1727) proposed and gave a solution to the problem of the surface of revolution of minimum drag when the body is moved through a fluid.

• 1696 - John Bernoulli (1667-1748) proposed the brachistochrone problem as a challenge to all mathematicians, but to his brother James Bernoulli (1654-1705) in particular. The problem was also solved by Newton, Leibniz (1646-1716), L’Hopital (1661-1704), as well as both Bernoulli brothers.

• 1697 - James Bernoulli (1654-1705) published his solution and proposed a more general isoperimetric problem.

• 1724 - Jacopo Francesco Riccati (1676-1754) wrote his famous paper on the Riccati Equation. James Bernoulli had also worked on this equation.

• 1744 - Leonhard Euler (1707-1783) extended James Bernoulli’s methods (geometric, analytical) to general problems, and discovered the “Euler Equation”. He also derived Newton’s second law from “the principle of least action”.

• 1760 - Joseph Louis Lagrange (1736-1813) first used the term “calculus of variations” to describe the methods he used in his work.

• 1762 and 1770 - Lagrange devised an analytic method for general constrained problems. He also indicated that mechanics and variational theory are connected, and he introduced the notation $y(x) + \delta y(x)$. This symbol, $\delta y(\cdot)$, was called the variation of $y(\cdot)$.

• 1786 - Lagrange published his necessary condition for a minimum of Newton’s problem.

• 1786 - Adrien-Marie Legendre (1752-1833) studied the second variation $\delta^2 J$.

• 1788 - Lagrange showed that a large part of Newtonian dynamics could be derived from the principle of least action for “conservative” systems.

• 1788 - Lagrange showed that a curve could satisfy the Euler-Lagrange equation for Newton’s problem and not minimize the functional. He used a proof essentially equivalent to Weierstrass’s condition.

• 1835 - Sir William Rowan Hamilton (1805-1865) expanded the principle of least action to “Hamilton’s Principle” of stationary action.

• 1837 - Karl Gustav Jacob Jacobi (1804-1851) used some of Legendre’s ideas to construct Jacobi’s (second order) necessary condition.

• 1842 - Jacobi gave an example to show that the principle of least action does not hold in general.

• 1879 - Karl Theodor Wilhelm Weierstrass (1815-1897) gave his necessary condition for a strong local minimum.

• 1879 - Paul David Gustav Du Bois-Reymond (1831-1889) gave a proof of the fundamental lemma. In addition to providing a correct proof of the FLCV, in 1873 he gave an example of a continuous function with divergent Fourier series at every point. The term “integral equation” is also due to Du Bois-Reymond.

• 1898 - Adolf Kneser (1862-1930) defined the focal point.


1900-1965

• 1900 - David Hilbert (1862-1943) gave a derivation of the “Hilbert Invariant Integral”.

• 1904 - Hilbert gave his famous existence proof for the simplest problem.

• 1913 - Oskar Bolza (1857-1942) stated the problem of Bolza. The problem of Bolza is the forerunner of the modern control problem.

• 1933 - Lawrence M. Graves (1896-1973) transformed the problem of Lagrange into a control theory formulation, and proved a “maximum principle” for normal problems. Between 1958 and 1962, V. G. Boltyanskii, R. Gamkrelidze, E. F. Mischenko and L. S. Pontryagin established the Pontryagin Maximum Principle for more general optimal control problems.

• 1937 - Lawrence C. Young (1905-2000) introduced generalized curves and relaxed controls.

• 1940 - Edward James McShane (1904-1989) established the existence of a relaxed control and proved that a generalized curve was a real curve under certain convexity conditions.

• 1950 - Magnus R. Hestenes (1906-1991) formulated the first optimal control problems, and gave the maximum principle first published in a RAND report.

• 1952 - Donald Wayne Bushaw (1926-2012) gave a mathematical solution to a simple time optimal control problem by assuming the bang-bang principle.

• 1959 - Joseph P. LaSalle (1916-1983) gave the first proof of (LaSalle’s) Bang-Bang Principle. He also extended the classical Lyapunov theory to the LaSalle Invariance Principle.


• 1959 - Richard E. Bellman (1920-1984) developed the dynamic programming principle of optimality for control problems.

• 1962 - Lev Semenovich Pontryagin (1908-1988) derived the Pontryagin Maximum Principle along with V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko.

Other Important Players

• Gilbert Ames Bliss (1876-1951) - Bliss’s main work was on the calculus of variations and he produced a major book, Lectures on the Calculus of Variations, on the topic in 1946. As a consequence of Bliss’s results a substantial simplification of the transformation theories of Clebsch and Weierstrass was achieved. His interest in the calculus of variations came through two sources: firstly, from lecture notes of Weierstrass’s 1879 course of which he had a copy, and secondly, from the inspiring lectures by Bolza which Bliss attended. Bliss received his doctorate in 1900 for a dissertation The Geodesic Lines on the Anchor Ring which was supervised by Bolza. He was then appointed as an instructor at the University of Minnesota in 1900. He left Minnesota in 1902 to spend a year in Göttingen where he interacted with Klein, Hilbert, Minkowski, Zermelo, Schmidt, Max Abraham, and Caratheodory.

• Constantin Caratheodory (1873-1950) - Caratheodory made significant contributions to the calculus of variations, the theory of point set measure, and the theory of functions of a real variable. He added important results to the relationship between first order partial differential equations and the calculus of variations.

• Jean Gaston Darboux (1842-1917) - Darboux studied the problem of finding the shortest path between two points on a surface and defined a Darboux point.


• Ernst Friedrich Ferdinand Zermelo (1871-1953) - His doctorate was completed in 1894 when the University of Berlin awarded him the degree for a dissertation Untersuchungen zur Variationsrechnung which followed the Weierstrass approach to the calculus of variations. In this thesis he extended Weierstrass’ method for the extrema of integrals over a class of curves to the case of integrands depending on derivatives of arbitrarily high order. He also introduced the notion of a “neighborhood” in the space of curves.

• George M. Ewing (1907-1990) - Ewing’s 1959 book, Calculus of Variations with Applications, remains one of the best introductions to the classical theory.

• Christian Gustav Adolph Mayer (1839-1907) - Mayer focused on the principle of least action and is credited with formulating variational problems in “Mayer form” where the cost functional is given in terms of end conditions.

• Harold Calvin Marston Morse (1892-1977) - Morse developed “variational theory in the large” and applied this theory to problems in mathematical physics. He built his “Morse Theory” on the classical results in the calculus of variations developed by Hilbert.

• William T. Reid (1907-1977) - Reid’s work in the calculus of variations combined Sturm-Liouville theory with variational theory to study second order necessary and sufficient conditions. Reid established a generalization of Gronwall’s Inequality which is known as the Gronwall-Reid-Bellman Inequality. The Reid Prize, awarded by SIAM for contributions to Differential Equations and Control Theory, is named after W. T. Reid.

• Frederick A. Valentine (1911-2002) - Valentine attended the University of Chicago where he received his Ph.D. in mathematics in 1937. His dissertation was entitled “The Problem of Lagrange with Differential Inequalities as Added Side Conditions” and was written under the direction of Bliss. Most of his work was in the area of convexity and his book Convex Sets [182] is a classic.

• Vladimir Grigorevich Boltyanskii (1925-) - Boltyanskii was one of the four authors of the book The Mathematical Theory of Optimal Processes and was awarded the Lenin Prize for the work presented in that book on optimal control.

• Revaz Valer’yanovich Gamkrelidze (1927-) - Gamkrelidze was one of the four authors of the book The Mathematical Theory of Optimal Processes and was awarded the Lenin Prize for the work presented in that book on optimal control.

• Evgenii Frolovich Mishchenko (1922-2010) - Mishchenko was one of the four authors of the book The Mathematical Theory of Optimal Processes and was awarded the Lenin Prize for the work presented in that book on optimal control.

• Lev Semenovich Pontryagin (1908-1988) - Pontryagin was 14 when an accident left him blind. The article [4] describes how Pontryagin’s mother, Tat’yana Andreevna, took complete responsibility for seeing that her son was educated and successful. As noted in [4]: For many years she worked, in effect, as Pontryagin’s secretary, reading scientific works aloud to him, writing in the formulas in his manuscripts, correcting his work and so on. In order to do this she had, in particular, to learn to read foreign languages. Pontryagin’s early work was on problems in topology and algebra. In the early 1950’s Pontryagin began to study applied mathematics, differential equations and control theory. In 1961 he published The Mathematical Theory of Optimal Processes with his students V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko.


Chapter 2

Introduction and Preliminaries

In this chapter we discuss several problems that will serve as models and examples for the theory to follow. We first provide rough overviews of some classical problems to give a preliminary indication of the type of mathematical concepts needed to formulate these problems as mathematical optimization problems. We introduce some notation, discuss various classes of functions and briefly review some topics from calculus, advanced calculus, and differential equations. Finally, we close this chapter with precise mathematical statements of some of these model problems.

2.1 Motivating Problems

In order to motivate the topic and to set the stage for future applications, we provide a brief description of some problems that are important from both historical and scientific points of view.

2.1.1 Problem 1: The Brachistochrone Problem

This problem is perhaps the first problem presented in every endeavor of this type. It is called the brachistochrone problem and it was first considered by Galileo Galilei (1564-1642).


y(x)

a

b

x

,

,

Figure 2.1: The Brachistochrone Problem

was first considered by Galileo Galilei (1564-1642). The mathemat-ical formulation of the problem was first given by John Bernoulliin 1696, and is often quoted as the beginning of the classical cal-culus of variations. Suppose that P1 = [0 0]T and P2 = [a b]T aretwo points in the plane with P1 “higher” than P2 (see Figure 2.1).

Suppose that we slide a (frictionless) bead down a wire that connects the two points. We are interested in finding the shape of the wire down which the bead slides from P1 to P2 in minimum time. At first glance one might think that the wire should be the straight line between P1 and P2, but when we solve this problem it will be seen that this is not the case. The mathematical formulation of this problem will be derived below. But first, we describe other "typical problems" in the calculus of variations.

2.1.2 Problem 2: The River Crossing Problem

Another "minimum time" problem is the so-called river crossing problem. We assume that the river has a fixed width of one mile, has parallel banks, and we let the y-axis be the left bank (see Figure 2.2). The current is directed downstream along the y-axis, and the velocity v(·) of the current depends only on x, i.e. v = v(x) is the velocity of the current at a distance of x feet from the left bank. Given v(x) and the assumption that the boat travels at a constant velocity (relative to the surrounding water), we wish to find the "steering angle" of the boat that will move the boat from the point [0 0]^T to the right bank in minimum time. Note that we are not concerned with where along the right bank the boat lands. The goal is to find the path of the boat that starts at a prescribed point and reaches the opposite bank in minimum time. This problem is similar to the brachistochrone problem, except that the landing site is not prescribed, so that the downstream location is free. Thus, this is an example of a "free endpoint problem". Intuition would seem to imply that the shape of the minimum time crossing path is not dependent on the starting point. In particular, if the boat were to start at [0 y0]^T with y0 > 0 rather than [0 0]^T, then the shape (graph) of the minimum time crossing path would look the same, only shifted downstream by an additional distance of y0.

Figure 2.2: The River Crossing Problem


2.1.3 Problem 3: The Double Pendulum

Consider an idealized double pendulum with masses m1 and m2 attached to weightless rods of lengths ℓ1 and ℓ2 as shown below. Suppose that we observe the pendulum at two times t0 and t1 and note the angles α1(t0), α2(t0), α1(t1), and α2(t1) and/or the angular velocities α̇i(tj), i = 1, 2, j = 0, 1. The problem is to write down the governing equations for the pendulum. In particular, we wish to find a system of differential equations in α1(t) and α2(t) that describe the motion of the double pendulum. If you recall Newton's laws of motion and a little mechanics, then the problem is not very difficult. However, we shall see that the problem may be solved by using what is known as Hamilton's Principle in dynamics.

Although Problem 3 does not appear to be related to the first two "optimization problems", it happens that in order to apply Hamilton's Principle one must first be able to solve problems similar to each of these.

Figure 2.3: The Double Pendulum Problem


2.1.4 Problem 4: The Rocket Sled Problem

The orbit transfer problem cannot be solved by the classical techniques of the calculus of variations. The more modern theory of optimal control is required to address problems of this type. Another very simple example that falls under the category of optimal control is the so-called rocket car problem. This example will illustrate some of the elementary mathematics that are needed in order to solve the problems we have presented earlier. Consider a rocket sled illustrated in Figure 2.4 below. We assume the sled is on a frictionless track, that the sled is of mass m, and that it is controlled by two rocket engines thrusting in opposite directions.

It is assumed that the mass m is so large that the weight of the fuel is negligible compared to m. Also, the thrusting force of the rockets is bounded by a maximum of 1. Let x(t) denote the position of the sled with respect to the reference R at time t. Given that at time t = 0 the sled is at an initial position x0 with initial velocity v0, we wish to find a thrust force action that will move the sled to the position R and stop it there, and we want to accomplish this transfer in minimum time.

We derive a simple mathematical formulation of this problem. Let x(t) be as in the figure and let u(t) denote the thrusting force due to the rockets at time t. Newton's second law may be written as

m ẍ(t) = u(t),

Figure 2.4: The Rocket Sled Control Problem


or equivalently,

ẍ(t) = (1/m) u(t).   (2.1)

The initial conditions are given by

x(0) = x0,   ẋ(0) = v0.   (2.2)

The fact that the thrust force is bounded is written as the constraint

|u(t)| ≤ 1.   (2.3)

The problem we wish to solve may be restated as the following optimal control problem.

Find a control function u*(t) satisfying the constraint (2.3) such that if x*(t) satisfies (2.1)-(2.2), then x*(t*) = 0 = ẋ*(t*) for some time t*. Moreover, if u(t) is any other control function satisfying (2.3) with x(t) satisfying (2.1)-(2.2) and x(t̂) = 0 = ẋ(t̂) for some t̂, then t* ≤ t̂.

Although this problem seems to be the easiest to understand and to formulate, it turns out to be more difficult to solve than the first three problems. In fact, this problem falls outside of the classical calculus of variations and to solve it, one must use the modern theory of optimal control. The fundamental new ingredient is that the control function u(t) satisfies the "hard constraint" |u(t)| ≤ 1. In particular, u(t) can take values on the boundary of the interval [−1, +1].
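As a numerical preview, the following minimal Python sketch (not from the original text) simulates (2.1)-(2.3) with m = 1 under the classical switching-curve feedback u = −sign(x + ẋ|ẋ|/2), which is the bang-bang form the time-optimal control turns out to take for this double integrator; the step size, tolerance and initial state are arbitrary choices for illustration.

```python
# Sketch: rocket sled with m = 1; state (x, v) with dynamics x' = v, v' = u.
x, v, dt, t = 2.0, 1.0, 1e-3, 0.0
for _ in range(200_000):
    if abs(x) + abs(v) < 1e-2:        # close enough to rest at the station
        break
    s = x + v * abs(v) / 2.0          # switching function for the curve x = -v|v|/2
    u = -1.0 if s > 0 else 1.0        # bang-bang control, always on the bound |u| = 1
    x, v = x + dt * v, v + dt * u     # forward Euler step
    t += dt
print("reached (0, 0) to tolerance at t ≈", round(t, 2))
```

Note that the control never takes an interior value: it chatters between the endpoints of [−1, +1], which is exactly the behavior classical variational methods cannot capture.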

2.1.5 Problem 5: Optimal Control in the Life Sciences

Although many motivating problems in the classical calculus of variations and modern optimal control have their roots in the physical sciences and engineering, new applications to the life sciences are a very active area of current research. Applications to cancer treatment and infectious diseases present new challenges and opportunities for control (see [1], [14], [26], [62], [74], [106], [117], [118], [141], [140], [145], [146], [148], [165], [173]).

Figure 2.5: Cancer Control Problem

The following optimal chemotherapy treatment problem may be found in the paper by Fister and Panetta [148]. The cancer cells are divided into two types. The proliferating cells are in a drug-sensitive phase and the quiescent cells are in a drug-resistant phase (see Figure 2.5). The growth dynamics are given by the system

ẋ1(t) = (γ − δ − α − e u(t)) x1(t) + β x2(t)

ẋ2(t) = α x1(t) − (λ + β) x2(t)

with initial data

x1(0) = x1,0 and x2(0) = x2,0.

Here, x1(·) is the cell mass of proliferating cancer cells and x2(·) is the cell mass of quiescent cells in the bone marrow. The parameters are all constant. Here γ is the cycling cells' growth rate, α is the transition rate from proliferating to resting, δ is the natural cell death rate, β is the transition rate from resting to proliferating, λ is cell differentiation (where mature bone marrow cells leave the bone marrow and enter the blood stream as various types of blood cells) and e is the effectiveness of the treatment. The function u(·) is the control describing the effects of the chemotherapeutic treatment, which has impact only on the proliferating cells.

The control function is assumed to be PWC(0, T) and satisfies 0 ≤ u(t) ≤ 1. The cost function is defined by

J(u(·)) = −∫_0^T [ (b/2)(1 − u(t))² − q(x1(t) + x2(t)) ] dt,

where b and q are weighting parameters and the goal is to maximize J(·) on the set

Θ = { u(·) ∈ PWC(0, T) : u(t) ∈ [0, 1] }.

This cost function is selected so that one can give as much drug as possible while not excessively destroying the bone marrow. The weighting parameters b and q are selected depending on the importance of the terms. Notice the negative sign in front of the integral, so that maximizing J(·) is equivalent to minimizing

J̃(u(·)) = ∫_0^T [ (b/2)(1 − u(t))² − q(x1(t) + x2(t)) ] dt

on Θ.

We note that similar problems occur in the control of HIV (see [1] and [14]) and the models vary from application to application. In addition, the models of cell growth have become more complex and more realistic, leading to more complex control problems. As indicated by the current references [14], [26], [62], [74], [117], [118], [146] and [165], this is a very active area of research.
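As a rough illustration of how the growth dynamics respond to a fixed admissible control, the Python sketch below integrates the two-state system with forward Euler. The parameter values are hypothetical placeholders chosen only so the code runs; they are not the calibrated values of [148].

```python
import numpy as np

# Hypothetical parameter values (placeholders, not from [148]).
gamma, delta, alpha, beta, lam, e = 0.84, 0.01, 0.47, 0.12, 0.02, 0.9

def rhs(x, u):
    """Right-hand side of the growth dynamics for x = (x1, x2)."""
    x1, x2 = x
    dx1 = (gamma - delta - alpha - e * u) * x1 + beta * x2
    dx2 = alpha * x1 - (lam + beta) * x2
    return np.array([dx1, dx2])

# Forward Euler under the constant admissible control u(t) = 0.5 on [0, T].
x, dt, T = np.array([0.8, 0.2]), 0.01, 1.0
for _ in range(int(T / dt)):
    x = x + dt * rhs(x, 0.5)
print("x1(T), x2(T) =", x)
```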

2.1.6 Problem 6: Numerical Solutions of Boundary Value Problems

The finite element method is a powerful computational method for solving various differential equations. Variational theory and the calculus of variations provide the mathematical framework required to develop the method and to provide a rigorous numerical analysis of the convergence of the method. In order to describe the method and to illustrate the key ideas, we start with a simple two-point boundary value problem.

Given an integrable function f(·), find a twice differentiable function x(·) such that x(·) satisfies the differential equation

− ẍ(t) + x(t) = f(t),   0 < t < 1,   (2.4)

subject to the Dirichlet boundary conditions

x(0) = 0,   x(1) = 0.   (2.5)

Observe that it is assumed that x(·) has two derivatives since ẍ(·) appears in the equation. The goal here is to find a numerical solution of this two-point boundary value problem by approximating x(·) with a continuous piecewise linear function x^N(·) as shown in Figure 2.6 below. The idea is to divide the interval [0, 1] into N + 1 subintervals (called elements) with nodes 0 = t0 < t1 < t2 < . . . < tN−1 < tN < tN+1 = 1 and construct the approximation x^N(·) so that it is continuous on all of [0, 1] and linear between the nodes. It is clear that such a continuous piecewise linear approximating function x^N(·) may not be differentiable at the nodes and hence cannot satisfy the differential equation (2.4). The finite element method is based on developing a weak (or variational) form of the two-point boundary value problem and using this formulation to devise a computational method for constructing the approximating solution x^N(·). We will work through this process in Section 2.4.5 below.

Figure 2.6: A Piecewise Linear Approximation
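As a preview of that discussion, here is a minimal Python sketch of the standard piecewise linear (hat function) approximation of (2.4)-(2.5) on a uniform mesh. The right-hand side is a manufactured assumption chosen so that the exact solution is sin(πt), which lets the code check itself; the derivation of the underlying linear system is exactly the weak form developed later.

```python
import numpy as np

N = 31                               # number of interior nodes
h = 1.0 / (N + 1)                    # uniform mesh width
t = np.linspace(0.0, 1.0, N + 2)     # nodes t_0 = 0, ..., t_{N+1} = 1

# Manufactured f so that -x'' + x = f has exact solution x(t) = sin(pi t).
f = lambda s: (np.pi**2 + 1.0) * np.sin(np.pi * s)

# For hat functions on a uniform mesh:
#   stiffness: int(phi_i' phi_j') = 2/h on the diagonal, -1/h off-diagonal
#   mass:      int(phi_i  phi_j ) = 2h/3 on the diagonal,  h/6 off-diagonal
main = 2.0 / h + 2.0 * h / 3.0
off = -1.0 / h + h / 6.0
A = (np.diag(main * np.ones(N))
     + np.diag(off * np.ones(N - 1), 1)
     + np.diag(off * np.ones(N - 1), -1))

b = h * f(t[1:-1])                   # load vector by nodal (trapezoid) quadrature

xN = np.zeros(N + 2)                 # enforce x(0) = x(1) = 0
xN[1:-1] = np.linalg.solve(A, b)
print("max nodal error:", np.abs(xN - np.sin(np.pi * t)).max())
```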

2.2 Mathematical Background

In order to provide a mathematical formulation of the above problems, we must first introduce some notation and discuss various classes of functions. At this point it is assumed that the reader is familiar with the basics of advanced calculus and elementary differential equations. However, we briefly review some topics from calculus, advanced calculus, and differential equations. Since this is a review of background material, we will not go into much detail on any topic.

2.2.1 A Short Review and Some Notation

In mathematical writing, employing consistent and careful notation helps improve the understanding of the theory and reduces common errors in the application of the theory to problems. However, as is typical in mathematical texts, it is sometimes useful to abuse notation as long as the precise meaning is made clear in the presentation. Every attempt is made to use precise notation and, when notation is abused, we point out the abuse to keep the exposition as clear as possible. These observations are especially important for the notation used to describe functions, since understanding functions is key to the development of the material in this book. In particular, it is essential to distinguish between a function, its name and the value of the function at a point in its domain.


Remark 2.1 Let X and Y be two sets and assume that F(·) is a function with domain in X and range in Y. Observe that we use the symbol F(·) to denote the function rather than using the "name" of the function F. For example, if x is a real number and one writes F(x) = x², then the corresponding function F(·) has domain and range contained in the set of real numbers, and for each x in the domain of F(·), the value of F(·) at x is given by F(x) = x². Thus, we distinguish between the function F(·), its name F and its value at a specific point F(x). At first glance these distinctions may seem like "overkill". However, using the same symbol for a function and its name is often a major source of confusion for students and leads to a lack of understanding of the theory and its application.

Let F(·) be a function with domain D ⊆ X and range R ⊆ Y. To emphasize the relationship between the function F(·), its domain and its range, we write D(F) = D and R(F) = R. This notation merely states the obvious: the names of the domain and range of a function F(·) should be attached to the name F of the function F(·). We denote this by writing

F : D(F) ⊆ X → Y.

If the domain of F(·) is all of X, i.e. if D(F) = X, then we write

F : X → Y.

For completeness, recall that the range of a function F(·) is given by

R(F) = { y ∈ Y | y = F(x), x ∈ D(F) }.

In elementary calculus the sets X and Y are often intervals of the form [a, b], (a, b], [a, b) or (a, b). In advanced calculus X and Y may be n-dimensional Euclidean spaces. In the calculus of variations and optimal control, X and Y are most often spaces of functions. The following notation is rather standard and will be used throughout the book.

• The space of real numbers is denoted by R and the space of complex numbers by C, respectively. However, in some cases we may use the notation R1 to emphasize the fact that the real line is also the state space for a 1-dimensional system.

• Real n-dimensional Euclidean space, denoted by Rn, is the space of n-dimensional real column vectors

Rn = { x = [x1 x2 ⋯ xn]^T : xi ∈ R }.

Likewise, complex n-dimensional Euclidean space, denoted by Cn, is the space of n-dimensional complex column vectors

Cn = { z = [z1 z2 ⋯ zn]^T : zi ∈ C }.

Remark 2.2 Note that we use boldface letters for vectors so that if one writes x ∈ Rn, then one knows that x is a vector with n real components. Also, in special cases we will abuse notation and use the symbols R1 for R and C1 for C, respectively.

• If x ∈ R, then the absolute value of x is denoted by |x| and if z = x + iy ∈ C, then the complex modulus is denoted by |z| = √(x² + y²). The complex conjugate of z = x + iy is z̄ = x − iy.

• The transpose of the column vector z = [z1 z2 ⋯ zn]^T is the row vector

z^T = [z1 z2 ⋯ zn],

and conversely,

[z1 z2 ⋯ zn]^T

denotes the column vector with entries z1, z2, . . . , zn.

• If x = [x1 x2 ⋯ xn]^T ∈ Rn, then the Euclidean (or 2-) norm of x is given by

‖x‖ = ‖x‖2 = √( Σ_{i=1}^{n} |xi|² )

and likewise, if z = [z1 z2 ⋯ zn]^T ∈ Cn, then the Euclidean (or 2-) norm of z is given by

‖z‖ = ‖z‖2 = √( Σ_{i=1}^{n} |zi|² ).

• If x = [x1 x2 ⋯ xn]^T ∈ Rn and 1 ≤ p < +∞, then the p-norm of x is given by

‖x‖p = [ Σ_{i=1}^{n} |xi|^p ]^{1/p}

and likewise, if z = [z1 z2 ⋯ zn]^T ∈ Cn, then the p-norm of z is given by

‖z‖p = [ Σ_{i=1}^{n} |zi|^p ]^{1/p}.
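For readers who want to experiment, a quick numerical check of these formulas (a sketch using NumPy; the test vector is arbitrary) is:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])
for p in (1, 2, 3):
    # ||x||_p = (sum_i |x_i|^p)^(1/p); compare with numpy.linalg.norm
    by_hand = (np.abs(x) ** p).sum() ** (1.0 / p)
    print(p, by_hand, np.linalg.norm(x, p))
```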

The norms above are special cases of the general concept of a norm, which measures the magnitude of a vector and provides a mechanism to define distance between vectors. In general, we have the following definition.


Definition 2.1 A norm on Rn (or Cn) is a function ‖·‖ : Rn (or Cn) → R satisfying:

1. ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0.

2. ‖α · x‖ = |α| · ‖x‖ , α ∈ R (or C) and x ∈ Rn (or Cn).

3. ‖x+ y‖ ≤ ‖x‖ + ‖y‖ , for x,y ∈ Rn (or Cn) (the triangleinequality).

Let ‖·‖ : Rn → R be a norm on Rn. If x̄ ∈ Rn and δ > 0, then the (open) δ-neighborhood of x̄ is the open ball of radius δ centered at x̄ given by

U(x̄, δ) = { x ∈ Rn : ‖x − x̄‖ < δ }.

On the real line, U(x̄, δ) is the open interval centered at x̄ with radius δ. We sometimes abbreviate the δ-neighborhood by δ-nbd.

If f : D(f) ⊆ Rn → R is a real valued function of n real variables and x ∈ D(f), then we often write f(x) = f(x1, x2, ..., xn) rather than

f(x) = f([x1 x2 ⋯ xn]^T).

Remark 2.3 The use of f(x) = f(x1, x2, ..., xn) rather than f(x) = f([x1, x2, ..., xn]^T) is standard "abuse of notation" and should cause little misunderstanding in the material.

We shall use various notations for partial derivatives. For example, we use subscripts for partial derivatives such as

fxi(x) = fxi(x1, x2, ..., xn) = ∂f(x1, x2, ..., xn)/∂xi = ∂f(x)/∂xi

and

fxy(x, y) = ∂²f(x, y)/∂x∂y.


If x : I → Rn is a vector-valued function with domain defined by an interval I ⊂ R, then it follows that there exist n real valued functions xi(·), i = 1, 2, . . . , n, such that

x(t) = [x1(t) x2(t) ⋯ xn(t)]^T.

We use the standard definitions of continuity and differentiability for real valued functions. Assume x : [t0, t1] → R is a real valued function. If t̄ ∈ [t0, t1), then the right-hand limit of x(·) at t = t̄ is defined by

x(t̄⁺) = lim_{t→t̄⁺} [x(t)],

provided that this limit exists (it could be infinite). Likewise, if t̄ ∈ (t0, t1], then the left-hand limit of x(·) at t = t̄ is defined by

x(t̄⁻) = lim_{t→t̄⁻} [x(t)],

provided that this limit exists (it could be infinite).

If x : [t0, t1] → R is a real valued function which is differentiable at t = t̄, then ẋ(t̄), x′(t̄), and dx(t̄)/dt all denote the derivative of x(·) at t = t̄. If t̄ = t0, then ẋ(t0) denotes the right-hand derivative ẋ⁺(t0) defined by

d⁺x(t0)/dt = ẋ⁺(t0) = lim_{t→t0⁺} [ (x(t) − x(t0)) / (t − t0) ],

and if t̄ = t1, then ẋ(t1) denotes the left-hand derivative ẋ⁻(t1) defined by

d⁻x(t1)/dt = ẋ⁻(t1) = lim_{t→t1⁻} [ (x(t) − x(t1)) / (t − t1) ],

provided these limits exist and are finite.


Remark 2.4 It is important to note that even if d⁺x(t̄)/dt and ẋ(t̄⁺) both exist at a point t̄, they may not be the same, so that in general

ẋ⁺(t̄) = d⁺x(t̄)/dt ≠ ẋ(t̄⁺).

Clearly, the same is true for the left-hand derivatives and limits. In words, the one-sided derivative of a function at a point is not the one-sided limit of the derivative at that point.

Example 2.1 Let x : [−1, 1] → R be defined by

x(t) = t² sin(1/t) if t > 0,   x(0) = 0,   x(t) = t² if t < 0.

Computing the right-hand derivative at t = 0, it follows that

d⁺x(0)/dt = ẋ⁺(0) = lim_{t→0⁺} [ (x(t) − x(0)) / t ] = lim_{t→0⁺} [ t² sin(1/t) / t ] = lim_{t→0⁺} [ t sin(1/t) ] = 0

exists and is finite. On the other hand, if 0 < t ≤ 1, then ẋ(t) exists,

ẋ(t) = 2t sin(1/t) − cos(1/t),

and

lim_{t→0⁺} [ 2t sin(1/t) − cos(1/t) ]

does not exist. Hence,

d⁺x(0)/dt = ẋ⁺(0) ≠ ẋ(0⁺).

However, it is true that

d⁻x(0)/dt = ẋ⁻(0) = ẋ(0⁻) = 0.

Although, as the previous example illustrates, one cannot interchange the limit process for one-sided derivatives, there are important classes of functions for which this is true. Consider the following example.


Example 2.2 If x(t) = |t|, then

d⁺x(0)/dt = lim_{t→0⁺} (x(t) − x(0)) / (t − 0) = lim_{t→0⁺} |t|/t = 1

and

d⁻x(0)/dt = lim_{t→0⁻} (x(t) − x(0)) / (t − 0) = lim_{t→0⁻} |t|/t = lim_{t→0⁻} (−t)/t = −1.

Note also that

d⁺x(0)/dt = lim_{t→0⁺} ẋ(t) = 1   and   d⁻x(0)/dt = lim_{t→0⁻} ẋ(t) = −1

are the right and left-hand limits of ẋ(·) at 0. Note that x(t) = |t| is differentiable at all points except t = 0.
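A short numerical check of Example 2.2 (a sketch; the step sizes are arbitrary):

```python
x = abs                                   # x(t) = |t|
for eps in (1e-1, 1e-3, 1e-6):
    right = (x(0 + eps) - x(0)) / eps     # difference quotient from the right -> +1
    left = (x(0 - eps) - x(0)) / (-eps)   # difference quotient from the left  -> -1
    print(eps, right, left)
```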

Definition 2.2 Let I denote an interval and assume that x : I ⊆ R1 → R1 is a real valued function. For a given integer k ≥ 1, we say that x(·) is Ck on I if x(·) and all of its derivatives up to order k exist and are continuous at all points t ∈ I. We say that x(·) is C0 on I if x(·) is continuous, and x(·) is C∞ on I if x(·) is Ck for all k ≥ 0. If x(·) is C∞ on I, then we call x(·) a smooth function. Thus, we define the function spaces Ck(I) = Ck(I; R1) by

Ck(I) = Ck(I; R1) = { x : I ⊆ R1 → R1 : x(·) is Ck on I }.

Definition 2.3 Let I denote an interval and assume that x : I → Rn is a vector-valued function. We say that the function x(·) = [x1(·) x2(·) ⋯ xn(·)]^T is continuous at t̄ if, for each ε > 0, there is a δ > 0 such that if t ∈ I and 0 < |t − t̄| < δ, then ‖x(t) − x(t̄)‖ < ε. The function x(·) is said to be a continuous function if it is continuous at every point in its domain I.


Definition 2.4 A function x : I → Rn is differentiable at t̄ if for each i = 1, 2, . . . , n, the scalar function xi(·) is differentiable at t̄, and we define ẋ(t̄) by

ẋ(t̄) ≜ [ẋ1(t̄) ẋ2(t̄) ⋯ ẋn(t̄)]^T.

The right-hand and left-hand derivatives at t = t0 and t = t1 are defined as above. A function x(·) is said to be a differentiable function if it is differentiable at every point in its domain I.

Remark 2.5 Observe that the definitions given above imply that a vector-valued function x : I → Rn is continuous at a point t̄ if and only if all of the component functions xi : I → R1, i = 1, 2, ..., n, are continuous at t̄. Likewise, the vector-valued function x : I → Rn is differentiable at the point t̄ if and only if all of the component functions xi : I → R1, i = 1, 2, ..., n, are differentiable at t̄.

Definition 2.5 Let I denote an interval and assume that x : I ⊆ R1 → Rn is a vector valued function. For a given integer k ≥ 1, we say that x(·) is Ck on I if xi(·) is Ck on I for all i = 1, 2, ..., n. We define the function spaces Ck(I) = Ck(I; Rn) by

Ck(I) = Ck(I; Rn) = { x : I ⊆ R1 → Rn : x(·) is Ck on I }.

Note that we use the same notation Ck(I) for Ck(I) = Ck(I; R1) and Ck(I) = Ck(I; Rn). This should cause no confusion since a statement like x(·) ∈ Ck(I) clearly implies that Ck(I) = Ck(I; Rn) because x(·) is boldfaced, and hence, a vector valued function.

Remark 2.6 Although the above definition of a derivative is sufficient for the initial introduction here, this definition must be revisited when we move to more general functions. In particular, the concept of a "derivative" is best presented in terms of linear approximations of a non-linear function.


2.2.2 A Review of One Dimensional Optimization

Here we consider the simple optimization problem that one "solves" in a first calculus course. Let f : I → R be a differentiable function on the interval I (I may be open, closed, infinite, or of the form (a, b], etc.). We are interested in finding minimizing points for f(·). We remind the reader of the following definitions.

Definition 2.6 Given f : I → R, we say that x* provides a local minimum for f(·) on I (or x* is a local minimizer for f(·) on I) if

1. x* ∈ I and

2. there is a δ > 0 such that

f(x*) ≤ f(x)

for all x ∈ U(x*, δ) ∩ I.

If in addition,

f(x*) < f(x)

for all x ∈ U(x*, δ) ∩ I with x ≠ x*, then we say that x* provides a proper local minimum for f(·) on I. If x* ∈ I is such that

f(x*) ≤ f(x)

for all x ∈ I, then x* is said to provide a global minimum for f(·) on I.

In theory, there is always one way to "find" global minimizers. Simply pick x1 ∈ I and "test" x1 to see if f(x1) ≤ f(x) for all x ∈ I. If so, then x1 is a global minimizer, and if not, there is an x2 with f(x2) < f(x1). Now test x2 to see if f(x2) ≤ f(x) for all x ∈ I, x ≠ x1. If so, x2 is the minimizer, and if not, we can find x3 so that f(x3) < f(x2) < f(x1). Continuing this process generates a sequence of points {xk : k = 1, 2, 3, . . .} satisfying

. . . < f(xk+1) < f(xk) < . . . < f(x3) < f(x2) < f(x1).


Such a sequence is called a minimizing sequence and under certain circumstances the sequence (or a subsequence) will converge to a point x*, and if x* ∈ I it will be a minimizer. The above "direct procedure" may be impossible to accomplish in finite time. Therefore, we need some way to reduce the number of possible candidates for x*. We can do this by applying elementary necessary conditions from calculus.

Clearly, if x* is a global minimizer, then x* is a local minimizer. We shall try to find all the local minima of f(·) by using standard necessary conditions. The following theorem can be found in almost any calculus book.

Theorem 2.1 Suppose that f : I → R is continuous, where I is an interval with endpoints a < b, and assume that x* ∈ I is a local minimizer for f(·) on I. If df(x*)/dx exists, then

df(x*)/dx = 0, if a < x* < b,   (2.6)

or

df(x*)/dx ≥ 0, if x* = a,   (2.7)

or

df(x*)/dx ≤ 0, if x* = b.   (2.8)

In addition, if d²f(x*)/dx² = f″(x*) exists, then

f″(x*) ≥ 0.   (2.9)

It is important to emphasize that Theorem 2.1 is only a necessary condition. The theorem says that if x* is a local minimizer, then x* must satisfy (2.6), (2.7) or (2.8). It does not imply that points that satisfy (2.6)-(2.8) are local minimizers. Necessary conditions like Theorem 2.1 can be used to reduce the number of possible candidates that must be tested to see if they provide a minimum to f(·). Consider the following example that illustrates the above theorem.


Example 2.3 Let I = [−2, 3) and define f(x) = x⁴ − 6x². Let us try to find the local (maximum and) minimum. In order to apply Theorem 2.1 we compute the derivatives

f′(x) = 4x³ − 12x

and

f″(x) = 12x² − 12.

If x* is a local minimum and −2 < x* < 3, then

f′(x*) = 4[x*]³ − 12[x*] = 4x*([x*]² − 3) = 0.

Solving this equation, we find that x* could be either 0, √3, or −√3. Checking (2.9) at x* = 0 yields

f″(0) = 12 · 0 − 12 = −12 < 0

so that condition (2.9) fails and x* = 0 is not a local minimizer. Also, checking condition (2.7) at x* = −2 yields

f′(−2) = −32 + 24 = −8 < 0

so x* = −2 is not a candidate. Thus, an application of Theorem 2.1 has reduced the process to that of "testing" the two points √3 and −√3. The second derivative test (2.9) yields

f″(±√3) = 12[±√3]² − 12 = 12 · 3 − 12 = 24 > 0

so that x* = −√3 and x* = +√3 are possible local minima. This is clear from the graph, but what one really needs are sufficient conditions.

Theorem 2.2 Suppose that f(·) is as in Theorem 2.1. If x* ∈ (a, b), f′(x*) = 0 and f″(x*) > 0, then x* provides a local minimum for f(·) on [a, b].
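The bookkeeping in Example 2.3 is easy to automate. The following Python sketch tests each interior critical point with the second derivative test and the endpoint x = −2 with condition (2.7):

```python
import numpy as np

f = lambda x: x**4 - 6 * x**2
df = lambda x: 4 * x**3 - 12 * x
d2f = lambda x: 12 * x**2 - 12

for c in (0.0, np.sqrt(3), -np.sqrt(3)):       # roots of f'(x) = 0 in (-2, 3)
    verdict = "possible local min" if d2f(c) > 0 else "ruled out by (2.9)"
    print(f"x* = {c:+.4f}: f'' = {d2f(c):+.1f} -> {verdict}")

print("endpoint x* = -2: f'(-2) =", df(-2.0), "< 0 violates (2.7), ruled out")
```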

We close this section with a proof of Theorem 2.1. The proof is very simple, but the idea behind the proof happens to be one of the key ideas in much of optimization.


Figure 2.7: Plot of the Cost Function

Proof of Theorem 2.1: First assume that a < x* < b is a local minimizer. This means that there is a δ > 0 such that

f(x*) ≤ f(x)

for all x ∈ U(x*, δ) ∩ I. Let δ1 = (1/2) min{δ, b − x*, x* − a} > 0 and note that the open interval U(x*, δ1) is contained in I. In particular, if −δ1 < ε < δ1, then the "variation" x* + ε ∈ U(x*, δ1) ⊂ U(x*, δ) ∩ I. Hence x* satisfies

f(x*) ≤ f(x* + ε),

or equivalently,

0 ≤ f(x* + ε) − f(x*).   (2.10)


Observe that this inequality holds for positive and negative ε. Dividing both sides of (2.10) by ε > 0 yields the inequality

0 ≤ [f(x* + ε) − f(x*)] / ε,

and passing to the limit as ε → 0⁺ it follows that

0 ≤ f′(x*).   (2.11)

Likewise, dividing both sides of (2.10) by ε < 0 yields the reverse inequality

0 ≥ [f(x* + ε) − f(x*)] / ε,

and passing to the limit as ε → 0⁻ it follows that

0 ≥ f′(x*).   (2.12)

Combining (2.11) with (2.12) it follows that

f′(x*) = 0

and we have established (2.6).

Consider now the case where x* = a. Let δ1 = (1/2) min{δ, b − x*} > 0 and note that if 0 < ε < δ1, then the "variation" x* + ε = a + ε ∈ U(x*, δ1) ∩ I ⊂ U(a, δ) ∩ I. Hence x* = a satisfies

f(a) ≤ f(a + ε),

or equivalently,

0 ≤ f(a + ε) − f(a).

Observe that this inequality holds for 0 < ε < δ1. Dividing both sides of this inequality by ε > 0 yields

0 ≤ [f(a + ε) − f(a)] / ε,

and passing to the limit as ε → 0⁺ it follows that

0 ≤ f′(a).

This completes the proof of (2.7). The case x* = b is completely analogous.


Remark 2.7 The important idea in the above proof is that when a < x* < b, the variations x* + ε belong to U(x*, δ) ∩ I for both positive and negative values of ε. In particular, one can "approach" x* from both directions and still be inside U(x*, δ) ∩ I. However, when x* = a, the variations x* + ε belong to U(x*, δ) ∩ I only when ε is positive. Thus, one can "approach" x* only from the right and still remain in U(x*, δ) ∩ I. This simple observation is central to much of what we do in deriving necessary conditions for the simplest problem in the calculus of variations.

Remark 2.8 It is extremely important to understand that care must be exercised when applying necessary conditions. A necessary condition usually assumes the existence of an optimizer. If an optimizer does not exist, then the necessary condition is vacuous. Even worse, one can draw an incorrect conclusion by applying the necessary condition. Perron's Paradox (see [186]) provides a very simple example to illustrate the danger of applying necessary conditions to a problem with no solution. Let

Φ = { N : N is a positive integer }   (2.13)

and define

J(N) = N.   (2.14)

Assume that N̄ ∈ Φ maximizes J(·) on the set Φ, i.e. that N̄ is the largest positive integer. Thus, N̄ ≥ 1, which implies that N̄² ≥ N̄. However, since N̄² is a positive integer and N̄ is the largest positive integer, it follows that N̄ ≥ N̄². Consequently, N̄² ≤ N̄ ≤ N̄², which implies that N̄ ≤ 1 ≤ N̄, so that N̄ = 1. Therefore, if one assumes that the optimization problem (2.13)-(2.14) has a solution, then one can (correctly) prove that the largest positive integer is N̄ = 1. Of course the issue is a point of logic, where a false assumption can be used to prove a false conclusion. If one assumes an optimizer exists and it does not, then necessary conditions can be used to produce incorrect answers. Unlike the simple Perron Paradox, it is often difficult to establish the existence of solutions to calculus of variations and optimal control problems. One should take this remark as a warning when applying necessary conditions to such problems.


Figure 2.8: Plot of the Cost Function f(·) on (0, 1]

Example 2.4 Let I = (0, 1] and define f(x) = 5x² − x⁴ − 1. The plot of f(·) on the interval (0, 1] is shown in Figure 2.8. It is clear that f(·) has neither a global nor a local minimizer on (0, 1] since 0 ∉ (0, 1]. However, if one "relaxes" (expands) the problem and considers the problem of minimizing f(·) on the closure of I given by Ī = [0, 1], then x* = 0 ∈ [0, 1] solves the "relaxed problem". In addition, there exists a sequence of points xk ∈ I = (0, 1] satisfying

xk → x* = 0

and

f(xk) → f(x*) = −1.

This process of relaxing the optimization problem by expanding the set of admissible points is an important idea and leads to the concepts of generalized curves and relaxed controllers. The key point is that by relaxing the problem one obtains existence of a minimizer in a larger set. But equally important, it can be shown that the solution to the relaxed problem can be approximated by points in the original constraint set.

This general idea also has profound applications to the theory of nonlinear partial differential equations. In particular, a "viscosity solution" is an extension of the classical concept of what is meant by a "solution" to certain partial differential equations. The basic idea is to define a viscosity solution as a limit of (classical) solutions that are parameterized by a parameter that tends to zero. This method produces existence of (weak) solutions, and these solutions are closely related to the concept of Young's "generalized curves". The references [24], [61], [60], [75] and [186] provide an introduction to these ideas.

2.2.3 Lagrange Multiplier Theorems

The constraint set for the general optimization problem is often defined in terms of equality (or inequality) constraints. The isoperimetric problem in the calculus of variations is such a problem. In this section assume there are two vector spaces Z and Y and two functions

J : D(J) ⊆ Z → R1   (2.15)

and

G : D(G) ⊆ Z → Y.   (2.16)

The function J(·) is called the cost function and G(·) is called the constraint function. Define the constraint set ΘG ⊆ Z by

ΘG = { z ∈ D(G) : G(z) = 0 ∈ Y } ⊂ D(G).   (2.17)

The Equality Constrained Optimization Problem is defined to be:

Find an element z* ∈ ΘG ∩ D(J) such that

J(z*) ≤ J(z)

for all z ∈ ΘG ∩ D(J).


Observe that since ΘG ⊂ D(G), it follows that ΘG ∩ D(J) ⊂ D(G) ∩ D(J). Therefore, the equality constrained optimization problem is equivalent to finding z* ∈ D(G) ∩ D(J) such that z* minimizes J(z) subject to G(z) = 0 ∈ Y. We first discuss special cases and then move to the more abstract versions.

Lagrange Multiplier Theorem in Rn

We consider the finite dimensional constrained optimization problem in n variables and m equality constraints. For the sake of simplicity of presentation, we assume that the cost function

J : Rn → R1

has domain equal to all of Rn, i.e. D(J) = Rn. Also, we assume that the constraint function

G : Rn → Rm

has domain equal to all of Rn, i.e. D(G) = Rn, and that m < n. In particular, there are m real-valued functions

gi : Rn → R1,   i = 1, 2, . . . , m,

such that

G(z) = [g1(z) g2(z) ⋯ gm(z)]^T,

where z = [x1 x2 ⋯ xn]^T ∈ Rn. We assume that all the functions gi(·), i = 1, 2, ..., m, are C1 real valued functions of the n real variables z = [x1 x2 ⋯ xn]^T ∈ Rn so that the gradients

∇J(z) = [∂J(z)/∂x1 ∂J(z)/∂x2 ⋯ ∂J(z)/∂xn]^T and ∇gi(z) = [∂gi(z)/∂x1 ∂gi(z)/∂x2 ⋯ ∂gi(z)/∂xn]^T


exist and are continuous on Rn. The (equality) constrained minimization problem is to minimize J(z) subject to G(z) = 0 ∈ Rm.

Define the Lagrangian L : R1 × Rm × Rn → R1 by

L(λ0, λ, z) ≜ λ0 J(z) + Σ_{i=1}^{m} λi gi(z),   (2.18)

where λ = [λ1 λ2 ⋯ λm]^T ∈ Rm. Observe that L(λ0, λ, z) can be written as

L(λ0, λ, z) = λ0 J(z) + ⟨λ, G(z)⟩m,

where ⟨·, ·⟩m is the standard inner product on Rm. Also, the gradient ∇z L(λ0, λ, z) is given by

∇z L(λ0, λ, z) = λ0 ∇J(z) + Σ_{i=1}^{m} λi ∇gi(z).   (2.19)

Finally, we shall need the Jacobian of G(·), which is the m × n matrix defined by

JG(z) ≜ [ ∂g1(z)/∂x1   ∂g1(z)/∂x2   ⋯   ∂g1(z)/∂xn
          ∂g2(z)/∂x1   ∂g2(z)/∂x2   ⋯   ∂g2(z)/∂xn
          ⋮            ⋮            ⋱   ⋮
          ∂gm(z)/∂x1   ∂gm(z)/∂x2   ⋯   ∂gm(z)/∂xn ],   (2.20)

which can be written as

JG(z) = [∇g1(z) ∇g2(z) ⋯ ∇gm(z)]^T.

As above we set

ΘG = { z ∈ Rn : G(z) = 0 ∈ Rm }

and state the Lagrange Multiplier Theorem for this problem.

Theorem 2.3 (Multiplier Theorem for the n-D Problem) If z* ∈ ΘG minimizes J(·) on ΘG, then there exist a constant λ0* and a vector λ* = [λ1* λ2* ⋯ λm*]^T ∈ Rm such that

(i) |λ0*| + ‖λ*‖ ≠ 0 and

(ii) the minimizer z* satisfies

∇z L(λ0*, λ*, z*) = λ0* ∇J(z*) + Σ_{i=1}^{m} λi* ∇gi(z*) = 0.   (2.21)

(iii) If in addition the gradients ∇g1(z*), ∇g2(z*), . . . , ∇gm(z*) are linearly independent, i.e. the Jacobian JG(z*) = [∇g1(z*) ∇g2(z*) ⋯ ∇gm(z*)]^T has maximal rank m, then λ0* is not zero.

Remark 2.9 Note that condition (iii) implies that the linear operator T : Rn → Rm defined by

T h = [JG(z*)] h

is onto all of Rm. This form of condition (iii) is the key to more abstract forms of the Lagrange Multiplier Theorem. We say that the minimizer z* is a normal minimizer if λ0* ≠ 0.

Example 2.5 (A Normal Problem) Consider the problem in R2 of minimizing

J(x, y) = x² + y²

subject to the single (m = 1) equality constraint

G(x, y) = y − x² − 1 = 0.

The gradients of J(·) and G(·) are given by

∇J(x, y) = [2x 2y]^T and ∇G(x, y) = [−2x 1]^T,

respectively.

Assume that z* = [x* y*]^T minimizes J(·, ·) subject to G(x, y) = 0. The Lagrange Multiplier Theorem above implies that there exist λ0* and λ1* ∈ R1 such that

λ0* ∇J(x*, y*) + λ1* ∇G(x*, y*) = 0   (2.22)

and |λ0*| + |λ1*| ≠ 0. Observe that if λ0* = 0, then it follows from (2.22) that

λ1* ∇G(x*, y*) = λ1* [−2x* 1]^T = [−2λ1* x* λ1*]^T = 0,

which implies that λ1* = 0. Since λ0* = 0, this would imply that |λ0*| + |λ1*| = 0, which contradicts the theorem. Therefore, λ0* ≠ 0 and if we define λ = λ1*/λ0*, then (2.22) is equivalent to

∇J(x*, y*) + λ ∇G(x*, y*) = 0.   (2.23)

Consequently, the Lagrange Multiplier Theorem yields the following system

2x* − 2λx* = 2x*(1 − λ) = 0,

2y* + λ = 0,

g(x*, y*) = y* − [x*]² − 1 = 0.

Therefore, either λ = 1 or x* = 0. However, if λ = 1, then it would follow that y* = −λ/2 = −1/2 and −1/2 = y* = [x*]² + 1 > 0, which is impossible. The only solution to the system is x* = 0, y* = 1 and λ = −2.
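The system in Example 2.5 can also be checked symbolically. The sketch below (assuming SymPy is available) solves the stationarity conditions of the normalized Lagrangian J + λG together with the constraint:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
J = x**2 + y**2
G = y - x**2 - 1

# Stationarity of J + lam*G in x and y, plus the constraint G = 0.
eqs = [sp.diff(J + lam * G, x), sp.diff(J + lam * G, y), G]
print(sp.solve(eqs, [x, y, lam], dict=True))
# Expected real solution: {x: 0, y: 1, lam: -2};
# the lam = 1 branch forces x**2 = -3/2 and so has no real solution.
```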

The previous example was normal in the sense that one could show that λ0* ≠ 0, and hence the Lagrange Multiplier Theorem resulted in three (nonlinear) equations to be solved for three unknowns x*, y* and λ. When the problem is not normal this is not always the case. This is easily illustrated by the following example.

Example 2.6 (A Non-Normal Problem) Consider the problem of minimizing

J(x, y) = x² + y²

subject to the equality constraint

G(x, y) = x² − (y − 1)³ = 0.


Figure 2.9: A Non-Normal Problem

The gradients of J(·) and G(·) are given by

∇J(x, y) = [2x 2y]^T and ∇G(x, y) = [2x −3(y − 1)²]^T,

respectively. It is clear from Figure 2.9 that the minimizer is given by z* = [x* y*]^T = [0 1]^T. The Lagrange Multiplier Theorem implies that there exist λ0* and λ1* such that

λ0* ∇J(x*, y*) + λ1* ∇G(x*, y*) = 0   (2.24)

and |λ0*| + |λ1*| ≠ 0.

If λ0* ≠ 0 and if we define λ = λ1*/λ0*, then (2.24) is equivalent to

∇J(x*, y*) + λ ∇G(x*, y*) = 0.   (2.25)

Using the fact that z* = [x* y*]^T = [0 1]^T is the minimizer,


(2.24) and (2.25) imply that

∇J(0, 1) + λ ∇G(0, 1) = 0,

or equivalently, that

[0 2]^T + λ [0 0]^T = [0 0]^T.

Clearly, [0 2]^T ≠ [0 0]^T, so that λ0* = 0 is the only possible choice for λ0*, and λ1* can be any non-zero value. The reason that the only choice for λ0* is λ0* = 0 is because the gradient of G(·) at z* = [x* y*]^T = [0 1]^T is ∇G(0, 1) = [0 0]^T = 0.

Proofs of the 2D Lagrange Multiplier Theorem

We shall present two proofs of the Lagrange Multiplier Theorem 2.3 in the special case where n = 2. The first proof is based on a variational method and can be easily modified to deal with isoperimetric problems in the calculus of variations discussed in Chapter 7 below. The second proof is geometric and relies on a separation result. Both proofs can be extended to a very general setting and yield a Lagrange Multiplier Theorem in abstract spaces.

The first proof makes use of the Inverse Mapping Theorem from advanced calculus. The Inverse Mapping Theorem provides conditions that ensure a function will have an inverse (at least locally). Let T : O ⊂ R2 → R2 be a function from an open set O into the plane defined by

T(α, β) = [p(α, β) q(α, β)]^T,

where p(α, β) and q(α, β) are smooth functions. Assume that [ᾱ β̄]^T ∈ O and

T(ᾱ, β̄) = [p(ᾱ, β̄) q(ᾱ, β̄)]^T = [p̄ q̄]^T.

Roughly speaking, the Inverse Mapping Theorem implies that if the Jacobian matrix at [ᾱ β̄]^T is non-singular (i.e. invertible), then there is a neighborhood U of [ᾱ β̄]^T and an open neighborhood V of T(ᾱ, β̄) = [p̄ q̄]^T so that T(α, β) restricted to U is a one-to-one and onto mapping from U to V with a continuous inverse function.

Recall that the Jacobian matrix at [ᾱ β̄]^T is given by

JT(ᾱ, β̄) = [ ∂p(ᾱ, β̄)/∂α   ∂p(ᾱ, β̄)/∂β
             ∂q(ᾱ, β̄)/∂α   ∂q(ᾱ, β̄)/∂β ],   (2.26)

and JT(ᾱ, β̄) is one-to-one and onto (i.e. non-singular) if and only if the determinant

det JT(ᾱ, β̄) = det [ ∂p(ᾱ, β̄)/∂α   ∂p(ᾱ, β̄)/∂β
                     ∂q(ᾱ, β̄)/∂α   ∂q(ᾱ, β̄)/∂β ] ≠ 0.

The following version of the Inverse Mapping Theorem follows from Theorem 41.8 on page 381 in Bartle's book [15].

Theorem 2.4 (Inverse Function Theorem) Let T : O ⊂ R2 → R2 be a C1 function from the open set O into the plane defined by

T(α, β) = [p(α, β) q(α, β)]^T,

where p(α, β) and q(α, β) are smooth functions. Assume that [ᾱ β̄]^T ∈ O with

T(ᾱ, β̄) = [p(ᾱ, β̄) q(ᾱ, β̄)]^T = [p̄ q̄]^T

and that the Jacobian JT(ᾱ, β̄) at [ᾱ β̄]^T is non-singular. Then there are open neighborhoods U of [ᾱ β̄]^T and V of T(ᾱ, β̄) = [p̄ q̄]^T such that T(α, β) restricted to U is a one-to-one and onto mapping from U onto V. Moreover, if

T̂(α, β) ≜ T(α, β)|U

denotes the restriction of T(α, β) to U, then T̂(α, β) : U → V has a continuous inverse T̂⁻¹(p, q) : V → U belonging to C1,

[α β]^T = [α(p, q) β(p, q)]^T = T̂⁻¹(p, q),

and

J[T̂⁻¹(p, q)] = [JT(α(p, q), β(p, q))]⁻¹ = [JT(T̂⁻¹(p, q))]⁻¹

for all [p q]^T ∈ V.


Proof of Theorem 2.3 for the 2D Case: Assume z* = [x* y*]^T ∈ R2 minimizes

J(x, y)

subject to

G(x, y) = g(x, y) = 0,

where J : R2 → R1 and G : R2 → R1. Since there is only one constraint (i.e. m = 1), the Jacobian

JG(z*) = [∇g(z*)]^T = [gx(x*, y*) gy(x*, y*)]

has maximal rank m = 1 if and only if ∇g(z*) ≠ [0 0]^T.

First consider the case where z* = [x* y*]^T satisfies

∇G(x*, y*) = ∇g(x*, y*) = [gx(x*, y*) gy(x*, y*)]^T = [0 0]^T.

In this case set λ0* = 0 and λ1* = 1. It follows that |λ0*| + |λ1*| = 1 ≠ 0 and

∇z L(λ0*, λ*, z*) = λ0* ∇J(z*) + λ1* ∇g(z*) = 0 · ∇J(z*) + 1 · ∇g(z*) = 0.

Hence,

λ0* ∇J(z*) + λ1* ∇g(z*) = 0

and the theorem is clearly true.

Now consider the case where z* = [x* y*]^T satisfies

∇G(x*, y*) = ∇g(x*, y*) = [gx(x*, y*) gy(x*, y*)]^T ≠ [0 0]^T.


In particular, at least one of the partial derivatives is not zero. Without loss of generality we assume gx(x*, y*) ≠ 0 and define

λ0* = gx(x*, y*) ≠ 0

and

λ1* = −Jx(x*, y*).

We now show that z* = [x* y*]^T satisfies

λ0* ∇J(z*) + λ1* ∇g(z*) = 0.

Observe that

λ0* ∇J(z*) + λ1* ∇g(z*) = gx(z*) ∇J(z*) − Jx(z*) ∇g(z*),

so that

λ0* Jx(z*) + λ1* gx(z*) = gx(z*) Jx(z*) − Jx(z*) gx(z*) = 0   (2.27)

and

λ0* Jy(z*) + λ1* gy(z*) = gx(z*) Jy(z*) − Jx(z*) gy(z*) = det [ gx(z*)   Jx(z*)
                                                                gy(z*)   Jy(z*) ].

Therefore, to establish that z* = [x* y*]^T satisfies

λ0* ∇J(z*) + λ1* ∇g(z*) = gx(z*) ∇J(z*) − Jx(z*) ∇g(z*) = 0,

we must show that

det [ gx(z*)   Jx(z*)
      gy(z*)   Jy(z*) ] = det [ Jx(z*)   Jy(z*)
                                gx(z*)   gy(z*) ] = 0.   (2.28)

This is accomplished by applying the Inverse Mapping Theorem 2.4 above.

Define T : R2 → R2 by

T(α, β) = [p(α, β) q(α, β)]^T,   (2.29)

where

p(α, β) = J(x* + α, y* + β)   (2.30)

and

q(α, β) = G(x* + α, y* + β),   (2.31)

respectively. Observe that T(α, β) maps the open set R2 to R2 and is defined by (2.29) - (2.31). With [ᾱ β̄]^T = [0 0]^T we have

T(ᾱ, β̄) = T(0, 0) = [J(x*, y*) G(x*, y*)]^T = [J(z*) 0]^T = [p̄ 0]^T.

The Jacobian of T(α, β) at [ᾱ β̄]^T = [0 0]^T is given by

[ ∂p(0, 0)/∂α   ∂p(0, 0)/∂β
  ∂q(0, 0)/∂α   ∂q(0, 0)/∂β ] = [ Jx(x*, y*)   Jy(x*, y*)
                                  gx(x*, y*)   gy(x*, y*) ].

Assume that (2.28) is not true. This assumption implies that

det [ Jx(x*, y*)   Jy(x*, y*)
      gx(x*, y*)   gy(x*, y*) ] ≠ 0

and the Jacobian of T(α, β) is non-singular at [ᾱ β̄]^T = [0 0]^T, so we may apply Theorem 2.4. In particular (see Figure 2.10), there is a neighborhood U = { [α β]^T : √(α² + β²) < γ } of [0 0]^T and a neighborhood V of [J(x*, y*) 0]^T = [p̄ 0]^T such that the restriction of T(α, β) to U, T̂(α, β) : U → V, has a continuous inverse T̂⁻¹(p, q) : V → U belonging to C1.

Let [p 0]^T ∈ V be any point with p < J(x*, y*) and let [α̃ β̃]^T = T̂⁻¹(p, 0) ∈ U. Observe that

J(x* + α̃, y* + β̃) = p(α̃, β̃) = p < J(x*, y*) = J(z*)   (2.32)

and

G(x* + α̃, y* + β̃) = q(α̃, β̃) = 0.   (2.33)


Figure 2.10: 2D Lagrange Multiplier Proof

Thus, the vector z̃ = [x* + α̃ y* + β̃]^T satisfies

G(z̃) = G(x* + α̃, y* + β̃) = q(α̃, β̃) = 0

and

J(z̃) = J(x* + α̃, y* + β̃) = p(α̃, β̃) = p < J(x*, y*) = J(z*),

which contradicts the assumption that z* = [x* y*]^T minimizes J(x, y) subject to G(x, y) = 0.

Therefore, the assumption that

det [ ∂p(0, 0)/∂α   ∂p(0, 0)/∂β
      ∂q(0, 0)/∂α   ∂q(0, 0)/∂β ] = det [ Jx(x*, y*)   Jy(x*, y*)
                                          gx(x*, y*)   gy(x*, y*) ] ≠ 0

must be false, and hence

gx(x*, y*) Jy(x*, y*) − Jx(x*, y*) gy(x*, y*) = − det [ Jx(x*, y*)   Jy(x*, y*)
                                                        gx(x*, y*)   gy(x*, y*) ] = 0.   (2.34)


Consequently, (2.27) and (2.34) together imply that

λ0* ∇J(z*) + λ1* ∇g(z*) = gx(z*) ∇J(z*) − Jx(z*) ∇g(z*) = 0

and λ0* = gx(x*, y*) ≠ 0, which completes the proof.

We also outline another proof that is geometric in nature. The details can be found in Hestenes' book [102].

Second Proof of Theorem 2.3 for the 2D Case: Assume z* = [x* y*]^T ∈ R2 minimizes

J(x, y)

subject to

G(x, y) = g(x, y) = 0.

Observe that if ∇g(z*) = 0 = [0 0]^T, then λ0* = 0 and λ* = 1 produces

∇z L(λ0*, λ*, z*) = λ0* ∇J(z*) + λ* ∇g(z*) = 0,

and hence we need only consider the case where ∇g(z*) ≠ 0. Also, in the trivial case when ∇J(z*) = 0 one can set λ0* = 1 and λ* = 0, so that without loss of generality we can consider the case where ∇g(z*) ≠ 0 and ∇J(z*) ≠ 0. Under this assumption, the condition that

λ0* ∇J(z*) + λ* ∇g(z*) = 0

with λ0* ≠ 0 is equivalent to the existence of a λ such that

∇J(z*) + λ ∇g(z*) = 0,

where λ = (λ*/λ0*). In particular, the nonzero gradients ∇g(z*) and ∇J(z*) must be collinear.

To establish this we define the level set L* by

L* ≜ { [x y]^T : J(x, y) = m* = J(x*, y*) }

and the constraint set C by

C = { [x y]^T : G(x, y) = 0 },

respectively.


Figure 2.11: The Level Set

Thus, in a sufficiently small ball B(z*, δ) about z* = [x* y*]^T the level set separates the ball B(z*, δ) into the two sets

S⁻ = { [x y]^T ∈ B(z*, δ) : J(x, y) < m* = J(x*, y*) }

and

S⁺ = { [x y]^T ∈ B(z*, δ) : J(x, y) > m* = J(x*, y*) },

respectively (see Figure 2.11). If we assume that ∇g(z*) and ∇J(z*) are not collinear, then as shown in Figure 2.12 the support line to S⁻ at z* = [x* y*]^T (i.e. the line orthogonal to ∇J(z*)) must cross the line orthogonal to ∇g(z*) at z* = [x* y*]^T. In particular, the constraint set C must intersect S⁻ = { [x y]^T ∈ B(z*, δ) : J(x, y) < m* } and there is a point [x̃ ỹ]^T ∈ C ∩ S⁻ (see Figure 2.13). However,

C ∩ S⁻ = { [x y]^T : G(x, y) = 0 and J(x, y) < m* = J(x*, y*) }


Figure 2.12: Non-collinear Gradients

so that [x̃ ỹ]^T ∈ C ∩ S⁻ satisfies

G(x̃, ỹ) = 0

and

J(x̃, ỹ) < m* = J(x*, y*).

Therefore, [x* y*]^T is not a local minimizer of J(x, y) on the set

C = { [x y]^T : G(x, y) = 0 }

and hence cannot minimize J(x, y) subject to G(x, y) = 0. Consequently, ∇g(z*) and ∇J(z*) must be collinear, and this completes the proof.

Figure 2.13: Contradiction to Assumption

We note that this geometric proof depends on knowing that the line orthogonal to the gradient ∇J(z*) is a support plane for S⁻ = { [x y]^T ∈ B(z*, δ) : J(x, y) < m* } at z* = [x* y*]^T. This "theme" will be important when we derive the simplest Maximum Principle for the time optimal control problem.

2.3 Function Spaces

Problems in the calculus of variations and optimal control involve finding functions that minimize some functional over a set of prescribed (admissible) functions. Therefore, we need to specify the precise space of functions that will be admissible. We start with the basic real valued piecewise continuous functions defined on an interval I = [t0, t1].

Definition 2.7 Let I = [t0, t1] be a closed interval. Then x : I → R is said to be piecewise continuous (PWC) on [t0, t1] if:

• The function x (·) is bounded on I.

Page 77: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

58 Chapter 2. Introduction and Preliminaries

• The right-hand limit x(t̄⁺) = lim_{t→t̄⁺} [x(t)] exists (is finite) for all t̄ ∈ [t0, t1).

• The left-hand limit x(t̄⁻) = lim_{t→t̄⁻} [x(t)] exists (is finite) for all t̄ ∈ (t0, t1].

• There is a finite partition of [t0, t1], t0 = t̂0 < t̂1 < t̂2 < . . . < t̂p−1 < t̂p = t1, such that x(·) is continuous on each open subinterval (t̂i−1, t̂i).

Note that if x(·) is defined and bounded on [t0, t1], t̄ ∈ [t0, t1] and lim_{t→t̄±} [x(t)] exists (even one-sided), then this limit must be finite. Thus, for piecewise continuous functions the one-sided limits exist and are finite at all points.

Example 2.7 If x(·) is defined as in Figure 2.14, then x(·) is piecewise continuous. Note however that x(·) is not continuous.

Example 2.8 The function shown in Figure 2.15 is not piecewise continuous since it is not bounded on [t0, t1).

Figure 2.14: A Piecewise Continuous Function


Figure 2.15: A Non-Piecewise Continuous Function

Example 2.9 Let tN = 1/2^{N−1}, for N = 1, 2, 3, · · · and define x : [0, 1] → R by

x(t) = 1/2^{N−1},   1/2^N < t ≤ 1/2^{N−1},

and set x(0) = 0 (see Figure 2.16). Note that x(·) is defined and bounded on [0, 1], and given any t̄ ∈ [0, 1] it follows that the limits lim_{t→t̄±} [x(t)] exist and are finite. Clearly, x(·) is bounded, but there is no finite partition of [0, 1] such that x(·) is continuous on this partition. Therefore, x(·) is not piecewise continuous since it has an infinite number of discontinuous jumps.

Figure 2.16: A Bounded Non-Piecewise Continuous Function

Definition 2.8 A function x : [t0, t1] → R is called piecewise smooth (PWS) on [t0, t1] if

(i) x(·) is continuous on [t0, t1] and

(ii) there exists a piecewise continuous function g(·) and a constant c such that for all t ∈ [t0, t1]

x(t) = c + ∫_{t0}^{t} g(s) ds.

Note that if x : [t0, t1] → R is PWS on [t0, t1] and there are two PWC functions gi(·) and constants ci, i = 1, 2, such that for all t ∈ [t0, t1]

x(t) = c1 + ∫_{t0}^{t} g1(s) ds

and

x(t) = c2 + ∫_{t0}^{t} g2(s) ds,

then c1 = x(t0) = c2. Also,

∫_{t0}^{t} (g1(s) − g2(s)) ds = 0

Page 80: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 2. Introduction and Preliminaries 61

for all t ∈ [t0, t1]. It is straightforward to show that g1 (t) = g2 (t)except at a finite number of points,

g1(t+) = g2(t+)

andg1(t−) = g2(t−)

for all t ∈ (t0, t1). Hence, g1(t) and g2(t) can only have discon-tinuities at the same discrete points ti and between these pointsg1(t) = g2(t) if t 6= ti.

If $x : [t_0,t_1] \to \mathbb{R}$ is PWS on $[t_0,t_1]$, then one could redefine $x(\cdot)$ at a finite number of points, but the resulting function $\tilde x(\cdot)$ would no longer be continuous. In particular, if $x(\cdot)$ and $\tilde x(\cdot)$ are equal except at a finite number of points and $x(\cdot)$ is continuous, then $x(\cdot)$ is uniquely defined.

Thus, piecewise smooth functions are continuous and $\dot x(t) = g(t)$ exists except at a finite number of points. In addition, at the points $\hat t_i$, $i = 1, 2, \ldots, p-1$, the right-hand and left-hand derivatives exist and are given by
$$\frac{d^+ x(\hat t_i)}{dt} = \dot x^+(\hat t_i) = g(\hat t_i^+) = \dot x(\hat t_i^+) \tag{2.35}$$
and
$$\frac{d^- x(\hat t_i)}{dt} = \dot x^-(\hat t_i) = g(\hat t_i^-) = \dot x(\hat t_i^-), \tag{2.36}$$
respectively. Moreover, at the endpoints $t_0$ and $t_1$ it follows that
$$\frac{d^+ x(t_0)}{dt} = \dot x^+(t_0) = g(t_0^+) = \dot x(t_0^+) \tag{2.37}$$
and
$$\frac{d^- x(t_1)}{dt} = \dot x^-(t_1) = g(t_1^-) = \dot x(t_1^-), \tag{2.38}$$
respectively.

Figure 2.17: A Piecewise Smooth Function

Definition 2.9 If $x(\cdot)$ is piecewise smooth on $[t_0,t_1]$ and $t_0 < \bar t < t_1$ is such that $\dot x(\bar t^+) \ne \dot x(\bar t^-)$, then we say that $x(\cdot)$ has a corner at $\bar t$. From the remarks above, it follows that $\dot x(\bar t^+) = \lim_{t \to \bar t^+} \dot x(t) = g(\bar t^+)$ and $\dot x(\bar t^-) = \lim_{t \to \bar t^-} \dot x(t) = g(\bar t^-)$ always exist and are finite, since $\dot x(t) = g(t)$ except at a finite number of points and $g(\cdot)$ is piecewise continuous.

Before discussing specific spaces of functions, it is worthwhile to discuss what one means when we say that two functions are “equal” on a fixed interval $[t_0,t_1]$. For example, the functions $x(\cdot)$ and $z(\cdot)$ plotted in Figure 2.18 below are not equal at each point. However, for all “practical purposes” (like integration) they are essentially the same function. Their values are the same except at a finite number of points. If $x : [t_0,t_1] \to \mathbb{R}$ and $z : [t_0,t_1] \to \mathbb{R}$ are two functions such that $x(t) = z(t)$ except at a finite number of points, we shall write
$$x(\cdot) = z(\cdot) \ \text{e.f.} \tag{2.39}$$
and, unless otherwise noted, we will rarely distinguish between $x(\cdot)$ and $z(\cdot)$. The functions $x(t)$ and $z(t)$ in Figure 2.18 are equal e.f.

We denote the space of all real-valued piecewise continuous functions defined on $[t_0,t_1]$ by $PWC(t_0,t_1)$ and the space of all real-valued piecewise smooth functions defined on $[t_0,t_1]$ by $PWS(t_0,t_1)$.

Figure 2.18: The Functions $x(\cdot)$ and $z(\cdot)$ are Equal e.f.

Remark 2.10 Important Remark on Notation: Recall that each $z(\cdot) \in PWS(t_0,t_1)$ is continuous. Therefore, if $x(\cdot) \in PWS(t_0,t_1)$ and $z(\cdot) \in PWS(t_0,t_1)$, then $x(\cdot) = z(\cdot)$ e.f. if and only if $x(t) = z(t)$ for all $t \in [t_0,t_1]$. Also note that if $x(\cdot) \in PWC(t_0,t_1)$, $z(\cdot) \in PWS(t_0,t_1)$ and $x(\cdot) = z(\cdot)$ e.f., then the left and right limits of $x(\cdot)$ are equal at all points of $(t_0,t_1)$. In particular,
$$x(t^+) = z(t^+) = z(t^-) = x(t^-).$$
Clearly, $z(\cdot)$ is the only continuous function satisfying $x(\cdot) = z(\cdot)$ e.f., and since $x(t^+) = z(t^+) = z(t^-) = x(t^-)$ we shall make no distinction between $x(\cdot)$ and its continuous representation $z(\cdot)$. Thus, for future reference we shall always use the equivalent continuous representation $z(\cdot)$ of $x(\cdot)$ when it exists and not distinguish between $x(\cdot)$ and $z(\cdot)$.

If $[t_0,+\infty)$ is a semi-infinite interval, then we say that $x(\cdot) \in PWS(t_0,+\infty)$ if $x(\cdot) \in PWS(t_0,T)$ for all $t_0 < T < +\infty$. Thus, we denote the space of all real-valued piecewise continuous functions defined on $[t_0,+\infty)$ by $PWC(t_0,+\infty)$ and the space of all real-valued piecewise smooth functions defined on $[t_0,+\infty)$ by $PWS(t_0,+\infty)$.


2.3.1 Distances between Functions

We shall need to consider what is meant by two functions “being close” in some sense. Although there are many possible definitions of “the distance” between $x(\cdot)$ and $z(\cdot) \in PWS(t_0,t_1)$, we shall consider only two specific metrics.

Definition 2.10 If $x(\cdot)$ and $z(\cdot) \in PWS(t_0,t_1)$, then the $d_0$ distance between the two piecewise smooth functions $x(\cdot)$ and $z(\cdot)$ with domain $[t_0,t_1]$ is defined by
$$d_0(x(\cdot), z(\cdot)) \triangleq \sup_{t_0 \le t \le t_1} |x(t) - z(t)|. \tag{2.40}$$
In this case we can define a norm on $PWS(t_0,t_1)$ by
$$\|x(\cdot)\|_0 = \sup_{t_0 \le t \le t_1} |x(t)|,$$
and note that
$$d_0(x(\cdot), z(\cdot)) = \|x(\cdot) - z(\cdot)\|_0.$$

Figure 2.19: A $U_0(\hat x(\cdot), \delta)$-neighborhood of $\hat x(\cdot)$

Definition 2.11 If $\hat x(\cdot) \in PWS(t_0,t_1)$ and $\delta > 0$, the $U_0(\hat x(\cdot), \delta)$-neighborhood (or strong neighborhood) of $\hat x(\cdot)$ is defined to be the open ball
$$U_0(\hat x(\cdot), \delta) = \{ x(\cdot) \in PWS(t_0,t_1) : d_0(x(\cdot), \hat x(\cdot)) < \delta \}.$$
It is easy to visualize what such neighborhoods look like. Given $\hat x(\cdot)$ and $\delta > 0$, the $U_0(\hat x(\cdot), \delta)$-neighborhood of $\hat x(\cdot)$ is the set of all $x(\cdot) \in PWS(t_0,t_1)$ with graphs in a tube of radius $\delta$ about the graph of $\hat x(\cdot)$ (see Figure 2.19).

As we see from above, $x(\cdot)$ and $z(\cdot) \in PWS(t_0,t_1)$ are “close” in the $d_0$ metric if their graphs are close. However, the derivatives can be greatly different. For example, the two functions shown in Figure 2.20 below have very different derivatives.

In order to “fix this problem” we need a different metric. Recall that if $x(\cdot)$ and $z(\cdot) \in PWS(t_0,t_1)$, then their derivatives $\dot x(\cdot)$ and $\dot z(\cdot)$ are PWC on $I$. In particular, it follows that:

• $\dot x(\cdot)$ and $\dot z(\cdot)$ are bounded on $I$.

• $\dot x(\bar t^+)$ and $\dot z(\bar t^+)$ exist (are finite) on $[t_0,t_1)$.

• $\dot x(\bar t^-)$ and $\dot z(\bar t^-)$ exist (are finite) on $(t_0,t_1]$.

• There is a (finite) partition of $[t_0,t_1]$, say $t_0 = \hat t_0 < \hat t_1 < \hat t_2 < \cdots < \hat t_{p-1} < \hat t_p = t_1$, such that both $\dot x(t)$ and $\dot z(t)$ exist and are continuous (and bounded) on each open subinterval $(\hat t_{i-1}, \hat t_i)$.

We can now define a weaker notion of distance.

Definition 2.12 The $d_1$ distance between two piecewise smooth functions $x(\cdot)$ and $z(\cdot)$ with domain $[t_0,t_1]$ is defined by
$$d_1(x(\cdot), z(\cdot)) = \sup\{ |x(t) - z(t)| : t_0 \le t \le t_1 \} + \sup\{ |\dot x(t) - \dot z(t)| : t_0 \le t \le t_1,\ t \ne \hat t_i \}. \tag{2.41}$$

Definition 2.13 If $\hat x(\cdot) \in PWS(t_0,t_1)$ and $\delta > 0$, the $U_1(\hat x(\cdot), \delta)$-neighborhood (or weak neighborhood) of $\hat x(\cdot)$ is defined to be the open ball
$$U_1(\hat x(\cdot), \delta) = \{ x(\cdot) \in PWS(t_0,t_1) : d_1(x(\cdot), \hat x(\cdot)) < \delta \}.$$

Figure 2.20: A Non-smooth Function in a $U_0(\hat x(\cdot), \delta)$-neighborhood of $\hat x(\cdot)$

Remark 2.11 Note that the $d_1$ distance between $x(\cdot)$ and $z(\cdot)$ is given by
$$d_1(x(\cdot), z(\cdot)) = d_0(x(\cdot), z(\cdot)) + \sup\{ |\dot x(t) - \dot z(t)| : t_0 \le t \le t_1,\ t \ne \hat t_i \}. \tag{2.42}$$
If $x(\cdot)$ and $z(\cdot) \in PWS(t_0,t_1)$ and $d_1(x(\cdot), z(\cdot)) = 0$, then $x(t) = z(t)$ for all $t \in [t_0,t_1]$ and $\dot x(t) = \dot z(t)$ e.f. Also, since
$$d_0(x(\cdot), z(\cdot)) \le d_1(x(\cdot), z(\cdot)),$$
it follows that if $d_1(x(\cdot), z(\cdot)) < \delta$, then $d_0(x(\cdot), z(\cdot)) < \delta$. It is important to note that this inequality implies that
$$U_1(\hat x(\cdot), \delta) \subset U_0(\hat x(\cdot), \delta) \subset PWS(t_0,t_1), \tag{2.43}$$
so that the $U_1(\hat x(\cdot), \delta)$-neighborhood is smaller than the $U_0(\hat x(\cdot), \delta)$-neighborhood (see Figure 2.21).

Figure 2.21: A Comparison of $U_0(\hat x(\cdot), \delta)$- and $U_1(\hat x(\cdot), \delta)$-neighborhoods of $\hat x(\cdot)$

Two functions $x(\cdot)$ and $z(\cdot)$ are “close” in the $d_0$ sense if their graphs are within $\delta$ of each other at all points. It is helpful (although not quite accurate) to think of two functions $x(\cdot)$ and $z(\cdot)$ as “close” in the $d_1$ sense if their graphs and the graphs of $\dot x(\cdot)$ and $\dot z(\cdot)$ are within $\delta$ of each other except at a finite number of points.
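This distinction is easy to see numerically. The sketch below (our illustration; the two functions are chosen for the purpose and do not come from the text) approximates $d_0$ and $d_1$ on a fine grid for $x(t) = t$ and $z(t) = t + 0.01\sin(50t)$:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 200_001)
x = t                            # reference function
z = t + 0.01 * np.sin(50.0 * t)  # graph stays within 0.01 of x

dx = np.gradient(x, t)           # numerical derivatives on the grid
dz = np.gradient(z, t)

d0 = np.max(np.abs(x - z))
d1 = d0 + np.max(np.abs(dx - dz))
print(f"d0 ~ {d0:.4f}")          # about 0.01: the graphs are close
print(f"d1 ~ {d1:.4f}")          # about 0.51: the derivatives are not
```

So the pair is close in the $d_0$ sense but far apart in the $d_1$ sense.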

Remark 2.12 Another Remark on Notation: If $x(\cdot)$ is a piecewise smooth function, then $\dot x(t)$ is defined except perhaps at a finite number of points, say $t_0 = \hat t_0 < \hat t_1 < \hat t_2 < \cdots < \hat t_{p-1} < \hat t_p = t_1$. However, at these points the left and right derivatives exist and are given by $\dot x(\hat t_i^-)$ and $\dot x(\hat t_i^+)$, respectively. In order to keep notation at a minimum, when we use $\dot x(t)$ in any expression, we mean that this expression holds for all points $t$ where $\dot x(t)$ exists. Also, recall that at corners $\bar t$ where $\dot x(\bar t)$ does not exist, both $\dot x(\bar t^-)$ and $\dot x(\bar t^+)$ exist and are finite. We shall treat conditions at corners as they arise in the development of the material.


2.3.2 An Introduction to the First Variation

In order to extend Theorem 2.1 to infinite dimensional optimization problems characteristic of those in the calculus of variations and optimal control, we need to introduce the concept of a “variation” of a functional. Although this is a very general concept, we focus on functionals defined on the space $PWS(t_0,t_1)$ and extend the definition in a later section.

Assume that $J : D(J) \subseteq PWS(t_0,t_1) \to \mathbb{R}$ is a real-valued functional defined on a subset of $PWS(t_0,t_1)$ and let $x(\cdot) \in D(J)$ be given. We say $\eta(\cdot) \in PWS(t_0,t_1)$ is an admissible direction for $x(\cdot)$ if there is an interval $(-\hat\varepsilon, +\hat\varepsilon)$ with $\hat\varepsilon > 0$ such that $x(\cdot) + \varepsilon\eta(\cdot) \in D(J) \subseteq PWS(t_0,t_1)$ for all $\varepsilon \in (-\hat\varepsilon, +\hat\varepsilon)$. If $\eta(\cdot) \in PWS(t_0,t_1)$ is an admissible direction, then
$$F(\varepsilon) = J(x(\cdot) + \varepsilon\eta(\cdot)) \tag{2.44}$$
defines a real-valued function of the real variable $\varepsilon$ on the interval $(-\hat\varepsilon, +\hat\varepsilon)$. In this case we have the following definition.

Definition 2.14 If $x(\cdot) \in PWS(t_0,t_1)$, $\eta(\cdot) \in PWS(t_0,t_1)$ is an admissible direction for $x(\cdot)$ and $F(\cdot) : (-\hat\varepsilon, +\hat\varepsilon) \to \mathbb{R}$ has a derivative at $\varepsilon = 0$, then the first variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot)$ is denoted by $\delta J(x(\cdot); \eta(\cdot))$ and is defined by
$$\delta J(x(\cdot); \eta(\cdot)) = \frac{d}{d\varepsilon} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}\left[ J(x(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0}. \tag{2.45}$$

Likewise, we say $\eta(\cdot) \in PWS(t_0,t_1)$ is a right admissible direction for $x(\cdot)$ if there is an $\hat\varepsilon > 0$ such that $x(\cdot) + \varepsilon\eta(\cdot) \in D(J) \subseteq PWS(t_0,t_1)$ for all $\varepsilon \in [0, +\hat\varepsilon)$. If $\eta(\cdot) \in PWS(t_0,t_1)$ is a right admissible direction, then $F(\varepsilon) = J(x(\cdot) + \varepsilon\eta(\cdot))$ defines a real-valued function of the real variable $\varepsilon$ on the interval $[0, +\hat\varepsilon)$. If the right-hand derivative
$$\frac{d^+}{d\varepsilon} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d^+}{d\varepsilon}\left[ J(x(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0} \tag{2.46}$$
exists, then
$$\delta^+ J(x(\cdot); \eta(\cdot)) = \frac{d^+}{d\varepsilon} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d^+}{d\varepsilon}\left[ J(x(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0}$$
is the right-hand first variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot)$. We will abuse notation and write
$$\delta J(x(\cdot); \eta(\cdot)) = \delta^+ J(x(\cdot); \eta(\cdot))$$
even when $\eta(\cdot) \in PWS(t_0,t_1)$ is only a right admissible direction for $x(\cdot)$.

If $x(\cdot) \in PWS(t_0,t_1)$, $\eta(\cdot) \in PWS(t_0,t_1)$ is an admissible direction for $x(\cdot)$ and $F(\cdot) : (-\hat\varepsilon, +\hat\varepsilon) \to \mathbb{R}$ has a second derivative at $\varepsilon = 0$, then the second variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot)$ is denoted by $\delta^2 J(x(\cdot); \eta(\cdot))$ and is defined by
$$\delta^2 J(x(\cdot); \eta(\cdot)) = \frac{d^2}{d\varepsilon^2} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d^2}{d\varepsilon^2}\left[ J(x(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0}. \tag{2.47}$$
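Since $F(\varepsilon) = J(x(\cdot) + \varepsilon\eta(\cdot))$ is an ordinary function of one real variable, these variations can be computed symbolically. A minimal sympy sketch, using the sample functional $J(x(\cdot)) = \int_0^1 \{[\dot x(s)]^2 + [x(s)]^2\}\,ds$ with $x(t) = t$ and $\eta(t) = t(1-t)$ (our choices, not from the text):

```python
import sympy as sp

t, eps = sp.symbols('t epsilon', real=True)

x = t                    # nominal function x(t) = t
eta = t * (1 - t)        # direction eta(t) = t(1 - t)
xe = x + eps * eta       # the variation x + eps*eta

# F(eps) = J(x + eps*eta) for J(x) = int_0^1 (x'^2 + x^2) dt
F = sp.integrate(sp.diff(xe, t)**2 + xe**2, (t, 0, 1))

dJ  = sp.diff(F, eps).subs(eps, 0)       # first variation, per (2.45)
d2J = sp.diff(F, eps, 2).subs(eps, 0)    # second variation, per (2.47)
print(dJ, d2J)                           # prints 1/6 and 11/15
```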

2.4 Mathematical Formulation of Problems

In this section we return to some of the motivating problems outlined at the beginning of this chapter. Now that we have carefully defined specific classes of functions, it is possible to develop precise mathematical formulations of these problems.

2.4.1 The Brachistochrone Problem

We consider the problem where the particle starts at $(0,0)$ and slides down a wire to the point $[a\ b]^T$, and there is no friction, as again illustrated in Figure 2.22. Let $g$ be the acceleration due to gravity and $m$ be the mass of the bead. The velocity of the bead when it is located at $[x\ y]^T$ is denoted by $v(x)$. The kinetic energy of the bead is given by
$$T = \frac{1}{2} m v^2(x)$$

Figure 2.22: Mathematical Formulation of the Brachistochrone Problem

and the potential energy is given by
$$V = mg\left(-y(x)\right).$$
The total energy of the bead is $T + V$, and conservation of energy requires that $T + V$ remain constant at all times. Since at $x = 0$ the kinetic energy is $0$ (i.e. $v(0) = 0$) and the potential energy is $0$ (i.e. $y(0) = 0$), we have that
$$T = -V,$$
or equivalently,
$$\frac{1}{2} m v^2 = mgy.$$
Now, we use some elementary calculus. The length of the path described by the graph of $y(x)$ is given by
$$s(x) = \int_0^x \sqrt{1 + [y'(\tau)]^2}\, d\tau.$$

The velocity is given by
$$v(x) = \frac{d}{dt} s(x) = \frac{d}{dx} s(x)\,\frac{dx}{dt} = \sqrt{1 + [y'(x)]^2}\,\frac{dx}{dt}.$$
However, $v(x) = \sqrt{2gy(x)}$ and hence we obtain
$$dt = \frac{\sqrt{1 + [y'(x)]^2}}{\sqrt{2gy(x)}}\, dx.$$
Integration gives the time of travel from $[0\ 0]^T$ to $[x\ y(x)]^T$ as the integral
$$t(x) = \int_0^x \frac{\sqrt{1 + [y'(\xi)]^2}}{\sqrt{2gy(\xi)}}\, d\xi.$$
Hence, the time required for the bead to slide from $[0\ 0]^T$ to $[a\ b]^T$ is given by
$$J(y(\cdot)) = t(a) = \int_0^a \frac{\sqrt{1 + [y'(x)]^2}}{\sqrt{2gy(x)}}\, dx. \tag{2.48}$$
Thus, we have derived a formula (2.48) for the time it takes the bead to slide down a given curve $y(\cdot)$. Observe that the time (2.48) depends only on the function $y(\cdot)$ and is independent of the mass of the bead. We can now state the mathematical formulation of the brachistochrone problem.

Among the set of all smooth functions $y : [0,a] \to \mathbb{R}$ satisfying
$$y(0) = 0, \quad y(a) = b,$$
find the function that minimizes
$$J(y(\cdot)) = \int_0^a \frac{\sqrt{1 + [y'(x)]^2}}{\sqrt{2gy(x)}}\, dx. \tag{2.49}$$

Note that in the derivation of (2.49) we assumed that $y(x)$ is smooth (i.e., that $y(x)$ and $y'(x)$ are continuous). Although we never explicitly assumed that $y(x) \ge 0$, this is needed to make the terms under the square root sign non-negative. Also, observe that at $x = 0$, $y(0) = 0$, so that the integrand is singular. All of these “little details” are not so important for the brachistochrone problem, but can become important in other cases.
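As a numerical aside (not in the text), (2.48) can be evaluated by quadrature for any candidate curve. The sketch below times the straight line $y(x) = (b/a)x$ with assumed values $a = b = 1$ and $g = 9.81$; the inverse square-root singularity at $x = 0$ is integrable, and scipy's adaptive quadrature handles it:

```python
import numpy as np
from scipy.integrate import quad

g, a, b = 9.81, 1.0, 1.0

def travel_time(y, yp):
    """Evaluate J(y) of (2.49) by numerical quadrature (y measured downward)."""
    integrand = lambda x: np.sqrt(1.0 + yp(x)**2) / np.sqrt(2.0 * g * y(x))
    value, _ = quad(integrand, 0.0, a)
    return value

# straight line from (0, 0) to (a, b): y(x) = (b/a) x
print(travel_time(lambda x: (b / a) * x, lambda x: b / a))  # about 0.64 s here
```

Comparing such values for different curves is exactly the minimization the brachistochrone problem formalizes.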

2.4.2 The Minimal Surface of Revolution Problem

Consider the problem of generating a surface by revolving a continuous curve about the $x$-axis. Suppose we are given two points $P_0 = [a_0\ b_0]^T$ and $P_1 = [a_1\ b_1]^T$ in the plane. These two points are joined by a curve which is the graph of the function $y(\cdot)$. We assume that $y(\cdot)$ is smooth and we generate a surface by rotating the curve about the $x$-axis (see Figure 2.23 below). What shape should $y(\cdot)$ be so that this surface of revolution has minimum surface area?

Figure 2.23: Minimal Surface of Revolution Problem

Again, we return to elementary calculus and recall the formula for surface area. If $y(\cdot) \ge 0$, then the surface area is given by
$$S = 2\pi \int_{a_0}^{a_1} y(x)\sqrt{1 + [y'(x)]^2}\, dx.$$
We are seeking, among the set of all smooth functions $y : [a_0, a_1] \to \mathbb{R}$ satisfying
$$y(a_0) = b_0, \quad y(a_1) = b_1,$$
a function $y^*(\cdot)$ that minimizes
$$J(y(\cdot)) = S = 2\pi \int_{a_0}^{a_1} y(x)\sqrt{1 + [y'(x)]^2}\, dx. \tag{2.50}$$

At this point you should have noticed that the brachistochrone problem and the problem of minimal surface of revolution are “almost” identical in form. Both problems involve finding a function, with fixed endpoints, that minimizes an integral among all sufficiently smooth functions that pass through these same endpoints.

2.4.3 The River Crossing Problem

Consider the river crossing problem discussed in Section 2.1.2 and as illustrated in Figure 2.2. If the path of the boat is represented by the graph of the function $x(t)$, then it can be shown that the time it takes for the boat to cross the river is given by
$$\text{time} = \int_0^1 \frac{\sqrt{c^2(1 + [\dot x(s)]^2) - [v(s)]^2} - v(s)\dot x(s)}{c^2 - [v(s)]^2}\, ds, \tag{2.51}$$
where $c$ is the constant speed of the boat in still water and we assume that $c^2 - [v(t)]^2 > 0$. The point of departure is given by $t_1 = 0$, $x_1 = 0$, so that
$$x(0) = 0. \tag{2.52}$$

Thus, the river crossing problem is to find a smooth function $x^* : [0,1] \to \mathbb{R}$ that minimizes
$$J(x(\cdot)) = \int_0^1 \frac{\sqrt{c^2(1 + [\dot x(s)]^2) - [v(s)]^2} - v(s)\dot x(s)}{c^2 - [v(s)]^2}\, ds \tag{2.53}$$
among all smooth functions satisfying (2.52). Observe that, unlike the brachistochrone problem and the problem of minimal surface of revolution, there is no specified condition at $t = 1$. Thus, this is a simple example of the classical free-endpoint problem.
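Purely to illustrate the functional (test values are ours, not from the text), the sketch below evaluates (2.53) for a constant current $v(s) \equiv 1$ and boat speed $c = 2$; for the straight-across path $x(s) \equiv 0$ the integral reduces to $1/\sqrt{c^2 - v^2}$, which the quadrature reproduces:

```python
import numpy as np
from scipy.integrate import quad

c = 2.0                # boat speed in still water (assumed test value)
v = lambda s: 1.0      # constant current with |v(s)| < c

def crossing_time(x_dot):
    """Evaluate J(x) of (2.53) for a given slope function x_dot(s)."""
    def integrand(s):
        xd = x_dot(s)
        return (np.sqrt(c**2 * (1.0 + xd**2) - v(s)**2) - v(s) * xd) / (c**2 - v(s)**2)
    value, _ = quad(integrand, 0.0, 1.0)
    return value

print(crossing_time(lambda s: 0.0))   # numerical value
print(1.0 / np.sqrt(c**2 - 1.0**2))   # closed form 1/sqrt(3) for this path
```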

2.4.4 The Rocket Sled Problem

Consider the rocket sled described in Section 2.1.4 above. We assume the sled is on a frictionless track, the sled is of mass $m$, it is controlled by two rocket engines thrusting in opposite directions, and the thrusting force of the rockets is bounded by a maximum of 1 so that $|u(t)| \le 1$. Let $x(t)$ denote the displacement of the sled from the reference point $R$ at time $t$. Given that at time $t = 0$ the sled is at an initial position $x_0$ with initial velocity $v_0$, we wish to find a thrust force action that will move the sled to position $R$ and stop it there, and we want to accomplish this transfer in minimal time. Newton's second law may be written as
$$m\ddot x(t) = u(t),$$
where $x(t)$ is the displacement from the base station at time $t$, $m$ is the mass of the sled and $u(t)$ is the thrust. We assume that at time $t = 0$ the initial data is given by the initial displacement
$$x(0) = x_0,$$
and initial velocity
$$\dot x(0) = v_0.$$
The time optimal control problem is to find a control $u^*(t)$ that transfers the system from $[x_0\ v_0]^T$ to $[0\ 0]^T$ in minimal time, given that $|u(t)| \le 1$.

To formulate the problem we write this as a first order system by defining
$$x_1(t) = x(t) = \text{position at time } t, \qquad x_2(t) = \dot x(t) = \text{velocity at time } t,$$
with initial data
$$x_1(0) = x_{1,0} \triangleq x_0, \qquad x_2(0) = x_{2,0} \triangleq v_0.$$
The system becomes
$$\frac{d}{dt}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u(t), \tag{2.54}$$
or
$$\dot x(t) = Ax(t) + Bu(t), \quad x(0) = x_0, \tag{2.55}$$
where the matrices $A$ and $B$ are defined by
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 \\ 1/m \end{bmatrix},$$
respectively. Here, $x_0 = [x_{1,0}\ x_{2,0}]^T$ and $x(t) = [x_1(t)\ x_2(t)]^T$ is the trajectory in the plane $\mathbb{R}^2$. Given a control $u(\cdot)$ we let $x(t; u(\cdot))$ denote the solution of the initial value problem (2.55) with control input $u(\cdot)$.

We say that a control $u(t)$ steers $x_0 = [x_{1,0}\ x_{2,0}]^T$ to a state $x_1 = [x_{1,1}\ x_{2,1}]^T$ in time $t_1$ if there is a solution $x(t; u(\cdot))$ to (2.55) satisfying
$$x(0; u(\cdot)) = x_0 \tag{2.56}$$
and
$$x(t_1; u(\cdot)) = x_1 \tag{2.57}$$
for some finite time $t_1 > 0$. The time optimal problem is to find the control $u^*(\cdot)$ such that
$$|u^*(t)| \le 1,$$

and $u^*(\cdot)$ steers $x_0 = [x_{1,0}\ x_{2,0}]^T$ to $x_1 = [x_{1,1}\ x_{2,1}]^T$ in minimal time.

Observe that here the final time $t_1$ is not fixed and the cost function is very simple. In particular, the cost is the final time, and this can be represented by the integral
$$J(x(\cdot), u(\cdot)) = t_1 = \int_0^{t_1} 1\, ds. \tag{2.58}$$
This time optimal control problem can be “solved” by several means, including a Lagrange Multiplier method. Also, since this particular problem is rather simple, one can use a geometric analysis of the problem and the optimal controller can be synthesized by means of a switching locus. We will present this solution in Chapter 9 because the basic ideas are useful in understanding the maximum principle and how one might derive this necessary condition.
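To get a feel for the dynamics (2.54), one can simply integrate the double integrator under a guessed control. In the sketch below the bang-bang input, initial data and step size are all our assumptions; this is not the optimal switching law (that synthesis comes in Chapter 9):

```python
import numpy as np

m, dt, T = 1.0, 1e-3, 4.0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([0.0, 1.0 / m])

def u(t):
    """A hand-picked bang-bang guess: full thrust one way, then the other."""
    return -1.0 if t < 2.0 else 1.0

x = np.array([1.0, 1.0])                  # x0 = 1, v0 = 1
for k in range(int(T / dt)):
    x = x + dt * (A @ x + B * u(k * dt))  # explicit Euler step of (2.55)

print("state at time T:", x)              # how close did this guess get to [0 0]^T?
```

Trying different switching times shows how sensitive the terminal state is to the control, which is precisely why a systematic synthesis (the switching locus) is needed.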

2.4.5 The Finite Element Method

In its purest form the finite element method is intimately connected to variational calculus, optimization theory and Galerkin approximation methods. The books by Strang and Fix (see [170, 171]) provide an excellent introduction to the finite element method. The references [44], [96], [98], and [179] provide examples where the finite element method has been applied to a variety of areas in science and engineering.

The finite element method and its extensions are among the most powerful computational tools for solving complex ordinary and partial differential equations. We shall illustrate how one formulates the finite element approximation for the simple two-point boundary value problem described by (2.4)-(2.5) above. Later we shall see how the calculus of variations provides the mathematical framework and theory necessary to analyze the convergence of the method.

Here we focus on the application of the finite element method to a simple example defined by the two-point boundary value problem
$$-\ddot x(t) + x(t) = f(t), \quad 0 < t < 1, \tag{2.59}$$
subject to the Dirichlet boundary conditions
$$x(0) = 0, \quad x(1) = 0. \tag{2.60}$$

A strong (or classical) solution is a twice differentiable function $x(\cdot)$ that satisfies (2.59)-(2.60) at every value $0 \le t \le 1$. We are interested in developing a numerical algorithm for approximating solutions.

As noted above, in order to construct the finite element approximation one divides the interval $(0,1)$ into $N + 1$ subintervals (called elements) with nodes $0 = t_0 < t_1 < t_2 < \cdots < t_{N-1} < t_N < t_{N+1} = 1$ and constructs the approximation $x^N(\cdot)$ so that it is continuous on all of $[0,1]$ and linear between the nodes. Since continuous piecewise linear approximating functions $x^N(\cdot)$ are not typically differentiable at the nodes, it is not possible to insert this approximation directly into equation (2.59). In particular, the piecewise smooth function $x^N(\cdot)$ has only a piecewise continuous derivative $\dot x^N(\cdot)$ and hence $\ddot x^N(\cdot)$ does not exist. In order to deal with this lack of smoothness, we must define the concept of weak solutions.

Before introducing this approximation, one derives a weak (or variational) form of this equation. If $x(\cdot)$ is a solution to (2.59)-(2.60), then multiplying both sides of (2.59) by any function $\eta(\cdot)$ yields
$$[-\ddot x(t) + x(t)]\,\eta(t) = f(t)\,\eta(t), \quad 0 < t < 1. \tag{2.61}$$

If one assumes that $\eta(\cdot)$ is piecewise continuous, then integrating both sides of (2.61) implies that
$$\int_0^1 [-\ddot x(t) + x(t)]\,\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt,$$
or equivalently,
$$\int_0^1 -\ddot x(t)\eta(t)\, dt + \int_0^1 x(t)\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt, \tag{2.62}$$

for any function $\eta(\cdot) \in PWC(0,1)$. If $\eta(\cdot)$ is piecewise smooth, then one can use integration by parts on the first term in (2.62), which implies that
$$-\dot x(t)\eta(t)\Big|_{t=0}^{t=1} + \int_0^1 \dot x(t)\dot\eta(t)\, dt + \int_0^1 x(t)\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt, \tag{2.63}$$
for any function $\eta(\cdot) \in PWS(0,1)$. If in addition $\eta(\cdot) \in PWS(0,1)$ satisfies the boundary conditions (2.60), then the boundary terms in (2.63) are zero, so that
$$\int_0^1 \dot x(t)\dot\eta(t)\, dt + \int_0^1 x(t)\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt, \tag{2.64}$$
for all $\eta(\cdot) \in PWS(0,1)$ satisfying
$$\eta(0) = 0, \quad \eta(1) = 0. \tag{2.65}$$
What we have shown is that if $x(\cdot)$ is a solution of the two-point boundary value problem (2.59)-(2.60), then $x(\cdot)$ satisfies (2.64) for all $\eta(\cdot) \in PWS(0,1)$ that satisfy the Dirichlet boundary conditions (2.65). Observe that the equation (2.64) does not involve the second derivative of the function $x(\cdot)$, and equation (2.64) makes sense as long as $x(\cdot)$ is just piecewise smooth on $(0,1)$.

Let $PWS_0(0,1)$ denote the space of all functions $z(\cdot) \in PWS(0,1)$ satisfying $z(0) = 0$ and $z(1) = 0$. In particular,
$$PWS_0(0,1) = \{ z(\cdot) \in PWS(0,1) : z(0) = 0,\ z(1) = 0 \}. \tag{2.66}$$

We now define what we mean by a weak solution of the boundary value problem (2.59)-(2.60).

Definition 2.15 (Weak Solution) We say that the function $x(\cdot)$ is a weak solution of the two-point boundary value problem (2.59)-(2.60) if:
(1) $x(\cdot) \in PWS_0(0,1)$,
(2) $x(\cdot)$ satisfies (2.64) for all $\eta(\cdot) \in PWS_0(0,1)$.

Observe that we have shown that a solution (in the strong sense) to the two-point boundary value problem (2.59)-(2.60) is always a weak solution. We shall show later that the Fundamental Lemma of the Calculus of Variations can be used to prove that a weak solution is also a strong solution. Therefore, for this one dimensional problem, weak and strong solutions are the same. Now we return to the issue of approximating this solution.

We begin by dividing the interval $[0,1]$ into $N+1$ subintervals (called elements) of length $\Delta = 1/(N+1)$ with nodes $0 = t_0 < t_1 < t_2 < \cdots < t_{N-1} < t_N < t_{N+1} = 1$, where $t_i = i \cdot \Delta$ for $i = 0, 1, 2, \ldots, N, N+1$. For $i = 0, 1, 2, \ldots, N, N+1$, define the hat functions $h_i(\cdot)$ on $[0,1]$ by
$$h_0(t) = \begin{cases} (t_1 - t)/\Delta, & 0 \le t \le t_1, \\ 0, & t_1 \le t \le 1, \end{cases} \qquad h_{N+1}(t) = \begin{cases} (t - t_N)/\Delta, & t_N \le t \le 1, \\ 0, & 0 \le t \le t_N, \end{cases} \tag{2.67}$$
$$h_i(t) = \begin{cases} (t - t_{i-1})/\Delta, & t_{i-1} \le t \le t_i, \\ (t_{i+1} - t)/\Delta, & t_i \le t \le t_{i+1}, \\ 0, & t \notin (t_{i-1}, t_{i+1}), \end{cases} \quad \text{for } i = 1, 2, \ldots, N.$$

Plots of these functions are shown in Figures 2.24, 2.25 and 2.26 below.

These hat functions provide a basis for all continuous piecewise linear functions with (possible) corners at the nodes $t_1 < t_2 < \cdots < t_{N-1} < t_N$. Therefore, any continuous piecewise linear function $y^N(\cdot)$ with corners only at these nodes can be written as
$$y^N(t) = \sum_{i=0}^{N+1} y_i h_i(t), \tag{2.68}$$
where the numbers $y_i$ determine the value of $y^N(t)$ at $t_i$. In particular, $y^N(t_i) = y_i$, and in order to form the function $y^N(\cdot)$ one must provide the coefficients $y_i$ for $i = 0, 1, 2, \ldots, N, N+1$. Moreover, if $y^N(t)$ is assumed to satisfy the Dirichlet boundary conditions (2.60), then $y^N(t_0) = y^N(0) = y_0 = 0$ and $y^N(t_{N+1}) = y^N(1) = y_{N+1} = 0$, and $y^N(\cdot)$ can be written as
$$y^N(t) = \sum_{i=1}^{N} y_i h_i(t). \tag{2.69}$$
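A short Python sketch (helper names are ours) that evaluates the hat functions of (2.67) and the expansion (2.69); note that $y^N$ interpolates its coefficients at the nodes:

```python
import numpy as np

def hat(i, t, N):
    """Hat function h_i(t) of (2.67) with Delta = 1/(N+1): a tent of height 1 at t_i."""
    d = 1.0 / (N + 1)
    return np.maximum(0.0, 1.0 - np.abs(t - i * d) / d)

def expand(coeffs, t, N):
    """y^N(t) = sum_{i=1}^N y_i h_i(t), as in (2.69); coeffs = [y_1, ..., y_N]."""
    return sum(y * hat(i + 1, t, N) for i, y in enumerate(coeffs))

N = 3
nodes = np.linspace(0.0, 1.0, N + 2)        # t_0, ..., t_{N+1}
print(expand([1.0, 2.0, 0.5], nodes, N))    # [0, 1, 2, 0.5, 0]
```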

Figure 2.24: The Hat Function $h_0(\cdot)$

Figure 2.25: The Hat Function $h_i(\cdot)$

Figure 2.26: The Hat Function $h_{N+1}(\cdot)$

If we seek an approximate continuous piecewise linear solution $x^N(\cdot)$ to the weak form of the two-point boundary value problem (2.59)-(2.60), then $x^N(\cdot)$ must have the form
$$x^N(t) = \sum_{i=1}^{N} x_i h_i(t) \tag{2.70}$$

and we need only to “compute” the values $x_i$ for $i = 1, 2, \ldots, N$. We do this by substituting $x^N(t) = \sum_{i=1}^{N} x_i h_i(t)$ into the weak form of the equation given by (2.64). In particular, $x^N(t)$ is assumed to satisfy
$$\int_0^1 \dot x^N(t)\dot\eta(t)\, dt + \int_0^1 x^N(t)\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt, \tag{2.71}$$
for all $\eta(\cdot) \in PWS_0(0,1)$.

Substituting $x^N(t) = \sum_{i=1}^{N} x_i h_i(t)$ into the weak equation (2.71) yields
$$\int_0^1 \left( \sum_{i=1}^{N} x_i \dot h_i(t) \right)\dot\eta(t)\, dt + \int_0^1 \left( \sum_{i=1}^{N} x_i h_i(t) \right)\eta(t)\, dt = \int_0^1 f(t)\eta(t)\, dt,$$
for all $\eta(\cdot) \in PWS_0(0,1)$. This equation can be written as
$$\sum_{i=1}^{N} x_i \left( \int_0^1 \dot h_i(t)\dot\eta(t)\, dt \right) + \sum_{i=1}^{N} x_i \left( \int_0^1 h_i(t)\eta(t)\, dt \right) = \int_0^1 f(t)\eta(t)\, dt, \tag{2.72}$$

for all $\eta(\cdot) \in PWS_0(0,1)$. In order to use the variational equation to compute the coefficients $x_i$ for $i = 1, 2, \ldots, N$, we note that each basis function $h_i(\cdot)$ belongs to $PWS_0(0,1)$ for $i = 1, 2, \ldots, N$. Therefore, setting $\eta(\cdot) = h_j(\cdot) \in PWS_0(0,1)$ for each index $j = 1, 2, \ldots, N$ yields the $N$ equations
$$\sum_{i=1}^{N} x_i \left( \int_0^1 \dot h_i(t)\dot h_j(t)\, dt \right) + \sum_{i=1}^{N} x_i \left( \int_0^1 h_i(t)h_j(t)\, dt \right) = \int_0^1 f(t)h_j(t)\, dt. \tag{2.73}$$

Define the $N \times N$ mass matrix $M = M^N$ by
$$M = M^N = [m_{i,j}]_{i,j=1,2,\ldots,N},$$
where the entries $m_{i,j}$ of $M^N$ are given by the integrals
$$m_{i,j} = \int_0^1 h_j(t)h_i(t)\, dt.$$
Likewise, define the $N \times N$ stiffness matrix $K = K^N$ by
$$K = K^N = [k_{i,j}]_{i,j=1,2,\ldots,N},$$
where the entries $k_{i,j}$ of $K^N$ are given by the integrals
$$k_{i,j} = \int_0^1 \dot h_j(t)\dot h_i(t)\, dt.$$

Finally, let $f^N$ be the $N \times 1$ (column) vector defined by
$$f^N = [f_1\ f_2\ \cdots\ f_N]^T,$$
where the entries $f_j$ of $f^N$ are given by the integrals
$$f_j = \int_0^1 h_j(t)f(t)\, dt.$$

If $x$ is the solution vector
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$$
of (2.73), then $x$ satisfies the matrix system
$$K^N x + M^N x = f^N. \tag{2.74}$$
Defining
$$L^N = K^N + M^N$$
yields the algebraic equation
$$L^N x = f^N \tag{2.75}$$

which must be solved to find $x_i$ for $i = 1, 2, \ldots, N$. This approach provides a mechanism for finding the (finite element) approximation $x^N(t) = \sum_{i=1}^{N} x_i h_i(t)$. In particular, one solves the matrix equation (2.75) for $x = [x_1\ x_2\ \cdots\ x_N]^T$ and then constructs the piecewise smooth finite element approximate function $x^N(t) = \sum_{i=1}^{N} x_i h_i(t)$.


There are many issues that need to be addressed in order to prove that this method actually produces accurate approximations of the solution to the two-point boundary value problem (2.59)-(2.60). What one might expect (hope) is that as $N \to +\infty$ the functions $x^N(\cdot)$ converge to $x(\cdot)$ in some sense. We shall return to this issue later, but we note here that this process breaks down if the matrix $L^N$ is not invertible. In particular, one needs to prove that $L^N$ is non-singular.

For this problem one can show that the mass and stiffness matrices are given by
$$M^N = \frac{\Delta}{6}\begin{bmatrix} 4 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 4 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 4 & 1 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 4 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \cdots & 1 & 4 & 1 \\ 0 & 0 & \cdots & \cdots & 0 & 1 & 4 \end{bmatrix}$$
and
$$K^N = \frac{1}{\Delta}\begin{bmatrix} 2 & -1 & 0 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \cdots & -1 & 2 & -1 \\ 0 & 0 & \cdots & \cdots & 0 & -1 & 2 \end{bmatrix},$$
respectively. Thus, $L^N = K^N + M^N$ is given by
$$L^N = \begin{bmatrix} \frac{4\Delta}{6} + \frac{2}{\Delta} & \frac{\Delta}{6} - \frac{1}{\Delta} & 0 & \cdots & 0 \\ \frac{\Delta}{6} - \frac{1}{\Delta} & \frac{4\Delta}{6} + \frac{2}{\Delta} & \frac{\Delta}{6} - \frac{1}{\Delta} & \cdots & 0 \\ 0 & \frac{\Delta}{6} - \frac{1}{\Delta} & \frac{4\Delta}{6} + \frac{2}{\Delta} & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{\Delta}{6} - \frac{1}{\Delta} & \frac{4\Delta}{6} + \frac{2}{\Delta} \end{bmatrix}$$
and one can show that $L^N$ is a symmetric positive definite matrix. Thus, $L^N$ is non-singular (see [92]).

Figure 2.27: N = 3 Finite Element Approximation

Consider the case where the function $f(\cdot)$ is given by
$$f(t) = t^2 - t - 2.$$
It is easy to show (by direct substitution) that
$$x(t) = t(t-1)$$
is the solution to the boundary value problem (2.59)-(2.60). In Figures 2.27 and 2.28 below, we compare the finite element solutions for $N = 3$ and $N = 7$. Clearly, the approximating solutions converge very rapidly.
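The computation behind these figures can be reproduced in a few lines. The sketch below is ours: it assembles $M^N$ and $K^N$ from the tridiagonal formulas above, checks positive definiteness via a Cholesky factorization, approximates the load integrals by midpoint quadrature (rather than exactly), solves (2.75), and compares the nodal values with $x(t) = t(t-1)$:

```python
import numpy as np

def fem_solve(N, f, m=4000):
    """Solve L^N x = f^N of (2.75) for -x'' + x = f with x(0) = x(1) = 0."""
    d = 1.0 / (N + 1)                            # element length Delta
    e = np.ones(N - 1)
    M = (d / 6.0) * (4.0 * np.eye(N) + np.diag(e, 1) + np.diag(e, -1))
    K = (1.0 / d) * (2.0 * np.eye(N) - np.diag(e, 1) - np.diag(e, -1))
    L = K + M
    np.linalg.cholesky(L)                        # raises if L were not positive definite
    nodes = d * np.arange(1, N + 1)
    t = (np.arange(m) + 0.5) / m                 # midpoint quadrature grid on [0, 1]
    fN = np.array([np.sum(np.maximum(0.0, 1.0 - np.abs(t - tj) / d) * f(t)) / m
                   for tj in nodes])             # f_j ~ int_0^1 h_j(t) f(t) dt
    return nodes, np.linalg.solve(L, fN)

f = lambda t: t**2 - t - 2.0
nodes, xN = fem_solve(7, f)
print(np.max(np.abs(xN - nodes * (nodes - 1.0))))  # nodal error vs x(t) = t(t-1): small
```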

Although it might seem like the problem discussed above has nothing to do with classical problems such as the brachistochrone and minimal surface area problems, this is not true. In fact, we shall show later that finding solutions to the two-point boundary value problem (2.59)-(2.60) is equivalent to solving the following simplest problem in the calculus of variations. Find $x^*(\cdot) \in PWS(0,1)$ to minimize the integral
$$J(x(\cdot)) = \frac{1}{2}\int_0^1 \left\{ [\dot x(s)]^2 + [x(s)]^2 - 2x(s)f(s) \right\} ds, \tag{2.76}$$

Figure 2.28: N = 7 Finite Element Approximation

subject to the end conditions
$$x(0) = 0 \quad \text{and} \quad x(1) = 0. \tag{2.77}$$
Again, this is a problem of finding a function passing through two given points that minimizes an integral, where the integrand $f(t,x,u)$ is given by
$$f(t,x,u) = \frac{1}{2}\left\{ u^2 + x^2 - 2xf(t) \right\}.$$
In fact, the three problems above are examples of the so-called Simplest Problem in the Calculus of Variations (SPCV) to be discussed in Chapter 3.

2.5 Problem Set for Chapter 2

Problem 2.1 Let $f : [0,2] \to \mathbb{R}$ be defined by $f(x) = xe^{-x}$. Find all local minimizers for $f(\cdot)$ on $[0,2]$.

Problem 2.2 Let $f : (0,2] \to \mathbb{R}$ be defined by $f(x) = xe^{-x}$. Find all local minimizers for $f(\cdot)$ on $(0,2]$.


Problem 2.3 Let $f : [-1,+1] \to \mathbb{R}$ be defined by $f(x) = |x|$. Find all local minimizers for $f(\cdot)$ on $[-1,+1]$.

Problem 2.4 Let $f : [-1,+1] \to \mathbb{R}$ be defined by $f(x) = x^3 e^{-x^2}$. Find all local minimizers for $f(\cdot)$ on $[-1,+1]$.

Problem 2.5 Let $f : [-2,+2] \to \mathbb{R}$ be defined by $f(x) = x^3 e^{-x^2}$. Find all local minimizers for $f(\cdot)$ on $[-2,+2]$.

Problem 2.6 Let $x : [-1,1] \to \mathbb{R}$ be defined by
$$x(t) = \begin{cases} |t|, & t \ne 0, \\ 1, & t = 0. \end{cases}$$
Prove that $z(t) = |t|$ is the only continuous function satisfying $z(t) = x(t)$ except at a finite number of points.

Problem 2.7 Let $x : [-1,1] \to \mathbb{R}$ and $z : [-1,1] \to \mathbb{R}$ be defined by $x(t) = |t|$ and $z(t) = t^2$, respectively. Compute $d_0(x(\cdot); z(\cdot))$ and $d_1(x(\cdot); z(\cdot))$.

Problem 2.8 Let $x_N : [-\pi,\pi] \to \mathbb{R}$ and $z : [-\pi,\pi] \to \mathbb{R}$ be defined by $x_N(t) = \frac{1}{N}\sin(Nt)$ and $z(t) = 0$, respectively. Compute $d_0(x_N(\cdot); z(\cdot))$ and $d_1(x_N(\cdot); z(\cdot))$.

Problem 2.9 Consider the functional $J(x(\cdot)) = \int_0^{\pi} \left\{ [\dot x(s)]^2 - [x(s)]^2 \right\} ds$ and let $x(t) = \sin(t)$. Compute the first and second variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot) \in PWS(0,\pi)$.

Problem 2.10 Consider the functional $J(x(\cdot)) = \int_0^{2\pi} [\dot x(s)]^2\, ds$ and let $x(t) = |t - 1|$. Compute the first and second variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot) \in PWC(0,\pi)$.

Problem 2.11 Let $J : PWS(0,1) \to \mathbb{R}$ be defined by
$$J(x(\cdot)) = \int_0^1 x(s)\sqrt{1 + [\dot x(s)]^2}\, ds$$
and let $x(s) = \sin(s)$. Show that any $\eta(\cdot) \in PWS(0,1)$ is an admissible direction for $x(s) = \sin(s)$. Compute the first and second variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot)$. You may assume it is allowable to pull the partial derivative with respect to $\varepsilon$ inside the integral.

Problem 2.12 Let $J : PWS(t_0,t_1) \to \mathbb{R}$ be defined by
$$J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\, ds,$$
where $f(t,x,u)$ is a $C^2$ function. Given any $x(\cdot) \in PWS(t_0,t_1)$, show that $\eta(\cdot)$ is an admissible direction for all $\eta(\cdot) \in PWS(t_0,t_1)$. Compute the first and second variation of $J(\cdot)$ at $x(\cdot)$ in the direction of $\eta(\cdot)$. You may assume it is allowable to pull the partial derivative with respect to $\varepsilon$ inside the integral.

Problem 2.13 Show that for the river crossing problem the time to cross is given by the integral
$$J(x(\cdot)) = \int_0^1 \frac{\sqrt{c^2(1 + [\dot x(s)]^2) - [v(s)]^2} - v(s)\dot x(s)}{c^2 - [v(s)]^2}\, ds.$$

Problem 2.14 Use the Lagrange Multiplier Theorem to find all possible minimizers of
$$J(x,y,z) = x^2 + y^2 + z^2$$
subject to the constraints
$$g_1(x,y,z) = x^2 + y^2 - 4 = 0,$$
$$g_2(x,y,z) = 6x + 3y + 2z - 6 = 0.$$

Problem 2.15 Use the Lagrange Multiplier Theorem to find all possible minimizers of
$$J(x,y,z) = x^2 + y^2 + z^2$$
subject to the constraints
$$g_1(x,y,z) = x^2 + y^2 + z^2 - 1 = 0,$$
$$g_2(x,y,z) = x + y + z - 1 = 0.$$


Advanced Problems

Problem 2.16 Given any $x(\cdot) \in PWS(t_0,t_1)$, let $\|x(\cdot)\|_0 = \sup\{ |x(t)| : t_0 \le t \le t_1 \}$. Prove that $\|\cdot\|_0$ is a norm on $PWS(t_0,t_1)$.

Problem 2.17 Use the finite element method to solve the two-point boundary value problem
$$-\ddot x(t) + 4x(t) = t, \quad x(0) = x(1) = 0.$$


Chapter 3

The Simplest Problem in the Calculus of Variations

We turn now to the Simplest Problem in the Calculus of Variations (SPCV) with both endpoints fixed and focus on obtaining (first order) necessary conditions. We begin with global minimizers and then return to the problem for local minimizers later.

3.1 The Mathematical Formulation of the SPCV

We shall make the standard assumption that $f = f(t,x,u)$ is a $C^2$ function of three real variables $t$, $x$ and $u$. In particular, $f : D(f) \subseteq \mathbb{R}^3 \to \mathbb{R}$ is real-valued, continuous, and all the partial derivatives
$$f_t(t,x,u), \quad f_x(t,x,u), \quad f_u(t,x,u),$$
$$f_{tx}(t,x,u), \quad f_{tu}(t,x,u), \quad f_{xu}(t,x,u),$$
and
$$f_{tt}(t,x,u), \quad f_{xx}(t,x,u), \quad f_{uu}(t,x,u)$$
exist and are continuous on the domain $D(f)$. Recall that by Schwarz's Theorem (see page 369 in [15]) this implies all the mixed derivatives of order 2 or less are equal. Thus, we have

$$\frac{\partial^2}{\partial t\,\partial x} f(t,x,u) = \frac{\partial^2}{\partial x\,\partial t} f(t,x,u), \qquad \frac{\partial^2}{\partial t\,\partial u} f(t,x,u) = \frac{\partial^2}{\partial u\,\partial t} f(t,x,u)$$
and
$$\frac{\partial^2}{\partial x\,\partial u} f(t,x,u) = \frac{\partial^2}{\partial u\,\partial x} f(t,x,u).$$

In most cases the domain $D(f)$ will have the form $D(f) = [t_0,t_1] \times \mathbb{R}^2$. Although there are problems where $f(t,x,u)$ will not be defined for all $[x\ u]^T \in \mathbb{R}^2$, and one should be aware of these cases, the main ideas discussed in this book are not greatly impacted by this detail.

Remark 3.1 It is important to comment on the “notation” $f(t,x,u)$ used in this book. Using the symbols $t$, $x$ and $u$ for the independent variables is not standard in many classical books on the calculus of variations. However, the choice of the symbol $u$ as the third independent variable is rather standard in treatments of modern control theory. Consequently, using the same notation for both classical and modern problems avoids having to “switch” notation as the material moves from variational theory to control. Many older texts on the calculus of variations use notation such as $f(t,x,\dot x)$ or $f(x,y,\dot y)$, so that one sees terms of the form
$$\frac{\partial}{\partial x} f(t,x,\dot x), \quad \frac{\partial}{\partial y} f(x,y,\dot y) \quad \text{and} \quad \frac{\partial^2}{\partial y\,\partial x} f(x,y,\dot y).$$
This notation often leads to a misunderstanding (especially by students new to these subjects) of the difference between the independent variable $x$ and the derivative of the function $x(\cdot)$. Ewing's book [77] was one of the first texts on the calculus of variations to use the notation $f(x,y,r)$ and Leitmann [120] uses a similar notation $f(t,x,r)$ to avoid this issue. Using $t$, $x$ and $u$ for the independent variables is mathematically more pleasing, leads to fewer mistakes and is consistent with modern notation.

Note that if $x : [t_0,t_1] \to \mathbb{R}$ is PWS on $[t_0,t_1]$, then the function $g : [t_0,t_1] \to \mathbb{R}$ defined by
$$g(s) = f(s, x(s), \dot x(s))$$
is PWC on $[t_0,t_1]$. Therefore, the integral
$$\int_{t_0}^{t_1} f(s, x(s), \dot x(s))\, ds$$
exists and is finite. Let $X = PWS(t_0,t_1)$ denote the space of all real-valued piecewise smooth functions defined on $[t_0,t_1]$. For each PWS function $x : [t_0,t_1] \to \mathbb{R}$, define the functional (a “function of a function”) by
$$J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\, ds. \tag{3.1}$$
Assume that the points $[t_0\ x_0]^T \in \mathbb{R}^1 \times \mathbb{R}^1$ and $[t_1\ x_1]^T \in \mathbb{R}^1 \times \mathbb{R}^1$ are given and define the set of PWS functions $\Theta$ by
$$\Theta = \{ x(\cdot) \in PWS(t_0,t_1) : x(t_0) = x_0,\ x(t_1) = x_1 \}. \tag{3.2}$$
Observe that $J : X \to \mathbb{R}$ is a real valued function on $X = PWS(t_0,t_1)$.

The Simplest Problem in the Calculus of Variations (SPCV) is the problem of minimizing $J(\cdot)$ on $\Theta$. In particular, the goal is to find $x^*(\cdot) \in \Theta$ such that
$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s), \dot x^*(s))\, ds \le J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\, ds \tag{3.3}$$
for all $x(\cdot) \in \Theta$.

In the brachistochrone problem,
$$f(t,x,u) = \frac{\sqrt{1 + u^2}}{\sqrt{2gx}},$$

Figure 3.1: The Simplest Problem in the Calculus of Variations

so that
$$J(x(\cdot)) = \int_0^a \frac{\sqrt{1 + [\dot x(s)]^2}}{\sqrt{2gx(s)}}\, ds = \int_0^a f(s, x(s), \dot x(s))\, ds.$$
For the problem of finding the surface of revolution of minimum area,
$$f(t,x,u) = 2\pi x\sqrt{1 + u^2},$$
so that
$$J(x(\cdot)) = S = 2\pi \int_{a_0}^{a_1} x(s)\sqrt{1 + [\dot x(s)]^2}\, ds = \int_{a_0}^{a_1} f(s, x(s), \dot x(s))\, ds.$$

The Simplest Problem in the Calculus of Variations will be the focus of the next few chapters. We will move on to other (more general) problems in later chapters. The basic goal is the development of the classical necessary conditions for (local) minimizers. In the following chapters we shall derive necessary conditions for local minimizers. However, we start with global minimizers.

Before we derive the first necessary condition, we develop a few fundamental lemmas that provide the backbone to much of variational theory. In addition, we review some basic results on differentiation.

3.2 The Fundamental Lemma of the Calculus of Variations

The Fundamental Lemma of the Calculus of Variations (FLCV) is also known as the Du Bois-Reymond Lemma. To set the stage for the lemma, we need some additional notation. Let $V_0$ denote the set
$$V_0 = PWS_0(t_0,t_1) = \{ \eta(\cdot) \in PWS(t_0,t_1) : \eta(t_0) = \eta(t_1) = 0 \}. \tag{3.4}$$
Observe that $V_0 \subseteq PWS(t_0,t_1)$ is the set of all PWS functions on $(t_0,t_1)$ that vanish at both ends (see Figure 3.2 below). The space $V_0$ defined by (3.4) is called the space of admissible variations for the fixed endpoint problem.

Figure 3.2: Admissible Variations

Lemma 3.1 (Fundamental Lemma of the Calculus of Variations)

Part (A): If $\alpha(\cdot)$ is piecewise continuous on $[t_0,t_1]$ and
$$\int_{t_0}^{t_1} \alpha(s)\dot\eta(s)\, ds = 0 \tag{3.5}$$
for all $\eta(\cdot) \in V_0$, then $\alpha(t)$ is constant on $[t_0,t_1]$ except at a finite number of points. In particular, there is a constant $c$ and a finite set of points $t_0 < \bar t_1 < \bar t_2 < \cdots < \bar t_p < t_1$ in $(t_0,t_1)$ such that for each $t \in (t_0,t_1)$ with $t \ne \bar t_i$, $i = 1, 2, \ldots, p$,
$$\alpha(t) = c.$$

The converse is also true.

Part (B): If $\alpha(\cdot)$ and $\beta(\cdot)$ are piecewise continuous functions on $[t_0,t_1]$ and
$$\int_{t_0}^{t_1} \left[ \alpha(s)\eta(s) + \beta(s)\dot\eta(s) \right] ds = 0 \tag{3.6}$$
for all $\eta(\cdot) \in V_0$, then there is a constant $c$ such that
$$\beta(t) = c + \int_{t_0}^{t} \alpha(s)\, ds$$
except at a finite number of points. In particular, there is a finite set of points $t_0 < \bar t_1 < \bar t_2 < \cdots < \bar t_q < t_1$ in $(t_0,t_1)$ such that for each $t \in (t_0,t_1)$ with $t \ne \bar t_i$, $i = 1, 2, \ldots, q$, $\dot\beta(t)$ exists and
$$\dot\beta(t) = \alpha(t).$$

The converse is also true.

Proof of Part (A): Assume (3.5) holds for all $\eta(\cdot) \in V_0$. Let
$$c = \frac{1}{t_1 - t_0}\int_{t_0}^{t_1} \alpha(s)\, ds,$$
and define the function $\eta(\cdot)$ by
$$\eta(t) = \int_{t_0}^{t} [\alpha(s) - c]\, ds.$$

Observe that $\eta(\cdot)$ is PWS since it is the integral of a piecewise continuous function. Also,
$$\eta(t_0) = \int_{t_0}^{t_0} [\alpha(s) - c]\, ds = 0, \tag{3.7}$$
and
$$\eta(t_1) = \int_{t_0}^{t_1} [\alpha(s) - c]\, ds = \int_{t_0}^{t_1} \alpha(s)\, ds - \int_{t_0}^{t_1} c\, ds = \int_{t_0}^{t_1} \alpha(s)\, ds - c[t_1 - t_0] = 0. \tag{3.8}$$
Therefore, it follows from (3.7) and (3.8) that $\eta(\cdot) \in V_0$. Hence, the assumption implies that
$$\int_{t_0}^{t_1} \alpha(s)\left[\frac{d}{ds}\eta(s)\right] ds = 0.$$

However, since $c$ is constant and $\eta(t_0) = \eta(t_1) = 0$, it follows that
$$\int_{t_0}^{t_1} c\left[\frac{d}{ds}\eta(s)\right] ds = c\,[\eta(t_1) - \eta(t_0)] = 0.$$
Therefore, we have that
$$\int_{t_0}^{t_1} [\alpha(s) - c]\left[\frac{d}{ds}\eta(s)\right] ds = \int_{t_0}^{t_1} \alpha(s)\left[\frac{d}{ds}\eta(s)\right] ds - \int_{t_0}^{t_1} c\left[\frac{d}{ds}\eta(s)\right] ds = 0,$$
and since
$$\left[\frac{d}{ds}\eta(s)\right] = [\alpha(s) - c],$$
it follows that
$$\int_{t_0}^{t_1} [\alpha(s) - c]^2\, ds = \int_{t_0}^{t_1} [\alpha(s) - c]\left[\frac{d}{ds}\eta(s)\right] ds = 0. \tag{3.9}$$

The assumption that $\alpha(\cdot)$ is piecewise continuous implies that there is a partition of $[t_0,t_1]$, say $t_0 = \bar t_0 < \bar t_1 < \bar t_2 < \cdots < \bar t_p < \bar t_{p+1} = t_1$, such that $\alpha(\cdot)$ is continuous on each subinterval $(\bar t_i, \bar t_{i+1})$. On the other hand, (3.9) implies that
$$\int_{\bar t_i}^{\bar t_{i+1}} [\alpha(s) - c]^2\, ds = 0$$
on these subintervals. Consequently, for $s \in (\bar t_i, \bar t_{i+1})$ it follows that
$$[\alpha(s) - c]^2 = 0$$
and hence on each subinterval $(\bar t_i, \bar t_{i+1})$,
$$\alpha(s) = c,$$
which establishes the result.

For the converse, simply note that if $\eta(\cdot) \in V_0$ and
$$\alpha(t) = c$$
for each $t \in (t_0,t_1)$ with $t \ne \bar t_i$, $i = 1, 2, \ldots, p$, then
$$\int_{t_0}^{t_1} \alpha(s)\left[\frac{d}{ds}\eta(s)\right] ds = \sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} \alpha(s)\left[\frac{d}{ds}\eta(s)\right] ds = \sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} c\left[\frac{d}{ds}\eta(s)\right] ds = c\left[\sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} \frac{d}{ds}\eta(s)\, ds\right] = c\int_{t_0}^{t_1} \left[\frac{d}{ds}\eta(s)\right] ds = c\,[\eta(t_1) - \eta(t_0)] = 0,$$
and this proves the converse of Part (A).

Proof of Part (B): Assume (3.6) holds for all $\eta(\cdot) \in V_0$. Let $g(\cdot)$ be defined by
$$g(t) = \int_{t_0}^{t} \alpha(s)\, ds,$$
and observe that $g(\cdot)$ is piecewise smooth with
$$\frac{d}{dt} g(t) = \dot g(t) = \alpha(t)$$
except at a finite number of points in $(t_0,t_1)$. Integrate (3.6) by

parts, to obtain
$$\begin{aligned}
\int_{t_0}^{t_1} &\left[ \alpha(s)\eta(s) + \beta(s)\left[\frac{d}{ds}\eta(s)\right] \right] ds \\
&= \int_{t_0}^{t_1} \left[ \left[\frac{d}{ds}g(s)\right]\eta(s) + \beta(s)\left[\frac{d}{ds}\eta(s)\right] \right] ds \\
&= \int_{t_0}^{t_1} \left[\frac{d}{ds}g(s)\right]\eta(s)\, ds + \int_{t_0}^{t_1} \beta(s)\left[\frac{d}{ds}\eta(s)\right] ds \\
&= g(s)\eta(s)\Big|_{s=t_0}^{s=t_1} - \int_{t_0}^{t_1} g(s)\left[\frac{d}{ds}\eta(s)\right] ds + \int_{t_0}^{t_1} \beta(s)\left[\frac{d}{ds}\eta(s)\right] ds \\
&= \int_{t_0}^{t_1} \left[ -g(s)\left[\frac{d}{ds}\eta(s)\right] \right] ds + \int_{t_0}^{t_1} \beta(s)\left[\frac{d}{ds}\eta(s)\right] ds \\
&= \int_{t_0}^{t_1} [-g(s) + \beta(s)]\left[\frac{d}{ds}\eta(s)\right] ds.
\end{aligned}$$

Therefore, it follows that
$$\int_{t_0}^{t_1} [-g(s) + \beta(s)]\dot\eta(s)\, ds = \int_{t_0}^{t_1} \left[ \alpha(s)\eta(s) + \beta(s)\dot\eta(s) \right] ds = 0,$$
for all $\eta(\cdot) \in V_0$. Applying Part (A) yields the existence of a constant $c$ such that
$$-g(t) + \beta(t) = c,$$
except at a finite number of points. Therefore,
$$\beta(t) = c + g(t) = c + \int_{t_0}^{t} \alpha(s)\, ds \tag{3.10}$$

except at a finite number of points. Since $\alpha(\cdot)$ is piecewise continuous, there is a partition of $[t_0,t_1]$, say $t_0 = \bar t_0 < \bar t_1 < \bar t_2 < \cdots < \bar t_q < \bar t_{q+1} = t_1$, such that on each subinterval $(\bar t_i, \bar t_{i+1})$, $\beta(t)$ is differentiable and for all $t \ne \bar t_i$, $i = 1, 2, \ldots, q$,
$$\dot\beta(t) = \alpha(t).$$

Now assume that (3.10) holds. If $\eta(\cdot) \in V_0$, then
$$\begin{aligned}
\int_{t_0}^{t_1} \left[ \alpha(s)\eta(s) + \beta(s)\left[\frac{d}{ds}\eta(s)\right] \right] ds &= \int_{t_0}^{t_1} \left[ \left[\frac{d}{ds}\beta(s)\right]\eta(s) + \beta(s)\left[\frac{d}{ds}\eta(s)\right] \right] ds \\
&= \int_{t_0}^{t_1} \frac{d}{ds}\left[ \beta(s)\eta(s) \right] ds = \beta(s)\eta(s)\Big|_{s=t_0}^{s=t_1} = 0,
\end{aligned}$$
and hence the converse is also true. This completes the proof of the lemma.

Remark 3.2 A Third Important Remark on Notation: Observe that Part (A) of the FLCV implies that $\alpha(t) = c$ except at a finite number of points. If one defines the continuous function $\alpha_c(t) \equiv c$, then it follows that $\alpha(t) = \alpha_c(t) \equiv c$ e.f. and $\alpha_c(t)$ is the only continuous function satisfying $\alpha(t) = \alpha_c(t) \equiv c$ e.f. Moreover, even at points $\bar t_i$ where $\alpha(\bar t_i) \ne c$, it follows that $\alpha(\bar t_i^+) = \alpha(\bar t_i^-) = c$. Thus, as noted in Remark 2.10, we identify $\alpha(\cdot)$ with the continuous constant function $\alpha_c(t) \equiv c$.

Likewise, Part (B) of the FLCV implies that if we define the PWS function $\beta_c(\cdot)$ by
$$\beta_c(t) \equiv c + \int_{t_0}^{t} \alpha(s)\, ds,$$
then it follows that $\beta(\cdot) = \beta_c(\cdot)$ e.f. and $\beta_c(\cdot)$ is the only PWS (continuous) function satisfying $\beta(\cdot) = \beta_c(\cdot)$ e.f. Moreover,
$$\beta(\bar t_i^+) = \beta_c(\bar t_i^+) = \beta_c(\bar t_i^-) = \beta(\bar t_i^-).$$

Also, at a corner $\bar t$ of $\beta_c(\cdot)$, it follows that
$$\dot\beta(\bar t^+) = \dot\beta_c(\bar t^+) = \alpha(\bar t^+)$$
and
$$\dot\beta(\bar t^-) = \dot\beta_c(\bar t^-) = \alpha(\bar t^-).$$
In light of Remarks 2.10 and 2.12, to be consistent we shall use the equivalent PWS representation $\beta_c(\cdot)$ of $\beta(\cdot)$ when it exists and not distinguish between $\beta(\cdot)$ and $\beta_c(\cdot)$ except in special circumstances.

We turn now to the derivation of Euler's Necessary Condition for a global minimizer. Necessary conditions for local minimizers will be treated later.

3.3 The First Necessary Condition for a Global Minimizer

Assume that $x^*(\cdot) \in \Theta$ is a global minimizer for $J(\cdot)$ on $\Theta$. In particular, assume that $x^*(\cdot) \in \Theta$ satisfies
$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f(t, x^*(t), \dot x^*(t))\, dt \le J(x(\cdot)) = \int_{t_0}^{t_1} f(t, x(t), \dot x(t))\, dt \tag{3.11}$$
for all $x(\cdot) \in \Theta$.

Let $\eta(\cdot) \in V_0$ and consider the “variation”
$$\varphi(t,\varepsilon) \triangleq x^*(t) + \varepsilon\eta(t). \tag{3.12}$$
Observe that for each $\varepsilon \in \mathbb{R}$, the variation $\varphi(t,\varepsilon)$ satisfies the following conditions:

(i) $\varphi(t_0,\varepsilon) = x^*(t_0) + \varepsilon\eta(t_0) = x_0 + \varepsilon\eta(t_0) = x_0 + \varepsilon \cdot 0 = x_0$,

(ii) $\varphi(t_1,\varepsilon) = x^*(t_1) + \varepsilon\eta(t_1) = x_1 + \varepsilon\eta(t_1) = x_1 + \varepsilon \cdot 0 = x_1$,

(iii) $\varphi(t,\varepsilon) = x^*(t) + \varepsilon\eta(t) \in PWS(t_0,t_1)$.

It follows that if $\eta(\cdot) \in V_0$, then for all $\varepsilon \in \mathbb{R}$ the variation $\varphi(t,\varepsilon) \triangleq x^*(t) + \varepsilon\eta(t)$ belongs to $\Theta$, i.e. it is admissible. Since $x^*(\cdot) \in \Theta$ minimizes $J(\cdot)$ on $\Theta$, it follows that
$$J(x^*(\cdot)) \le J(x^*(\cdot) + \varepsilon\eta(\cdot)) \tag{3.13}$$
for all $\varepsilon \in (-\infty,+\infty)$. Define $F : (-\infty,+\infty) \to \mathbb{R}$ by
$$F(\varepsilon) = J(x^*(\cdot) + \varepsilon\eta(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\, ds, \tag{3.14}$$
and note that (3.13) implies that
$$F(0) = J(x^*(\cdot)) \le J(x^*(\cdot) + \varepsilon\eta(\cdot)) = F(\varepsilon)$$
for all $\varepsilon \in (-\infty,+\infty)$. Therefore, $F(\cdot)$ has a minimum on $(-\infty,+\infty)$ at $\varepsilon^* = 0$, and applying Theorem 2.1 it follows that (if the derivative exists)
$$\frac{d}{d\varepsilon} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}\left[ J(x^*(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0} = 0. \tag{3.15}$$
Observe that (3.15) must hold for all $\eta(\cdot) \in V_0$. Recalling Definition 2.14, the derivative defined by (3.15) is called the first variation of $J(\cdot)$ at $x^*(\cdot)$ in the direction of $\eta(\cdot)$. In particular, we have established the following necessary condition.

Theorem 3.1 Assume $x^*(\cdot) \in \Theta$ minimizes $J(\cdot)$ on $\Theta$. If $\eta(\cdot) \in V_0$ and the first variation $\delta J(x^*(\cdot); \eta(\cdot))$ exists, then
$$\delta J(x^*(\cdot); \eta(\cdot)) = \frac{d}{d\varepsilon}\left[ J(x^*(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0} = 0. \tag{3.16}$$

In order to apply the previous theorem, we need to know that the first variation exists at $x^*(\cdot)$ for all $\eta(\cdot) \in V_0$, how to compute it, and then use it to obtain useful information about the minimizer $x^*(\cdot)$. We first recall Leibniz's formula (see page 245 in [15]).

Lemma 3.2 (Leibniz's Formula) Suppose that for each $\varepsilon \in [-\delta,\delta]$ the function $g(t,\varepsilon)$ and the partial derivative $\frac{\partial}{\partial\varepsilon} g(t,\varepsilon) = g_\varepsilon(t,\varepsilon)$ are continuous functions of $t$ on the interval $[a,b]$. In addition, assume that the functions $p : [-\delta,\delta] \to [a,b]$ and $q : [-\delta,\delta] \to [a,b]$ are differentiable. If $F : [-\delta,\delta] \to \mathbb{R}$ is defined by
$$F(\varepsilon) = \int_{p(\varepsilon)}^{q(\varepsilon)} g(s,\varepsilon)\, ds,$$
then $F'(\varepsilon)$ exists and
$$\frac{d}{d\varepsilon} F(\varepsilon) = g(q(\varepsilon),\varepsilon)\left[\frac{d}{d\varepsilon} q(\varepsilon)\right] - g(p(\varepsilon),\varepsilon)\left[\frac{d}{d\varepsilon} p(\varepsilon)\right] + \int_{p(\varepsilon)}^{q(\varepsilon)} g_\varepsilon(s,\varepsilon)\, ds. \tag{3.17}$$
A special case occurs when $p(\cdot)$ and $q(\cdot)$ are independent of $\varepsilon$. In this case
$$\frac{d}{d\varepsilon} F(\varepsilon) = \int_{p}^{q} g_\varepsilon(s,\varepsilon)\, ds.$$
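The special case is easy to sanity-check numerically. With the assumed test function $g(s,\varepsilon) = \sin(\varepsilon s)$ on $[0,1]$ (our example), a finite-difference derivative of $F(\varepsilon)$ should match $\int_0^1 g_\varepsilon(s,\varepsilon)\,ds = \int_0^1 s\cos(\varepsilon s)\,ds$:

```python
import numpy as np
from scipy.integrate import quad

F = lambda e: quad(lambda s: np.sin(e * s), 0.0, 1.0)[0]   # F(eps) = int_0^1 g(s, eps) ds

eps, h = 0.7, 1e-6
fd = (F(eps + h) - F(eps - h)) / (2.0 * h)                  # centered finite difference
exact = quad(lambda s: s * np.cos(eps * s), 0.0, 1.0)[0]    # int_0^1 g_eps(s, eps) ds
print(fd, exact)                                            # the two values agree closely
```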

Suppose that $x^*(\cdot)$ and $\eta(\cdot) \in PWS(t_0,t_1)$ and define the function
$$g(t,\varepsilon) = f(t, x^*(t) + \varepsilon\eta(t), \dot x^*(t) + \varepsilon\dot\eta(t)).$$
It follows that
$$F(\varepsilon) = J(x^*(\cdot) + \varepsilon\eta(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\, ds = \int_{t_0}^{t_1} g(s,\varepsilon)\, ds$$

and the goal is to differentiate $F(\varepsilon)$ at $\varepsilon = 0$. Since $x^*(\cdot)$ and $\eta(\cdot) \in PWS(t_0,t_1)$, it follows that $g(t,\varepsilon)$ is piecewise continuous and there are points $t_0 = \bar t_0 < \bar t_1 < \bar t_2 < \cdots < \bar t_p < \bar t_{p+1} = t_1$ such that $g(t,\varepsilon)$ is continuous and bounded on each subinterval $(\bar t_i, \bar t_{i+1})$. For example, let $\bar t_1 < \bar t_2 < \cdots < \bar t_p$ be the union of all points where $\dot x^*(\cdot)$ and $\dot\eta(\cdot)$ are discontinuous. Observe that
$$F(\varepsilon) = J(x^*(\cdot) + \varepsilon\eta(\cdot)) = \sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\, ds = \sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} g(s,\varepsilon)\, ds,$$
so that in order to use Leibniz's Lemma 3.2 above, we must only show that $g(t,\varepsilon) = f(t, x^*(t) + \varepsilon\eta(t), \dot x^*(t) + \varepsilon\dot\eta(t))$ and the partial derivative $g_\varepsilon(t,\varepsilon)$ are continuous on each subinterval $(\bar t_i, \bar t_{i+1})$. Since $\dot x^*(\cdot)$ and $\dot\eta(\cdot)$ are continuous on each subinterval $(\bar t_i, \bar t_{i+1})$ and the integrand $f = f(t,x,u)$ is a smooth function, it follows that $g(t,\varepsilon)$ is continuous and $g_\varepsilon(t,\varepsilon)$ exists and is also continuous on each subinterval $(\bar t_i, \bar t_{i+1})$. Applying the chain rule one obtains that for $t \in (\bar t_i, \bar t_{i+1})$
$$g_\varepsilon(t,\varepsilon) = \frac{d}{d\varepsilon}\left[ f(t, x^*(t) + \varepsilon\eta(t), \dot x^*(t) + \varepsilon\dot\eta(t)) \right] = f_x(t, x^*(t) + \varepsilon\eta(t), \dot x^*(t) + \varepsilon\dot\eta(t))\,\eta(t) + f_u(t, x^*(t) + \varepsilon\eta(t), \dot x^*(t) + \varepsilon\dot\eta(t))\,\dot\eta(t).$$

Leibniz's Lemma 3.2 can now be applied on each subinterval to produce the expression
$$\frac{d}{d\varepsilon} F(\varepsilon) = \frac{d}{d\varepsilon}\int_{t_0}^{t_1} g(s,\varepsilon)\, ds = \frac{d}{d\varepsilon}\sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} g(s,\varepsilon)\, ds = \sum_{i=0}^{p}\frac{d}{d\varepsilon}\int_{\bar t_i}^{\bar t_{i+1}} g(s,\varepsilon)\, ds = \sum_{i=0}^{p}\int_{\bar t_i}^{\bar t_{i+1}} g_\varepsilon(s,\varepsilon)\, ds = \int_{t_0}^{t_1} g_\varepsilon(s,\varepsilon)\, ds,$$
and hence
$$\frac{d}{d\varepsilon} F(\varepsilon)\Big|_{\varepsilon=0} = \frac{d}{d\varepsilon}\left[ J(x^*(\cdot) + \varepsilon\eta(\cdot)) \right]\Big|_{\varepsilon=0} = \int_{t_0}^{t_1} g_\varepsilon(s,0)\, ds. \tag{3.18}$$

Observe that we needed only to compute $g_\varepsilon(t,\varepsilon)$ on each subinterval $(\bar t_i, \bar t_{i+1})$ where both $\dot x^*(\cdot)$ and $\dot\eta(\cdot)$ are continuous. Moreover, the chain rule produced
$$g_\varepsilon(s,\varepsilon) = \left[ f_x(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s)) \right]\cdot\eta(s) + \left[ f_u(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s)) \right]\cdot\dot\eta(s).$$
Setting $\varepsilon = 0$, it follows that (recall that $f(t,x,u) \in C^2$)
$$g_\varepsilon(s,\varepsilon)\big|_{\varepsilon=0} = \left[ f_x(s, x^*(s), \dot x^*(s)) \right]\eta(s) + \left[ f_u(s, x^*(s), \dot x^*(s)) \right]\dot\eta(s). \tag{3.19}$$
Substituting (3.19) into (3.18) yields the existence of, and an explicit formula for, the first variation $\delta J(x^*(\cdot); \eta(\cdot))$ of $J(\cdot)$ at $x^*(\cdot)$ in the direction of $\eta(\cdot)$. In particular,
$$\delta J(x^*(\cdot); \eta(\cdot)) = \int_{t_0}^{t_1} \left\{ f_x(s, x^*(s), \dot x^*(s))\,\eta(s) + f_u(s, x^*(s), \dot x^*(s))\,\dot\eta(s) \right\} ds. \tag{3.20}$$
In view of Theorem 3.1 and the formula (3.20) we have established the following result.

Theorem 3.2 If $x^*(\cdot) \in \Theta$ minimizes $J(\cdot)$ on $\Theta$ and $\eta(\cdot) \in V_0$, then the first variation $\delta J(x^*(\cdot); \eta(\cdot))$ of $J(\cdot)$ at $x^*(\cdot)$ in the direction of $\eta(\cdot)$ exists and
$$\int_{t_0}^{t_1} \left\{ \left[ f_x(s, x^*(s), \dot x^*(s)) \right]\cdot\eta(s) + \left[ f_u(s, x^*(s), \dot x^*(s)) \right]\cdot\dot\eta(s) \right\} ds = 0. \tag{3.21}$$

Although (3.21) is equivalent to “setting the first variation equal to zero”, it is not very informative. However, this is where the Fundamental Lemma of the Calculus of Variations becomes useful. Applying (3.6) in Lemma 3.1 to (3.21) with
$$\alpha(t) = \left[ f_x(t, x^*(t), \dot x^*(t)) \right]$$
and
$$\beta(t) = \left[ f_u(t, x^*(t), \dot x^*(t)) \right],$$
yields the existence of a constant $c$ such that for all $t \in [t_0,t_1]$
$$\left[ f_u(t, x^*(t), \dot x^*(t)) \right] = \beta(t) = c + \int_{t_0}^{t} \alpha(s)\, ds = c + \int_{t_0}^{t} \left[ f_x(s, x^*(s), \dot x^*(s)) \right] ds.$$
Thus, we have derived (proven) the following Euler necessary condition.

Theorem 3.3 (Euler Necessary Condition for a GlobalMinimum) If x∗ (·) ∈ Θ minimizes J(·) on Θ, then

(1) there is a constant c such that for all t ∈ [t0, t1],

[fu (t, x∗(t), x∗ (t))] = c+

t∫t0

[fx (s, x∗(s), x∗ (s))] ds, (3.22)


(2) x∗(t0) = x0,

(3) x∗(t1) = x1.

(4) Between corners the function $f_u(t, x^*(t), \dot x^*(t))$ is differentiable and

$$\frac{d}{dt}[f_u(t, x^*(t), \dot x^*(t))] = [f_x(t, x^*(t), \dot x^*(t))]. \qquad (3.23)$$

Remark 3.3 At first cut it may seem strange to include items (2) x*(t0) = x0 and (3) x*(t1) = x1 in the statement of the theorem, since these conditions are used in defining the constraint set Θ. However, later we shall consider problems with "free" end conditions that do not occur in the definition of the corresponding constraint set, and the corresponding necessary conditions will yield "natural" boundary conditions (and transversality conditions) to replace (2) or (3) or both. In such cases it is essential to include these boundary conditions as a fundamental part of the necessary condition. By "repeating" conditions (2) and (3) for the simplest problem we hope to emphasize the importance of obtaining the correct boundary conditions.

Equation (3.22) is called Euler's Integral Equation, while equation (3.23) is called Euler's Differential Equation. Therefore, we have shown that a minimizer x*(·) of J(·) on Θ must satisfy Euler's Integral Equation and between corners x*(·) satisfies Euler's Differential Equation. We say that a function x(·) satisfies Euler's equation if it is a solution to either Euler's Integral Equation or, where differentiable, Euler's Differential Equation. Euler's equation is one of the most important equations in the calculus of variations. What we have shown is that optimizers for the SPCV must satisfy Euler's equation. However, not all solutions of Euler's equation are minimizers.

Definition 3.1 Any piecewise smooth function x(·) satisfying Euler's Integral Equation

$$f_u(t, x(t), \dot x(t)) = c + \int_{t_0}^{t} f_x(s, x(s), \dot x(s))\,ds, \qquad (3.24)$$


is called an extremal.

Remark 3.4 It is very important to note that extremals do not have to satisfy any prescribed boundary conditions. In particular, any piecewise smooth function x(·) satisfying Euler's Integral Equation is called an extremal. The Euler Necessary Condition (3.22) implies that any global minimizer of J(·) on Θ is an extremal. As we see later, this is also true for local minimizers.

Since the right-hand side of the equation

$$f_u(t, x(t), \dot x(t)) = c + \int_{t_0}^{t} f_x(s, x(s), \dot x(s))\,ds,$$

is a continuous function of t, it follows that if x(·) is an extremal, then the function

$$\psi(t) \triangleq f_u(t, x(t), \dot x(t)) = c + \int_{t_0}^{t} f_x(s, x(s), \dot x(s))\,ds$$

is continuous. Thus, even when we have a corner at t, i.e.

$$\dot x(t^+) \neq \dot x(t^-),$$

the left-hand and right-hand limits of $\psi(t) \triangleq \frac{\partial}{\partial u} f(t, x(t), \dot x(t))$

must be equal. In particular,

$$f_u(t, x(t), \dot x(t^+)) = \psi(t^+) = \psi(t) = \psi(t^-) = f_u(t, x(t), \dot x(t^-))$$

for all t ∈ (t0, t1). Therefore we have established the following result.

Theorem 3.4 (Weierstrass-Erdmann Corner Condition) If x(·) ∈ PWS(t0, t1) is an extremal, then

$$f_u(t, x(t), \dot x(t^+)) = f_u(t, x(t), \dot x(t^-)) \qquad (3.25)$$

for all t ∈ (t0, t1).


Definition 3.2 If $f_{uu}(t,x,u) \neq 0$ for all $(t,x,u) \in D(f)$, then the integrand f(t, x, u) is said to be non-singular. If $f_{uu}(t,x,u) > 0$ for all $(t,x,u) \in D(f)$, then the integrand f(t, x, u) is said to be regular and the SPCV is said to be a regular problem.

If t and x are fixed, then the function $\rho(u) \triangleq f(t,x,u)$ is called the figurative (or indicatrix) at (t, x). Note that if f(t, x, u) is a regular integrand, then the figurative is a strictly convex function since $\frac{d^2}{du^2}\rho(u) = f_{uu}(t,x,u) > 0$. This implies that $\frac{d}{du}\rho(u) = f_u(t,x,u)$ is a strictly increasing function. This observation leads to the following result.

Theorem 3.5 If the integrand f(t, x, u) is regular, then all extremals are of class C². In particular, extremals for a regular problem cannot have corners.

Proof: Assume that f(t, x, u) is regular, and suppose that x(·) is an extremal with a corner at t, i.e.

$$\dot x(t^+) \neq \dot x(t^-).$$

Without loss of generality we may assume that

$$u_1 \triangleq \dot x(t^+) < \dot x(t^-) \triangleq u_2.$$

The derivative of the figurative, $\rho'(u) \triangleq f_u(t, x(t), u)$, at (t, x(t)) is strictly increasing, so that

$$f_u(t, x(t), u_1) = \frac{d}{du}\rho(u_1) < \frac{d}{du}\rho(u_2) = f_u(t, x(t), u_2). \qquad (3.26)$$

However, the corner condition (3.25) implies that

$$f_u(t, x(t), u_1) = f_u(t, x(t), \dot x(t^+)) = f_u(t, x(t), \dot x(t^-)) = f_u(t, x(t), u_2),$$

which contradicts (3.26). Therefore, x(·) cannot have a corner at t. Since x(·) has no corners, it follows that x(·) ∈ C¹(t0, t1) and hence x(·) ∈ C²(t0, t1), and this completes the proof.

Actually, Hilbert proved a much stronger result. We state his theorem below.


Theorem 3.6 (Hilbert's Differentiability Theorem) If x(·) ∈ PWS(t0, t1) is an extremal, $\bar t$ is not a corner of x(·), and $f_{uu}(\bar t, x(\bar t), \dot x(\bar t)) \neq 0$, then there exists a δ > 0 such that x(·) has a continuous second derivative for all $t \in (\bar t - \delta, \bar t + \delta)$ and

$$[f_{uu}(t, x(t), \dot x(t))]\cdot\ddot x(t) = -[f_{ut}(t, x(t), \dot x(t))] - [f_{ux}(t, x(t), \dot x(t))]\cdot\dot x(t) + [f_x(t, x(t), \dot x(t))]. \qquad (3.27)$$

If in addition, f(t, x, u) is of class $C^p$, $p \ge 2$, then any extremal x(·) is also of class $C^p$ on $(\bar t - \delta, \bar t + \delta)$.

Observe that Theorem 3.6 implies that, for regular integrands, all extremals x(·) have continuous second derivatives $\ddot x(\cdot)$, since f is assumed to be of class C². Therefore, we may differentiate (3.23) by applying the chain rule to obtain

$$\frac{d}{dt}[f_u(t, x(t), \dot x(t))] = [f_{ut}(t, x(t), \dot x(t))] + [f_{ux}(t, x(t), \dot x(t))]\cdot\dot x(t) + [f_{uu}(t, x(t), \dot x(t))]\cdot\ddot x(t).$$

Hence, the Euler Differential Equation (3.23) becomes the second order differential equation

$$[f_{uu}(t, x(t), \dot x(t))]\cdot\ddot x(t) = [f_x(t, x(t), \dot x(t))] - [f_{ut}(t, x(t), \dot x(t))] - [f_{ux}(t, x(t), \dot x(t))]\cdot\dot x(t). \qquad (3.28)$$

Observe that since $f_{uu}(t, x(t), \dot x(t)) > 0$, this differential equation may be written as

$$\ddot x(t) = \frac{[f_x(t, x(t), \dot x(t))] - [f_{ut}(t, x(t), \dot x(t))] - [f_{ux}(t, x(t), \dot x(t))]\cdot\dot x(t)}{[f_{uu}(t, x(t), \dot x(t))]}. \qquad (3.29)$$

Note that Hilbert's Theorem is valid even if the problem is not regular. The key is that along an extremal x(·), the function

$$\rho(t) = f_{uu}(t, x(t), \dot x(t)) \neq 0$$

for all points t where $\dot x(t)$ exists. Such extremals have a special name, leading to the following definition.


Definition 3.3 If x(·) ∈ PWS(t0, t1) is an extremal, then x(·) is called a non-singular extremal if $f_{uu}(t, x(t), \dot x(t)) \neq 0$ for all t ∈ (t0, t1) where $\dot x(t)$ exists. If x(·) ∈ PWS(t0, t1) is an extremal, then x(·) is called a regular extremal if $f_{uu}(t, x(t), \dot x(t)) > 0$ for all t ∈ (t0, t1) where $\dot x(t)$ exists.

3.3.1 Examples

We shall go through a few examples to illustrate the application of the necessary condition. It is important to note that at this point we can say very little about the existence of a minimum except in some special cases. The following three examples illustrate that the interval [t0, t1] plays an important role in the SPCV.

Example 3.1 Find a PWS function x*(·) satisfying x(0) = 0, x(π/2) = 1 and such that x*(·) minimizes

$$J(x(\cdot)) = \int_0^{\pi/2} \frac{1}{2}\left([\dot x(s)]^2 - [x(s)]^2\right) ds.$$

We note that t0 = 0, t1 = π/2, x0 = 0, and x1 = 1. The integrand f(t, x, u) is given by

$$f(t,x,u) = \frac{1}{2}([u]^2 - [x]^2)$$

and hence,

$$f_x(t,x,u) = -x, \quad f_u(t,x,u) = +u, \quad f_{uu}(t,x,u) = +1 > 0.$$

We see that f(t, x, u) is regular and hence the minimizer cannot have corners. Euler's Equation

$$\frac{d}{dt}[f_u(t, x^*(t), \dot x^*(t))] = [f_x(t, x^*(t), \dot x^*(t))]$$

becomes

$$\frac{d}{dt}[\dot x^*(t)] = [-x^*(t)],$$


or equivalently, $\ddot x^*(t) + x^*(t) = 0$.

The general solution is

x∗ (t) = α cos(t) + β sin(t),

and applying the boundary conditions

$$0 = x^*(0) = \alpha\cos(0) + \beta\sin(0) = \alpha,$$
$$1 = x^*(\pi/2) = \alpha\cos(\pi/2) + \beta\sin(\pi/2) = \beta,$$

it follows that $x^*(t) = \sin(t)$ is the only solution to the Euler Necessary Condition as given in Theorem 3.3. Observe that we do not know if x*(t) = sin(t) minimizes J(·). However, if there is a minimizer, then x*(t) = sin(t) must be the minimizer since it is the only function satisfying the necessary condition.
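The computation above can be confirmed with a computer algebra system. The following is a minimal sympy sketch (our own illustration, not from the text) that solves Euler's equation with the stated boundary conditions:

```python
import sympy as sp

# For f = (u**2 - x**2)/2 the Euler equation reduces to x'' + x = 0,
# and the boundary conditions x(0) = 0, x(pi/2) = 1 pin down sin(t).
t = sp.symbols('t')
x = sp.Function('x')

ode = sp.Eq(x(t).diff(t, 2) + x(t), 0)   # Euler's differential equation
sol = sp.dsolve(ode, x(t), ics={x(0): 0, x(sp.pi / 2): 1})
print(sol)                                # Eq(x(t), sin(t))
```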

Example 3.2 Find a PWS function x*(·) satisfying x(0) = 0, x(3π/2) = 0 and such that x*(·) minimizes

$$J(x(\cdot)) = \int_0^{3\pi/2} \frac{1}{2}\left([\dot x(s)]^2 - [x(s)]^2\right) ds.$$

Observe that the integrand f(t, x, u) is the same as in Example 3.1, so the Euler Equation is the same,

$$\ddot x^*(t) + x^*(t) = 0,$$

and has the general solution

x∗ (t) = α cos(t) + β sin(t).

The boundary conditions

$$0 = x^*(0) = \alpha\cos(0) + \beta\sin(0) = \alpha,$$
$$0 = x^*(3\pi/2) = \alpha\cos(3\pi/2) + \beta\sin(3\pi/2) = -\beta,$$


imply that $x^*(t) \equiv 0$ is the only solution to the Euler Necessary Condition as given in Theorem 3.3. Again, at this point we do not know if x*(t) ≡ 0 minimizes J(·). However, if there is a minimizer, then x*(t) ≡ 0 must be the minimizer since it is the only function satisfying the necessary condition.

Example 3.3 Find a PWS function x*(·) satisfying x(0) = 0, x(2π) = 0 and such that x*(·) minimizes

$$J(x(\cdot)) = \int_0^{2\pi} \frac{1}{2}\left([\dot x(s)]^2 - [x(s)]^2\right) ds.$$

Again, the integrand f(t, x, u) is the same as in Example 3.1, so the Euler Equation is

$$\ddot x^*(t) + x^*(t) = 0,$$

and has the general solution

x∗ (t) = α cos(t) + β sin(t).

However, the boundary conditions

0 = x∗ (0) = α cos(0) + β sin(0) = α,

0 = x∗ (2π) = α cos(2π) + β sin(2π) = α,

only imply that $x^*(t) = \beta\sin(t)$. Therefore, there are infinitely many solutions to the Euler Necessary Condition as given in Theorem 3.3, and we do not know if any of these functions x*(t) = β sin(t) actually minimizes J(·).

Example 3.4 Find a PWS function x*(·) satisfying x(−1) = 0, x(1) = 1 and such that x*(·) minimizes

$$J(x(\cdot)) = \int_{-1}^{1} [x(s)]^2\,[\dot x(s) - 1]^2\,ds.$$


The integrand f(t, x, u) is given by

$$f(t,x,u) = [x]^2[u-1]^2$$

and hence,

$$f_x(t,x,u) = 2x[u-1]^2, \quad f_u(t,x,u) = 2[x]^2[u-1], \quad f_{uu}(t,x,u) = 2[x]^2 \ge 0.$$

Note that the integrand is not regular since $f_{uu}(t,0,u) = 0$. The Euler equation is

$$[f_u(t, x^*(t), \dot x^*(t))] = c + \int_{-1}^{t} [f_x(s, x^*(s), \dot x^*(s))]\,ds,$$

or equivalently,

$$2[x^*(t)]^2[\dot x^*(t) - 1] = c + \int_{-1}^{t} \left[2x^*(s)[\dot x^*(s) - 1]^2\right] ds.$$

This equation is not as simple as in the previous examples. However, it is possible to find the solution to this problem by "inspection" of the cost function. Observe that

$$J(x(\cdot)) = \int_{-1}^{1} [x(s)]^2[\dot x(s) - 1]^2\,ds \ge 0$$

for all functions x(·), and

$$J(x(\cdot)) = \int_{-1}^{1} [x(s)]^2[\dot x(s) - 1]^2\,ds = 0$$

if x(s) = 0 or $\dot x(s) - 1 = 0$. Consider the function defined by

$$x^*(t) = \begin{cases} 0, & -1 \le t \le 0, \\ t, & 0 \le t \le 1. \end{cases}$$


Note that

$$J(x^*(\cdot)) = \int_{-1}^{1} [x^*(s)]^2[\dot x^*(s) - 1]^2\,ds = \int_{-1}^{0} [x^*(s)]^2[\dot x^*(s) - 1]^2\,ds + \int_{0}^{1} [x^*(s)]^2[\dot x^*(s) - 1]^2\,ds$$
$$= \int_{-1}^{0} [0]^2[0-1]^2\,ds + \int_{0}^{1} [s]^2[1-1]^2\,ds = 0,$$

and hence

$$J(x^*(\cdot)) = 0 \le \int_{-1}^{1} [x(s)]^2[\dot x(s) - 1]^2\,ds = J(x(\cdot)),$$

for all x(·). Hence, x∗(·) is a global minimizer for J(·) on

$$\Theta = \{x(\cdot) \in PWS(-1, 1) : x(-1) = 0,\ x(1) = 1\}.$$
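A direct quadrature check (our own illustration; the grid and the comparison function are assumptions, not from the text) confirms that x*(·) achieves zero cost while another admissible function does not:

```python
import numpy as np

s = np.linspace(-1.0, 1.0, 200001)

def J(x, dx):  # quadrature approximation of the cost in Example 3.4
    return np.trapz(x**2 * (dx - 1.0)**2, s)

# x*(s) = 0 on [-1,0] and s on [0,1]; its derivative is 0, then 1.
x_star, dx_star = np.where(s <= 0, 0.0, s), np.where(s <= 0, 0.0, 1.0)
# The line x(s) = (s+1)/2 also satisfies x(-1) = 0 and x(1) = 1.
x_line, dx_line = (s + 1.0) / 2.0, np.full_like(s, 0.5)

print(J(x_star, dx_star))  # ~0 (up to quadrature error at the corner)
print(J(x_line, dx_line))  # ~1/6 > 0
```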

Remark 3.5 This example illustrates one important point about "solving" problems. Always think about the problem before you start to "turn the crank" and compute.

Example 3.5 Minimize the functional $J(x(\cdot)) = \int_0^1 [\dot x(s)]^3\,ds$, subject to the endpoint conditions x(0) = 0 and x(1) = 1. Here, f(t, x, u) = u³, f_u(t, x, u) = 3u², and f_x(t, x, u) = 0. Euler's equation becomes

$$3[\dot x^*(t)]^2 = f_u(t, x^*(t), \dot x^*(t)) = c + \int_0^t f_x(s, x^*(s), \dot x^*(s))\,ds = c + \int_0^t 0\,ds = c,$$


or equivalently,

$$3[\dot x^*(t)]^2 = c.$$

Therefore,

$$\dot x^*(t) = \pm\sqrt{c/3} = \pm k$$

and all we know is that x*(t) is piecewise linear. Since x*(0) = 0 and x*(1) = 1, a possible candidate is

$$x^*(t) = t.$$

Although we have derived a first order necessary condition for the simplest problem, the basic idea can be extended to very general problems. In particular, we shall see that the simplest problem is a special case of a class of "infinite dimensional" optimization problems. We shall discuss this framework in later sections and apply this to the problem of finding local minimizers for the simplest problem. However, we first discuss some other applications of the FLCV.

3.4 Implications and Applications of the FLCV

In the previous section we applied the FLCV to develop the Euler Necessary Condition Theorem 3.3 for a global minimum. However, the FLCV also plays a key role in the development of many ideas that provide the basis for the modern theories of distributions and partial differential equations. Although these ideas are important and interesting, a full development of the material lies outside the scope of these notes. However, we present two simple examples to illustrate other applications of the FLCV and to provide some historical perspective on the role the calculus of variations has played in the development of modern mathematics.


3.4.1 Weak and Generalized Derivatives

Recall that Part (B) of the FLCV Lemma 3.1 states that if α(·) and β(·) are piecewise continuous on [t0, t1] and

$$\int_{t_0}^{t_1} [\alpha(s)\eta(s) + \beta(s)\dot\eta(s)]\,ds = 0 \qquad (3.30)$$

for all η(·) ∈ V0, then there is a constant c such that

$$\beta(t) = c + \int_{t_0}^{t} \alpha(s)\,ds$$

except at a finite number of points. The converse is also true. In particular, β(·) = βc(·) e.f., where βc(·) is the PWS function defined by

$$\beta_c(t) \equiv c + \int_{t_0}^{t} \alpha(s)\,ds,$$

and at points t where α(·) is continuous

$$\dot\beta_c(t) = \alpha(t).$$

If we rewrite (3.30) as

$$\int_{t_0}^{t_1} \beta(s)\dot\eta(s)\,ds = (-1)^1 \int_{t_0}^{t_1} \alpha(s)\eta(s)\,ds, \qquad (3.31)$$

then the expression (3.31) can be used to define the "weak derivative" of a piecewise continuous function β(·).

Definition 3.4 Let β(·) ∈ PWC(t0, t1). We say that β(·) has a weak derivative on [t0, t1] if there is a PWC function α(·) ∈ PWC(t0, t1) such that

$$\int_{t_0}^{t_1} \beta(s)\dot\eta(s)\,ds = (-1)^1 \int_{t_0}^{t_1} \alpha(s)\eta(s)\,ds$$


for all η(·) ∈ V0. The function α(·) ∈ PWC(t0, t1) is called the weak derivative of β(·) on [t0, t1].

Remark 3.6 It is important to note that the concept of a weak derivative as defined here is dependent on the specific interval [t0, t1]. In particular, it is possible that a function can have a weak derivative on the interval [−1, 0], and a weak derivative on [0,+1], but not have a weak derivative on the interval [−1,+1].

Observe that Part (B) of the FLCV implies that if β(·) ∈ PWC(t0, t1) has a weak derivative on [t0, t1], say α(·) ∈ PWC(t0, t1), then β(·) has an ordinary derivative except at a finite number of points in [t0, t1], and at points where α(·) is continuous

$$\dot\beta(t) = \alpha(t).$$

Therefore, if the weak derivative of β(·) exists, then the ordinary (strong) derivative of β(·) exists except at a finite number of points and is given by $\dot\beta(t) = \alpha(t)$. The FLCV also implies the converse is true. Moreover, as noted in Remark 3.2 above there is a unique PWS function βc(·) such that β(·) = βc(·) e.f. and we can identify β(·) with its "equivalent" PWS representation βc(·). Thus, with this convention one can say that if β(·) ∈ PWC(t0, t1) has a weak derivative on [t0, t1], then β(·) ∈ PWS(t0, t1).

It may appear that the notion of a weak derivative does not bring anything very new to the table, and in one dimension this is partially true because of the FLCV. However, consider how one might extend the notion to higher order derivatives. A natural extension would be to define a function α(·) ∈ PWC(t0, t1) to be a weak second derivative of β(·) ∈ PWC(t0, t1) if

$$\int_{t_0}^{t_1} \beta(s)\ddot\eta(s)\,ds = (-1)^2 \int_{t_0}^{t_1} \alpha(s)\eta(s)\,ds \qquad (3.32)$$

for all η(·) ∈ V0 with $\dot\eta(\cdot) \in V_0$. Later we shall see that extensions of the FLCV can be used to show that (3.32) implies that $\dot\beta(\cdot) \in PWS(t_0, t_1)$ and $\ddot\beta(\cdot)$ exists except at a finite number of points.


As is easy to see, the key idea is to use integration by parts (as in the proof of Part (B) of the FLCV) to "move" the derivatives from β(·) ∈ PWC(t0, t1) to the functions η(·). Again, it may appear that the notion of a weak 2nd derivative does not bring anything new to the table. However, the real power of this idea comes when it is applied to functions of several variables.

Example 3.6 Consider the PWC function β(·) defined on [−1,+1] by

$$\beta(t) = |t|$$

and α(·) defined by

$$\alpha(t) = \begin{cases} -1, & -1 \le t \le 0, \\ +1, & 0 < t \le +1. \end{cases}$$

If η (·) ∈ V0, then

$$\int_{-1}^{+1} \beta(s)\dot\eta(s)\,ds = \int_{-1}^{0} |s|\,\dot\eta(s)\,ds + \int_{0}^{+1} |s|\,\dot\eta(s)\,ds = -\int_{-1}^{0} s\,\dot\eta(s)\,ds + \int_{0}^{+1} s\,\dot\eta(s)\,ds$$
$$= -\left[s\eta(s)\right]\Big|_{s=-1}^{s=0} + \int_{-1}^{0}\eta(s)\,ds + \left[s\eta(s)\right]\Big|_{s=0}^{s=+1} - \int_{0}^{+1}\eta(s)\,ds$$
$$= -0 + \int_{-1}^{0}\eta(s)\,ds + 0 - \int_{0}^{+1}\eta(s)\,ds = \int_{-1}^{0}\eta(s)\,ds - \int_{0}^{+1}\eta(s)\,ds.$$


On the other hand

$$\int_{-1}^{+1} \alpha(s)\eta(s)\,ds = \int_{-1}^{0} \alpha(s)\eta(s)\,ds + \int_{0}^{+1} \alpha(s)\eta(s)\,ds = -\int_{-1}^{0} \eta(s)\,ds + \int_{0}^{+1} \eta(s)\,ds,$$

so that

$$\int_{-1}^{+1} \beta(s)\dot\eta(s)\,ds = \int_{-1}^{0}\eta(s)\,ds - \int_{0}^{+1}\eta(s)\,ds = -\left[-\int_{-1}^{0}\eta(s)\,ds + \int_{0}^{+1}\eta(s)\,ds\right] = -\int_{-1}^{+1} \alpha(s)\eta(s)\,ds.$$

Hence, β(t) = |t| has a PWC weak derivative on [−1,+1] and the weak derivative is α(·).
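The defining identity is easy to test numerically. The sketch below (our own illustration; the particular test function η is an assumption) checks the integration-by-parts identity for β(t) = |t|:

```python
import numpy as np

s = np.linspace(-1.0, 1.0, 100001)
beta = np.abs(s)                       # beta(t) = |t|
alpha = np.where(s <= 0, -1.0, 1.0)    # candidate weak derivative

eta = (1 - s**2) * np.sin(3 * s)       # a test function with eta(-1) = eta(+1) = 0
deta = -2 * s * np.sin(3 * s) + 3 * (1 - s**2) * np.cos(3 * s)

lhs = np.trapz(beta * deta, s)         # integral of beta * eta'
rhs = -np.trapz(alpha * eta, s)        # -(integral of alpha * eta)
print(lhs, rhs)                        # the two integrals agree
```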

Example 3.7 Consider the PWC function β(·) defined on [−1,+1] by

$$\beta(t) = \begin{cases} -1/2, & -1 \le t \le 0, \\ +1/2, & 0 < t \le +1. \end{cases}$$

We show that β(·) does not have a weak derivative on [−1,+1]. Assume the contrary, that there is a function α(·) ∈ PWS(−1,+1) such that

$$\int_{-1}^{+1} \beta(s)\dot\eta(s)\,ds = -\int_{-1}^{+1} \alpha(s)\eta(s)\,ds$$


for all η (·) ∈ V0(−1,+1). Note that

$$\int_{-1}^{+1} \beta(s)\dot\eta(s)\,ds = \int_{-1}^{0} (-1/2)\dot\eta(s)\,ds + \int_{0}^{+1} (+1/2)\dot\eta(s)\,ds = \frac{1}{2}\left[-\int_{-1}^{0} \dot\eta(s)\,ds + \int_{0}^{+1} \dot\eta(s)\,ds\right]$$
$$= \frac{1}{2}\left[-\eta(0^-) - \eta(0^+)\right] = -\frac{1}{2}\left[\eta(0^-) + \eta(0^+)\right] = -\eta(0),$$

since η (·) ∈ V0(−1,+1) is continuous. Thus, if

$$\int_{-1}^{+1} \beta(s)\dot\eta(s)\,ds = -\int_{-1}^{+1} \alpha(s)\eta(s)\,ds$$

for all η(·) ∈ V0(−1,+1), then

$$\int_{-1}^{+1} \alpha(s)\eta(s)\,ds = \eta(0) \qquad (3.33)$$

for all η(·) ∈ V0(−1,+1). To see that there is no function α(·) ∈ PWS(−1,+1) satisfying (3.33) for all η(·) ∈ V0(−1,+1), assume such a function α(·) ∈ PWS(−1,+1) exists and let $\bar\alpha$ be such that

$$|\alpha(s)| \le \bar\alpha, \quad -1 \le s \le +1.$$

For m = 1, 2, 3, . . . let ηm(·) ∈ V0(−1,+1) be given by

$$\eta_m(t) = \begin{cases} +m(t + 1/m), & -1/m \le t \le 0, \\ -m(t - 1/m), & 0 \le t \le 1/m, \\ 0, & \text{elsewhere,} \end{cases}$$

and note that ηm(0) = 1 while

$$\int_{-1}^{+1} \eta_m(s)\,ds = \int_{-1}^{+1} |\eta_m(s)|\,ds = \frac{1}{m}.$$


Select M > 1 so that

$$\frac{\bar\alpha}{M} < 1/2$$

and observe that

$$\left|\int_{-1}^{+1} \alpha(s)\eta_M(s)\,ds\right| \le \int_{-1}^{+1} \left|\alpha(s)\eta_M(s)\right| ds \le \bar\alpha \int_{-1}^{+1} \left|\eta_M(s)\right| ds = \frac{\bar\alpha}{M} < 1/2.$$

However, ηM(·) ∈ V0(−1,+1) and ηM(0) = 1, but

$$\int_{-1}^{+1} \alpha(s)\eta_M(s)\,ds \le 1/2 < 1 = \eta_M(0)$$

and hence (3.33) does not hold for ηM(·) ∈ V0(−1,+1). Consequently,

$$\beta(t) = \begin{cases} -1/2, & -1 \le t \le 0, \\ +1/2, & 0 < t \le +1, \end{cases}$$

does not have a weak derivative on [−1,+1]. Observe that β(·) does have a weak derivative on [−1, 0] and a weak derivative on [0,+1], and on each of these intervals the weak derivative is zero.
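The heart of the argument, namely that $\int \alpha\,\eta_m\,ds$ shrinks like 1/m while ηm(0) = 1 stays fixed, is easy to see numerically. In the sketch below the sample bounded α is our own assumption:

```python
import numpy as np

def eta_m(t, m):
    # tent function of height 1 with base [-1/m, 1/m]
    return np.maximum(0.0, 1.0 - m * np.abs(t))

alpha = lambda t: np.cos(5 * t)   # any bounded piecewise continuous alpha
s = np.linspace(-1.0, 1.0, 400001)

for m in (1, 10, 100, 1000):
    val = np.trapz(alpha(s) * eta_m(s, m), s)
    print(m, val)                 # tends to 0, yet eta_m(0) = 1 for every m
```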

Remark 3.7 The extension of weak derivatives to a more general setting requires the development of "distribution theory" and will not be discussed in this book. However, this extension leads to the modern definition of a generalized derivative (or distribution) that covers the example above. In particular, for β(·) defined on (−∞,+∞) by

$$\beta(t) = \begin{cases} 0, & t \le 0, \\ +1, & 0 < t, \end{cases}$$

the generalized derivative of β(·) (on R) is the Dirac "delta function", denoted by δ(·), and is not a PWC function. In fact, δ(·) is not a function in the usual sense and hence the generalized derivative of β(·) is a distribution (see the references [2], [88] and [161]).


Modern theories of partial differential equations make extensive use of weak and generalized derivatives. These derivatives are used to define weak solutions that are key to understanding both theoretical and computational issues in this field (see [55], [56], [76], [96], [126] and [179]).

3.4.2 Weak Solutions to Differential Equations

In order to set up the finite element method in Section 2.4.5 to solve the two-point boundary value problem

$$-\ddot x(t) + x(t) = f(t), \quad 0 < t < 1, \qquad (3.34)$$

subject to the Dirichlet boundary conditions

$$x(0) = 0, \quad x(1) = 0, \qquad (3.35)$$

we discussed strong and weak solutions to (3.34) - (3.35). If x(t) is a solution of (3.34) - (3.35) in the classical sense, then multiplying both sides of (3.34) by η(·) ∈ V0 = PWS0(0, 1) and integrating by parts produces the variational equation

$$\int_0^1 \dot x(t)\dot\eta(t)\,dt + \int_0^1 x(t)\eta(t)\,dt = \int_0^1 f(t)\eta(t)\,dt, \qquad (3.36)$$

which must hold for all η(·) ∈ PWS(0, 1) satisfying

$$\eta(0) = 0, \quad \eta(1) = 0. \qquad (3.37)$$

Thus,

$$\int_0^1 \dot x(t)\dot\eta(t)\,dt = -\int_0^1 x(t)\eta(t)\,dt + \int_0^1 f(t)\eta(t)\,dt = (-1)^1 \int_0^1 [x(t) - f(t)]\eta(t)\,dt$$

for all η(·) ∈ V0, and hence $\beta(t) = \dot x(t)$ satisfies

$$\int_0^1 \beta(t)\dot\eta(t)\,dt = (-1)^1 \int_0^1 \alpha(t)\eta(t)\,dt$$


for all η(·) ∈ V0, where α(t) = [x(t) − f(t)]. Consequently, $\beta(t) = \dot x(t)$ has a weak derivative on [0, 1] given by

$$[x(t) - f(t)],$$

and it follows that if x(·) ∈ PWS0(0, 1), then x(·) is a weak solution of the two-point boundary value problem (3.34) - (3.35).
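For readers who want to experiment, here is a minimal linear finite element sketch in the spirit of Section 2.4.5 and Problem 3.19; the mesh, the lumped load approximation, and the sample right-hand side are our own choices, not the text's implementation.

```python
import numpy as np

n = 100                      # number of interior nodes
h = 1.0 / (n + 1)
f = lambda t: np.exp(t)      # sample right-hand side (cf. Problem 3.19)

# Tridiagonal stiffness-plus-mass matrix for the bilinear form
# int(x'e' + x e) dt with piecewise linear hat functions.
main = (2.0 / h + 2.0 * h / 3.0) * np.ones(n)
off = (-1.0 / h + h / 6.0) * np.ones(n - 1)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

nodes = h * np.arange(1, n + 1)
b = h * f(nodes)             # simple lumped approximation of int(f * hat_i)

x = np.linalg.solve(A, b)    # nodal values of the approximate weak solution
print(x[n // 2])             # value of the approximation near t = 0.5
```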

Weak and generalized (distributional) derivatives and the notion of weak solutions to differential equations are key concepts in modern analysis. In multi-dimensional settings where one is interested in partial differential equations, the mathematical background required to properly address the theory of partial differential equations is more complex. However, the basic ideas have their roots in the classical problems discussed above. For more advanced readers we suggest the references [64], [65], [89] and [186].

3.5 Problem Set for Chapter 3

Consider the Simplest Problem in the Calculus of Variations (SPCV): Find x*(·) to minimize the cost function

$$J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\,ds,$$

subject to

$$x(t_0) = x_0, \quad x(t_1) = x_1.$$

For each of the following problems:

(A) Write out the integrand f(t, x, u).

(B) Determine the endpoints t0 and t1.

(C) Determine the endpoints x0 and x1.

(D) Compute all the partial derivatives ft(t, x, u), fx(t, x, u),fu(t, x, u), ftt(t, x, u), fxx(t, x, u), fuu(t, x, u), ftx(t, x, u),ftu(t, x, u) and fxu(t, x, u).


(E) What can you say about possible minimizing functions x*(·) for these problems? Write a short summary of what you know and don't know about each problem.

Here $\dot x(t) = \frac{dx(t)}{dt}$ is the derivative.

Problem 3.1 Minimize the functional $J(x(\cdot)) = \int_0^1 \dot x(s)\,ds$, subject to the endpoint conditions x(0) = 0 and x(1) = 1.

Problem 3.2 Minimize the functional $J(x(\cdot)) = \int_0^1 x(s)\dot x(s)\,ds$, subject to the endpoint conditions x(0) = 0 and x(1) = 1.

Problem 3.3 Minimize the functional $J(x(\cdot)) = \int_0^1 s\,x(s)\dot x(s)\,ds$, subject to the endpoint conditions x(0) = 0 and x(1) = 1.

Problem 3.4 Minimize the functional $J(x(\cdot)) = \int_0^b [\dot x(s)]^3\,ds$, subject to the endpoint conditions x(0) = 0 and x(b) = x1.

Problem 3.5 Minimize the functional

$$J(x(\cdot)) = \int_0^1 \left\{[\dot x(s)]^2 + [x(s)]^2 + 2e^s x(s)\right\} ds,$$

subject to the endpoint conditions x(0) = 0 and x(1) = e/2.

Problem 3.6 Minimize the functional $J(x(\cdot)) = \int_1^2 s^{-3}[\dot x(s)]^2\,ds$, subject to the endpoint conditions x(1) = 1 and x(2) = 16.

Problem 3.7 Minimize the functional

$$J(x(\cdot)) = \int_0^4 [\dot x(s) - 1]^2[\dot x(s) + 1]^2\,ds,$$

subject to the endpoint conditions x(0) = 0 and x(4) = 2.


Problem 3.8 Minimize the functional

$$J(x(\cdot)) = \int_0^{\pi/2} \left\{[\dot x(s)]^2 - [x(s)]^2\right\} ds,$$

subject to the endpoint conditions x(0) = 0 and x(π/2) = 0.

Problem 3.9 Minimize the functional

$$J(x(\cdot)) = \int_0^{\pi} \left\{[\dot x(s)]^2 - [x(s)]^2\right\} ds,$$

subject to the endpoint conditions x(0) = 0 and x(π) = 0.

Problem 3.10 Minimize the functional

$$J(x(\cdot)) = \int_0^{3\pi/2} \left\{[\dot x(s)]^2 - [x(s)]^2\right\} ds,$$

subject to the endpoint conditions x(0) = 0 and x(3π/2) = 0.

Problem 3.11 Minimize the functional

$$J(x(\cdot)) = \int_0^b x(s)\sqrt{1 + [\dot x(s)]^2}\,ds,$$

subject to the endpoint conditions x(0) = 1 and x(b) = 2.

Problem 3.12 Minimize the functional

$$J(x(\cdot)) = \int_0^b \sqrt{\frac{1 + [\dot x(s)]^2}{2g\,x(s)}}\,ds,$$

subject to the endpoint conditions x(0) = 1 and x(b) = 0.


Problem 3.13 Minimize the functional

$$J(x(\cdot)) = \int_1^2 \left\{[\dot x(s)]^2 - 2s\,x(s)\right\} ds,$$

subject to the endpoint conditions x(1) = 0 and x(2) = −1.

Problem 3.14 Minimize the functional

$$J(x(\cdot)) = \int_0^{\pi} [x(s)]^2\left[1 - [\dot x(s)]^2\right] ds,$$

subject to the endpoint conditions x(0) = 0 and x(π) = 0.

Problem 3.15 Minimize the functional

$$J(x(\cdot)) = \int_1^3 [3s - x(s)]\,\dot x(s)\,ds,$$

subject to the endpoint conditions x(1) = 1 and x(3) = 9/2.

Problem 3.16 Minimize the functional

$$J(x(\cdot)) = 4\pi\rho v^2 \int_0^L [\dot x(s)]^3 x(s)\,ds,$$

subject to the endpoint conditions x(0) = 1 and x(L) = R. Here, ρ, v², L > 0 and R > 0 are all constants.

Problem 3.17 Minimize the functional

$$J(x(\cdot)) = \int_1^2 \dot x(s)\left[1 + s^2\dot x(s)\right] ds,$$

subject to the endpoint conditions x(1) = 3 and x(2) = 5.


Advanced Problems

Problem 3.18 Show that if x*(·) minimizes the functional

$$J(x(\cdot)) = \int_0^1 \frac{1}{2}\left\{[\dot x(s)]^2 + [x(s)]^2 - 2e^s x(s)\right\} ds,$$

then x*(·) satisfies the two-point boundary value problem

$$-\ddot x(t) + x(t) = e^t, \quad x(0) = x(1) = 0.$$

Problem 3.19 Use the finite element method to solve the two-point boundary value problem

$$-\ddot x(t) + x(t) = e^t, \quad x(0) = x(1) = 0.$$


Chapter 4

Necessary Conditions for Local Minima

We turn now to the problem of obtaining necessary conditions for local minimizers. As in the previous chapters, let X = PWS(t0, t1) denote the space of all real-valued piecewise smooth functions defined on [t0, t1]. For each PWS function x : [t0, t1] → R, define the functional J : X → R (a "function of a function") by

$$J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\,ds. \qquad (4.1)$$

Assume that the points $[t_0\ x_0]^T$ and $[t_1\ x_1]^T$ are given and define the subset Θ of PWS(t0, t1) by

$$\Theta = \{x(\cdot) \in PWS(t_0, t_1) : x(t_0) = x_0,\ x(t_1) = x_1\}. \qquad (4.2)$$

Observe that J : X → R is a real valued function on X. The Simplest Problem in the Calculus of Variations (the fixed endpoint problem) is the problem of minimizing J(·) on


Θ. In particular, the goal is to find x∗ (·) ∈ Θ such that

$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s), \dot x^*(s))\,ds \le J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\,ds,$$

for all x(·) ∈ Θ.

The basic goal in this chapter is the development of the classical necessary conditions for local minimizers. We begin with a review of the basic definitions.

4.1 Weak and Strong Local Minimizers

In order to define local minimizers for the SPCV, we must have a measure of distance between functions in PWS(t0, t1). Given two functions x(·) and z(·) ∈ PWS(t0, t1) there are many choices for a distance function, but we will focus on the weak and strong metrics defined in Section 2.3.1. Recall that the d0 distance between x(·) and z(·) is defined by

$$d_0(x(\cdot), z(\cdot)) \triangleq \sup_{t_0 \le t \le t_1} |x(t) - z(t)|. \qquad (4.3)$$

In this case we can define a norm on PWS(t0, t1) by

$$\|x(\cdot)\|_0 = \sup_{t_0 \le t \le t_1} |x(t)| \qquad (4.4)$$

and note that

d0(x(·), z(·)) = ‖x(·)− z(·)‖0 .

Given $\bar x(\cdot) \in PWS(t_0, t_1)$ and δ > 0, the $U_0(\bar x(\cdot), \delta)$-neighborhood (or Strong Neighborhood) of $\bar x(\cdot)$ is defined to be the open ball

$$U_0(\bar x(\cdot), \delta) = \{x(\cdot) \in PWS(t_0, t_1) : d_0(x(\cdot), \bar x(\cdot)) < \delta\}.$$

Likewise, the d1 distance between x(·) and z(·) is defined by

$$d_1(x(\cdot), z(\cdot)) = \sup\{|x(t) - z(t)| : t_0 \le t \le t_1\} + \sup\{|\dot x(t) - \dot z(t)| : t_0 \le t \le t_1,\ t \neq t_i\} \qquad (4.5)$$
$$= d_0(x(\cdot), z(\cdot)) + \sup\{|\dot x(t) - \dot z(t)| : t_0 \le t \le t_1,\ t \neq t_i\}.$$

In this case the 1-norm is defined on PWS(t0, t1) by

‖x(·)‖1 = d1(x(·), 0(·)), (4.6)

where 0(·) is the zero function and as before,

d1(x(·), z(·)) = ‖x(·)− z(·)‖1 . (4.7)

If $\bar x(\cdot) \in PWS(t_0, t_1)$ and δ > 0, the $U_1(\bar x(\cdot), \delta)$-neighborhood (or Weak Neighborhood) of $\bar x(\cdot)$ is defined to be the open ball

$$U_1(\bar x(\cdot), \delta) = \{x(\cdot) \in PWS(t_0, t_1) : d_1(x(\cdot), \bar x(\cdot)) < \delta\}.$$

Remark 4.1 Recall that if d1(x(·), z(·)) = 0, then x(t) = z(t) for all t ∈ [t0, t1] and $\dot x(t) = \dot z(t)$ e.f. Also,

$$d_0(x(\cdot), z(\cdot)) \le d_1(x(\cdot), z(\cdot)),$$

and it follows that if d1(x(·), z(·)) < δ, then d0(x(·), z(·)) < δ. This is an important inequality since it implies that

$$U_1(\bar x(\cdot), \delta) \subset U_0(\bar x(\cdot), \delta) \subset PWS(t_0, t_1), \qquad (4.8)$$

so that the weak neighborhood $U_1(\bar x(\cdot), \delta)$ is smaller than the strong neighborhood $U_0(\bar x(\cdot), \delta)$.
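The difference between the two metrics is easy to see numerically. In the sketch below (our own illustration, with an assumed grid and sample functions), a function with small amplitude but a large derivative is close to the zero function in d0 but far from it in d1:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10001)

x, dx = np.sin(8 * t) / 8, np.cos(8 * t)   # small amplitude, large slope
z, dz = np.zeros_like(t), np.zeros_like(t)  # the zero function

d0 = np.max(np.abs(x - z))                 # strong distance
d1 = d0 + np.max(np.abs(dx - dz))          # weak distance (no corners here)
print(d0, d1)   # d0 ~ 0.125 but d1 ~ 1.125: close in d0, far in d1
```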

In addition to defining global minimizers, the metrics d0 and d1 defined on PWS(t0, t1) allow us to define two types of local minimizers for the SPCV.


Definition 4.1 The function x*(·) ∈ Θ provides a global minimizer for J(·) on Θ if

J(x∗(·)) ≤ J(x(·))

for all x(·) ∈ Θ.

Definition 4.2 The function x*(·) ∈ Θ provides a strong local minimizer for J(·) on Θ if there is a δ > 0 such that

J(x∗(·)) ≤ J(x(·))

for all x(·) ∈ U0(x∗(·), δ) ∩Θ.

Definition 4.3 The function x*(·) ∈ Θ provides a weak local minimizer for J(·) on Θ if there is a δ > 0 such that

J(x∗(·)) ≤ J(x(·))

for all x(·) ∈ U1(x∗(·), δ) ∩Θ.

Remark 4.2 Recall that U1(x*(·), δ) ⊂ U0(x*(·), δ) ⊂ PWS(t0, t1). Therefore, it follows that a global minimizer is a strong local minimizer, and a strong local minimizer is a weak local minimizer. It is important to note that a necessary condition for a weak local minimum is also a necessary condition for a strong local minimum, and a necessary condition for a strong local minimum is also a necessary condition for a global minimum. In particular, any necessary condition for a weak local minimum applies to strong and global minima. However, a necessary condition obtained by assuming that x*(·) is a global minimum may not apply to a local minimum. The important point is that if one can derive a necessary condition assuming only that x*(·) is a weak local minimizer, then it is more powerful (i.e. applies to more problems) than a necessary condition obtained by assuming that x*(·) is a strong or global minimizer.

Remark 4.3 In the following sections we derive four necessary conditions for weak and strong local minimizers for the SPCV.


These necessary conditions are numbered I, II, III and IV. This numbering system is used to match what appears in the classical work of Gilbert Bliss [29] and follows the convention used by Ewing [77]. The numbers do not reflect the historical development of the conditions and should not be thought of as an order for solution of practical problems.

4.2 The Euler Necessary Condition - (I)

In this section we extend the Euler Necessary Condition from the previous chapter to weak local minimizers. Assume that x*(·) ∈ Θ is a weak local minimizer for J(·) on Θ. In particular, there is a δ > 0 such that

$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s), \dot x^*(s))\,ds \le J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\,ds, \qquad (4.9)$$

for all x(·) ∈ U1(x*(·), δ) ∩ Θ.

Let η(·) ∈ V0 and consider the variation

$$\varphi(t, \varepsilon) \triangleq x^*(t) + \varepsilon\eta(t). \qquad (4.10)$$

Recall that for each ε ∈ R the variation ϕ(t, ε) satisfies the following conditions:

(i) ϕ(t0, ε) = x∗ (t0) + εη(t0) = x0 + εη(t0) = x0 + ε0 = x0,

(ii) ϕ(t1, ε) = x∗(t1) + εη(t1) = x1 + εη(t1) = x1 + ε0 = x1,

(iii) ϕ(t, ε) = x∗ (t) + εη(t) ∈ PWS(t0, t1).


It follows that if η(·) ∈ V0, then for all ε ∈ R, ϕ(·, ε) = x*(·) + εη(·) ∈ Θ. However, it is not always true that ϕ(·, ε) = x*(·) + εη(·) ∈ U1(x*(·), δ) unless ε is small. Let

$$\|\eta(\cdot)\|_1 = \sup\{|\eta(t)| : t_0 \le t \le t_1\} + \sup\{|\dot\eta(t)| : t_0 \le t \le t_1,\ t \neq t_i\}$$

and note that ‖η(·)‖1 = 0 if and only if η(t) = 0 for all t ∈ [t0, t1]. The case ‖η(·)‖1 = 0 is trivial, so assume that ‖η(·)‖1 ≠ 0 and select ε such that

$$\frac{-\delta}{\|\eta(\cdot)\|_1} < \varepsilon < \frac{\delta}{\|\eta(\cdot)\|_1}.$$

If $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$, then the distance between ϕ(·, ε) and x*(·) is given by

$$d_1(x^*(\cdot), \varphi(\cdot, \varepsilon)) = \sup_{t_0 \le t \le t_1} |x^*(t) - \varphi(t, \varepsilon)| + \sup_{t_0 \le t \le t_1,\ t \neq t_i} \left|\dot x^*(t) - \frac{\partial\varphi(t, \varepsilon)}{\partial t}\right|$$
$$= \sup_{t_0 \le t \le t_1} |x^*(t) - [x^*(t) + \varepsilon\eta(t)]| + \sup_{t_0 \le t \le t_1,\ t \neq t_i} |\dot x^*(t) - [\dot x^*(t) + \varepsilon\dot\eta(t)]|$$
$$= \sup_{t_0 \le t \le t_1} |\varepsilon\eta(t)| + \sup_{t_0 \le t \le t_1,\ t \neq t_i} |\varepsilon\dot\eta(t)| = |\varepsilon|\left[\sup_{t_0 \le t \le t_1} |\eta(t)| + \sup_{t_0 \le t \le t_1,\ t \neq t_i} |\dot\eta(t)|\right]$$
$$= |\varepsilon|\,\|\eta(\cdot)\|_1 < \frac{\delta}{\|\eta(\cdot)\|_1}\,\|\eta(\cdot)\|_1 = \delta.$$

Therefore, if $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$, then ϕ(·, ε) = x*(·) + εη(·) ∈ U1(x*(·), δ) ∩ Θ and is admissible. Since x*(·) ∈ Θ minimizes J(·) on U1(x*(·), δ) ∩ Θ, it follows that

J(x∗(·)) ≤ J(x∗ (·) + εη(·)) (4.11)


for all $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$. Define $F : \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right) \longrightarrow \mathbb{R}$ by

$$F(\varepsilon) = J(x^*(\cdot) + \varepsilon\eta(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\,ds, \qquad (4.12)$$

and note that the equation (4.11) implies that

F (0) = J(x∗(·)) ≤ J(x∗(·) + εη(·)) = F (ε)

for all $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$. Therefore, F(·) has a minimum on the open interval $\left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$ at ε* = 0. Applying the simple first order necessary condition Theorem 2.1, it follows that

$$\left.\frac{d}{d\varepsilon}F(\varepsilon)\right|_{\varepsilon=0} = \left.\frac{d}{d\varepsilon}[J(x^*(\cdot) + \varepsilon\eta(\cdot))]\right|_{\varepsilon=0} = 0. \qquad (4.13)$$

Observe that (4.13) holds for all η(·) ∈ V0 and we have established the following necessary condition.

Theorem 4.1 If x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ and the first variation δJ(x*(·); η(·)) exists, then

$$\delta J(x^*(\cdot); \eta(\cdot)) = \left.\frac{d}{d\varepsilon}[J(x^*(\cdot) + \varepsilon\eta(\cdot))]\right|_{\varepsilon=0} = 0, \qquad (4.14)$$

for all η(·) ∈ V0.

This result is identical to Theorem 3.1, which was established for a global minimizer in Section 3.3. Thus, Theorem 3.1 is valid for weak and strong local minimizers. We know that

$$\delta J(x^*(\cdot); \eta(\cdot)) = \int_{t_0}^{t_1} \left\{f_x(s, x^*(s), \dot x^*(s))\,\eta(s) + f_u(s, x^*(s), \dot x^*(s))\,\dot\eta(s)\right\} ds. \qquad (4.15)$$


In view of Theorem 4.1 and the formula (4.15), we have that if x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then

$$\int_{t_0}^{t_1} \left\{f_x(s, x^*(s), \dot x^*(s))\cdot\eta(s) + f_u(s, x^*(s), \dot x^*(s))\cdot\dot\eta(s)\right\} ds = 0, \qquad (4.16)$$

for all η(·) ∈ V0. The Fundamental Lemma of the Calculus of Variations (Lemma 3.1) yields the existence of a constant c such that for all t ∈ [t0, t1]

$$f_u(t, x^*(t), \dot x^*(t)) = c + \int_{t_0}^{t} f_x(s, x^*(s), \dot x^*(s))\,ds.$$

Thus, we have established the following result.

Theorem 4.2 (Euler Necessary Condition - (I)) If x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then

(E-1) there is a constant c such that for all t ∈ [t0, t1],

$$f_u(t, x^*(t), \dot x^*(t)) = c + \int_{t_0}^{t} f_x(s, x^*(s), \dot x^*(s))\,ds, \qquad (4.17)$$

(E-2) x∗(t0) = x0,

(E-3) x∗(t1) = x1.

(E-4) Between corners of x*(·) the function $f_u(t, x^*(t), \dot x^*(t))$ is differentiable and if t is not a corner of x*(·), then

$$\frac{d}{dt}f_u(t, x^*(t), \dot x^*(t)) = f_x(t, x^*(t), \dot x^*(t)). \qquad (4.18)$$

Recall that equation (4.17) is called Euler's Integral Equation, while equation (4.18) is called Euler's Differential Equation.


The Euler Necessary Condition - (I) (Theorem 4.2) implies that any local minimizer must be an extremal (i.e. a PWS function satisfying (4.17)). Thus, if x*(·) ∈ Θ is a weak local minimizer for J(·) on Θ, then x*(·) satisfies the Weierstrass-Erdmann Corner Condition

$$f_u(t, x^*(t), \dot x^*(t^+)) = f_u(t, x^*(t), \dot x^*(t^-)) \qquad (4.19)$$

for all t ∈ (t0, t1).

If f is regular, then extremals cannot have corners. In addition, if f(t, x, u) is of class $C^p$, $p \ge 2$, then Hilbert's Differentiability Theorem 3.6 implies that x*(·) is also of class $C^p$ and satisfies

$$\ddot x^*(t) = \frac{[f_x(t, x^*(t), \dot x^*(t))] - [f_{ut}(t, x^*(t), \dot x^*(t))] - [f_{ux}(t, x^*(t), \dot x^*(t))]\cdot\dot x^*(t)}{[f_{uu}(t, x^*(t), \dot x^*(t))]}. \qquad (4.20)$$

It is important to emphasize that the Euler Necessary Condition - (I) has four parts. Part (E-2) x*(t0) = x0 and Part (E-3) x*(t1) = x1 are also covered by the fact that x*(·) ∈ Θ. However, for more general problems to be considered later, these boundary conditions will change and become more significant.

4.3 The Legendre Necessary Condition - (III)

The Euler and Weierstrass necessary conditions are first order conditions. We turn now to second order conditions. We begin just as we did for the Euler Necessary Condition. Assume that x*(·) ∈ Θ is a weak local minimizer for J(·) on Θ. In particular, there is a δ > 0 such that

$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s), \dot x^*(s))\,ds \le J(x(\cdot)) = \int_{t_0}^{t_1} f(s, x(s), \dot x(s))\,ds,$$

for all x (·) ∈ U1(x∗(·), δ) ∩Θ.


Let η(·) ∈ V0 and again we consider the classical variation

$$\varphi(t, \varepsilon) \triangleq x^*(t) + \varepsilon\eta(t).$$

As before, if $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$, then $d_1(x^*(\cdot), \varphi(\cdot, \varepsilon)) < \delta$, so ϕ(·, ε) = x*(·) + εη(·) ∈ U1(x*(·), δ) ∩ Θ and is admissible. Since x*(·) ∈ Θ minimizes J(·) on U1(x*(·), δ) ∩ Θ, it follows that

J(x∗(·)) ≤ J(x∗ (·) + εη(·))

for all $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$. Define $F : \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right) \longrightarrow \mathbb{R}$ by

$$F(\varepsilon) = J(x^*(\cdot) + \varepsilon\eta(\cdot)) = \int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\,ds,$$

and note that

F (0) = J(x∗(·)) ≤ J(x∗(·) + εη(·)) = F (ε)

for all $\varepsilon \in \left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$. Therefore, F(·) has a minimum on $\left(\frac{-\delta}{\|\eta(\cdot)\|_1}, \frac{\delta}{\|\eta(\cdot)\|_1}\right)$ at ε* = 0. This time we apply the second order condition as stated in Theorem 2.2 from Chapter 2. In particular, if $\left.\frac{d^2}{d\varepsilon^2}F(\varepsilon)\right|_{\varepsilon=0}$ exists, then

$$\left.\frac{d^2}{d\varepsilon^2}F(\varepsilon)\right|_{\varepsilon=0} \triangleq \delta^2 J(x^*(\cdot); \eta(\cdot)) = \left.\frac{d^2}{d\varepsilon^2}[J(x^*(\cdot) + \varepsilon\eta(\cdot))]\right|_{\varepsilon=0} \ge 0. \qquad (4.21)$$

Observe that (4.21) holds for all η(·) ∈ V0. To use (4.21) we must compute the second variation δ²J(x*(·); η(·)). The first variation of

$$F(\varepsilon) = \int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\,ds$$


is given by

$$\frac{d}{d\varepsilon}F(\varepsilon) = \frac{d}{d\varepsilon}\int_{t_0}^{t_1} f(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))\,ds$$
$$= \int_{t_0}^{t_1} [f_x(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))]\cdot\eta(s)\,ds + \int_{t_0}^{t_1} [f_u(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))]\cdot\dot\eta(s)\,ds.$$

Differentiating once again yields

$$\frac{d^2}{d\varepsilon^2}F(\varepsilon) = \frac{d}{d\varepsilon}\int_{t_0}^{t_1} [f_x(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))]\cdot\eta(s)\,ds + \frac{d}{d\varepsilon}\int_{t_0}^{t_1} [f_u(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))]\cdot\dot\eta(s)\,ds$$
$$= \int_{t_0}^{t_1} [f_{xx}]\cdot[\eta(s)]^2\,ds + \int_{t_0}^{t_1} [f_{xu}]\cdot\eta(s)\dot\eta(s)\,ds + \int_{t_0}^{t_1} [f_{ux}]\cdot\dot\eta(s)\eta(s)\,ds + \int_{t_0}^{t_1} [f_{uu}]\cdot[\dot\eta(s)]^2\,ds,$$

where all second partials are evaluated at $(s, x^*(s) + \varepsilon\eta(s), \dot x^*(s) + \varepsilon\dot\eta(s))$,

Page 161: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

142 Chapter 4. Necessary Conditions for Local Minima

and setting ε = 0 produces

$$\left.\frac{d^2}{d\varepsilon^2}F(\varepsilon)\right|_{\varepsilon=0} = \int_{t_0}^{t_1} [f_{xx}(s, x^*(s), \dot x^*(s))]\cdot[\eta(s)]^2\,ds + \int_{t_0}^{t_1} [f_{xu}(s, x^*(s), \dot x^*(s))]\cdot\eta(s)\dot\eta(s)\,ds$$
$$+ \int_{t_0}^{t_1} [f_{ux}(s, x^*(s), \dot x^*(s))]\cdot\dot\eta(s)\eta(s)\,ds + \int_{t_0}^{t_1} [f_{uu}(s, x^*(s), \dot x^*(s))]\cdot[\dot\eta(s)]^2\,ds.$$

Since $f_{ux}(s, x^*(s), \dot x^*(s)) = f_{xu}(s, x^*(s), \dot x^*(s))$, it follows that

$$\left.\frac{d^2}{d\varepsilon^2}F(\varepsilon)\right|_{\varepsilon=0} = \int_{t_0}^{t_1} [f_{xx}(s, x^*(s), \dot x^*(s))]\cdot[\eta(s)]^2\,ds \qquad (4.22)$$
$$+ \int_{t_0}^{t_1} [2f_{xu}(s, x^*(s), \dot x^*(s))]\cdot\eta(s)\dot\eta(s)\,ds + \int_{t_0}^{t_1} [f_{uu}(s, x^*(s), \dot x^*(s))]\cdot[\dot\eta(s)]^2\,ds.$$

In order to simplify notation, we set

$$f^*_{xx}(t) = f_{xx}(t, x^*(t), \dot x^*(t)), \qquad (4.23)$$
$$f^*_{xu}(t) = f_{xu}(t, x^*(t), \dot x^*(t)), \qquad (4.24)$$

and

$$f^*_{uu}(t) = f_{uu}(t, x^*(t), \dot x^*(t)). \qquad (4.25)$$


Therefore, we have established that the second variation is given by

$$\delta^2 J(x^*(\cdot); \eta(\cdot)) = \int_{t_0}^{t_1} \left\{f^*_{xx}(s)[\eta(s)]^2 + 2f^*_{xu}(s)[\eta(s)\dot\eta(s)] + f^*_{uu}(s)[\dot\eta(s)]^2\right\} ds. \qquad (4.26)$$

Consequently, (4.21) is equivalent to the condition that

$$\int_{t_0}^{t_1} \left\{f^*_{xx}(s)[\eta(s)]^2 + 2f^*_{xu}(s)[\eta(s)\dot\eta(s)] + f^*_{uu}(s)[\dot\eta(s)]^2\right\} ds \ge 0$$

holds for all η(·) ∈ V0. Therefore, we have established the following necessary condition.

Theorem 4.3 If x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then

$$\int_{t_0}^{t_1} \left\{f^*_{xx}(s)[\eta(s)]^2 + 2f^*_{xu}(s)[\eta(s)\dot\eta(s)] + f^*_{uu}(s)[\dot\eta(s)]^2\right\} ds \ge 0, \qquad (4.27)$$

for all η(·) ∈ V0.
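Inequality (4.27) can be spot-checked numerically. The sketch below assumes the data of Example 3.1, where $f^*_{xx} = -1$, $f^*_{xu} = 0$ and $f^*_{uu} = 1$ along x*(t) = sin(t); the variation η is our own choice:

```python
import numpy as np

t = np.linspace(0.0, np.pi / 2, 20001)

eta = np.sin(2 * t)       # a sample variation with eta(0) = eta(pi/2) = 0
deta = 2 * np.cos(2 * t)

# Second variation (4.26) with fxx* = -1, fxu* = 0, fuu* = 1.
second_variation = np.trapz(-1.0 * eta**2 + 0.0 * eta * deta + 1.0 * deta**2, t)
print(second_variation)   # ~3*pi/4 > 0, consistent with (4.27)
```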

Theorem 4.3 above is not very useful as stated. We need to extract useful information about x*(·) from this inequality. The first result along this line is the Legendre Necessary Condition.

Theorem 4.4 (Legendre Necessary Condition - (III)) If x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then

(L-1) $f^*_{uu}(t) = f_{uu}(t, x^*(t), \dot x^*(t)) \ge 0$ for all $t_0 \le t \le t_1$,

(L-2) x∗(t0) = x0,

(L-3) x∗(t1) = x1.

Remark 4.4 It is important to note that condition (L-1) holds at corners. In particular, since

$$f_{uu}(t, x^*(t), \dot x^*(t)) \ge 0 \qquad (4.28)$$


it follows that

$$f_{uu}(t, x^*(t), \dot x^*(t^+)) \ge 0$$

for all t ∈ [t0, t1), and

$$f_{uu}(t, x^*(t), \dot x^*(t^-)) \ge 0$$

for all t ∈ (t0, t1].

Remark 4.5 The condition that

$$f_{uu}(t, x^*(t), \dot x^*(t)) \ge 0 \qquad (4.29)$$

for all t ∈ [t0, t1] is called the Legendre Condition. Note that the Legendre Necessary Condition is often easy to check. In the case where $f_{uu}(t,x,u) \ge 0$ for all (t, x, u), it is not very helpful. However, the Strengthened Legendre Condition

$$f_{uu}(t, x^*(t), \dot x^*(t)) > 0 \qquad (4.30)$$

for all t ∈ [t0, t1] will be very useful.

Recall that an extremal x(·) ∈ PWS(t0, t1) is called a regular extremal if

$$f_{uu}(t, x(t), \dot x(t)) > 0$$

for all t ∈ [t0, t1] such that $\dot x(t)$ exists. Also, if f(t, x, u) is a regular integrand, then all extremals are regular.

Example 4.1 Consider the functional $J(x(\cdot)) = \int_0^1 [\dot x(s)]^3\,ds$. Here, f(t, x, u) = u³, f_u(t, x, u) = 3u², f_{uu}(t, x, u) = 6u, and f_x(t, x, u) = 0. As noted in Example 3.5, all extremals are piecewise linear functions. In particular, Euler's Integral Equation is given by

$$3[\dot x^*(t)]^2 = f_u(t, x^*(t), \dot x^*(t)) = c + \int_0^t f_x(s, x^*(s), \dot x^*(s))\,ds = c + \int_0^t 0\,ds = c,$$


or equivalently,

$$3[\dot x^*(t)]^2 = c.$$

Therefore,

$$\dot x^*(t) = \pm\sqrt{c/3} = \pm k$$

and hence it follows that x*(·) is piecewise linear with slope restricted to ±k. Hence,

$$f_{uu}(t, x^*(t), \dot x^*(t)) = 6\dot x^*(t) = \pm 6k$$

and as long as k ≠ 0 the extremal is non-singular. On the other hand, the only regular extremals are those satisfying

$$f_{uu}(t, x^*(t), \dot x^*(t)) = 6\dot x^*(t) = \pm 6k > 0,$$

which means that the derivative must always be positive. In particular,

$$x^*(t) = mt + r$$

with m > 0.

Example 4.2 Consider the problem of minimizing the functional

$$J(x(\cdot)) = \int_0^1 [\dot x(s)]^3\,ds$$

subject to the endpoint conditions x(0) = 0 and x(1) = b. We know that if x*(·) is a weak local minimizer it is an extremal, so that

$$\dot x^*(t) = \pm k$$

for some k. Applying the Legendre Necessary Condition, it must be the case that

$$f_{uu}(t, x^*(t), \dot x^*(t)) = 6\dot x^*(t) = \pm 6k \ge 0, \qquad (4.31)$$

and the derivative cannot change sign; denote the constant slope by m ≥ 0. Thus, x*(t) = mt + r for all t ∈ [0, 1] where m ≥ 0. The endpoint conditions x(0) = 0 and x(1) = b imply that m = b and x*(t) = bt is the only possible minimizer. If b < 0, then x*(t) = bt fails to satisfy the Legendre Condition (4.31) and there is no local minimizer. If b ≥ 0, then x*(t) = bt will satisfy the Legendre Condition (4.31) and perhaps can be a minimizer.


4.4 Jacobi Necessary Condition - (IV)

The general second order necessary condition Theorem 4.3 can also be used to obtain another necessary condition, due to Karl Gustav Jacob Jacobi. In 1837 Jacobi used some of Legendre's basic ideas on the second variation and constructed what is known as Jacobi's (second order) Necessary Condition. In order to state the result we need to introduce some additional terms and definitions.

Recall that if x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then the general result Theorem 4.3 implies that

$$\delta^2 J(x^*(\cdot); \eta(\cdot)) = \int_{t_0}^{t_1} \left\{f^*_{xx}(s)[\eta(s)]^2 + 2f^*_{xu}(s)[\eta(s)\dot\eta(s)] + f^*_{uu}(s)[\dot\eta(s)]^2\right\} ds \ge 0, \qquad (4.32)$$

for all η(·) ∈ V0. Jacobi noted that if ηo(·) ∈ V0 is defined to be the zero function, ηo(t) ≡ 0 for all t ∈ [t0, t1], then

$$\delta^2 J(x^*(\cdot); \eta_o(\cdot)) = \int_{t_0}^{t_1} \left\{f^*_{xx}(s)[0]^2 + 2f^*_{xu}(s)[0\cdot 0] + f^*_{uu}(s)[0]^2\right\} ds = 0. \qquad (4.33)$$

Again, remember that the functions f*xx(·), f*xu(·), and f*uu(·) are fixed functions of t given by

$$f^*_{xx}(t) = f_{xx}(t, x^*(t), \dot x^*(t)), \quad f^*_{xu}(t) = f_{xu}(t, x^*(t), \dot x^*(t)), \quad\text{and}\quad f^*_{uu}(t) = f_{uu}(t, x^*(t), \dot x^*(t)),$$

respectively. Using this notation we define the function $\mathcal{F}(t, \eta, \xi)$ by

$$\mathcal{F}(t, \eta, \xi) = \frac{1}{2}\left[f^*_{xx}(t)\eta^2 + 2f^*_{xu}(t)\eta\xi + f^*_{uu}(t)\xi^2\right] \qquad (4.34)$$


and consider the functional $\mathcal{J} : PWS(t_0, t_1) \longrightarrow \mathbb{R}$ given by

$$\mathcal{J}(\eta(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta(s), \dot\eta(s))\,ds. \qquad (4.35)$$

Let ΘS ⊂ PWS(t0, t1) be defined by

$$\Theta_S = \{\eta(\cdot) \in PWS(t_0, t_1) : \eta(t_0) = 0,\ \eta(t_1) = 0\} = V_0, \qquad (4.36)$$

and consider the so-called Accessory (Secondary) Minimum Problem.

The Accessory (Secondary) Minimum Problem: Find η*(·) ∈ ΘS such that

$$\mathcal{J}(\eta^*(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta^*(s), \dot\eta^*(s))\,ds \le \mathcal{J}(\eta(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta(s), \dot\eta(s))\,ds,$$

for all η(·) ∈ ΘS.

There are two key observations that make the Accessory Minimum Problem important and useful.

(1) The answer to the Accessory Minimum Problem is known. In view of (4.32) and (4.33), we know that if x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then the zero function ηo(t) ≡ 0 satisfies

$$\mathcal{J}(\eta_o(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta_o(s), \dot\eta_o(s))\,ds = 0 \le \mathcal{J}(\eta(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta(s), \dot\eta(s))\,ds,$$

for all η(·) ∈ V0 = ΘS. In particular, η*(t) = ηo(t) ≡ 0 is a global minimizer for $\mathcal{J}$ on ΘS.


(2) The Accessory Minimum Problem is a special case of the Simplest Problem in the Calculus of Variations with the change of variables

$$(t, x, u) \longleftrightarrow (t, \eta, \xi), \quad f(t, x, u) \longleftrightarrow \mathcal{F}(t, \eta, \xi), \quad J(x(\cdot)) \longleftrightarrow \mathcal{J}(\eta(\cdot)),$$
$$(t_0, x_0) \longleftrightarrow (t_0, 0), \quad (t_1, x_1) \longleftrightarrow (t_1, 0), \quad\text{and}\quad \Theta \longleftrightarrow \Theta_S.$$

Therefore, we can apply the Euler Necessary Condition to the Accessory Problem. In particular, if η*(·) ∈ ΘS is any minimizer of

$$\mathcal{J}(\eta(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta(s), \dot\eta(s))\,ds$$

on ΘS, then there is a constant c such that η*(·) satisfies Euler's Integral Equation

$$[\mathcal{F}_\xi(t, \eta^*(t), \dot\eta^*(t))] = c + \int_{t_0}^{t} [\mathcal{F}_\eta(s, \eta^*(s), \dot\eta^*(s))]\,ds. \qquad (4.37)$$

In addition, between corners the function $\mathcal{F}_\xi(t, \eta^*(t), \dot\eta^*(t))$ is differentiable and

$$\frac{d}{dt}[\mathcal{F}_\xi(t, \eta^*(t), \dot\eta^*(t))] = [\mathcal{F}_\eta(t, \eta^*(t), \dot\eta^*(t))]. \qquad (4.38)$$

The equation

$$[\mathcal{F}_\xi(t, \eta(t), \dot\eta(t))] = c + \int_{t_0}^{t} [\mathcal{F}_\eta(s, \eta(s), \dot\eta(s))]\,ds \qquad (4.39)$$


is called Jacobi’s Integral Equation and

$$\frac{d}{dt}[\mathcal{F}_\xi(t, \eta(t), \dot\eta(t))] = [\mathcal{F}_\eta(t, \eta(t), \dot\eta(t))] \qquad (4.40)$$

is called Jacobi's Differential Equation. Observe that Jacobi's Equation is Euler's Equation for the case where f(t, x, u) is replaced by $\mathcal{F}(t, \eta, \xi)$.

Definition 4.4 A PWS function η(·) satisfying Jacobi’s IntegralEquation (4.39) (or (4.40)) is called a secondary extremal.

We are interested in secondary extremals and what they can tell us about the minimizer x*(·) of the original SPCV. Thus, it is important to look at the specific form of the Jacobi equations. Since

$$\mathcal{F}(t, \eta, \xi) = \frac{1}{2}\left[f^*_{xx}(t)\eta^2 + 2f^*_{xu}(t)\eta\xi + f^*_{uu}(t)\xi^2\right], \qquad (4.41)$$

it is obvious that

$$\mathcal{F}_\eta(t, \eta, \xi) = \frac{\partial}{\partial\eta}\mathcal{F}(t, \eta, \xi) = [f^*_{xx}(t)\eta + f^*_{xu}(t)\xi], \qquad (4.42)$$

$$\mathcal{F}_\xi(t, \eta, \xi) = \frac{\partial}{\partial\xi}\mathcal{F}(t, \eta, \xi) = [f^*_{xu}(t)\eta + f^*_{uu}(t)\xi], \qquad (4.43)$$

and

$$\mathcal{F}_{\xi\xi}(t, \eta, \xi) = \frac{\partial^2}{\partial\xi^2}\mathcal{F}(t, \eta, \xi) = [f^*_{uu}(t)] = f_{uu}(t, x^*(t), \dot x^*(t)).$$

Legendre's Necessary Condition applied to the Accessory Minimum Problem implies that

$$\mathcal{F}_{\xi\xi}(t, \eta, \xi) = f_{uu}(t, x^*(t), \dot x^*(t)) \ge 0.$$

However, in order to go further, we must assume that x*(·) is a non-singular extremal. In this case

$$\mathcal{F}_{\xi\xi}(t, \eta, \xi) = f_{uu}(t, x^*(t), \dot x^*(t)) > 0, \qquad (4.44)$$


which in turn implies that the corresponding Accessory Minimum Problem is regular. In particular, we know that all secondary extremals η(·) are smooth and, in this case, we need only consider Jacobi's Differential Equation (4.40). In view of (4.41), Jacobi's Differential Equation has the form

$$\frac{d}{dt}[f^*_{xu}(t)\eta(t) + f^*_{uu}(t)\dot\eta(t)] = [f^*_{xx}(t)\eta(t) + f^*_{xu}(t)\dot\eta(t)]. \qquad (4.45)$$

Remark 4.6 We will focus on Jacobi's Differential Equation (4.45). Recall that solutions of Jacobi's Differential Equation are secondary extremals. It is important to note that Jacobi's Differential Equation is a second order linear differential equation in η(·). Consequently, Jacobi's Differential Equation with initial conditions of the form $\eta(\bar t) = p$ and $\dot\eta(\bar t) = v$ has a unique solution. This point is important in the proof of Jacobi's Necessary Condition.

Example 4.3 Minimize the functional

$$J(x(\cdot)) = \int_0^{\pi/2} \left\{[\dot x(s)]^2 - [x(s)]^2\right\} ds,$$

subject to the endpoint conditions x(0) = 0 and x(π/2) = 0. Here f(t, x, u) = u² − x², f_x(t, x, u) = −2x, f_{xx}(t, x, u) = −2, f_u(t, x, u) = 2u, f_{uu}(t, x, u) = 2, and f_{xu}(t, x, u) = f_{ux}(t, x, u) = 0. Thus, if x*(·) is any minimizer of J(·) on Θ,

$$f^*_{ux}(t) = f_{ux}(t, x^*(t), \dot x^*(t)) = 0, \quad f^*_{xu}(t) = f_{xu}(t, x^*(t), \dot x^*(t)) = 0,$$
$$f^*_{xx}(t) = f_{xx}(t, x^*(t), \dot x^*(t)) = -2, \quad\text{and}\quad f^*_{uu}(t) = f_{uu}(t, x^*(t), \dot x^*(t)) = 2 > 0.$$

Therefore, Jacobi’s Equation (4.45)

$$\frac{d}{dt}[f^*_{xu}(t)\eta(t) + f^*_{uu}(t)\dot\eta(t)] = [f^*_{xx}(t)\eta(t) + f^*_{xu}(t)\dot\eta(t)],$$


reduces to

$$\frac{d}{dt}[0\cdot\eta(t) + 2\dot\eta(t)] = [-2\eta(t) + 0\cdot\dot\eta(t)],$$

or equivalently, $\ddot\eta(t) = -\eta(t)$.

This implies that all secondary extremals have the form

η(t) = c1 cos(t) + c2 sin(t).

Definition 4.5 A value tc is said to be a conjugate value to t0 if t0 < tc and there is a solution ηc(·) to Jacobi's Equation (4.45) satisfying (i) ηc(t0) = ηc(tc) = 0 and (ii) ηc(t) ≠ 0 for some t ∈ (t0, tc). In particular, ηc(·) does not vanish identically on (t0, tc). The point $[t_c\ x^*(t_c)]^T \in \mathbb{R}^2$ on the graph of x*(·) is said to be a conjugate point to the initial point $[t_0\ x^*(t_0)]^T \in \mathbb{R}^2$.

Figure 4.1 illustrates the definition. We now state the Jacobi Necessary Condition.

Theorem 4.5 (Jacobi's Necessary Condition - (IV)) Assume that x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ. If x*(·) is smooth and regular, then

[Figure 4.1: Definition of a Conjugate Value]


(J-1) there cannot be a value tc conjugate to t0 with

tc < t1, (4.46)

(J-2) x∗(t0) = x0,

(J-3) x∗(t1) = x1.

4.4.1 Proof of the Jacobi Necessary Condition

Assume that x*(·) ∈ Θ provides a weak local minimum for J(·) on Θ, x*(·) is smooth and $f_{uu}(t, x^*(t), \dot x^*(t)) > 0$. The proof is by contradiction. Assume that there is a value tc conjugate to t0 with

$$t_c < t_1.$$

Without loss of generality we may assume that tc is the "first" conjugate value, so that (see Figure 4.1) there is a secondary extremal ηc(·) ∈ PWS(t0, t1) such that ηc(·) satisfies the Jacobi Equation

$$\frac{d}{dt}[f^*_{xu}(t)\eta_c(t) + f^*_{uu}(t)\dot\eta_c(t)] = [f^*_{xx}(t)\eta_c(t) + f^*_{xu}(t)\dot\eta_c(t)],$$

with ηc(t0) = 0, ηc(tc) = 0 and

$$\eta_c(t) \neq 0, \quad t_0 < t < t_c. \qquad (4.47)$$

Since $\mathcal{F}_{\xi\xi}(t, \eta, \xi) = f_{uu}(t, x^*(t), \dot x^*(t)) > 0$, the accessory problem is regular and Hilbert's Differentiability Theorem implies that all secondary extremals are smooth. Thus, ηc(·) cannot have a corner. Let $\hat\eta(t)$ be the piecewise smooth function defined by

$$\hat\eta(t) = \begin{cases} \eta_c(t), & t_0 \le t \le t_c, \\ 0, & t_c \le t \le t_1, \end{cases}$$

and note that $\hat\eta(t) \neq 0$ for t0 < t < tc (see Figure 4.2). We shall show that $\hat\eta(\cdot) \in V_0 = \Theta_S$ minimizes

$$\mathcal{J}(\eta(\cdot)) = \int_{t_0}^{t_1} \mathcal{F}(s, \eta(s), \dot\eta(s))\,ds,$$


[Figure 4.2: Definition of $\hat\eta(\cdot)$]

and hence is also a secondary extremal. Observe that for each t0 < t < tc,

$$\frac{d}{dt}\left[\hat\eta(t)\,\mathcal{F}_\xi\!\left(t, \hat\eta(t), \tfrac{d}{dt}\hat\eta(t)\right)\right] = \hat\eta(t)\left[\frac{d}{dt}\mathcal{F}_\xi\!\left(t, \hat\eta(t), \tfrac{d}{dt}\hat\eta(t)\right)\right] + \frac{d}{dt}\hat\eta(t)\left[\mathcal{F}_\xi\!\left(t, \hat\eta(t), \tfrac{d}{dt}\hat\eta(t)\right)\right]$$
$$= \eta_c(t)\left[\frac{d}{dt}\mathcal{F}_\xi(t, \eta_c(t), \dot\eta_c(t))\right] + \dot\eta_c(t)\left[\mathcal{F}_\xi(t, \eta_c(t), \dot\eta_c(t))\right]$$
$$= \eta_c(t)\left[\frac{d}{dt}\left[f^*_{xu}(t)\eta_c(t) + f^*_{uu}(t)\dot\eta_c(t)\right]\right] + \dot\eta_c(t)\left[f^*_{xu}(t)\eta_c(t) + f^*_{uu}(t)\dot\eta_c(t)\right]$$
$$= \eta_c(t)\left[f^*_{xx}(t)\eta_c(t) + f^*_{xu}(t)\dot\eta_c(t)\right] + \dot\eta_c(t)\left[f^*_{xu}(t)\eta_c(t) + f^*_{uu}(t)\dot\eta_c(t)\right]$$
$$= \left[f^*_{xx}(t)[\eta_c(t)]^2 + f^*_{xu}(t)\eta_c(t)\dot\eta_c(t)\right] + \left[f^*_{ux}(t)\eta_c(t)\dot\eta_c(t) + f^*_{uu}(t)[\dot\eta_c(t)]^2\right]$$
$$= 2\mathcal{F}(t, \eta_c(t), \dot\eta_c(t)).$$

Hence, it follows that

$$2\mathcal{J}(\hat\eta(\cdot)) = \int_{t_0}^{t_1} 2\mathcal{F}\!\left(s, \hat\eta(s), \tfrac{d}{ds}\hat\eta(s)\right) ds = \int_{t_0}^{t_c} 2\mathcal{F}(s, \eta_c(s), \dot\eta_c(s))\,ds + \int_{t_c}^{t_1} 2\mathcal{F}(s, 0, 0)\,ds$$
$$= \int_{t_0}^{t_c} 2\mathcal{F}(s, \eta_c(s), \dot\eta_c(s))\,ds = \int_{t_0}^{t_c} \frac{d}{ds}\left[\hat\eta(s)\,\mathcal{F}_\xi\!\left(s, \hat\eta(s), \tfrac{d}{ds}\hat\eta(s)\right)\right] ds$$
$$= \left[\hat\eta(t)\,\mathcal{F}_\xi\!\left(t, \hat\eta(t), \tfrac{d}{dt}\hat\eta(t)\right)\right]\Big|_{t=t_0}^{t=t_c} = 0.$$

Therefore,

2J (η(·)) =

t1∫t0

2F(s, η(s),d

ds(s)η)ds = 0 ≤ 2J (η(·))

for all η(·) ∈ V0 = ΘS, and hence η(·) minimizes J (η(·)) on ΘS.However, this means that η(·) satisfies Jacobi’s Equation (4.40)and is a secondary extremal.

Since secondary extremals cannot have corners, it follows that

d

dtη(tc) =

d

dtη(t−c ) =

d

dtη(t+c ) = 0,

and, by Hilbert’s Differentiability Theorem, η(·) ∈ C2. Thus, η(·)satisfies the linear second order initial value problem

d

dt[f ∗xu(t)η(t) + f ∗uu(t)η(t)] = [f ∗xx(t)η(t) + f ∗xu(t)η(t)],

with initial condition

η(tc) = 0, η(tc) = 0.

However, the only solution to a linear second order initial valueproblem, with zero initial data, is η(t) ≡ 0. It follows that fort0 < t < tc,

ηc(t) = η(t) = 0,

Page 174: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 155

which contradicts (4.47) above. Therefore, there cannot be a valuetc with t0 < tc < t1 conjugate to t0 and this completes the proof.

4.5 Weierstrass Necessary Condition -

(II)

The three previous necessary conditions are valid for weak lo-cal minimizers. In this section we assume we have a strong localminimizer and derive the Weierstrass’ Necessary Condition for astrong local minimum (Karl Theodor Wilhelm Weierstrass -1879). Although the techniques are similar to the idea used forEuler’s Necessary condition, the “variations” are different. Weier-strass’ Necessary Condition is much closer to the Maximum Prin-ciple we will study in Optimal Control. In particular, for the SPCVthe Weierstrass Necessary Condition can be stated as a MaximumPrinciple which is equivalent to the Pontryagin Maximum principlein optimal control. In order to formulate the Weierstrass NecessaryCondition, we need to define the Excess Function.

Definition 4.6 The Weierstrass Excess Function E :[t0, t1]× R3 −→ R is defined by

E(t, x, u, v) = [f(t, x, v)− f(t, x, u)]− [v − u]fu(t, x, u) (4.48)

for all [t0, t1]× R3.

Theorem 4.6 (Weierstrass Necessary Condition - (II)) Ifx∗(·) ∈ Θ provides a strong local minimum for J(·) on Θ, then,

(W-1) E(t, x∗(t), x∗(t), v) ≥ 0 for all t ∈ [t0, t1] and v ∈ R,

(W-2) x∗(t0) = x0,

(W-3) x∗(t1) = x1.

Condition (W-1 )

E(t, x∗(t), x∗(t), v) ≥ 0, (4.49)

Page 175: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

156 Chapter 4. Necessary Conditions for Local Minima

is the essential new information in Weierstrass’ Necessary Condi-tion. Moreover, (4.49) holds at all t ∈ [t0, t1], including corners. Inparticular, for all v ∈ R

E(t, x∗(t), x∗(t+), v) ≥ 0, (4.50)

andE(t, x∗(t), x∗(t−), v) ≥ 0. (4.51)

Before proving the Weierstrass Necessary Condition Theorem4.6 we note some results that follow from this theorem. First werestate the Weierstrass Necessary Condition as a Maximum Prin-ciple by defining a new function. Given x∗(·), let

H(t, v) , −E(t, x∗(t), x∗(t), v) (4.52)

and observe that Weierstrass’ Necessary Condition may be writtenas

H(t, v) = −E(t, x∗(t), x∗(t), v) ≤ 0

for all v ∈ R. However, if v = u∗(t) = x∗(t), then using thedefinition of the excess function, one has that

H(t, x∗(t)) = H(t, u∗(t))

= −E(t, x∗(t), x∗(t), u∗(t))

= −[f(t, x∗(t), u∗(t))− f(t, x∗(t), x∗(t))]

− [u∗(t)− x∗(t)]fu(t, x∗(t), x∗(t)= −[f(t, x∗(t), x∗(t))− f(t, x∗(t), x∗(t))]

− [x∗(t)− x∗(t)]fu(t, x∗(t), x∗(t))= 0.

Consequently,

H(t, v) ≤ 0 = H(t, x∗(t)) = H(t, u∗(t)),

for all v ∈ R, and we have the following equivalent version ofWeierstrass’ Necessary Condition.

Page 176: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 157

Theorem 4.7 (Weierstrass Maximum Principle) If x∗(·) ∈Θ provides a strong local minimum for J(·) on Θ and H(t, v) =−E(t, x∗(t), x∗(t), v), then,

(WMP-1) v = u∗(t) = x∗(t) maximizes H(t, v),i.e. for all t ∈ [t0, t1],

H(t, u∗(t)) = H(t, x∗(t)) = maxv∈R

H(t, v) = 0, (4.53)

(WMP-2) x∗(t0) = x0,

(WMP-3) x∗(t1) = x1.

In addition to the above necessary conditions, one can addstronger corner conditions when x∗(·) ∈ Θ provides a strong localminimum for J(·) on Θ. In particular, observe that (4.50) and(4.51) imply that if t is a corner of x∗(·), then for any v ∈ R

E(t, x∗(t), x∗(t+), v) ≥ 0 and E(t, x∗(t), x∗(t−), v) ≥ 0.

Therefore,

[f(t, x∗(t), v)− f(t, x∗(t), x∗(t+))]

− [v − x∗(t+)]fu(t, x∗(t), x∗(t+)) ≥ 0 (4.54)

and

[f(t, x∗(t), v)− f(t, x∗(t), x∗(t−))]

− [v − x∗(t−)]fu(t, x∗(t), x∗(t−)) ≥ 0 (4.55)

both hold for any v ∈ R. Setting v = x∗(t−) in equation (4.54)and set v = x∗(t+) in equation (4.55) yields

E+ , [f(t, x∗(t), x∗(t−))− f(t, x∗(t), x∗(t+))] (4.56)

−[x∗(t−)− x∗(t+)]fu(t, x∗(t), x∗(t+)) ≥ 0

and

E− , [f(t, x∗(t), x∗(t+))− f(t, x∗(t), x∗(t−))]

−[x∗(t+)− x∗(t−)]fu(t, x∗(t), x∗(t−)) ≥ 0,

Page 177: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

158 Chapter 4. Necessary Conditions for Local Minima

respectively. Note that E+ = −E− since

− [f(t, x∗(t), x∗(t+))− f(t, x∗(t), x∗(t−))]

− [x∗(t+)− x∗(t−)]fu(t, x∗(t), x∗(t−))

= [f(t, x∗(t), x∗(t−))− f(t, x∗(t), x∗(t+))]

− [x∗(t−)− x∗(t+)]fu(t, x∗(t), x∗(t−))

= [f(t, x∗(t), x∗(t−))− f(t, x∗(t), x∗(t+))]

− [x∗(t−)− x∗(t+)]fu(t, x∗(t), x∗(t+)),

where the last equality follows by replacing the term fu(t, x∗(t), x∗(t−))

by fu(t, x∗(t), x∗(t+)). This step is valid because of the Weierstrass-

Erdmann Corner Condition Theorem 3.4. In particular, we haveshown that E+ = −E− ≤ 0 and from inequality (4.56) E+ ≥ 0 sothat E+ = 0 = −E− = 0 = E−.

In particular,

[f(t, x∗(t), x∗(t−))− f(t, x∗(t), x∗(t+))]

− [x∗(t−)− x∗(t+)]fu(t, x∗(t), x∗(t+))

= [f(t, x∗(t), x∗(t+))− f(t, x∗(t), x∗(t−))]

− [f(t, x∗(t), x∗(t+))− f(t, x∗(t), x∗(t−))].

Rearranging terms and again using the fact that

fu(t, x∗(t), x∗(t−)) = fu(t, x

∗(t), x∗(t+))

yields

2f(t, x∗(t), x∗(t−))− 2x∗(t−)fu(t, x

∗(t), x∗(t−))

= 2f(t, x∗(t), x∗(t+))− 2x∗(t+)fu(t, x

∗(t), x∗(t+)).

Dividing both sides of this expression by 2 provides a proof of thefollowing result.

Theorem 4.8 (Second Corner Condition of Erdmann) Ifx∗(·) ∈ Θ provides a strong local minimum for J(·) on Θ andt is a corner, then

f(t, x∗(t), x∗(t+))− x∗(t+) · fu(t, x∗(t), x∗(t+))

= f(t, x∗(t), x∗(t−))− x∗(t−) · fu(t, x∗(t), x∗(t−)).

Page 178: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 159

The remainder of this section is devoted to proving the Weier-strass Necessary Condition Theorem 4.6.

4.5.1 Proof of the Weierstrass Necessary Con-dition

Assume that x∗(·) ∈ Θ provides a strong local minimum for J(·)on Θ. Therefore, there is a δ > 0 such that

J(x∗(·)) =

t1∫t0

f(s, x∗(s), x∗ (s))ds ≤ J(x(·))

=

t1∫t0

f(s, x(s), x (s))ds, (4.57)

for all x (·) ∈ U0(x∗(·), δ) ∩ Θ. Recall that strong local mini-mizers are also weak local minimizers so that Euler’s NecessaryCondition must hold. In particular, there is a constant c suchthat

[fu(t, x∗(t), x∗(t))] = c+

t∫t0

[fx(s, x∗(s), x∗(s))] ds,

and, between corners,

d

dt[fu(t, x

∗(t), x∗(t))] = [fx(t, x∗(t), x∗(t))] .

Let t0 < t1 < t2 < · · · < tp < t1 be the corners of x∗(·). On eachsubinterval (ti, ti+1) the minimizer x∗(·) and x∗(·) are continuous(see Figure 4.3).

Select a subinterval (tj, tj+1) and let z be any point in (tj, tj+1).In particular, z is not a corner and there is a ρ > 0 suchthat

tj < z < z + ρ < tj+1.

Page 179: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

160 Chapter 4. Necessary Conditions for Local Minima

0t 1tˆjt 1

ˆjt

*( )x

Figure 4.3: The Strong Local Minimizer

Note that if α is any number satisfying z < α ≤ z + ρ, then[z, α] ⊂ (tj, tj+1), so that x∗(·) is smooth on [z, α]. In particular,the derivative x∗(t) is continuous on [z, z + ρ] so there exists abound M = M(ρ) ≥ 0 such that if z ≤ t ≤ z + ρ, then

|x∗(t)| ≤ |M(ρ)| .

Let M = M(v, ρ) = |v| + M(ρ) and let α = α(δ, v) be a numbersatisfying z < α ≤ z + ρ and

M · |α− z| < δ. (4.58)

Observe that if z ≤ ε ≤ α, then

|v| · |ε− z| < M · |α− z| < δ. (4.59)

The Mean Value Theorem implies that if z ≤ t ≤ z+ρ, then thereis a t ∈ (z, t) such that x∗(t)− x∗(z) = x∗(t)(t− z) and hence

|x∗(t)− x∗(z)| ≤∣∣x∗(t)∣∣ · |t− z| < M · |α− z| < δ. (4.60)

We shall use these inequalities to prove the results.

Page 180: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 161

For v ∈ R and α as above with z ≤ ε ≤ α, define the functions

xv(t) = x∗(z) + v(t− z),

and

θ(t, ε) = x∗(t) + [xv(ε)− x∗(ε)](α− t)(α− ε)

,

respectively.Using these functions we construct a “variation” ϕ (·, ε) of x∗(t)

by

ϕ (t, ε) =

x∗(t), t0 ≤ t < z,xv(t), z ≤ t ≤ ε,θ (t, ε) , ε ≤ t ≤ α,x∗(t), α ≤ t ≤ t1.

.

Figure 4.4 provides a plot of ϕ (·, ε) in terms of its pieces. Itis clear that for each z ≤ ε ≤ α, the variation ϕ (·, ε) is piecewisesmooth, ϕ (t0, ε) = x0, and ϕ (t1, ε) = x1. Thus, ϕ (·, ε) ∈ Θ. Also,

0t 1t

*( )x

z

( )vx t ( , )t

ˆjt 1

ˆjt

Figure 4.4: Defining the Variation: Step One

Page 181: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

162 Chapter 4. Necessary Conditions for Local Minima

observe that

|x∗(t)− ϕ(t, ε)| =

0, t0 ≤ t < z,|x∗(t)− xv(t)|, z ≤ t ≤ ε|x∗(t)− θ (t, ε) |, ε < t ≤ α0, α ≤ t ≤ t1,

=

0, t0 ≤ t < z,|x∗(t)− [x∗(z) + v(t− z)]|, z ≤ t ≤ ε

|x∗(t)− x∗(t) + [xv(ε)− x∗(ε)] (α−t)(α−ε)|,

ε < t ≤ α0, α ≤ t ≤ t1,

so that for z ≤ t ≤ ε < α

|x∗(t)− [x∗(z) + v(t− z)]| ≤ |x∗(t)− x∗(z)|+ |v(t− z)|≤∣∣x∗(t)∣∣ · |t− z|+ |v| · |t− z|

≤M(ρ) · |ε− z|+ |v| · |ε− z|< (M(ρ) + |v|) · |ε− z|< M · |ε− z| < M · |α− z| < δ.

Therefore, for z ≤ t ≤ ε < α it follows that

|x∗(t)− ϕ(t, ε)| = |x∗(t)− xv(t)| < M · |ε− z| < δ. (4.61)

Now consider the case where z < ε < t ≤ α. For this case itfollows that

|x∗(t)− ϕ(t, ε)| = |x∗(t)− x∗(t) + [xv(ε)− x∗(ε)](α− t)(α− ε)

|

= |[xv(ε)− x∗(ε)](α− t)(α− ε)

|

= |[x∗(z) + v(ε− z)− x∗(ε)] (α− t)(α− ε)

|

= |[x∗(z)− x∗(ε) + v(ε− z)](α− t)(α− ε)

|

≤ [|x∗(z)− x∗(ε)|+ |v(ε− z)|] · | (α− t)(α− ε)

|.

Page 182: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 163

Applying the Mean Value Theorem to x∗(·) on the interval[z, ε] yields a ε ∈ (z, ε) such that

x∗(z)− x∗(ε) = x∗(ε)[ε− z],

and hence we have

|x∗(t)− ϕ(t, ε)| ≤ [|x∗(z)− x∗(ε)|+ |v(ε− z)|] · | (α− t)(α− ε)

|

≤ [|x∗(ε)| |ε− z|+ |v| · |ε− z|]∣∣∣∣ (α− t)(α− ε)

∣∣∣∣≤ M · |ε− z| ·

∣∣∣∣(α− ε)(α− ε)

∣∣∣∣= M · |ε− z| < δ.

Consequently, for z < ε < t ≤ α,

|x∗(t)− ϕ(t, ε)| = |x∗(t)− x∗(t) + [xv(ε)− x∗(ε)](α− t)(α− ε)

|

< M · |ε− z| < δ. (4.62)

Therefore, it follows from (4.61) and (4.62) that for each t ∈ [t0, t1]and z ≤ ε ≤ α

|x∗(t)− ϕ(t, ε)| < M · |ε− z| < δ.

This last inequality implies that

d0(x∗(·), ϕ(·, ε)) = supt0≤t≤t1

|x∗(t)−ϕ(t, ε)| < M · |ε− z| < δ (4.63)

for all z ≤ ε ≤ α and hence the variation ϕ(·, ε) ∈ U0(x∗(·), δ)∩Θfor all ε satisfying z ≤ ε ≤ α. Also note that when ε = z, thenϕ (t, ε)|ε=z = ϕ (t, z) is given by

ϕ (t, z) =

x∗(t), t0 ≤ t ≤ z,θ (t, z) , z < t ≤ α,x∗(t), α ≤ t ≤ t1,

=

x∗(t), t0 ≤ t ≤ z,

x∗(t) + [xv(z)− x∗(z)] (α−t)(α−z) , z < t ≤ α,

x∗(t), α ≤ t ≤ t1.

Page 183: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

164 Chapter 4. Necessary Conditions for Local Minima

*( )x t

( , )t ( , )t

0t 1tz 1

ˆjt

ˆjt

Figure 4.5: Defining the Variation: Step Two

Hence, it follows that

ϕ (t, z) = ϕ (t, ε)|ε=z =

x∗(t), t0 ≤ t ≤ z,x∗(t), z < t ≤ α,x∗(t), α ≤ t ≤ t1,

= x∗(t), t0 ≤ t < t1.

(4.64)Define the function F : [z, α] −→ R by

F (ε) = J(ϕ(·, ε))− J(x∗(·))

=

t1∫t0

f(s, ϕ(s, ε),∂ϕ(s, ε)

∂s)ds−

t1∫t0

f(s, x∗(s), x∗(s))ds,

(4.65)

and observe that (4.64) implies

F (z) = J(ϕ(·, z))− J(x∗(·)) = J(x∗(·))− J(x∗(·)) = 0.

The error bound (4.63) implies that ϕ(·, ε) ∈ U0(x∗(·), δ) ∩ Θ forall ε satisfying z ≤ ε ≤ α so that

J(x∗(·)) ≤ J(ϕ(·, ε)),

Page 184: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 165

and hence

F (z) = 0 ≤ J(ϕ(·, ε))− J(x∗(·)) = F (ε).

Therefore, F : [z, α] −→ R has a minimum at the left endpointε = z. We now apply the one dimensional result Theorem 2.1from Chapter 1. In particular, it follows that

d

dεF (ε)

∣∣∣∣ε=z

≥ 0, (4.66)

and to complete the proof we need to compute this derivative.The computation of d

dεF (ε) requires some preparation. Observe

thatxv(z) = x∗(z) + v(z − z) = x∗(z), (4.67)

and

xv(z) =d

dt[xv(t)]

∣∣∣∣t=z

=d

dt[x∗(z) + v(t− z)]

∣∣∣∣t=z

= v|t=z = v.

(4.68)From the definition of θ(t, ε),

θ(t, ε) = x∗(t) + [xv(ε)− x∗(ε)](α− t)(α− ε)

,

and (4.67) it follows that

θ(t, z) = x∗(t) + [xv(z)− x∗(z)](α− t)(α− z)

= x∗(t), (4.69)

and

θt(t, ε) =∂

∂tθ(t, ε) =

∂t

x∗(t) + [xv(ε)− x∗(ε)]

(α− t)(α− ε)

(4.70)

= x∗(t)− [xv(ε)− x∗(ε)](α− ε)

.

Also, it follows from (4.67) and (4.70) that

θt(t, z) = θt(t, ε)|ε=z = x∗(t)− [xv(z)− x∗(z)]

(α− z)= x∗(t),

Page 185: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

166 Chapter 4. Necessary Conditions for Local Minima

so that at t = zθt(z, z) = x∗(z). (4.71)

Differentiating

θ(t, ε) = x∗(t) + [xv(ε)− x∗(ε)](α− t)(α− ε)

with respect to ε yields

θε(t, ε) =∂

∂εθ(t, ε) =

∂ε

x∗(t) + [xv(ε)− x∗(ε)]

(α− t)(α− ε)

(4.72)

= [xv(ε)− x∗(ε)](α− t)(α− ε)

+ [xv(ε)− x∗(ε)](α− t)(α− ε)2

,

and hence

θε(z, z) = θε(t, ε)|t=ε=z = [xv(z)− x∗(z)](α− z)

(α− z)

+ [xv(z)− x∗(z)](α− z)

(α− z)2

= [xv(z)− x∗(z)] + [xv(z)− x∗(z)]1

(α− z).

It follows from equations (4.67) and (4.68) that xv(z)− x∗(z) = 0and xv(z) = v, respectively. Thus,

θε(z, z) = θε(t, ε)|t=ε=z = v − x∗(z). (4.73)

Likewise,

θε(α, z) = [xv(z)− x∗(z)](α− α)

(α− z)

+[xv(z)− x∗(z)](α− α)

(α− z)2= 0. (4.74)

Page 186: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 167

We are now ready to compute ddεF (ε)

∣∣ε=z

. First note that

F (ε) = J(ϕ(·, ε))− J(x∗(·))

=

t1∫t0

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds

=

z∫t0

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds

+

ε∫z

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds

+

α∫ε

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds

+

t1∫α

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds.

Since ϕ(t, ε) = x∗(t) on the intervals [t0, z] and [α, t1], it followsthat

F (ε) =

ε∫z

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds

+

α∫ε

[f(s, ϕ(s, ε), ϕs(s, ε))− f(s, x∗(s), x∗(s))]ds,

or equivalently,

F (ε) =

ε∫z

f(s, ϕ(s, ε), ϕs(s, ε))ds+

α∫ε

f(s, ϕ(s, ε), ϕs(s, ε))ds

−α∫z

f(s, x∗(s), x∗(s))ds.

Page 187: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

168 Chapter 4. Necessary Conditions for Local Minima

Therefore,

d

dεF (ε) =

d

ε∫z

f(s, ϕ(s, ε), ϕs(s, ε))ds

+

α∫ε

f(s, ϕ(s, ε), ϕs(s, ε))ds

− d

α∫z

f(s, x∗(s), x∗(s))ds

=d

ε∫z

f(s, ϕ(s, ε), ϕs(s, ε))ds

+

α∫ε

f(s, ϕ(s, ε), ϕs(s, ε))ds

=d

ε∫z

f(s, ϕ(s, ε), ϕs(s, ε))ds

−ε∫

α

f(s, ϕ(s, ε), ϕs(s, ε))ds

and the definition of ϕ(·, ε) implies that

d

dεF (ε) =

d

ε∫z

f(s, xv(s), xv(s))ds

− d

ε∫

α

f(s, θ(s, ε), θs(s, ε))ds

. (4.75)

Applying Leibniz’s Formula to the first integral yields

d

ε∫z

f(s, xv(s), xv(s))ds

= f(ε, xv(ε), xv(ε)),

Page 188: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 169

and differentiating the second integral produces

d

ε∫

α

f(s, θ(s, ε), θs(s, ε))ds

= f(ε, θ(ε, ε), θt(ε, ε))

+

ε∫α

[fx(s, θ(s, ε), θs(s, ε))]θε(s, ε)ds

+

ε∫α

[fu(s, θ(s, ε), θs(s, ε))]θsε(s, ε)ds.

Setting ε = z we find that

d

dεF (ε)

∣∣∣∣ε=z

= f(z, xv(z), xv(z))− f(z, θ(z, z), θt(z, z))

− z∫

α

[fx(s, θ(s, z), θs(s, z))]θε(s, z)ds

+

z∫α

[fu(s, θ(s, z), θs(s, z))]θsε(s, z)ds

= f(z, xv(z), xv(z))− f(z, θ(z, z), θt(z, z))

−z∫

α

[fx(s, θ(s, z), θs(s, z))]θε(s, z)

+ [fu(s, θ(s, z), θs(s, z))]θsε(s, z)ds.

Equations (4.67), (4.68), (4.69) and (4.71) yield that xv(z) =x∗(z), xv(z) = v, θ(t, z) = x∗(t), and θt(t, z) = x∗(t), respectively.Therefore,

d

dεF (ε)

∣∣∣∣ε=z

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

−z∫

α

[fx(s, x∗(s), x∗(s))]θε(t, z)

+ [fu(s, x∗(s), x∗(s))]θtε(s, z)ds,

Page 189: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

170 Chapter 4. Necessary Conditions for Local Minima

and now we use the fact that x∗(t) satisfies Euler’s equation ( i.e.dds

[fu(s, x∗(s), x∗(s))] = [fx(s, x

∗(s), x∗(s))]), to obtain

d

dεF (ε)

∣∣∣∣ε=z

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

−z∫

α

d

ds[fu(s, x

∗(s), x∗(s))]θε(s, z)

+ [fu(s, x∗(s), x∗(s))]θsε(s, z)

ds

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

−z∫

α

d

ds[fu(s, x∗(s), x∗(s))]θε(s, z) ds

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

− [fu(s, x∗(s), x∗(s))]θε(s, z)|s=zs=α .

Finally, we have

d

dεF (ε)

∣∣∣∣ε=z

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

− [fu(z, x∗(z), x∗(z))]θε(z, z)

− [fu(α, x∗(α), x∗(α))]θε(α, z).

Substituting θε(z, z) = v − x∗(z) (from equation (4.73)) andθε(α, z) = 0 (from equation (4.74)) into the above equation yields

d

dεF (ε)

∣∣∣∣ε=z

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

− [fu(z, x∗(z), x∗(z))][v − x∗(z)]

− [fu(α, x∗(α), x∗(α))] · 0.

Hence, we have established that ddεF (z) = d

dεF (ε)

∣∣ε=z

is given by

d

dεF (ε)

∣∣∣∣ε=z

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

− [v − x∗(z)][fu(z, x∗(z), x∗(z))],

Page 190: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 171

or equivalently,

d

dεF (z) =

d

dεF (ε)

∣∣∣∣ε=z

= E(z, x∗(z), x∗(z), v). (4.76)

Returning to (4.66), we have now shown that at the point z

E(z, x∗(z), x∗(z), v) =d

dεF (ε)

∣∣∣∣ε=z

≥ 0,

which establishes condition (W-1), for all v ∈ R and any z ∈ [t0, t1]where z 6= tj. If z = tj, the same argument shows that

E(z, x∗(z), x∗(z+), v) =d

dεF (ε)

∣∣∣∣ε=z

≥ 0,

and by using the interval α ≤ ε ≤ z to the left of z, a similarargument yields

E(z, x∗(z), x∗(z−), v) =d

dεF (ε)

∣∣∣∣ε=z

≥ 0.

This completes the proof when z is not a corner of x∗(·). How-ever, note that the argument holds to the right (or left) if z is acorner of x∗(·). Thus, the proof above can be applied even at cornerpoints and hence E(z, x∗(z), x∗(z−), v) = d

dεF (ε)

∣∣ε=z≥ 0 is valid

for any z ∈ [t0, t1]. Since z ∈ [t0, t1] is arbitrary, this completes theproof of the theorem.

4.5.2 Weierstrass Necessary Condition for aWeak Local Minimum

To motivate this section, we return to Example 4.2 with b = 1and see how Theorem 4.7 applies.

Example 4.4 Minimize the functional J(x(·)) =1∫0

[x (s)]3ds, subject

to the endpoint conditions x (0) = 0 and x (1) = 1. Here,

Page 191: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

172 Chapter 4. Necessary Conditions for Local Minima

f(t, x, u) = u3, fu(t, x, u) = 3u2, and fx(t, x, u) = 0. Euler’s equa-tion becomes

3[x∗(t)]2 = fu(t, x∗(t), x∗(t)) = c+

t∫0

fx(s, x∗(s), x∗(s))ds

= c+

t∫0

0ds = c,

or equivalently,3[x∗(t)]2 = c.

Therefore,x∗(t) = ±

√c/3 = ±k

and all we know is that x∗(·) is piecewise linear. Since x∗ (0) = 0and x∗ (1) = 1, a possible candidate is

x∗(t) = t.

We check Weierstrass’ Necessary Condition. The excess functionis given by

E(t, x, u, v) = [f(t, x, v)− f(t, x, u)]− [v − u] · fu(t, x, u)

= [v3 − u3]− [v − u] · 3u2

so that when x∗(t) = t, x∗(t) = 1 and

E(t, x∗(t), x∗(t), v) = [v3 − 1]− [v − 1] · 3.

If x∗(t) = t was a strong local minimizer, then

[v3 − 1]− [v − 1] · 3 = E(t, x∗(t), x∗(t), v) ≥ 0

must hold for all v ∈ R. However, if v = −3, then

[v3 − 1]− [v − 1] · 3 = [−27− 1]− [−3− 1] · 3= −28 + 4 · 3 = −16 < 0

Page 192: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 173

and henceE(t, x∗(t), x∗(t),−3) 0.

This shows that x∗(t) = t cannot be a strong local minimizer!Since x∗(t) = t is not a strong local minimum it can not be aglobal minimizer. However, it is still possible that x∗(t) = t is aweak local minimizer.

This example illustrates that we need more (“better”) condi-tions to help with such problems. For example, the excess functionin Example 4.4 above is given by

E(t, x∗(t), x∗(t), v) = E(t, t, 1, v) = [v3 − 1]− [v − 1]3.

Observe that (see Figure 4.6) if v is close to x∗(v) = 1, then

E(t, x∗(t), x∗(t), v) = v3 − 3v + 2 ≥ 0

and the excess function is non-negative in this restricted regime.Thus, if we restrict v to be near x∗(t), then we can improve onthe basic Weierstrass Necessary Condition. In particular, we canobtain a necessary condition for a weak local minimum.

-3 -2 -1 0 1 2 3-20

-15

-10

-5

0

5

10

15

20

v

E(v)

Figure 4.6: Plot of the Excess Function

Page 193: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

174 Chapter 4. Necessary Conditions for Local Minima

Assume that x∗ (·) ∈ Θ is a weak local minimizer for J(·) onΘ. In particular, there is a δ > 0 such that

J(x∗(·)) =

t1∫t0

f(s, x∗(s), x∗ (s))ds ≤ J(x(·))

=

t1∫t0

f(s, x(s), x (s))ds,

for all x (·) ∈ U1(x∗(·), δ) ∩Θ. Looking back at the proof of The-orem 4.6, first select α0 > z such that |x∗(t)− x∗(z)| < δ/4 forall z ≤ t ≤ α0, and assume that

| ddtx∗(z)− v| < δ/4.

Since

d0(x∗(·), ϕ(·, ε)) = supt0≤t≤t1

|x∗(t)− ϕ(t, ε)| ≤ |v| · |α− z|,

we can select α > z such that z < α ≤ α0 and

d0(x∗(·), ϕ(·, ε)) ≤ |v| · |α− z| < δ/2.

Consider the error in the derivatives given by

|x∗(t)− ∂ϕ(t, ε)

∂t| =

0, t0 ≤ t < z,| ddtx∗(t)− d

dtxv(t)|, z ≤ t ≤ ε

| ddtx∗(t)− d

dtθ (t, ε) |, ε < t ≤ α

0, α ≤ t ≤ t1,

=

0, t0 ≤ t < z,| ddtx∗(t)− d

dt[x∗(z) + v(t− z)]|, z ≤ t ≤ ε

| ddtx∗(t)− d

dtx∗(t) + [xv(ε)− x∗(ε)] (α−t)

(α−ε)|,ε < t ≤ α

0, α ≤ t ≤ t1,

=

0, t0 ≤ t < z,| ddtx∗(t)− v|, z ≤ t ≤ ε

| ddt

[xv(ε)− x∗(ε)] (α−t)(α−ε) |, ε < t ≤ α

0, α ≤ t ≤ t1,

.

Page 194: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 175

Let α1 = [z + α]/2 and assume that ε < α1 < α. Select z ≤ε < α1 such that

|v||(ε− z)| 1

|(α− ε)|<|v||(ε− z)||(α− α1)|

2,

and note that for ε < t ≤ α

|x∗(t)− ∂ϕ(t, ε)

∂t| = |[xv(ε)− x∗(ε)]

1

(α− ε)|

= |[x∗(ε) + v(ε− z)− x∗(ε)] 1

(α− ε)|

=|v(ε− z)||(α− ε)|

.

If z ≤ ε < ε < α1 < α ≤ α0, then

|x∗(t)− ∂ϕ(t, ε)

∂t| = |x∗(t)− v| ≤ |x∗(t)− x∗(z)|+ |x∗(z)− v|

4+δ

4≤ δ

2,

for z ≤ t ≤ ε, and

|x∗(t)− ∂ϕ(t, ε)

∂t| = |v(ε− z)||(α− ε)|

≤ |v||(ε− z)||(α− ε)|

≤ |v||(ε− z)||(α− α1)|

2,

for ε < t ≤ α. Therefore, we have shown that if z ≤ ε < ε, and

|x∗(z)− v| < δ/4,

then

d1(x∗(·), ϕ(·, ε)) = d0(x∗(·), ϕ(·, ε)) + supt0≤t≤t1, t6=ti

|x∗(t)− ∂ϕ(t, ε)

∂t|

2+δ

2= δ.

This inequality shows that ϕ(·, ε) ∈ U1(x∗(·), δ) ∩ Θ for allε ∈ [z, ε). Now we repeat the proof of Theorem 4.6 with F :

Page 195: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

176 Chapter 4. Necessary Conditions for Local Minima

[z, ε) −→ R defined by

F (ε) = J(ϕ(·, ε))− J(x∗(·))

=

t1∫t0

f(s, ϕ(s, ε),∂ϕ(s, ε)

∂s)ds−

t1∫t0

f(s, x∗(s), x∗(s))ds

on the sub-interval [z, ε) ⊂ [z, α). This establishes the followingWeierstrass Necessary Condition for a weak local minimizer. Notethat this condition is valid only for v close to x∗(t).

Theorem 4.9 (Restricted Weierstrass Necessary Condition)If x∗(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then,there is a ρ > 0 such that

(RW-1) E(t, x∗(t), x∗(t), v) ≥ 0 for all t ∈ [t0, t1]and v satisfying

|x∗(t)− v| < ρ, (4.77)

and

(RW-2) x∗(t0) = x0,

(RW-3) x∗(t1) = x1.

Using this restricted form of the Weierstrass Necessary Con-dition, one can prove the Legendre Necessary Condition withoutusing second order variations. For completeness we provide thisproof below.

4.5.3 A Proof of Legendre’s NecessaryCondition

Assume x∗ (·) ∈ Θ provides a weak local minimum for J(·) on Θ.If (4.28) does not hold, then there is a fixed z ∈ [t0, t1] such that

fuu (z, x∗(z), x∗ (z)) < 0.

We consider the case where z ∈ (t0, t1) is not a corner. The caseat a corner is treated the same except one works to the left or tothe right. Since the function

Ψ(u) , fuu(z, x∗(z), x∗ (z) + u)

Page 196: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 177

is continuous, there is a ρ1 > 0 such that

Ψ(u) , fuu(z, x∗(z), x∗ (z) + u) < 0

for all u ∈ [−ρ1, ρ1]. However, applying Theorem 4.9, there is aρ > 0 such that if |v − x∗(z)| < ρ, then

E(z, x∗(z), x∗(z), v) ≥ 0.

Let γ(v) be the function

γ(v) = f(z, x∗(z), v),

and select v such that 0 < |v− x∗(z)| < minρ, ρ1. Note that γ(·)is twice continuously differentiable and hence by Taylor’s Theoremthere is a λ, 0 < λ < 1, such that

γ(v) = γ(x∗(z)) + (v − x∗(z))[d

dvγ(x∗(z))] (4.78)

+1

2(v − x∗(z))2[

d2

dv2γ(x∗(z) + λ(v − x∗(z))].

However,

γ(v)− γ(x∗(z))− (v − x∗(z))[d

dvγ(x∗(z))]

= f(z, x∗(z), v)− f(z, x∗(z), x∗(z))

− (v − x∗(z))[fu(z, x∗(z), x∗(z))]

= E(z, x∗(z), x∗(z), v),

and (4.78) implies

E(z, x∗(z), x∗(z), v) = γ(v)− γ(x∗(z))− (v − x∗(z))[d

dvγ(x∗(z))]

=1

2(v − x∗(z))2[

d2

dv2γx∗(z) + λ(v − x∗(z))]

=1

2(v − x∗(z))2[fuu(z, x

∗(z), x∗(z)

+ λ(v − x∗(z)))].

Page 197: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

178 Chapter 4. Necessary Conditions for Local Minima

Therefore,

E(z, x∗(z), x∗(z), v) =1

2(v − x∗(z))2[fuu(z, x

∗(z), x∗(z) + u)]

(4.79)where u , λ(v − x∗(z)), satisfies

0 < |u| = |λ||(v − x∗(z))| < |(v − x∗(z))| < minρ, ρ1.

Since |(v − x∗(z))| < ρ it follows that

E(z, x∗(z), x∗(z), u) ≥ 0,

but, on the other hand, since u ∈ [−ρ1, ρ1] it follows that

Ψ(u) , fuu(z, x∗(z), x∗ (z) + u) < 0.

In view of (4.79) we have

0 ≤ E(z, x∗(z), x∗(z), u)

=1

2(v − x∗(z))2[fuu(z, x

∗(z), x∗(z) + u)] < 0,

which is impossible. Hence the assumption that there is a fixedz ∈ [t0, t1] such that

fuu (z, x∗(z), x∗(z)) < 0

must be false. This proves the theorem.

4.6 Applying the Four Necessary Con-

ditions

At this point we have four necessary conditions. Three conditions(Euler Necessary Condition - (I), Legendre Necessary Condition -(III), and Jacobi Necessary Condition - (IV)) hold for weak localminimum, while the Weierstrass Necessary Condition - (II) holdsonly for a strong local minimum. In terms of how to proceed, it issuggested that one starts with the Euler NC - (I) and then checkthe Legendre NC - (III). After this step, when appropriate, checkthe Jacobi NC - (IV) before turning to the Weierstrass NC - (II).We illustrate this by revisiting Example 4.4.

Page 198: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 179

Example 4.5 Consider the problem of minimizing the functional

J(x(·)) =1∫0

[x (s)]3ds subject to the endpoint conditions x (0) =

0 and x (1) = 1. We already know that if x∗(·) is a weak localminimizer, then it is an extremal so that

x∗(t) = ±k

for some k. Applying the Legendre Necessary Condition, it mustbe the case that

fuu(t, x∗(t), x∗(t)) = 6x∗(t) = ±6k , m ≥ 0,

and hence the derivative cannot change sign. Thus, x∗(t) =mt + r for all t ∈ [0, 1] where m ≥ 0. However, the endpointconditions x(0) = 0 and x(1) = 1 imply that m = 1 and x∗(t) = tis the only possible minimizer. Note that x∗(t) = t is smooth and

fuu(t, x∗(t), x∗(t)) = 6x∗(t) = 6 > 0,

so we can apply the Jacobi Necessary Condition - (IV). Since

f(t, x, u) = u3,

it follows that

fxu(t, x∗(t), x∗(t)) = 0, fxx(t, x

∗(t), x∗(t)) = 0

andfuu(t, x

∗(t), x∗(t)) = 6x∗(t) = 6.

Hence, Jacobi’s Equation is given by

d

dt[0 · η(t) + 6η(t)] = [0 · η(t) + 0 · η(t)] = 0,

or equivalently,η(t) = 0.

Thus, all secondary extremals have the form

η(t) = pt+ q,

Page 199: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

180 Chapter 4. Necessary Conditions for Local Minima

and if there is a value 0 < tc < 1 such that

η(0) = q = 0

andη(tc) = ptc + q = 0,

thenη(t) ≡ 0

for all t ∈ R. Hence there are no values conjugate to t0 = 0 andJacobi’s Necessary Condition - (IV) is satisfied.

Summarizing, we know that x∗(t) = t satisfies all the necessaryconditions for a weak local minimum, but fails to satisfy Weier-strass’ Necessary Condition. All we can say at this point is thatx∗(t) = t is not a strong local minimizer.

4.7 Problem Set for Chapter 4

Consider the Simplest Problem in the Calculus of Varia-tions (SPCV): Find x∗(·) to minimize the cost function

J(x(·)) =

∫ t1

t0

f(s, x(s), x(s))ds,

subject tox(t0) = x0, x(t1) = x1.

Use the four necessary conditions to completely analyze the prob-lems below. State exactly what you can say about each problem atthis point. Be sure to distinguish between weak, strong and globalminimizers when possible.

Problem 4.1 Minimize the functional J(x(·)) =1∫0

x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Problem 4.2 Minimize the functional J(x(·)) =1∫0

x (s) x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Page 200: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 181

Problem 4.3 Minimize the functional J(x(·)) =1∫0

sx (s) x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Problem 4.4 Minimize the functional J(x(·)) =b∫

0

[x (s)]3ds,

subject to the endpoint conditions x (0) = 0 and x (b) = x1.

Problem 4.5 Minimize the functional

J(x(·)) =

1∫0

[x (s)]2 + [x (s)]2 + 2esx (s)ds,

subject to the endpoint conditions x (0) = 0 and x (1) = e/2.

Problem 4.6 Minimize the functional J(x(·)) =2∫1

s−3[x (s)]2ds,

subject to the endpoint conditions x (1) = 1 and x (2) = 16.

Problem 4.7 Minimize the functional

J(x(·)) =

4∫0

[x (s)− 1]2 × [x (s) + 1]2 ds,

subject to the endpoint conditions x (0) = 0 and x (4) = 2.

Problem 4.8 Minimize the functional

J(x(·)) =

π/2∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (π/2) = 0.

Problem 4.9 Minimize the functional

J(x(·)) =

π∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (π) = 0.

Page 201: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

182 Chapter 4. Necessary Conditions for Local Minima

Problem 4.10 Minimize the functional

J(x(·)) =

3π/2∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (3π/2) = 0.

Problem 4.11 Minimize the functional

J(x(·)) =

b∫0

x(s)√

1 + [x(s)]2ds,

subject to the endpoint conditions x (0) = 1 and x (b) = 2.

Problem 4.12 Minimize the functional J(x(·)) =b∫

0

√1+[x(s)]2

2gx(s)ds,

subject to the endpoint conditions x (0) = 1 and x (b) = 0.

Problem 4.13 Minimize the functional

J(x(·)) =

2∫1

[x(s)]2 − 2sx(s)ds,

subject to the endpoint conditions x (1) = 0 and x (2) = −1.

Problem 4.14 Minimize the functional

J(x(·)) =

π∫0

[x(s)]2(1− [x(s)]2)ds,

subject to the endpoint conditions x (0) = 0 and x (π) = 0.

Problem 4.15 Minimize the functional

J(x(·)) =

3∫1

[3s− x(s)]x(s)ds,

subject to the endpoint conditions x (1) = 1 and x (3) = 9/2.

Page 202: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 4. Necessary Conditions for Local Minima 183

Problem 4.16 Minimize the functional

J(x(·)) = 4πρv2

L∫0

[x(s)]3x(s)ds,

subject to the endpoint conditions x (0) = 1 and x (L) = R. Here,ρ, v2, L > 0 and R > 0 are all constants.

Problem 4.17 Minimize the functional

J(x(·)) =

2∫1

x(s)[1 + s2x(s)]ds,

subject to the endpoint conditions x (1) = 3 and x (2) = 5.

Advanced Problems

Problem 4.18 Consider the problem of minimizing the func-tional

J(x(·)) =

1∫−1

[x (s) + 1]2[x (s)− 1]2

ds ≥ 0,

with x(−1) = x(1) = 0. Show that there is a sequence of functionsxN(·) such that

J(xN(·)) −→ 0

and d0(xN(·), 0)→ 0, but x0(t) = 0 is not a minimizer of J(·).

Problem 4.19 Consider the problem of minimizing the func-tional

J(x(·)) =

1∫0

[x(s)]2 + [x(s)]2(1− [x(s)]2)

ds,

with x(0) = x(1) = 0. Show that x0(t) = 0 is a weak local mini-mizer, but not a strong local minimizer.

Page 203: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Page 204: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5

Sufficient Conditions forthe Simplest Problem

Although it is difficult to derive “useful” sufficient conditions,there are some results that combine the necessary conditions. Weshall not spend much time on these conditions. However, thereis merit in looking at some of the techniques used to derive suchconditions. First recall Hilbert’s form of Euler’s Equations.

Observe that Hilbert’s Differentiability Theorem 3.6 im-plies that smooth extremals x(·) have continuous second deriva-tives. Therefore, we may again differentiate Euler’s DifferentialEquation

d

dt[fu (t, x(t), x (t))] = [fx (t, x(t), x (t))] , (5.1)

and apply the chain rule to obtain Hilbert’s (second order) differ-ential equation

x (t) · [fuu (t, x(t), x (t))] = [fx (t, x(t), x (t))]− [fut (t, x(t), x (t))](5.2)

− [fux (t, x(t), x (t))] · x (t) .

185

Page 205: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

186 Chapter 5. Sufficient Conditions

5.1 A Field of Extremals

Assume that x0(·) is a fixed smooth extremal satisfying Hilbert’sequation (5.2). Also assume that there exists a one parameter fam-ily of solutions ϕ(·, α) to (5.2) with the following properties:

(FE-i) For each parameter α in an open interval (α0− γ, α0 + γ),the function xα(·) defined by xα(t) , ϕ(t, α) is a solution toHilbert’s equation on the interval t0 ≤ t ≤ t1 and x0(t) =ϕ(t, α0).

(FE-ii) The function ϕ(t, α) and all the partial derivatives∂∂tϕ(t, α), ∂

∂αϕ(t, α), ∂2

∂t∂αϕ(t, α), and ∂2

∂t2ϕ(t, α) exist and are

continuous on the set

[t0, t1]× (α0 − γ, α0 + γ).

(FE-iii) The equationx− ϕ(t, α) = 0 (5.3)

implicitly defines a function α : S −→ R on a region S inthe (t, x) plane defined by

S = [t0, t1]× x : x0(t)− δ < x < x0(t) + δ

for some δ > 0. In particular, α = α(t, x) satisfies

x− ϕ(t, α(t, x)) = 0, (5.4)

for all [t x]T ∈ S.

(FE-iv) The partial derivatives ∂∂tα(t, x) and ∂

∂xα(t, x) exist and

are continuous on S.

It is helpful to think of the family of solutions ϕ(·, α) as pro-viding a “strip” of graphs of smooth extremals about the graph ofx0(·).

Page 206: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 187

t0 t1

0 0( ) ( , )x

S

1 1( ) ( , )x

2 2( ) ( , )x

Figure 5.1: A Field of Extremals

Note that (FE-i) through (FE-iv) imply that through each fixedpoint [t x]T ∈ S, there exists exactly one extremal

x(t) = xα(t) = ϕ(t, α) = ϕ(t, α(t, x)).

Let p(t, x) denote the slope of the unique extremal that goesthrough the point [t x]T . Note that at a specific point [t x]T ∈ Sthere is a value α = α(t, x) such that the value of the slope at[t x]T is the slope of the extremal

x(t) = xα(t) = ϕ(t, α)

at t = t. Thus, it follows that

p(t, x) = xα(t) =∂

∂tϕ(t, α)

∣∣∣∣t=t

=∂

∂tϕ(t, α) =

∂tϕ(t, α(t, x))

(5.5)and since (5.5) holds at each [t x]T ∈ S one has that

xα(t) = p(t, x) =∂

∂tϕ(t, α(t, x)) =

∂tϕ(t, α) (5.6)

holds for all [t x]T ∈ S. The function p(t, x) is called the slopefunction on S.

Page 207: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

188 Chapter 5. Sufficient Conditions

t0 t1

0 0( ) ( , )x

S

1 1( ) ( , )x

2 2( ) ( , )x

ˆ ˆ( , )t x

ˆ ˆ( , )p t x

Figure 5.2: An Extremal Through a Specified Point and the SlopeFunction

Definition 5.1 The pair F = (S, p(·, ·)) is called a field of ex-tremals about x0(·). Here S is the domain of the function α(t, x),and we say that x0(·) is embedded in F .

The existence of a field of extremals will provide the basic toolfor developing sufficient conditions. However, we need a few prelim-inary results to set the stage for the development of these results.We begin by establishing the following theorem about the slopefunction. Assume x0(·) is embedded in the field F = (S, p(·, ·))with slope function p(·, ·).

Theorem 5.1 (Basic PDE for Slope) The slope function p (t, x)satisfies the partial differential equation

pt(t, x) + p(t, x)px(t, x) · fuu(t, x, p(t, x))

= fx(t, x, p(t, x)) (5.7)

− p(t, x) · fux(t, x, p(t, x))

− fut(t, x, p(t, x))

for all [t x]T ∈ S.

Page 208: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 189

Proof: For each α in the open interval (α0 − γ, α0 + γ), theextremal xα(t) = ϕ(t, α) satisfies Hilbert’s Equation

xα(t) · fuu(t, xα(t), xα(t)) = fx(t, xα(t), xα(t)) (5.8)

− xα(t) · fux(t, xα(t), xα(t))

− fut(t, xα(t), xα(t)).

Given any “fixed” pair [t x]T , let α = α(t, x) so that

xα(t) = ϕ(t, α) = ϕ(t, α(t, x))

and 5.4 in (FE-iii) implies that

x = xα(t) = ϕ(t, α) = ϕ(t, α(t, x)).

Moreover, since ϕ(t, α) = xα(t) and ϕt(t, α) = xα(t), then

(t, xα(t), xα(t)) = (t, ϕ(t, α), p(t, x)) = (t, ϕ(t, α(t, x)), p(t, x))

= (t, x, p(t, x)). (5.9)

Differentiating both sides of (5.6)

p(t, x) = p(t, ϕ(t, α)) = ϕt(t, α) = xα(t), (5.10)

yields

pt(t, ϕ(t, α)) + px(t, ϕ(t, α)) · ϕt (t, α) = ϕtt (t, α) = xα(t).

In addition, since x = ϕ(t, α), p (t, x) = ϕt(t, α) and ϕtt(t, α) =xα(t) it follows that

pt (t, x) + px (t, x) p (t, x) = xα(t).

Multiplying both sides of this equation by fuu(t, xα(t), xα(t)) =fuu(t, x, p(t, x)) and substituting (5.10) into (5.8) yields

pt(t, x) + px(t, x)p(t, x) · fuu(t, xα(t), p(t, x))

= xα(t) · fuu(t, xα(t), xα(t))

= fx(t, xα(t), xα(t))

− xα(t) · fux(t, xα(t), xα(t))

− fut(t, xα(t), xα(t))

= fx(t, x, p(t, x))

− p(t, x) · fux(t, x, p(t, x))

− fut(t, x, p(t, x))

Page 209: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

190 Chapter 5. Sufficient Conditions

and this completes the proof.

5.2 The Hilbert Integral

Assume that F = (S, p(·, ·)) is a field of extremals about x0(·)and suppose q(t, x) is any function defined on S. Given any x(·) ∈PWS(t0, t1), the Hilbert Integral is defined by

J∗q (x(·)) =

t1∫t0

f(s, x(s), q(s, x(s)))

+ [x(s)− q(s, x(s))] · fu(s, x(s), q(s, x(s)))ds

=

t1∫t0

f(s, x(s), q(s, x(s)))− q(s, x(s)) · fu(s, x(s),

q(s, x(s)))ds+

t1∫t0

ddsx(s) · fu(s, x(s), q(s, x(s)))ds.

If one defines P(t, x) and Q(t, x) by

P(t, x) = f(t, x, q(t, x))− q(t, x) · fu(t, x, q(t, x)), (5.11)

andQ(t, x) = fu(t, x, q(t, x)), (5.12)

respectively, then J∗q (x(·)) is a line integral given by

J∗q (x(·)) =

t1∫t0

P(s, x)ds+Q(s, x)dx.

We are interested in the case where q(s, x) renders the lineintegral J∗q (x(·)) independent of path. Recall from vector calculus(see Section 9.5 in [48]) that J∗q (x(·)) is independent of path if and

Page 210: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 191

only if Px(t, x) = Qt(t, x). A direct calculation yields

Px(t, x) = fx(t, x, q(t, x)) + fu(t, x, q(t, x))qx(t, x)

− qx(t, x)fu(t, x, q(t, x))− q(t, x)[fux(t, x, q(t, x))

+ fuu(t, x, q(t, x))qx(t, x)]

= fx(t, x, q(t, x))− q(t, x)fux(t, x, q(t, x))

− fuu(t, x, q(t, x))q(t, x)qx(t, x).

Likewise, a direct calculation yields

Qt(t, x) = fut(t, x, q(t, x)) + fuu(t, x, q(t, x))qt(t, x).

Hence,Qt(t, x) = Px(t, x),

if and only if

fut(t, x, q(t, x)) + fuu(t, x, q(t, x))qt(t, x)

= fx(t, x, q(t, x))− q(t, x)fux(t, x, q(t, x))

− fuu(t, x, q(t, x))q(t, x)qx(t, x).

Therefore, J∗q (x(·)) is independent of path if and only if for all[t x]T ∈ S, q (t, x) satisfies the partial differential equation

qt(t, x) + q(t, x)qx(t, x) · fuu(t, x, q(t, x))

= fx(t, x, q(t, x)) (5.13)

− q(t, x) · fux(t, x, q(t, x))

− fut(t, x, q(t, x)).

But we know from (5.7) that the slope function satisfies thispartial differential equation. Hence, if q(t, x) = p(t, x), then theHilbert Integral

J∗p (x(·)) =

t1∫t0

f(s, x(s), p(s, x(s)))

+ [x(s)− p(s, x(s))] · fu(s, x(s), p(s, x(s)))ds,

is independent of path. Thus we have established the followingresult.

Page 211: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

192 Chapter 5. Sufficient Conditions

Theorem 5.2 (Hilbert’s Integral Theorem) Assume that F =(S, p(·, ·)) is a field of extremals about x0(·) and let x1(·) andx2(·) ∈ PWS(t0, t1) be any two functions with graphs containedin S. If x1 (t0) = x2 (t0), and x1 (t1) = x2 (t1), then

J∗p (x1(·)) = J∗p (x2(·)).

5.3 Fundamental Sufficient Results

If there exists a field of extremals F = (S, p(·, ·)) about a smoothfunction x0(·), then Hilbert’s Integral Theorem can be exploited toyield sufficient conditions. The key result relates the cost functionto the Weierstrass Excess Function.

Theorem 5.3 (Weierstrass-Hilbert) Suppose x0(·) is a smoothfunction embedded in a field F = (S, p(·, ·)) and the graph ofx(·) ∈ PWS(t0, t1) is in S. If

x(t0) = x0(t0),

andx(t1) = x0(t1),

then

J(x(·))− J(x0(·)) =

t1∫t0

E(s, x(s), p(s, x(s)), x(s))ds. (5.14)

Proof: Consider Hilbert’s integral with q(t, x) = p(t, x) . Eval-uating J∗p (·) at x0(·), it follows that

J∗p (x0(·)) =

t1∫t0

f(s, x0(s), p(s, x0(s))ds

+

t1∫t0

[x0(s)− p(s, x0(s))] · [fu(s, x0(s), p(s, x0(s))]ds

Page 212: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 193

and sincex0(s) = p(s, x0(s))

the Hilbert integral reduces to

J∗p (x0(·)) =

t1∫t0

f(s, x0(s), p(s, x0(s))ds = J(x0(·)). (5.15)

However, Hilbert’s Integral Theorem above implies that

J∗p (x0(·)) = J∗p (x(·)). (5.16)

Therefore it follows from (5.15) and (5.16) above that

J(x(·))− J(x0(·))= J(x(·))− J∗p (x0(·)) = J(x(·))− J∗p (x(·))

=

t1∫t0

f(s, x(s), x(s))ds−t1∫t0

f(s, x(s), p(s, x(s))ds

−t1∫t0

[x(s)− p(s, x(s))] · [fu(s, x(s), p(s, x(s))]ds

=

t1∫t0

E(s, x(s), p(s, x(s)), x(s))ds,

and this completes the proof. We now have the following fundamental sufficiency condition

based on the Weierstrass excess function. Note that we need theexistence of a field of extremals.

Theorem 5.4 (Fundamental Sufficiency Theorem) Assumethat x0(·) is a smooth extremal embedded in a field F = (S, p(·, ·))satisfying x0 (t0) = x0 and x0 (t1) = x1. If x(·) ∈ Θ is any otherpiecewise smooth function with graph in S and if for all t ∈ [t0, t1]

E(t, x(t), p(t, x(t)), x(t)) ≥ 0,

thenJ(x0(·)) ≤ J(x(·)).

Page 213: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

194 Chapter 5. Sufficient Conditions

In order to make use of the Fundamental Sufficiency Theorem,one needs to have the answer to the following two questions:

(A) When can a smooth function x0(·) be embedded in a fieldF = (S, p(·, ·))?and

(B) When is E(t, x(t), p(t, x(t)), x(t)) ≥ 0?One answer to the first question is given by the following result.The proof of this result is outlined in Ewing’s book [77] and detailscan be found in Bliss [27] and Bolza [31].

Theorem 5.5 (Fundamental Field Theorem) If x0(·) is smoothand satisfies(F1) Euler’s Equation

d

dt[fu (t, x0(t), x0 (t))] = [fx (t, x0(t), x0 (t))] , (5.17)

(F2) the Strengthen Legendre Condition

fuu(t, x0(t), x0(t)) > 0, t0 ≤ t ≤ t1, (5.18)

(F3) the Strengthen Jacobi Condition that there is no value tcconjugate to t0 satisfying

tc ≤ t1, (5.19)

then there exists a field of extremals F = (S, p(·, ·)) about x0(·).

The following result connects the Weierstrass and Legendreconditions and is the key to obtaining the basic sufficient condi-tions.

Theorem 5.6 (Excess Function Expansion) If x0(·) is em-bedded in a field with slope function p(t, x), then for each v ∈ Rand [t x]T ∈ S there is a function θ = θ(t, x, v) such that0 < θ(t, x, v) < 1 and

E(t, x, p(t, x), v) = 1/2[v − p(t, x)]2fuu(t, x, p(t, x)

+ θ(t, x, v)[v − p(t, x)]). (5.20)

Page 214: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 195

Proof : Assume x0(·) is embedded in a field of extremals F =(S, p(·, ·)) about x0(·) with slope function p(t, x). For each [t x]T ∈S define the function

r(v) = f(t, x, v)

so thatd

dvr(v) = r′(v) = fu(t, x, v),

andd2

dv2r(v) = r′′(v) = fuu(t, x, v).

Let p = p(t, x). Taylor’s theorem with remainder implies forany v that there is a θ = θ(t, x, v), with 0 < θ < 1 such that

r(v) = r(p) + [v − p]r′(p) + 1/2[v − p]2[r′′(p+ θ[v − p])]= f(t, x, p) + [v − p][fu(t, x, p)]

+ 1/2[v − p]2[fuu(t, x, p+ θ[v − p])],

which implies

f(t, x, v) = f(t, x, p) + [v − p][fu(t, x, p)]+ 1/2[v − p]2[fuu(t, x, p+ θ[v − p])],

or equivalently,

f(t, x, v)− f(t, x, p)− [v − p][fu(t, x, p)]= 1/2[v − p]2[fuu(t, x, p+ θ[v − p])].

The previous equality implies that

E(t, x, p, v) = 1/2[v − p]2fuu(t, x, p+ θ[v − p]).

In particular, since [t x]T ∈ S is arbitrary, it follows that thefunction θ = θ(t, x, v) exists for all [t x]T ∈ S, 0 < θ(t, x, v) < 1and

E(t, x, p(t, x), v)

= 1/2[v − p(t, x)]2fuu(t, x, p(t, x) + θ(t, x, v)[v − p(t, x)]).(5.21)

This completes the proof and leads to the first sufficient conditionfor a strong local minimum.

Page 215: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

196 Chapter 5. Sufficient Conditions

Theorem 5.7 (Sufficient Condition (1)) If the problem isregular (i.e. fuu(t, x, u) > 0 for all (t, x, u)), x0 (·) ∈ Θ is smooth,and satisfies(S1–1) Euler’s Differential Equation

d

dt[fu (t, x0(t), x0 (t))] = [fx (t, x0(t), x0 (t))] ,

(S1–2) the Strengthen Jacobi Condition that there is no value tcconjugate to t0 satisfying

tc ≤ t1,

then x0 (·) provides a strong local minimum for J(·) on Θ.

Proof: By Theorem 5.5 it follows that the smooth functionx0(·) can be embedded in a field F = (S, p(·, ·)), x0 (t0) = x0

and x0 (t1) = x1. Also, there is a δ > 0 such that if x(·) is anypiecewise smooth function satisfying x (t0) = x0, x (t1) = x1 andd0(x0(·), x(·)) < δ, then x(·) ∈ Θ has its graph in S. Thus, (5.20)implies that there exists a θ(t) = θ(t, x(t), x(t)) with 0 < θ(t) < 1such that

E(t, x(t), p(t, x(t)), x(t)) = 1/2[x(t)− p(t, x(t))]2 · [fuu(t, x(t),

p(t, x(t)) + θ(t)[x(t)− p(t, x(t))])]

≥ 0

for all t ∈ [t0, t1] and the result follows from the FundamentalSufficiency Condition above.

We also have the following sufficient condition for a weak localminimum. The proof is similar to the proof to Theorem 5.7 aboveand is left as an exercise.

Theorem 5.8 (Sufficient Condition (2)) If x0 (·) ∈ Θ issmooth, and satisfies(S2− 1) Euler’s Equation

d

dt[fu (t, x0(t), x0 (t))] = [fx (t, x0(t), x0 (t))] ,

Page 216: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 197

(S2− 2) the Strengthen Legendre Condition

fuu(t, x0(t), x0(t)) > 0, t0 ≤ t ≤ t1,

(S2− 3) the Strengthen Jacobi Condition that there is no value tcconjugate to t0 satisfying

tc ≤ t1,

then x0 (·) provides a weak local minimum for J(·) on Θ.

5.4 Problem Set for Chapter 5

Consider the Simplest Problem in the Calculus of Varia-tions (SPCV): Find x∗(·) to minimize the cost function

J(x(·)) =

∫ t1

t0

f(s, x(s), x(s))ds,

subject tox(t0) = x0, x(t1) = x1.

Determine if the sufficient conditions above are helpful in analyz-ing the following problems. Be sure to distinguish between weak,strong and global minimizers.

Problem 5.1 Minimize the functional J(x(·)) =1∫0

x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Problem 5.2 Minimize the functional J(x(·)) =1∫0

x (s) x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Problem 5.3 Minimize the functional J(x(·)) =1∫0

sx (s) x (s) ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1.

Page 217: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

198 Chapter 5. Sufficient Conditions

Problem 5.4 Minimize the functional J(x(·)) =b∫

0

[x (s)]3ds,

subject to the endpoint conditions x (0) = 0 and x (b) = x1.

Problem 5.5 Minimize the functional

J(x(·)) =

1∫0

[x (s)]2 + [x (s)]2 + 2esx (s)ds,

subject to the endpoint conditions x (0) = 0 and x (1) = e/2.

Problem 5.6 Minimize the functional

J(x(·)) =

2∫1

s−3[x (s)]2ds,

subject to the endpoint conditions x (1) = 1 and x (2) = 16.

Problem 5.7 Minimize the functional

J(x(·)) =

4∫0

[x (s)− 1]2 [x (s) + 1]2 ds,

subject to the endpoint conditions x (0) = 0 and x (4) = 2.

Problem 5.8 Minimize the functional

J(x(·)) =

π/2∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (π/2) = 0.

Problem 5.9 Minimize the functional

J(x(·)) =

π∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (π) = 0.

Page 218: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 199

Problem 5.10 Minimize the functional

J(x(·)) =

3π/2∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (3π/2) = 0.

Problem 5.11 Minimize the functional

J(x(·)) =

b∫0

x(s)√

1 + [x(s)]2ds,

subject to the endpoint conditions x (0) = 1 and x (b) = 2.

Problem 5.12 Minimize the functional J(x(·)) =b∫

0

√1+[x(s)]2

2gx(s)ds,

subject to the endpoint conditions x (0) = 1 and x (b) = 0.

Problem 5.13 Minimize the functional

J(x(·)) =

2∫1

[x(s)]2 − 2sx(s)ds,

subject to the endpoint conditions x (1) = 0 and x (2) = −1.

Problem 5.14 Minimize the functional

J(x(·)) =

π∫0

[x(s)]2(1− [x(s)]2)ds,

subject to the endpoint conditions x (0) = 0 and x (π) = 0.

Problem 5.15 Minimize the functional

J(x(·)) =

3∫1

[3s− x(s)]x(s)ds,

subject to the endpoint conditions x (1) = 1 and x (3) = 9/2.

Page 219: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

200 Chapter 5. Sufficient Conditions

Problem 5.16 Minimize the functional

J(x(·)) = 4πρv2

L∫0

[x(s)]3x(s)

ds,

subject to the endpoint conditions x (0) = 1 and x (L) = R. Here,ρ, v2, L > 0 and R > 0 are all constants.

Problem 5.17 Minimize the functional

J(x(·)) =

2∫1

x(s)[1 + s2x(s)]ds,

subject to the endpoint conditions x (1) = 3 and x (2) = 5.

Advanced Problems

Problem 5.18 Prove the Fundamental Field Theorem 5.5.

Problem 5.19 Consider the functional

J(x(·)) =

2∫0

[x(s)]3 + [sin(s)]2ds,

subject to the endpoint conditions x (0) = 1 and x (2) = 1. Find afield of extremals about the extremal x0(t) = 1.

Problem 5.20 Consider the functional

J(x(·)) =

2∫0

[x(s)]3 + [sin(s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (2) = 4. Find afield of extremals about the extremal x0(t) = 2t.

Page 220: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 5. Sufficient Conditions 201

Problem 5.21 Consider the functional

J(x(·)) =

1∫−1

x(s)[2t− 1

2x(s)]ds,

subject to the endpoint conditions x (−1) = 0 and x (1) = 12. Find

a field of extremals about the extremal x0(t) = t2 + 14t− 3

4.

Problem 5.22 Prove the Sufficient Condition (2) as stated inTheorem 5.8.

Problem 5.23 Consider the problem of minimizing∫ T

0

[x(s)]2 − [x(s)]2ds,

subject to x(0) = 0, x(T ) = 0. [(a)] For which T is it the case thatx(t) ≡ 0 is a strong local minimizer? [(b)] How does your analysisextend to the more general problem of minimizing∫ T

0

p(s)[x(s)]2 + q(s)[x(s)]2ds,

subject to x(0) = 0 and x(T ) = 0. Here, p(·) and q(·) are realvalued smooth functions defined on [0,∞) with p(t) > 0.

Page 221: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Page 222: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 6

Summary for the SimplestProblem

This is a summary of the definitions and fundamental results forthe Simplest Problem in the Calculus of Variations. Recall thatf(t, x, u), t0, t1, x0 and x1 are given and we assume that f(t, x, u)belongs to C2.

Let X = PWS(t0, t1) denote the space of all real-valued piece-wise smooth functions defined on [t0, t1]. For each PWS functionx : [t0, t1]→ R, define the functional J : X → R by

J(x(·)) =

t1∫t0

f (s, x(s), x (s)) ds. (6.1)

Assume that the points [t0 x0]T and [t1 x1]T are given and definethe set of PWS functions Θ by

Θ = x(·) ∈ PWS(t0, t1) : x (t0) = x0, x (t1) = x1 . (6.2)

Observe that J : X → R is a real valued function on X.

———————————————————————–

203

Page 223: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

204 Chapter 6. Summary for the Simplest Problem

The Simplest Problem in the Calculus of Variations:Find x∗ (·) ∈ Θ such that

J(x∗(·)) =

t1∫t0

f(s, x∗(s), x∗ (s))ds ≤ J(x(·))

=

t1∫t0

f(s, x(s), x (s))ds,

for all x (·) ∈ Θ.

———————————————————————–

If x(·) and z(·) ∈ PWS(t0, t1), then

d0(x(·), z(·)) , supt0≤s≤t1

|x(s)− z(s)| (6.3)

defines the d0 distance between x(·) and z(·). Given x(·) ∈PWS(t0, t1) and δ > 0, the U0(x(·), δ)-neighborhood of x(·) is de-fined to be the open ball

U0(x(·), δ) = x(·) ∈ PWS(t0, t1) : d0(x(·), x(·)) < δ .

———————————————————————–

For x(·) and z(·) ∈ PWS(t0, t1), there is a (finite) partition of[t0, t1], say t0 = t0 < t1 < t2 < · · · < tp−1 < tp = t1, such thatx (t) and z (t) exist and are continuous (and bounded) on eachopen subinterval

(ti−1, ti

). The d1 distance between x(·) and z(·)

is defined by

d1(x(·), z(·)) , supt0≤t≤t1

|x(t)− z(t)|+ supt0≤s≤t1, s 6=ti

|x(t)− z(t)|

(6.4)

= d0(x(·), z(·)) + supt0≤t≤t1, s 6=ti

|x(t)− z(t)|.

The U1(x(·), δ)-neighborhood of x(·) is defined to be the open ball

U1(x(·), δ) = x(·) ∈ PWS(t0, t1) : d1(x(·), x(·)) < δ .

Page 224: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 6. Summary for the Simplest Problem 205

———————————————————————–

If x∗ (·) ∈ Θ satisfies J(x∗(·)) ≤ J(x(·)) for all x (·) ∈ Θ, thenx∗ (·) is called a global minimizer for J(·) on Θ.

If there is a δ > 0 and a x∗ (·) ∈ Θ, such that J(x∗(·)) ≤J(x(·)), for all x (·) ∈ U0(x∗(·), δ)∩Θ, then x∗ (·) is called a stronglocal minimizer for J(·) on Θ.

Similarly, if there is a δ > 0 and a x∗ (·) ∈ Θ, such thatJ(x∗(·)) ≤ J(x(·)), for all x (·) ∈ U1(x∗(·), δ) ∩ Θ, then x∗ (·) iscalled a weak local minimizer for J(·) on Θ.

———————————————————————–

Theorem 6.1 (Euler Necessary Condition - (I)) If x∗ (·) ∈Θ provides a weak local minimum for J(·) on Θ, then,

(E-1) there is a constant c such that for all t ∈ [t0, t1],

[fu (t, x∗(t), x∗ (t))] = c+

t∫t0

[fx (s, x∗(s), x∗ (s))] ds, (6.5)

(E-2) x∗(t0) = x0,

(E-3) x∗(t1) = x1.

(E-4) Between corners of x∗(·) the function fu (t, x∗(t), x∗ (t)) isdifferentiable and if t is not a corner of x∗(·), then

d

dt[fu (t, x∗(t), x∗ (t))] = [fx (t, x∗(t), x∗ (t))] . (6.6)

———————————————————————–

Any piecewise smooth function x(·) satisfying (6.5) is called an extremal. The Euler Necessary Condition 6.1 implies that any local minimizer of J(·) on Θ is an extremal.

Observe that extremals do not have to satisfy the boundary conditions.


If fuu(t, x, u) ≠ 0 for all (t, x, u), t0 ≤ t ≤ t1, then the integrand f(t, x, u) is called non-singular. If fuu(t, x, u) > 0 for all (t, x, u), then the integrand f(t, x, u) is said to be regular and the SPCV is called a regular problem. If x(·) ∈ PWS(t0, t1) is an extremal, then x(·) is called a non-singular extremal if at all points t ∈ [t0, t1] where ẋ(t) is defined, fuu(t, x(t), ẋ(t)) ≠ 0. If x(·) ∈ PWS(t0, t1) is an extremal, then x(·) is called a regular extremal if at all points t ∈ [t0, t1] where ẋ(t) is defined, fuu(t, x(t), ẋ(t)) > 0.

———————————————————————–

Weierstrass-Erdmann Corner Condition. If x(·) ∈ PWS(t0, t1) is an extremal, then

fu(t, x(t), ẋ(t⁺)) = fu(t, x(t), ẋ(t⁻))  (6.7)

for all t ∈ (t0, t1).

———————————————————————–

Theorem 6.2 (Hilbert's Differentiability Theorem) If x(·) ∈ PWS(t0, t1) is an extremal, t is not a corner of x(·), and fuu(t, x(t), ẋ(t)) ≠ 0, then there exists a δ > 0 such that x(·) has a continuous second derivative for all t ∈ (t − δ, t + δ) and

fuu(t, x(t), ẋ(t)) · ẍ(t) + fux(t, x(t), ẋ(t)) · ẋ(t) + fut(t, x(t), ẋ(t)) = fx(t, x(t), ẋ(t)).  (6.8)

If in addition, f(t, x, u) is of class Cp, p ≥ 2, then any non-singular extremal x(·) is also of class Cp.

———————————————————————–

The Weierstrass Excess Function is defined by

E(t, x, u, v) = [f(t, x, v) − f(t, x, u)] − [v − u] fu(t, x, u)  (6.9)

for all (t, x, u, v) ∈ [t0, t1] × R³.
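As a quick sanity check, here is a small sketch (our illustration, not from the text) computing E for the assumed integrand f(t, x, u) = u², for which (6.9) collapses to E = (v − u)² ≥ 0, as convexity in u suggests.

import numpy as np

# Sketch with the assumed integrand f(t, x, u) = u**2, so f_u = 2u and
# the excess function (6.9) reduces to E = (v - u)**2 >= 0.
def excess(t, x, u, v, f, fu):
    return (f(t, x, v) - f(t, x, u)) - (v - u) * fu(t, x, u)

f  = lambda t, x, u: u**2
fu = lambda t, x, u: 2.0 * u

rng = np.random.default_rng(0)
u, v = rng.normal(size=1000), rng.normal(size=1000)
E = excess(0.0, 0.0, u, v, f, fu)
print(np.min(E))   # >= 0 up to rounding, i.e. condition (W-1) below holds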


———————————————————————–

Theorem 6.3 (Weierstrass Necessary Condition - (II)) If x∗(·) ∈ Θ provides a strong local minimum for J(·) on Θ, then,

(W-1) E(t, x∗(t), ẋ∗(t), v) ≥ 0 for all t ∈ [t0, t1] and v ∈ R,

(W-2) x∗(t0) = x0,

(W-3) x∗(t1) = x1.

Condition (W-1)

E(t, x∗(t), ẋ∗(t), v) ≥ 0,  (6.10)

is the essential new information in Weierstrass' Necessary Condition. Moreover, (6.10) holds at all t ∈ [t0, t1], including corners. In particular, for all v ∈ R

E(t, x∗(t), ẋ∗(t⁺), v) ≥ 0,  (6.11)

and

E(t, x∗(t), ẋ∗(t⁻), v) ≥ 0.  (6.12)

———————————————————————–

Assume that x∗(·) ∈ Θ provides a strong local minimum for J(·) on Θ and define

H(t, v) ≜ −E(t, x∗(t), ẋ∗(t), v),  (6.13)

and note that Weierstrass' Necessary Condition may be written as

H(t, v) = −E(t, x∗(t), ẋ∗(t), v) ≤ 0

for all v ∈ R. However, if v = u∗(t) = ẋ∗(t), then using the


definition of the excess function, one has that

H(t, ẋ∗(t)) = H(t, u∗(t))
= −E(t, x∗(t), ẋ∗(t), u∗(t))
= −{[f(t, x∗(t), u∗(t)) − f(t, x∗(t), ẋ∗(t))] − [u∗(t) − ẋ∗(t)] fu(t, x∗(t), ẋ∗(t))}
= −{[f(t, x∗(t), ẋ∗(t)) − f(t, x∗(t), ẋ∗(t))] − [ẋ∗(t) − ẋ∗(t)] fu(t, x∗(t), ẋ∗(t))}
= 0.

Consequently,

H(t, v) ≤ 0 = H(t, ẋ∗(t)) = H(t, u∗(t)),

for all v ∈ R, and we have the following equivalent version of Weierstrass' Necessary Condition.

Theorem 6.4 (Weierstrass Maximum Principle) If x∗(·) ∈ Θ provides a strong local minimum for J(·) on Θ, then,

(WMP-1) v = u∗(t) = ẋ∗(t) maximizes H(t, v),

(WMP-2) x∗(t0) = x0,

(WMP-3) x∗(t1) = x1.

In particular,

H(t, u∗(t)) = H(t, ẋ∗(t)) = max_{v∈R} H(t, v) = 0  (6.14)

for all t ∈ [t0, t1].

———————————————————————–

Theorem 6.5 (Legendre Necessary Condition - (III)) If x∗(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then,

(L-1) fuu(t, x∗(t), ẋ∗(t)) ≥ 0, for all t ∈ [t0, t1],


(L-2) x∗(t0) = x0,

(L-3) x∗(t1) = x1.

It is important to again note that condition (L-1) holds at corners. In particular,

fuu(t, x∗(t), ẋ∗(t)) ≥ 0  (6.15)

implies that

fuu(t, x∗(t), ẋ∗(t⁺)) ≥ 0

for all t ∈ [t0, t1), and

fuu(t, x∗(t), ẋ∗(t⁻)) ≥ 0

for all t ∈ (t0, t1].

———————————————————————–

If x∗(·) ∈ Θ provides a weak local minimum for J(·) on Θ, then define the functions f∗xx(t), f∗xu(t), and f∗uu(t) by

f∗xx(t) = fxx(t, x∗(t), ẋ∗(t)), f∗xu(t) = fxu(t, x∗(t), ẋ∗(t)),

and

f∗uu(t) = fuu(t, x∗(t), ẋ∗(t)),

respectively. Also, define the function F(t, η, ξ) by

F(t, η, ξ) = (1/2)[f∗xx(t)η² + 2f∗xu(t)ηξ + f∗uu(t)ξ²],  (6.16)

and consider the functional J : PWS(t0, t1) −→ R given by

J(η(·)) = ∫_{t0}^{t1} F(s, η(s), η̇(s)) ds.  (6.17)

Let ΘS ⊂ PWS(t0, t1) be defined by

ΘS = {η(·) ∈ PWS(t0, t1) : η(t0) = 0, η(t1) = 0} = V0,  (6.18)


and consider the so called Accessory (Secondary) Minimum Problem: Find η∗(·) ∈ ΘS such that

J(η∗(·)) = ∫_{t0}^{t1} F(s, η∗(s), η̇∗(s)) ds ≤ J(η(·)) = ∫_{t0}^{t1} F(s, η(s), η̇(s)) ds,

for all η(·) ∈ ΘS.

If η∗(·) ∈ ΘS is any minimizer of J(η(·)) on ΘS, then there is a constant c such that η∗(·) satisfies Jacobi's Integral Equation

Fξ(t, η∗(t), η̇∗(t)) = c + ∫_{t0}^{t} Fη(s, η∗(s), η̇∗(s)) ds.  (6.19)

In addition, between corners the function Fξ(t, η∗(t), η̇∗(t)) is differentiable and

(d/dt) Fξ(t, η∗(t), η̇∗(t)) = Fη(t, η∗(t), η̇∗(t)).  (6.20)

Observe that Jacobi's Differential Equation is Euler's Differential Equation for the case where f(t, x, u) is replaced by F(t, η, ξ). In particular, Jacobi's Differential Equation (6.20) has the form

(d/dt)[f∗xu(t)η(t) + f∗uu(t)η̇(t)] = f∗xx(t)η(t) + f∗xu(t)η̇(t).  (6.21)

A PWS function η(·) satisfying Jacobi's equation (6.19) (or (6.20)) is called a secondary extremal.
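To make (6.21) concrete, here is a small numerical sketch (our illustration, not from the text) for the assumed integrand f(t, x, u) = (u² − x²)/2, where f∗uu = 1, f∗xu = 0 and f∗xx = −1, so Jacobi's equation reduces to η̈ + η = 0; the first conjugate value to t0 = 0 is the next zero of the solution with η(0) = 0, namely tc = π.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Sketch: for f = (u**2 - x**2)/2, Jacobi's equation (6.21) is eta'' = -eta.
def jacobi(t, y):          # y = [eta, eta']
    return [y[1], -y[0]]

sol = solve_ivp(jacobi, [0.0, 4.0], [0.0, 1.0], dense_output=True, rtol=1e-10)
tc = brentq(lambda t: sol.sol(t)[0], 2.0, 4.0)   # first zero after t0 = 0
print(tc, np.pi)                                 # t_c is approximately pi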

———————————————————————–

A value tc is said to be a conjugate value to t0 if t0 < tc and there is a solution ηc(·) to Jacobi's Equation (6.19) (or (6.20)) satisfying (i) ηc(t0) = ηc(tc) = 0 and (ii) ηc(t) ≠ 0 for some t ∈ (t0, tc). In particular, ηc(·) is not identically zero on (t0, tc). The point [tc x∗(tc)]ᵀ ∈ R² on the graph of x∗(·) is said to be a conjugate point to the initial point [t0 x∗(t0)]ᵀ in R².

———————————————————————–


Figure 6.1: Definition of a Conjugate Value

Theorem 6.6 (Jacobi's Necessary Condition - (IV)) Assume that x∗(·) ∈ Θ provides a weak local minimum for J(·) on Θ. If x∗(·) is smooth and regular (i.e. fuu(t, x∗(t), ẋ∗(t)) > 0), then

(J-1) there cannot be a value tc conjugate to t0 with

tc < t1,  (6.22)

(J-2) x∗(t0) = x0,

(J-3) x∗(t1) = x1.

———————————————————————–

Theorem 6.7 (Sufficient Condition (1)) If the problem is regular (i.e. fuu(t, x, u) > 0 for all (t, x, u)), x0(·) ∈ Θ is smooth, and satisfies

(S1-1) Euler's Equation

(d/dt) fu(t, x0(t), ẋ0(t)) = fx(t, x0(t), ẋ0(t)),


(S1-2) the Strengthened Jacobi Condition that there is no value tc conjugate to t0 satisfying

tc ≤ t1,

then x0(·) provides a strong local minimum for J(·) on Θ.

———————————————————————–

Theorem 6.8 (Sufficient Condition (2)) If x0(·) ∈ Θ is smooth, and satisfies

(S2-1) Euler's Equation

(d/dt) fu(t, x0(t), ẋ0(t)) = fx(t, x0(t), ẋ0(t)),

(S2-2) the Strengthened Legendre Condition

fuu(t, x0(t), ẋ0(t)) > 0, t0 ≤ t ≤ t1,

(S2-3) the Strengthened Jacobi Condition that there is no value tc conjugate to t0 satisfying

tc ≤ t1,

then x0(·) provides a weak local minimum for J(·) on Θ.
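As a worked illustration of these conditions (our sketch, not from the text), take the assumed arc-length integrand f(t, x, u) = √(1 + u²): symbolic differentiation confirms that it is regular and that every straight line satisfies Euler's differential equation.

import sympy as sp

# Sketch for the assumed integrand f = sqrt(1 + u**2): check the
# Strengthened Legendre condition f_uu > 0 and Euler's equation along
# the line x(t) = a*t + b, illustrating (S2-1) and (S2-2).
t, x, u, a, b = sp.symbols('t x u a b')
f = sp.sqrt(1 + u**2)

print(sp.simplify(sp.diff(f, u, 2)))            # (u**2 + 1)**(-3/2) > 0

line = a*t + b
fu_on_line = sp.diff(f, u).subs(u, sp.diff(line, t))
print(sp.diff(fu_on_line, t) - sp.diff(f, x))   # d/dt f_u - f_x = 0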


Chapter 7

Extensions and Generalizations

In this chapter we discuss several extensions of the Simplest Problem of the Calculus of Variations. Although we focus on the first necessary condition (Euler's Necessary Condition), there are extensions of all the Necessary Conditions (I) - (IV) presented above. We shall derive the necessary conditions for global minimizers and simply note that the proofs may be modified as in Chapter 4 for weak local minimizers.

7.1 Properties of the First Variation

We recall the definition of the first variation of a functional defined on PWS(t0, t1) at a given xo(·) ∈ PWS(t0, t1). We restate Definition 2.14 in Chapter 2. If xo(·) and η(·) ∈ PWS(t0, t1), then the first variation of J(·) at xo(·) in the direction of η(·) is denoted by δJ(xo(·); η(·)) and is defined by

δJ(xo(·); η(·)) = (d/dε)[J(xo(·) + εη(·))]|_{ε=0}.  (7.1)

As in Chapter 3 one uses Leibniz's formula (Lemma 3.2) to show that the first variation exists for all η(·) ∈ PWS(t0, t1). Moreover, the first variation δJ(xo(·); η(·)) of J(·) at xo(·) in the direction of


η(·) has the form

δJ(xo(·); η(·)) = ∫_{t0}^{t1} [fx(s, xo(s), ẋo(s)) η(s) + fu(s, xo(s), ẋo(s)) η̇(s)] ds.  (7.2)

Observe that the above definition is valid for any function xo(·) ∈ PWS(t0, t1). When xo(·) is an extremal, the first variation has a special form. Assume now that xo(·) is an extremal so that there is a constant c such that for all t ∈ [t0, t1]

fu(t, xo(t), ẋo(t)) = c + ∫_{t0}^{t} fx(s, xo(s), ẋo(s)) ds.

Between corners, fu(t, xo(t), ẋo(t)) is differentiable and

(d/dt) fu(t, xo(t), ẋo(t)) = fx(t, xo(t), ẋo(t)).  (7.3)

Using the fact that xo(·) satisfies the Euler equation (7.3), the first variation (7.2) becomes

δJ(xo(·); η(·)) = ∫_{t0}^{t1} [fx(s, xo(s), ẋo(s)) · η(s) + fu(s, xo(s), ẋo(s)) · η̇(s)] ds
= ∫_{t0}^{t1} [(d/ds) fu(s, xo(s), ẋo(s)) · η(s) + fu(s, xo(s), ẋo(s)) · η̇(s)] ds
= ∫_{t0}^{t1} (d/ds)[fu(s, xo(s), ẋo(s)) · η(s)] ds
= [fu(t, xo(t), ẋo(t)) · η(t)]|_{t=t0}^{t=t1}.


Therefore, we have shown that if xo(·) is any extremal and η(·) ∈ PWS(t0, t1), then

δJ(xo(·); η(·)) = [fu(t, xo(t), ẋo(t)) · η(t)]|_{t=t0}^{t=t1}
= fu(t1, xo(t1), ẋo(t1)) · η(t1) − fu(t0, xo(t0), ẋo(t0)) · η(t0).

We summarize this as the following Lemma.

Lemma 7.1 If xo(·) ∈ PWS(t0, t1) is an extremal and η(·) ∈ PWS(t0, t1), then the first variation of J(·) at xo(·) in the direction of η(·) exists and is given by

δJ(xo(·); η(·)) = [fu(t, xo(t), ẋo(t)) · η(t)]|_{t=t0}^{t=t1}.  (7.4)
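The boundary-term formula (7.4) is easy to test numerically. The following sketch (an illustration with assumed data, not from the text) takes f(t, x, u) = u² on [0, 1], the extremal xo(t) = t, and the direction η(t) = t²; Lemma 7.1 predicts δJ = 2·η(1) − 2·η(0) = 2, and a central difference of ε ↦ J(xo + εη) at ε = 0 agrees.

import numpy as np
from scipy.integrate import quad

# Sketch: f(t, x, u) = u**2 on [0, 1], extremal xo(t) = t, eta(t) = t**2.
# J(xo + eps*eta) = integral of (1 + 2*eps*s)**2 ds; Lemma 7.1 gives
# deltaJ = f_u(1, xo(1), 1)*eta(1) - f_u(0, xo(0), 1)*eta(0) = 2.
J = lambda eps: quad(lambda s: (1.0 + 2.0*eps*s)**2, 0.0, 1.0)[0]

h = 1e-6
print((J(h) - J(-h)) / (2.0*h))    # ~2.0, matching the boundary term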

7.2 The Free Endpoint Problem

Recall the river crossing problem discussed in Section 2.1.2 and Section 2.4.3. The river crossing problem is to find a smooth function x∗ : [0, 1] −→ R that minimizes

J(x(·)) = ∫_0^1 [√(c²(1 + [ẋ(s)]²) − [v(s)]²) − v(s)ẋ(s)] / (c² − [v(s)]²) ds  (7.5)

among all smooth functions satisfying

x(0) = 0.  (7.6)

Observe that unlike the SPCV, there is no specified value at t = 1. In particular, the value of x(1) is "free" and must be determined as part of finding the minimizing function x∗(·). This is a typical example of the so-called free-endpoint problem.

In this section we consider the free-endpoint problem and focus on obtaining (first order) necessary conditions. As in the previous chapters, let X = PWS(t0, t1) denote the space of all real-valued


piecewise smooth functions defined on [t0, t1]. For each PWS function x : [t0, t1] → R, define the functional J : X → R by

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds.  (7.7)

Assume that the interval [t0, t1] and initial value x0 are given (no value is assigned at t1) and define the set of PWS functions ΘL by

ΘL = {x(·) ∈ PWS(t0, t1) : x(t0) = x0}.  (7.8)

The Free Endpoint Problem is the problem of minimizing J(·) on ΘL. In particular, the goal is to find x∗(·) ∈ ΘL such that

J(x∗(·)) = ∫_{t0}^{t1} f(s, x∗(s), ẋ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds,

for all x(·) ∈ ΘL.

Figure 7.1: The Free Endpoint Problem


In order to derive the first order necessary condition for minimizing J(·) on the set ΘL we must decide what class of variations are admissible. Assume that x∗(·) ∈ ΘL minimizes J(·) on ΘL. We wish to make sure that for all ε (or for all ε sufficiently small) the variations ϕ(·, ε) = x∗(·) + εη(·) belong to ΘL. In particular, we need to show that

ϕ(t, ε) = x∗(t) + εη(t) ∈ ΘL.  (7.9)

It is clear that if

VL(t0, t1) = {η(·) ∈ PWS(t0, t1) : η(t0) = 0},  (7.10)

then (7.9) is satisfied for all ε. Observe that V0(t0, t1) ⊂ VL(t0, t1) ⊂ PWS(t0, t1) so that (7.9) also holds for any η(·) ∈ V0(t0, t1) ⊂ VL(t0, t1) ⊂ PWS(t0, t1). The space VL(t0, t1) defined by (7.10) is called the space of "admissible variations" for the free endpoint problem.

We turn now to the derivation of Euler's Necessary Condition for the free endpoint problem. Although the presentation below is almost identical to the derivation for the SPCV in Chapter 3, we present the details again to reinforce the basic idea.

Figure 7.2: Admissible Variations for the Free Endpoint Problem


The main difference here is that the free endpoint condition leads to a new "natural boundary condition" that is an essential component of the necessary condition.

7.2.1 The Euler Necessary Condition

Assume that x∗(·) ∈ ΘL is a (global) minimizer for J(·) on ΘL. In particular, assume that x∗(·) ∈ ΘL satisfies

J(x∗(·)) = ∫_{t0}^{t1} f(s, x∗(s), ẋ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds,  (7.11)

for all x(·) ∈ ΘL.

Let η(·) ∈ VL(t0, t1) and consider the "variation"

ϕ(t, ε) = x∗(t) + εη(t).  (7.12)

Observe that for each ε ∈ R, ϕ(·, ε) ∈ ΘL since ϕ(·, ε) ∈ PWS(t0, t1) and satisfies ϕ(t0, ε) = x∗(t0) + εη(t0) = x0 + εη(t0) = x0. Hence, if η(·) ∈ VL(t0, t1), then for all ε ∈ R the variation ϕ(t, ε) ≜ x∗(t) + εη(t) ∈ ΘL, i.e. it is admissible. Since x∗(·) ∈ ΘL

minimizes J(·) on ΘL, it follows that

J(x∗(·)) ≤ J(x∗(·) + εη(·))  (7.13)

for all ε ∈ (−∞, +∞). Define F : (−∞, +∞) −→ R by

F(ε) = J(x∗(·) + εη(·)) = ∫_{t0}^{t1} f(s, x∗(s) + εη(s), ẋ∗(s) + εη̇(s)) ds,  (7.14)

and note that (7.13) implies that

F(0) = J(x∗(·)) ≤ J(x∗(·) + εη(·)) = F(ε)


for all ε ∈ (−∞, +∞). Therefore, F(·) has a minimum on (−∞, +∞) at ε∗ = 0 and applying Theorem 2.1 it follows that the first variation must be zero. That is,

δJ(x∗(·); η(·)) ≜ (d/dε)F(ε)|_{ε=0} = (d/dε)[J(x∗(·) + εη(·))]|_{ε=0} = 0,  (7.15)

for all η(·) ∈ VL(t0, t1). However, (7.2) implies that

δJ(x∗(·); η(·)) = ∫_{t0}^{t1} [fx(s, x∗(s), ẋ∗(s)) η(s) + fu(s, x∗(s), ẋ∗(s)) η̇(s)] ds  (7.16)

and (7.15) yields

∫_{t0}^{t1} [fx(s, x∗(s), ẋ∗(s)) η(s) + fu(s, x∗(s), ẋ∗(s)) η̇(s)] ds = 0  (7.17)

for all η(·) ∈ VL(t0, t1).

Since (7.17) holds for all η(·) ∈ VL(t0, t1) and V0(t0, t1) ⊂ VL(t0, t1), it follows that

∫_{t0}^{t1} [fx(s, x∗(s), ẋ∗(s)) η(s) + fu(s, x∗(s), ẋ∗(s)) η̇(s)] ds = 0

holds for all η(·) ∈ V0(t0, t1). Therefore, the Fundamental Lemma of the Calculus of Variations 3.1, Part (B) implies that there is a c such that

fu(t, x∗(t), ẋ∗(t)) = c + ∫_{t0}^{t} fx(s, x∗(s), ẋ∗(s)) ds

e.f. on t ∈ [t0, t1] and hence x∗(·) is an extremal. Moreover, between corners

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t)).  (7.18)


Consequently, Lemma 7.1 implies that the first variation of J(·) at x∗(·) in the direction of η(·) exists and is given by

δJ(x∗(·); η(·)) = [fu(t, x∗(t), ẋ∗(t)) η(t)]|_{t=t0}^{t=t1}.  (7.19)

Returning to (7.15) we have that

0 = δJ(x∗(·); η(·)) = [fu(t, x∗(t), ẋ∗(t)) η(t)]|_{t=t0}^{t=t1}

must hold for all η(·) ∈ VL(t0, t1). Hence, if η(·) ∈ VL(t0, t1), then η(t0) = 0 and it follows that

0 = [fu(t, x∗(t), ẋ∗(t)) η(t)]|_{t=t0}^{t=t1}
= fu(t1, x∗(t1), ẋ∗(t1)) η(t1) − fu(t0, x∗(t0), ẋ∗(t0)) η(t0)
= fu(t1, x∗(t1), ẋ∗(t1)) η(t1).

However, for η(·) ∈ VL(t0, t1) there is no restriction on η(t1) so that

fu(t1, x∗(t1), ẋ∗(t1)) η(t1) = 0  (7.20)

must hold for arbitrary values of η(t1). Hence (7.20) holds for any value of η(t1), which implies that

fu(t1, x∗(t1), ẋ∗(t1)) = 0.  (7.21)

Condition (7.21) is called the natural boundary condition for the free endpoint problem. Thus, we have derived the (global) Euler necessary condition for the Free Endpoint Problem. It is straightforward to extend this proof to the case where we only assume that x∗(·) ∈ ΘL is a weak local minimizer in ΘL. In particular, we have the following Euler Necessary Condition for the free endpoint problem.

Theorem 7.1 (Euler Necessary Condition for the Free Endpoint Problem) If x∗(·) ∈ ΘL is a weak local minimizer for J(·) on ΘL, then

(EF-1) there is a constant c such that for all t ∈ [t0, t1],

fu(t, x∗(t), ẋ∗(t)) = c + ∫_{t0}^{t} fx(s, x∗(s), ẋ∗(s)) ds,  (7.22)


(EF-2) x∗(t0) = x0,

(EF-3) fu(t1, x∗(t1), ẋ∗(t1)) = 0.

(EF-4) Between corners the function fu(t, x∗(t), ẋ∗(t)) is differentiable and if t is not a corner of x∗(·)

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t)).  (7.23)

As in the previous chapters, equation (7.22) is called Euler's Equation in integral form, while equation (7.23) is called Euler's Differential Equation. Therefore, we have shown that a minimizer x∗(·) of J(·) on ΘL must be an extremal. This implies that the Erdmann corner condition

fu(t, x∗(t), ẋ∗(t⁺)) = fu(t, x∗(t), ẋ∗(t⁻))

also holds, and if the problem is regular, then extremals cannot have corners. The Hilbert Differentiability Theorem also holds and applying the chain rule to the left side of (7.23) yields

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fut(t, x∗(t), ẋ∗(t)) + fux(t, x∗(t), ẋ∗(t)) · ẋ∗(t) + fuu(t, x∗(t), ẋ∗(t)) · ẍ∗(t).

Hence, the Euler Equation (7.23) becomes the second order differential equation

fuu(t, x∗(t), ẋ∗(t)) · ẍ∗(t) = fx(t, x∗(t), ẋ∗(t)) − fut(t, x∗(t), ẋ∗(t)) − fux(t, x∗(t), ẋ∗(t)) · ẋ∗(t).  (7.24)

7.2.2 Examples of Free Endpoint Problems

We shall go through a couple of simple examples to illustrate the application of the necessary condition.


Example 7.1 Find a PWS function x∗(·) satisfying x∗(0) = 0 and such that x∗(·) minimizes

J(x(·)) = ∫_0^{π/2} (1/2)([ẋ(s)]² − [x(s)]²) ds.

We note that t0 = 0, t1 = π/2, x0 = 0, and x(π/2) is free. The integrand f(t, x, u) is given by

f(t, x, u) = (1/2)([u]² − [x]²),

and hence

fx(t, x, u) = −x, fu(t, x, u) = +u, fuu(t, x, u) = +1 > 0.

We see that f(t, x, u) is regular and hence the minimizer cannot have corners. Euler's Equation

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t))

becomes

(d/dt)[ẋ∗(t)] = −x∗(t),

or equivalently,

ẍ∗(t) + x∗(t) = 0.

The general solution is

x∗(t) = α cos(t) + β sin(t),

and applying the boundary condition at t = t0 = 0 yields

0 = x∗(0) = α cos(0) + β sin(0) = α,

so that

x∗(t) = β sin(t).


The natural boundary condition at t = t1 = π/2 becomes

fu(π/2, x∗(π/2), ẋ∗(π/2)) = ẋ∗(π/2) = 0.

However, ẋ∗(t) = β cos(t) so that

ẋ∗(π/2) = β cos(π/2) = 0.

Since cos(π/2) = 0, it follows that β can be any number and hence all functions of the form

x∗(t) = β sin(t)

are possible minimizers. Observe that we do not know that x∗(·) minimizes J(·) on ΘL for any number β since we only checked the necessary condition.
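A short symbolic check of Example 7.1 (our sketch, not from the text): for x(t) = β sin t the Euler equation and the natural boundary condition both hold, and J evaluates to 0 for every β.

import sympy as sp

# Sketch: verify Example 7.1 for x(t) = beta*sin(t).
t, beta = sp.symbols('t beta')
x = beta * sp.sin(t)

print(sp.simplify(sp.diff(x, t, 2) + x))          # Euler equation: 0
print(sp.diff(x, t).subs(t, sp.pi/2))             # natural BC x'(pi/2): 0
J = sp.integrate((sp.diff(x, t)**2 - x**2)/2, (t, 0, sp.pi/2))
print(sp.simplify(J))                             # J = 0 for every beta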

Example 7.2 Consider the functional

J(x(·)) = ∫_0^1 [ẋ(s)]² ds

with x(0) = 0 and the endpoint x(1) free. The integrand is given by f(t, x, u) = u², fu(t, x, u) = 2u, fuu(t, x, u) = 2, and fx(t, x, u) = 0. Since the problem is regular, all extremals are regular and Euler's Integral Equation is given by

2ẋ∗(t) = fu(t, x∗(t), ẋ∗(t)) = c + ∫_0^t fx(s, x∗(s), ẋ∗(s)) ds = c + ∫_0^t 0 ds = c,

or equivalently,

ẋ∗(t) = k

for some constant k. Therefore,

x∗(t) = kt + b


and the condition x(0) = 0 implies

x∗(t) = kt.

The natural boundary condition at t1 = 1 takes the form

fu(1, x∗(1), ẋ∗(1)) = 2ẋ∗(1) = 2k = 0

which means that

x∗(t) = 0.

Hence, the only extremal satisfying the necessary condition of Theorem 7.1 is x∗(t) = 0. Clearly, x∗(t) = 0 is a global minimizer.
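Example 7.2 also lends itself to a crude "direct method" experiment (our sketch, not from the text): discretize J by piecewise linear functions with x(0) = 0 and minimize over the remaining nodal values, including the free value at t = 1; the numerical minimizer collapses to x∗ = 0.

import numpy as np
from scipy.optimize import minimize

# Sketch: piecewise-linear discretization of J(x) = integral of x'(s)**2
# on [0, 1] with x(0) = 0 and x(1) free.
n = 50
h = 1.0 / n

def J(vals):                                   # vals = x(h), ..., x(1)
    x = np.concatenate(([0.0], vals))          # impose x(0) = 0
    return np.sum(np.diff(x)**2) / h           # sum of slope**2 * h

res = minimize(J, np.random.default_rng(1).normal(size=n))
print(res.fun, np.max(np.abs(res.x)))          # both ~0: minimizer x* = 0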

7.3 The Simplest Point to Curve Problem

We assume that the initial time t0 and initial value x0 are given and that there is a given smooth function φ(·) defined on the interval (t0, +∞). The problem is to minimize

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds,

subject to

x(t0) = x0

and

x(t1) = φ(t1),

where t0 < t1 and t1 is a point where the graph of the function x(·) intersects the graph of φ(·). This problem is illustrated in Figure 7.3.

Note that t1 is not fixed and different functions may intersect φ(·) at different "final times". To formulate the optimization problem we first define the space of piecewise smooth functions on [t0, +∞) by PWS∞ = PWS(t0, +∞), where

PWS∞ = {x(·) : [t0, +∞) → R : x(·) ∈ PWS(t0, T) for all T > t0}.  (7.25)


Figure 7.3: Point to Curve Problem

Note that PWS∞ = PWS(t0, +∞) is the set of all "locally" piecewise smooth functions defined on the half line [t0, +∞). Assume that t0, x0 and φ(·) are given as above and define the set of PWS functions (intersecting the graph of φ) Θφ by

Θφ = {x(·) ∈ PWS∞ : x(t0) = x0, x(t1) = φ(t1) for some t1 > t0}.  (7.26)

The simplest Point to Curve Problem is the problem of minimizing J(·) on Θφ. In particular, the goal is to find x∗(·) ∈ Θφ and a t∗1 > t0 such that x∗(t∗1) = φ(t∗1) and

J(x∗(·)) = ∫_{t0}^{t∗1} f(s, x∗(s), ẋ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds,

for all x(·) ∈ Θφ.


We will derive a necessary condition for a global minimizer, but the result holds for a weak local minimizer and the following derivation is easily extended to this case. We start with a few basic lemmas.

Lemma 7.2 If x∗(·) ∈ Θφ is piecewise smooth and minimizes J(·) on Θφ, with x∗(t∗1) = φ(t∗1), then

(EPC-1) there is a constant c such that for all t ∈ [t0, t∗1],

fu(t, x∗(t), ẋ∗(t)) = c + ∫_{t0}^{t} fx(s, x∗(s), ẋ∗(s)) ds,  (7.27)

(EPC-2) x∗(t0) = x0,

(EPC-3) x∗(t∗1) = φ(t∗1).

(EPC-4) Between corners of x∗(·) the function (∂/∂u)f(t, x∗(t), ẋ∗(t)) is differentiable and

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t)).  (7.28)

Proof. This lemma follows from the standard derivation for a fixed endpoint problem. In particular, for η(·) ∈ V∗0 = PWS0(t0, t∗1), the variation x∗(·) + εη(·) belongs to Θφ and hence

δJ(x∗(·); η(·)) = ∫_{t0}^{t∗1} [fx(s, x∗(s), ẋ∗(s)) η(s) + fu(s, x∗(s), ẋ∗(s)) η̇(s)] ds = 0.  (7.29)

Since this holds for all η(·) ∈ V∗0 = PWS0(t0, t∗1), the Fundamental Lemma of the Calculus of Variations implies that x∗(·) is an extremal and this completes the proof.

Note that the Euler equations (7.27) or (7.28) must be solved on the interval [t0, t∗1]. However, we need an additional piece of information to determine the extra unknown parameter t∗1. To obtain this condition, we must enlarge the space of variations.


Figure 7.4: Extending the Optimal Curve

In order to derive the transversality condition we define another class of variations that are admissible for this problem. If x∗(·) ∈ Θφ minimizes J(·) on Θφ, with x∗(t∗1) = φ(t∗1), then we need to make sure that for all ε (or for all ε sufficiently small) the variations x∗(t) + εη(t) belong to Θφ. In particular, we need to find a set of admissible variations Vφ so that

x∗(t) + εη(t) ∈ Θφ  (7.30)

for all η(·) ∈ Vφ and all ε sufficiently small. Thus, we must define Vφ in such a way that x∗(t) + εη(t) intersects φ(·) at some time tε1 > t0.

Given the minimizer x∗(·), define x̂∗(·) by

x̂∗(t) = x∗(t) for t0 ≤ t ≤ t∗1, and x̂∗(t) = ẋ∗(t∗1)(t − t∗1) + x∗(t∗1) for t∗1 ≤ t.

Also, given any η(·) ∈ V∗L = {η(·) ∈ PWS(t0, t∗1) : η(t0) = 0}, define η̂(·) by

η̂(t) = η(t) for t0 ≤ t ≤ t∗1, and η̂(t) = η̇(t∗1)(t − t∗1) + η(t∗1) for t∗1 ≤ t.  (7.31)


Figure 7.5: Extending η(·) to η̂(·)

Remark 7.1 The functions x̂∗(·) and η̂(·) are continuously differentiable in a neighborhood of t∗1. To see this, pick tL < t∗1 so that neither x∗(·) nor η(·) has a corner tc with tL < tc. By construction x̂∗(·) and η̂(·) will be continuously differentiable for all t > tL.

Now we define Vφ by

Vφ = {η̂(·) ∈ PWS(t0, +∞) : η(·) ∈ V∗L},  (7.32)

where η̂(·) ∈ Vφ is defined by (7.31) above. What we want to show is that for sufficiently small ε, x̂∗(·) + εη̂(·) intersects φ(·) at some time tε1 > t0. Observe that all functions η̂(·) ∈ Vφ are linear on t∗1 ≤ t. Also, η̂(·) and x̂∗(·) are smooth on an interval about t∗1. Note that for any η̂(·) ∈ Vφ, the function

x̂∗(t) + εη̂(t)

is defined for all t ≥ t0. Since at ε = 0

x̂∗(t) + εη̂(t) = x̂∗(t),


Figure 7.6: The Variation x̂∗(·) + εη̂(·)

one might expect that for sufficiently small ε the curve x̂∗(t) + εη̂(t) would intersect φ(·) at some time tε1 > t0 (see Figure 7.6). For one important case we shall show that this is true. In order to show this we need the Implicit Function Theorem. The following theorem follows directly from Theorem 41.9 and Lemma 41.10 in Bartle's book [15] (see pages 382 - 391).

Theorem 7.2 (Implicit Function Theorem) Suppose H : R × R → R satisfies (i.e. H(ε, γ)):

i) H(0, 0) = 0.

ii) H is of class C¹ on a neighborhood of [0 0]ᵀ.

iii) (∂/∂γ)H(0, 0) = Hγ(0, 0) ≠ 0.

Then, there is a neighborhood of 0 (i.e. (−δ, +δ)) such that one can solve for γ in terms of ε. In particular, there exists a function γ(·) : (−δ, +δ) → R such that for all −δ < ε < δ

H(ε, γ(ε)) = 0,


and

γ(0) = 0.

Moreover, the function γ(ε) is continuously differentiable with respect to ε and

(d/dε)γ(ε) = γ̇(ε) = −Hε(ε, γ(ε)) / Hγ(ε, γ(ε)).
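For intuition, here is a small numerical sketch (with an assumed toy function, not from the text): take H(ε, γ) = γ + γ³ − ε, which satisfies H(0, 0) = 0 and Hγ(0, 0) = 1 ≠ 0; solving H(ε, γ(ε)) = 0 numerically and differencing reproduces γ̇(0) = −Hε/Hγ = 1.

from scipy.optimize import brentq

# Sketch with the assumed toy function H(eps, gamma) = gamma + gamma**3 - eps.
H = lambda eps, g: g + g**3 - eps
gamma = lambda eps: brentq(lambda g: H(eps, g), -2.0, 2.0)  # solve H = 0

h = 1e-6
print((gamma(h) - gamma(-h)) / (2.0*h))   # ~1.0 = -H_eps/H_gamma at (0, 0)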

We shall define a specific function H(ε, γ) and apply the Implicit Function Theorem. Again, we are trying to show that for sufficiently small ε the variation x̂∗(t) + εη̂(t) will intersect φ(·) at some time tε1 > t0. If tε1 = t∗1 + γ(ε), then we are asking if for each ε on an interval (−δ, +δ), is there a γ = γ(ε) such that

φ(t∗1 + γ) − x̂∗(t∗1 + γ) − εη̂(t∗1 + γ) = 0?

Thus, we are motivated to define H(ε, γ) by

H(ε, γ) = φ(t∗1 + γ) − x̂∗(t∗1 + γ) − εη̂(t∗1 + γ).  (7.33)

Observe that H(ε, γ) defined above has the following properties:

i) H(0, 0) = φ(t∗1) − x̂∗(t∗1) = 0.

ii) Hε(ε, γ) = −η̂(t∗1 + γ) and Hγ(ε, γ) = φ̇(t∗1 + γ) − (d/dt)x̂∗(t∗1 + γ) − ε(d/dt)η̂(t∗1 + γ) are continuous and hence H(ε, γ) is of class C¹ on (−∞, ∞) × (tL, +∞).

iii) Hγ(0, 0) = φ̇(t∗1) − (d/dt)x̂∗(t∗1) = φ̇(t∗1) − ẋ∗(t∗1).

Therefore, if φ̇(t∗1) − ẋ∗(t∗1) ≠ 0, then H(ε, γ) satisfies the assumption of the Implicit Function Theorem. In particular, if φ̇(t∗1) − ẋ∗(t∗1) ≠ 0, then there exists a function γ(·) : (−δ, +δ) → R such that for all −δ < ε < δ

0 = H(ε, γ(ε)) = φ(t∗1 + γ(ε)) − x̂∗(t∗1 + γ(ε)) − εη̂(t∗1 + γ(ε)).  (7.34)


Equation (7.34) implies that for each ε ∈ (−δ, +δ), the variation x̂∗(·) + εη̂(·) intersects φ(·) at the time β(ε) = t∗1 + γ(ε). Moreover, β̇(ε) exists and equals

β̇(ε) = γ̇(ε) = −[ −η̂(t∗1 + γ(ε)) ] / [ φ̇(t∗1 + γ(ε)) − (d/dt)x̂∗(t∗1 + γ(ε)) − ε(d/dt)η̂(t∗1 + γ(ε)) ].

In particular,

β̇(0) = η̂(t∗1) / [φ̇(t∗1) − ẋ∗(t∗1)] = η(t∗1) / [φ̇(t∗1) − ẋ∗(t∗1)] = γ̇(0).  (7.35)

We have now established the following result.

Theorem 7.3 If x∗(·) satisfies

J(x∗(·)) = ∫_{t0}^{t∗1} f(s, x∗(s), ẋ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds

on Θφ and φ̇(t∗1) − ẋ∗(t∗1) ≠ 0, then there exists a function γ(·) : (−δ, +δ) → R such that for all −δ < ε < δ the variations x̂∗(·) + εη̂(·) ∈ Θφ and intersect φ(·) at the time β(ε) = t∗1 + γ(ε).

Define the function F : (−δ, +δ) → R by

F(ε) = J(x̂∗(·) + εη̂(·)) = ∫_{t0}^{β(ε)} f(s, x̂∗(s) + εη̂(s), (d/ds)x̂∗(s) + ε(d/ds)η̂(s)) ds = ∫_{t0}^{β(ε)} G(s, ε) ds,


where

G(t, ε) = f(t, x̂∗(t) + εη̂(t), (d/dt)x̂∗(t) + ε(d/dt)η̂(t)).

Therefore, for all −δ < ε < δ we have

F(0) = ∫_{t0}^{β(0)} f(s, x∗(s), ẋ∗(s)) ds ≤ ∫_{t0}^{β(ε)} f(s, x̂∗(s) + εη̂(s), (d/ds)x̂∗(s) + ε(d/ds)η̂(s)) ds = F(ε),

so that ε∗ = 0 minimizes F(ε) on the open interval (−δ, +δ). Consequently, it follows that (d/dε)F(0) = F′(0) = 0. In order to compute F′(ε), we apply Leibniz's Formula which yields

F′(ε) = G(β(ε), ε) β̇(ε) + ∫_{t0}^{β(ε)} Gε(s, ε) ds.

In particular, since β(ε) = t∗1 + γ(ε) and γ(0) = 0, it follows that

F′(0) = G(β(0), 0) β̇(0) + ∫_{t0}^{β(0)} Gε(s, 0) ds
= f(t∗1, x∗(t∗1), ẋ∗(t∗1)) β̇(0) + ∫_{t0}^{t∗1} [fx(s, x∗(s), ẋ∗(s)) η(s) + fu(s, x∗(s), ẋ∗(s)) η̇(s)] ds
= f(t∗1, x∗(t∗1), ẋ∗(t∗1)) · η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + ∫_{t0}^{t∗1} (f∗x(s)η(s) + f∗u(s)η̇(s)) ds.


Hence, for all η(·) ∈ Vφ we have

0 = f(t∗1, x∗(t∗1), ẋ∗(t∗1)) · η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + ∫_{t0}^{t∗1} (f∗x(s)η(s) + f∗u(s)η̇(s)) ds.

In particular, this identity is valid for all η(·) satisfying η(t∗1) = 0, so we obtain Euler's equation

(d/dt) f∗u(t) = f∗x(t),

which must hold on [t0, t∗1].

Substituting (d/dt)f∗u(t) = f∗x(t) into the above equation yields

0 = f(t∗1, x∗(t∗1), ẋ∗(t∗1)) · η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + ∫_{t0}^{t∗1} (f∗x(s)η(s) + f∗u(s)η̇(s)) ds
= f∗(t∗1) η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + ∫_{t0}^{t∗1} ([(d/ds)f∗u(s)]η(s) + f∗u(s)η̇(s)) ds
= f∗(t∗1) η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + ∫_{t0}^{t∗1} (d/ds)[f∗u(s)η(s)] ds
= f∗(t∗1) η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + f∗u(t∗1)η(t∗1) − f∗u(t0)η(t0).

Using the fact that η(t0) = 0, it follows that

0 = f∗(t∗1) η(t∗1)/[φ̇(t∗1) − ẋ∗(t∗1)] + f∗u(t∗1)η(t∗1),

or equivalently (multiplying through by φ̇(t∗1) − ẋ∗(t∗1)),

0 = {f∗(t∗1) + [φ̇(t∗1) − ẋ∗(t∗1)] f∗u(t∗1)} η(t∗1).

Since η(t∗1) is arbitrary, we obtain the transversality condition

f∗(t∗1) + f∗u(t∗1)[φ̇(t∗1) − ẋ∗(t∗1)] = 0.  (7.36)


Combining Lemma 7.2 with the transversality condition (7.36) we have proven the following theorem.

Theorem 7.4 (Euler Necessary Condition for the Point to Curve Problem) If x∗(·) ∈ Θφ is piecewise smooth and minimizes J(·) on Θφ, with x∗(t∗1) = φ(t∗1), then

(EPC-1) there is a constant c such that for all t ∈ [t0, t∗1],

fu(t, x∗(t), ẋ∗(t)) = c + ∫_{t0}^{t} fx(s, x∗(s), ẋ∗(s)) ds,  (7.37)

(EPC-2) x∗(t0) = x0,

(EPC-3) x∗(t∗1) = φ(t∗1).

(EPC-3') If φ̇(t∗1) ≠ ẋ∗(t∗1), then

f∗(t∗1) + f∗u(t∗1)[φ̇(t∗1) − ẋ∗(t∗1)] = 0.  (7.38)

(EPC-4) Between corners of x∗(·) the function (∂/∂u)f(t, x∗(t), ẋ∗(t)) is differentiable and

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t)).  (7.39)

These results can clearly be extended to two curves, curve to free endpoint, etc. To motivate the term "transversality condition", consider the class of equations where f(t, x, u) has the form

f(t, x, u) = g(t, x)√(1 + u²),

(e.g. the brachistochrone problem, minimal surface area of revolution problem, etc.). Here,

fu(t, x, u) = g(t, x) u/√(1 + u²) = [g(t, x)√(1 + u²)][u/(1 + u²)] = [u/(1 + u²)] f(t, x, u).


Hence,

f∗(t) + f∗u(t)[φ̇(t) − ẋ∗(t)]
= f∗(t) + f∗(t)[ẋ∗(t)/(1 + [ẋ∗(t)]²)][φ̇(t) − ẋ∗(t)]
= f∗(t)(1 + [ẋ∗(t)]² + [ẋ∗(t)][φ̇(t) − ẋ∗(t)]) / [1 + [ẋ∗(t)]²]
= f∗(t)(1 + [ẋ∗(t)]² + ẋ∗(t)φ̇(t) − [ẋ∗(t)]²) / [1 + [ẋ∗(t)]²]
= [f∗(t)/(1 + [ẋ∗(t)]²)][1 + ẋ∗(t)φ̇(t)].

The transversality condition at t∗1 implies that

f∗(t∗1) + f∗u(t∗1)[φ̇(t∗1) − ẋ∗(t∗1)] = [f∗(t∗1)/(1 + [ẋ∗(t∗1)]²)][1 + ẋ∗(t∗1)φ̇(t∗1)] = 0

and hence if f∗(t∗1) ≠ 0, then it follows that

1 + ẋ∗(t∗1)φ̇(t∗1) = 0.

In this case

ẋ∗(t∗1) = −1/φ̇(t∗1),

which means that the slopes of the curves x∗(·) and φ(·) are negative reciprocals at the intersecting value t = t∗1. That is, the optimal trajectory x∗(·) must be orthogonal (transversal) to the curve φ(·) at the intersection point.

Example 7.3 Consider the problem of minimizing

J(x(·)) = ∫_{t0}^{t1} √(1 + [ẋ(s)]²) ds,

subject to

x(t0) = ϕ(t0), x(t1) = ψ(t1),

where

ϕ(t) = −(t + 1), ψ(t) = (t − 5/2)².


Since the problem is regular, we need only consider the Euler Differential Equation

(d/dt) fu(t, x∗(t), ẋ∗(t)) = fx(t, x∗(t), ẋ∗(t))

which has the form

(d/dt)(ẋ∗(t)/√(1 + [ẋ∗(t)]²)) = 0,

or equivalently,

ẋ∗(t)/√(1 + [ẋ∗(t)]²) = c.

Solving this equation we get

[ẋ∗(t)]² = c²(1 + [ẋ∗(t)]²),

which implies that

[ẋ∗(t)]² = c²/(1 − c²),

or equivalently, there is a constant a such that

ẋ∗(t) = a

and the extremals are straight lines x(t) = at + b.

The transversality conditions become

a = ẋ∗(t∗0) = −1/ϕ̇(t∗0) = −1/(−1) = 1,

and hence it follows that a = 1. At the other end t1 = t∗1, the transversality condition becomes

and hence it follows that a = 1. At the other end t1 = t∗1, thetransversality condition becomes

a = 1 = ẋ∗(t1) = −1/ψ̇(t1) = −1/[2(t1 − 5/2)],

and solving for t1 it follows that

2(t1 − 5/2) = −1,


so that

t∗1 = t1 = 5/2 − 1/2 = 2.

Since a = 1,

x∗(t) = t + b,

and from above t∗1 = 2 so that the relationship

x∗(t∗1) = ψ(t∗1)

implies that

x∗(t∗1) = 2 + b = t∗1 + b = (t∗1 − 5/2)² = (2 − 5/2)² = (−1/2)² = 1/4 = ψ(t∗1).

Hence

b = −7/4

so that

x∗(t) = t − 7/4,

and t∗1 = 2. To find t∗0 we know that

x∗(t∗0) = ϕ(t∗0)

which implies

t∗0 − 7/4 = −(t∗0 + 1),

and hence

t∗0 = 3/8.

We have found that the optimal curve is given by x∗(t) = t − 7/4 and it intersects ϕ(t) = −(t + 1) at t∗0 = 3/8 and ψ(t) = (t − 5/2)² at t∗1 = 2.
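A quick symbolic verification of Example 7.3 (our sketch, not from the text): the line x∗(t) = t − 7/4 meets ϕ at 3/8 and meets ψ at t = 2 and again at t = 4, and the slope product is −1 only at the crossings selected by the transversality conditions.

import sympy as sp

# Sketch: verify the intersection times and the orthogonality
# (slope product -1) for Example 7.3.
t = sp.symbols('t')
xstar = t - sp.Rational(7, 4)
phi = -(t + 1)
psi = (t - sp.Rational(5, 2))**2

print(sp.solve(sp.Eq(xstar, phi), t))          # [3/8]
print(sp.solve(sp.Eq(xstar, psi), t))          # [2, 4]
print(sp.diff(xstar, t) * sp.diff(phi, t))     # -1: transversal at t0* = 3/8
print([(sp.diff(xstar, t) * sp.diff(psi, t)).subs(t, c) for c in (2, 4)])
# -> [-1, 3]: only t1* = 2 satisfies the transversality condition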


7.4 Vector Formulations and Higher Order Problems

In this section we discuss extensions of the SPCV to vector systems and higher order models. Consider the problem where we look for two (or more) functions x1(·), x2(·) to minimize a functional of the form

J(x1(·), x2(·)) = ∫_{t0}^{t1} f(s, x1(s), x2(s), ẋ1(s), ẋ2(s)) ds,

subject to

x1(t0) = x1,0, x1(t1) = x1,1,
x2(t0) = x2,0, x2(t1) = x2,1.

Figure 7.7: The Optimal Curve to Curve Solution


Here we assume that f(t, x1, x2, u1, u2) is a C² function and we will apply the same ideas used in the SPCV. This will yield an Euler Necessary Condition where the Euler Differential Equation is a system of equations of the form

(d/dt) fu1(t, x1(t), x2(t), ẋ1(t), ẋ2(t)) = fx1(t, x1(t), x2(t), ẋ1(t), ẋ2(t)),
(d/dt) fu2(t, x1(t), x2(t), ẋ1(t), ẋ2(t)) = fx2(t, x1(t), x2(t), ẋ1(t), ẋ2(t)).
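Such systems can be generated symbolically. The sketch below (our illustration, with an assumed integrand f = ẋ1² + ẋ2² + x1x2) uses sympy's euler_equations helper, which returns one Euler equation per unknown function.

import sympy as sp
from sympy.calculus.euler import euler_equations

# Sketch: Euler system for the assumed integrand f = x1'**2 + x2'**2 + x1*x2.
t = sp.symbols('t')
x1, x2 = sp.Function('x1'), sp.Function('x2')
L = sp.diff(x1(t), t)**2 + sp.diff(x2(t), t)**2 + x1(t)*x2(t)

for eq in euler_equations(L, [x1(t), x2(t)], t):
    print(eq)      # the coupled pair x2 - 2*x1'' = 0 and x1 - 2*x2'' = 0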

Higher order problems are a special case of the vector formulation. For example, consider the problem of minimizing

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s), ẍ(s)) ds,

subject to

x(t0) = x0, x(t1) = x1,
ẋ(t0) = u0, ẋ(t1) = u1,

where f(t, x, u, v) is a C² function. Here we assume that x(·) is smooth and ẋ(·) is piecewise smooth on [t0, t1]. In this case setting the first variation to zero and using the higher order Fundamental Lemma of the Calculus of Variations will lead to the higher order Euler Differential Equation

−(d²/dt²) fv(t, x(t), ẋ(t), ẍ(t)) + (d/dt) fu(t, x(t), ẋ(t), ẍ(t)) = fx(t, x(t), ẋ(t), ẍ(t)).
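For instance (our sketch, with the assumed integrand f(t, x, u, v) = v², i.e. f = ẍ²), the equation above reduces to d⁴x/dt⁴ = 0, so the extremals are cubic polynomials; sympy confirms this.

import sympy as sp
from sympy.calculus.euler import euler_equations

# Sketch: for f = x''(t)**2 the higher order Euler equation is 2*x'''' = 0.
t = sp.symbols('t')
x = sp.Function('x')
print(euler_equations(sp.diff(x(t), t, 2)**2, x(t), t))
# -> [Eq(2*Derivative(x(t), (t, 4)), 0)], i.e. extremals are cubics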

In order to formulate the vector problem we assume that t0 and t1 are given and fixed and f(t, x1, x2, u1, u2) is a C² function. Let PWS(t0, t1; R²) denote the space of all R²-valued piecewise smooth functions defined on [t0, t1]. In particular,

PWS(t0, t1; R²) = {x(·) = [x1(·) x2(·)]ᵀ : xi(·) ∈ PWS(t0, t1), i = 1, 2}.  (7.40)


For each x(·) = [x1(·) x2(·)]ᵀ ∈ PWS(t0, t1; R²), define the functional J : PWS(t0, t1; R²) → R by

J(x(·)) = J(x1(·), x2(·)) = ∫_{t0}^{t1} f(s, x1(s), x2(s), ẋ1(s), ẋ2(s)) ds.  (7.41)

Assume that the points x1,0, x2,0, x1,1 and x2,1 are given. Define the set of PWS functions Θ2 by

Θ2 = {x(·) ∈ PWS(t0, t1; R²) : xi(tj) = xi,j, i = 1, 2, j = 0, 1}.  (7.42)

Observe that J : PWS(t0, t1; R²) → R is a real valued function defined on PWS(t0, t1; R²).

The Simplest Problem in Vector Form (the fixed endpoint problem) is the problem of minimizing J(·) on Θ2. In particular, the goal is to find

x∗(·) = [x∗1(·) x∗2(·)]ᵀ ∈ Θ2

such that

J(x∗(·)) = ∫_{t0}^{t1} f(s, x∗1(s), x∗2(s), ẋ∗1(s), ẋ∗2(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x1(s), x2(s), ẋ1(s), ẋ2(s)) ds,

for all x(·) ∈ Θ2.

In order to formulate the higher order problem we assume that t0 and t1 are given and fixed and f(t, x, u, v) is a C² function. Let PWS²(t0, t1) denote the space of all real-valued piecewise smooth functions x(·) defined on [t0, t1] such that x(·) is smooth and ẋ(·) is piecewise smooth on [t0, t1]. In particular,

PWS²(t0, t1) = {x : [t0, t1] −→ R : x(·) ∈ C¹([t0, t1]) and ẋ(·) ∈ PWS(t0, t1)}.  (7.43)


Note that if x(·) ∈ PWS²(t0, t1), then ẍ(·) ∈ PWC(t0, t1). For each x(·) ∈ PWS²(t0, t1), define the functional J : PWS²(t0, t1) → R by

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s), ẍ(s)) ds.  (7.44)

Since ẍ(·) ∈ PWC(t0, t1), the cost function defined by (7.44) is well defined. Assume that the points x0, x1, u0 and u1 are given and define the set of functions Θ² by

Θ² = {x(·) ∈ PWS²(t0, t1) : x(ti) = xi, ẋ(ti) = ui, i = 0, 1}.  (7.45)

The Simplest Problem in Higher Order Form (the fixed endpoint problem) is the problem of minimizing J(·) on Θ². In particular, the goal is to find x∗(·) ∈ Θ² such that

J(x∗(·)) = ∫_{t0}^{t1} f(s, x∗(s), ẋ∗(s), ẍ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s), ẍ(s)) ds,

for all x(·) ∈ Θ².

Remark 7.2 Observe that the Simplest Problem in Higher Order Form is a special case of the Simplest Problem in Vector Form. To make this precise, we set

x1(·) = x(·) and x2(·) = ẋ(·)

so that ẋ2(·) = ẍ(·) and the cost function (7.44) becomes

J(x1(·), x2(·)) = ∫_{t0}^{t1} f̄(s, x1(s), x2(s), ẋ1(s), ẋ2(s)) ds = J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s), ẍ(s)) ds,


where f̄(t, x1, x2, u1, u2) does not explicitly depend on u1 and is defined by

f̄(t, x1, x2, u1, u2) = f(t, x1, x2, u2).

Before deriving necessary conditions, we present additional lemmas that extend the Fundamental Lemma of the Calculus of Variations.

7.4.1 Extensions of Some Basic Lemmas

In order to derive first order necessary conditions for vector and higher order problems, we present a few fundamental lemmas that provide the backbone to much of variational theory. These results are natural extensions of the FLCV. In addition, we review some basic results on differentiation.

The Fundamental Lemma of the Calculus of Variations 3.1 involves only scalar functions and first order derivatives. We shall extend this lemma to higher order derivatives and to vector valued functions. To set the stage for the lemmas, we need some additional notation. Let p ≥ 1 be a given integer and define the set

PWSᵖ(t0, t1) = {x(·) : x[k](·) ∈ PWS(t0, t1), k = 0, 1, . . . , p − 1},  (7.46)

where

x[k](·) ≜ dᵏx(·)/dtᵏ  (7.47)

is the kth derivative of x(·). For notational purposes we define the zero derivative as the function x[0](·) ≜ x(·). Observe that if x(·) ∈ PWSᵖ(t0, t1), then the pth derivative of x(·) exists except at a finite number of points and x[p](·) ∈ PWC(t0, t1).

Likewise, we define V₀ᵖ = V₀ᵖ(t0, t1) = PWS₀ᵖ(t0, t1) to be the set

V₀ᵖ(t0, t1) = {η(·) ∈ PWSᵖ(t0, t1) : η[k](t0) = 0 = η[k](t1), k = 0, 1, . . . , p − 1}.  (7.48)

Note that V₀ᵖ(t0, t1) ⊆ PWSᵖ(t0, t1) is the set of all functions η(·) ∈ PWSᵖ(t0, t1) with the property that η(·) and all its derivatives up to order p − 1 are zero at both ends of the interval [t0, t1]. Also, by definition PWS¹(t0, t1) = PWS(t0, t1) and


V₀¹(t0, t1) = V0(t0, t1). The following is the basic result needed for all extensions of the FLCV.

Lemma 7.3 Let p ≥ 1 be any integer. If v(·) is piecewise continuous on [t0, t1] and

∫_{t0}^{t1} v(s) η(s) ds = 0  (7.49)

for all η(·) ∈ V₀ᵖ(t0, t1), then v(·) ∈ PWS(t0, t1) and

v(t) = 0,  (7.50)

except at a finite number of points. The converse is also true.

Proof: Since v(·) is piecewise continuous on [t0, t1], there exists a finite set of points t0 < τ1 < τ2 < ... < τk < t1 in (t0, t1) such that v(·) is continuous on each subinterval (τj−1, τj). Assume that v(·) is not zero on one of these intervals. In particular, assume that there is a point z ∈ (τj−1, τj) so that v(z) > 0. Since v(·) is continuous on (τj−1, τj), there is a δ > 0 and a < b such that (z − δ, z + δ) ⊂ [a, b] ⊂ (τj−1, τj) and v(t) > 0 for all t ∈ (z − δ, z + δ). Define the function

η(t) = [(t − a)(b − t)]ᵖ for t ∈ [a, b], and η(t) = 0 for t ∉ [a, b],

and note that η(t) has the following properties:

(i) η(·) ∈ V₀ᵖ(t0, t1),

(ii) η(t) > 0 for all t ∈ (z − δ, z + δ),

(iii) v(t)η(t) > 0 for all t ∈ (z − δ, z + δ).

Consequently, it follows that

∫_{t0}^{t1} v(s) η(s) ds ≥ ∫_{z−δ}^{z+δ} v(s) η(s) ds > 0


which contradicts the assumption (7.49). Hence, v(t) = 0 on each of the subintervals (τj−1, τj) and this completes the proof.
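The variation used in this proof is worth a quick look. The following sketch (our illustration, not from the text) checks symbolically that η(t) = [(t − a)(b − t)]ᵖ and its first p − 1 derivatives vanish at t = a and t = b, which is why the extension by zero lies in V₀ᵖ(t0, t1).

import sympy as sp

# Sketch: derivatives of the bump eta(t) = ((t-a)*(b-t))**p of order
# 0, 1, ..., p-1 vanish at both t = a and t = b (here p = 3).
t, a, b = sp.symbols('t a b')
p = 3
eta = ((t - a)*(b - t))**p

for k in range(p):
    dk = sp.diff(eta, t, k)
    print(k, sp.simplify(dk.subs(t, a)), sp.simplify(dk.subs(t, b)))  # 0, 0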

We now state a higher order form of the Fundamental Lemma of the Calculus of Variations.

Lemma 7.4 (High Order Form of the FLCV) Let p ≥ 1 be any integer. If v(·) is piecewise continuous on [t0, t1] and

∫_{t0}^{t1} v(s) η[p](s) ds = 0  (7.51)

for all η(·) ∈ V₀ᵖ(t0, t1), then there exist p constants a0, a1, a2, . . . , ap−1 such that

v(t) = ap−1 tᵖ⁻¹ + ap−2 tᵖ⁻² + · · · + a2 t² + a1 t + a0

except at a finite number of points. In particular, v(·) ∈ PWSᵖ(t0, t1) and at all points t where v[p−1](·) is continuous

v[p−1](t) = (p − 1)! ap−1.

The converse is also true.

Observe that this is a powerful result that implies the function v(·) ∈ PWSᵖ(t0, t1) is equal e.f. to a polynomial of degree p − 1. The proof of this result is nontrivial and will not be given here (see pages 112 to 117 in Reid's book [153] and Lemma 13.1 (page 105) in Hestenes' book [101]). However, if one assumes that v(·) ∈ PWSᵖ(t0, t1), the proof follows from an easy integration by parts and applying Lemma 7.3 above.

Example 7.4 Consider the case where p = 2. Lemma 7.4 implies that if v(·) ∈ PWC(t0, t1) and

∫_{t0}^{t1} v(s) η̈(s) ds = 0

for all η(·) ∈ V₀²(t0, t1), where

V₀²(t0, t1) = {η(·) ∈ PWC(t0, t1) : η(·), η̇(·) ∈ PWS(t0, t1), η(t0) = η̇(t0) = η(t1) = η̇(t1) = 0},


then there are constants a0 and a1 such that

v(t) = a1t + a0

except at a finite number of points on [t0, t1]. Thus, v(·) is equal to a linear function e.f. If one assumes that there are two piecewise continuous functions α(·) and β(·) such that

∫_{t0}^{t1} [α(s)η̈(s) + β(s)η(s)] ds = 0

for all η(·) ∈ V₀²(t0, t1), then setting γ(t) = ∫_0^t ∫_0^s β(µ) dµ ds and integrating by parts twice one has

0 = ∫_{t0}^{t1} [α(s)η̈(s) + β(s)η(s)] ds = ∫_{t0}^{t1} [α(s)η̈(s) + γ(s)η̈(s)] ds = ∫_{t0}^{t1} [α(s) + γ(s)] η̈(s) ds

for all η(·) ∈ V₀²(t0, t1). Consequently, it follows that

α(t) + γ(t) = α(t) + ∫_0^t ∫_0^s β(µ) dµ ds = a1t + a0,

or equivalently,

α(t) = a1t + a0 − ∫_0^t ∫_0^s β(µ) dµ ds.

Thus, α(·) is twice differentiable (except at a finite number of points) and

α̈(t) = −β(t), e.f.

Let PWC(t0, t1; Rⁿ) denote the space of all Rⁿ-valued piecewise continuous functions defined on [t0, t1]. In particular,

PWC(t0, t1; Rⁿ) = {x(·) = [x1(·) x2(·) · · · xn(·)]ᵀ : xi(·) ∈ PWC(t0, t1), i = 1, 2, . . . , n}.  (7.52)


Likewise, let PWS(t0, t1; Rⁿ) denote the space of all Rⁿ-valued piecewise smooth functions defined on [t0, t1]. In particular,

PWS(t0, t1; Rⁿ) = {x(·) = [x1(·) x2(·) · · · xn(·)]ᵀ : xi(·) ∈ PWS(t0, t1), i = 1, 2, . . . , n}.  (7.53)

Also, PWS0(t0, t1; Rⁿ) will denote the space of all Rⁿ-valued piecewise smooth functions defined on [t0, t1] satisfying the zero boundary conditions

x(t0) = 0 = x(t1)

and we define V0(t0, t1; Rⁿ) to be

V0(t0, t1; Rⁿ) = {η(·) = [η1(·) η2(·) · · · ηn(·)]ᵀ ∈ PWS(t0, t1; Rⁿ) : ηi(tj) = 0, i = 1, 2, . . . , n, j = 0, 1}.  (7.54)

Note that V0(t0, t1; Rⁿ) = PWS0(t0, t1; Rⁿ), which leads to the following extension of the FLCV Part A to the vector case.

Lemma 7.5 Let p ≥ 1 be any integer. If v(·) = [v1(·) v2(·) · · · vn(·)]ᵀ ∈ PWC(t0, t1; Rⁿ) is piecewise continuous on [t0, t1] and

∫_{t0}^{t1} ⟨v(s), η(s)⟩ ds = 0  (7.55)

for all η(·) ∈ V0(t0, t1; Rⁿ), then v(·) ∈ PWS(t0, t1; Rⁿ) and

v(t) = 0,  (7.56)

except at a finite number of points. The converse is also true.

Proof: Since v(·) is piecewise continuous on [t0, t1], there exists a finite set of points t0 < τ1 < τ2 < ... < τk < t1 in (t0, t1) such that v(·) is continuous on each subinterval (τj−1, τj). Assume that v(·) is not zero on one of these intervals. In particular, assume that there is a point z ∈ (τj−1, τj) and m, with 1 ≤ m ≤ n, so


that vm(z) > 0. Since vm(·) is continuous on (τj−1, τj), there is a δ > 0 and a < b such that (z − δ, z + δ) ⊂ [a, b] ⊂ (τj−1, τj) and vm(t) > 0 for all t ∈ (z − δ, z + δ). Define the function

ηm(t) = [(t − a)(b − t)]ᵖ for t ∈ [a, b], and ηm(t) = 0 for t ∉ [a, b],

and let η(·) denote the function η(·) = [η1(·) η2(·) · · · ηn(·)]ᵀ ∈ V0(t0, t1; Rⁿ) defined by

ηj(t) = ηm(t) if j = m, and ηj(t) = 0 if j ≠ m.

Observe that η(·) has the following properties:

(i) η(·) ∈ V0(t0, t1;Rn),

(ii) ηm(t) > 0 for all t ∈ (z − δ, z + δ),

(iii) vm (t) ηm (t) > 0 for all t ∈ (z − δ, z + δ).

Consequently, it follows that

∫_{t0}^{t1} ⟨v(s), η(s)⟩ ds = ∫_{t0}^{t1} vm(s) ηm(s) ds ≥ ∫_{z−δ}^{z+δ} vm(s) ηm(s) ds > 0,

which contradicts the assumption (7.55). Hence, v(t) = 0 on each of the subintervals (τj−1, τj) and this completes the proof.

Lemma 7.6 (FLCV: Part A in Vector Form) Let p ≥ 1 be any integer. If v(·) = [v1(·) v2(·) · · · vn(·)]ᵀ ∈ PWC(t0, t1; Rⁿ) is piecewise continuous on [t0, t1] and

∫_{t0}^{t1} ⟨v(s), η̇(s)⟩ ds = 0  (7.57)


for all η(·) ∈ V0(t0, t1; Rⁿ), then v(·) ∈ PWS(t0, t1; Rⁿ) and there is a constant vector c such that

v(t) = c,  (7.58)

except at a finite number of points. The converse is also true.

Proof: Assume that v(·) is piecewise continuous on [t0, t1]. Let η(·) be any piecewise smooth function in PWS0(t0, t1), fix m with 1 ≤ m ≤ n, and define the function η(·) ∈ V0(t0, t1; Rⁿ) by

ηj(t) = η(t) if j = m, and ηj(t) = 0 if j ≠ m.

Observe that η(·) ∈ V0(t0, t1; Rⁿ) and hence

∫_{t0}^{t1} vm(s) η̇(s) ds = ∫_{t0}^{t1} ⟨v(s), η̇(s)⟩ ds = 0.

Since η(·) ∈ PWS0(t0, t1) is arbitrary, it follows from the Fundamental Lemma of the Calculus of Variations that there is a constant cm such that vm(t) = cm except at a finite number of points. This can be repeated for each m = 1, 2, · · · , n and if c = [c1 c2 · · · cn]ᵀ, then v(t) = c except at a finite number of points and this completes the proof.

The following lemma follows immediately from the previous result.

Lemma 7.7 (FLCV: Part B in Vector Form) Let p ≥ 1 be any integer. If α(·) = [α1(·) α2(·) · · · αn(·)]ᵀ and β(·) = [β1(·) β2(·) · · · βn(·)]ᵀ ∈ PWC(t0, t1; Rⁿ) are piecewise continuous on [t0, t1] and

∫_{t0}^{t1} [⟨α(s), η(s)⟩ + ⟨β(s), η̇(s)⟩] ds = 0  (7.59)


for all η(·) ∈ V0(t0, t1; Rⁿ), then β(·) ∈ PWS(t0, t1; Rⁿ) and there is a constant vector c such that

β(t) = c + ∫_{t0}^{t} α(s) ds,  (7.60)

except at a finite number of points. The converse is also true.

We can now formulate the general vector and higher order forms of the Simplest Problem in the Calculus of Variations.

7.4.2 The Simplest Problem in Vector Form

Let t0 and t1 be given and suppose the function f : [t0, t1] × Rⁿ × Rⁿ −→ R is C². In particular, f(t, x, u) has the form

f(t, x, u) = f(t, x1, x2, . . . , xn, u1, u2, . . . , un),

where all the partial derivatives of order 2 exist and are continuous. Here,

∂f(t, x, u)/∂t = ft(t, x, u) = ft(t, x1, x2, . . . , xn, u1, u2, . . . , un),

∂f(t, x, u)/∂xi = fxi(t, x, u) = fxi(t, x1, x2, . . . , xn, u1, u2, . . . , un)

and

∂f(t, x, u)/∂ui = fui(t, x, u) = fui(t, x1, x2, . . . , xn, u1, u2, . . . , un).

We use standard notation for the gradients

∇xf(t, x, u) ≜ [fx1(t, x, u) fx2(t, x, u) · · · fxn(t, x, u)]ᵀ

and

∇uf(t, x, u) ≜ [fu1(t, x, u) fu2(t, x, u) · · · fun(t, x, u)]ᵀ.


For each x(·) = [x1(·) x2(·) · · · xn(·)]ᵀ ∈ PWS(t0, t1; Rⁿ), define the functional J : PWS(t0, t1; Rⁿ) → R by

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds.  (7.61)

Assume that the vectors x0 and x1 are given. Define the set of piecewise smooth vector functions Θn by

Θn = {x(·) ∈ PWS(t0, t1; Rⁿ) : x(t0) = x0, x(t1) = x1}.  (7.62)

The Simplest Problem in Vector Form (the fixed endpoint problem) is the problem of minimizing J(·) on Θn. In particular, the goal is to find x∗(·) = [x∗1(·) x∗2(·) · · · x∗n(·)]ᵀ ∈ Θn such that

J(x∗(·)) = ∫_{t0}^{t1} f(s, x∗(s), ẋ∗(s)) ds ≤ J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s)) ds

for all x(·) ∈ Θn.

Theorem 7.5 (Vector Form of the Euler Necessary Condition) If x∗(·) ∈ Θn minimizes J(·) on Θn, then

(1) there is a vector c such that

∇uf(t, x∗(t), ẋ∗(t)) = c + ∫_{t0}^{t} ∇xf(s, x∗(s), ẋ∗(s)) ds,  (7.63)

except at a finite number of points,

(2) x∗(t0) = x0,


(3) x∗(t1) = x1.

(4) Between corners of x∗(·) the function ∇uf(t, x∗(t), ẋ∗(t)) is differentiable and

(d/dt) ∇uf(t, x∗(t), ẋ∗(t)) = ∇xf(t, x∗(t), ẋ∗(t)).  (7.64)

Proof: Suppose that x∗(·) ∈ Θn minimizes J(·) on Θn and η(·) ∈ PWS(t0, t1; Rⁿ). Define the function

g(t, ε) = f(t, x∗(t) + εη(t), ẋ∗(t) + εη̇(t)).

Since ẋ∗(t) and η̇(t) are continuous on a finite partition t0 = τ1 < τ2 < . . . < τk = t1 of [t0, t1], it follows that g(t, ε) and gε(t, ε) are both continuous on each subinterval (τi, τi+1). Without loss of generality we may assume that x∗(·) and η(·) are smooth. It follows that

F(ε) = J(x∗(·) + εη(·)) = ∫_{t0}^{t1} f(s, x∗(s) + εη(s), ẋ∗(s) + εη̇(s)) ds = ∫_{t0}^{t1} g(s, ε) ds

and the goal is to differentiate F(ε) at ε = 0. Applying the chain rule we have

gε(t, ε) = (d/dε)[f(t, x∗(t) + εη(t), ẋ∗(t) + εη̇(t))]
= Σ_{i=1}^{n} [fxi(t, x∗(t) + εη(t), ẋ∗(t) + εη̇(t)) ηi(t) + fui(t, x∗(t) + εη(t), ẋ∗(t) + εη̇(t)) η̇i(t)].

Applying Leibniz’s Lemma 3.2 yields

d

dεF (ε)

∣∣∣∣ε=0

=d

dε[J(x∗ (·) + εη(·))]

∣∣∣∣ε=0

=

t1∫t0

gε(s, 0)ds. (7.65)


Again, note that we really only need to compute gε(t, ε) on each subinterval (τi, τi+1) where both ẋ∗(·) and η̇(·) are continuous, and the chain rule produces an explicit formula for the first variation δJ(x∗(·); η(·)) of J(·) at x∗(·) in the direction of η(·). In particular, evaluating at ε = 0,

δJ(x∗(·); η(·)) = ∫_{t0}^{t1} Σ_{i=1}^{n} fxi(s, x∗(s), ẋ∗(s)) ηi(s) ds + ∫_{t0}^{t1} Σ_{i=1}^{n} fui(s, x∗(s), ẋ∗(s)) η̇i(s) ds.

Consequently, it follows that

δJ(x∗(·); η(·)) = ∫_{t0}^{t1} [⟨∇xf(s, x∗(s), ẋ∗(s)), η(s)⟩ + ⟨∇uf(s, x∗(s), ẋ∗(s)), η̇(s)⟩] ds.

In addition, if η(·) ∈ PWS0(t0, t1; Rⁿ), then

∫_{t0}^{t1} [⟨∇xf(s, x∗(s), ẋ∗(s)), η(s)⟩ + ⟨∇uf(s, x∗(s), ẋ∗(s)), η̇(s)⟩] ds = 0

and the theorem follows from the Fundamental Lemma Part B in Vector Form given in Lemma 7.7 above.

7.4.3 The Simplest Problem in Higher Order Form

Assume p ≥ 1 is a given integer and the endpoints t0 and t1 are fixed. Let f : [t0, t1] × R^{1+p} −→ R be a C² real valued function of the form

f(t, x, u1, u2, . . . , up),


where all the partial derivatives of order 2 exist and are continuous. As above, we let PWSᵖ(t0, t1) denote the space of all real-valued piecewise smooth functions x(·) defined on [t0, t1] such that x[k](·) is piecewise smooth on [t0, t1] for all k = 0, 1, . . . , p − 1. In particular,

PWSᵖ(t0, t1) = {x(·) : x[k](·) ∈ PWS(t0, t1), k = 0, 1, . . . , p − 1}.  (7.66)

Observe that if x(·) ∈ PWSᵖ(t0, t1), then x[p](·) ∈ PWC(t0, t1). For each x(·) ∈ PWSᵖ(t0, t1), define the functional J : PWSᵖ(t0, t1) → R by

J(x(·)) = ∫_{t0}^{t1} f(s, x(s), ẋ(s), ẍ(s), . . . , x[p](s)) ds.  (7.67)

Since x[p](·) ∈ PWC(t0, t1), the cost function defined by (7.67) is well defined. Assume that the points xk,0 and xk,1 for k = 0, 1, 2, . . . , p − 1 are given and define the set of functions Θp by

Θp = {x(·) ∈ PWSᵖ(t0, t1) : x[k](ti) = xk,i, k = 0, 1, 2, . . . , p − 1, i = 0, 1}.  (7.68)

The Simplest Problem in Higher Order Form (the fixedendpoint problem) is the problem of minimizing J(·) on Θp. Inparticular, the goal is to find x∗ (·) ∈ Θp such that

J(x∗(·)) =

t1∫t0

f(s, x∗(s), x∗(s), x∗(s), . . . , (x∗)[p](s)

)ds

≤ J(x(·)) =

t1∫t0

f(s, x(s), x(s), x(s), . . . , x[p](s)

)ds,

for all x (·) ∈ Θp.

Remark 7.3 Note that the Simplest Problem in Higher OrderForm is a special case of the Simplest Problem in Vector Form.To make this precise, we set

x1(·) = x(·), x2(·) = x(·), x3(·) = x(·), . . . , xp(·) = x[p−1](·)

Page 273: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

254 Chapter 7. Extensions and Generalizations

so that xp(·) = x[p](·). Define f : [t0, t1]× Rp × Rp −→ R by

f(t,x,u)) = f(t, x1, x2, . . . , xp, u1, u2, . . . , up) , f(t, x, u1, u2, . . . , up)

and observe that for

x(·) =[x(·) x(·) · · · x[p−1](·)

]T=[x1(·) x2(·) · · · xp(·)

]T,

f(t,x(t), x(t)) has the form

f(t,x(t), x(t)) = f(t, x1(t), x2(t), . . . , xp(t), x1(t), x2(t), . . . , xp(t))

= f(t, x(t), x(t), x(t), . . . , x[p](t)

).

Consequently, the cost function (7.67) becomes

J(x(t)) =

t1∫t0

f(s,x(s), x(s))ds

=

t1∫t0

f(s, x(s), x(s), x(t), . . . , x[p](s)

)ds = J(x(·)),

wheref (t, x1, x2, . . . , xp, u1, u2, . . . , up)

does not explicitly depend uk for k = 1, 2, . . . , p− 1.

In order to simplify expressions, recall that we use f ∗(t) todenote the evaluation of f(·) along the optimal curve so that

f ∗(t) = f(t, x∗(t), x∗(t), x∗(t), . . . , (x∗)[p](t)

),

f ∗x(t) = fx(t, x∗(t), x∗(t), x∗(t), . . . , (x∗)[p](t)

),

f ∗ui(t) = fui(t, x∗(t), x∗(t), x∗(t), . . . , (x∗)[p](t)

), i = 1, 2, ...p.

Also, we setf ∗u0(t) = f ∗x(t).

Page 274: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 255

Theorem 7.6 (High Order Form of the Euler NecessaryCondition) If x∗ (·) ∈ Θp minimizes J(·) on Θp, then

(1) there are constants ci, i = 1, 2, . . . , p − 1 such that for allt ∈ [t0, t1],

f ∗up (t) =

p−1∑i=0

ci(i− t)p−1−i

(p− 1− i)!(7.69)

+

t∫t0

p−1∑i=0

[(s− t)p−1−i

(p− 1− i)!f ∗ui (s)

]ds

except at a finite number of points,

(2) x[k] (t0) = xk,0, k = 0, 1, 2, . . . , p− 1,

(3) x[k] (t1) = xk,1, k = 0, 1, 2, . . . , p− 1.

(4) Between corners of (x∗)[p−1](·), x∗ (·) satisfies

(−1)pdp

dtp

[f ∗up (t)

]+

p−1∑i=1

(−1)p−idp−i

dtp−i

[f ∗up−i

(t)]

+ f ∗x (t) = 0,

(7.70)where

dk

dk[f ∗uk (t)

]=dk

dk[f ∗uk

(t, x∗(t), x∗(t), x∗(t), . . . , (x∗)[p](t)

)].

7.5 Problems with Constraints: Isoperi-

metric Problem

Here we impose a “functional” constraint on the curve x(·). Forexample, consider the problem of finding x∗(·) ∈ PWS(t0, t1) suchthat x∗(·) minimizes

J (x(·)) =

∫ t1

t0

f (s, x(s), x (s)) ds,

subject constraint

x (t0) = x0, x (t1) = x1,

Page 275: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

256 Chapter 7. Extensions and Generalizations

and

G (x(·)) =

∫ t1

t0

g (s, x(s), x (s)) ds = 0. (7.71)

Example 7.5 Find the curve of length 5 that passes through thepoints [−1 0]T and [1 0]T and minimizes

J (x(·)) =

∫ 1

−1

−x (s) ds

subject to to the length constraint∫ 1

−1

√1 + [x (s)]2ds = 5.

Observe that

G (x(·)) =

∫ 1

−1

g (s, x(s), x (s)) ds,

whereg(t, x, u) =

√1 + [u]2 − 5/2.

In order to obtain a first order necessary condition for thisconstrained problem, we recall the definition of an extremal. Anextremal for the functional

J (x(·)) =

∫ t1

t0

f (s, x(s), x (s)) ds

is any function x(·) ∈ PWS(t0, t1) that satisfies the integral formof the Euler equation

fu (t, x(t), x (t)) = c+

t∫t0

fx (s, x(s), x (s)) ds

for some c. Note that because of the Fundamental Lemma of theCalculus of Variations, x(·) ∈ PWS(t0, t1) is an extremal for J(·)if and only if

δJ(x(·); η(·)) = 0 (7.72)

Page 276: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 257

for all η(·) ∈ PWS0(t0, t1). Again, δJ(x(·); η(·)) is the first varia-tion of J(·) at x(·) in the direction of η(·) and is given by

δJ(x(·); η(·)) =

t1∫t0

fx (s, x(s), x (s)) η (s)

+ fu (s, x(s), x (s)) η (s)ds. (7.73)

Likewise, we say that x(·) ∈ PWS(t0, t1) is an extremal for thefunctional

G (x(·)) =

∫ t1

t0

g (s, x(s), x (s)) ds

if x(·) ∈ PWS(t0, t1) satisfies the Euler Integral Equation

gu (t, x(t), x (t)) = c+

t∫t0

gx (s, x(s), x (s)) ds

for some c. Thus, the Fundamental Lemma of the Calculus ofVariations implies that x(·) ∈ PWS(t0, t1) is an extremal for G(·)if and only if

δG(x(·); η(·)) = 0 (7.74)

for all η(·) ∈ PWS0(t0, t1), where δG(x(·); η(·)) is the first variationof G(·) at x(·) in the direction of η(·) and is given by

δG(x(·); η(·)) =

t1∫t0

gx (s, x(s), x (s)) η (s)

+ gu (s, x(s), x (s)) η (s)ds. (7.75)

We now can state a necessary condition which is an infinite di-mensional form of the Lagrange Multiplier Theorem. We outlinea proof in Section 7.5.1 below.

Theorem 7.7 (Multiplier Theorem for the IsoperimetricProblem) If x∗(·) ∈ PWS(t0, t1) is smooth and minimizes

J (x(·)) =

∫ t1

t0

f (s, x(s), x(s)) ds,

Page 277: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

258 Chapter 7. Extensions and Generalizations

subject tox (t0) = x0, x (t1) = x1,

and

G (x(·)) =

∫ t1

t0

g (s, x(s), x(s)) = 0,

then there exist constants λ0 and λ1 such that

(i) |λ0|+ |λ1| 6= 0 and

(ii) x∗(·) is an extremal of

λ0J (x(·)) + λ1G (x(·)) . (7.76)

(iii) If in addition x∗(·) is not an extremal of G(·), then the con-

stant λ0 is not zero.

To make the notation more compact we define the Lagrangianby

L(λ0, λ1, x(·)) = λ0J (x(·)) + λ1G (x(·)) . (7.77)

If we define l(λ0, λ1, t, x, u) by

l(λ0, λ1, t, x, u) = λ0f(t, x, u) + λ1g(t, x, u),

then

L(λ0, λ1, x(·)) =

∫ t1

t0

l (λ0, λ1, s, x(s), x(s)) ds. (7.78)

Remark 7.4 The statement that x∗(·) is an extremal ofL(λ0, λ1, x(·)) = λ0J (x(·)) + λ1G (x(·)) implies that x∗(·) is a so-lution of the corresponding Euler equation

[λ0fu(t, x(t), x(t)) +λ1gu(t, x(t), x(t))]= c+

t∫t0

[λ0fx(s, x(s), x(s))]ds

+

t∫t0

[λ1gx(s, x(s), x(s))]ds

Page 278: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 259

for some c. Also, if x∗(·) is smooth then x∗(·) satisfies the differ-ential equation

d

dt[λ0fu(t, x(t), x(t)) + λ1gu(t, x(t), x(t))]

= [λ0fx(t, x(t), x(t)) + λ1gx(t, x(t), x(t))] .

In the case (iii) above, when, x∗(·) is not an extremal of G(·), thenthe minimizer x∗(·) is called a normal minimizer. Thus, if x∗(·)is a normal minimizer, then λ0 6= 0. This definition of normality ofa minimizing arc was first given by Bliss in [28] where he noted that(global) normality implied that the multiplier λ0 was not zero. Aswe shall see later, this idea is true for general Lagrange multipliertheorems.

7.5.1 Proof of the Lagrange Multiplier Theo-rem

In this section we prove Theorem 7.7. The proof is essentially thesame as the proof for the necessary condition for the 2D LagrangeMultiplier Theorem in Section 2.2.3.

Proof of Theorem 7.7: Assume x∗(·) ∈ C1(t0, t1) minimizes

J (x(·)) =

∫ t1

t0

f (s, x(s), x(s)) ds,

subject tox (t0) = x0, x (t1) = x1,

and

G (x(·)) =

∫ t1

t0

g (s, x(s), x(s)) = 0.

If x∗(·) is an extremal of G(·), then

δG(x∗(·); η(·)) = 0

for all η(·) ∈ PWS0(t0, t1). In this case set λ0 = 0 and λ1 = 1. It

Page 279: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

260 Chapter 7. Extensions and Generalizations

follows that |λ0|+ |λ1| = 1 6= 0 and

δ [λ0J(x∗(·); η(·))] + δ [λ1G(x∗(·); η(·))]= λ0δJ(x∗(·); η(·)) + λ1δG(x∗(·); η(·))= 0δJ(x∗(·); η(·)) + δG(x∗(·); η(·))= δG(x∗(·); η(·)) = 0,

for all η(·) ∈ PWS0(t0, t1). Hence x∗(·) is an extremal of

λ0J (x(·)) + λ1G (x(·))

and the theorem is clearly true.Now consider the case where x∗(·) is not an extremal of

G(·). Since x∗(·) is not extremal of G(·), there exists a η(·) ∈PWS0(t0, t1) such that δG(x∗(·); η(·)) 6= 0. Clearly, η(·) is not thezero function since δG(x(·); 0(·)) = 0. Let

λ0 = δG(x∗(·); η(·)) 6= 0

andλ1 = −δJ(x∗(·); η(·)).

We now show that x∗(·) is an extremal of

λ0J(x(·))+λ1G(x(·)) = [δG(x∗(·); η(·))]J(x(·))+[δJ(x∗(·); η(·))]G(x(·)).

Observe that

δ[λ0J(x∗(·); η(·)) + λ1G(x∗(·); η(·))]= λ0[δJ(x∗(·); η(·))] + λ1[δG(x∗(·); η(·))]= [δG(x∗(·); η(·))][δJ(x∗(·); η(·))]− [δJ(x∗(·); η(·))][δG(x∗(·); η(·))],

or equivalently,

δ[λ0J(x∗(·); η(·)) + λ1G(x∗(·); η(·))]

= det

[[δJ(x∗(·); η(·))] [δJ(x∗(·); η(·))][δG(x∗(·); η(·))] [δG(x∗(·); η(·))]

].

Page 280: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 261

Therefore, to establish that x∗(·) is an extremal of

λ0J(x(·)) + λ1G(x(·)) = [δG(x∗(·); η(·))]J(x(·))+ [δJ(x∗(·); η(·))]G(x(·))

we must show that

det

[[δJ(x∗(·); η(·))] [δJ(x∗(·); η(·))][δG(x∗(·); η(·))] [δG(x∗(·); η(·))]

]= 0 (7.79)

for all η(·) ∈ PWS0(t0, t1). This is accomplished by applying theInverse Mapping Theorem 2.4.

Define T : R2 → R2 by

T (α, β) = [p(α, β) q(α, β)]T , (7.80)

wherep(α, β) = J(x∗(·) + αη(·) + βη(·)) (7.81)

andq(α, β) = G(x∗(·) + αη(·) + βη(·)), (7.82)

respectively.Note that T (α, β) maps the open set R2 to R2 and T (0, 0) =

[p(0, 0) q(0, 0)]T = [J(x∗(·)) 0]T = [p q]T . Also, the Jacobian ofT (α, β) at [α β]T = [0 0]T is given by[

∂p(0,0)∂α

∂p(0,0)∂β

∂q(0,0)∂α

∂q(0,0)∂β

]=

[[δJ(x∗(·); η(·))] [δJ(x∗(·); η(·))][δG(x∗(·); η(·))] [δG(x∗(·); η(·))],

].

Assume that (7.79) is not true. This assumption implies that theJacobian of T (α, β) is non-singular at [α β]T = [0 0]T so we mayapply the Inverse Mapping Theorem 2.4. Here, T (α, β) as definedby (7.80), (7.81) and (7.82) with [α β]T = [0 0]T and

T (α, β) = T (0, 0) = [J(x∗(·)) G(x∗(·))]T = [p 0]T .

In particular, there is a neighborhood U =

[α β]T :√α2 + β2 < γ

of [0 0]T and a neighborhood V of [J(x∗(·)) 0]T such that the re-striction of T (α, β) to U , T (α, β) : U → V , has a continuousinverse T −1(p, q) : V → U belonging to C1.

Page 281: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

262 Chapter 7. Extensions and Generalizations

Let [p 0]T ∈ V be any point with p < J(x∗(·)) and let [α β]T =T −1(p, 0) ∈ U . Observe that for [α β]T ,

J(x∗(·) + αη(·) + βη(·)) = p(α, β) = p < J(x∗(·)) (7.83)

andG(x∗(·) + αη(·) + βη(·)) = 0. (7.84)

Therefore, by construction the function

x(·) = x∗(·) + αη(·) + βη(·)

satisfies all the constraints of the Isoperimetric Problem and

J(x(·)) < J(x∗(·))

which contradicts the assumption that x∗(·) ∈ C1(t0, t1) minimizes

J (x(·)) =

∫ t1

t0

f (s, x(s), x(s)) ds,

subject tox (t0) = x0, x (t1) = x1,

and

G (x(·)) =

∫ t1

t0

g (s, x(s), x(s)) = 0.

Therefore the assumption that

det

[∂p(0,0)∂α

∂p(0,0)∂β

∂q(0,0)∂α

∂q(0,0)∂β

]= det

[[δJ(x∗(·); η(·))] [δJ(x∗(·); η(·))][δG(x∗(·); η(·))] [δG(x∗(·); η(·))]

]6= 0

must be false. Hence, x∗(·) is an extremal of

λ0J(x(·))+λ1G(x(·)) = [δG(x∗(·); η(·))]J(x(·))+[δJ(x∗(·); η(·))]G(x(·))

which completes the proof.

Page 282: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 263

7.6 Problems with Constraints: Finite

Constraints

Here we consider a non-integral constraint. Assume that the func-tion g(t, x) is C2 and consider the problem of minimizing

J (x(·)) =

∫ t1

t0

f (s, x(s), x(s)) ds,

subject tox (t0) = x0, x (t1) = x1,

and the “finite constraint”

g (t, x (t)) = 0. (7.85)

Observe that equation (7.85) implies that the trajectory x(·) mustlie on the curve

Mg =

[t x]T : g(t, x) = 0. (7.86)

Theorem 7.8 (Multiplier Theorem for the Finite Con-straint Problem) Ifx∗(·) ∈ PWS(t0, t1) is smooth and minimizes

J (x(·)) =

∫ t1

t0

f (s, x(s), x(s)) ds,

subject tox (t0) = x0, x (t1) = x1,

and the finite constraint

g (t, x (t)) = 0,

then there exists a constant λ0 and a function λ1(·) such that

(i) |λ0|+ |λ1(t)| 6= 0 for all t0 ≤ t ≤ t1 and

(ii) x∗(·) is an extremal of

L(λ0, λ1(·), x(·)) =

∫ t1

t0

(λ0f (s, x(s), x(s)) + λ1(s)g (s, x(s)))ds.

(7.87)

Page 283: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

264 Chapter 7. Extensions and Generalizations

(iii) If in addition, gx (t, x) 6= 0 for all [t x]T on the surfaceMg =

[t x]T : g(t, x) = 0

, then λ0 6= 0. Thus, there exists a

function λ(t) = λ1(t)/λ0 such that x∗(·) is an extremal of thefunctional ∫ t1

t0

f (s, x(s), x (s)) + λ (s) g (s, x (s))ds.

LetF (t, x, u) = λ0f (t, x, u) + λ1(t)g (t, x)

so thatF x(t, x, u) = λ0fx (t, x, u) + λ1(t)gx (t, x)

andF u(t, x, u) = λ0fu (t, x, u) .

The statement that x∗(·) is an extremal of

L(λ0, λ1(·), x(·)) =

∫ t1

t0

(λ0f (s, x(s), x(s)) + λ1(s)g (s, x(s)))ds

implies that x∗(·) is a solution of the corresponding Euler IntegralEquation

λ0fu(t, x(t), x(t)) = c+

t∫t0

(λ0fx(s, x(s), x(s)) + λ1(s)gx(s, x(s))) ds

for some c. Also, between corners x∗(·) satisfies

d

dtλ0fu(t, x(t), x(t)) = [λ0fx(t, x(t), x(t)) + λ1(t)gx(t, x(t))] .

Remark 7.5 It is worth noting that Theorem 7.8 does not pro-vide information about the function λ1(·). In particular, there isno claim that λ1(·) is piecewise smooth or even piecewise continu-ous. In order to obtain additional information of this kind requiresmathematical techniques beyond advanced calculus and will not beconsidered here.

Page 284: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 265

We close this chapter with a very brief introduction to ab-stract optimization problems. Although this material is not re-quired reading, it does help the reader to see how the idea of thefirst variation can be extended to rather general settings. It is ex-tensions of this type that provide the foundations for the moderndevelopment of necessary and sufficient conditions for optimiza-tion with applications to optimal control.

7.7 An Introduction to Abstract Opti-

mization Problems

In this section we provide an basic introduction to the theory ofnecessary conditions for general optimization problems. Generallyspeaking, the subjects of the calculus of variations and optimalcontrol theory belong to the larger discipline of infinite dimen-sional optimization theory. Problems in the calculus of variationscame from classical physics and date back 300 years. The the-ory of optimal control is relatively new and some mathematiciansplace its formal beginning in 1952, with Bushaw’s Ph.D. thesis(see Hermes and La Salle [100]). However, many of the basic ideasdate back into the last century.

Here we review some basic finite and infinite dimensional un-constrained and constrained optimization theory. The reader isreferred to [102], [131] and [166] for details. The book by Neustadt[144] contains very general results on necessary conditions withapplications to a wide variety of problems.

7.7.1 The General Optimization Problem

We shall consider problems that are special cases of the followinggeneral optimization problem.

Let Z be a vector space and assume that Θ ⊆ Z is givena given (constraint) set. Also, let J : D(J) ⊆ Z −→ R1 be areal valued function defined on the domain D(J). The GeneralOptimization Problem (GOP) is defined by: Find an element z∗ ∈

Page 285: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

266 Chapter 7. Extensions and Generalizations

Θ ∩D(J) such that

J(z∗) ≤ J(z)

for all z ∈ Θ ∩D(J).Of course, specific problems have much more structure and the

challenge is to formulate the problem in such a way that there isa solution and one can “compute” z∗. In addition to the SimplestProblem in the Calculus of Variations, this framework includesmuch of finite dimensional optimization as well as problems inoptimal control. The following examples will be used to motivatethe discussion in this section.

Example 7.6 Let J(·) be a real-valued function defined on someinterval I = [a, b]. In ordinary calculus one considers the problemof minimizing J(·) on [a, b]. In this example, Z = R1 and Θ =[a, b].

Example 7.7 To give another elementary example, let us con-sider the problem of finding the point on the plane with equation

2x+ 3y − z = 5

which is nearest to the origin in R3. In this problem, we set

Θ =

[x y z]T : 2x+ 3y − z − 5 = 0

and let J(·) be the square of the distance to the origin; namely

J (z) = J (x, y, z) = x2 + y2 + z2.

Here, Z = R3 is the vector space.

Example 7.8 The problem is to find a piecewise continuous (con-trol) function u(t), 0 ≤ t ≤ 3 such that the cost functional

J(x(·), u(·)) =

3∫0

[1− u (s) · x (s)] ds

Page 286: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 267

is minimized where x(·) satisfies the differential equation

x (t) = u (t)x (t)

with initial conditionx (0) = 1

and u(·) is constrained by

0 ≤ u (t) ≤ 1.

This is the so called Farmer’s Allocation Problem. For this examplewe set X = PWS(0, 3)×PWC(0, 3) and define the constraint setΘ by

Θ =

z(·) = [x(·) u(·)]T : u(·) ∈ PWC(0, 3), 0 ≤ u (t) ≤ 1,

x (t) = u (t)x(t), x (0) = 1

.

The cost function is defined on J : Z = PWS(0, 3) ×PWC(0, 3) −→ R1 by J(z(·)) = J(x(·), u(·)). We shall see laterthat there is a more efficient way to describe the constraint set Θin terms of an equality and inequality constraint.

Example 7.9 Returning to Example 2.6 we let J : R2 −→ R1

be defined by J (x, y) = x2 + y2 and G : R2 −→ R1 be given byG (x, y) = x2 − (y − 1)3. In this problem the constraint set is the“level curve” defined by

Θ = ΘG =z = [x y]T ∈ R2 : G (x, y) = 0

.

Recall that the minimizer is z∗ = [x∗ y∗]T = [0 1]T , but the La-grange Multiplier Rule is not very helpful.

7.7.2 General Necessary Conditions

In this section we present the basic first order necessary conditionsfor the general optimization problem described above. The prob-lem of finding the minimum of J(·) over a constraint set Θ canrange from the trivial problem in Example (7.6) to the difficult op-timal control problem in Example (7.8). The approach most often

Page 287: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

268 Chapter 7. Extensions and Generalizations

used to “solve” such problems has been the application of neces-sary conditions for a minimum. The basic idea behind the use ofa necessary condition is really very simple. A necessary conditionis used to reduce the problem of minimizing J(·) over all of Θto a problem of minimizing J(·) over a smaller set ΘS ⊆ Θ. Weillustrate this idea with the following trivial example.

Example 7.10 Consider the problem of minimizing the continu-ous function

J(z) =2z2 − z4 + 3

4

on the compact interval Θ = [−2, 1]. From elementary calculus weknow that there is a z∗ ∈ Θ that minimizes J(·) on the compactinterval [−2, 1]. Moreover, if J (z∗) ≤ J (z) for all z ∈ Θ = [−2, 1]then

J ′ (z∗) = 0 if − 2 < z∗ < 1

J ′ (z∗) ≥ 0 if z∗ = −2

J ′ (z∗) ≤ 0 if z∗ = 1.

We use the first derivative test and solve for all z∗ ∈ (−2, 1) sat-isfying

0 =dJ (z)

dz= J ′ (z) = z − z3.

Clearly, z = −1 and z = 0 are the only solutions in the openinterval (−2, 1). At the left endpoint, we find that

J ′ (−2) = 6 > 0

and at the right endpoint

J ′ (1) = 0 ≤ 0.

Thus, at this point we have reduced the possible set of minimizersto

ΘS = −2,−1, 0, 1 .It is now trivial to find the global minimum, since

J (−2) = −5/4, J (0) = 3/4, and J (−1) = J (1) = 1.

Page 288: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 269

Thus, z∗ = −2 is the global minimum of J(·) on Θ = [−2, 1]. Notethat we could have reduced the set ΘS even more by applying thesecond order condition that

d2J (z)

dz2= J ′′ (z∗) ≥ 0 if − 2 < z∗ < 0.

For in this problemJ ′′ (z) = 1− 3z2,

so J ′′ (−1) = −2 ≤ 0, while J ′′ (0) = 1 ≥ 0. Thus, z = −1 couldhave been removed from ΘS leading to the problem of minimizingJ(·) on ΘSS = −2, 0, 1 .

Recall from Section 2.2.2 that the proof of the basic necessarycondition used in the previous example is rather simple. However,the idea can be extended to very general settings. Consider theproblem of minimizing J(·) on a set Θ ⊆ Z. Assume that z∗ ∈Θ∩D(J) provides a (global) minimizer for J : D(J) ⊆ Z −→ R1

on Θ ∩D(J). Thus,

J (z∗) ≤ J (z) ,

for all z ∈ Θ∩D(J). Furthermore, assume that there is a function

ϕ(·) : (−δ,+δ)→ Θ ∩D(J) (7.88)

satisfyingϕ (0) = z∗. (7.89)

Define F : (−δ,+δ)→ R1 by

F (ε) = J(ϕ(ε)),

and observe that, since ϕ(ε) ∈ Θ ∩D(J) for all ε ∈ (−δ,+δ),

F (0) = J(ϕ(0)) = J(z∗) ≤ J(ϕ(ε)) = F (ε),

for all ε ∈ (−δ,+δ). If the derivative

F ′(0) =d

dεF (ε)

∣∣∣∣ε=0

=d

dεJ(ϕ(ε))

∣∣∣∣ε=0

Page 289: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

270 Chapter 7. Extensions and Generalizations

exists, then

F ′(0) =d

dεJ(ϕ(ε))

∣∣∣∣ε=0

= 0. (7.90)

Note that this is the basic necessary condition and to be “useful”one must construct functions ϕ(·) and have enough freedom inthe choice of these functions so that the condition (7.90) impliessomething about the minimizer z∗ ∈ Θ ∩D(J). In the followingsections we show that this general framework can be applied to awide variety of problems.

Note that in all of the above examples Θ is a subset of a vec-tor space Z. Assume that z∗ ∈ Θ ∩D(J) provides a (global)minimizer for J : D(J) ⊆ Z −→ R1 on Θ ∩D(J). We focus onvariations ϕ(·) : (−δ,+δ)→ Θ∩D(J) of the form ϕ(ε) = z∗+εηwhere η ∈ Z is “limited” to a specific subset of Z. The key is-sue is to make sure that, for a suitable class of η, the variationsϕ(ε) = z∗ + εη are admissible in the following sense.

Definition 7.1 The vector ϕ(ε) = z∗ + εη is called an admis-sible variation of z∗ if z∗+ εη ∈ Θ∩D(J) on some nontrivialinterval containing ε = 0.

It is clear that some “directions” η may lead to admissible vari-ations and others may not. Also, sometimes it is sufficient to as-sume that ϕ(·) : [0,+δ)→ Θ∩D(J) and consider only one sidedderivatives at ε = 0. Thus, we are led to the following concepts.

Definition 7.2 Suppose A is a subset of the vector space Z withz ∈ A and η ∈ Z. We say that z ∈ A is an internal point ofA in the direction of η ∈ Z, if there is a δ = δ(η) > 0 suchthat (z + εη) ∈ A for all ε in the open interval −δ(η) < ε < δ(η).The point z ∈ A is called a radial point of A in the directionof η, if there is an δ = δ (η) > 0 such that (z + εη) ∈ A for allε in the interval 0 ≤ ε < δ (η). The point z ∈ A is called a corepoint of A if z is an internal point of A for all directions η ∈ Z.Observe that the concept of internal, core and radial points arevalid in a general vector space and require no notion of distance.In particular, there is no requirement that Z be a topological vectorspace.

Page 290: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 271

7.7.3 Abstract Variations

Let J : D(J) ⊆ Z −→ R1. If z is an internal point of D(J) inthe direction of η then the function

ϕ (ε) = z + εη

maps the interval (−δ (η) , δ (η)) into D(J). On the other hand,if z ∈D(J) is a radial point of D(J) in the direction of η then

ϕ (ε) = z + εη

maps [0, δ(η)) into D(J). If z ∈ D(J), η ∈ Z and z is either aninternal point or radial point of D(J) in the direction of η , thenwe define the first variation of J(·) at z in the direction η by

δJ (z;η) ,d

dεJ (z + εη)

∣∣∣∣∈=0

, (7.91)

provided the derivative exists. Here the derivative is two-sided ifz is an internal point and one-sided if z is a radial point of D(J).

If the two-sided or one-sided second derivatives

d2

dε2J (z + εη)

∣∣∣∣∈=0

exist, then we say that J(·) has a second variation at z in thedirection η and denote this by

δ2J (z;η) ,d2

dε2J (z + εη)

∣∣∣∣∈=0

. (7.92)

If z is a core point of D(J) and the first variation of J(·) atz in the direction η exists for all η ∈ Z, then we say that J(·)has a Gateaux variation at z. In other words, if δJ (z;η) existsfor all η ∈ Z, then δJ(z;η) is called the Gateaux variation (orweak differential) of J(·) at z in the direction η. Likewise, ifδ2J(z;η) exists for all η ∈ Z, then δ2J(z;η) is called the secondGateaux variation (or weak second differential) of J(·) atz in the direction η (see [143] for details).

Observe that J(·) has Gateaux variation at z ∈ D(J) if andonly if

Page 291: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

272 Chapter 7. Extensions and Generalizations

1. z is a core point of D(J) and

2. δJ(z;η) exists for all η ∈ Z.

Moreover, if J(·) has Gateaux variation at z ∈ D(J), then thederivative (7.91) is two-sided since η ∈ Z implies that −η ∈ Z.

It is now rather easy to establish a very general necessary con-dition for the abstract problem. In particular, assume that z∗ ∈Θ∩D(J) provides a (global) minimizer for J : D(J) ⊆ Z −→ R1

on Θ ∩D(J). In order for the necessary condition to be useful,the set of all η for which z∗ ∈ Θ is an internal point or radicalpoint of A , Θ∩D(J) in the direction η needs to be sufficiently“large”. Thus, we define two spaces of admissible variations givenby

V IA = V I

A(z∗)

, η ∈ Z : z∗ is an internal point of A in the direction of η

and

V RA = V R

A (z∗)

, η ∈ Z : z∗ is an radial point of A in the direction of η.

The following theorem provides the starting point for much of thetheory of general necessary conditions in optimization.

Theorem 7.9 (Fundamental Abstract Necessary Condition)Let J : D(J) ⊆ Z −→ R1 be a real valued function defined on avector space Z and assume that Θ ⊆ Z is a given subset of Z. LetA , Θ ∩D(J) and assume that z∗ ∈ A satisfies J (z∗) ≤ J (z)for all z ∈ A.

(A) If η ∈ V IA and δJ(z∗; η) exists, then

δJ(z∗; η) = 0. (7.93)

Moreover, if δ2J(z∗; η) exists, then

δ2J(z∗; η) ≥ 0. (7.94)

(B) If η ∈ V RA and δJ(z∗; η) exists, then

δJ(z∗; η) ≥ 0. (7.95)

Page 292: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 273

7.7.4 Application to the SPCV

To apply the abstract necessary condition given by Theorem 7.9to the Simplest Problem in the Calculus of Variations, first notethat Z = PWS(t0, t1) is a vector space. Also, J : Z → R1 isdefined on the domain D(J) = Z by

J(x(·)) =

t1∫t0

f(s, x(s), x(s))ds.

Since the constraint set is

Θ = x(·) ∈ PWS(t0, t1) : x (t0) = x0, x (t1) = x1 ,

it follows that x(·) ∈ A , Θ ∩ D(J) is an internal point of A ,Θ∩D(J) = Θ in the direction of η(·) if η(·) ∈ V0, where V0 is theset of “admissible variations” given by

V0 = η(·) ∈ PWS(t0, t1) : η(t0) = 0,η (t1) = 0 .

Thus, if x(·) ∈ A , Θ ∩D(J), then

V0 ⊂ V IA(x(·)) , η(·) ∈ Z : x(·) is an internal point of A

in the direction of η(·)

and the first variation δJ(x(·); η(·)) exists. However, in generalit is sometimes neither obvious nor easy to pick a “good” set ofadmissible variations.

Note that for the Simplest Problem in the Calculus of Varia-tions, the constraint set

Θ = x(·) ∈ PWS(t0, t1) : x (t0) = x0, x (t1) = x1 ,

has no core points! For example, given any x(·) ∈ Θ, let η(·) bethe constant function η(t) ≡ 1. Since

x(t0) + εη(t0) = x0 + ε 6= x0

for any ε 6= 0, it follows that [x(·)+εη(·)] /∈ Θ, and hence x(·) cannot be a core point of A , Θ ∩ D(J). Therefore, requiring thatJ(·) have Gateaux variation at a minimizer x∗(·) ∈ Θ is too strongfor even the Simplest Problem of the Calculus of Variations.

Page 293: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

274 Chapter 7. Extensions and Generalizations

7.7.5 Variational Approach to Linear QuadraticOptimal Control

Although the second part of this book is devoted to optimal controlproblems, some of these problems may be formulated as abstractoptimization problems and the variational approach as set outin Theorem 7.9 may be applied. To illustrate this we considera simple linear quadratic control problem governed by the linearsystem

(LS) x(t) = Ax(t),+Bu(t), 0 < t ≤ 1, (7.96)

where A is an n× n constant matrix and B is an n×m constantmatrix. The initial conditions are given by

x(0) = x0 ∈ Rm. (7.97)

The function u(·) ∈ PWC(0, 1;Rm) is called the control and thesolution x(·) = x(·;u(·)) to the initial value problem (7.96) - (7.97)is called the state. Given a control u(·) ∈ PWC(0, 1;Rm), the(quadratic) cost functional is defined by

J(x(s),u(·)) =1

2

1∫0

‖x(s)‖2 + ‖u(s)‖2 ds, (7.98)

where x(·) = x(·;u(·)) ∈ PWS(0, 1;Rn) is the piecewise smoothsolution to the initial value problem (7.96) - (7.97). In particular,

x(t;u(·)) = eAtx0 +

t∫0

eA(t−τ)Bu(τ)dτ. (7.99)

Therefore,

J(u(·)) =1

2

1∫0

‖x(s)‖2 + ‖u(s)‖2

=1

2

1∫0

∥∥∥∥∥∥eAsx0 +

s∫0

eA(s−τ)Bu(τ)dτ

∥∥∥∥∥∥2

+ ‖u(s)‖2

ds

Page 294: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 275

depends only on u(·) ∈ PWC(0, 1;Rm).If one defines Z = PWC(0, 1;Rm) and J : Z → R by

J(u(·)) = J(u(·)) and the set of all admissible controllers by

Θ = u(·) ∈ PWC(0, 1;Rm) , (7.100)

then the optimal control problem is equivalent to the general prob-lem of minimizing J(·) on the set of all admissible controllers Θ.One can apply the abstract necessary condition given by The-orem 7.9 to this problem and obtain the standard necessaryconditions. Observe that the domain of J(·), is the entire spaceD(J) = PWC(0, 1;Rm).

Remark 7.6 This variational approach is valid because there areno constraints on the control. In particular, Θ = PWC(0, 1;Rm)is a vector space and any vector u∗(·) ∈ PWC(0, 1;Rm) is a corepoint of A = D(J)∩ PWC(0, 1;Rm) = PWC(0, 1;Rm). We notethat the classical variational approach for optimal control was thefirst method used in optimal control of both ODE systems and sys-tems governed by partial differential equations. Also, as long asthe control is not constrained, as is the case above, this variationalmethod works for nonlinear systems as well (see Lee and Markus[119], page 18 to 22).

7.7.6 An Abstract Sufficient Condition

The derivation of sufficient conditions is not as easy, nor is it a welldeveloped subject. However, for certain convex problems (whichalmost never occur in the calculus of variations) there is a simpleresult. Again we assume that J : D(J) ⊆ Z −→ R1 is a realvalued function defined on a vector space Z and that Θ ⊆ Z is agiven subset of Z.

Definition 7.3 The set Θ is called convex if

z1 and z2 ∈ Θ

implies that for all λ ∈ [0, 1]

[λz1 + (1− λ)z2] ∈ Θ.

Page 295: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

276 Chapter 7. Extensions and Generalizations

In particular, the line segment between z1 and z2 lies inside Θ.Figure 7.8 illustrates a convex set. On the other hand, the setshown in Figure 7.9 is not convex.

1z

(1 )1 2(1 )z z

0

2z0

Figure 7.8: A Convex Set

Definition 7.4 If Θ is convex, then a function J : D(J) ⊆Z −→ R1 is said to be a convex function on Θ if Θ ∩D(J) isconvex and for all z1, z2 ∈ Θ ∩D(J) and 0 ≤ λ ≤ 1,

J(λz1 + (1− λ) z2) ≤ λJ(z1) + (1− λ)J(z2).

The function J(·) is said to be a strictly convex function onΘ if for all z1, z2 ∈ Θ ∩D(J) and 0 < λ < 1,

J(λz1 + (1− λ) z2) < λJ(z1) + (1− λ)J(z2).

The following theorem applies to general abstract convex prob-lems. Also, the proof is a rather simple extension of the ideas foundin Section 2.2.2.

Theorem 7.10 (Abstract Sufficient Condition for ConvexProblems) Let

Page 296: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 277

1z

(1 )1 2(1 )z z

0

2z0

Figure 7.9: A Non-convex Set

Θ be convex and J(·) be a convex function on Θ. If z0 ∈ Θ∩D(J),δJ (z0;η) exists and δJ (z0;η) ≥ 0 for all η such that (z0 + η) ∈Θ ∩D(J), then J(·) has a minimum on Θ at z0. Moreover, ifJ(·) is strictly convex, z0 is unique.

Proof : If z ∈ Θ and 0 < λ ≤ 1, then λz+ (1− λ) z0 ∈ Θ and

J([λz + (1− λ) z0]) ≤ λJ(z) + (1− λ)J(z0),

or equivalently,

J([λz + (1− λ) z0])− J(z0) ≤ λ[J(z)− J(z0)].

Hence,

J([λz + (1− λ) z0])− J(z0)

λ≤ [J(z)− J(z0)],

and it follows that

δJ(z0; [z − z0]) = limλ→0

[J([λz + (1− λ) z0])− J(z0)

λ≤ [J(z)− J(z0)].

Page 297: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

278 Chapter 7. Extensions and Generalizations

However, z0 + [z − z0] = z ∈ Θ, so that the assumption that

0 ≤ δJ(z0; [z − z∗])

implies that

0 ≤ δJ(z0; [z − z0]) ≤ [J(z)− J(z0)],

Consequently, it follows that

J(z0) ≤ J(z) (7.101)

and since (7.101) holds for any z ∈ Θ, z0 minimizes J(·) on Θ.The second part of the theorem is trivial.

Remark 7.7 Although control of PDE systems lies outside of thescope of this book, we note that the abstract variational approachrepresented by Theorem 7.9 above was the basic method used inthis field. As noted in the 1971 review paper by A. C. Robinson[156], most of the early work on necessary conditions for PDEcontrol systems was in fact done by variational methods (see page374 and the cited references). The idea of using abstract varia-tional calculus to derive the necessary conditions for PDE systemswas the main theme of the 1960’s and 1970’s literature. J. L. Li-ons certainly used this approach in his fundamental book [124].This abstract approach was popular in the early Russian school asillustrated in [144] and [151].

7.8 Problem Set for Chapter 7

Apply the Euler Necessary Condition to the following free end-point problems.

Problem 7.1 Minimize the functional

J(x(·)) =

1∫0

√c2(1 + [x (s)]2)− [v(s)]2 − v(s)x (s)

c2 − [v(s)]2ds

among all piecewise smooth functions satisfying

x (0) = 0, and x (1) is free.

Page 298: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 279

Problem 7.2 Minimize the functional

J(x(·)) =

1∫0

[x (s)]2 + [x (s)]2 + 2esx (s)ds,

subject to the endpoint conditions x (0) = 0 and x (1) is free.

Problem 7.3 Minimize the functional

J(x(·)) =

π∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (π) is free.

Problem 7.4 Minimize the functional

J(x(·)) =

3π/2∫0

[x (s)]2 − [x (s)]2ds,

subject to the endpoint conditions x (0) = 0 and x (3π/2) is free.

Problem 7.5 Minimize the functional

J(x(·)) =

b∫0

x(s)√

1 + [x(s)]2ds,

subject to the endpoint conditions x (0) = 1 and x (b) is free.

Problem 7.6 Minimize the functional

J(x(·)) =

2∫1

[x(s)]2 − 2sx(s)ds,

subject to the endpoint conditions x (1) = 0 and x (2) is free.

Page 299: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

280 Chapter 7. Extensions and Generalizations

Problem 7.7 Minimize the functional

J(x(·)) =

π∫0

[x(s)]2(1− [x(s)]2)ds,

subject to the endpoint conditions x (0) = 0 and x (π) is free.

Problem 7.8 Minimize the functional

J(x(·)) =

3∫1

[3s− x(s)]x(s)ds,

subject to the endpoint conditions x (1) = 1 and x (3) is free.

Problem 7.9 Minimize the functional

J(x(·)) =

2∫1

x(s)[1 + s2x(s)]ds,

subject to the endpoint conditions x (1) is free and x (2) = 5.

Problem 7.10 Minimize the functional

J(x(·)) =

1∫0

1

2[x(s)]2 + x(s)x(s) + x(s) + x(s)ds,

subject to the endpoint conditions x (0) is free and x (1) is free.

Problem 7.11 Consider the point to curve problem of minimizingthe functional

J(x(·)) =

t1∫0

√1 + [x(s)]2

x(s)

ds,

subject to the endpoint condition x (0) = 0 and x (t1) = ϕ(t1),where .ϕ(t) = t−5. Use the necessary condition to find all possibleminimizers.

Page 300: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 7. Extensions and Generalizations 281

Problem 7.12 Minimize the functional

J(x(·)) =

1∫0

[x(s)]2ds,

subject to the endpoint conditions x (0) = 1, x(0) = 0, x(1) = 2and x(1) = 0.

Problem 7.13 Minimize the functional

J(x(·)) =

1∫0

1 + [x(s)]2ds,

subject to the endpoint conditions x (0) = 0, x(0) = 1, x(1) = 1and x(1) = 1.

Problem 7.14 Minimize the functional J(x(·)) =1∫−1

[x(s)]2ds,

subject to the endpoint conditions x (−1) = 0, x(0) = 1, andx(1) = 0. Hint: Note that this requires an extension of the nec-essary condition for higher order problems.

Problem 7.15 Minimize the functional

J(x1(·), x2(·)) =

π/2∫0

[x1(s)]2 + 2x1(s)x2(s) + [x2(s)]2

ds,

subject to the endpoint conditions

x1 (0) = 0, x1 (π/2) = 1, x2 (0) = 0, x2 (π/2) = 1.

Advanced Problems

Problem 7.16 Prove the Euler Necessary Condition for the FreeEndpoint Problem 7.1 assuming that x∗(·) provides only a weaklocal minimum.

Page 301: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

282 Chapter 7. Extensions and Generalizations

Problem 7.17 Consider the Euler Necessary Condition for thepoint to curve problem as stated in Theorem 7.4. Modify the proofto allow for piecewise smooth functions x(·) and ϕ(·).

Problem 7.18 Prove the High Order Form of the FLCV as statedin Lemma 7.4. Hint: See Problem Set III.2 in Reid’s book [153].

Problem 7.19 Prove Theorem 7.7 assuming that x∗(·) is piece-wise smooth and provides only a weak local minimum.

Problem 7.20 Assume one has a partition 0 = t0 < t1 < · · ·< tp−1 < tp = 1 of [0, 1], and fixed points x0, x1, x2, xp−1. Let J(·)be the functional

J(x(·)) =

1∫0

[x(s)]2ds

and consider the problem of finding x∗(·) to minimize J(·) on theset

ΘI =x(·) ∈ PWS2(0, 1) : x(ti) = xi, for all i = 0, 1, . . . , p

.

Derive the Euler Necessary Condition for this problem. Hint: Lookinto the theory of splines and check the references [3], [159] and[160].

Page 302: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8

Applications

In this chapter we apply the theory from the previous chapters tosome specific problems. We begin with the brachistochrone prob-lem.

8.1 Solution of the Brachistochrone

Problem

Recall that the brachistochrone problem leads to the minimizationof

J (x(·)) =

∫ a

0

√1 + [x (s)]2

2gx (s)ds,

subject tox (0) = 0, x (a) = b.

Here we assume that the minimizer has no corners so that x∗ (·)is C2. In order to solve this problem we note that the integranddoes not depend explicitly on t. In particular, the functional hasthe form

J (x(·)) =

∫ a

0

f (x (s) , x (s)) ds,

where

f (x, u) =

√1 + u2

2gx=

1√2gx

√1 + u2,

283

Page 303: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

284 Chapter 8. Applications

andfu (x, u) = f(x, u)

u

1 + u2.

The Euler Differential Equation becomes

d

dtfu(x

∗(t), x∗(t)) = fx(x∗(t), x∗(t))

or in second order form (Hilbert’s Equation)

fux(x∗(t), x∗(t))x∗(t) + fuu(x

∗(t), x∗(t))x∗(t) = fx(x∗(t), x∗(t)).

(8.1)Again, recall that we shall use f ∗(t), f ∗u(t) and f ∗x(t), etc. to denotethe functions evaluated at (x∗(t), x∗(t)). Multiply both sides of(8.1) by x∗(t) to obtain

f ∗ux(t)[x∗(t)]2 + f ∗uu(t)x

∗(t)x∗(t)− f ∗x(t)x∗(t) = 0. (8.2)

However,

d

dt[f ∗u(t)x∗(t)− f ∗(t)] = f ∗u(t)x∗(t)

+ [f ∗ux(t)x∗(t) + f ∗uu(t)x

∗(t)]x∗(t)

− [f ∗x(t)x∗(t) + f ∗u(t)x∗(t)]

= f ∗u(t)x∗(t) + f ∗ux(t)[x∗(t)]2

+ f ∗uu(t)x∗(t)x∗(t)

− f ∗x(t)x∗(t)− f ∗u(t)x∗(t)

= f ∗ux(t)[x∗(t)]2

+ f ∗uu(t)x∗(t)x∗(t)

− f ∗x(t)x∗(t)

= 0.

Hence, we have that

d

dt[fu(x

∗(t), x∗(t))x∗(t)− f(x∗(t), x∗(t))] = 0,

or equivalently,

[fu(x∗(t), x∗(t))x∗(t)− f(x∗(t), x∗(t))] = c. (8.3)

Page 304: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 285

Thus, equation (8.3) is equivalent to the Euler Integral Equa-tion for the brachistochrone problem and hence the Euler IntegralEquation has the equivalent form

[x∗ (t)]2√2gx∗ (t) (1 + [x∗ (t)]2)

√1 + [x∗ (t)]2

2gx∗ (t)= c.

Simplifying, we have that

[x∗ (t)]2√x∗ (t) (1 + [x∗ (t)]2)

− 1 + [x∗ (t)]2√x∗ (t) (1 + [x∗ (t)]2)

=√

2gc2,

or equivalently,

−1√x∗ (t) (1 + [x∗ (t)]2)

=√

2gc2.

Solving for [x∗ (t)]2 we obtain the differential equation

[x∗ (t)]2 =a+ x∗ (t)

x∗ (t), a =

1

2gc2.

Therefore, since the slope of x∗ (t) is non-positive we obtain thedifferential equation

x∗ (t) = −

√a+ x∗ (t)

x∗ (t).

Solving this differential equation is not an easy task. In fact weshall make the (not so obvious) substitution

x∗ (t) = q [sin(θ (t) /2)]2

and observe that by the chain rule

x∗ (t) = −2q sin(θ (t) /2) cos(θ (t) /2)θ (t)

2.

Thus,√q + x∗ (t)

x∗ (t)=

√q [1− [sin(θ (t) /2)]2]

q[sin(θ (t) /2)]2=

√[cos(θ (t) /2)]2

[sin(θ (t) /2)]2

=[cos(θ (t) /2)]

[sin(θ (t) /2)]

Page 305: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

286 Chapter 8. Applications

and the differential equation for θ (t) becomes

2q sin(θ (t) /2) cos(θ (t) /2)θ (t)

2=

[cos(θ (t) /2)]

[sin(θ (t) /2)].

Multiplying by sin(θ (t) /2) and dividing out cos(θ (t) /2) yields

q[sin(θ (t) /2)]2θ (t) = 1.

However, the identity

2[sin(θ (t) /2)]2 = 1− cos θ (t)

implies thatq

2(1− cos(θ (t)) θ (t) = 1,

or equivalently,

q

2θ (t)− q

2cos θ (t) θ (t) = 1.

We write this asq

2

dt− q

2cos θ

dt= 1

so thatq

2dθ − q

2cos θdθ = dt

and integrating, we obtain

q

2θ − q

2sin θ = t+ t0

and since t0 = 0 we have

q

2θ − q

2sin θ = t.

Thus, we have a parametric description of the solution given by

t =q

2θ − sin θ

andx = q [sin(θ/2)]2 =

q

2(1− cos θ) .

Page 306: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 287

The curvet =

q

2(θ − sin θ) ,

x =q

2(1− cos θ) ,

is a cycloid and the graph of −x∗ (·) lies on this curve (recall thatx∗ (·) is the distance from the curve to y = 0 axis) and hence theoptimal curve is part of a cycloid.

8.2 Classical Mechanics and Hamil-

ton’s Principle

What we call classical mechanics today was initiated in the late1600’s by Newton and others. Newton changed our understandingof the universe by enumerating his three “Laws of Motion”:

[First Law] Every object in a state of uniform motiontends to remain in that state of motion unless an ex-ternal force is applied to it.

[Second Law] The rate of change of momentum ( p =mass·velocity) is proportional to the impressed forceand is in the direction in which the force acts:

~F (t) =d~p(t)

dt=d[m(t)~v(t)]

dt.

[Third Law] For every action there is an equal andopposite reaction.

Although the second law is often stated as

~F (t) = m~a(t), (8.4)

which is valid under the assumption that the mass is constant,it is important to note the correct version. Also, around 300 BCAristotle had stated a “Second Law of Motion” which, for constantmass, becomes

~F (t) = m~v(t). (8.5)

Page 307: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

288 Chapter 8. Applications

Aristotle’s Second Law seems to be more in accord with commonsense and even today we hear reports about the “force of impact”when a 3000 pound car traveling at 50 miles per hour hits a brickwall. The point is that “Laws of Motion” are not “laws”, butare mathematical models of observed behavior and often evolveand change over time as our understanding improves. In short,these are assumptions that lead to models that must be validatedthrough experiments.

In 1744 Euler showed Newton’s Second Law (8.4) could be ob-tained by another assumption called the Principle of Least Actionand in 1788 Lagrange showed that, in the case of conservativeforces, a major part of Newtonian mechanics could be derivedfrom this principle. As we shall see below, the Principle of LeastAction is not true for a general time interval (this was pointedout by Jacobi in 1842), but in 1834 and 1835 William Hamiltonprovided an alternative principle that extended the application tomore general forces and set the stage for what we now call Hamil-tonian mechanics. Suppose a particle of fixed constant mass m is

located at the position x(t) =[x1(t) x2(t) x3(t)

]Tat time t

and moves under a force F (t) =[f1(t) f2(t) f3(t)

]T. By New-

ton’s Second Law, the force F (t) will cause the particle to movealong a path in R3 such that

mx(t) = F (t),

i.e.mxi(t) = fi(t), i = 1, 2, 3. (8.6)

The particle is said to be in a conservative field if there is a functionU = U(t, x1, x2, x3) such that

F (t) = −∇xU(t, x1(t), x2(t), x3(t)), (8.7)

or equivalently,

fi(t) = − ∂

∂xiU(t, x1(t), x2(t), x3(t)), i = 1, 2, 3.

The function U : R× R3 → R is called the potential and

P (t) = U(t, x1(t), x2(t), x3(t)) = U(t,x(t)) (8.8)

Page 308: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 289

is called the potential energy of the particle. The kinetic energy ofthe particle is defined to be

K(t) ,1

2m

3∑i=1

[xi(t)]2 =

1

2m ‖x(t)‖2 . (8.9)

The total energy of the particle is given by

H(t) = K(t) + P (t). (8.10)

The Lagrangian L(t,x,u) : R× R3 × R3 → R is defined by

L(t,x,u) =1

2m ‖u‖2 − U(t,x) (8.11)

so thatL(t) = L(t,x(t), x(t)) , K(t)− P (t) (8.12)

is the difference between the kinetic and potential energy of theparticle. Finally, given two times t0 < t1, the action of the particleis defined to be the integral

A(x(·)) ,∫ t1

t0

[K(s)− P (s)] ds =

∫ t1

t0

[L(s,x(s), x(s))] ds

(8.13)and given the special form of the force and kinetic energy, it followsthat

A(x(·)) =

∫ t1

t0

[1

2m

3∑i=1

[xi(s)]2 − U(s,x(s))

]ds.

The Principle of Least Action states that the particle will movebetween x(t0) and x(t1) along a path x∗(·) that minimizes theaction integral. In particular,

A(x∗(·)) ≤ A(x(·))

for all curves joining x∗(t0) and x∗(t1). Note that this is a 3dimensional version of the Simplest Problem of the Calculus of

Page 309: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

290 Chapter 8. Applications

Variations and hence if x∗(·) minimizes the action integral (8.13)then x∗(·) must satisfy

δA(x∗(·);η(·)) = 0 (8.14)

for all η(·) ∈ PWS(t0, t1;R3) satisfying η(t0) = η(t1) = 0. Con-sequently, by Euler’s Necessary Condition x∗(·) satisfies Euler’sDifferential Equation

d

dt∇uL(t,x∗(t), x∗(t)) = ∇xL(t,x∗(t), x∗(t)) (8.15)

for t0 < t < t1. Using the definition of

L(t,x,u) =1

2m ‖u‖2 − U(s,x)

=1

2m

3∑i=1

[ui]2 − U(s,x),

it follows that

∇uL(t,x,u) =

mu1

mu2

mu3

and

∇xL(t,x,u) = −∇xU(t,x).

Hence, along the motion of the particle we have

∇uL(t,x∗(t), x∗(t)) =

mx1(t)mx2(t)mx3(t)

= m

x1(t)x2(t)x3(t)

and

∇xL(t,x∗(t), x∗(t)) = −∇xU(t,x∗(t)) =

f1(t)f2(t)f3(t)

which becomes

md

dt

x1(t)x2(t)x3(t)

=

f1(t)f2(t)f3(t)

,

Page 310: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 291

or equivalently,mx(t) = F (t)

which is Newton’s Second Law.

Example 8.1 Consider the one dimensional version where a par-ticle of mass m = 1 moves under the force defined by the potentialU(x) = κ

2x2 so that f(t) = −κx(t). The kinetic energy is given

by K(t) = 12(x(t))2 and the Lagrangian is defined by

L(x, u) =1

2[u2 − κx2].

Here the action integral becomes

A(x(·)) =1

2

∫ t1

t0

[(x(s))2 − κ(x(s))2]ds (8.16)

and Euler’s equation is

d

dt[x(t)] = −κx(t).

Therefore, the particle will move along a path x∗(·) defined by thedifferential equation

x∗(t) + κx∗(t) = 0.

This is the same equation that comes directly from Newton’s Sec-ond Law. However, consider the case where κ = 1 and the particlestarts at x0 = 0 at time t0 = 0 and stops at x1 = 0 at timet1 = 2π. We know from the Jacobi Necessary Condition - (IV)that the action integral (8.16) does not have a minimum on thisinterval. Thus, the Principle of Least Action does not always hold.

Hamilton stated a new principle which is known as Hamilton’sPrinciple of Stationary Action. In particular, Hamilton’s principlestates that the particle will move between x(t0) and x(t1) along apath x∗(·) that makes the action integral stationary. In particu-lar, A(x∗(·)) has first variation zero for all η(·) ∈ PWS(t0, t1;R3)

Page 311: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

292 Chapter 8. Applications

satisfying η(t0) = η(t1) = 0. Thus, Hamilton’s Principle of Sta-tionary Action is equivalent to the condition that

δA(x∗(·);η(·)) = 0 (8.17)

all η(·) ∈ PWS(t0, t1;R3) satisfying η(t0) = η(t1) = 0. Observethat Newton’s Second Law follows from (8.17) since the Funda-mental Lemma of the Calculus of Variations implies that the EulerDifferential Equation holds. Clearly, the Euler Differential Equa-tion in this case reduces to

mx(t) = F (t),

which is Newton’s Second Law.

8.2.1 Conservation of Energy

Here we again assume that a particle of mass m is moving in aconservative field with potential energy given by

P (t) = U(t, x1(t), x2(t), x3(t)) = U(t,x(t)) (8.18)

and kinetic energy given by

K(t) ,1

2m

3∑i=1

[xi(t)]2. (8.19)

Also, with L(t,x,u) = 12m ‖u‖2 − U(t,x) = 1

2m∑3

i=1[ui]2 −

U(t,x), let pi(t) be defined by

pi(t) , mxi(t) =∂

∂uiL(t,x(t), x(t)) (8.20)

and observe that the total energy H(t) = K(t) + P (t) can bewritten as

H(t) =1

2m

3∑i=1

[xi(t)]2 + U(t,x(t))

= −1

2m

3∑i=1

[xi(t)]2 +m

3∑i=1

[xi(t)]2 + U(t,x(t))

Page 312: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 293

= −1

2m

3∑i=1

[xi(t)]2 + U(t,x(t)) +

3∑i=1

xi(t)[mxi(t)]

= −[K(t)− U(t,x(t))] +3∑i=1

xi(t)[∂

∂uiL(t,x(t), x(t))]

= −L(t,x(t), x(t)) +3∑i=1

xi(t)pi(t).

If we define the Hamiltonian function by

H(t,x,u,p) , −L(t,x,u) +3∑i=1

uipi, (8.21)

then

H(t,x(t), x(t),p(t)) = −L(t,x(t), x(t)) +3∑i=1

xi(t)pi(t) (8.22)

is the total energy of the system.Now consider the case where the potential is independent of

time so that

P (t) = U(x1(t), x2(t), x3(t)) = U(x(t)) (8.23)

and since L(x,u) = 12m ‖u‖2−U(x) = 1

2m∑3

i=1[ui]2−U(x), the

Euler Differential Equation

d

dt∇uL(x∗(t), x∗(t)) = ∇xL(x∗(t), x∗(t))

has the form

d

dt[mxi(t)] = Lxi(x

∗(t), x∗(t)), i = 1, 2, 3.

Note that for i = 1, 2, 3, we have

∂xiH(x,u,p) = −Lxi(x,u)

Page 313: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

294 Chapter 8. Applications

and∂

∂piH(x,u,p) = ui.

Therefore,

xi(t) =∂

∂piH(x(t), x(t),p(t)) (8.24)

and

d

dt[mxi(t)] = Lxi(x

∗(t), x∗(t)) = − ∂

∂xiH(x(t), x(t),p(t)),

so that

pi(t) = − ∂

∂xiH(x(t), x(t),p(t)). (8.25)

Combining (8.24) and (8.25) it follows that Euler’s Equation canbe written as the system

xi(t) =∂

∂piH(x(t), x(t),p(t)) (8.26)

pi(t) = − ∂

∂xiH(x(t), x(t),p(t)). (8.27)

Now we differentiate H(x(t), x(t),p(t)) with respect to time.In particular, since

H(x(t), x(t),p(t)) = −L(x(t), x(t)) +3∑i=1

xi(t)pi(t),

it follows that

d

dtH(x(t), x(t),p(t)) =

3∑i=1

(∂

∂xiH(x(t), x(t),p(t))xi(t)

)

+3∑i=1

(∂

∂uiH(x(t), x(t),p(t))xi(t)

)

+3∑i=1

(∂

∂piH(x(t), x(t),p(t))pi(t)

).

Page 314: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 295

However,

H(x,u,p) , −L(x,u) +3∑i=1

uipi

so that

∂uiH(x,u,p) = −Lui(x,u) + pi = −mui + pi

and hence

∂uiH(x(t), x(t),p(t)) = −mxi(t) + pi(t) ≡ 0.

We now have proven the following result concerning the conserva-tion of energy.

Theorem 8.1 (Conservation of Energy) Assume the poten-tial is independent of time so that the potential energy is givenby P (t) = U(x1(t), x2(t), x3(t)) = U(x(t)). If x(t) satisfies thePrinciple of Stationary Action (i.e. δA(x∗(·);η(·)) = 0 for allη(·) ∈ PWS(t0, t1;R3) satisfying η(t0) = η(t1) = 0), then theHamiltonian is constant along the trajectory x(t). In particular,there is a constant H0 such that the total energy H(t) satisfies

H(t) = H(x(t), x(t),p(t)) ≡ H0

8.3 A Finite Element Method for the

Heat Equation

In this section we discuss how the Fundamental Lemma of theCalculus of Variations can be applied to the theory and numeri-cal solutions of systems governed by partial differential equations(PDEs). The FLCV plays two important roles here. First, as notedin Section 3.4, the FLCV provides the motivation for defining“weak derivatives” which will be used to define “weak solutions”of the PDE. Moreover, this weak formulation of the PDE is thefirst step in developing numerical methods based on finite element

Page 315: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

296 Chapter 8. Applications

schemes. We illustrate this approach for a simple PDE describingthe heat flow in a uniform rod given by

∂tθ (t, x) = k

∂2

∂x2θ (t, x) + g(x)u(t), t > 0, 0 < x < 1, (8.28)

where u(t) is a heat source (control) and g(·) : (0, 1) → R1 is agiven (piecewise continuous) function. We also specify boundaryconditions

θ (t, 0) = θ (t, 1) = 0, (8.29)

and initial data

θ (0, x) = ϕ (x) , 0 < x < 1. (8.30)

A strong (or classical) solution is a function θ (·, ·) of t and x suchthat ∂

∂tθ (t, x), ∂

∂xθ (t, x) and ∂2

∂x2θ (t, x) are continuous and θ (·, ·)

satisfies (8.28) at every value 0 < x < 1 and all t > 0 and theboundary condition (8.29) at x = 0, x = 1 and all t > 0. We areinterested in developing a numerical algorithm for approximatingsolutions. The finite element method can be used to approxi-mate the PDE problem (8.28) - (8.30) in much as the same way itwas employed in Section 2.4.5 above to approximate the two-pointboundary value problem. Indeed, the two problems are linked andthe finite element method in Section 2.4.5 can be extended to theheat equation above. Recall that the basic steps begin with mul-tiplying both sides of (8.28) by an arbitrary function η(·) so thatif θ (t, x) is a solution to (8.28), then for all t > 0 we have

∂tθ (t, x) η(x) = k

∂2

∂x2θ (t, x) η(x) + b(x)u(t)η(x).

If η(·) ∈ PWC(0, 1), then η(·) is integrable so one can integrateboth sides to obtain∫ 1

0

[∂

∂tθ (t, x)

]η(x)dx

=

∫ 1

0

[k∂2

∂x2θ (t, x)

]η(x)dx+

∫ 1

0

[g(x)u(t)] η(x)dx

=

∫ 1

0

[k∂2

∂x2θ (t, x)

]η(x)dx+

[∫ 1

0

g(x)η(x)dx

]u(t).

Page 316: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 297

If in addition, η(·) ∈ PWS(0, 1), then ddxη(·) is integrable

and we can use integration by parts on the first term∫ 1

0

[k ∂2

∂x2θ (t, x)

]η(x)dx which yields

∫ 1

0

[k∂2

∂x2θ (t, x)

]η(x)dx =

[k∂

∂xθ (t, x) η(x)

]x=1

x=0

−∫ 1

0

[k∂

∂xθ (t, x)

] [d

dxη(x)

]dx.

Finally, if η(·) ∈ PWS0(0, 1) so that η(0) = η(1) = 0, then[k∂

∂xθ (t, x) η(x)

]x=1

x=0

= 0

and it follows that∫ 1

0

[∂

∂tθ (t, x)

]η(x)dx = −

∫ 1

0

[k∂

∂xθ (t, x)

] [d

dxη(x)

]dx

(8.31)

+

[∫ 1

0

g(x)η(x)dx

]u(t),

for all η(·) ∈ PWS0(0, 1). The equation (8.31) is called the weak(or variational) form of the heat equation defined by the heatequation (8.28) with boundary condition (8.29).

Definition 8.1 (Weak Solution of the Heat Equation) Wesay that the function θ(t, ·) is a weak solution of the heat equation(8.28)-(8.29), if for each t > 0, θ(t, x) and ∂

∂tθ(t, x) are continuous

functions of t and x and(1) θ(t, ·) ∈ PWS0(0, 1), ∂

∂tθ(t, ·) ∈ PWS0(0, 1),

(2) θ(t, ·) satisfies (8.31) for all η(·) ∈ PWS0(0, 1).

As in Section 2.4.5 above, the finite element element methodactually produces an approximation of the weak form of the heatequation. Observe that we have shown that a strong solution to theheat equation (8.28) - (8.29) is always a weak solution. Followingthe approach in Section 2.4.5, we focus on the simplest piecewise

Page 317: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

298 Chapter 8. Applications

linear approximations and hence we divide the interval [0, 1] intoN+1 subintervals (called elements) of length ∆x = 1/(N+1) withnodes 0 = x0 < x1 < x2 < . . . < xN−1 < xN < xN+1 = 1, wherefor i = 0, 1, 2, . . . , N,N + 1, xi = i∆x. The approximation θN(t, x)will be continuous in x on all of [0, 1] and linear between the nodes.Since continuous piecewise linear approximating functions θN(t, x)are not typically differentiable in x at the nodes, it is not possi-ble to insert this approximation directly into the equation (8.28).In particular, the piecewise smooth function θN(t, x) has only apiecewise continuous derivative ∂

∂xθN(t, x) and hence ∂2

∂x2θN(t, x)

does not exist. In order to deal with this lack of smoothness, weuse the weak form (8.31).

Define the spatial hat functions hi(·) on [0, 1] as in 2.67 by

h0(x) =

(x1 − x)/∆x, 0 ≤ x ≤ x1

0, x1 ≤ x ≤ 1,

hN+1(x) =

(x− xN)/∆x, xN ≤ x ≤ 1

0, 0 ≤ x ≤ xN, (8.32)

hi(x) =

(x− xi−1)/∆x, xi−1 ≤ x ≤ xi(xi+1 − x)/∆x, xi ≤ x ≤ xi+1

0, x /∈ (xi−1, xi+1),

for i = 1, 2, . . . , N.

Plots of these hat functions are identical to the plots in Figures2.24, 2.25 and 2.26 in Section 2.4.5 with the variable t replaced bythe variable x.

These hat functions provide a basis for all continuous piecewiselinear functions with (possible) corners at the internal nodes 0 <x1 < x2 < . . . < xN−1 < xN < 1. Therefore, any continuouspiecewise linear function θN(t, x) with corners only at these nodescan be written as

θN(t, x) =N+1∑i=0

θi(t)hi(x), (8.33)

where the functions θi(t) determine the value of θN(t, x) atx = xi. In particular, θN(t, xi) = θi(t) and in order to form

Page 318: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 299

the function θN(t, ·) one must provide the coefficients θi(t) fori = 0, 1, 2, . . . , N,N + 1. Moreover, since θN(t, ·) is assumed tosatisfy the Dirichlet boundary conditions (8.29), then θN(t, x0) =θN(t, 0) = θ0(t) = 0 and θN(t, xN+1) = θN(t, 1) = θN+1(t) = 0 andθN(t, ·) can be written as

θN(t, x) =N∑i=1

θi(t)hi(x). (8.34)

We seek an approximate continuous piecewise linear solutionθN(t, ·) of the form (8.34) to the weak form of the two-point valueproblem (8.31). In order to compute the functions θi(t) for i =

1, 2, . . . , N , we substitute θN(t, x) =N∑i=1

θi(t)hi(x) into the weak

form of the equation given by (8.31). In particular, θN(t, x) isassumed to satisfy∫ 1

0

[∂

∂tθN (t, x)

]η(x)dx = −

∫ 1

0

[k∂

∂xθN (t, x)

] [d

dxη(x)

]dx

(8.35)

+

[∫ 1

0

g(x)η(x)dx

]u(t),

for all η(·) ∈ PWS0(0, 1).

Observe that ∂∂tθN(t, x) =

N∑i=1

θi(t)hi(x) and ∂∂xθN(t, x) =

N∑i=1

θi(t)ddxhi(x) is piecewise continuous so that substituting

θN(t, x) =N∑i=1

θi(t)hi(x) into the weak equation (8.35) yields

∫ 1

0

[N∑i=1

θi(t)hi(x)

]η(x)dx

= −∫ 1

0

[k

N∑i=1

θi(t)d

dxhi(x)

][d

dxη(x)

]dx

+

[∫ 1

0

g(x)η(x)dx

]u(t)

Page 319: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

300 Chapter 8. Applications

for all η(·) ∈ PWS0(0, 1). This equation can be written as

N∑i=1

θi(t)

(∫ 1

0

hi(x)η(x)dx

)

= −kN∑i=1

θi(t)

(∫ 1

0

[d

dxhi(x)

] [d

dxη(x)

]dx

)(8.36)

+

[∫ 1

0

g(x)η(x)dx

]u(t),

for all η(·) ∈ PWS0(0, 1). In order to use the variational equa-tion to compute the coefficients θi(t) for (8.34), we note that fori = 1, 2, . . . , N the basis function hi(·) belongs to PWS0(0, 1).Therefore, setting η(·) = hj(·) ∈ PWS0(0, 1) for each indexj = 1, 2, . . . , N , yields N equations

N∑i=1

θi(t)

(∫ 1

0

[hi(x)] [hj(x)] dx

)

= −kN∑i=1

θi(t)

(∫ 1

0

[d

dxhi(x)

] [d

dxhj(x)

]dx

)(8.37)

+

[∫ 1

0

g(x)hj(x)dx

]u(t).

Again, as in Section 2.4.5 we define the N × N mass matrixM = MN by

M = MN = [mi,j]i,j=i,2,...,N ,

where the entries mi,j of MN are given by the integrals

mi,j =

(∫ 1

0

[hi(x)] [hj(x)] dx

).

Likewise, define the N ×N stiffness matrix K = KN by

K = KN = [ki,j]i,j=i,2,...,N ,

where the entries ki,j of KN are given by the integrals

ki,j =

(∫ 1

0

[d

dxhi(x)

] [d

dxhj(x)

]dx

).

Page 320: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 301

Finally, let gN be the N × 1 (column) vector defined by

gN =[g1 g2 · · · gN

]T, (8.38)

where entries gj of gN are given by the integrals

gj =

(∫ 1

0

hj(x)g(x)dx

). (8.39)

If θN(t) is the solution vector

θN(t) =

θ1(t)θ2(t)

...θN(t)

,of (8.37), then θN(t) satisfies the matrix differential equation

MN θN(t) = − kKNθN + gNu(t). (8.40)

DefiningAN = −k [MN ]−1KN

andBN = [MN ]−1 gN

yields the linear control system

θN(t) = AN θN +BNu(t). (8.41)

which must be solved to find the coefficients θi(t) for i =1, 2, . . . , N .

In order to find the correct initial data for the finite elementmodel (8.41), one approximates θ (0, x) = ϕ (x), 0 < x < 1 by

ϕN(x) =N∑i=1

ϕihi(x)

and selects the coefficients ϕi for i = 1, 2, . . . , N so that ϕN(x)is the “best” approximation of ϕ (x). To make this precise, definethe subspace V h

0 (0, 1) ⊆ PWS0(0, 1) by

V h0 (0, 1) = span hi(x) : i = 1, 2, . . . , N =

ψ(x) =

N∑i=1

αihi(x)

.

Page 321: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

302 Chapter 8. Applications

Thus, we seek the function ϕN(x) =N∑i=1

ϕihi(x) such that ϕN(·)minimizes ∫ 1

0

|ϕN(x)− ϕ(x)|2 dx.

Elementary geometry implies that ϕN(·)− ϕ(·) must be “orthog-onal to V h

0 (0, 1)” so that∫ 1

0

[ϕN(x)− ϕ(x)]η(x))dx = 0

for all η(·) ∈ V h0 (0, 1). In particular,∫ 1

0

[ϕN(x)− ϕ(x)]hj(x)dx = 0

for all i = 1, 2, . . . , N so that∫ 1

0

[N∑i=1

ϕihi(x)− ϕ(x)

]hj(x)dx = 0

which implies

N∑i=1

ϕi

(∫ 1

0

[hi(x)] [hj(x)] dx

)−

N∑i=1

(∫ 1

0

[ϕ(x)] [hj(x)] dx

)= 0.

In matrix form we have

[MN ]ϕN = ϕN ,

where

ϕj =

(∫ 1

0

[ϕ(x)] [hj(x)] dx

)and

ϕN =[ϕ1 ϕ2 · · · ϕN

]T.

Therefore, the initial condition for the finite element model (8.41)is

θN(0) = ϕN

Page 322: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 303

whereϕN = [MN ]−1 ϕN (8.42)

and the finite dimensional “finite element” system is defined by

θN(t) = AN θN +BNu(t) (8.43)

with initial conditionθN(0) = ϕN . (8.44)

The finite element ODE system (8.41) can be solved using stan-dard numerical methods. Again, the key to developing the ODEmodel (8.41) is to approximate the weak form of the PDE model(8.28) - (8.29). This is a fundamental idea that is the basis of thefinite element method.

8.4 Problem Set for Chapter 8

Problem 8.1 Consider the problem of minimizing the functional

J(x(·)) =

b∫0

[x (s)]2 − [x (s)]2 + f(s)x(s)

ds

among all piecewise smooth functions satisfying

x (0) = 0, x (b) = 0,

where f(·) is a given continuous function. Assume x∗(·) minimizesJ(·).(A) Compute the first variation δJ(x∗(·), η(·)) for this problem.(B) Write out the equation δJ(x∗(·), η(·)) = 0.(C) Show that if x∗(·) minimizes J(·) subject to x (0) = 0, x (1) =0, then x∗(·) is a weak solution to the two point boundary valueproblem

x(t) + x(t) = f(t), x (0) = 0, x (b) = 0. (8.45)

(D) Show that there are solutions of (8.45) that do not minimizeJ(x(·)).

Page 323: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

304 Chapter 8. Applications

Problem 8.2 Consider the problem of minimizing the functional

J(x(·)) =

1∫0

[x (s)]2 + 2x(s)x(s) + [x(s)]2 + 2esx (s)

ds,

subject to the endpoint conditions x (0) = 0 and x (1) = 1. Assumex∗(·) is piecewise smooth and minimizes J(·).(A) Compute the first variation δJ(x∗(·), η(·)) for this problem.(B) Write out the equation δJ(x∗(·), η(·)) = 0.(C) What two point boundary value problem will x∗(·) satisfy?

Problem 8.3 Consider the problem of minimizing the functional

J(x(·)) =

3∫1

[3s− x(s)]x(s) ds,

subject to the endpoint conditions x (1) = 1 and x (3) is free. As-sume x∗(·) is piecewise smooth and minimizes J(·).(A) Compute the first variation δJ(x∗(·), η(·)) for this problem.(B) Write out the equation δJ(x∗(·), η(·)) = 0.(C) What two point boundary value problem will x∗(·) satisfy?

Problem 8.4 Consider the problem of minimizing the functional

J(x(·)) =

2∫1

x(s)[1 + s2x(s)]

ds,

subject to the endpoint conditions x (1) is free and x (2) = 5. As-sume x∗(·) is piecewise smooth and minimizes J(·).(A) Compute the first variation δJ(x∗(·), η(·)) for this problem.(B) Write out the equation δJ(x∗(·), η(·)) = 0.(C) What two point boundary value problem will x∗(·) satisfy?

Problem 8.5 Apply Hamilton’s Principle to the idealized doublependulum described in Section 2.1.3 to derive the equations of mo-tion for the system. Hint: Use polar coordinates and Figure 2.3 todetermine the kinetic and potential energy.

Page 324: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 8. Applications 305

Advanced Problems

Problem 8.6 Let k = .25, g(x) = x and assume the initial func-tion is given by ϕ(x) = 1. Construct the finite element model(8.43) - (8.44) for N = 4, 8, 16, 32 and 64. Let u(t) = e−t

and use a numerical method to solve the finite element equa-tions (8.43) - (8.44) on the interval 0 < t ≤ 2. Hint: Since

g(x) = x =N+1∑i=1

( iN+1

)hi(x), it follows that

gj =

(∫ 1

0

hj(x)g(x)dx

)can be computed exactly. In particular, note

(∫ 1

0

hi(x)hj(x)dx

)=

∆x

6

4 0 < i = j < N + 12 i = j = 0 or i = j = N + 11 |i− j| = 10 elsewhere

and

(∫ 1

0

[d

dxhi(x)][

d

dxhj(x)]dx

)=

1

∆x

2 0 < i = j < N + 11 i = j = 0 or

i = j = N + 1−1 |i− j| = 1

0 elsewhere

.

Thus, the N ×N mass and stiffness matrices for this problem aregiven by

MN =∆x

6

4 1 0 0 · · · 0 01 4 1 0 · · · 0 00 1 4 1 · · · 0 00 0 1 4 · · · 0 0...

......

.... . .

......

0 0 · · · · · · 1 4 10 0 · · · · · · 0 1 4

Page 325: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

306 Chapter 8. Applications

and

KN =1

∆x

2 −1 0 0 · · · 0 0−1 2 −1 0 · · · 0 0

0 −1 2 −1 · · · 0 00 0 −1 2 · · · 0 0...

......

.... . .

......

0 0 · · · · · · −1 2 −10 0 · · · · · · 0 −1 2

,

respectively.

Problem 8.7 Let k = 1, g(x) = 1 and assume the initial functionis given by ϕ(x) = x. Construct the finite element model (8.43)- (8.44) for N = 4, 8, 16, 32 and 64. Let u(t) = sin(t) and usea numerical method to solve the finite element equations (8.43) -(8.44) on the interval 0 < t ≤ 2π.

Page 326: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Part II

Optimal Control

Page 327: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Page 328: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9

Optimal Control Problems

In this chapter we describe the mathematical formulation of atypical optimal control problem as an optimization problem overa space of admissible functions. We provide some typical examplesand use these examples to motivate the Maximum Principle. Westart with some standard problems in the calculus of variations andmake a few modifications to illustrate where the classical necessaryconditions begin to break down.

The history of the development of optimal control theory isvery interesting. The reader should look at the perspectives foundin the references [30], [84], [85], [91], [134], [149], [154] and [172].

9.1 An Introduction to Optimal Con-

trol Problems

Consider the optimization problem: Minimize

J(x(·), u(·)) =

∫ t1

t0

|x(s)|2 + |u(s)|2

ds, (9.1)

where u(·) ∈ PWC(t0, t1) and x(·) ∈ PWS(t0, t1) are related by

x(t)− x0 −∫ t

t0

ax(s) + bu(s) ds = 0, (9.2)

309

Page 329: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

310 Chapter 9. Optimal Control Problems

where b 6= 0. Observe that x(·) ∈ PWS(t0, t1), x(t) = ax(t)+bu(t)e.f. and x(t0) = x0. Therefore, the constraint (9.2) is equivalentto the initial value problem

x(t) = ax(t) + bu(t), x(t0) = x0 (9.3)

Thus, we can solve for u(t) in terms of x(t) and x(t) so that[x(t)− ax(t)] /b = u(t) and the problem of minimizing (9.1) sub-ject to the constraint (9.2) is equivalent to the free endpoint prob-lem of minimizing

J(x(·), u(·)) =

∫ t1

t0

[x(s)]2 +

1

b2[x(s)− ax(s)]2

ds,

subject tox(t0) = x0 with x(t0) – free.

In this case we can eliminate the “control” u(·) and reformulatethe problem as a classical problem in the calculus of variations.

Also, we could have reformulated the problem as a con-strained optimization problem as follows. Let Z = PWS(t0, t1)×PWC(t0, t1) and Y = PWS(t0, t1), so that an element z ∈ Z =PWS(t0, t1)× PWC(t0, t1) has the form z = [x(·) u(·)]T . Definethe functional J : Z −→ R1 by

J(z) = J(x(·), u(·)) =

∫ t1

t0

[x(s)]2 + [u(s)]2

ds (9.4)

and the function G = Z −→ Y by

[G(z)] (t) = [G(x(·), u(·))] (t) ,x(t)− x0

−∫ t

t0

ax(s) + bu(s) ds, (9.5)

respectively. In this setting, the problem is equivalent to findingz∗ = [x∗(·) u∗(·)]T ∈ Z to minimize J(·) subject to the (equality)constraint G(z) = 0.

Consider now a vector version of the problem. Let A be a n×nconstant real matrix and B be a n×m real constant matrix with

Page 330: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 311

m < n (i.e. fewer controls that states). Consider the linear vectorcontrol system

x(t) = Ax(t) +Bu(t) (9.6)

with initial condition

x(0) = x0 ∈ Rn. (9.7)

We assume that t0 = 0, x0 ∈ Rn and 0 < t1 are given and that thecontrol u(·) belongs to the space PWC(0, t1;Rm). The quadraticcost functional is defined by

J(x(·),u(·)) =

t1∫0

‖x(s)‖2 + ‖u(s)‖2 ds, (9.8)

where x(t) = x(t;u(·)) is the solution to the system (9.6) - (9.7).The vector form of the optimization problem defined by (9.1) withthe constraint (9.2) is defined by (9.6) - (9.8).

Remark 9.1 Observe that we can no longer eliminate the controlu(·) since the matrix B is not invertible. Thus, it is not possibleto directly reformulate this vector problem as a classical vectorproblem in the calculus of variations.

Let Z = PWS(0, t1;Rn) × PWC(0, t1;Rm) and Y =PWS(0, t1;Rn), so that an element z ∈ Z = PWS(0, t1;Rn) ×PWC(0, t1;Rm) has the form z = [x(·) u(·)]T . Also, define thefunctional J : Z −→ R1 by

J(z) = J(x(·),u(·)) =

t1∫0

‖x(s)‖2 + ‖u(s)‖2 ds, (9.9)

and the function G = Z −→ Y by

[G(z)] (t) = [G(x(·),u(·))] (t) ,x(t)− x0

−∫ t

0

Ax(s) +Bu(s) ds,

(9.10)

Page 331: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

312 Chapter 9. Optimal Control Problems

respectively. Again, the vector problem is equivalent to findingz∗ = [x∗(·) u∗(·)]T ∈ Z to minimize J(·) subject to the (equality)constraint G(z) = 0.

As we shall see later, formulating the control problem as a gen-eral optimization problem with equality (or inequality) constraintscan be used to obtain necessary conditions. Moreover, some clas-sical problems such as the isoperimetric problem in the calculus ofvariations naturally fall into this formulation. The general formu-lation of the equality constrained problem assumes there are twovector spaces Z and Y and two functions

J : D(J) ⊆ Z −→ R1 (9.11)

andG : D(G) ⊆ Z −→ Y. (9.12)

The function J(·) is called the cost function and G(·) is called theconstraint function. Define the constraint set

ΘG ⊆ Z

by

ΘG = z ∈D(G) : G(z) = 0 ∈ Y ⊂D(G). (9.13)

The Equality Constrained Optimization Problem is defined to be:

Find an element z∗ ∈ ΘG ∩D(J) such that

J (z∗) ≤ J (z)

for all z ∈ ΘG ∩D(J).

Observe that since ΘG ⊂ D(G), it follows that ΘG ∩D(J) ⊂D(G) ∩D(J). Therefore, the equality constrained optimizationproblem is equivalent to finding z∗ ∈ D(G) ∩ D(J) such thatz∗ minimizes J(z) subject to G(z) = 0 ∈ Y . We first discussspecial cases and then move to the more abstract versions.

As noted before, this formulation allows us to develop and ap-ply a Lagrange Multiplier Theorem to obtain necessary conditions.

Page 332: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 313

Luenberger employs this abstract approach in his book Optimiza-tion by Vector Space Methods (see [131]). However, equality con-straints such as (9.10) above are not always applicable to con-trol problems with (hard) inequality constraints on the control orstates. A simple example where one needs to use inequality con-straints is the so called Rocket Sled Time Optimal Control problemdescribed below.

In the following sections we describe some typical optimal con-trol problems and provide a preliminary discussion to motivatethe Maximum Principle. We start with the Rocket Sled Controlproblem and then revisit some of the simplest problems in thecalculus of variations. In addition, we make a few modificationsof the SPCV to illustrate where the classical necessary conditionsbegin to break down.

9.2 The Rocket Sled Problem

The Rocket Sled problem was formulated as a mathematical opti-mization problem in Section 2.4.4 above. Here we present anotherformulation that will be typical of optimal control problems con-sidered below. Recall that the system is defined by

d

dt

[x1(t)x2(t)

]=

[0 10 0

] [x1(t)x2(t)

]+

[01/m

]u(t), (9.14)

orx(t) = Ax(t) +Bu(t), x(0) = x0 (9.15)

where the matrices A and B are defined by

A =

[0 10 0

]and B =

[01/m

],

respectively. Here, x0 =[x1,0 x2,0

]Tand x(t) =

[x1(t) x2(t)

]Tprovides a curve (trajectory) in the plane R2. To be more precise,given a control u(·) we let x(t;u(·)) denote the solution of the ini-tial value problem (9.15) with control input u(·). Given that the

Page 333: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

314 Chapter 9. Optimal Control Problems

control u(·) is constrained by |u(t)| ≤ 1, the time optimal problemis to find the control u∗(·) such that

u∗(·) ∈ [−1,+1], (9.16)

and u∗(t) steers x0 =[x1,0 x2,0

]Tto x1 =

[0 0

]Tin mini-

mum time. Here the final time t1 is not fixed and the cost functioncan be represented by the integral

J(x(·), u(·)) = t1 =

t1∫0

1ds =

t1∫0

f0(x(s), u(s)), (9.17)

where f0(x, u) ≡ 1. Let

X0 = x0 ⊆ R2 and X1 = x1 ⊆ R2

be the initial set and “target” set respectively. Also, define thecontrol constraint set

Ω = [−1,+1] ⊆ R.

The set of admissible controllers is the subset of PWC(t0,+∞)defined by

Θ =

u(·) ∈ PWC(t0,+∞) : u(t) ∈ Ω e.f.

and u(·) steers X0 to X1.

= Θ(t0,X0,X1,Ω). (9.18)

The time optimal control problem is the problem of minimizingJ(·) on the set of all admissible controllers Θ. In particular, an op-timal control is a function u∗(·) ∈ Θ such that u∗(·) steers X0 to X1

in time t∗1 > t0 and

J(u∗(·)) =

t∗1∫t0

f0(x∗(s), u∗(s))ds ≤t1∫t0

f0(x(s), u(s))ds = J(u(·))

(9.19)for all u(·) ∈ Θ. This time optimal control problem will be solvedbelow by using some simple geometric arguments that lay the foun-dation for the development of the Maximum Principle.

Page 334: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 315

9.3 Problems in the Calculus of Varia-

tions

Problems in the calculus of variations can be formulated as specialoptimal control problems. We discuss two examples to illustratehow classical problems in the calculus of variations can easily betransformed into optimal control problems. We begin with theSimplest Problem in the Calculus of Variations (SPCV).

9.3.1 The Simplest Problem in the Calculus ofVariations

We assume that f0(t, x, u) is a C2 smooth function of three vari-ables. In particular, we assume that f0 is continuous and all thepartial derivatives

∂f0(t, x, u)

∂t,∂f0(t, x, u)

∂x,∂f0(t, x, u)

∂u,

∂2f0(t, x, u)

∂t2,∂2f0(t, x, u)

∂x2,∂2f0(t, x, u)

∂u2

exist and are continuous. Note that this implies that all the mixedderivatives are equal, i.e.

∂2f0(t, x, u)

∂t∂x=∂2f0(t, x, u)

∂x∂t,∂2f0(t, x, u)

∂t∂u=∂2f0(t, x, u)

∂u∂t

and∂2f0(t, x, u)

∂x∂u=∂2f0(t, x, u)

∂u∂x.

Let X = PWS(t0, t1) denote the space of all real-valued piece-wise smooth functions defined on [t0, t1]. For each PWS functionx : [t0, t1]→ R1, define the functional J : X → R1 by

J(x(·)) =

t1∫t0

f0 (s, x(s), x (s)) ds. (9.20)

Page 335: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

316 Chapter 9. Optimal Control Problems

Assume that the points (t0, x0) and (t1, x1) are given and definethe set of PWS functions Ψ by

Ψ = x(·) ∈ PWS(t0, t1) : x (t0) = x0, x (t1) = x1 . (9.21)

Here x(·) is the derivative of x (·), i.e. x(·) = ddtx (·). Observe that

J : X → R1 is a real valued function on X.The Simplest Problem in the Calculus of Variations

(SPCV) is the problem of minimizing J(·) on Ψ. In particular,the goal is to find x∗ (·) ∈ Ψ such that

J(x∗(·)) =

t1∫t0

f0 (s, x∗(s), x∗ (s)) ds ≤ J(x(·))

=

t1∫t0

f0 (s, x(s), x (s)) ds,

for all x (·) ∈ Ψ.We introduce some notation and reformulate the SPCV as an

optimal control problem. In particular, let

x(t) = u(t) (9.22)

and observe that ifx(t0) = x0, (9.23)

then

x(t) = x0 +

t∫t0

u(s)ds (9.24)

so that x(·) ∈ PWS(t0, t1) if and only if u(·) ∈ PWC(t0, t1). Wesay that a control u(·) ∈ PWC(t0, t1) steers x0 to x1 at time t1 ifthe solution x(t;u(·)) = x(t) to (9.22) - (9.23) satisfies

x(t1) = x1.

Define the sets

X0 = x0 ⊆ R1 and X1 = x1 ⊆ R1,

Page 336: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 317

and let Ω = R1. The set of admissible controllers is the subset ofPWC(t0, t1) defined by

Θ =

u(·) ∈ PWC(t0, t1) : u(t) ∈ Ω e.f. and

u(·) steers X0 to X1 at time t1.

= Θ(t0, t1, x0, x1,Ω). (9.25)

Given a control u(·) ∈ PWC(t0, t1), the cost functional is definedby

J(u(·)) =

t1∫t0

f0(s, x(s), u(s))ds =

t1∫t0

f0(s, x(s;u(·)), u(s))ds,

(9.26)where x(·) = x(·;u(·)) is the solution to the initial value problem(10.1) - (10.2).

The Simplest Problem in the Calculus of Variations (SPCV)can now be formulated as an equivalent problem in optimal con-trol. In particular, the SPCV is equivalent to the problem of min-imizing J(·) on the set of all admissible controllers Θ. The goal isto find an optimal control u∗(·) ∈ Θ such that u∗(·) steers x0 to x1

at time t1 > t0 and

J(u∗(·)) =

t1∫t0

f0(s, x(s;u∗(·)), u∗(s))ds

≤t1∫t0

f0(s, x(s;u(·)), u(s))ds = J(u(·)) (9.27)

for allu(·) ∈ Θ. (9.28)

If u∗(·) ∈ Θ minimizes J(·) on Θ, then x∗(·) defined by (9.24)solves the SPCV. Conversely, if x∗(·) is a solution to the SPCV,then x∗(·) = u∗(·) is a solution to the optimal control problemdefined by (9.22) - (9.26).

Page 337: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

318 Chapter 9. Optimal Control Problems

9.3.2 Free End-Point Problem

We start as in the SPCV, where t0, t1 and x0 are given (fixed)but x1 is not specified (free). In particular, we are given t0, t1 andx0. The cost functional is

J(x(·)) =

t1∫t0

f0 (s, x (s) , x (s)) ds

and we define the set of PWS functions Ψ1 = Ψ1(t0, t1, x0) by

Ψ1 = x(·) ∈ PWS(t0, t1) : x (t0) = x0 .

Observe that the final condition x(t1) is not specified. The freeendpoint problem is to find x∗ (·) ∈ Ψ1 such that

J(x∗(·)) =

t1∫t0

f0 (s, x∗(s), x∗ (s)) ds ≤ J(x(·))

=

t1∫t0

f0 (s, x(s), x (s)) ds

for all x(·) ∈ Ψ1.Again we define the state equation by

x(t) = u(t) (9.29)

with initial conditionx(t0) = x0. (9.30)

Define the sets

X0 = x0 ⊆ R1 and X1 = R1,

and let Ω = R1. The set of admissible controllers is the subset ofPWC(t0, t1) defined by

Θ1 =

u(·) ∈ PWC(t0, t1) : u(t) ∈ Ω e.f. and

u(·) steers X0 to X1 at time t1.

= Θ1(t0, t1, x0,Ω). (9.31)

Page 338: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 319

Given a control u(·) ∈ PWC(t0, t1), the cost functional is definedby

J(u(·)) =

t1∫t0

f0(s, x(s), u(s))ds =

t1∫t0

f0(s, x(s;u(·)), u(s))ds,

(9.32)where x(·) = x(·;u(·)) is the solution to the initial value problem(9.29) - (9.30).

This variational problem is equivalent to the optimal controlproblem of minimizing J(·) on the set of admissible controllers Θ1.In particular, the goal is to find an optimal control u∗(·) ∈ Θ1 suchthat u∗(·) steers x0 to X1 at time t1 and

J(u∗(·)) =

t1∫t0

f0 (s, x∗(s), u∗ (s)) ds ≤ J(u(·))

=

t1∫t0

f0 (s, x(s), u (s)) ds

for all u(·) ∈ Θ1.

9.4 Time Optimal Control

The problem of time optimal control provides a natural departurefrom classical calculus of variational problems in that one placesa “hard” constraint on the control variable. Also, the final timet1 is not fixed and becomes the value of the cost functional. Webegin with the simplest time optimal control problem defined bythe rocket sled control system.

9.4.1 Time Optimal Control for the RocketSled Problem

Recall that the system is defined by

d

dt

[x1(t)x2(t)

]=

[0 10 0

] [x1(t)x2(t)

]+

[01/m

]u(t), (9.33)

Page 339: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

320 Chapter 9. Optimal Control Problems

orx(t) = Ax(t) +Bu(t), x(0) = x0 (9.34)

where the matrices A and B are defined by

A =

[0 10 0

], and B =

[01/m

],

respectively.In matrix form, using the Variation of Parameters formula we

have

x(t) = eAtx0 + eAtt∫

0

e−AsBu(s)ds. (9.35)

Let

A(t1,x0) =

[x1(t1;u(·))x2(t1;u(·))

]= x(t1;u(·)) : x(0)

= x0 is given, |u(t)| ≤ 1

denote the Attainable Set which is the set of all terminal pointsthat can be “attained” from x0 at time t1 by using a control sat-isfying |u(t)| ≤ 1. If x0 = 0, then we denote A(t,0) by A(t).

One can prove that A(t1,x0) is bounded and convex (see theproblems below). Moreover, the sets are “continuous in time” asdescribed below. In order to make this precise we need to definedistances between sets. Assume B is a non-empty set in R2 andthe vector x ∈ R2 is fixed. The distance between the point x andthe set B is defined by

d(x,B) = inf ‖x− b‖ : b ∈ B . (9.36)

If A and B are two non-empty sets in R2, define

d(A,B) = sup d(a,B) : a ∈ A= sup inf ‖a− b‖ : a ∈ A, b ∈ B .

Page 340: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 321

It is easy to show that if A and B are bounded, then d(A,B) <+∞. Moreover, if d(A,B) = 0, then A ⊆ B, but it is not nec-essarily true that A = B. The Hausdorff distance between twobounded sets A and B is defined by

dH(A,B) = max d(A,B), d(B,A) (9.37)

Lemma 9.1 (Basic Properties) Assume that A and B arebounded sets.

1. If dH(A,B) = 0, then A and B have the same closure.

2. If A and B are compact and dH(A,B) = 0, then A = B.

3. If A, B and C are compact sets then the triangle inequality

dH(A,C) ≤ dH(A,B) + dH(B,C)

holds.

For each 0 < t, let A(t) = A(t,0) be the attainable set definedabove. Then the following results hold.

Lemma 9.2 (Properties of the Attainable Set) The attain-able set A(t) is symmetric, bounded, convex and closed.

Lemma 9.3 (Continuity of the Attainable Set) Let 0 < t.If t −→ t, then dH(A(t),A(t))→ 0.

Remark 9.2 Except for showing that the attainable set is closed,proofs of the previous lemmas are straightforward and are given asexercises. In general, A(t) is not closed (see [97] and [164]), but inthis particular rocket car problem the special structure of the con-trol constraint set Ω = [−1,+1] implies that A(t) is closed. Thisis a rather subtle but important issue and the core difficultly arisesbecause we assume the controls belong to the space of piecewisecontinuous functions. There are two approaches to deal with thisissue. One approach is to “enlarge” the class of controllers to thebigger set of (Lebesgue) integrable functions and follow the methodin [100]. Another approach is to “restrict” the control constraint

Page 341: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

322 Chapter 9. Optimal Control Problems

sets to be of a special type (in this case, the interval [−1,+1]). Al-though Lebesgue integration and measure theory are the foundationof modern analysis and this approach provides the most advancedtheory, the mathematical background required to apply this methodlies outside the scope of this book. Thus, we will simply assumethe control constraint set Ω has a structure sufficient to imply thatA(t) is closed (see the problems at the end of this chapter).

Let t∗1 be the first time that A(t,x0) contains the target x1 = 0= [0 0]T . In particular, x1 ∈ A(t∗1,x0) and x /∈ A(t,x0) for any0 < t < t∗1. When the attainable set A(t∗1,x0) first touches x1 = 0,the state 0 lies on the boundary of the convex set A(t∗1,x0). Hence,there is a support plane at x(t∗1;u∗(·)) = x1 = 0 with an outernormal given by

η =

[η1

η2

]as shown in Figure 9.1. Let u(·) be any other control satisfying|u(t)| ≤ 1 with trajectory ending at x(t∗1;u(·)) ∈ A(t∗1,x0). If θ isthe angle between the outer normal and x(t∗1;u(·)), then π

2≤ θ ≤

3π2

and hence one has

〈η,x(t∗1;u(·))〉 ≤ 0 = 〈η,0〉 = 〈η,x(t∗1;u∗(·))〉 ,

or equivalently,⟨[η1

η2

],

[x1(t∗1;u(·))x2(t∗1;u(·))

]⟩≤⟨[

η1

η2

],

[x1(t∗1;u∗(·))x2(t∗1;u∗(·))

]⟩= 0.

Therefore, we have shown that the optimal control u∗(·) max-imizes the inner product⟨[

η1

η2

],

[x1(t∗1;u(·))x2(t∗1;u(·))

]⟩and this maximum is 0.

Page 342: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 323

1x

2x

ˆ( ; ( ))t u x

0x

1

00

x

*1( ; ( ))t u x

1

2

ˆˆ

ˆ

η

0ˆ( , )t xA

*1 0( , )t xA

* *1( ; ( ))t u x

Figure 9.1: Attainable Sets at Time t and t∗1

Consequently, it follows that

〈η,x(t∗1;u(·))〉 = η1x1(t∗1;u(·)) + η2x2(t∗1;u(·)) ≤ 0,

for each control satisfying |u(t)| ≤ 1. Also, since

〈η,x(t∗1;u∗(·))〉 = 0,

it follows that

〈η,x(t∗1;u∗(·))〉 = max|u(t)|≤1

〈η,x(t∗1;u(·))〉 = 0.

For this simple problem we can solve the system of equationsand use this explicit solution to derive a useful necessary condition.We consider the case for m = 1 and note that for a given u(·),

x1(t;u(·)) = x0 + v0t+

t∫0

s∫0

u(σ)dσ

ds (9.38)

Page 343: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

324 Chapter 9. Optimal Control Problems

and

x2(t;u(·)) = x(t) = v(t) = v0 +

t∫0

u(σ)dσ. (9.39)

For 0 ≤ t ≤ t∗1 define the two linear functions η∗1(·) and η∗2(·)by

η∗1(t) ≡ η1, (9.40)

η∗2(t) = η1(t∗1 − t) + η2, (9.41)

respectively. Note that η∗1(·) is a constant function and η∗2(·) is alinear function. The vector function

η∗(·) =

[η∗1(·)η∗2(·)

]has the property that

η∗(t∗1) =

[η∗1(t∗1)η∗2(t∗1)

]=

[η1

η2

].

Also,

〈η∗(t∗1), x(t∗1;u(·))〉 = 〈η, x(t∗1;u(·))〉

= η1

x0 + v0t∗1 +

t∗1∫0

s∫0

u(σ)dσds

+ η2

v0 +

t∗1∫0

u(σ)dσ

= η1 [x0 + v0t

∗1] +

η1

t∗1∫0

s∫0

u(σ)dσds

+ η2v0 + η2

t∗1∫0

u(σ)dσ.

Page 344: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 325

Since the control only appears in the integrals, we have that

max|u(t)|≤1

〈η∗(t∗1), x(t∗1;u(·))〉 = η1 [x0 + v0t∗1]

+ max|u(t)|≤1

η1

t∗1∫0

s∫0

u(σ)dσds

+ η2v0 + max

|u(t)|≤1η2

t∗1∫0

u(σ)dσ.

Therefore, to find u∗(·) so that

〈η∗(t∗1), x(t∗1;u∗(·))〉 = max|u(t)|≤1

〈η∗(t∗1), x(t∗1;u(·))〉 ,

we only have to find the control u∗(·) so that

η1

t∗1∫0

s∫0

u∗(σ)dσds+ η2

t∗1∫0

u∗(σ)dσ

= max|u(t)|≤1

η1

t∗1∫0

s∫0

u(σ)dσds+ η2

t∗1∫0

u(σ)dσ

.

Observe that interchanging the order of integration and integrationby parts yields

η1

t∫0

s∫0

u(σ)dσds =

t∫0

η1(t− σ)u(σ)dσ,

so that η1

t∗1∫0

s∫0

u(σ)dσds+ η2

t∗1∫0

u(σ)dσ

=

t∗1∫0

η1(t∗1 − σ)u(σ)dσ + η2

t∗1∫0

u(σ)dσ

Page 345: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

326 Chapter 9. Optimal Control Problems

=

t∗1∫0

η1(t∗1 − σ) + η2u(σ)dσ

=

t∗1∫0

η∗2(σ)u(σ)dσ,

where η∗2(·) is defined by (9.41) above. Hence, the maximizationproblem is equivalent to

max|u(t)|≤1

〈η∗(t∗1), x(t∗1;u(·))〉 = max|u(t)|≤1

t∗1∫0

η∗2(σ)u(σ)dσ.

Since |u(t)| ≤ 1, it follows that u∗(·) will maximize

t∗1∫0

η∗2(σ)u(σ)dσ

if and only ifu∗(t) = sgn η∗2(t) ,

where the sgn function is defined by

sgn z =

+1, z > 0,

0, z = 0,−1, z < 0.

Since η∗2(·) is linear, it follows that

u∗(t) =

+1, η∗2(t) > 0,

0, η∗2(t) = 0,−1, η∗2(t) < 0

and u∗(t) = ±1 can change sign at most one time (when η∗2(t)crosses the axis).

Page 346: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 327

To finish the problem we need to find a way to compute η∗2(·).Note that the definitions (9.40)-(9.41) imply

η∗1(t) = 0,

and

η∗2(t) =d

dtη1(t∗1 − t) + η2 = −η1 = −η∗1(t).

We can write this as a system of the form[η∗1(t)η∗2(t)

]=

[0 0−1 0

] [η∗1(t)η∗2(t)

], (9.42)

with terminal condition[η∗1(t∗1)η∗2(t∗1)

]=

[η1

η2

]. (9.43)

It is important to observe that the system (9.42) can be writtenas

η∗(t) = −ATη∗(t), (9.44)

where

A =

[0 10 0

]is the matrix defining the state equation. The system defined by(9.44) is called the adjoint system, or the co-state system. We shallreturn to this point later.

Since the optimal control is determined by the sign of the linearfunction η∗2(t) = η1(t∗1 − t) + η2, we know that u∗(t) = 1 whenη∗2(t) > 0 and u∗(t) = −1 when η∗2(t) < 0. This type of control isknown as a bang-bang control, meaning that the optimal controltakes on the extreme values of the constraint set Ω = [−1, 1].Thus, we need only consider controls that take the value of +1 or−1 and this allows us to “synthesize” the optimal controller bythe following method.

Rather than fixing the initial condition and integrating for-

ward in time, we fix the terminal condition at x0 =[

0 0]T

and integrate the equation backwards in time to see what possible

Page 347: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

328 Chapter 9. Optimal Control Problems

states can be reached by controls of this form. Thus, consider theproblem

x1(t) = x2(t),

x2(t) = u(t)

with initial values

x1(0) = 0, x2(0) = 0

and integrate backward in time −∞ < t < 0. When u(t) = −1the solutions are given by

x1(t) =−t2

2< 0,

x2(t) = −t > 0.

Observe thatx2(t) = +

√−2x1(t)

so that the backward trajectory lies on the parabola

Γ− =

[x y]T : y = +√−2x, −∞ < x ≤ 0

.

Likewise, if u(t) = +1, then

x1(t) =t2

2> 0,

x2(t) = t < 0.

In this case we have

x2(t) = −√

2x1(t)

and the backward trajectory lies on the parabola

Γ+ =

[x y]T : y = −√

2x, 0 ≤ x < +∞.

Let Γ = Γ−∪Γ+ denote the union of the curves Γ− and Γ+ denotethe switching curve as shown in Figure 9.2.

Page 348: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 329

-5 0 5-4

-3

-2

-1

0

1

2

3

4

+

-

Figure 9.2: The Switching Curves

When time is positive and if u(t) = −1, then any initial stateon the curve Γ− will follow the parabola until it reaches x1 = 0 ata time t∗1. If the control is then turned off (i.e. u(t) = 0 for t ≥ t∗1),then

u∗(t) =

−1, 0 ≤ t ≤ t∗1,0, t∗1 < t,

is optimal. See Figure 9.3. Likewise, if the initial state is on thecurve Γ+, then

u∗(t) =

+1, 0 ≤ t ≤ t∗1,0, t∗1 < t

will be optimal. Here, t∗1 is the time it takes for the trajectory toreach x1 = 0. See Figure 9.4.

When the initial data lies above Γ one applies a control ofu∗(t) = −1 until the trajectory intersects the switching curve onΓ+ at some time t∗s. At this time the optimal control is switched tou∗(t) = +1 until the (optimal) time t∗1 when the trajectory reachesx1 = 0. At t = t∗1 the control is set to u∗(t) = 0 for all time t > t∗1.

Page 349: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

330 Chapter 9. Optimal Control Problems

−5 −4 −3 −2 −1 0 1 2 3 4 5−4

−3

−2

−1

0

1

2

3

4

Γ−

u(t)=−1

Γ+

u(t)=+1

−∞ < t ≤ 0

Figure 9.3: Switching Curves for Negative Times

Thus, the optimal control is

u∗(t) =

−1, 0 ≤ t ≤ t∗s,+1, t∗s < t ≤ t∗1,0, t∗1 < t.

When the initial data lies below Γ one applies a control of u∗(t) =+1 until the trajectory intersects the switching curve Γ − at sometime t∗s. At this time the optimal control is switched to u∗(t) = −1until the (optimal) time t∗1 when the trajectory reaches x1 = 0. Att = t∗1 the control is set to u∗(t) = 0 for all time t > t∗1. Thus, theoptimal control is

u∗(t) =

+1, 0 ≤ t ≤ t∗s,−1, t∗s < t ≤ t∗1,0, t∗1 < t.

Let W : R −→ R be defined by

x2 = W (x1) =

−√

2x1, x1 ≥ 0,+√−2x1, x1 < 0.

(9.45)

Page 350: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 331

−5 −4 −3 −2 −1 0 1 2 3 4 5−4

−3

−2

−1

0

1

2

3

4

Γ−0 ≤ t < +∞

u(t)=−1

Γ+

u(t)=+1

Figure 9.4: Switching Curves for Positive Times

Define the function Ψ(x1, x2) : R2 −→ R by

Ψ(x1, x2) =

−1, if x2 > W (x1) or (x1, x2) ∈ Γ −,

0, if x1 = x2 = 0,+1, if x2 < W (x1) or (x1, x2) ∈ Γ +.

(9.46)

It follows that the optimal control is given by the feedback controllaw

u∗(t) = Ψ(x∗1(t), x∗2(t)) = Ψ(x∗(t)), (9.47)

where x∗(t) =

[x∗1(t)x∗2(t)

]is the optimal trajectory.

The previous synthesis can be summarized as follows. The stateequation is given by

x(t) = Ax(t) +Bu(t), x(0) = x0 (9.48)

where the matrices A and B are defined by

A =

[0 10 0

]and B =

[01

],

Page 351: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

332 Chapter 9. Optimal Control Problems

−5 −4 −3 −2 −1 0 1 2 3 4 5−4

−3

−2

−1

0

1

2

3

4

Γ−0 ≤ t < +∞

u(t)=−1

Γ+

u(t)=+1

u*(t)=−1

u*(t)=+1

Figure 9.5: Optimal Trajectories for Initial State above Γ

respectively. Here, x0 =[x1,0 x2,0

]Tand x(t) =

[x1(t) x2(t)

]T.

The adjoint system (9.42) is defined by

η∗(t) = −ATη∗(t), η∗(t∗1) = η, (9.49)

where η∗(t) =[η∗1(t) η∗2(t)

]Tsatisfies the terminal boundary

condition η∗(t∗1) = η =[η1 η2

]T. The optimal control is given

byu∗(t) = sgn(η∗2(t)) = sgn

(BTη∗(t)

). (9.50)

If we substitute (9.50) into (9.48), then we obtain the two pointboundary value problem defined by the coupled system

x∗(t) = Ax∗(t) +B[sgn(BTη∗(t)

)] (9.51)

η∗(t) = −ATη∗(t)

and boundary conditions

x∗(0) = x0 η∗(t∗1) = η, (9.52)

Page 352: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 333

−5 −4 −3 −2 −1 0 1 2 3 4 5−4

−3

−2

−1

0

1

2

3

4

Γ−0 ≤ t < +∞

u(t)=−1

u*(t)=+1

u*(t)=−1

u(t)=+1Γ+

Figure 9.6: Optimal Trajectories for Initial State below Γ

where x∗(t) is the optimal trajectory. Consequently, if one cancompute the outer normal η, then the optimal control can be com-puted by solving the two point boundary value problem (9.51)- (9.52) for x∗(t) and η∗(t) and using (9.50) to give u∗(t) =sgn(η∗2(t)) = sgn

(BTη∗(t)

).

9.4.2 The Bushaw Problem

Here we consider the so-called Bushaw Problem [52]. The gen-eral Bushaw type problem is concerned with second order controlsystems of the form

x(t) + 2γx(t) + κx(t) = u(t), (9.53)

where γ ≥ 0 and κ > 0. This model can be thought of as adamped mechanical oscillator. We write this equation as a second

Page 353: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

334 Chapter 9. Optimal Control Problems

order system

d

dt

[x1(t)x2(t)

]=

[0 1−κ −2γ

] [x1(t)x2(t)

]+

[01

]u(t). (9.54)

In particular, we write the system as

x(t) = Ax(t) +Bu(t) (9.55)

with initial conditionx(0) = x0, (9.56)

where the matrices A and B are defined by

A =

[0 1−κ −2γ

], and B =

[01

],

respectively. Here, x0 =[x1,0 x2,0

]Twhere x1,0 is the initial

displacement and x2,0 is the initial velocity of the oscillator definedby (9.53). Given a control u(t), the solution to (9.55) - (9.56) is

denoted by x(t;u(·)) =[x1(t;u(·)) x1(t;u(·))

]Tand defines the

state trajectory at time t with the initial state x(0;u(·)) = x0 ∈R2. Again, we are interested in time optimal transfer of the initial

state x0 to a fixed terminal state x1 =[

0 0]T ∈ R2. We shall

introduce some notation to make the statement of this problemmore precise and formal.

Recall that a control u(·) is said to steer the state x0 to thestate x1 in time t1 if there is a solution to (9.55) - (9.56) anda finite time t1 > 0 such that x(t1;u(·)) = x1 ∈ R2. LetΩ = [−1, 1] ⊂ R1 be the control constraint set so that the con-dition |u(t)| ≤ 1 can be written as u(t) ∈ Ω = [−1, 1]. Recallthat the optimal controller for the rocket sled problem above was“bang-bang” and hence discontinuous. It turns out that many timeoptimal controllers are bang-bang (see [97] and [100]) and henceto ensure the existence of an optimal control we need to allow forsuch controls to be “admissible”. Consequently, the set of admis-sible controls is a subset Θ of all piecewise continuous functionsdefined on [0,+∞) defined by

Θ = Θ(x0,x1) ,u(·) ∈ PWC[0,+∞) : u(·) steers

x0 to x1, u(t) ∈ Ω e.f. (9.57)

Page 354: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 335

where

PWC[0,+∞) = u(·) ∈ PWC[0, T ] : for all finite T > 0 .(9.58)

Let f0 : R2 × R1 −→ R1 be defined by

f0(x, u) = f0(x1, x2, u) ≡ 1

and define the cost functional J : Θ −→ R1 by

J(u(·)) ,∫ t1

0

f0(x(s;u(s)), u(s))ds. (9.59)

Note that if u(·) ∈ Θ, then u(·) steers x0 to x1 in time t1 andx(s;u(s)) is the corresponding solution of (9.55) - (9.56). Hence,J(·) is well defined and only depends on the control u(·) ∈ Θ. Forthe time optimal control problem considered here

J(u(·)) ,∫ t1

0

f0(x(s;u(s)), u(s))ds =

∫ t1

0

1ds = t1 (9.60)

is the time. If u∗(·) ∈ Θ steers x0 to x1 in time t∗1, then u∗(·) issaid to be a time optimal control if

J(u∗(·)) ≤ J(u(·))

for any control u(·) ∈ Θ. The Time Optimal Control Problemis the problem of minimizing the functional J(·) on theset of admissible controls Θ. In particular, the goal is to findu∗(·) ∈ Θ such that

t∗1 = J(u∗(·)) =

∫ t∗1

0

f0(x(s;u∗(s)), u∗(s))ds ≤ J(u(·))

=

∫ t1

0

f0(x(s;u(s)), u(s))ds = t1.

for all u(·) ∈ Θ.To obtain the optimal control we proceed as in the rocket sled

problem and arrive at the same conclusion that the optimal controlu∗(·) maximizes the inner product

〈η,x(t∗1;u(·))〉

Page 355: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

336 Chapter 9. Optimal Control Problems

so that〈η,x(t∗1;u(·))〉 = max

u(·)∈Θ〈η, x(t∗1;u(·))〉 = 0.

Applying the Variation of Parameters formula

x(t∗1;u(·)) = eAt∗1x0 +

∫ t∗1

0

eA(t∗1−s)Bu(s)ds

we find that

〈η, x(t∗1;u(·))〉 =

⟨η, eAt

∗1x0 +

∫ t∗1

0

eA(t∗1−s)Bu(s)ds

⟩=⟨η, eAt

∗1x0

⟩+

⟨η,

∫ t∗1

0

eA(t∗1−s)Bu(s)ds

⟩=⟨η, eAt

∗1x0

⟩+

∫ t∗1

0

⟨η, [eA(t∗1−s)]Bu(s)

⟩ds.

However,∫ t∗1

0

⟨η, [eA(t∗1−s)]Bu(s)

⟩ds =

∫ t∗1

0

⟨[eA(t∗1−s)]T η, Bu(s)

⟩ds

=

∫ t∗1

0

⟨[eA

T (t∗1−s)η], Bu(s)⟩ds

and if we define η∗(s) by

η∗(s) = [eAT (t∗1−s)η],

then

〈η, x(t∗1;u(·))〉 =⟨η, eAt

∗1x0

⟩+

∫ t∗1

0

〈η∗(s), Bu(s)〉 ds

=⟨η, eAt

∗1x0

⟩+

∫ t∗1

0

⟨BTη∗(s), u(s)

⟩ds.

In order to maximize

〈η, x(t∗1;u(·))〉 =⟨η, eAt

∗1x0

⟩+

∫ t∗1

0

⟨BTη∗(s), u(s)

⟩ds

Page 356: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 337

over all u(·) ∈ Θ it is clear that one needs only maximize theintegral containing u(·). Hence, the optimal control satisfies∫ t∗1

0

⟨BTη∗(s), u∗(s)

⟩ds = max

u(·)∈Θ

∫ t∗1

0

⟨BTη∗(s), u(s)

⟩ds

and, like in the rocket sled problem, u∗(·) must be

u∗(s) = sgn[BTη∗(s)]. (9.61)

The only issue is how do we compute BTη∗(s)? Differentiating

η∗(s) = [eAT (t∗1−s)η]

it follows that

η∗(s) = −AT [eAT (t∗1−s)η] = −ATη∗(s). (9.62)

Note that

η∗(s) = [eAT (t∗1−s) η] = [e−A

T s][eAT t∗1 η] = [e−A

T s]η,

so that η∗(s) is the solution to the adjoint equation (9.62) withinitial data

η∗(0) =η = [eAT t∗1 η].

Therefore, to compute η∗(·) from the outward normal η we setη = [eA

T t∗1 η] and solve the initial value problem

η∗(s) = −ATη∗(s), η∗(0) = η.

The optimal control is then given by

u∗(s) = sgn[BTη∗(s)] = sgn[BT e−AT sη] (9.63)

as indicated by (9.61). We will return to this problem and providea complete solution for the special case where γ = 0 and κ = 1.

Page 357: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

338 Chapter 9. Optimal Control Problems

9.5 Problem Set for Chapter 9

Problem 9.1 Consider the system

x1(t) = x2(t),

x2(t) = u(t),

and assume |u(t)| ≤ 1. Compute and sketch the attainable set A(t)for t = 1 and t = 2.

Problem 9.2 Consider the system

x1(t) = u1(t),

x2(t) = u2(t),

x3(t) = −1.

Show that there is a control [u1(t) u2(t)]T that steers [0 0 1]T to[0 0 0]T and satisfies [u1(t)]2 + [u2(t)]2 ≤ 1.

Problem 9.3 Consider the system

x1(t) = −x2(t) + u1(t),

x2(t) = u2(t),

and assume |u1(t)| ≤ 1 and |u2(t)| ≤ 1. Compute and sketch theattainable set A(t) for t = 2.

Problem 9.4 Consider the system

x(t) = x(t) + u(t),

and assume u(t) ∈ Ω. Compute and sketch the attainable set A(t)for t = 1 when Ω = [−1, 0], Ω = [−1, 1] and Ω = [0, 1].

Problem 9.5 Consider the system

x1(t) = u1(t),

x2(t) = u2(t),

and assume |u1(t)| ≤ 1 and |u2(t)| ≤ 1. Show that for t > 0 theattainable set A(t) is a square.

Page 358: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 9. Optimal Control Problems 339

Problem 9.6 Consider the system

x1(t) = u1(t),

x2(t) = u2(t),

and assume |u1(t)|2 + |u2(t)|2 ≤ 1. Compute and sketch the attain-able set A(t) for t = 1.

Problem 9.7 Consider the system

x1(t) = x2(t),

x2(t) = −x1(t) + u(t),

and assume |u(t)| ≤ 1. Show that the attainable set A(t) has “cor-ners” for 0 < t < π and is a circle of radius 2 for t = π.

Problem 9.8 Compute the distances d(A,B) and dH(A,B) be-tween the following sets in R1:(1) A = 1, 2, 3, 4 and B = 2, 3;(2) A = 2, 3 and B = 1, 2, 3, 4;(3) A = [0, 1] and B = [0, 2];(4) A = [0, 2] and B = [0, 1].

Problem 9.9 Let A be the constant matrix

A =

[−1 R0 −2

],

where R ≥ 0. Compute the matrix exponential eAt.

Problem 9.10 Let A be the constant matrix

A =

[0 1−1 0

].

Compute the matrix exponential eAt.

Problem 9.11 Consider the system

x(t) = Ax(t) +Bu(t),

Page 359: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

340 Chapter 9. Optimal Control Problems

where A is a n × n constant matrix and B is a n × m constantmatrix. Derive the variation of parameters formula

x(t) = eAtx0 +

∫ t

0

eA(t−s)Bu(s)ds.

Advanced Problems

Problem 9.12 Prove Lemma 9.1.

Problem 9.13 Prove Lemma 9.2.

Problem 9.14 Prove Lemma 9.3.

Problem 9.15 Consider the control system

x(t) = Ax(t) +Bu(t),

where A is a n × n constant matrix and B is a n × m constantmatrix. Assume Ω ⊆ Rm is a compact and convex set. Show thatthe attainable set A(t) is convex and bounded. Also, prove that ifΩ is a convex polyhedron, then A(t) is closed. (Hint: See [97] and[121]. )

Page 360: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 10

Simplest Problem inOptimal Control

As in the previous chapters, let PWC(a, b;Rm) denote the spaceof all Rm-valued piecewise continuous functions defined on [a, b].Also, PWC(a,+∞;Rm) denotes the set of all functions u(·) :[a,+∞) → Rm such that u(·) ∈ PWC(a, T ;Rm) for all a < T <+∞. A function u(·) ∈ PWC(t0,+∞;Rm) is called a control orcontrol function. We start with a statement of the Simplest Prob-lem in Optimal Control (SPOC) and state the basic MaximumPrinciple for this simplest problem.

10.1 SPOC: Problem Formulation

Assume we are given functions of class C2

f : Rn × Rm → Rn, f0 : Rn × Rm → R,

initial set and timeX0 ⊆ Rn, t0 ∈ R,

terminal set (but no final time t1)

X1 ⊆ Rn,

and control constraint set

Ω ⊆ Rm.

341

Page 361: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

342 Chapter 10. Simplest Problem in Optimal Control

The state equation (or control system or control process) is definedby the system of differential equations

(S) x(t) = f(x(t),u(t)), t0 < t. (10.1)

The initial conditions are given by

x(t0) = x0 ∈ X0. (10.2)

Definition 10.1 We say that a control u(·) ∈ PWC(t0,+∞;Rm)steers the state x0 to the state x1 if there is a solution x(·) =x(·;x0;u(·)) to the initial value problem (10.1) - (10.2) satisfying

x(t1) = x1

for some finite time t0 < t1. A control u(·) ∈ PWC(t0,+∞;Rm)steers the set X0 to the set X1 if for some x0 ∈ X0, there is asolution x(·) = x(·;x0;u(·)) to the initial value problem (10.1) -(10.2) satisfying

x(t1) = x1 ∈ X1

for some finite time t0 < t1.

We make the following standing assumption about the initialvalue problem.

Standing Assumption for Optimal Control: Foreach x0 ∈ X0 and u(·) ∈ PWC(t0, t1;Rm), the ini-tial value problem (10.1) - (10.2) has a unique so-lution defined on [t0, t1]. We denote this solution byx(·;x0;u(·)).

Definition 10.2 The set of admissible controllers is the sub-set of PWC(t0,+∞;Rm) defined by

Θ =

u(·) ∈ PWC(t0,+∞;Rm) : u(t) ∈ Ω e.f.

and u(·) steers X0 to X1.

= Θ(t0,X0,X1,Ω). (10.3)

Page 362: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 10. Simplest Problem in Optimal Control 343

Again, the abbreviation e.f. stands for “except at a finite num-ber of points”. Given a control u(·) ∈ PWC(t0,+∞;Rm) and atime t1 > t0, the cost functional at time t1 > t0 is defined by

J(u(·)) =

t1∫t0

f0(x(s),u(s))ds =

t1∫t0

f0(x(s;u(·)),u(s))ds, (10.4)

where x(·) = x(·;u(·)) is the solution to the initial value problem(10.1) - (10.2).

The Simplest Problem in Optimal Control is the problemof minimizing J(·) on the set of all admissible controllers Θ. Inparticular, an optimal control is a function u∗(·) ∈ Θ such thatu∗(·) steers X0 to X1 in time t∗1 > t0 and

J(u∗(·)) =

t∗1∫t0

f0(x(s;u∗(·)),u(s))ds ≤t1∫t0

f0(x(s;u(·)),u(s))ds

= J(u(·)) (10.5)

for all u(·) ∈ Θ.

10.2 The Fundamental Maximum Prin-

ciple

In this section we state the Basic Maximum Principle that ap-plies to the SPOC stated above. We will not prove the basic resulthere. The proof may be found in several standard references. Thebooks The Calculus of Variations and Optimal Control by Leit-mann [120] and Foundations of Optimal Control Theory [119] byLee and Markus both contain a complete proof of the basic result.These references provide the necessary background and alternativeproofs of the theorem.

In order to state the Maximum Principle we first have to intro-duce some notation. Given a vector z = [z1 z2 ... zn]T ∈ Rn and areal number z0, define an augmented vector z = [z0 z1 z2 ... zn]T ∈

Page 363: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

344 Chapter 10. Simplest Problem in Optimal Control

Rn+1 by

z = [z0 z1 z2 ... zn]T =

[z0

z

]∈ Rn+1.

Also, given the smooth functions

f : Rn × Rm → Rn, f0 : Rn × Rm → R,

where f : Rn × Rm → Rn is given by

f(x,u) =

f1(x,u)f2(x,u)f3(x,u)

...fn(x,u)

=

f1(x1, x2, ..., xn, u1, u2,..., um)f2(x1, x2, ..., xn, u1, u2,..., um)f3(x1, x2, ..., xn, u1, u2,..., um)

...fn(x1, x2, ..., xn, u1, u2,..., um)

,it follows that all the partial derivatives

∂fi(x1, x2, ..., xn, u1, u2,..., um)

∂xj

and∂f0(x1, x2, ..., xn, u1, u2,..., um)

∂xj

exist and are continuous. Define the augmented vector field

f : Rn+1 × Rm → Rn+1

by

f(x,u) =

f0(x,u)f1(x,u)f2(x,u)

...fn(x,u)

=

f0(x,u)f1(x,u)f2(x,u)

...fn(x,u)

=

f0(x1, x2, ..., xn, u1, u2,..., um)f1(x1, x2, ..., xn, u1, u2,..., um)f2(x1, x2, ..., xn, u1, u2,..., um)

...fn(x1, x2, ..., xn, u1, u2,..., um)

. (10.6)

Page 364: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 10. Simplest Problem in Optimal Control 345

The augmented control system is defined by

d

dtx(t) = f(x(t),u(t)), (10.7)

or equivalently, by the system

d

dtx0(t) = f0(x(t),u(t)) (10.8)

d

dtxi(t) = fi(x(t),u(t)), for i = 1, 2, ..., n. (10.9)

Observe that if

x0(t) =

t∫t0

f0(x(s;u(·)),u(s))ds,

then x0(·) satisfies (10.8) with initial data x0(t0) = 0. In particular,x0(t) represents the cost of transferring the state from an initialx(t0) = x0 ∈ X0 to x(t;u(·)) by the control u(·).

Assume that (x∗(·),u∗(·)) is an optimal pair for the SPOC.In particular, u∗(·) ∈ Θ steers x∗0 ∈ X0 to x∗1 ∈ X1 in time t∗1 andx∗(·) = x(·;u∗(·)) is the corresponding optimal trajectory thatsatisfies the initial value problem (10.1) - (10.2) with x0 = x∗0 ∈X0. Define the (n+ 1)× (n+ 1) matrix A(t) by

A(t) =

[∂fi(x0, x1, x2, ..., xn, u1, u2,..., um)

∂xj

]|(x∗(t),u∗(t))

=

[∂fi(x

∗(t),u∗(t))

∂xj

], (10.10)

so that

A(t) =

∂f0(x∗(t),u∗(t))

∂x0

∂f0(x∗(t),u∗(t))∂x1

∂f0(x∗(t),u∗(t))∂x2

· · · ∂f0(x∗(t),u∗(t))∂xn

∂f1(x∗(t),u∗(t))∂x0

∂f1(x∗(t),u∗(t))∂x1

∂f1(x∗(t),u∗(t))∂x2

· · · ∂f1(x∗(t),u∗(t))∂xn

......

.... . .

...∂fn(x∗(t),u∗(t))

∂x0

∂fn(x∗(t),u∗(t))∂x1

∂fn(x∗(t),u∗(t))∂x2

· · · ∂fn(x∗(t),u∗(t))∂xn

.

Page 365: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

346 Chapter 10. Simplest Problem in Optimal Control

Moreover, since $\hat{f}_i(\hat{x},u) = f_i(x,u) = f_i(x_1,\dots,x_n,u_1,\dots,u_m)$ does not depend on $x_0$,

$$\frac{\partial f_i(x^*,u^*)}{\partial x_0} = 0$$

for all $i = 0,1,2,\dots,n$, and it follows that

$$\hat{A}(t) = \left[\frac{\partial \hat{f}_i(\hat{x}^*(t),u^*(t))}{\partial x_j}\right] = \begin{bmatrix}
0 & \frac{\partial f_0}{\partial x_1} & \frac{\partial f_0}{\partial x_2} & \cdots & \frac{\partial f_0}{\partial x_n} \\
0 & \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n}
\end{bmatrix},$$

with all partials evaluated at $(x^*(t),u^*(t))$. It is important to note that the first column of $\hat{A}(t)$ is all zeros. Also, note that for $x = [x_1\ x_2\ \dots\ x_n]^T \in \mathbb{R}^n$, the gradient of $f_0(x,u)$ with respect to $x$ at $(x,u)$ is defined by

$$\nabla_x f_0(x,u) \triangleq \begin{bmatrix} \frac{\partial f_0(x,u)}{\partial x_1} \\ \frac{\partial f_0(x,u)}{\partial x_2} \\ \vdots \\ \frac{\partial f_0(x,u)}{\partial x_n} \end{bmatrix}$$

and the Jacobian matrix of $f(x,u)$ with respect to $x = [x_1\ x_2\ \dots\ x_n]^T \in \mathbb{R}^n$ at $(x,u)$ is given by

$$J_x f(x,u) = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n}
\end{bmatrix}.$$

Consequently,

$$\hat{A}(t) = \left[\frac{\partial \hat{f}_i(\hat{x}^*(t),u^*(t))}{\partial x_j}\right] = \begin{bmatrix} 0 & [\nabla_x f_0(x^*(t),u^*(t))]^T \\ 0 & J_x f(x^*(t),u^*(t)) \end{bmatrix}$$


and the transpose matrix can be written as

$$[\hat{A}(t)]^T = \begin{bmatrix} 0 & 0 \\ \nabla_x f_0(x^*(t),u^*(t)) & [J_x f(x^*(t),u^*(t))]^T \end{bmatrix}. \tag{10.11}$$

We now define the adjoint equation by

$$\frac{d}{dt}\hat{\eta}(t) = -[\hat{A}(t)]^T \hat{\eta}(t), \tag{10.12}$$

where

$$\hat{\eta}(t) = [\eta_0(t)\ \eta_1(t)\ \eta_2(t)\ \dots\ \eta_n(t)]^T = \begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix}$$

is called the adjoint state (or co-state) variable.

We need to define two more functions. The augmented Hamiltonian is the function $H : \mathbb{R}^{n+1} \times \mathbb{R}^{n+1} \times \mathbb{R}^m \to \mathbb{R}^1$ defined by

$$H(\hat{\eta},\hat{x},u) = H(\eta_0,\eta_1,\dots,\eta_n,x_0,x_1,\dots,x_n,u_1,\dots,u_m) \tag{10.13}$$
$$= \eta_0 f_0(x,u) + \langle \eta, f(x,u) \rangle = \eta_0 f_0(x,u) + \sum_{i=1}^{n} \eta_i f_i(x,u).$$

Hence,

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x_1,\dots,x_n,u_1,\dots,u_m) + \sum_{i=1}^{n} \eta_i f_i(x_1,\dots,x_n,u_1,\dots,u_m)$$

is a function of $2(n+1)+m$ variables. Finally, let $M : \mathbb{R}^{n+1} \times \mathbb{R}^{n+1} \to \mathbb{R}$ be defined by

$$M(\hat{\eta},\hat{x}) = \max_{u\in\Omega} H(\hat{\eta},\hat{x},u) \tag{10.14}$$

when the maximum exists. We can now state the (Pontryagin) Maximum Principle as the following theorem.
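When $\Omega$ is compact, $M(\hat{\eta},\hat{x})$ in (10.14) can be approximated by maximizing $H$ over a fine grid on $\Omega$. A minimal sketch (assuming a hypothetical planar system with $f(x,u) = [x_2,\ -x_1+u]^T$, $f_0 \equiv 1$ and $\Omega = [-1,1]$, none of which comes from the text):

```python
import numpy as np

# Hypothetical system: f(x,u) = [x2, -x1 + u], f0(x,u) = 1 (time optimal)
def H(eta_hat, x_hat, u):
    """Augmented Hamiltonian (10.13): eta0*f0 + <eta, f>."""
    eta0, eta1, eta2 = eta_hat
    _, x1, x2 = x_hat
    f0 = 1.0
    f1, f2 = x2, -x1 + u
    return eta0 * f0 + eta1 * f1 + eta2 * f2

def M(eta_hat, x_hat, Omega=np.linspace(-1.0, 1.0, 2001)):
    """M(eta,x) = max over u in Omega of H(eta,x,u), on a grid (10.14)."""
    vals = [H(eta_hat, x_hat, u) for u in Omega]
    i = int(np.argmax(vals))
    return vals[i], Omega[i]          # maximum value and a maximizing control

m, u_star = M((-1.0, 0.3, 0.7), (0.0, 1.0, -0.5))
print(f"M = {m:.4f} attained at u* = {u_star:.3f}")
```

Since $H$ is affine in $u$ for this example, the maximizer lies at an endpoint of $\Omega$, which the grid search finds.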


Theorem 10.1 (Maximum Principle) Assume that $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $f_0 : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^1$, $X_0 \subseteq \mathbb{R}^n$, $t_0 \in \mathbb{R}$, $X_1 \subseteq \mathbb{R}^n$ and $\Omega \subseteq \mathbb{R}^m$ are given as above and consider the control system

$$(S) \qquad \dot{x}(t) = f(x(t),u(t)), \quad t_0 < t, \tag{10.15}$$

with piecewise continuous controllers $u(\cdot) \in PWC(t_0,+\infty;\mathbb{R}^m)$ satisfying $u(t) \in \Omega \subseteq \mathbb{R}^m$ e.f. If

$$u^*(\cdot) \in \Theta = \left\{ u(\cdot) \in PWC(t_0,+\infty;\mathbb{R}^m) : u(t) \in \Omega \text{ e.f. and } u(\cdot) \text{ steers } X_0 \text{ to } X_1 \right\} \tag{10.16}$$

minimizes

$$J(u(\cdot)) = \int_{t_0}^{t_1} f_0(x(s),u(s))\,ds \tag{10.17}$$

on the set of admissible controls $\Theta$ with optimal response $x^*(\cdot)$ satisfying $x^*(t_0) = x_0^* \in X_0$ and $x^*(t_1^*) = x_1^* \in X_1$ at time $t_1^* > t_0$, then there exists a non-trivial solution

$$\hat{\eta}^*(t) = [\eta_0^*(t)\ \eta_1^*(t)\ \eta_2^*(t)\ \dots\ \eta_n^*(t)]^T = \begin{bmatrix} \eta_0^*(t) \\ \eta^*(t) \end{bmatrix} \tag{10.18}$$

to the augmented adjoint equation

$$\frac{d}{dt}\hat{\eta}(t) = -[\hat{A}(t)]^T \hat{\eta}(t), \tag{10.19}$$

such that

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = M(\hat{\eta}^*(t),\hat{x}^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u). \tag{10.20}$$

Moreover, there is a constant $\eta_0^* \le 0$ such that

$$\eta_0^*(t) \equiv \eta_0^* \le 0 \tag{10.21}$$

and for all $t \in [t_0, t_1^*]$,

$$M(\hat{\eta}^*(t),\hat{x}^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv 0. \tag{10.22}$$


Also, if $X_0 \subseteq \mathbb{R}^n$ and $X_1 \subseteq \mathbb{R}^n$ are manifolds with tangent spaces $T_0$ and $T_1$ at $x^*(t_0) = x_0^* \in X_0$ and $x^*(t_1^*) = x_1^* \in X_1$, respectively, then

$$\hat{\eta}^*(t) = [\eta_0^*(t)\ \eta_1^*(t)\ \eta_2^*(t)\ \dots\ \eta_n^*(t)]^T = [\eta_0^*(t)\ \eta^*(t)]^T$$

can be selected to satisfy the transversality conditions

$$\eta^*(t_0) \perp T_0 \tag{10.23}$$

and

$$\eta^*(t_1^*) \perp T_1. \tag{10.24}$$

Remark 10.1 We have not discussed the precise definitions of “manifolds” and “tangent spaces”. A rigorous presentation of these definitions would require a background in differential geometry and topology which is outside the scope of this book.

[Figure 10.1: The Maximum Principle. The original figure shows candidate trajectories $x(t;u_1(\cdot))$ and $x(t;u_2(\cdot))$ together with the optimal trajectory $x^*(t) = x(t;u^*(\cdot))$ steering $x_0^* \in X_0$ to $x_1^* \in X_1$, the tangent spaces $T_0$ and $T_1$, and the adjoint vectors $\eta^*(t_0)$ and $\eta^*(t_1^*)$.]


However, roughly speaking, a manifold of dimension $k$ is a topological space that near each point “resembles” $k$-dimensional Euclidean space. Two dimensional manifolds are called surfaces. Examples of manifolds are the entire space $\mathbb{R}^n$, $k$-dimensional subspaces of $\mathbb{R}^n$, hyperplanes, and 0-dimensional manifolds, which are points and may be viewed as translations of the zero subspace. Also, given $p \le n$ and $p$ smooth functions $g_1(\cdot), g_2(\cdot), \dots, g_p(\cdot)$ satisfying $g_i : \mathbb{R}^n \to \mathbb{R}$, define the set

$$X_g = \{ x \in \mathbb{R}^n : g_i(x) = 0, \text{ for } i = 1,2,\dots,p \}. \tag{10.25}$$

If the vectors

$$\nabla g_i(x), \quad i = 1,2,\dots,p,$$

are linearly independent, then $X_g$ is a (smooth) manifold of dimension $k = n - p$. Note that this condition is equivalent to the statement that the matrix

$$\begin{bmatrix}
\frac{\partial g_1(x)}{\partial x_1} & \frac{\partial g_1(x)}{\partial x_2} & \cdots & \frac{\partial g_1(x)}{\partial x_n} \\
\frac{\partial g_2(x)}{\partial x_1} & \frac{\partial g_2(x)}{\partial x_2} & \cdots & \frac{\partial g_2(x)}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_p(x)}{\partial x_1} & \frac{\partial g_p(x)}{\partial x_2} & \cdots & \frac{\partial g_p(x)}{\partial x_n}
\end{bmatrix}$$

has maximal rank $p$. In this case the tangent plane $T_g(x)$ at $x \in X_g$ is given by

$$T_g(x) = T_{g_1}(x) \cap T_{g_2}(x) \cap \dots \cap T_{g_p}(x),$$

where $T_{g_i}(x)$ is the tangent plane of the smooth surface $X_{g_i} = \{ x \in \mathbb{R}^n : g_i(x) = 0 \}$ with normal vector given by $\nabla g_i(x)$. Therefore, a vector $\eta \in \mathbb{R}^n$ is orthogonal to $T_g(x)$ at $x \in X_g$ if and only if

$$\eta^T z = \sum_{j=1}^{n} \eta_j z_j = 0$$

for all $z \in \mathbb{R}^n$ satisfying

$$\sum_{j=1}^{n} \frac{\partial g_i(x)}{\partial x_j} z_j = 0, \quad \text{for all } i = 1,2,\dots,p.$$
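Numerically, the tangent plane $T_g(x)$ is just the null space of the $p \times n$ Jacobian above, which makes the rank condition easy to check. A small sketch (assuming the single hypothetical constraint $g_1(x) = x_1^2 + x_2^2 - 1$, so that $X_g$ is the unit circle):

```python
import numpy as np
from scipy.linalg import null_space

# Constraint g1(x) = x1^2 + x2^2 - 1 = 0: the unit circle, a 1-manifold in R^2
def jac_g(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]]])   # 1 x 2 Jacobian

x = np.array([1.0, 0.0])                          # a point on X_g
Dg = jac_g(x)
assert np.linalg.matrix_rank(Dg) == 1             # maximal rank p = 1

T = null_space(Dg)     # columns span the tangent plane T_g(x), dim = n - p = 1
normal = Dg[0] / np.linalg.norm(Dg[0])

print("tangent direction:", T[:, 0])   # here: [0, 1] (up to sign)
print("normal  direction:", normal)    # here: [1, 0], i.e. grad g1(x)
```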


We will use this result when the initial and target sets are defined by (10.25). The special cases where $X = \mathbb{R}^n$ and $X = \{x\}$ are simple, since the tangent space to any subspace is that subspace and the tangent space to a single point is the zero subspace. Thus, every vector $\eta$ is transversal to a single point and only the zero vector is transversal to the whole space $\mathbb{R}^n$.

The references [107], [116], [147] and [184] provide introductions to differential geometry with rigorous definitions.

10.3 Application of the Maximum Principle to Some Simple Problems

Here we apply the Maximum Principle to some of the problems described above. We focus on the Simplest Problem (i.e., $t_1$ not fixed) and then move to other variations of this problem in later sections.

10.3.1 The Bushaw Problem

The Bushaw problem is governed by the control system

$$\frac{d}{dt}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\kappa & -2\gamma \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u(t), \tag{10.26}$$

which has the form

$$\dot{x}(t) = Ax(t) + Bu(t) \tag{10.27}$$

with initial condition

$$x(0) = x_0 \in \mathbb{R}^2, \tag{10.28}$$

where the matrices $A$ and $B$ are defined by

$$A = \begin{bmatrix} 0 & 1 \\ -\kappa & -2\gamma \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix},$$


respectively. Here, $x_0 = [x_{1,0}\ x_{2,0}]^T$ is the initial condition. Given a control $u(\cdot)$, the solution to (10.27) - (10.28) is denoted by $x(\cdot;u(\cdot))$. We are interested in time optimal transfer of the initial state $x_0$ to a fixed terminal state $x_1 = [0\ 0]^T \in \mathbb{R}^2$ by a control satisfying the condition $|u(t)| \le 1$. We formulate this time optimal control problem as a SPOC as described above.

Formulation as a Simplest Problem in Optimal Control: In order to set up the optimal control problem as a SPOC and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^2$ and the terminal set $X_1 \subseteq \mathbb{R}^2$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^2$ and $f_0 : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;

6. Form the augmented adjoint system matrix $\hat{A}(t)$.

Remark 10.2 We will work through these steps for every example we discuss, and each time the above list will be repeated. Although this may seem like overkill, it helps the reader focus on the essential steps required to formulate and apply the Maximum Principle and its extensions. Moreover, many problems require additional steps before one can formulate the problem as an optimal control problem, and explicitly listing these six steps can be helpful in determining the proper formulation (see the point-to-curve problem in Section 11.2.3).

Recall that the time optimal cost functional can be written as

$$J(u(\cdot)) = \int_0^{t_1} 1\,ds = t_1,$$

and the initial time is $t_0 = 0$. Since there are two state equations and one control, the state space is $\mathbb{R}^2$ and the control space is $\mathbb{R}^1$,


so that $n = 2$ and $m = 1$. Let $\Omega = [-1,1] \subset \mathbb{R}^1$ be the control constraint set, so that the condition $|u(t)| \le 1$ can be written as $u(t) \in \Omega = [-1,1]$. The initial and terminal sets are single points, so

$$X_0 = \{x_0\} \subseteq \mathbb{R}^2 \quad \text{and} \quad X_1 = \{0\} \subseteq \mathbb{R}^2$$

define these sets. The functions $f : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^2$ and $f_0 : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$ are defined by

$$f(x,u) = f(x_1,x_2,u) = Ax + Bu = \begin{bmatrix} x_2 \\ -\kappa x_1 - 2\gamma x_2 + u \end{bmatrix}$$

and

$$f_0(x,u) = f_0(x_1,x_2,u) \equiv 1,$$

respectively. Hence,

$$f(x,u) = f(x_1,x_2,u) = \begin{bmatrix} f_1(x_1,x_2,u) \\ f_2(x_1,x_2,u) \end{bmatrix} = \begin{bmatrix} x_2 \\ -\kappa x_1 - 2\gamma x_2 + u \end{bmatrix}$$

defines the functions $f_1(x_1,x_2,u)$ and $f_2(x_1,x_2,u)$.

Now to set up the Maximum Principle we start by letting $\hat{x} = [x_0\ x_1\ x_2]^T$ denote the augmented state, so that the augmented function $\hat{f} : \mathbb{R}^3 \times \mathbb{R}^1 \to \mathbb{R}^3$ is given by

$$\hat{f}(\hat{x},u) = \hat{f}(x_0,x_1,x_2,u) = \begin{bmatrix} f_0(x_1,x_2,u) \\ f_1(x_1,x_2,u) \\ f_2(x_1,x_2,u) \end{bmatrix} = \begin{bmatrix} 1 \\ x_2 \\ -\kappa x_1 - 2\gamma x_2 + u \end{bmatrix}.$$

We also need to compute the augmented Hamiltonian $H : \mathbb{R}^3 \times \mathbb{R}^3 \times \mathbb{R}^1 \to \mathbb{R}^1$ defined by

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x_1,x_2,u) + \eta_1 f_1(x_1,x_2,u) + \eta_2 f_2(x_1,x_2,u),$$

where $\hat{\eta} = [\eta_0\ \eta]^T = [\eta_0\ \eta_1\ \eta_2]^T$. By direct substitution we have that

$$H(\hat{\eta},\hat{x},u) = \eta_0[1] + \eta_1[x_2] + \eta_2[-\kappa x_1 - 2\gamma x_2 + u] = \eta_0 + \eta_1 x_2 + \eta_2[-\kappa x_1 - 2\gamma x_2] + \eta_2 u.$$


To set up the adjoint equation we need to compute the matrix

$$J_{\hat{x}}\hat{f}(\hat{x},u) = \begin{bmatrix}
\frac{\partial f_0(x_1,x_2,u)}{\partial x_0} & \frac{\partial f_0(x_1,x_2,u)}{\partial x_1} & \frac{\partial f_0(x_1,x_2,u)}{\partial x_2} \\
\frac{\partial f_1(x_1,x_2,u)}{\partial x_0} & \frac{\partial f_1(x_1,x_2,u)}{\partial x_1} & \frac{\partial f_1(x_1,x_2,u)}{\partial x_2} \\
\frac{\partial f_2(x_1,x_2,u)}{\partial x_0} & \frac{\partial f_2(x_1,x_2,u)}{\partial x_1} & \frac{\partial f_2(x_1,x_2,u)}{\partial x_2}
\end{bmatrix},$$

which is easily seen to be

$$J_{\hat{x}}\hat{f}(\hat{x},u) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -\kappa & -2\gamma \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & A \end{bmatrix}.$$

Now assume that $(x^*(\cdot),u^*(\cdot))$ is an optimal pair, so that the matrix $\hat{A}(t)$ can be computed by

$$\hat{A}(t) = J_{\hat{x}}\hat{f}(\hat{x},u)\big|_{(\hat{x}^*(t),u^*(t))} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -\kappa & -2\gamma \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & A \end{bmatrix}.$$

Observe that in this special case

$$\hat{A}(t) = \hat{A} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -\kappa & -2\gamma \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & A \end{bmatrix}$$

is a constant matrix and does not depend on $(x^*(\cdot),u^*(\cdot))$.

The augmented adjoint equation

$$\frac{d}{dt}\hat{\eta}(t) = -[\hat{A}(t)]^T \hat{\eta}(t) \tag{10.29}$$

has the form

$$\frac{d}{dt}\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix} = -\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -\kappa \\ 0 & 1 & -2\gamma \end{bmatrix}\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & \kappa \\ 0 & -1 & 2\gamma \end{bmatrix}\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix}.$$

Observe that this system is given by

$$\frac{d}{dt}\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix} = \begin{bmatrix} 0 \\ \kappa\eta_2(t) \\ -\eta_1(t) + 2\gamma\eta_2(t) \end{bmatrix}$$


and the first equation implies

$$\frac{d}{dt}\eta_0(t) = 0,$$

which means that all solutions to the augmented adjoint equation have

$$\eta_0(t) \equiv \eta_0.$$

Also, for this special case the non-augmented adjoint state

$$\eta(t) = \begin{bmatrix} \eta_1(t) \\ \eta_2(t) \end{bmatrix}$$

satisfies the system

$$\frac{d}{dt}\eta(t) = -A^T\eta(t), \tag{10.30}$$

where the matrix $A$ is the state matrix in (10.27).

Application of the Maximum Principle: We are now ready to apply the Maximum Principle. Assume $(x^*(\cdot),u^*(\cdot))$ is an optimal pair. Then there exists a non-trivial solution

$$\hat{\eta}^*(t) = \begin{bmatrix} \eta_0^*(t) \\ \eta^*(t) \end{bmatrix} = \begin{bmatrix} \eta_0^*(t) \\ \eta_1^*(t) \\ \eta_2^*(t) \end{bmatrix}$$

to the augmented adjoint equation (10.29) such that $\eta_0^*(t) \equiv \eta_0^* \le 0$ and

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv 0.$$

Since

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \eta_0^* + \eta_1^*(t)x_2^*(t) + \eta_2^*(t)[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + \eta_2^*(t)u,$$

it follows that in order to maximize $H(\hat{\eta}^*(t),\hat{x}^*(t),u)$ on $[-1,1]$ one needs only to maximize the last term in this expression. In particular, the optimal control must maximize the expression

$$\eta_2^*(t)u$$


for all $t$, where $-1 \le u \le +1$. Clearly, this occurs when

$$u^*(t) = \mathrm{sgn}[\eta_2^*(t)],$$

and the problem now reduces to computing $\eta_2^*(t)$.

Let's see what the transversality conditions tell us. Since $X_0 = \{x_0\} \subseteq \mathbb{R}^2$ and $X_1 = \{0\} \subseteq \mathbb{R}^2$, and the tangent plane to a point is the “zero plane”, i.e. $T_0 = \{0\} \subseteq \mathbb{R}^2$ and $T_1 = \{0\} \subseteq \mathbb{R}^2$, the conditions

$$\eta^*(0) \perp T_0 \quad \text{and} \quad \eta^*(t_1^*) \perp T_1$$

are satisfied by any vectors $\eta^*(0)$ and $\eta^*(t_1^*)$. Thus, the transversality conditions give us no additional information. However, we do know that

$$x^*(0) = x_0, \quad x^*(t_1^*) = 0,$$

and

$$\frac{d}{dt}\eta^*(t) = -A^T\eta^*(t), \tag{10.31}$$

so that

$$\eta^*(t) = e^{-A^T(t-t_1^*)}\eta^*(t_1^*) = e^{-A^T t}e^{A^T t_1^*}\eta^*(t_1^*) = e^{-A^T t}\eta^*,$$

where

$$\eta^* \triangleq e^{A^T t_1^*}\eta^*(t_1^*).$$

Hence,

$$\eta_2^*(t) = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} \eta_1^*(t) \\ \eta_2^*(t) \end{bmatrix} = B^T e^{-A^T t}\eta^*$$

and

$$u^*(t) = \mathrm{sgn}[\eta_2^*(t)] = \mathrm{sgn}[B^T e^{-A^T t}\eta^*],$$

where $\eta^* \triangleq e^{A^T t_1^*}\eta^*(t_1^*)$ is not known since (at this point) neither $t_1^*$ nor $\eta^*(t_1^*)$ is known.

Observe that the Maximum Principle says that $\eta^*(t_1^*)$ can be selected so that $\eta^*(t_1^*) \perp T_1$. However, any vector satisfies this condition. We might try the simple case where $\eta^*(t_1^*) = 0$ and see what happens. In this case, $\eta^*(t)$ satisfies the linear differential equation $\frac{d}{dt}\eta^*(t) = -A^T\eta^*(t)$ with zero value at $t_1^*$. Therefore, $\eta^*(t)$ must be identically zero for all $t$, i.e. $\eta^*(t) \equiv 0$. However,


this would imply that $u^*(t) = \mathrm{sgn}(\eta_2^*(t)) \equiv 0$ and the augmented Hamiltonian has the form

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \eta_0^* + \eta_1^*(t)x_2^*(t) + \eta_2^*(t)[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + \eta_2^*(t)u$$
$$= \eta_0^* + 0\cdot x_2^*(t) + 0\cdot[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + 0\cdot u \equiv \eta_0^*.$$

On the other hand, the Maximum Principle yields

$$\eta_0^* = H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv 0,$$

so the constant $\eta_0^*$ would be zero also. Thus, $\hat{\eta}^*(t) \equiv 0$ and this contradicts the statement (in the Maximum Principle) that $\hat{\eta}^*(t)$ is a non-trivial solution to the augmented adjoint equation. Consequently, we know that $\eta^*(t_1^*) \ne 0$, and hence $\eta^* \triangleq e^{A^T t_1^*}\eta^*(t_1^*) \ne 0$.

Therefore, we know that if $u^*(t)$ is a time optimal controller, then the following conditions hold. There is a non-trivial solution to the adjoint equation

$$\frac{d}{dt}\eta^*(t) = -A^T\eta^*(t)$$

with data $\eta^*(t_1^*) \ne 0$, and the optimal control has the form

$$u^*(t) = \mathrm{sgn}(B^T e^{-A^T t}\eta^*).$$

The optimal trajectories $x^*(t)$ satisfy

$$\frac{d}{dt}x^*(t) = Ax^*(t) + B\,\mathrm{sgn}(B^T e^{-A^T t}\eta^*),$$

along with the boundary conditions

$$x^*(0) = x_0, \quad x^*(t_1^*) = 0.$$


Moreover,

$$0 = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t))$$
$$= \eta_0^* + \eta_1^*(t)x_2^*(t) + \eta_2^*(t)[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + \eta_2^*(t)\,\mathrm{sgn}[\eta_2^*(t)]$$
$$= \eta_0^* + \eta_1^*(t)x_2^*(t) + \eta_2^*(t)[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + |\eta_2^*(t)|$$

holds for all $0 \le t \le t_1^*$. In addition, since $\eta_0^* \le 0$, it follows that

$$\eta_1^*(t)x_2^*(t) + \eta_2^*(t)[-\kappa x_1^*(t) - 2\gamma x_2^*(t)] + |\eta_2^*(t)| \ge 0,$$

and this is all we can say at this point about this problem.
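One way to explore these necessary conditions is to fix a trial vector $\eta^*$, form $u^*(t) = \mathrm{sgn}(B^T e^{-A^T t}\eta^*)$, and integrate the state forward. The sketch below does exactly this for the hypothetical values $\kappa = 1$, $\gamma = 0$ and an arbitrary trial $\eta^*$; an actual solution would require a shooting iteration over $\eta^*$ and $t_1^*$ to hit the origin:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

kappa, gamma = 1.0, 0.0
A = np.array([[0.0, 1.0], [-kappa, -2.0 * gamma]])
B = np.array([0.0, 1.0])

eta_star = np.array([1.0, 0.5])     # trial value of eta* (unknown a priori)

def u_opt(t):
    # u*(t) = sgn(B^T e^{-A^T t} eta*): the bang-bang candidate control
    return np.sign(B @ expm(-A.T * t) @ eta_star)

def rhs(t, x):
    return A @ x + B * u_opt(t)

sol = solve_ivp(rhs, (0.0, 5.0), [2.0, 0.0], max_step=0.01)
print("x(5) for this trial eta*:", sol.y[:, -1])
```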

10.3.2 The Bushaw Problem: Special Case γ = 0 and κ = 1

Here we synthesize the optimal controller for a special case of Bushaw's problem. Since

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$

it follows that

$$-A^T = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = A,$$

and hence

$$e^{-A^T s} = e^{As} = \begin{bmatrix} \cos(s) & \sin(s) \\ -\sin(s) & \cos(s) \end{bmatrix}.$$

Therefore

$$B^T e^{-A^T s}\eta = \begin{bmatrix} 0 & 1 \end{bmatrix}\begin{bmatrix} \cos(s) & \sin(s) \\ -\sin(s) & \cos(s) \end{bmatrix}\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} -\sin(s) & \cos(s) \end{bmatrix}\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = -\eta_1\sin(s) + \eta_2\cos(s).$$

Using the standard identity

$$-\eta_1\sin(s) + \eta_2\cos(s) = \sqrt{\eta_1^2 + \eta_2^2}\,\sin(s+\phi),$$


where

$$\phi = \begin{cases} \arcsin\left(\eta_2/\sqrt{\eta_1^2+\eta_2^2}\right) & \text{if } \eta_1 \le 0, \\[4pt] \pi - \arcsin\left(\eta_2/\sqrt{\eta_1^2+\eta_2^2}\right) & \text{if } \eta_1 > 0, \end{cases}$$

it follows from (9.63) that

$$u^*(s) = \mathrm{sgn}\left[B^T\eta^*(s)\right] = \mathrm{sgn}\left[-\eta_1\sin(s) + \eta_2\cos(s)\right] = \mathrm{sgn}\left[\sqrt{\eta_1^2+\eta_2^2}\,\sin(s+\phi)\right].$$

Since $\sqrt{\eta_1^2+\eta_2^2} > 0$, we have that $\sqrt{\eta_1^2+\eta_2^2}\,\sin(s+\phi)$ and $\sin(s+\phi)$ have the same sign. Hence the optimal control is given by

$$u^*(t) = \mathrm{sgn}\left[\sqrt{\eta_1^2+\eta_2^2}\,\sin(t+\phi)\right] = \mathrm{sgn}\left[\sin(t+\phi)\right]. \tag{10.32}$$

The expression (10.32) provides considerable information about the optimal control. In particular:

• The optimal control $u^*(\cdot)$ is bang-bang, just like the rocket sled problem.

• Switches (if they occur) occur exactly $\pi$ seconds apart.

To complete the synthesis we solve the equations for controls $u(\cdot)$ that are bang-bang with switching time intervals of length $\pi$. Thus, we focus on the cases where $u(t) = \pm 1$. If $u(t) = +1$, then the state equation has the form

$$\dot{x}_1(t) = x_2(t)$$
$$\dot{x}_2(t) = -x_1(t) + 1.$$

Observe that

$$\frac{dx_1}{dx_2} = \frac{x_2}{-x_1 + 1},$$

so that

$$(-x_1+1)\,dx_1 = x_2\,dx_2$$


and hence integration yields

$$c + \int(-x_1+1)\,dx_1 = \int x_2\,dx_2.$$

Thus,

$$c - \frac{1}{2}[-x_1+1]^2 = \frac{1}{2}[x_2]^2,$$

which implies that

$$2c = [-x_1+1]^2 + [x_2]^2 > 0,$$

or equivalently,

$$[-x_1+1]^2 + [x_2]^2 = \alpha^2 > 0.$$

Therefore, the trajectories are circles centered at $[1\ 0]^T$ with radius $\alpha$. Likewise, if $u(t) = -1$, then

$$[x_1+1]^2 + [x_2]^2 = \alpha^2 > 0$$

and the trajectories are circles centered at $[-1\ 0]^T$ with radius $\alpha$.

Since an optimal control is bang-bang, optimal trajectories consist of portions of circles centered about the two points $[1\ 0]^T$ and $[-1\ 0]^T$, with trajectories moving clockwise. If time is reversed, the backward trajectories move counterclockwise. Moreover, since the switching times are multiples of $\pi$, only semi-circles can be used to construct optimal trajectories. First assume that $0 < \phi \le \pi$, set $u(t) = 1$ and integrate backwards from $[0\ 0]^T$. The solution is given by

$$x_1(t) = 1 - \cos(t) > 0$$
$$x_2(t) = \sin(t) < 0$$

for $-\pi \le t \le 0$. To be an optimal control, $u(t) = \mathrm{sgn}[\sin(t+\phi)] = 1$ must switch from $+1$ to $-1$ at $t_{s_1} = -\phi$. Thus, $u(t) = +1$ until $t_{s_1} = -\phi$, and then one starts applying $u(t) = -1$ with initial data

$$x_1(-\phi) = 1 - \cos(-\phi) > 0$$
$$x_2(-\phi) = \sin(-\phi) < 0.$$


[Figure 10.2: Switching Curves for Negative Time. The original plot shows the curves $\Gamma_-$ (along which $u(t) = -1$) and $\Gamma_+$ (along which $u(t) = +1$) in the $(x_1,x_2)$-plane for $-\infty < t \le 0$.]

However, now $u(t) = -1$ and the trajectory moves counterclockwise along the circle centered about $[-1\ 0]^T$ with radius

$$\alpha_1 = \sqrt{[x_1(-\phi)+1]^2 + [x_2(-\phi)]^2}$$

until $t_{s_2} = -(\phi+\pi)$, when the control takes the value $u(t) = 1$ again. The trajectory will now move counterclockwise along the circle centered about $[1\ 0]^T$ with radius

$$\alpha_2 = \sqrt{[x_1(-(\phi+\pi))-1]^2 + [x_2(-(\phi+\pi))]^2}.$$

Continuing this procedure and reversing time to positive values, we arrive at the switching curves shown in Figures 10.2 and 10.3 below.

Let $W = \Gamma_- \cup \Gamma_+ \cup H_- \cup H_+$, where $H_-$ is the curve defined by the positive semicircles to the left of $\Gamma_-$ and $H_+$ is the curve defined by the negative semicircles to the right of $\Gamma_+$. The curve


[Figure 10.3: Switching Curves for Positive Time. The original plot shows the curves $\Gamma_-$ and $\Gamma_+$ with $u^*(t) = -1$ above the switching curve and $u^*(t) = +1$ below it, for $0 \le t < +\infty$.]

$W$ is the switching curve for this problem. Therefore, the optimal control is given by the feedback law

$$u^*(t) = \Psi(x_1^*(t), x_2^*(t)) = \Psi(x^*(t)), \tag{10.33}$$

where

$$\Psi(x_1,x_2) = \begin{cases} -1, & \text{if } (x_1,x_2) \text{ is above } W, \text{ or } (x_1,x_2) \in H_- \text{ or } (x_1,x_2) \in \Gamma_-, \\ 0, & \text{if } x_1 = x_2 = 0, \\ +1, & \text{if } (x_1,x_2) \text{ is below } W, \text{ or } (x_1,x_2) \in H_+ \text{ or } (x_1,x_2) \in \Gamma_+. \end{cases} \tag{10.34}$$
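A minimal sketch of this feedback synthesis follows, assuming the standard construction in which $W$ consists of unit semicircles centered at the odd integers on the $x_1$-axis (below the axis for $x_1 > 0$, above it for $x_1 < 0$); the handling of states exactly on $W$, and the chattering that occurs near the curve in a discrete-time simulation, are treated only crudely here:

```python
import numpy as np
from scipy.integrate import solve_ivp

def W(x1):
    """Switching curve: unit semicircles centered at odd integers
    (assumed standard synthesis; below the axis for x1 > 0)."""
    if x1 == 0.0:
        return 0.0
    c = 2.0 * np.floor(abs(x1) / 2.0) + 1.0       # nearest odd-integer center
    h = np.sqrt(max(0.0, 1.0 - (abs(x1) - c) ** 2))
    return -h if x1 > 0 else h

def Psi(x1, x2, tol=1e-9):
    """Feedback law (10.34): -1 above W, +1 on/below W, 0 at the origin."""
    if abs(x1) < tol and abs(x2) < tol:
        return 0.0
    return -1.0 if x2 > W(x1) else 1.0

def rhs(t, x):
    u = Psi(x[0], x[1])
    return [x[1], -x[0] + u]          # kappa = 1, gamma = 0

sol = solve_ivp(rhs, (0.0, 12.0), [3.0, 1.0], max_step=1e-3)
print("final state:", sol.y[:, -1])   # driven toward the origin
```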

10.3.3 A Simple Scalar Optimal Control Problem

Consider the linear control system

$$\dot{x}(t) = -x(t) + 1 - u(t) \tag{10.35}$$


with initial data

$$x(0) = 1 \in \mathbb{R}^1 \tag{10.36}$$

and target

$$x(t_1) = 0 \in \mathbb{R}^1. \tag{10.37}$$

Here $t_0 = 0$, $x_0 = 1 \in \mathbb{R}^1$ and $x_1 = 0 \in \mathbb{R}^1$ are given, and the controls belong to the space $PWC(0,+\infty;\mathbb{R}^1)$. The quadratic cost functional is defined by

$$J(x(\cdot),u(\cdot)) = \int_0^{t_1} \frac{1}{2}\left\{1 + [u(s)]^2\right\} ds, \tag{10.38}$$

where $x(\cdot)$ is the solution to the system (10.35) - (10.37). The optimal control problem is to find $u^*(\cdot) \in PWC(0,+\infty;\mathbb{R}^1)$ so that

$$J(x^*(\cdot),u^*(\cdot)) = \int_0^{t_1^*} \frac{1}{2}\left\{1 + [u^*(s)]^2\right\} ds \le \int_0^{t_1} \frac{1}{2}\left\{1 + [u(s)]^2\right\} ds = J(x(\cdot),u(\cdot))$$

for all $u(\cdot) \in PWC(0,+\infty;\mathbb{R}^1)$ that steer $x_0 = 1$ to $x_1 = 0$, where $x^*(s) \triangleq x^*(s;u^*(\cdot))$ is the optimal trajectory. Observe that this is a Simplest Problem in Optimal Control since the final time $t_1$ is not fixed.

Formulation as a Simplest Problem in Optimal Control: In order to set up the optimal control problem as a SPOC and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^1$ and the terminal set $X_1 \subseteq \mathbb{R}^1$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;


6. Form the augmented adjoint system matrix $\hat{A}(t)$.

It is clear that $X_0 = \{1\}$ and $X_1 = \{0\}$. Also, since there is no bound on the control, it follows that $\Omega = \mathbb{R}^1$. The functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ are given by

$$f(x,u) = -x + 1 - u$$

and

$$f_0(x,u) = \frac{1}{2}\left\{1 + [u]^2\right\},$$

respectively. The augmented Hamiltonian is thus defined by

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x,u) + \eta f(x,u) = \eta_0\frac{1}{2}\left\{1 + [u]^2\right\} + \eta(-x+1-u).$$

Moreover, since the augmented right-hand side is given by

$$\hat{f}(\hat{x},u) = \begin{bmatrix} f_0(x,u) \\ f(x,u) \end{bmatrix} = \begin{bmatrix} \frac{1}{2}\{1+[u]^2\} \\ -x+1-u \end{bmatrix},$$

it follows that

$$\hat{A}(t) = \begin{bmatrix} \frac{\partial f_0(x,u)}{\partial x_0} & \frac{\partial f_0(x,u)}{\partial x} \\ \frac{\partial f(x,u)}{\partial x_0} & \frac{\partial f(x,u)}{\partial x} \end{bmatrix}\bigg|_{(x,u)=(x^*(t),u^*(t))} = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}.$$

The augmented adjoint equation is defined by

$$\frac{d}{dt}\begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix} = \frac{d}{dt}\hat{\eta}(t) = -[\hat{A}(t)]^T\hat{\eta}(t) = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix}, \tag{10.39}$$

so that

$$\dot{\eta}_0(t) = 0$$

and

$$\dot{\eta}(t) = \eta(t).$$

Thus, as stated in the Maximum Principle, $\eta_0(t) \equiv \eta_0 \le 0$ is a non-positive constant and hence

$$\eta(t) = ce^t$$


for some constant $c$. If $(x^*(\cdot),u^*(\cdot))$ is an optimal pair, then there is a non-trivial solution $\hat{\eta}^*(t) = \begin{bmatrix} \eta_0^*(t) \\ \eta^*(t) \end{bmatrix}$ to the augmented adjoint equation (10.39) such that $u = u^*(t)$ maximizes the augmented Hamiltonian

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \eta_0^*\frac{1}{2}\left\{1 + [u]^2\right\} + \eta^*(t)(-x^*(t) + 1 - u).$$

Since $\Omega = \mathbb{R}^1$, it follows that

$$\frac{\partial}{\partial u}H(\hat{\eta}^*(t),\hat{x}^*(t),u)\bigg|_{u=u^*(t)} = 0.$$

However, a direct calculation yields

$$\frac{\partial}{\partial u}H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \eta_0^* u - \eta^*(t),$$

so that

$$0 = [\eta_0^* u - \eta^*(t)]\big|_{u=u^*(t)} = \eta_0^* u^*(t) - \eta^*(t). \tag{10.40}$$

Note that $\eta_0^*$ cannot be zero, since if $\eta_0^* = 0$, then (10.40) would imply that $0 = \eta_0^* u^*(t) - \eta^*(t) = -\eta^*(t)$, which contradicts the statement that $\hat{\eta}^*(t)$ is non-trivial. Hence we can solve (10.40) for the optimal control

$$u^*(t) = \frac{\eta^*(t)}{\eta_0^*} = -\left[\frac{\eta^*(t)}{-\eta_0^*}\right].$$

If we define

$$\lambda^*(t) \triangleq \frac{\eta^*(t)}{-\eta_0^*},$$

then

$$\dot{\lambda}^*(t) = \lambda^*(t)$$

and

$$\lambda^*(t) = Ae^t$$

for some constant $A$. Thus,

$$u^*(t) = -\lambda^*(t) = -Ae^t,$$


and since $t_1$ is not given, it follows that

$$\max_{u\in\mathbb{R}^1} H(\hat{\eta}^*(t),\hat{x}^*(t),u) = H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) \equiv 0.$$

At $t = 0$ we have

$$H(\hat{\eta}^*(0),\hat{x}^*(0),u^*(0)) = \eta_0^*\frac{1}{2}\left\{1 + [u^*(0)]^2\right\} + \eta^*(0)(-x^*(0) + 1 - u^*(0)) = 0.$$

Consequently,

$$-\frac{1}{2}\left\{1 + [u^*(0)]^2\right\} + \left[\frac{\eta^*(0)}{-\eta_0^*}\right](-x^*(0) + 1 - u^*(0)) = 0,$$

and

$$-\frac{1}{2}\left\{1 + [u^*(0)]^2\right\} + \lambda^*(0)(-x^*(0) + 1 - u^*(0)) = 0,$$

so that

$$-\frac{1}{2}\left\{1 + [-A]^2\right\} + A(-1 + 1 + A) = 0.$$

This equation becomes

$$-\frac{1}{2}\left\{1 + A^2\right\} + A^2 = -\frac{1}{2} + \frac{1}{2}A^2 = 0,$$

or

$$A^2 = 1.$$

There are two possible roots, $A = 1$ and $A = -1$. In either case, the optimal state is given by the Variation of Parameters Formula

$$x^*(t) = e^{-t} + \int_0^t e^{-(t-s)}[1 - u^*(s)]\,ds = e^{-t} + \int_0^t e^{-(t-s)}[1 + Ae^s]\,ds$$
$$= e^{-t} + \int_0^t e^{-(t-s)}\,ds + A\int_0^t e^{-t+2s}\,ds$$
$$= e^{-t} + \left[1 - e^{-t}\right] + \frac{A}{2}\left[e^{-t+2s}\right]\Big|_{s=0}^{s=t}$$
$$= e^{-t} + \left[1 - e^{-t}\right] + \frac{A}{2}\left[e^{t} - e^{-t}\right].$$


Therefore,

$$x^*(t) = 1 + \frac{1}{2}Ae^t - \frac{1}{2}Ae^{-t},$$

and using the end condition

$$x^*(t_1) = 0,$$

it follows that

$$1 = -A\left[\frac{1}{2}e^{t_1} - \frac{1}{2}e^{-t_1}\right] = -A\sinh(t_1).$$

Thus, at $t_1 = t_1^*$,

$$\sinh(t_1^*) = \frac{1}{-A},$$

where from above $A = \pm 1$. Clearly, $A = -1$ and the optimal final time is

$$t_1^* = \sinh^{-1}(1).$$

Completing the problem, we have that the optimal control is given by

$$u^*(t) = -Ae^t = e^t, \quad 0 \le t \le t_1^* = \sinh^{-1}(1),$$

and the optimal trajectory is

$$x^*(t) = 1 - \frac{1}{2}e^t + \frac{1}{2}e^{-t}.$$
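These closed-form expressions are easy to verify numerically. The sketch below checks that $x^*(t) = 1 - \frac{1}{2}e^t + \frac{1}{2}e^{-t}$ satisfies the dynamics (10.35) with $u^*(t) = e^t$, hits the target at $t_1^* = \sinh^{-1}(1)$, and evaluates the cost (10.38):

```python
import numpy as np

# Optimal solution of (10.35)-(10.38): u*(t) = e^t, x*(t) = 1 - sinh(t)
t1 = np.arcsinh(1.0)                  # t1* = sinh^{-1}(1), about 0.8814
t  = np.linspace(0.0, t1, 2001)
u  = np.exp(t)
x  = 1.0 - 0.5 * np.exp(t) + 0.5 * np.exp(-t)

# Check the dynamics x' = -x + 1 - u by finite differences
xdot = np.gradient(x, t)
print("max dynamics residual:", np.max(np.abs(xdot - (-x + 1.0 - u))))

# Check the endpoints and compute the cost J = int 0.5*(1 + u^2) ds
print("x(0) =", x[0], "  x(t1*) =", x[-1])        # 1 and approximately 0
g = 0.5 * (1.0 + u**2)
J = np.sum(0.5 * (g[:-1] + g[1:]) * np.diff(t))   # trapezoid rule
print("J =", J)
```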

10.4 Problem Set for Chapter 10

Problem 10.1 Consider the control system

$$\dot{x}(t) = u(t),$$

with initial condition

$$x(0) = 1,$$

and terminal condition

$$x(t_1) = 0.$$


Let the control constraint set be $\Omega = [-1, 0]$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = t_1 = \int_0^{t_1} 1\,ds.$$

Problem 10.2 Consider the control system

$$\dot{x}_1(t) = x_2(t), \quad \dot{x}_2(t) = u(t),$$

with initial condition

$$x_1(0) = 1, \quad x_2(0) = 1,$$

and terminal condition

$$x_1(t_1) = 0, \quad x_2(t_1) = 0.$$

Let the control constraint set be $\Omega = [-1,+1]$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = t_1 = \int_0^{t_1} 1\,ds.$$

Problem 10.3 Consider the control system

$$\dot{x}_1(t) = -x_1(t) + 10x_2(t) + u(t), \quad \dot{x}_2(t) = -x_2(t) + u(t),$$

with initial condition

$$x_1(0) = 1, \quad x_2(0) = 0,$$

and terminal condition

$$x_1(t_1) = 0, \quad x_2(t_1) = 1.$$

Let the control constraint set be $\Omega = [-1,+1]$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = t_1 = \int_0^{t_1} 1\,ds.$$


Problem 10.4 Consider the control system

$$\dot{x}_1(t) = u_1(t) + u_2(t), \quad \dot{x}_2(t) = u_1(t) - u_2(t),$$

with initial condition

$$x_1(0) = 1, \quad x_2(0) = 1,$$

and terminal condition

$$x_1(t_1) = 0, \quad x_2(t_1) = 0.$$

Let the controls be constrained by $|u_1(t)| \le 1$ and $|u_2(t)| \le 1$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = t_1 = \int_0^{t_1} 1\,ds.$$

Problem 10.5 Consider the control system

$$\dot{x}(t) = u(t),$$

with initial condition

$$x(0) = 0,$$

and terminal condition

$$x(t_1) = \varphi(t_1),$$

where $\varphi(t) = t - 5$. Let the control constraint set be $\Omega = \mathbb{R}^1$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = \int_0^{t_1} \frac{\sqrt{1 + [u(s)]^2}}{x(s)}\,ds.$$

Problem 10.6 Consider the control system

$$\dot{x}_1(t) = -x_1(t) + x_2(t), \quad \dot{x}_2(t) = u(t),$$


with initial condition

$$x_1(0) = 0, \quad x_2(0) = 0,$$

and terminal condition

$$[x_1(t_1)]^2 + [x_2(t_1)]^2 = \varphi(t_1),$$

where $\varphi(t) = t^2 + 1$. Let the control constraint set be $\Omega = \mathbb{R}^1$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = \int_0^{t_1} [u(s)]^2\,ds.$$

Problem 10.7 Consider the control system

$$\dot{x}(t) = -x(t) + u(t),$$

with initial condition

$$x(0) = 1,$$

and terminal condition

$$x(t_1) = 0.$$

Let the control constraint set be $\Omega = \mathbb{R}^1$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = \int_0^{t_1} \left\{1 + [x(s)]^2 + [u(s)]^2\right\} ds.$$

Problem 10.8 Consider the control system

$$\dot{x}_1(t) = x_2(t), \quad \dot{x}_2(t) = u(t),$$

with initial condition

$$x_1(0) = 1, \quad x_2(0) = 1,$$


and terminal condition

$$x_1(t_1) = \varphi(t_1), \quad x_2(t_1) = 0,$$

where $\varphi(t) = -t^2$. Let the control constraint set be $\Omega = \mathbb{R}^1$. Investigate the optimal control problem for the cost functional

$$J(u(\cdot)) = \int_0^{t_1} [u(s)]^2\,ds.$$


Chapter 11

Extensions of the Maximum Principle

We now modify the SPOC by changing the problem data. In this chapter we treat the case where the final time $t_1$ is fixed and consider the case where the system is nonautonomous. In addition, we discuss two other formulations of the optimal control problem and show that all the formulations are equivalent.

11.1 A Fixed-Time Optimal Control Problem

We start with a statement of an optimal control problem defined on a given fixed time interval $[t_0,t_1]$, where $t_0 < t_1$ and $t_1$ is fixed and given. Let $PWC(t_0,t_1;\mathbb{R}^m)$ denote the space of all $\mathbb{R}^m$-valued piecewise continuous functions defined on $[t_0,t_1]$. A function $u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m)$ is called a control or control function.

Therefore, we are given $[t_0,t_1]$, smooth functions

$$f : \mathbb{R}^1 \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n, \qquad f_0 : \mathbb{R}^1 \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R},$$

an initial set $X_0 \subseteq \mathbb{R}^n$, a terminal set $X_1 \subseteq \mathbb{R}^n$,


and a control constraint set

$$\Omega \subseteq \mathbb{R}^m.$$

As for the SPOC, the state equation is defined by the system of differential equations

$$(S) \qquad \dot{x}(t) = f(t,x(t),u(t)), \quad t_0 < t < t_1. \tag{11.1}$$

The initial conditions are given by

$$x(t_0) = x_0 \in X_0. \tag{11.2}$$

Again, we make the following standing assumption about the initial value problem.

Standing Assumption for Optimal Control: For each $x_0 \in X_0$ and $u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m)$, the initial value problem (11.1) - (11.2) has a unique solution defined on $[t_0,t_1]$. We denote this solution by $x(\cdot;x_0;u(\cdot))$.

The set of admissible controllers is the subset of $PWC(t_0,t_1;\mathbb{R}^m)$ defined by

$$\Theta = \left\{ u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m) : u(t) \in \Omega \text{ e.f. and } u(\cdot) \text{ steers } X_0 \text{ to } X_1 \text{ at time } t_1 \right\} = \Theta(t_0,t_1,X_0,X_1,\Omega). \tag{11.3}$$

Given a control $u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m)$, the cost functional is defined by

$$J(u(\cdot)) = \int_{t_0}^{t_1} f_0(s,x(s),u(s))\,ds = \int_{t_0}^{t_1} f_0(s,x(s;x_0;u(\cdot)),u(s))\,ds, \tag{11.4}$$

where $x(\cdot) = x(\cdot;x_0;u(\cdot))$ is the solution to the initial value problem (11.1) - (11.2).

Consider the optimal control problem of minimizing $J(\cdot)$ on the set of all admissible controllers $\Theta$. In particular, an optimal


control is a function $u^*(\cdot) \in \Theta$ such that $u^*(\cdot)$ steers $X_0$ to $X_1$ at time $t_1 > t_0$ and

$$J(u^*(\cdot)) = \int_{t_0}^{t_1} f_0(s,x(s;x_0^*;u^*(\cdot)),u^*(s))\,ds \le \int_{t_0}^{t_1} f_0(s,x(s;x_0;u(\cdot)),u(s))\,ds = J(u(\cdot)) \tag{11.5}$$

for all $u(\cdot) \in \Theta$.

Remark 11.1 As before, if $u^*(\cdot) \in \Theta$ is an optimal control that steers $x_0^* \in X_0$ to $x_1^* \in X_1$ at time $t_1 > t_0$, then the corresponding optimal trajectory $x(\cdot;x_0^*;u^*(\cdot))$ will be denoted by $x^*(\cdot)$. In particular, $x^*(\cdot)$ satisfies the two point boundary value problem

$$(S^*) \qquad \dot{x}^*(t) = f(t,x^*(t),u^*(t)), \quad t_0 < t < t_1,$$
$$(BCs^*) \qquad x^*(t_0) = x_0^* \in X_0, \quad x^*(t_1) = x_1^* \in X_1. \tag{11.6}$$

This notation is consistent since we have assumed that the initial value problem (11.1) - (11.2) has a unique solution.
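Two point boundary value problems of the form (11.6) are typically solved numerically by shooting: guess the missing initial data, integrate forward, and adjust the guess until the terminal condition holds. A generic sketch of the idea (the scalar model problem $\ddot{x} = -x + 1$ with $x(0) = 1$, $x(2) = 0$ is a hypothetical stand-in, not an example from the text):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Model two point BVP: x'' = -x + 1, x(0) = 1, x(2) = 0 (illustrative only)
def shoot(v0):
    """Integrate forward from x(0) = 1, x'(0) = v0 and return x(2)."""
    sol = solve_ivp(lambda t, y: [y[1], -y[0] + 1.0],
                    (0.0, 2.0), [1.0, v0], rtol=1e-10)
    return sol.y[0, -1]

# Find the slope v0 for which the terminal condition x(2) = 0 holds
v0 = brentq(shoot, -10.0, 10.0)
print("x'(0) =", v0, "  x(2) =", shoot(v0))
```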

Remark 11.2 For the Simplest Problem in Optimal Control the final time $t_1 > t_0$ is free, and finding the optimal time $t_1^* > t_0$ is part of the problem. The time optimal control problem falls into this category. In the fixed-time optimal control problem the final time $t_1 > t_0$ is specified. Later we shall consider optimal control problems defined on the interval $0 \le t < +\infty$, which is a “special” fixed time problem.

11.1.1 The Maximum Principle for Fixed t1

When the final time $t_1$ is given at a fixed value $t_0 < t_1$, the Maximum Principle is the same except that the condition (10.22) is replaced by

$$M(\hat{\eta}^*(t),\hat{x}^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv c, \tag{11.7}$$


where the constant $c$ is not necessarily 0. In particular, the following theorem is valid in this case.

Theorem 11.1 (Maximum Principle for $t_1$ Fixed) Assume that $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$, $f_0 : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$, $X_0 \subseteq \mathbb{R}^n$, $t_0 \in \mathbb{R}$, $t_1 \in \mathbb{R}$, $X_1 \subseteq \mathbb{R}^n$ and $\Omega \subseteq \mathbb{R}^m$ are given as above and consider the control system

$$(S) \qquad \dot{x}(t) = f(x(t),u(t)), \quad t_0 < t \le t_1, \tag{11.8}$$

with piecewise continuous controllers $u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m)$ satisfying $u(t) \in \Omega \subseteq \mathbb{R}^m$ e.f. If

$$u^*(\cdot) \in \Theta = \left\{ u(\cdot) \in PWC(t_0,t_1;\mathbb{R}^m) : u(t) \in \Omega \text{ e.f. and } u(\cdot) \text{ steers } X_0 \text{ to } X_1 \text{ at time } t_1 \right\} \tag{11.9}$$

minimizes

$$J(u(\cdot)) = \int_{t_0}^{t_1} f_0(x(s),u(s))\,ds \tag{11.10}$$

on the set of admissible controls $\Theta$ with optimal response $x^*(\cdot)$ satisfying $x^*(t_0) = x_0^* \in X_0$ and $x^*(t_1) = x_1^* \in X_1$, then there exists a non-trivial solution

$$\hat{\eta}^*(t) = [\eta_0^*(t)\ \eta_1^*(t)\ \eta_2^*(t)\ \dots\ \eta_n^*(t)]^T = \begin{bmatrix} \eta_0^*(t) \\ \eta^*(t) \end{bmatrix} \tag{11.11}$$

to the augmented adjoint equation

$$\frac{d}{dt}\hat{\eta}(t) = -[\hat{A}(t)]^T\hat{\eta}(t), \tag{11.12}$$

such that

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = M(\hat{\eta}^*(t),\hat{x}^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u). \tag{11.13}$$

Moreover, there are constants $\eta_0^* \le 0$ and $c$ such that

$$\eta_0^*(t) \equiv \eta_0^* \le 0 \tag{11.14}$$


and for all $t \in [t_0,t_1]$,

$$M(\hat{\eta}^*(t),\hat{x}^*(t)) = \max_{u\in\Omega} H(\hat{\eta}^*(t),\hat{x}^*(t),u) \equiv c. \tag{11.15}$$

Also, if $X_0 \subseteq \mathbb{R}^n$ and $X_1 \subseteq \mathbb{R}^n$ are manifolds with tangent spaces $T_0$ and $T_1$ at $x^*(t_0) = x_0^* \in X_0$ and $x^*(t_1) = x_1^* \in X_1$, respectively, then

$$\hat{\eta}^*(t) = [\eta_0^*(t)\ \eta_1^*(t)\ \eta_2^*(t)\ \dots\ \eta_n^*(t)]^T = [\eta_0^*(t)\ \eta^*(t)]^T$$

can be selected to satisfy the transversality conditions

$$\eta^*(t_0) \perp T_0 \tag{11.16}$$

and

$$\eta^*(t_1) \perp T_1. \tag{11.17}$$

11.2 Application to Problems in the Calculus of Variations

We start with some classical problems in the calculus of variations. First we assume the cost function does not explicitly depend on time. In particular, the cost functional is given by

$$J(x(\cdot)) = \int_{t_0}^{t_1} f_0(x(s),\dot{x}(s))\,ds,$$

where $f_0(x,u)$ is independent of time. We will address the time dependent case later.

11.2.1 The Simplest Problem in the Calculus of Variations

Assume $t_0$, $t_1$, $x_0$, and $x_1$ are given and the cost functional is of the form

$$J(x(\cdot)) = \int_{t_0}^{t_1} f_0(x(s),\dot{x}(s))\,ds.$$


Let $\Psi = \Psi(t_0,t_1,x_0,x_1)$ be the set of PWS functions defined by

$$\Psi = \{ x(\cdot) \in PWS(t_0,+\infty) : x(t_0) = x_0,\ x(t_1) = x_1 \}$$

and consider the SPCV, where the objective is to find $x^*(\cdot) \in \Psi$ such that

$$J(x^*(\cdot)) = \int_{t_0}^{t_1} f_0(x^*(s),\dot{x}^*(s))\,ds \le \int_{t_0}^{t_1} f_0(x(s),\dot{x}(s))\,ds = J(x(\cdot))$$

for all $x(\cdot) \in \Psi$.

In order to formulate an equivalent Simplest Problem in Optimal Control (with fixed $t_1$), we define the state equation by

$$\dot{x}(t) = u(t) \tag{11.18}$$

with initial condition

$$x(t_0) = x_0. \tag{11.19}$$

Define the sets

$$X_0 = \{x_0\} \subseteq \mathbb{R}^1 \quad \text{and} \quad X_1 = \{x_1\} \subseteq \mathbb{R}^1,$$

and let the control constraint set be $\Omega = \mathbb{R}^1$. The set of admissible controllers is the subset of $PWC(t_0,t_1)$ defined by

$$\Theta = \left\{ u(\cdot) \in PWC(t_0,t_1) : u(t) \in \Omega \text{ e.f. and } u(\cdot) \text{ steers } X_0 \text{ to } X_1 \text{ at time } t_1 \right\} = \Theta(t_0,t_1,x_0,x_1,\Omega). \tag{11.20}$$

Given a control $u(\cdot) \in PWC(t_0,t_1)$, the cost functional is defined by

$$J(u(\cdot)) = \int_{t_0}^{t_1} f_0(x(s),u(s))\,ds, \tag{11.21}$$


where $x(\cdot) = x(\cdot;u(\cdot))$ is the solution to the initial value problem (11.18) - (11.19). This variational problem is equivalent to the optimal control problem of minimizing $J(\cdot)$ on the set of admissible controllers $\Theta$. In particular, the goal is to find an optimal control $u^*(\cdot) \in \Theta$ such that $u^*(\cdot)$ steers $X_0$ to $X_1$ at time $t_1 > t_0$ and

$$J(u^*(\cdot)) = \int_{t_0}^{t_1} f_0(x^*(s),u^*(s))\,ds \le \int_{t_0}^{t_1} f_0(x(s),u(s))\,ds$$

for all $u(\cdot) \in \Theta$.

Formulation as an Optimal Control Problem: In order to set up the Simplest Problem in the Calculus of Variations as a fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$ and final time $t_1$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^1$ and the terminal set $X_1 \subseteq \mathbb{R}^1$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;

6. Form the augmented adjoint system matrix $\hat{A}(t)$.

Clearly, $X_0 = \{x_0\} \subseteq \mathbb{R}^1$, $X_1 = \{x_1\} \subseteq \mathbb{R}^1$ and $\Omega = \mathbb{R}^1$. Moreover,

$$f(x,u) = u$$

and $f_0(x,u)$ is the given integrand. Therefore, the augmented function $\hat{f} : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^2$ is defined by

$$\hat{f}(\hat{x},u) = \hat{f}(x_0,x,u) = \begin{bmatrix} f_0(x,u) \\ u \end{bmatrix},$$


where $\hat{x} = [x_0\ x]^T$, and the augmented Hamiltonian $H : \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$ is given by

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x,u) + \eta f(x,u) = \eta_0 f_0(x,u) + \eta u,$$

where $\hat{\eta} = [\eta_0\ \eta]^T$. The augmented Jacobian is given by

$$J_{\hat{x}}\hat{f}(\hat{x},u) = \begin{bmatrix} 0 & \frac{\partial f_0(x,u)}{\partial x} \\ 0 & 0 \end{bmatrix},$$

so that if $(x^*(\cdot),u^*(\cdot)) = (x^*(\cdot),\dot{x}^*(\cdot))$ is an optimal pair, then the augmented matrix $\hat{A}(t)$ is given by

$$\hat{A}(t) = J_{\hat{x}}\hat{f}(\hat{x},u)\big|_{(x^*(t),u^*(t))} = \begin{bmatrix} 0 & \frac{\partial f_0(x^*(t),u^*(t))}{\partial x} \\ 0 & 0 \end{bmatrix}.$$

Therefore,

$$-[\hat{A}(t)]^T = -\begin{bmatrix} 0 & \frac{\partial f_0(x^*(t),u^*(t))}{\partial x} \\ 0 & 0 \end{bmatrix}^T = \begin{bmatrix} 0 & 0 \\ -\frac{\partial f_0(x^*(t),u^*(t))}{\partial x} & 0 \end{bmatrix}$$

and the augmented adjoint equation becomes

$$\frac{d}{dt}\begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix} = -[\hat{A}(t)]^T\begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ -\frac{\partial f_0(x^*(t),u^*(t))}{\partial x} & 0 \end{bmatrix}\begin{bmatrix} \eta_0(t) \\ \eta(t) \end{bmatrix}. \tag{11.22}$$

Consequently,

$$\frac{d}{dt}\eta_0(t) = 0, \qquad \frac{d}{dt}\eta(t) = -\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial x}\right]\eta_0,$$

which implies (as always) that the zero adjoint state is a constant, $\eta_0(t) \equiv \eta_0$.

If $(x^*(\cdot),u^*(\cdot)) = (x^*(\cdot),\dot{x}^*(\cdot))$ is an optimal pair for the SPCV, then the Maximum Principle implies that there is a non-trivial solution

$$\hat{\eta}^*(t) = \begin{bmatrix} \eta_0^*(t) \\ \eta^*(t) \end{bmatrix} \in \mathbb{R}^2$$


to the augmented adjoint equation (11.22) such that $\eta_0^*(t) \equiv \eta_0^* \le 0$. Observe that

$$\frac{d}{dt}\eta^*(t) = -\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial x}\right]\eta_0^*, \tag{11.23}$$

so that

$$\eta^*(t) = \eta^*(t_0) - \eta_0^*\int_{t_0}^{t}\left[\frac{\partial f_0(x^*(s),u^*(s))}{\partial x}\right]ds. \tag{11.24}$$

The Maximum Principle implies that

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \max_{u\in\mathbb{R}^1} H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \max_{u\in\mathbb{R}^1}\left[\eta_0^* f_0(x^*(t),u) + \eta^*(t)u\right] \equiv c, \tag{11.25}$$

and since $\Omega = \mathbb{R}^1$, the derivative of $H(\hat{\eta}^*(t),\hat{x}^*(t),u)$ with respect to $u$ at $u = u^*(t)$ must be zero. In particular,

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u)\big|_{u=u^*(t)} = D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = 0,$$

where

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \frac{\partial}{\partial u}\left[\eta_0^* f_0(x^*(t),u) + \eta^*(t)u\right] = \eta_0^*\left[\frac{\partial f_0(x^*(t),u)}{\partial u}\right] + \eta^*(t).$$

Thus, when $u = u^*(t)$,

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u)\big|_{u=u^*(t)} = \eta_0^*\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right] + \eta^*(t) = 0, \tag{11.26}$$

and we have

$$\eta^*(t) = -\eta_0^*\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right]. \tag{11.27}$$

Observe that (11.27) implies that $\eta_0^* \ne 0$, because if $\eta_0^* = 0$, then we would also have $\eta^*(t) = 0$, which contradicts the statement that $\hat{\eta}^*(t)$ is non-trivial. Combining (11.24) and (11.27) we find that

$$-\eta_0^*\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right] = \eta^*(t_0) - \eta_0^*\int_{t_0}^{t}\left[\frac{\partial f_0(x^*(s),u^*(s))}{\partial x}\right]ds$$


and since $\eta_0^* < 0$, we divide both sides of this equation by $-\eta_0^*$ to obtain

$$\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right] = c + \int_{t_0}^{t}\left[\frac{\partial f_0(x^*(s),u^*(s))}{\partial x}\right]ds, \tag{11.28}$$

where the constant $c$ is given by

$$c = \frac{\eta^*(t_0)}{-\eta_0^*}.$$

Note that (11.28) is the Euler Integral Equation, and hence we have shown that the optimal pair $(x^*(\cdot),u^*(\cdot))$ with $\dot{x}^*(t) = u^*(t)$ satisfies the Euler Integral Equation

$$\left[\frac{\partial f_0(x^*(t),\dot{x}^*(t))}{\partial u}\right] = c + \int_{t_0}^{t}\left[\frac{\partial f_0(x^*(s),\dot{x}^*(s))}{\partial x}\right]ds, \quad t_0 < t < t_1. \tag{11.29}$$

Also, between corners of $x^*(\cdot)$,

$$\frac{d}{dt}\left[\frac{\partial f_0(x^*(t),\dot{x}^*(t))}{\partial u}\right] = \left[\frac{\partial f_0(x^*(t),\dot{x}^*(t))}{\partial x}\right], \quad t_0 < t < t_1, \tag{11.30}$$

and

$$x^*(t_0) = x_0 \quad \text{and} \quad x^*(t_1) = x_1,$$

which is the Euler Differential Equation for the SPCV. Consequently, we have used the Maximum Principle to derive the classical Euler Necessary Condition.
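For a concrete integrand this reduction can be checked symbolically. The sketch below (using SymPy, with the hypothetical arclength integrand $f_0(x,u) = \sqrt{1+u^2}$) forms the stationarity condition (11.26) and the adjoint equation (11.23), eliminates $\eta^*$, and recovers the Euler differential equation (11.30), which here forces $\ddot{x} = 0$, i.e. straight lines:

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')
eta0 = sp.symbols('eta0', negative=True)     # eta0* < 0

xs, us = sp.symbols('xs us')                 # placeholders for x*(t), u*(t)
f0 = sp.sqrt(1 + us**2)                      # hypothetical integrand

# Stationarity (11.26): eta0 * df0/du + eta = 0  =>  eta = -eta0 * df0/du
eta_expr = -eta0 * sp.diff(f0, us)

# Adjoint equation (11.23): eta' = -eta0 * df0/dx  (here df0/dx = 0)
lhs = sp.diff(eta_expr.subs(us, x(t).diff(t)), t)
rhs = -eta0 * sp.diff(f0, xs).subs({xs: x(t), us: x(t).diff(t)})

# Eliminate the nonzero factor -eta0 to get d/dt f0_u - f0_x = 0
euler = sp.simplify((lhs - rhs) / -eta0)
print(sp.Eq(euler, 0))   # x''(t) / (1 + x'(t)**2)**(3/2) = 0
```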

The Weierstrass Necessary Condition

One can also use the Maximum Principle to derive the Weierstrass Necessary Condition. Observe that for the SPCV

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x,u) + \eta u, \tag{11.31}$$

and the Maximum Principle (11.25) implies that

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \max_{v\in\mathbb{R}^1} H(\hat{\eta}^*(t),\hat{x}^*(t),v) \ge H(\hat{\eta}^*(t),\hat{x}^*(t),u) \tag{11.32}$$


for all $u \in \mathbb{R}^1$. Using the form of the augmented Hamiltonian $H : \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$ given by

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x,u) + \eta f(x,u) = \eta_0 f_0(x,u) + \eta u,$$

it follows that

$$\eta_0^* f_0(x^*(t),u^*(t)) + \eta^*(t)u^*(t) \ge \eta_0^* f_0(x^*(t),u) + \eta^*(t)u. \tag{11.33}$$

Consequently, dividing by $\eta_0^* < 0$ yields

$$f_0(x^*(t),u^*(t)) + \frac{\eta^*(t)}{\eta_0^*}u^*(t) \le f_0(x^*(t),u) + \frac{\eta^*(t)}{\eta_0^*}u.$$

However, from equation (11.27) above, it follows that

$$\frac{\eta^*(t)}{\eta_0^*} = -\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}.$$

Therefore, substituting this into the previous inequality yields

$$f_0(x^*(t),u^*(t)) - \frac{\partial f_0(x^*(t),u^*(t))}{\partial u}u^*(t) \le f_0(x^*(t),u) - \frac{\partial f_0(x^*(t),u^*(t))}{\partial u}u,$$

which implies

$$0 \le f_0(x^*(t),u) - f_0(x^*(t),u^*(t)) - [u - u^*(t)]\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}$$

for all $u \in \mathbb{R}^1$.

Recalling the definition of the Weierstrass Excess Function we have

$$E(t,x^*(t),u^*(t),u) = f_0(x^*(t),u) - f_0(x^*(t),u^*(t)) - [u - u^*(t)]\frac{\partial f_0(x^*(t),u^*(t))}{\partial u},$$

and hence it follows that

$$0 \le E(t,x^*(t),u^*(t),u)$$

for all $u \in \mathbb{R}^1$. This is the Weierstrass Necessary Condition (II) for the SPCV.
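The excess function is easy to inspect numerically for a given integrand. A minimal check (again with the hypothetical convex integrand $f_0(x,u) = \sqrt{1+u^2}$, for which $E \ge 0$ holds for every $u$):

```python
import numpy as np

f0   = lambda x, u: np.sqrt(1.0 + u**2)
f0_u = lambda x, u: u / np.sqrt(1.0 + u**2)

def excess(x, u_star, u):
    """Weierstrass excess E(t,x,u*,u); t enters only through x*(t), u*(t)."""
    return f0(x, u) - f0(x, u_star) - (u - u_star) * f0_u(x, u_star)

u = np.linspace(-10.0, 10.0, 4001)
E = excess(x=0.0, u_star=0.7, u=u)
print("min E =", E.min())   # nonnegative, as the necessary condition requires
```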


11.2.2 Free End-Point Problems

Formulation as an Optimal Control Problem: In order to set up the Free End-Point Problem as an optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$ and final time $t_1$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^1$ and the terminal set $X_1 \subseteq \mathbb{R}^1$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;

6. Form the augmented adjoint system matrix $\hat{A}(t)$.

This problem is the same as the previous problem except that $x_1$ is not given. Therefore, $X_0 = \{x_0\} \subseteq \mathbb{R}^1$ and $X_1 = \mathbb{R}^1$, and the tangent spaces are given by

$$T_0 = \{0\} \quad \text{and} \quad T_1 = \mathbb{R}^1,$$

respectively. Therefore, the transversality condition at $t_1$ implies that

$$\eta^*(t_1) = 0. \tag{11.34}$$

All the analysis above for the SPCV holds. In particular,

$$\frac{d}{dt}\eta^*(t) = -\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial x}\right]\eta_0^*$$

and

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u)\big|_{u=u^*(t)} = \eta_0^*\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right] + \eta^*(t) = 0,$$

which implies

$$\eta^*(t) = -\eta_0^*\left[\frac{\partial f_0(x^*(t),u^*(t))}{\partial u}\right]. \tag{11.35}$$


Also, (11.35) implies that $\eta_0^* \ne 0$. Therefore, we have shown the optimal pair $(x^*(\cdot),u^*(\cdot))$ with $\dot{x}^*(t) = u^*(t)$ satisfies Euler's equation in integral form

$$\left[\frac{\partial f_0(x^*(t),\dot{x}^*(t))}{\partial u}\right] = c + \int_{t_0}^{t}\left[\frac{\partial f_0(x^*(s),\dot{x}^*(s))}{\partial x}\right]ds, \quad t_0 < t < t_1,$$

with initial condition

$$x^*(t_0) = x_0.$$

In order to obtain the “natural boundary condition”, we apply the transversality condition (11.34) to (11.35) and obtain

$$0 = \eta^*(t_1) = -\eta_0^*\left[\frac{\partial f_0(x^*(t_1),u^*(t_1))}{\partial u}\right].$$

Again, since $\eta_0^* \ne 0$, it follows that the natural boundary condition is given by

$$\frac{\partial f_0(x^*(t_1),\dot{x}^*(t_1))}{\partial u} = \frac{\partial f_0(x^*(t_1),u^*(t_1))}{\partial u} = 0,$$

which matches the classical results.

11.2.3 Point-to-Curve Problems

Recall that in this problem we are given $t_0$, $x_0$, and a “curve” defined by the graph of a smooth function $\varphi : \mathbb{R}^1 \to \mathbb{R}^1$. The point-to-curve problem is to find $x^*(\cdot) \in PWS(t_0,+\infty)$ that minimizes

$$J(x(\cdot)) = \int_{t_0}^{t_1} f_0(s,x(s),\dot{x}(s))\,ds$$

on the set

$$\Psi_\varphi = \{ x(\cdot) \in PWS(t_0,+\infty) : x(t_0) = x_0,\ x(t_1) = \varphi(t_1) \text{ for a finite time } t_1 > t_0 \}.$$


In particular, one must find $t_1^*$ and $x^*(\cdot) \in PWS(t_0,t_1^*)$ so that

$$J(x^*(\cdot)) = \int_{t_0}^{t_1^*} f_0(s,x^*(s),\dot{x}^*(s))\,ds \le \int_{t_0}^{t_1} f_0(s,x(s),\dot{x}(s))\,ds = J(x(\cdot))$$

for all $x(\cdot) \in \Psi_\varphi$.

Remark 11.3 In order to properly formulate this problem, we note that the cost functional integrand $f_0(s,x,u)$ is time dependent and does not allow a direct application of the previous Maximum Principles. To overcome this issue, we again use the idea of an “augmented state” to reformulate the problem as a time independent optimal control problem. This basic idea is also used to extend the Maximum Principle for the SPOC to the case where the integrands and differential equations defining the state equations are time dependent (see Section 11.7).

The basic idea is to treat time $t$ as a “state” and “augment” the system with this new state. In particular, we define

$$x_1(t) = t \quad \text{and} \quad x_2(t) = x(t).$$

Observe that

$$\dot{x}_1(t) = 1, \quad x_1(t_0) = t_0 \in \mathbb{R}^1,$$

and

$$\dot{x}_2(t) = \dot{x}(t) = u(t), \quad x_2(t_0) = x_0 \in \mathbb{R}^1, \quad x_2(t_1) = x_1 \in \mathbb{R}^1.$$

Therefore, the control problem is governed by the two dimensional control system

$$\frac{d}{dt}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} 1 \\ u(t) \end{bmatrix},$$


with initial condition

$$\begin{bmatrix} x_1(t_0) \\ x_2(t_0) \end{bmatrix} = \begin{bmatrix} t_0 \\ x_0 \end{bmatrix}$$

and terminal condition

$$\begin{bmatrix} x_1(t_1) \\ x_2(t_1) \end{bmatrix} = \begin{bmatrix} t_1 \\ \varphi(t_1) \end{bmatrix},$$

where $t_1$ is not specified. Note that because the integrand $f_0(\cdot)$ is a time dependent function $f_0(t,x,u)$, the $t$ variable becomes an additional “state”, so the control problem is a 2-dimensional problem.

Formulation as a Simplest Problem in Optimal Control: In order to set up the Point-to-Curve Problem as an SPOC and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^2$ and the terminal set $X_1 \subseteq \mathbb{R}^2$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^2$ and $f_0 : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;

6. Form the augmented adjoint system matrix $\hat{A}(t)$.

Define the sets $X_0 \subseteq \mathbb{R}^2$ and $X_1 \subseteq \mathbb{R}^2$ by

$$X_0 = \left\{\begin{bmatrix} t_0 \\ x_0 \end{bmatrix}\right\} \subseteq \mathbb{R}^2$$

and

$$X_1 = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : x_2 = \varphi(x_1) = \varphi(t_1)\right\} \subseteq \mathbb{R}^2,$$

respectively.


Observe that $t_1$, and hence $x_1(t_1) = t_1$, is not specified (other than $t_0 \le t_1$). Moreover, if we let $G : \mathbb{R}^2 \to \mathbb{R}^1$ be defined by $G(x_1,x_2) = \varphi(x_1) - x_2$, then

$$X_1 = \left\{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2 : G(x_1,x_2) = \varphi(x_1) - x_2 = 0\right\}.$$

If $\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix} \in X_1$, then the tangent plane $T_1$ at $\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix}$ is defined by the gradient of $G(x_1,x_2)$ at $\begin{bmatrix} x_1^* \\ x_2^* \end{bmatrix}$. In particular,

$$T_1 = \left\{ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} : x \perp \nabla G(x_1^*,x_2^*) \right\},$$

and hence a non-zero vector $\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix}$ is orthogonal to $T_1$ if and only if

$$\begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = k\nabla G(x_1^*,x_2^*) = k\begin{bmatrix} \varphi'(x_1^*) \\ -1 \end{bmatrix} \tag{11.36}$$

for some constant $k \ne 0$. We use this transversality condition in several places below. Finally, note that $\Omega = \mathbb{R}^1$ since there are no bounds on the control $u(\cdot)$. Observe that $f : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^2$ is defined by

$$f(x_1,x_2,u) = \begin{bmatrix} f_1(x_1,x_2,u) \\ f_2(x_1,x_2,u) \end{bmatrix} = \begin{bmatrix} 1 \\ u \end{bmatrix},$$

and $f_0 : \mathbb{R}^2 \times \mathbb{R}^1 \to \mathbb{R}^1$ is given as the integrand $f_0(x_1,x_2,u)$.

Hence, the augmented function $\hat{f} : \mathbb{R}^3 \times \mathbb{R}^1 \to \mathbb{R}^3$ is defined by

$$\hat{f}(\hat{x},u) = \begin{bmatrix} f_0(x_1,x_2,u) \\ f_1(x_1,x_2,u) \\ f_2(x_1,x_2,u) \end{bmatrix} = \begin{bmatrix} f_0(x_1,x_2,u) \\ 1 \\ u \end{bmatrix},$$

and the augmented Hamiltonian $H : \mathbb{R}^3 \times \mathbb{R}^3 \times \mathbb{R}^1 \to \mathbb{R}^1$ is given by

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x_1,x_2,u) + \eta_1 f_1(x_1,x_2,u) + \eta_2 f_2(x_1,x_2,u).$$


In particular, the augmented Hamiltonian has the form

$$H(\hat{\eta},\hat{x},u) = \eta_0 f_0(x_1,x_2,u) + \eta_1 + \eta_2 u,$$

where $\hat{\eta} = [\eta_0\ \eta]^T$. The augmented Jacobian is given by

$$J_{\hat{x}}\hat{f}(\hat{x},u) = \begin{bmatrix} \frac{\partial f_0(x_1,x_2,u)}{\partial x_0} & \frac{\partial f_0(x_1,x_2,u)}{\partial x_1} & \frac{\partial f_0(x_1,x_2,u)}{\partial x_2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$

so that if $(x^*(\cdot),u^*(\cdot)) = (x^*(\cdot),\dot{x}^*(\cdot))$ is an optimal pair, then the augmented matrix $\hat{A}(t)$ is given by

$$\hat{A}(t) = J_{\hat{x}}\hat{f}(\hat{x},u)\big|_{(\hat{x}^*(t),u^*(t))} = \begin{bmatrix} 0 & \frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_1} & \frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Therefore,

$$-[\hat{A}(t)]^T = \begin{bmatrix} 0 & 0 & 0 \\ -\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_1} & 0 & 0 \\ -\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_2} & 0 & 0 \end{bmatrix}$$

and the augmented adjoint equation becomes

$$\frac{d}{dt}\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix} = -[\hat{A}(t)]^T\begin{bmatrix} \eta_0(t) \\ \eta_1(t) \\ \eta_2(t) \end{bmatrix}. \tag{11.37}$$

Consequently,

$$\frac{d}{dt}\eta_0(t) = 0,$$
$$\frac{d}{dt}\eta_1(t) = -\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_1}\right]\eta_0,$$
$$\frac{d}{dt}\eta_2(t) = -\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_2}\right]\eta_0,$$

which implies (as always) that the zero adjoint state is a constant, $\eta_0(t) \equiv \eta_0$.


If $(x^*(\cdot),u^*(\cdot)) = (x^*(\cdot),\dot{x}^*(\cdot))$ is an optimal pair, then the Maximum Principle implies that there is a non-trivial solution

$$\hat{\eta}^*(t) = \begin{bmatrix} \eta_0^*(t) \\ \eta_1^*(t) \\ \eta_2^*(t) \end{bmatrix} \in \mathbb{R}^3$$

to the augmented adjoint equation (11.37) such that $\eta_0^*(t) \equiv \eta_0^* \le 0$. Observe that

$$\frac{d}{dt}\eta_2^*(t) = -\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial x_2}\right]\eta_0^*, \tag{11.38}$$

so that

$$\eta_2^*(t) = \eta_2^*(t_0) - \eta_0^*\int_{t_0}^{t}\left[\frac{\partial f_0(x_1^*(s),x_2^*(s),u^*(s))}{\partial x_2}\right]ds. \tag{11.39}$$

Since $t_1$ is free, the Maximum Principle implies that

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \max_{u\in\mathbb{R}^1} H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \max_{u\in\mathbb{R}^1}\left[\eta_0^* f_0(x_1^*(t),x_2^*(t),u) + \eta_1^*(t) + \eta_2^*(t)u\right] \equiv 0. \tag{11.40}$$

Since $\Omega = \mathbb{R}^1$, the derivative of $H(\hat{\eta}^*(t),\hat{x}^*(t),u)$ with respect to $u$ at $u = u^*(t)$ must be zero. In particular,

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u)\big|_{u=u^*(t)} = D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = 0,$$

where

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u) = \frac{\partial}{\partial u}\left[\eta_0^* f_0(x_1^*(t),x_2^*(t),u) + \eta_1^*(t) + \eta_2^*(t)u\right] = \eta_0^*\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u)}{\partial u}\right] + \eta_2^*(t).$$


Thus, when $u = u^*(t)$,

$$D_u H(\hat{\eta}^*(t),\hat{x}^*(t),u)\big|_{u=u^*(t)} = \eta_0^*\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial u}\right] + \eta_2^*(t) = 0, \tag{11.41}$$

and we have

$$\eta_2^*(t) = -\eta_0^*\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial u}\right]. \tag{11.42}$$

Observe that (11.42) and (11.40) together imply that $\eta_0^* \ne 0$. To see this, assume that $\eta_0^* = 0$. Equation (11.42) then implies that $\eta_2^*(t) \equiv 0$. The Maximum Principle (11.40) yields

$$H(\hat{\eta}^*(t),\hat{x}^*(t),u^*(t)) = \eta_0^* f_0(x_1^*(t),x_2^*(t),u^*(t)) + \eta_1^*(t) + \eta_2^*(t)u^*(t) \equiv 0,$$

so that if $\eta_0^* = 0$, then $\eta_2^*(t) \equiv 0$ and

$$\eta_0^* f_0(x_1^*(t),x_2^*(t),u^*(t)) + \eta_1^*(t) + \eta_2^*(t)u^*(t) = \eta_1^*(t) \equiv 0.$$

Hence, it would follow that $\hat{\eta}^*(t) \equiv 0$, which contradicts the statement that $\hat{\eta}^*(t)$ is non-trivial. Consequently, it follows that $\eta_0^* < 0$. Combining (11.39) and (11.42) we find that

$$-\eta_0^*\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial u}\right] = \eta_2^*(t_0) - \eta_0^*\int_{t_0}^{t}\left[\frac{\partial f_0(x_1^*(s),x_2^*(s),u^*(s))}{\partial x_2}\right]ds,$$

and since $\eta_0^* < 0$, we divide both sides of this equation by $-\eta_0^*$ to obtain

$$\left[\frac{\partial f_0(x_1^*(t),x_2^*(t),u^*(t))}{\partial u}\right] = c + \int_{t_0}^{t}\left[\frac{\partial f_0(x_1^*(s),x_2^*(s),u^*(s))}{\partial x_2}\right]ds, \tag{11.43}$$

where the constant $c$ is given by

$$c = \frac{\eta_2^*(t_0)}{-\eta_0^*}.$$


Observe that (11.43) is Euler's equation in integral form. Therefore, we have shown that the optimal solution $(x_1^*(\cdot),x_2^*(\cdot),u^*(\cdot))$ with $x_2^*(t) = x^*(t)$, $\dot{x}^*(t) = u^*(t)$ and $x_1^*(t) = t$ satisfies Euler's equation in integral form

$$\left[\frac{\partial f_0(t,x^*(t),\dot{x}^*(t))}{\partial u}\right] = c + \int_{t_0}^{t}\left[\frac{\partial f_0(s,x^*(s),\dot{x}^*(s))}{\partial x}\right]ds, \quad t_0 < t < t_1^*. \tag{11.44}$$

Also, we have that between corners of $x^*(\cdot)$,

$$\frac{d}{dt}\left[\frac{\partial f_0(t,x^*(t),\dot{x}^*(t))}{\partial u}\right] = \left[\frac{\partial f_0(t,x^*(t),\dot{x}^*(t))}{\partial x}\right], \quad t_0 < t < t_1^*, \tag{11.45}$$

and

$$x^*(t_0) = x_0 \quad \text{and} \quad x^*(t_1^*) = \varphi(t_1^*).$$

To complete the problem we need to use the transversality condition at $t_1^*$. The transversality condition (11.36) implies that

$$\begin{bmatrix} \eta_1^*(t_1^*) \\ \eta_2^*(t_1^*) \end{bmatrix} = k\nabla G(x_1^*(t_1^*),x_2^*(t_1^*)) = k\begin{bmatrix} \varphi'(t_1^*) \\ -1 \end{bmatrix},$$

and hence it follows that

$$\eta_1^*(t_1^*) = k\varphi'(t_1^*) \tag{11.46}$$

and

$$\eta_2^*(t_1^*) = -\eta_0^*\left[\frac{\partial f_0(x_1^*(t_1^*),x_2^*(t_1^*),u^*(t_1^*))}{\partial u}\right] = -k. \tag{11.47}$$

Again, the Maximum Principle (11.40) implies that at $t = t_1^*$,

$$0 = H(\hat{\eta}^*(t_1^*),\hat{x}^*(t_1^*),u^*(t_1^*)) = \eta_0^* f_0(x_1^*(t_1^*),x_2^*(t_1^*),u^*(t_1^*)) + \eta_1^*(t_1^*) + \eta_2^*(t_1^*)u^*(t_1^*)$$
$$= \eta_0^* f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + \eta_1^*(t_1^*) + \eta_2^*(t_1^*)\dot{x}^*(t_1^*),$$

and applying (11.46) - (11.47) we have


$$0 = \eta_0^* f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + k\varphi'(t_1^*) - \eta_0^*\left[\frac{\partial f_0(x_1^*(t_1^*),x_2^*(t_1^*),u^*(t_1^*))}{\partial u}\right]\dot{x}^*(t_1^*)$$
$$= \eta_0^* f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + \eta_0^*\left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\varphi'(t_1^*) - \eta_0^*\left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\dot{x}^*(t_1^*).$$

Dividing by $\eta_0^* < 0$ yields

$$0 = f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + \left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\varphi'(t_1^*) - \left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\dot{x}^*(t_1^*)$$
$$= f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + \left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\left[\varphi'(t_1^*) - \dot{x}^*(t_1^*)\right],$$

which implies the classical natural transversality condition

$$f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*)) + \left[\frac{\partial f_0(t_1^*,x^*(t_1^*),\dot{x}^*(t_1^*))}{\partial u}\right]\left[\varphi'(t_1^*) - \dot{x}^*(t_1^*)\right] = 0. \tag{11.48}$$

11.3 Application to the Farmer's Allocation Problem

Here we apply the Maximum Principle to the Farmer's Allocation Problem first presented as Example 7.8 in Section 7.7.1 above. We refer to Luenberger's book ([131]) for details. The basic problem is that a farmer is assumed to produce a single crop (wheat, rice, etc.), and at each moment he either stores the crop or sells it and reinvests the money into his business to increase his production rate. The farmer's goal is to maximize the total amount of crop stored over a two year period. Luenberger formulates this as the following optimal control problem.

Given the initial time $t_0 = 0$, the final time $t_1 = 2$ and the initial state $x(0) = 1 > 0$, find a control $u^*(\cdot)$ to minimize

$$J(u(\cdot)) = \int_0^2 (u(s) - 1)x(s)\,ds,$$

subject to

$$\dot{x}(t) = u(t)x(t) \tag{11.49}$$

with initial condition

$$x(0) = 1. \tag{11.50}$$

The control constraint is given by

$$u(t) \in [0,1].$$

Here, $u(t)$ is the fraction of the production rate that is reinvested at time $t$. It is important to note that since $u(t) \ge 0$ and $\dot{x}(t) = u(t)x(t)$, all solutions to the system (11.49) - (11.50) satisfy

$$x(t) \ge 1. \tag{11.51}$$

Formulation as an Optimal Control Problem: In order to set up the Farmer's Allocation Problem as a fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time $t_0$ and final time $t_1$;

2. Identify the initial set $X_0 \subseteq \mathbb{R}^1$ and the terminal set $X_1 \subseteq \mathbb{R}^1$;

3. Identify the control constraint set $\Omega \subseteq \mathbb{R}^1$;

4. Identify the functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$;

5. Define the augmented Hamiltonian $H(\hat{\eta},\hat{x},u)$;

6. Form the augmented adjoint system matrix $\hat{A}(t)$.


It is obvious that $t_0 = 0$, $t_1 = 2$, $X_0 = \{x_0\} = \{1\} \subseteq \mathbb{R}^1$ and $X_1 = \mathbb{R}^1$. Note that the corresponding tangent spaces are given by

$$T_0 = \{0\} \quad \text{and} \quad T_1 = \mathbb{R}^1,$$

and hence the transversality condition on the adjoint equation at $t = 2$ is

$$\eta^*(2) = 0. \tag{11.52}$$

The control constraint set is

$$\Omega = [0,1] \subseteq \mathbb{R}^1,$$

and the functions $f : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ and $f_0 : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ are given by

$$f(x,u) = ux$$

and

$$f_0(x,u) = (u-1)x,$$

respectively. The set of admissible controllers is the subset of $PWC(0,2)$ defined by

$$\Theta = \left\{ u(\cdot) \in PWC(0,2) : u(t) \in \Omega \text{ e.f. and } u(\cdot) \text{ steers } X_0 \text{ to } X_1 \text{ at time } 2 \right\} = \Theta(0,2,\{1\},\mathbb{R}^1,\Omega). \tag{11.53}$$

Given a control $u(\cdot) \in PWC(0,2)$, the cost functional is defined by

$$J(u(\cdot)) = \int_0^2 (u(s)-1)x(s)\,ds = \int_0^2 f_0(x(s),u(s))\,ds, \tag{11.54}$$

where $x(\cdot) = x(\cdot;u(\cdot))$ is the solution to the initial value problem (11.49) - (11.50). This optimal control problem is equivalent to the problem of minimizing $J(\cdot)$ on the set of admissible controllers $\Theta$. In particular, the goal is to find an optimal control $u^*(\cdot) \in \Theta$ such that $u^*(\cdot)$ steers $X_0$ to $X_1$ at time 2 and

$$J(u^*(\cdot)) = \int_0^2 f_0(x^*(s),u^*(s))\,ds \le \int_0^2 f_0(x(s),u(s))\,ds = J(u(\cdot))$$


for all u(·) ∈ Θ.

The augmented function f : R2 × R1 −→ R2 is defined by

f(x, u) = f(x0, x, u) = [f0(x, u); f(x, u)] = [(u − 1)x; ux],

where x = [x0 x]^T. The augmented Hamiltonian H(η, x, u) : R2 × R2 × R1 −→ R1 is given by

H(η, x, u) = η0f0(x, u) + ηf(x, u) = η0(u − 1)x + ηux,

where η = [η0 η]^T. The augmented Jacobian is given by

Jxf(x, u) = [0 ∂f0(x, u)/∂x ; 0 ∂f(x, u)/∂x] = [0 (u − 1) ; 0 u]

so that if (x∗(·), u∗(·)) is an optimal pair, then the augmented matrix A(t) is given by

A(t) = Jxf(x, u)|(x∗(t), u∗(t)) = Jxf(x∗(t), u∗(t)) = [0 (u∗(t) − 1) ; 0 u∗(t)].

Therefore,

−[A(t)]^T = −[0 (u∗(t) − 1) ; 0 u∗(t)]^T = [0 0 ; −(u∗(t) − 1) −u∗(t)]

and the augmented adjoint equation becomes

d/dt [η0(t); η(t)] = −[A(t)]^T [η0(t); η(t)] = [0 0 ; −(u∗(t) − 1) −u∗(t)] [η0(t); η(t)].    (11.55)

Consequently,

d/dt η0(t) = 0
d/dt η(t) = −η0 [u∗(t) − 1] − u∗(t)η(t),

which implies (as always) that the zero adjoint state is a constant, η0(t) ≡ η0.


If (x∗(·), u∗(·)) is an optimal pair for the Farmer’s Allocation Problem, then the Maximum Principle implies that there is a nontrivial solution

η∗(t) = [η∗0(t); η∗(t)] ∈ R2

to the augmented adjoint equation (11.55) such that η∗0(t) ≡ η∗0 ≤ 0. Observe that the transversality condition (11.52) implies that η∗(t) satisfies

d/dt η∗(t) = −u∗(t)η∗(t) + η∗0 [1 − u∗(t)],   η∗(2) = 0.    (11.56)

It again follows that η∗0 ≠ 0, because if η∗0 = 0, then η∗(t) would be a solution to the homogeneous linear initial value problem with zero initial data

d/dt η∗(t) = −u∗(t)η∗(t),   η∗(2) = 0,    (11.57)

which would imply that η∗(t) ≡ 0. Dividing (11.56) by −η∗0 > 0 yields

d/dt [η∗(t)/(−η∗0)] = −u∗(t)[η∗(t)/(−η∗0)] − [1 − u∗(t)],   [η∗(2)/(−η∗0)] = 0.    (11.58)

Defining the normalized adjoint variable λ∗(t) by

λ∗(t) = η∗(t)/(−η∗0)

yields

d/dt λ∗(t) = −u∗(t)λ∗(t) − [1 − u∗(t)],   λ∗(2) = 0.    (11.59)

Also, since −η∗0 > 0 it follows that maximizing the augmented Hamiltonian

H(η∗(t), x∗(t), u) = η∗0(u − 1)x∗(t) + η∗(t)x∗(t)u


is equivalent to maximizing

(1/(−η∗0))[η∗0(u − 1)x∗(t) + η∗(t)x∗(t)u] = −(u − 1)x∗(t) + [η∗(t)/(−η∗0)]x∗(t)u
                                          = −(u − 1)x∗(t) + λ∗(t)x∗(t)u,

where λ∗(t) satisfies (11.59). Note that

max_{0≤u≤1} [−(u − 1)x∗(t) + λ∗(t)ux∗(t)] = max_{0≤u≤1} [(1 − u)x∗(t) + λ∗(t)ux∗(t)]

and since x∗(t) ≥ 1 it follows that

max_{0≤u≤1} [(1 − u)x∗(t) + λ∗(t)ux∗(t)] = {max_{0≤u≤1} [(1 − u) + λ∗(t)u]} x∗(t) = {max_{0≤u≤1} [1 + (λ∗(t) − 1)u]} x∗(t).

Thus, the optimal control must maximize the term

(λ∗(t) − 1)u

on the interval [0, 1], which implies that

u∗(t) = 1 if λ∗(t) > 1, and u∗(t) = 0 if λ∗(t) < 1.    (11.60)

We need only compute λ∗(t) using the adjoint equation

d/dt λ∗(t) = −u∗(t)λ∗(t) − [1 − u∗(t)],   λ∗(2) = 0.    (11.61)

However, at t = 2 we know that λ∗(2) = 0 and since λ∗(t) is continuous, there is an interval [T, 2] with T < 2 such that

λ∗(t) < 1 for all t ∈ (T, 2].


On this interval the optimal control must be u∗(t) = 0 and the corresponding adjoint equation (11.61) has the form

d/dt λ∗(t) = −1,   λ∗(2) = 0.

Thus,

λ∗(t) = −t + 2

until the time T where

1 = λ∗(T) = −T + 2,

at which point the optimal control switches to u∗(t) = 1. Clearly, T = 1 and hence the optimal control on the interval [0, T] = [0, 1] is given by u∗(t) = 1. Again, returning to the adjoint equation (11.61) on the interval [0, 1] yields

d/dt λ∗(t) = −u∗(t)λ∗(t) − [1 − u∗(t)] = −λ∗(t),   λ∗(1) = 1,    (11.62)

which has the solution

λ∗(t) = e^{1−t}.

Therefore, we know that

λ∗(t) = e^{1−t} for 0 ≤ t ≤ 1 and λ∗(t) = −t + 2 for 1 ≤ t ≤ 2,

and

u∗(t) = 1 for 0 ≤ t ≤ 1 and u∗(t) = 0 for 1 < t ≤ 2    (11.63)

is the corresponding optimal control. Finally, the corresponding optimal trajectory is given by

x∗(t) = e^t for 0 ≤ t ≤ 1 and x∗(t) = e for 1 ≤ t ≤ 2,    (11.64)

and if the optimal control problem has a solution, then the optimal controller is given by (11.63) with corresponding optimal trajectory (11.64).
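The switching structure above is easy to check numerically. The following is a minimal Python sketch (not from the text; the explicit Euler scheme and the grid size are arbitrary choices) that integrates the normalized adjoint equation (11.61) backward from λ∗(2) = 0 while applying the switching rule (11.60), and then integrates the state forward for comparison with (11.63) - (11.64).

    import numpy as np

    N = 20000
    t = np.linspace(0.0, 2.0, N + 1)
    dt = t[1] - t[0]

    # Backward sweep for (11.61): d/dt lam = -u*lam - (1 - u), lam(2) = 0,
    # choosing u = 1 when lam > 1 and u = 0 otherwise (rule (11.60)).
    lam = np.zeros(N + 1)
    for k in range(N, 0, -1):
        u = 1.0 if lam[k] > 1.0 else 0.0
        lam[k - 1] = lam[k] + dt * (u * lam[k] + (1.0 - u))

    # Forward sweep for the state: dx/dt = u*x, x(0) = 1.
    u = np.where(lam > 1.0, 1.0, 0.0)
    x = np.ones(N + 1)
    for k in range(N):
        x[k + 1] = x[k] + dt * u[k] * x[k]

    # Compare with the closed forms derived above.
    lam_exact = np.where(t <= 1.0, np.exp(1.0 - t), 2.0 - t)
    x_exact = np.where(t <= 1.0, np.exp(t), np.e)
    print("switch time ~", t[np.argmin(np.abs(lam - 1.0))])   # ~ 1.0
    print("max adjoint error:", np.abs(lam - lam_exact).max())
    print("max state error:  ", np.abs(x - x_exact).max())

The computed switch time converges to T = 1 as the grid is refined, confirming the bang-bang structure derived above.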


11.4 Application to a Forced Oscillator Control Problem

Suppose we have a forced oscillator with its equilibrium position at the origin, driven by the force u(t). The equation of motion is given by the second order equation

ẍ(t) + x(t) = u(t)

and we consider the cost function given by

J(u(·)) = ∫_0^{t1} (1/2)u²(s) ds,    (11.65)

where t1 < π/2 is fixed. Note that if the terminal time t1 were free, or if t1 ≥ π/2, then one could reach the target without applying any control in a quarter-period of the oscillation and the cost would then be zero. Thus, the problem is only interesting if t1 < π/2 is fixed.

If x1(t) = x(t) denotes the position, then x2(t) = ẋ(t) is the velocity and the state equations have the form

d/dt [x1(t); x2(t)] = [0 1 ; −1 0] [x1(t); x2(t)] + [0; 1] u(t).    (11.66)

Suppose that initially, at time t0 = 0, the system is at rest at a distance 2 (miles) from the equilibrium position 0 = [0 0]^T, and that we wish to move to the zero position at a prescribed fixed time t1, without requiring that we arrive there with zero velocity.

Formulation as an Optimal Control Problem: In order to set up the forced oscillator control problem as a fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time t0 and the final time t1 ;

2. Identify the initial set X0 ⊆ R2, the terminal set X1 ⊆ R2;

3. Identify the control constraint set Ω ⊆ R1;

4. Identify the functions f : R2 × R1 −→ R2, f0 : R2 × R1 −→ R1;


5. Define the augmented Hamiltonian H(η, x, u);

6. Form the augmented adjoint system matrix A(t).

The initial and final states are given by

[x1(0); x2(0)] = [2; 0]

and

[x1(t1); x2(t1)] ∈ {[0; x2] : x2 ∈ R1} = {0} × R1,

respectively. Therefore, the initial and final sets are given by

X0 = {[2 0]^T} and X1 = {0} × R1,    (11.67)

respectively. Since there is no hard constraint on the control, the control constraint set is Ω = R1. The functions f0 : R2 × R1 −→ R1 and f : R2 × R1 −→ R2 are defined by

f0(x1, x2, u) = (1/2)u²

and

f(x1, x2, u) = [x2; (−x1 + u)] = [f1(x1, x2, u); f2(x1, x2, u)],

so that the augmented function f : R3 × R1 → R3 is given by

f(x0, x1, x2, u) = [(1/2)u²; x2; (−x1 + u)].

The augmented Hamiltonian is given by

H(η, x, u) = η0(1/2)u² + η1x2 + η2(−x1 + u).

If (x∗(·), u∗(·)) is an optimal pair, then

A(t) = [0 0 0 ; 0 0 1 ; 0 −1 0]|_{x = x∗(t), u = u∗(t)} = [0 0 0 ; 0 0 1 ; 0 −1 0]


and the adjoint equations become

d/dt η(t) = [0 0 0 ; 0 0 1 ; 0 −1 0] η(t).    (11.68)

The Maximum Principle implies that there exists a nonzero function

η∗(·) = [η∗0(·) η∗1(·) η∗2(·)]^T

which is a solution to the adjoint equation (11.68) such that η∗0(t) ≡ η∗0 ≤ 0 and

H(η∗(t), x∗(t), u∗(t)) = max_{u∈R1} H(η∗(t), x∗(t), u) ≡ c.

Since Ω = R1 it follows that

∂/∂u H(η∗(t), x∗(t), u)|_{u=u∗(t)} ≡ 0

and hence we have

η∗0u∗(t) + η∗2(t) ≡ 0.    (11.69)

Moreover, the transversality condition states that η∗(t1) can be taken to be normal to the tangent plane of the target set X1, and since X1 = {0} × R1 is a subspace it follows that T1 = {0} × R1. Consequently, if

η∗(t1) = [η∗1(t1); η∗2(t1)] ⊥ {0} × R1,

then η∗2(t1) = 0, so that there is a constant A such that

η∗1(t) = A cos(t1 − t)

and

η∗2(t) = A sin(t1 − t).

Observe that if η∗0 = 0, then (11.69) implies that η∗2(t) ≡ 0, so that A = 0. Hence, it would follow that η∗1(t) = A cos(t1 − t) ≡ 0,


which cannot happen since η∗(·) is nonzero. Therefore, η∗0 < 0 and we can solve (11.69) for u∗(t) to yield

u∗(t) = −η∗2(t)/η∗0 = Ã sin(t1 − t),    (11.70)

where Ã = −A/η∗0. When u∗(t) = Ã sin(t1 − t), the solution to the state equation (11.66) is easily found to be given by

x∗1(t) = (1/2)Ãt cos(t1 − t) + 2 cos t + B̃ sin t,    (11.71)

x∗2(t) = (1/2)Ã cos(t1 − t) + (1/2)Ãt sin(t1 − t) − 2 sin t + B̃ cos t,    (11.72)

for some constant B̃. However, x∗2(0) = 0 and x∗1(t1) = 0, so that

0 = x∗2(0) = (1/2)Ã cos(t1) + B̃

and

0 = x∗1(t1) = (1/2)Ãt1 + 2 cos t1 + B̃ sin t1.

Solving these two equations for Ã and B̃ we find

Ã = −4 cos t1 / (t1 − sin t1 cos t1)

and

B̃ = −(1/2)Ã cos t1.

Finally, the optimal cost can be found by substituting

u∗(t) = Ã sin(t1 − t)

into the cost function (11.65); integrating, we find

J(u∗(·)) = 4 cos² t1 / (t1 − sin t1 cos t1).    (11.73)

Note that if t1 = π/2, then Ã = 0 and the optimal cost is zero. For each t1 < π/2, the optimal cost is positive, and as t1 → 0, J(u∗(·)) → +∞, so that the cost grows to infinity as the time interval shrinks.
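As a quick sanity check, one can integrate (11.66) under the control (11.70) and compare the resulting cost with (11.73). The Python sketch below is illustrative only; the value t1 = 1 and the use of scipy.integrate.solve_ivp are assumptions, not part of the text.

    import numpy as np
    from scipy.integrate import solve_ivp

    t1 = 1.0   # any fixed terminal time with t1 < pi/2
    Atil = -4.0 * np.cos(t1) / (t1 - np.sin(t1) * np.cos(t1))

    def rhs(t, z):
        # z = [x1, x2, running cost]; see (11.66) and (11.65)
        u = Atil * np.sin(t1 - t)
        return [z[1], -z[0] + u, 0.5 * u ** 2]

    sol = solve_ivp(rhs, (0.0, t1), [2.0, 0.0, 0.0], rtol=1e-10, atol=1e-12)
    print("x1(t1) =", sol.y[0, -1])   # ~ 0: the position target is reached
    print("cost   =", sol.y[2, -1])
    print("(11.73)=", 4.0 * np.cos(t1) ** 2 / (t1 - np.sin(t1) * np.cos(t1)))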


11.5 Application to the Linear Quadratic Control Problem

We apply the Maximum Principle to a Linear Quadratic (LQ) optimal control problem. Let A be an n × n constant real matrix and B be an n × m constant real matrix. Consider the linear control system

ẋ(t) = Ax(t) + Bu(t)    (11.74)

with initial data

x(0) = x0 ∈ Rn.    (11.75)

We assume that t0 = 0, x0 ∈ Rn and 0 < t1 are given. Also, let Q be an n × n constant symmetric real matrix and R be an m × m constant symmetric real matrix such that

Q^T = Q ≥ 0 and R^T = R > 0.

Here, the inequality Q ≥ 0 is equivalent to the condition that Q is non-negative,

〈Qx, x〉 ≥ 0 for all x ∈ Rn,

and R > 0 is equivalent to the condition that R is positive definite,

〈Ru, u〉 > 0 for all u ∈ Rm with u ≠ 0.

We assume that the controls belong to the space PWC(0, t1; Rm). The quadratic cost functional is defined by

J(u(·)) = (1/2) ∫_0^{t1} 〈Qx(s), x(s)〉 + 〈Ru(s), u(s)〉 ds,    (11.76)

where x(t) = x(t; u(·)) is the solution to the system (11.74) - (11.75). The Linear Quadratic (LQ) Optimal Control problem is to find u∗(·) ∈ PWC(0, t1; Rm) so that

J(u∗(·)) = (1/2) ∫_{t0}^{t1} 〈Qx(s; u∗(·)), x(s; u∗(·))〉 + 〈Ru∗(s), u∗(s)〉 ds
         ≤ J(u(·)) = (1/2) ∫_{t0}^{t1} 〈Qx(s; u(·)), x(s; u(·))〉 + 〈Ru(s), u(s)〉 ds,


for all u(·) ∈ PWC(0, t1; Rm).

Formulation as an Optimal Control Problem: In order to set up the Quadratic Optimal Control Problem as a fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time t0 and final time t1;

2. Identify the initial set X0 ⊆ Rn, the terminal set X1 ⊆ Rn;

3. Identify the control constraint set Ω ⊆ Rm;

4. Identify the functions f : Rn×Rm −→ Rn, f0 : Rn×Rm −→R1;

5. Define the augmented Hamiltonian H(η, x,u);

6. Form the augmented adjoint system matrix A(t).

The initial time is t0 = 0 and the final time t1 is given and fixed. The initial set is the single initial vector X0 = {x0} and since there is no terminal constraint, the terminal set is the whole state space X1 = Rn. Also, there is no constraint on the control u(·) ∈ PWC(0, t1; Rm); hence the control constraint set is all of Rm and Ω = Rm. The functions f : Rn × Rm −→ Rn and f0 : Rn × Rm −→ R1 are given by

f(x, u) = Ax + Bu

and

f0(x, u) = (1/2)〈Qx, x〉 + (1/2)〈Ru, u〉 ≥ 0,

respectively. The augmented Hamiltonian H : Rn+1 × Rn+1 × Rm −→ R1 is defined by

H(η, x, u) = (η0/2)[〈Qx, x〉 + 〈Ru, u〉] + 〈η, Ax + Bu〉    (11.77)
           = (η0/2)x^TQx + (η0/2)u^TRu + [Ax]^Tη + [Bu]^Tη
           = (η0/2)x^TQx + (η0/2)u^TRu + x^TA^Tη + u^TB^Tη,


where x = [x0 x]^T ∈ Rn+1, η = [η0 η]^T ∈ Rn+1 and u ∈ Rm. The augmented function f : Rn+1 × Rm −→ Rn+1 is defined by

f(x, u) = [(1/2)〈Qx, x〉 + (1/2)〈Ru, u〉 ; Ax + Bu] = [(1/2)x^TQx + (1/2)u^TRu ; Ax + Bu],    (11.78)

so that the Jacobian is given by

Jxf(x, u) = [0 [Qx]^T ; 0 A].    (11.79)

Now assume that (x∗(·), u∗(·)) is an optimal pair so that the matrix A(t) is given by

A(t) = Jxf(x, u)|(x∗(t), u∗(t)) = Jxf(x∗(t), u∗(t)) = [0 [Qx∗(t)]^T ; 0 A]    (11.80)

and

−[A(t)]^T = −[0 0 ; Qx∗(t) A^T] = [0 0 ; −Qx∗(t) −A^T].

The augmented adjoint equation has the form

d/dt [η0(t); η(t)] = −[A(t)]^T [η0(t); η(t)] = [0 0 ; −Qx∗(t) −A^T] [η0(t); η(t)]    (11.81)

which is equivalent to the system

d/dt η0(t) = 0
d/dt η(t) = −η0(t)Qx∗(t) − A^Tη(t),

where again x∗(·) is the optimal trajectory. Again, the first equation above implies that the zero adjoint state is a constant, η0(t) ≡ η0.


The second equation is coupled to the state equation by the term −η0(t)Qx∗(t).

If (x∗(·), u∗(·)) is an optimal pair for the LQ optimal control problem, then the Maximum Principle implies that there is a nontrivial solution

η∗(t) = [η∗0(t); η∗(t)] ∈ Rn+1

to the augmented adjoint equation (11.81) such that η∗0(t) ≡ η∗0 ≤ 0,

d/dt η∗(t) = −η∗0Qx∗(t) − A^Tη∗(t)    (11.82)

and

H(η∗(t), x∗(t), u∗(t)) = max_{u∈Rm} H(η∗(t), x∗(t), u) ≡ c.

Since Ω = Rm is open, it follows that

DuH(η∗(t), x∗(t), u)|_{u=u∗(t)} = DuH(η∗(t), x∗(t), u∗(t)) = 0,

where

DuH(η∗(t), x∗(t), u) = ∂/∂u {(η∗0/2)[x∗(t)]^TQx∗(t) + (η∗0/2)u^TRu + [x∗(t)]^TA^Tη∗(t) + u^TB^Tη∗(t)}
                     = ∂/∂u {(η∗0/2)u^TRu + u^TB^Tη∗(t)}
                     = η∗0Ru + B^Tη∗(t),

so that when u = u∗(t)

DuH(η∗(t), x∗(t), u)|_{u=u∗(t)} = [η∗0Ru + B^Tη∗(t)]|_{u=u∗(t)} = η∗0Ru∗(t) + B^Tη∗(t) = 0.    (11.83)

Applying the transversality condition at x∗(0) = x0 ∈ X0 = {x0} yields that η∗(0) can be any vector, since T0 = {0}. However,


at t1 we have that x∗(t1) ∈ X1 = Rn and since T1 = X1 = Rn, the transversality condition

η∗(t1) ⊥ T1 = Rn

implies that

η∗(t1) = 0.    (11.84)

This boundary condition in turn implies that η∗0 < 0. To see this, assume that η∗0 = 0 and observe that the adjoint equation (11.82) reduces to the linear system

d/dt η∗(t) = −η∗0Qx∗(t) − A^Tη∗(t) = −A^Tη∗(t).

Therefore, η∗(·) would be a solution of the homogeneous linear initial value problem

d/dt η∗(t) = −A^Tη∗(t),   η∗(t1) = 0,

and hence it follows that η∗(t) ≡ 0. Consequently, we have shown that if η∗0 = 0, then η∗(t) ≡ 0 and hence

η∗(t) = [η∗0(t); η∗(t)] ≡ 0,

which contradicts the statement that η∗(t) is a nontrivial solution of the augmented adjoint equation (11.81).

Since η∗0 < 0 we can solve (11.83) for the optimal control. We have

η∗0Ru∗(t) + B^Tη∗(t) = 0,

which yields

η∗0Ru∗(t) = −B^Tη∗(t)

and

Ru∗(t) = (−1/η∗0)B^Tη∗(t).


By assumption, the matrix R = R^T > 0 is nonsingular and hence we have the following expression for the optimal control:

u∗(t) = R^{−1}B^T [η∗(t)/(−η∗0)].    (11.85)

Summarizing, it follows that the optimal trajectory can be obtained by solving the two point boundary value problem defined by the coupled state and adjoint equations

d/dt x∗(t) = Ax∗(t) − (1/η∗0)BR^{−1}B^Tη∗(t),   x∗(0) = x0,
d/dt η∗(t) = −η∗0Qx∗(t) − A^Tη∗(t),   η∗(t1) = 0,    (11.86)

and setting

u∗(t) = R^{−1}B^T [η∗(t)/(−η∗0)].

To eliminate the η∗0 term, we divide the adjoint equation above by −η∗0, which yields

d/dt [η∗(t)/(−η∗0)] = Qx∗(t) − A^T [η∗(t)/(−η∗0)],   [η∗(t1)/(−η∗0)] = 0.

Defining the normalized adjoint state λ∗(t) by

λ∗(t) ≜ η∗(t)/(−η∗0)

produces the optimality conditions

d/dt x∗(t) = Ax∗(t) + BR^{−1}B^Tλ∗(t),   x∗(0) = x0,
d/dt λ∗(t) = Qx∗(t) − A^Tλ∗(t),   λ∗(t1) = 0,    (11.87)

where the optimal control is defined by

u∗(t) = R^{−1}B^Tλ∗(t).    (11.88)

We can write the optimality system as

d/dt [x∗(t); λ∗(t)] = [A BR^{−1}B^T ; Q −A^T] [x∗(t); λ∗(t)],    (11.89)


with boundary conditions

x∗(0) = [I_{n×n} 0_{n×n}] [x∗(0); λ∗(0)] = x0    (11.90)

and

λ∗(t1) = [0_{n×n} I_{n×n}] [x∗(t1); λ∗(t1)] = 0.    (11.91)

Thus, if one solves the two point boundary value problem (11.89) - (11.91), the optimal control is defined by (11.88).
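For a concrete illustration, the linear two point boundary value problem (11.89) - (11.91) can be handed directly to a general-purpose BVP solver. The Python sketch below makes several assumptions not found in the text: the data A, B, Q, R, x0 and t1 are invented for illustration, and scipy.integrate.solve_bvp is only one of many possible solvers.

    import numpy as np
    from scipy.integrate import solve_bvp

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    x0 = np.array([1.0, 0.0])
    t1 = 2.0

    # System matrix of (11.89) acting on y = [x; lambda].
    F = np.block([[A, B @ np.linalg.inv(R) @ B.T], [Q, -A.T]])

    def odes(t, y):
        return F @ y

    def bc(ya, yb):
        # Boundary conditions (11.90), (11.91): x(0) = x0, lambda(t1) = 0.
        return np.hstack([ya[:2] - x0, yb[2:]])

    tmesh = np.linspace(0.0, t1, 50)
    sol = solve_bvp(odes, bc, tmesh, np.zeros((4, tmesh.size)))
    lam = sol.sol(tmesh)[2:]
    u_opt = np.linalg.inv(R) @ B.T @ lam   # optimal control from (11.88)
    print("converged:", sol.status == 0, "  u*(0) =", u_opt[:, 0])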

11.5.1 Examples of LQ Optimal Control Problems

Here we apply the Maximum Principle to specific LQ optimal control problems. We consider a problem with fixed final state x1 and a problem where x1 is not specified.

LQ Optimal Control Problem: Example 1

Consider the linear control system

ẋ(t) = x(t) + u(t)    (11.92)

with initial data

x(0) = 1/2 ∈ R1    (11.93)

and terminal condition

x(1) = 0 ∈ R1.    (11.94)

We assume that t0 = 0, 0 < t1 = 1, x0 = 1/2 ∈ R1 and x1 = 0 ∈ R1 are given. The quadratic cost functional is defined by

J(u(·)) = ∫_0^1 (1/2)[u(s)]² ds.    (11.95)


The Linear Quadratic (LQ) optimal control problem is to find u∗(·) ∈ PWC(0, 1) so that

J(u∗(·)) = ∫_0^1 (1/2)[u∗(s)]² ds ≤ J(u(·)) = ∫_0^1 (1/2)[u(s)]² ds

for all u(·) ∈ PWC(0, 1) that steer x0 = 1/2 to x1 = 0.

Formulation as an Optimal Control Problem: In order to set up the fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time t0 and final time t1;

2. Identify the initial set X0 ⊆ R1, the terminal set X1 ⊆ R1;

3. Identify the control constraint set Ω ⊆ R1;

4. Identify the functions f : R1 × R1 −→ R1, f0 : R1 × R1 −→ R1;

5. Define the augmented Hamiltonian H(η, x, u);

6. Form the augmented adjoint system matrix A(t).

The initial time is t0 = 0 and the final time t1 = 1 is given and fixed. The initial set is X0 = {1/2} and the terminal constraint set is X1 = {0}. Also, there is no constraint on the control u(·) ∈ PWC(0, 1); hence the control constraint set is all of R1 and Ω = R1. The functions f : R1 × R1 −→ R1 and f0 : R1 × R1 −→ R1 are given by

f(x, u) = x + u

and

f0(x, u) = (1/2)u² ≥ 0,

respectively. The augmented Hamiltonian H : R2 × R2 × R1 −→ R1 is defined by

H(η, x, u) = η0(1/2)u² + η(x + u),    (11.96)


where x = [x0 x]^T ∈ R2, η = [η0 η]^T ∈ R2 and u ∈ R1. The augmented function f : R2 × R1 −→ R2 is defined by

f(x, u) = [(1/2)u² ; x + u],    (11.97)

so that the Jacobian is given by

Jxf(x, u) = [0 0 ; 0 1].    (11.98)

Now assume that (x∗(·), u∗(·)) is an optimal pair so that the matrix A(t) is given by

A(t) = Jxf(x, u)|(x∗(t), u∗(t)) = Jxf(x∗(t), u∗(t)) = [0 0 ; 0 1]    (11.99)

and

−[A(t)]^T = −[0 0 ; 0 1] = [0 0 ; 0 −1].

The augmented adjoint equation has the form

d/dt [η0(t); η(t)] = −[A(t)]^T [η0(t); η(t)] = [0 0 ; 0 −1] [η0(t); η(t)]    (11.100)

which is equivalent to the system

d/dt η0(t) = 0
d/dt η(t) = −η(t),

where again x∗(t) is the optimal trajectory. The first equation above implies that the zero adjoint state is a constant, η0(t) ≡ η0. Observe that

η(t) = ke^{−t}

for some constant k. We can now apply the Maximum Principle.


If (x∗(·), u∗(·)) is an optimal pair for the LQ optimal control problem, then the Maximum Principle implies that there is a nontrivial solution

η∗(t) = [η∗0(t); η∗(t)] ∈ R2

to the augmented adjoint equation (11.100) such that η∗0(t) ≡ η∗0 ≤ 0,

d/dt η∗(t) = −η∗(t)    (11.101)

and

H(η∗(t), x∗(t), u∗(t)) = max_{u∈R1} H(η∗(t), x∗(t), u) ≡ c.

Since H(η∗(t), x∗(t), u) is maximized on the open set Ω = R1, the derivative of H(η∗(t), x∗(t), u) with respect to u at u = u∗(t) must be zero. In particular,

DuH(η∗(t), x∗(t), u)|_{u=u∗(t)} = DuH(η∗(t), x∗(t), u∗(t)) = 0,

where

DuH(η∗(t), x∗(t), u) = ∂/∂u [η∗0(1/2)u² + η∗(t)(x∗(t) + u)] = η∗0u + η∗(t),

so that when u = u∗(t)

DuH(η∗(t), x∗(t), u)|_{u=u∗(t)} = η∗0u∗(t) + η∗(t) = 0.

Thus,

η∗0u∗(t) = −η∗(t).    (11.102)

Note that if η∗0 = 0, then 0 = η∗0u∗(t) = −η∗(t), which would imply that η∗(t) ≡ 0. However, this in turn implies that

η∗(t) = [η∗0(t); η∗(t)] ≡ [0; 0] ∈ R2,

which would contradict the fact that η∗(t) is a nontrivial solution. Hence, η∗0 < 0 and we can solve (11.102) for u∗(t), yielding

u∗(t) = η∗(t)/(−η∗0) ≜ λ∗(t),


where we normalize the adjoint variable by defining

λ∗(t) = η∗(t)/(−η∗0).

Note that by (11.101)

d/dt λ∗(t) = [d/dt η∗(t)]/(−η∗0) = η∗(t)/η∗0 = −λ∗(t),

so that

λ∗(t) = Ae^{−t}

for some constant A, and

ẋ∗(t) = x∗(t) + u∗(t) = x∗(t) + Ae^{−t}.    (11.103)

Solving (11.103) and applying the boundary conditions x∗(0) = 1/2 and x∗(1) = 0 yields

x∗(t) = [1/(2(e^{−1} − e))][e^{t−1} − e^{1−t}]

and

A = 1/(e^{−2} − 1) = −1/(1 − e^{−2}),

so the optimal control is given by

u∗(t) = −[1/(1 − e^{−2})]e^{−t}.

(Note that A must be negative: since ẋ∗(t) = x∗(t) + Ae^{−t}, a nonnegative control term could never steer x∗ from 1/2 down to 0.)

The second example is a Linear Quadratic (LQ) optimal control problem with a hard constraint on the control. In this case, the Maximum Principle does not reduce to taking the derivative of the augmented Hamiltonian and setting the resulting expression to zero. One must compute a solution to a constrained optimization problem.


LQ Optimal Control Problem: Example 2

In this problem we allow the final state to be free and place a bound on the control. Thus, we consider the linear control system

ẋ(t) = x(t) + u(t)    (11.104)

with initial data

x(0) = 1 ∈ R1.    (11.105)

We assume that t0 = 0, x0 = 1 ∈ R1 and 0 < t1 = 1 are given. The control is required to be bounded,

|u(t)| ≤ 1,    (11.106)

and the quadratic cost functional is defined by

J(u(·)) = ∫_0^1 (1/2)[x(s; u(·))]² ds = ∫_0^1 (1/2)[x(s)]² ds,    (11.107)

where x(·) = x(·; u(·)) is the solution to the initial value problem (11.104) - (11.105). The Linear Quadratic (LQ) optimal control problem is to find u∗(·) ∈ PWC(0, 1) satisfying (11.106) so that

J(u∗(·)) = ∫_0^1 (1/2)[x∗(s; u∗(·))]² ds ≤ J(u(·)) = ∫_0^1 (1/2)[x(s; u(·))]² ds

for all u(·) ∈ PWC(0, 1) that steer x0 = 1 to some x1 ∈ R1.

Formulation as an Optimal Control Problem: In order to set up the fixed time optimal control problem and apply the Maximum Principle, we need to:

1. Identify the initial time t0 and final time t1;

2. Identify the initial set X0 ⊆ R1, the terminal set X1 ⊆ R1;

3. Identify the control constraint set Ω ⊆ R1;

4. Identify the functions f : R1 × R1 −→ R1, f0 : R1 × R1 −→R1;


5. Define the augmented Hamiltonian H(η, x, u);

6. Form the augmented adjoint system matrix A(t).

The initial time is t0 = 0 and the final time t1 = 1 is given and fixed. The initial set is the single initial vector X0 = {1} and since there is no terminal constraint, the terminal set is the whole state space X1 = R1. The constraint on the control u(·) ∈ PWC(0, 1) is |u(t)| ≤ 1, which implies that Ω = [−1, 1]. The functions f : R1 × R1 −→ R1 and f0 : R1 × R1 −→ R1 are given by

f(x, u) = x + u

and

f0(x, u) = (1/2)x² ≥ 0,

respectively. The augmented Hamiltonian H : R2 × R2 × R1 −→ R1 is defined by

H(η, x, u) = η0(1/2)x² + η(x + u),    (11.108)

where x = [x0 x]^T ∈ R2, η = [η0 η]^T ∈ R2 and u ∈ R1. The augmented function f : R2 × R1 −→ R2 is defined by

f(x, u) = [(1/2)x² ; x + u],    (11.109)

so that the Jacobian is given by

Jxf(x, u) = [0 x ; 0 1].    (11.110)

Now assume that (x∗(·), u∗(·)) is an optimal pair so that the matrix A(t) is given by

A(t) = Jxf(x, u)|(x∗(t), u∗(t)) = Jxf(x∗(t), u∗(t)) = [0 x∗(t) ; 0 1]    (11.111)


and

−[A(t)]^T = −[0 0 ; x∗(t) 1] = [0 0 ; −x∗(t) −1].

The augmented adjoint equation has the form

d/dt [η0(t); η(t)] = −[A(t)]^T [η0(t); η(t)] = [0 0 ; −x∗(t) −1] [η0(t); η(t)]    (11.112)

which is equivalent to the system

d/dt η0(t) = 0
d/dt η(t) = −η0(t)x∗(t) − η(t),

where again x∗(t) is the optimal trajectory. The first equation above implies (as always) that the zero adjoint state is a constant, η0(t) ≡ η0. Observe that

d/dt η(t) = −η(t) − η0x∗(t)    (11.113)

is a linear nonhomogeneous equation. We can now apply the Maximum Principle.

If (x∗(·), u∗(·)) is an optimal pair for the LQ optimal control problem, then the Maximum Principle implies that there is a nontrivial solution

η∗(t) = [η∗0(t); η∗(t)] ∈ R2

to the augmented adjoint equation (11.112) such that η∗0(t) ≡ η∗0 ≤ 0,

d/dt η∗(t) = −η∗(t) − η∗0x∗(t)    (11.114)

and

H(η∗(t), x∗(t), u∗(t)) = max_{u∈[−1,1]} H(η∗(t), x∗(t), u) ≡ c.

In particular,

H(η∗(t), x∗(t), u) = η∗0(1/2)[x∗(t)]² + η∗(t)(x∗(t) + u),


so that

max_{u∈[−1,1]} H(η∗(t), x∗(t), u) = max_{u∈[−1,1]} {η∗0(1/2)[x∗(t)]² + η∗(t)(x∗(t) + u)}
                                 = max_{u∈[−1,1]} {η∗(t)u} + η∗0(1/2)[x∗(t)]² + η∗(t)x∗(t) ≡ c.

Thus, u = u∗(t) must be selected to maximize

η∗(t)u

on the interval −1 ≤ u ≤ 1. Clearly, this implies that

u∗(t) = sgn[η∗(t)].    (11.115)

Applying the transversality condition at t1 = 1, we find that T1 = R1, so η∗(t1) ⊥ T1 implies that

η∗(t1) = η∗(1) = 0.    (11.116)

Returning to the adjoint equation (11.113), we see that η∗(t) satisfies the terminal value problem

d/dt η∗(t) = −η∗(t) − η∗0x∗(t),   η∗(1) = 0.    (11.117)

One can see that η∗0 ≠ 0, since η∗0 = 0 would imply that η∗(t) solves the linear problem

d/dt η∗(t) = −η∗(t),   η∗(1) = 0,

and hence η∗(t) ≡ 0. Thus, if η∗0 = 0, then η∗(t) ≡ 0, which contradicts the statement that η∗(t) = [η∗0 η∗(t)]^T is nonzero.

Hence, η∗0 < 0 and we can simplify the adjoint equation (11.117). Divide both sides of (11.117) by the positive number −η∗0 to obtain

d/dt [η∗(t)/(−η∗0)] = −[η∗(t)/(−η∗0)] + x∗(t),   [η∗(1)/(−η∗0)] = 0.


Let λ∗(t) be defined by

λ∗(t) ≜ η∗(t)/(−η∗0)

and note that since −η∗0 > 0, λ∗(t) and η∗(t) have the same sign. Also,

d/dt λ∗(t) = −λ∗(t) + x∗(t),   λ∗(1) = 0    (11.118)

and

u∗(t) = sgn[η∗(t)] = sgn[λ∗(t)].    (11.119)

Combining (11.118) - (11.119) with the state equation, we find that the system

ẋ∗(t) = x∗(t) + sgn[λ∗(t)],   x∗(0) = 1,    (11.120)
d/dt λ∗(t) = −λ∗(t) + x∗(t),   λ∗(1) = 0,

needs to be solved in order to compute u∗(t) = sgn[λ∗(t)]. This is a nonlinear two-point boundary value problem that must, in general, be solved numerically.
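One standard approach is simple shooting on the unknown initial value λ∗(0). The Python sketch below is one possibility, with scipy assumed and the bracketing interval chosen by trial. For this particular data one can in fact check by hand that u∗(t) ≡ −1, x∗(t) ≡ 1 and λ∗(t) = 1 − e^{1−t} satisfy (11.120), so the shooting residual should vanish at λ∗(0) = 1 − e.

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.optimize import brentq

    def shoot(lam0):
        # Integrate (11.120) forward from x(0) = 1, lambda(0) = lam0 and
        # return the terminal residual lambda(1), which should be zero.
        def rhs(t, z):
            x, lam = z
            return [x + np.sign(lam), -lam + x]
        sol = solve_ivp(rhs, (0.0, 1.0), [1.0, lam0], rtol=1e-9, atol=1e-12)
        return sol.y[1, -1]

    lam0 = brentq(shoot, -5.0, -0.5)   # bracket found by inspection
    print("lambda*(0) =", lam0)        # ~ 1 - e = -1.71828...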

At this point it is helpful to review numerical methods for solving two point boundary value problems. The following references [11], [25], [103], [104], [112], [162], [169] and [185] contain some useful results on this topic.

11.5.2 The Time Independent Riccati Differential Equation

We return to the general LQ optimal control problem and focus on the optimality system defined by the two point boundary value problem (11.89) - (11.91). We show that one can transform the state variable to the adjoint variable by a matrix that satisfies a Riccati differential equation. This transformation is a key step in connecting the theory of Riccati equations with the optimality conditions for linear quadratic optimal control. Also, the resulting Riccati equation provides one method for developing numerical methods for solving LQ optimal control problems.


First write (11.89) as the linear system

d/dt [x(t); λ(t)] = [A BR^{−1}B^T ; Q −A^T] [x(t); λ(t)] ≜ F [x(t); λ(t)]    (11.121)

where λ(t1) = 0 and

F = [A BR^{−1}B^T ; Q −A^T].

The solution to (11.121) has the form

[x(t); λ(t)] = e^{F(t−t1)} [x(t1); λ(t1)] = e^{F(t−t1)} [x(t1); 0].

Let

Ψ(t) = e^{Ft} = [ψ11(t) ψ12(t) ; ψ21(t) ψ22(t)],

where the ψij(t), i, j = 1, 2 are n × n square matrix functions. It follows that

[x(t); λ(t)] = e^{F(t−t1)} [x(t1); 0] = [ψ11(t − t1) ψ12(t − t1) ; ψ21(t − t1) ψ22(t − t1)] [x(t1); 0],

so that

x(t) = ψ11(t − t1)x(t1)    (11.122)

and

λ(t) = ψ21(t − t1)x(t1).    (11.123)

If ψ11(t − t1) is non-singular for 0 ≤ t ≤ t1, then we can solve (11.122) for x(t1). In particular,

x(t1) = [ψ11(t − t1)]^{−1}x(t),

which, when substituted into (11.123), yields

λ(t) = [ψ21(t − t1)(ψ11(t − t1))^{−1}]x(t).

If P(t) is the n × n matrix defined by

P(t) ≜ −[ψ21(t − t1)(ψ11(t − t1))^{−1}],    (11.124)


then we have that λ(t) and x(t) are linearly related by the matrix P(t), and the relationship is given by

λ(t) = −P(t)x(t).

The choice of the negative sign in defining P(·) is made to be consistent with much of the existing literature. In order to make this step rigorous, one needs to prove that ψ11(t − t1) is non-singular for 0 ≤ t ≤ t1. On the other hand, we could simply ask the question:

Is there a matrix P(t) so that λ(t) = −P(t)x(t), and how can P(t) be computed?

We will address the issue of the existence of P(t) later. However, assume for the moment that x(·) and λ(·) satisfy (11.89) - (11.91) and

λ(t) = −P(t)x(t),    (11.125)

with P(t) differentiable. Differentiating the equation (11.125) one obtains

d/dt λ(t) = −Ṗ(t)x(t) − P(t)[d/dt x(t)]
          = −Ṗ(t)x(t) − P(t)[Ax(t) + BR^{−1}B^Tλ(t)]
          = −Ṗ(t)x(t) − P(t)[Ax(t) − BR^{−1}B^TP(t)x(t)]
          = −Ṗ(t)x(t) − P(t)Ax(t) + P(t)BR^{−1}B^TP(t)x(t).

However, from (11.121) it follows that

d/dt λ(t) = Qx(t) − A^Tλ(t) = Qx(t) + A^TP(t)x(t),

so that

Qx(t) + A^TP(t)x(t) = −Ṗ(t)x(t) − P(t)Ax(t) + P(t)BR^{−1}B^TP(t)x(t).


Rearranging the terms we have

−Ṗ(t)x(t) = A^TP(t)x(t) + P(t)Ax(t) − P(t)BR^{−1}B^TP(t)x(t) + Qx(t),

or equivalently

−Ṗ(t)x(t) = [A^TP(t) + P(t)A − P(t)BR^{−1}B^TP(t) + Q]x(t).    (11.126)

Consequently, P(t) satisfies (11.126) along the trajectory x(t). Observe that (11.126) is satisfied for any solution of the system (11.121) with λ(t1) = 0 and all values of x(t1). Therefore, if

λ(t) = −P(t)x(t),

then P(t) satisfies the matrix Riccati differential equation

−Ṗ(t) = A^TP(t) + P(t)A − P(t)BR^{−1}B^TP(t) + Q,   0 ≤ t < t1,    (11.127)

with terminal condition

P(t1) = 0_{n×n},    (11.128)

since

−P(t1)x(t1) = λ(t1) = 0

and x(t1) can be any vector in Rn.

We shall show below that, under the assumption that there is a solution P(t) to the Riccati differential equation (11.127) satisfying (11.128), the LQ optimal control problem has a solution and the optimal control is given by

u∗(t) = −R^{−1}B^TP(t)x∗(t).    (11.129)

In order to provide a rigorous treatment of this problem, we present two lemmas. These results relate the existence of a solution to the Riccati equation (11.127) to the existence of an optimal


control for the LQ optimal control problem. First we note that any solution to the Riccati differential equation must be symmetric; in particular, P(t) = [P(t)]^T for all t.
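Before turning to the lemmas, here is a hedged numerical sketch (same invented data as in the earlier sketches) showing that backward integration of (11.127) from P(t1) = 0 reproduces the matrix built from the partitioned exponential in (11.124).

    import numpy as np
    from scipy.integrate import solve_ivp
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    t1 = 2.0
    n = A.shape[0]
    S = B @ np.linalg.inv(R) @ B.T

    def rhs(t, p):
        P = p.reshape(n, n)
        return (-(A.T @ P + P @ A - P @ S @ P + Q)).ravel()   # (11.127)

    # Integrate from t = t1 down to t = 0 starting at P(t1) = 0.
    sol = solve_ivp(rhs, (t1, 0.0), np.zeros(n * n), rtol=1e-10, atol=1e-12)
    P0 = sol.y[:, -1].reshape(n, n)

    F = np.block([[A, S], [Q, -A.T]])
    Psi = expm(-F * t1)                     # e^{F(0 - t1)}
    P0_exp = -Psi[n:, :n] @ np.linalg.inv(Psi[:n, :n])
    print("max difference:", np.abs(P0 - P0_exp).max())   # ~ 0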

Lemma 11.1 Suppose that P(t) = [P(t)]^T is any n × n matrix function with P(t) differentiable on the interval [t0, t1]. If u(·) ∈ PWC(t0, t1; Rm) and

ẋ(t) = Ax(t) + Bu(t),   t0 ≤ t ≤ t1,

then

〈P(s)x(s), x(s)〉|_{t0}^{t1} = ∫_{t0}^{t1} 〈[Ṗ(s) + P(s)A + A^TP(s)]x(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)Bu(s), x(s)〉 ds    (11.130)
                           + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds.

Proof: Observe that

〈P(s)x(s), x(s)〉|_{t0}^{t1} = ∫_{t0}^{t1} (d/ds)〈P(s)x(s), x(s)〉 ds
                           = ∫_{t0}^{t1} 〈Ṗ(s)x(s), x(s)〉 ds + ∫_{t0}^{t1} 〈P(s)ẋ(s), x(s)〉 ds + ∫_{t0}^{t1} 〈P(s)x(s), ẋ(s)〉 ds,

and by substituting Ax(s) + Bu(s) for ẋ(s) we obtain

〈P(s)x(s), x(s)〉|_{t0}^{t1} = ∫_{t0}^{t1} 〈Ṗ(s)x(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)[Ax(s) + Bu(s)], x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)x(s), [Ax(s) + Bu(s)]〉 ds.


Simplifying this expression we obtain

〈P(s)x(s), x(s)〉|_{t0}^{t1} = ∫_{t0}^{t1} 〈Ṗ(s)x(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)Ax(s), x(s)〉 ds + ∫_{t0}^{t1} 〈P(s)Bu(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)x(s), Ax(s)〉 ds + ∫_{t0}^{t1} 〈P(s)x(s), Bu(s)〉 ds,

and collecting terms (noting that 〈P(s)x(s), Ax(s)〉 = 〈A^TP(s)x(s), x(s)〉 and 〈P(s)x(s), Bu(s)〉 = 〈B^TP(s)x(s), u(s)〉) yields

〈P(s)x(s), x(s)〉|_{t0}^{t1} = ∫_{t0}^{t1} 〈[Ṗ(s) + P(s)A + A^TP(s)]x(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈P(s)Bu(s), x(s)〉 ds
                           + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds,

which establishes (11.130).

Lemma 11.2 Assume that the Riccati differential equation (11.127) has a solution P(t) = [P(t)]^T for t0 ≤ t < t1 with P(t1) = 0_{n×n}. If u(·) ∈ PWC(t0, t1; Rm) and

ẋ(t) = Ax(t) + Bu(t),   t0 ≤ t ≤ t1,

then the cost function J(·) has the representation

J(u(·)) = ∫_{t0}^{t1} ‖R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)‖² ds + 〈P(t0)x(t0), x(t0)〉.


Proof: Let

N(x(·), u(·)) = ∫_{t0}^{t1} ‖R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)‖² ds.

Expanding N(x(·), u(·)) we obtain

N(x(·), u(·)) = ∫_{t0}^{t1} 〈R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s), R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)〉 ds
             = ∫_{t0}^{t1} 〈R^{1/2}u(s), R^{1/2}u(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{1/2}u(s), R^{−1/2}B^TP(s)x(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{−1/2}B^TP(s)x(s), R^{1/2}u(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{−1/2}B^TP(s)x(s), R^{−1/2}B^TP(s)x(s)〉 ds.

Simplifying each term we have

N(x(·), u(·)) = ∫_{t0}^{t1} 〈R^{1/2}R^{1/2}u(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈u(s), R^{1/2}R^{−1/2}B^TP(s)x(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{1/2}R^{−1/2}B^TP(s)x(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{−1/2}R^{−1/2}B^TP(s)x(s), B^TP(s)x(s)〉 ds,


which implies

N(x(·), u(·)) = ∫_{t0}^{t1} 〈Ru(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈u(s), B^TP(s)x(s)〉 ds
             + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈R^{−1}B^TP(s)x(s), B^TP(s)x(s)〉 ds,

or equivalently

N(x(·), u(·)) = ∫_{t0}^{t1} 〈Ru(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈u(s), B^TP(s)x(s)〉 ds    (11.131)
             + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds
             + ∫_{t0}^{t1} 〈P(s)BR^{−1}B^TP(s)x(s), x(s)〉 ds.

Since the matrix P(s) satisfies the Riccati equation (11.127), it follows that

P(s)BR^{−1}B^TP(s)x(s) = [Ṗ(s) + A^TP(s) + P(s)A + Q]x(s)

and the last term above becomes

∫_{t0}^{t1} 〈[Ṗ(s) + A^TP(s) + P(s)A + Q]x(s), x(s)〉 ds.


Substituting this expression into (11.131) and rearranging yields

N(x(·), u(·)) = ∫_{t0}^{t1} 〈Ru(s), u(s)〉 ds + ∫_{t0}^{t1} 〈Qx(s), x(s)〉 ds
             + ∫_{t0}^{t1} 〈[Ṗ(s) + A^TP(s) + P(s)A]x(s), x(s)〉 ds
             + ∫_{t0}^{t1} 〈u(s), B^TP(s)x(s)〉 ds
             + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds,

which implies

N(x(·), u(·)) = J(u(·)) + ∫_{t0}^{t1} 〈[Ṗ(s) + A^TP(s) + P(s)A]x(s), x(s)〉 ds
             + ∫_{t0}^{t1} 〈u(s), B^TP(s)x(s)〉 ds
             + ∫_{t0}^{t1} 〈B^TP(s)x(s), u(s)〉 ds.

Applying (11.130) from the previous Lemma yields

N(x(·), u(·)) = J(u(·)) + 〈P(s)x(s), x(s)〉|_{t0}^{t1},

or equivalently

J(u(·)) = N(x(·), u(·)) − 〈P(s)x(s), x(s)〉|_{t0}^{t1}.

However, since P(t1) = 0_{n×n}, we conclude that

J(u(·)) = ∫_{t0}^{t1} ‖R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)‖² ds + 〈P(t0)x(t0), x(t0)〉,    (11.132)

which completes the proof.

We now have the fundamental result on the relationship between solutions to the Riccati equation and the existence of an optimal control for the LQ optimal control problem.


Theorem 11.2 (Existence of LQ Optimal Control) If the Riccati differential equation (11.127) has a solution P(t) = [P(t)]^T for 0 ≤ t < t1 with P(t1) = 0_{n×n}, then there is a control u∗(·) ∈ PWC(0, t1; Rm) such that u∗(·) minimizes

J(u(·)) = ∫_0^{t1} 〈Qx(s), x(s)〉 + 〈Ru(s), u(s)〉 ds

on the set PWC(0, t1; Rm), where the state equation is given by

ẋ(t) = Ax(t) + Bu(t)    (11.133)

with initial data

x(0) = x0 ∈ Rn.    (11.134)

In addition, the optimal control is a linear feedback law

u∗(t) = −R^{−1}B^TP(t)x∗(t)    (11.135)

and the minimum value of J(u(·)) is

J(u∗(·)) = 〈P(0)x0, x0〉.    (11.136)

Proof: Let t0 = 0 and apply the identity (11.132) above. In particular, it follows that J(·) is minimized when the quadratic term

∫_0^{t1} ‖R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)‖² ds ≥ 0

is minimized. If u∗(t) = −R^{−1}B^TP(t)x∗(t), then

R^{1/2}u∗(t) + R^{−1/2}B^TP(t)x∗(t) = 0

and

J(u∗(·)) = ∫_0^{t1} ‖R^{1/2}u∗(s) + R^{−1/2}B^TP(s)x∗(s)‖² ds + 〈P(0)x∗(0), x∗(0)〉 = 〈P(0)x0, x0〉.


Consequently, for any u(·) ∈ PWC(0, t1; Rm) it follows that

J(u∗(·)) = 〈P(0)x0, x0〉 ≤ 〈P(0)x0, x0〉 + ∫_0^{t1} ‖R^{1/2}u(s) + R^{−1/2}B^TP(s)x(s)‖² ds = J(u(·)),

which completes the proof.
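A numerical illustration of Theorem 11.2 (a sketch with invented data; interpolating P(t) on a grid is a crude but adequate device here) simulates the feedback law (11.135) and compares the realized cost with 〈P(0)x0, x0〉 from (11.136).

    import numpy as np
    from scipy.integrate import solve_ivp

    A = np.array([[0.0, 1.0], [-1.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q = np.eye(2)
    R = np.array([[1.0]])
    Rinv = np.linalg.inv(R)
    x0 = np.array([1.0, 0.0])
    t1, n = 2.0, 2
    S = B @ Rinv @ B.T

    # Backward pass: solve the Riccati equation (11.127) on a grid.
    tgrid = np.linspace(t1, 0.0, 2001)
    def riccati(t, p):
        P = p.reshape(n, n)
        return (-(A.T @ P + P @ A - P @ S @ P + Q)).ravel()
    solP = solve_ivp(riccati, (t1, 0.0), np.zeros(n * n),
                     t_eval=tgrid, rtol=1e-10)
    ts, Ps = tgrid[::-1], solP.y[:, ::-1]   # reorder to increasing time

    def P_of(t):
        return np.array([np.interp(t, ts, Ps[i])
                         for i in range(n * n)]).reshape(n, n)

    # Forward pass: closed loop x' = Ax + Bu with u = -R^{-1}B^T P(t)x,
    # accumulating the running cost <Qx,x> + <Ru,u>.
    def closed_loop(t, z):
        x = z[:n]
        u = -Rinv @ B.T @ P_of(t) @ x       # feedback law (11.135)
        return np.hstack([A @ x + B @ u, x @ Q @ x + u @ R @ u])

    sol = solve_ivp(closed_loop, (0.0, t1), np.hstack([x0, 0.0]), rtol=1e-9)
    print("realized cost:", sol.y[n, -1])
    print("<P(0)x0, x0> :", x0 @ P_of(0.0) @ x0)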

Later we will return to linear quadratic control problems and consider more general time dependent problems. Also, we will generalize the cost function to include a terminal penalty such as

J(u(·)) = 〈Gx(t1), x(t1)〉 + (1/2) ∫_{t0}^{t1} 〈Qx(s), x(s)〉 + 〈Ru(s), u(s)〉 ds,    (11.137)

where G = G^T ≥ 0 is a symmetric non-negative matrix.

11.6 The Maximum Principle for a Problem of Bolza

Consider the case where there is an additional explicit cost on the terminal state,

J(u(·)) = G(x(t1)) + ∫_{t0}^{t1} f0(x(s), u(s)) ds,    (11.138)

where G : Rn → R is a C2 function. In order to keep the discussion simple, we begin with the simple case where

X0 = {x0} ⊆ Rn

is a single vector and t1 is fixed. We show that by augmenting the problem we can construct an equivalent problem without the terminal cost defined by G. Also note that the final target set is taken to be

X1 = Rn.


The state equation is augmented by adding a new variable at the end of the vector. In particular, let

x = [x1 x2 ... xn xn+1]^T = [x xn+1]^T ∈ Rn+1

and define fn+1 : Rn+1 × Rm −→ R1 by

fn+1(x, u) = 0.

By adding the equation

d/dt xn+1(t) = fn+1(x(t), u(t)) = 0

to the system

ẋ(t) = f(x(t), u(t)),   t0 < t ≤ t1,

we obtain the new state equation

d/dt x(t) = f(x(t), u(t)) = [f(x(t), u(t)); 0],   t0 < t ≤ t1.    (11.139)

Consider the new cost functional

J(u(·)) = ∫_{t0}^{t1} xn+1(s) + f0(x(s), u(s)) ds = ∫_{t0}^{t1} f0(x(s), u(s)) ds,

where

f0(x, u) ≜ xn+1 + f0(x, u).

Observe that since xn+1(s) is a constant (recall that d/dt xn+1(s) = 0), then

J(u(·)) = ∫_{t0}^{t1} xn+1(s) + f0(x(s), u(s)) ds = xn+1(t1)(t1 − t0) + ∫_{t0}^{t1} f0(x(s), u(s)) ds.


Hence, if we require that xn+1(t1) = G(x(t1))/(t1 − t0), it follows that

J(u(·)) = xn+1(t1)(t1 − t0) + ∫_{t0}^{t1} f0(x(s), u(s)) ds = G(x(t1)) + ∫_{t0}^{t1} f0(x(s), u(s)) ds.

Thus, we can reformulate the problem as an equivalent optimal control problem in Rn+1. Let

X0 = {x0} × R1 = {x = [x0 y]^T : y ∈ R1} ⊆ Rn+1    (11.140)

and

X1 = {x = [x xn+1]^T : G̃(x) ≜ xn+1 − G(x)/(t1 − t0) = 0} ⊆ Rn+1.    (11.141)

Therefore, minimizing the cost functional (11.138) among all controls that steer X0 = {x0} to X1 = Rn is equivalent to minimizing the cost functional

J(u(·)) = ∫_{t0}^{t1} f0(x(s), u(s)) ds = ∫_{t0}^{t1} xn+1(s) + f0(x(s), u(s)) ds    (11.142)

among all controls that steer X0 = {x0} × R1 to

X1 = {x = [x xn+1]^T : G̃(x) ≜ xn+1 − G(x)/(t1 − t0) = 0},

with the state equation defined by (11.139) above. We now apply the Maximum Principle to this equivalent problem.

Observe that X0 = {x0} × R1 has tangent space T0 = {0} × R1, so that a vector η = [η, ηn+1]^T ∈ Rn+1 is orthogonal to T0 = {0} × R1 if and only if ηn+1 = 0. Since the target set

X1 = {x = [x xn+1]^T : G̃(x) ≜ xn+1 − G(x)/(t1 − t0) = 0} ⊆ Rn+1


is defined by a level set of the function G̃ : Rn+1 → R1, a vector η = [η, ηn+1]^T ∈ Rn+1 is orthogonal to T1 at x(t1) if

η = [η; ηn+1] = α∇G̃(x(t1)) = α[−∇G(x(t1))/(t1 − t0); 1]    (11.143)

for some nonzero α.

The augmented state and co-state are given by

x = [x0 x xn+1]^T = [x0 x]^T ∈ Rn+2 and η = [η0 η ηn+1]^T = [η0 η]^T ∈ Rn+2,

respectively. The augmented state equation is given by

d/dt x(t) = f(x(t), u(t)) = [f0(x(t), u(t)); f(x(t), u(t)); fn+1(x(t), u(t))] = [f0(x(t), u(t)); f(x(t), u(t)); 0],   t0 < t ≤ t1,

so that the augmented Hamiltonian has the form

H(η, x, u) ≜ η0f0(x, u) + 〈η, f(x, u)〉    (11.144)
          = η0[xn+1 + f0(x, u)] + 〈η, f(x, u)〉 + ηn+1 · 0
          = η0[xn+1 + f0(x, u)] + 〈η, f(x, u)〉.

To construct the corresponding adjoint system we form the Jacobian of f(x, u) with respect to x = [x0 x1 ... xn xn+1]^T, i.e., the (n + 2) × (n + 2) matrix with entries ∂fi(x, u)/∂xj, i, j = 0, 1, ..., n + 1. Since f0(x, u) = xn+1 + f0(x, u) does not depend on x0 and satisfies ∂f0(x, u)/∂xn+1 = 1, since f(x, u) depends on neither x0 nor xn+1, and since fn+1(x, u) = 0, this Jacobian has the block form

Jx f(x, u) = [0 [∇f0(x, u)]^T 1 ; 0 Jxf(x, u) 0 ; 0 0 0],

where Jxf(x, u) is the Jacobian of f(x, u). Consequently, if (x∗(t), u∗(t)) is an optimal pair, then

A(t) = [0 [∇f0(x∗(t), u∗(t))]^T 1 ; 0 Jxf(x∗(t), u∗(t)) 0 ; 0 0 0] = [A(t) e1 ; 0 0],

where e1 = [1 0 ... 0]^T ∈ Rn+1 and A(t) is the usual (n + 1) × (n + 1) augmented matrix, and the augmented co-state equation becomes

d/dt η(t) = −[A(t)]^T η(t),

which has the form

d/dt [η0(t); η(t); ηn+1(t)] = [0 0 0 ; −∇f0(x∗(t), u∗(t)) −[Jxf(x∗(t), u∗(t))]^T 0 ; −1 0 0] [η0(t); η(t); ηn+1(t)].    (11.145)

Observe that the solutions of (11.145) satisfy η0(t) ≡ η0 and

ηn+1(t) = −η0t + k    (11.146)


for some constant k. In addition, η(t) satisfies the equation

d/dt η(t) = −η0∇f0(x∗(t), u∗(t)) − [Jxf(x∗(t), u∗(t))]^Tη(t),    (11.147)

which is the same as the adjoint equation for the Simplest Problem in Optimal Control. With the observations above it is now straightforward to establish the following Maximum Principle for a Problem of Bolza.

Theorem 11.3 (Maximum Principle for the Problem of Bolza) Assume that f : Rn × Rm → Rn, f0 : Rn × Rm → R, G : Rn → R, X0 = {x0} ⊆ Rn, t0 ∈ R, t1 fixed, X1 = Rn and Ω ⊆ Rm are given as above. Consider the control system

(S) ẋ(t) = f(x(t), u(t)),   t0 < t ≤ t1,    (11.148)

with piecewise continuous controllers u(·) ∈ PWC(t0, t1; Rm) satisfying u(t) ∈ Ω ⊆ Rm e.f. If

u∗(·) ∈ Θ = {u(·) ∈ PWC(t0, t1; Rm) : u(t) ∈ Ω e.f. and u(·) steers X0 to X1 at time t1}    (11.149)

minimizes

J(u(·)) = G(x(t1)) + ∫_{t0}^{t1} f0(x(s), u(s)) ds    (11.150)

on the set of admissible controls Θ with optimal response x∗(·) satisfying x∗(t0) = x∗0, then there exists a non-trivial solution

η∗(t) = [η∗0(t) η∗1(t) η∗2(t) ... η∗n(t)]^T = [η∗0(t); η∗(t)]    (11.151)

to the augmented adjoint equation

d/dt η(t) = −[A(t)]^T η(t),    (11.152)

such that

H(η∗(t), x∗(t), u∗(t)) = M(η∗(t), x∗(t)) = max_{u∈Ω} H(η∗(t), x∗(t), u).    (11.153)


Moreover, there are constants η∗0 ≤ 0 and c such that

η∗0(t) ≡ η∗0 ≤ 0    (11.154)

and for all t ∈ [t0, t1],

M(η∗(t), x∗(t)) = max_{u∈Ω} H(η∗(t), x∗(t), u) ≡ c.    (11.155)

Also, the transversality condition at t = t1 is given by

η∗(t1) = η∗0∇G(x∗(t1)).    (11.156)

Proof. To prove this result, we need only establish the final transversality condition (11.156) above. Since the tangent space T0 = {0} × R1, it follows from the Maximum Principle that η∗(t0) = [η∗(t0) η∗n+1(t0)]^T ∈ Rn+1 is orthogonal to T0 = {0} × R1, and this is true if and only if η∗n+1(t0) = 0. From (11.146), η∗n+1(t) = −η∗0t + k, and η∗n+1(t0) = 0 implies that

η∗n+1(t) = −η∗0(t − t0).

Also, the condition (11.143) implies that

η∗(t1) = [η∗(t1); η∗n+1(t1)] = [η∗(t1); −η∗0(t1 − t0)] = α∇G̃(x(t1)) = [−α∇G(x∗(t1))/(t1 − t0); α],

and hence

−η∗0(t1 − t0) = α.

Therefore,

η∗(t1) = −α∇G(x∗(t1))/(t1 − t0) = η∗0(t1 − t0)∇G(x∗(t1))/(t1 − t0) = η∗0∇G(x∗(t1)),

and this completes the proof.


11.7 The Maximum Principle for Nonautonomous Systems

When the system is nonautonomous the Maximum Principle must be modified. However, we show that the general nonautonomous problem can be reduced to the SPOC, and the Maximum Principle for the SPOC can be used to obtain the corresponding Maximum Principle. Thus, we assume that f : R × Rn × Rm → Rn, f0 : R × Rn × Rm → R, X0 ⊆ Rn, t0 ∈ R, X1 ⊆ Rn and Ω ⊆ Rm are given, with X0, X1 and Ω nonempty. Here we assume that t1 ∈ R is free as in the SPOC. The control system is given by

(S) ẋ(t) = f(t, x(t), u(t)),   t0 < t ≤ t1,    (11.157)

and the cost functional is defined by

J(u(·)) = ∫_{t0}^{t1} f0(s, x(s), u(s)) ds.    (11.158)

The admissible controllers are given by

Θ = {u(·) ∈ PWC(t0, +∞; Rm) : u(t) ∈ Ω e.f. and u(·) steers X0 to X1 at a finite time t1}    (11.159)

and the problem is to find u∗(·) ∈ Θ such that

J(u∗(·)) = ∫_{t0}^{t1} f0(s, x∗(s), u∗(s)) ds ≤ J(u(·)) = ∫_{t0}^{t1} f0(s, x(s), u(s)) ds

for all u(·) ∈ Θ.

In order to state the Maximum Principle for this nonautonomous case we again have to introduce some notation. Given a vector z = [z1, z2, ..., zn]^T ∈ Rn, define the n + 2 dimensional augmented vector z ∈ Rn+2 by

z = [z0 z1 z2 ... zn zn+1]^T = [z0; z; zn+1] = [z; zn+1] ∈ Rn+2,


where as before z = [z0 z1 z2 ... zn]^T ∈ Rn+1. Also, given the functions

f : R1 × Rn × Rm → Rn,   f0 : R1 × Rn × Rm → R,

where f : R1 × Rn × Rm → Rn is given componentwise by

f(t, x, u) = [f1(t, x, u); f2(t, x, u); ...; fn(t, x, u)] = [f1(t, x1, x2, ..., xn, u1, u2, ..., um); f2(t, x1, x2, ..., xn, u1, u2, ..., um); ...; fn(t, x1, x2, ..., xn, u1, u2, ..., um)],

it is assumed, as before, that all the partial derivatives

∂fi(t, x1, x2, ..., xn, u1, u2, ..., um)/∂xj,   ∂fi(t, x1, x2, ..., xn, u1, u2, ..., um)/∂uj

exist.

Remark 11.4 For the moment we make the rather strong assumption that the partial derivatives

∂fi(t, x1, x2, ..., xn, u1, u2, ..., um)/∂t

also exist. This is done to simplify the derivation of the Maximum Principle for the nonautonomous problem and is not essential.

Define the (time) augmented vector field f : Rn+2 × Rm → Rn+2 by

f(x, u) = [f0(x, u); f1(x, u); f2(x, u); ...; fn(x, u); fn+1(x, u)]
        ≜ [f0(xn+1, x1, ..., xn, u1, ..., um); f1(xn+1, x1, ..., xn, u1, ..., um); f2(xn+1, x1, ..., xn, u1, ..., um); ...; fn(xn+1, x1, ..., xn, u1, ..., um); 1].    (11.160)


The time augmented control system is defined by

d/dt x(t) = f(x(t), u(t)),    (11.161)

or equivalently by the system

d/dt x0(t) = f0(xn+1(t), x(t), u(t)),    (11.162)
d/dt x(t) = f(xn+1(t), x(t), u(t)),    (11.163)
d/dt xn+1(t) = fn+1(xn+1(t), x(t), u(t)) = 1.    (11.164)

Observe that

x0(t) = ∫_{t0}^{t} f0(xn+1(s), x(s), u(s)) ds

and xn+1(t0) = t0, so that the initial state for (11.161) is

x(t0) = [0 x0 t0]^T.    (11.165)

In particular, x0(t) represents the cost of transferring the state from an initial x(t0) = x0 ∈ X0 by the control u(·), and xn+1(t) = t is the time variable.
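The reduction above is the familiar device of making a nonautonomous system autonomous by adjoining time as an extra state. A tiny hedged Python sketch (the scalar right-hand side is invented purely for illustration, and the cost coordinate x0 is omitted for brevity):

    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, x):                  # a nonautonomous field f(t, x)
        return np.array([-x[0] + np.sin(t)])

    def f_aug(_, z):              # autonomous field on [x, x_{n+1}]
        x, xn1 = z[:-1], z[-1]
        return np.hstack([f(xn1, x), 1.0])   # x_{n+1}' = 1 as in (11.164)

    t0, t1, x0 = 0.0, 3.0, np.array([1.0])
    ref = solve_ivp(f, (t0, t1), x0, rtol=1e-10)
    aug = solve_ivp(f_aug, (t0, t1), np.hstack([x0, t0]), rtol=1e-10)
    print("difference:", abs(ref.y[0, -1] - aug.y[0, -1]))   # ~ 0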

Assume that (x∗(·), u∗(·)) is an optimal pair. In particular, u∗(·) ∈ Θ steers x∗0 ∈ X0 to x∗1 ∈ X1 in time t∗1 and x∗(·) is the corresponding optimal trajectory that satisfies the initial value problem (10.1) - (10.2) with x0 = x∗0 ∈ X0. Define the (n + 2) × (n + 2) matrix A(t) by

A(t) = [∂fi(x, u)/∂xj]|(x∗(t), u∗(t)) = [∂fi(x∗(t), u∗(t))/∂xj],   i, j = 0, 1, 2, ..., n, n + 1.    (11.166)


Thus, A(t) = [∂fi(x∗(t), u∗(t))/∂xj] is the full (n + 2) × (n + 2) matrix of partial derivatives of f0, f1, ..., fn+1 with respect to x0, x1, ..., xn+1, evaluated along the optimal pair. Since fi(x, u) = fi(xn+1, x, u) = fi(t, x1, x2, ..., xn, u1, u2, ..., um) does not depend on x0, it follows that

∂fi(x∗(t), u∗(t))/∂x0 = 0

for all i = 0, 1, 2, ..., n, n + 1, and since fn+1(x, u) = 1, we have that

∂fn+1(x∗(t), u∗(t))/∂xj = 0

for all j = 0, 1, 2, ..., n, n + 1. Consequently,

A(t) = [0 [∂f0(x∗(t), u∗(t))/∂x1 ... ∂f0(x∗(t), u∗(t))/∂xn] ∂f0(x∗(t), u∗(t))/∂xn+1 ;
        0 [∂fi(x∗(t), u∗(t))/∂xj]_{n×n} [∂fi(x∗(t), u∗(t))/∂xn+1]_{n×1} ;
        0 0 0]    (11.167)

and, using the definition of fi(x, u) = fi(xn+1, x, u) = fi(t, x, u), it follows that

∂fi(x∗(t), u∗(t))/∂xn+1 = ∂fi(t, x∗(t), u∗(t))/∂t.


Therefore, the matrix A(t) has the block form

A(t) = [0 [∇xf0(t, x∗(t), u∗(t))]^T ∂f0(t, x∗(t), u∗(t))/∂t ;
        0 [∂fi(t, x∗(t), u∗(t))/∂xj]_{n×n} [∂fi(t, x∗(t), u∗(t))/∂t]_{n×1} ;
        0 0 0].

Observe that A(t) can also be written in the block form

A(t) = [A(t) [∂f(t, x∗(t), u∗(t))/∂t]_{(n+1)×1} ; 0 0],

where

A(t) = [0 [∇xf0(t, x∗(t), u∗(t))]^T ; 0 [∂fi(t, x∗(t), u∗(t))/∂xj]_{n×n}]    (11.168)

is time dependent both because of the explicit dependence on time in the problem and because of the inclusion of the optimal state and control.

The negative transpose of A(t) is given by

−[A(t)]^T = [0 [0]_{1×n} 0 ;
            −[∇xf0(t, x∗(t), u∗(t))]_{n×1} −[∂fi(t, x∗(t), u∗(t))/∂xj]^T_{n×n} 0 ;
            −∂f0(t, x∗(t), u∗(t))/∂t −[∂f(t, x∗(t), u∗(t))/∂t]^T 0],

so that the augmented adjoint system

d/dt [η0(t); η(t); ηn+1(t)] = −[A(t)]^T [η0(t); η(t); ηn+1(t)]


is equivalent to

d/dt η0(t) = 0,    (11.169)

d/dt η(t) = −η0(t)[∇xf0(t, x∗(t), u∗(t))] − [Jxf(t, x∗(t), u∗(t))]^Tη(t),    (11.170)

d/dt ηn+1(t) = −η0(t) ∂f0(t, x∗(t), u∗(t))/∂t − Σ_{i=1}^{n} ηi(t) ∂fi(t, x∗(t), u∗(t))/∂t.    (11.171)

Note that the first two equations have the form

d/dt [η0(t); η(t)] = −[A(t)]^T [η0(t); η(t)],

where the matrix A(t) is given by (11.168).

where the matrix A(t) is given by (11.168).The augmented Hamiltonian function is given by

H(η, x,u) = H(η0, η1, η2, ..., ηn, ηn+1, x0, x1, x2, ...,

xn, xn+1, u1, u2,..., um) (11.172)

= η0f0(xn+1,x, u) + 〈η,f(xn+1,x, u)〉+ ηn+1

= η0f0(xn+1,x, u) +n∑i=1

ηifi(xn+1,x, u) + ηn+1.

Since x = [x xn+1]T and η = [η ηn+1]T , we can write

H(η, x, u) = H(t, η, x,u) + ηn+1,

whereH(t, η, x, u)

is called the time dependent augmented Hamiltonian. Note alsothat M : Rn+2 × Rn+2 → R defined by

M(η, x) = maxu∈Ω

H(η, x, u) (11.173)


has the form

M(η, x) = max_{u∈Ω} H(η, x, u) = max_{u∈Ω} H(t, η, x, u) + ηn+1 = M(t, η, x) + ηn+1,    (11.174)

so that maximizing H(η, x, u) over Ω is equivalent to maximizing H(t, η, x, u) over Ω.

We now state the (Pontryagin) Maximum Principle for the nonautonomous optimal control problem and use the basic Maximum Principle to prove this result.

Theorem 11.4 (Maximum Principle for Nonautonomous Systems) Assume that f : R1 × Rn × Rm → Rn, f0 : R1 × Rn × Rm → R, X0 ⊆ Rn, t0 ∈ R, X1 ⊆ Rn and Ω ⊆ Rm are given as above and consider the control system

(S) ẋ(t) = f(t, x(t), u(t)),   t0 < t ≤ t1,    (11.175)

with piecewise continuous controllers u(·) ∈ PWC(t0, +∞; Rm) satisfying u(t) ∈ Ω ⊆ Rm e.f. If

u∗(·) ∈ Θ = {u(·) ∈ PWC(t0, t1; Rm) : u(t) ∈ Ω e.f. and u(·) steers X0 to X1 at a finite time t1}    (11.176)

minimizes

J(u(·)) = ∫_{t0}^{t1} f0(s, x(s), u(s)) ds    (11.177)

on the set of admissible controls Θ with optimal response x∗(·) satisfying x∗(t0) = x∗0 ∈ X0 and x∗(t∗1) = x∗1 ∈ X1, then there exists a non-trivial solution

η∗(t) = [η∗0(t); η∗(t); η∗n+1(t)]    (11.178)

to the augmented adjoint equation

d/dt [η0(t); η(t); ηn+1(t)] = −[A(t)]^T [η0(t); η(t); ηn+1(t)],    (11.179)


such that

H(η∗(t), x∗(t), u∗(t)) = M(η∗(t), x∗(t)) = max_{u∈Ω} H(η∗(t), x∗(t), u).    (11.180)

Moreover, there is a constant η∗0 ≤ 0 such that

η∗0(t) ≡ η∗0 ≤ 0    (11.181)

and for all t ∈ [t0, t∗1],

M(η∗(t), x∗(t)) = max_{u∈Ω} H(η∗(t), x∗(t), u) ≡ 0.    (11.182)

The above maximum principle is equivalent to the maximization of the time augmented Hamiltonian H(t, η∗(t), x∗(t), u), and

max_{u∈Ω} H(t, η∗(t), x∗(t), u) = ∫_{t∗1}^{t} Σ_{i=0}^{n} η∗i(s) ∂fi(s, x∗(s), u∗(s))/∂t ds.    (11.183)

The transversality conditions imply that

η∗n+1(t∗1) = 0    (11.184)

and hence

max_{u∈Ω} H(t∗1, η∗(t∗1), x∗(t∗1), u) = M(t∗1, η∗(t∗1), x∗(t∗1)) = 0.    (11.185)

Also, if X0 ⊆ Rn and X1 ⊆ Rn are manifolds with tangent spaces T0 and T1 at x∗(t0) = x∗0 ∈ X0 and x∗(t∗1) = x∗1 ∈ X1, respectively, then

η∗(t) = [η∗0(t) η∗1(t) η∗2(t) ... η∗n(t)]^T = [η∗0(t) η∗(t)]^T

can be selected to satisfy the transversality conditions

η∗(t0) ⊥ T0    (11.186)

and

η∗(t∗1) ⊥ T1.    (11.187)

Page 463: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

444 Chapter 11. Extensions of the Maximum Principle

The proof of this result comes from a direct application of thethe Maximum Principle given in Theorem 11.4 to this case wherex is replaced by [x xn+1]T (see pages 318 - 322 in Lee and Markus[119]).

Remark 11.5 One can still derive a Maximum Principle wherethe functions f : R1×Rn × Rm → Rn and f0 : R1×Rn × Rm → Rare not differentiable in time. However, some information is lostsince (11.171) and (11.183) do not make sense when the partialderivatives ∂

∂tfi(t, x1, x2, ..., xn, u1, u2,..., um) do not exist. Also, it

is important to note that A(·) given by equation (11.167) is not welldefined unless fi(t,x,u) are differentiable in time. Finally, notethat if the adjoint states η∗0 and η∗(·) are both zero, then η∗n+1(·)would be a constant and the transversality condition (11.184) im-plies that η∗n+1(·) = 0. Thus, η∗0 and η∗(·) can not be both zero.

We provide a statement of the Maximum Principle for thenonautonomous control problem that does not require smooth-ness in time. The result can be found in the books by Young [186]and Fleming [79].

Theorem 11.5 (Second Maximum Principle for Nonau-tonomous Systems) Assume that f : R1×Rn × Rm → Rn, f0 :R1×Rn × Rm → R, X0 ⊆ Rn, t0 ∈ R, X1 ⊆ Rn and Ω ⊆ Rm aregiven as above and consider the control system

(S) x(t) = f(t,x(t),u(t)), t0 < t ≤ t1, (11.188)

with piecewise continuous controllers u(·) ∈ PWC(t0,+∞;Rm)satisfying u(t) ∈ Ω ⊆ Rm e.f. If

u∗(·) ∈ Θ =

u(·) ∈ PWC(t0, t1;Rm) : u(t) ∈ Ω e.f. andu(·) steers X0 to X1 at a finite time t1.

(11.189)

minimizes

J(u(·)) =

∫ t1

t0

f0(s,x(s),u(s))ds, (11.190)

Page 464: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 445

on the set of admissible controls Θ with optimal response x∗(·)satisfying x∗(t0) = x∗0 ∈ X0 and x∗(t∗1) = x∗1 ∈ X1, then thereexists a non-trivial solution

η∗(t) =

[η∗0(t)η∗(t)

](11.191)

to the time dependent augmented adjoint equation

d

dt

[η0(t)η(t)

]= −[A(t)]T

[η0(t)η(t)

], (11.192)

such that the time dependent augmented Hamiltonian is maximized

H(t, η∗(t), x∗(t), u∗(t)) = M(t, η∗(t), x∗(t))

= maxu∈Ω

H(t, η∗(t), x∗(t),u). (11.193)

Moreover, there is a constant η∗0 ≤ 0 such that

η∗0(t) ≡ η∗0 ≤ 0. (11.194)

The transversality conditions imply that

maxu∈Ω

H(t∗1, η∗(t∗1), x∗(t∗1),u) = M(t∗1, η

∗(t∗1), x∗(t∗1)) = 0.

(11.195)Also, if X0 ⊆ Rn and X1 ⊆ Rn are manifolds with tangent spacesT0 and T1 at x∗(t0) = x∗0 ∈ X0 and x∗(t1) = x∗1 ∈ X1, respectively,then

η∗(t) = [η∗0(t) η∗1(t) η∗2(t) ... η∗n(t)]T = [η∗0(t) η∗(t)]T

can be selected to satisfy the transversality conditions

η∗(t0) ⊥ T0 (11.196)

andη∗(t∗1) ⊥ T1. (11.197)

If in addition, the functions fi(t,x, u), i = 0, 1, 2, . . . , n are C1 int, then

M(t, η∗(t), x∗(t)) =

∫ t

t∗1

n∑i=0

η∗i (s)∂fi(s,x

∗(s), u∗(s))

∂tds. (11.198)

Page 465: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

446 Chapter 11. Extensions of the Maximum Principle

11.8 Application to the Nonautonomous

LQ Control Problem

The Maximum Principle Theorem 11.5 will be applied to theNonautonomous Linear Quadratic (LQ) control problem. We usethe notation

M : I → Rp×q

to denote a p × q (real valued) matrix function defined on aninterval I. In particular, if M : I → Rp×q, then

M(t) =

m1,1(t) m1,2(t) · · · m1,q(t)m2,1(t) m2,2(t) · · · m2,q(t)

......

. . ....

mp,1(t) mp,2(t) · · · mp,q(t)

where each entry is a real valued function defined on the intervalI. Likewise we say that M(·) is piecewise continuous, continuous,etc. if each component is piecewise continuous, continuous, etc.

We assume that t0, x0 ∈ Rn and t0 < t1 are given and thatA : [t0, t1] → Rn×n and B : [t0, t1] → Rn×m are piecewise continu-ous matrix valued functions. Consider the nonautonomous linearcontrol system

x(t) = A(t)x(t) +B(t)u(t) (11.199)

with initial datax(0) = x0 ∈ Rn. (11.200)

Also, let Q : [t0, t1] → Rn×n and R : [t0, t1] → Rm×m be piecewisecontinuous and symmetric matrix valued functions such that

QT (t) = Q(t) ≥ 0 and RT (t) = R(t) > 0,

for all t ∈ [t0, t1]. In addition we require that there is an α > 0such that for all t ∈ [t0, t1] we have

0 < αIm×m ≤ R(t). (11.201)

Condition (11.201) implies that for any u ∈ Rm,

α ‖u‖2 = α 〈u,u〉 ≤ 〈R(t)u,u〉

Page 466: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 447

and [R(t)]−1 exists. Let w ∈ Rm and define

u = [R(t)]−1 w ∈ Rm

so that

α∥∥[R(t)]−1w

∥∥2= α ‖u‖2 ≤ 〈R(t)u,u〉 ≤ ‖R(t)u‖ ‖u‖= ‖w‖

∥∥[R(t)]−1w∥∥ .

If w 6= 0, then [R(t)]−1w 6= 0 and dividing by ‖[R(t)]−1w‖ it fol-lows that

α∥∥[R(t)]−1w

∥∥ ≤ ‖w‖so that ∥∥[R(t)]−1w

∥∥ ≤ ‖w‖α

for all w ∈ Rm. Therefore,∥∥[R(t)]−1∥∥ ≤ 1

α

and [R(t)]−1 is bounded on [t0, t1]. Since R : [t0, t1] → Rm×m ispiecewise continuous, it then follows that R−1 : [t0, t1] → Rm×mis also piecewise continuous. This is important since we need themapping Λ : [t0, T ]→ Rm×n defined by

Λ(t) = [B(t)][R(t)]−1[B(t)]T

to also be piecewise continuous.We assume that the controls belong to the space PWC(t0, T ;Rm)

and the quadratic cost functional is defined by

J(u(·)) = 〈Sx(t1),x(t1)〉

+1

2

∫ t1

t0

〈Q(s)x(s),x(s)〉+ 〈R(s)u(s),u(s)〉 ds,

(11.202)

where S = ST ≥ 0 is a constant symmetric matrix and x(t) =x(t;u(·)) is the solution to the system (11.199) - (11.200). The

Page 467: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

448 Chapter 11. Extensions of the Maximum Principle

Linear Quadratic (LQ) Optimal Control problem is to findu∗(·) ∈ PWC(t0, t1;Rm) so that

J(u∗(·)) = 〈Sx∗(T ),x∗(T )〉

+1

2

∫ t1

t0

〈Q(s)x∗(s),x∗(s)〉+ 〈R(s)u∗(s),u∗(s)〉 ds

≤ 〈Sx(T ),x(T )〉

+1

2

∫ t1

t0

〈Q(s)x(s),x(s)〉+ 〈R(s)u(s),u(s)〉 ds,

for all u(·) ∈ PWC(0, t1;Rm).

Formulation as an Optimal Control Problem: In order to setup the nonautonomous linear quadratic optimal control problemas a fixed time optimal control problem and apply the MaximumPrinciple, we need to:

1. Identify the initial time t0 and final time t1;

2. Identify the initial set X0 ⊆ Rn, the terminal set X1 ⊆ Rn;

3. Identify the control constraint set Ω ⊆ Rm;

4. Identify the functions f : R1 × Rn × Rm −→ Rn, f0 : R1 ×Rn × Rm −→ R1;

5. Define the time dependent augmented Hamiltonian H(t, η,x,u);

6. Form the time dependent augmented adjoint system matrixA(t).

The initial time is t0 and the final time t1 is given and fixed.The initial set is the single initial vector X0 = x0 and sincethere is no terminal constraint, the terminal set is the whole statespace X1 = Rn. Also, there is no constraint on the control u(·) ∈PWC(t0, t1;Rm), hence the control constraint set is all of Rm andΩ = Rm. The functions f : R1 × Rn × Rm −→ Rn and f0 :R1 × Rn × Rm −→ R1 are given by

f(t,x,u) = A(t)x+B(t)u,

Page 468: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 449

and

f0(t,x,u) =1

2〈Q(t)x,x〉+

1

2〈R(t)u,u〉 ≥ 0,

respectively.The time dependent augmented Hamiltonian H : R1×Rn+1×

Rn+1 × Rm −→ R1 is defined by

H(t, η, x,u) =η0

2〈Q(t)x,x〉+ 〈R(t)u,u〉

+ 〈η, A(t)x+B(t)u〉 (11.203)

=η0

2xTQ(t) x+

η0

2uTR(t)u+ [A(t)x]Tη

+ [B(t)u]Tη

=η0

2xTQ(t) x+

η0

2uTR(t)u+ xT [A(t)]Tη

+uT [B(t)]Tη,

where x =[x0 x

]T ∈ Rn+1, η =[η0 η

]T ∈ Rn+1 and u ∈Rm. The time dependent augmented function f : R1 × Rn+1 ×Rm −→ Rn+1 is defined by

f(t, x,u) =

[12〈Q(t)x,x〉+ 1

2〈R(t)u,u〉

A(t)x+B(t)u,

]=

[12xTQ(t)x+ 1

2uTR(t)u

A(t)x+B(t)u,

], (11.204)

so that the time dependent Jacobian is given by

Jxf(t, x,u) = Jxf(t,x, u) =

[0 [Q(t)x]T

0 A(t)

]. (11.205)

Now assume that (x∗(·),u∗(·)) is an optimal pair so that thematrix A(t) is given by

A(t) = Jxf(t, x,u)|(x∗(t),u∗(t)) = Jxf(t,x∗(t),u∗(t))

=

[0 [Q(t)x∗(t)]T

0 A(t)

](11.206)

Page 469: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

450 Chapter 11. Extensions of the Maximum Principle

and

−[A(t)]T = −[

0 0[Q(t)x∗(t)] [A(t)]T

]=

[0 0

− [Q(t)x∗(t)] −[A(t)]T

].

The time dependent augmented adjoint equation has the form

d

dt

[η0(t)η(t)

]= −[A(t)]T

[η0(t)η(t)

]=

[0 0

− [Q(t)x∗(t)] −[A(t)]T

] [η0(t)η(t)

](11.207)

which is equivalent to the system

d

dtη0(t) = 0

d

dtη(t) = −η0(t)Q(t)x∗(t)− [A(t)]Tη(t),

where again x∗(·) is the optimal trajectory. The first equationabove implies (as always) that the zero adjoint state is a constantη0(t) ≡ η0. The second equation is coupled to the state equationby the term −η0(t)Q(t)x∗(t).

If (x∗(·),u∗(·)) is an optimal pair for the LQ optimal controlproblem, then the Maximum Principle implies that there is a non-trivial solution

η∗(t) =

[η∗0(t)η∗(t)

]∈ Rn+1

to the augmented adjoint equation (11.81) such that η∗0(t) ≡ η∗0 ≤0,

d

dtη∗(t) = −η∗0Q(t)x∗(t)− [A(t)]Tη∗(t) (11.208)

and

H(t, η∗(t), x∗(t),u∗(t)) = maxu∈Rm

H(t, η∗(t), x∗(t),u) ≡ c(t).

Since Ω = Rm is open, then

DuH(t, η∗(t), x∗(t),u)|u=u∗(t) = DuH(t, η∗(t), x∗(t),u∗(t)) = 0,

Page 470: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 451

where

DuH(t,η∗(t), x∗(t),u)

=∂

∂u

η∗02

[x∗(t)]TQ(t)x∗(t) +η∗02uTR(t)u

+

∂u

[x∗(t)]T [A(t)]Tη∗(t) + uT [B(t)]Tη∗(t)

=

∂u

η∗02uTR(t)u+ uT [B(t)]Tη∗(t)

= η∗0R(t)u+ [B(t)]Tη∗(t).

Hence, when u = u∗(t) it follows that

DuH(t, η∗(t), x∗(t),u)|u=u∗(t) = η∗0R(t)u+ [B(t)]Tη∗(t) |u=u∗(t)

(11.209)

= η∗0R(t)u∗(t) + [B(t)]Tη∗(t) = 0.

Applying the transversality condition at x∗(0) = x0 ∈ X0 =x0 yields that η∗(0) can be any vector since T0 = x0. However,at t1 we have the transversality condition

η∗(t1) = η0Sx∗(t1). (11.210)

This boundary condition in turn implies that η∗0 < 0. To see thisassume that η∗0 = 0 and observe that the adjoint equation (11.82)reduces to the linear system

d

dtη∗(t) = −η∗0Qx∗(t)− ATη∗(t)

= −ATη∗(t).

Therefore, η∗(·) would be a solution of the homogenous linearinitial value problem

d

dtη∗(t) = −ATη∗(t), η∗(t∗1) = η0Sx

∗(t1) = 0

and hence it follows that η∗(t) ≡ 0. Consequently, we have shownthat if η∗0 = 0, then η∗(t) ≡ 0 and hence

η∗(t) =

[η∗0(t)η∗(t)

]≡ 0

Page 471: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

452 Chapter 11. Extensions of the Maximum Principle

which contradicts the statement that η∗(t) is a nontrivial solutionof the time augmented adjoint equation (11.207).

Since η∗0 < 0 we can solve (11.209) for the optimal control. Wehave

η∗0R(t)u∗(t) + [B(t)]Tη∗(t) = 0

which yieldsη∗0R(t)u∗(t) = −[B(t)]Tη∗(t)

and

u∗(t) = [R(t)]−1[B(t)]T[η∗(t)

−η∗0

].

Since the matrix R(t) = [R(t)]T > 0 is nonsingular and [R(t)]−1 ispiecewise smooth we have the following expression for the (possiblypiecewise continuous) optimal control

u∗(t) = [R(t)]−1[B(t)]T[η∗(t)

−η∗0

]. (11.211)

Summarizing, it follows that the optimal trajectory can be ob-tained by solving the two point boundary value problem definedby the coupled state and adjoint equations

ddtx∗(t) = A(t)x∗(t)− 1

η∗0[B(t)][R(t)]−1[B(t)]Tη∗(t), x∗(0) = x0,

ddtη∗(t) = −η∗0Q(t)x∗(t)− [A(t)]Tη∗(t), η∗(t1) = η0Sx

∗(t1),(11.212)

and setting

u∗(t) = [R(t)]−1[B(t)]T[η∗(t)

−η∗0

].

To eliminate the η∗0 term, we divide the adjoint equation above by−η∗0 which yields

d

dt

[η∗(t)

−η∗0

]= Q(t)x∗(t)−[A(t)]T

[η∗(t)

−η∗0

],

[η∗(t1)

−η∗0

]= −Sx∗(t1).

Defining the normalized adjoint state λ∗(t) by

λ∗(t) ,η∗(t)

−η∗0

Page 472: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 453

produces the optimality conditions

ddtx∗(t) = A(t)x∗(t) + [B(t)][R(t)]−1[B(t)]Tλ∗(t), x∗(0) = x0,

ddtλ∗(t) = Q(t)x∗(t)− [A(t)]Tλ∗(t), λ∗(t1) = −Sx∗(t1),

(11.213)where the optimal control is defined by

u∗(t) = R−1(t)BT (t)λ∗(t). (11.214)

We can write the optimality system as

d

dt

[x∗(t)λ∗(t)

]=

[A(t) [B(t)][R(t)]−1[B(t)]T

Q(t) −[A(t)]T

] [x∗(t)λ∗(t)

],

(11.215)with boundary conditions

x∗(0) =[In×n 0n×n

] [ x∗(0)λ∗(0)

]= x0, (11.216)

and

λ∗(t1) =[−S 0n×n

] [ x∗(t1)λ∗(t1)

]. (11.217)

Thus, if one solves the two point boundary value problem (11.215)- (11.217), the optimal control is defined by (11.214).

11.9 Problem Set for Chapter 11

Problem 11.1 Consider the control system

x1(t) = x2(t) + u1(t),

x2(t) = u1(t),

with initial condition

x1(0) = 1, x2(0) = 1,

and terminal conditionx2(2) = 0.

Page 473: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

454 Chapter 11. Extensions of the Maximum Principle

Let the control constraint set be Ω = R2. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =1

2

2∫0

√[u1(s)]2 + [u2(s)]2ds.

Problem 11.2 Consider the control system

x1(t) = x2(t),

x2(t) = u(t),

with initial condition

x1(0) = 1, x2(0) = 1

and terminal condition

x1(1) = 0, x2(1) is free.

Let the control constraint set be Ω = R1. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =1

2

1∫0

[u(s)]2ds.

Problem 11.3 Consider the control system

x1(t) = x2(t),

x2(t) = u(t),

with initial condition

x1(0) = 1, x2(0) = 1

and terminal condition

x1(1) = 0, x2(1) = 0.

Let the control constraint set be Ω = R1. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =1

2

1∫0

[u(s)]2ds.

Page 474: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 455

Problem 11.4 Consider the control system

x1(t) = x2(t),

x2(t) = u(t),

with initial condition

x1(0) = 1, x2(0) = 1,

and terminal condition

x1(t1) = 0, x2(1) = 0.

Let the control constraint set be Ω = [−1,+1]. Investigate the op-timal control problem for the cost functional

J(u(·)) =

1∫0

√1 + [u(s)]2ds.

Problem 11.5 Consider the control system

x(t) = x(t) + u(t),

with initial conditionx(0) = 1,

and terminal condition

x(2) is free.

Let the control constraint set be Ω = R1. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =1

2[x(2)]2 +

1

2

2∫0

[x(s)]2 + [u(s)]2

ds.

Problem 11.6 Consider the control system

x(t) = x(t) + u(t),

Page 475: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

456 Chapter 11. Extensions of the Maximum Principle

with initial conditionx(0) = 1,

and terminal condition

x(1) is free.

Let the control constraint set be Ω = R1. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =1

2

1∫0

[x(s)− e−s]2 + [u(s)]2

ds.

Problem 11.7 Consider the control system

x(t) = x(t) + u(t),

with initial conditionx(0) = 1,

and terminal condition

x(1) is free.

Let the control constraint set be Ω = [−1,+1]. Investigate the op-timal control problem for the cost functional

J(u(·)) =1

2

2∫0

[x(s)− e−s]2ds.

Problem 11.8 Consider the control system

x1(t) = x2(t) + cos(u(t)),

x2(t) = sin(u(t)),

with initial condition

x1(0) = 0, x2(0) = 0,

Page 476: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 11. Extensions of the Maximum Principle 457

and terminal condition

x1(1) and x2(1) are free.

Let the control constraint set be Ω = [−π,+π]. Investigate theoptimal control problem for the cost functional

J(u(·)) = −1∫

0

x2(s) + cos(u(s)) ds.

Problem 11.9 Consider the control system

x(t) = −x(t) + u(t),

with initial conditionx(0) = 1,

and terminal condition

x(2) is free.

Let the control constraint set be Ω = R1. Investigate the optimalcontrol problem for the cost functional

J(u(·)) =

2∫0

[x(s)]2 + [u(s)]2

ds.

Derive and solve the Riccati differential equation that defines theoptimal feedback controller.

Problem 11.10 Consider the control system

x(t) = x(t) + bu(t),

with initial conditionx(0) = 1,

and terminal condition

x(1) is free.

Page 477: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

458 Chapter 11. Extensions of the Maximum Principle

Here, b 6= 0 and the control constraint set is Ω = R1. Investigatethe optimal control problem for the cost functional

J(u(·)) =

1∫0

q[x(s)]2 + [u(s)]2

ds,

where q ≥ 0. Derive and solve the Riccati differential equationthat defines the optimal feedback controller. What happens whenq −→ 0?

Page 478: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12

Linear Control Systems

Although we have focused on control and optimal control of non-linear control systems, linear control systems play a key role inpractical applications of modern control. Linear systems arise nat-urally when the dynamic model is linear and when linearization isused to produce an approximate system. We discuss linearizationlater, but first we present a short introduction to the basic ideasand control system properties. Good references for this sectioninclude [7], [45], [110] and [115].

12.1 Introduction to Linear Control

Systems

We focus on linear control systems defined by a linear differentialequation

x(t) = A(t)x(t) +B(t)u(t) +G(t)w(t) (12.1)

with initial datax(t0) = x0 ∈ Rn. (12.2)

The inputs to the system are defined by the control u(·) and a“disturbance” w(·). As before, we assume that u(·) is piecewisecontinuous and in this section we assume that the disturbancew(·)is also piecewise continuous. The sensed output is defined by

y(t) = C(t)x(t) +H(t)v(t) ∈ Rp (12.3)

459

Page 479: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

460 Chapter 12. Linear Control Systems

where y(·) is called the sensed (or measured) output and v(·) rep-resents sensor noise. Again, in this section it is useful to think ofv(·) as a piecewise continuous function. However, later it will beimportant to view both w(·) and v(·) as white noise. In the casewith no disturbances, the control system is defined by

x(t) = A(t)x(t) +B(t)u(t),y(t) = C(t)x(t).

(12.4)

Recall that many of the previous optimal control problems areformulated in terms of finding a control u(·) that steers an initialstate x0 to a terminal state x1 in a finite time t1 > t0. Thus, wemust consider the following question.

Given an initial state x0 ∈ Rn and a final state x1 ∈ Rnis there a time t1 > t0 and a corresponding controlu(·) ∈ PWC(t0, t1;Rm) such that u(·) steers x0 ∈ Rnto x1 ∈ Rn at time t1? This question motivates thefollowing definition.

Definition 12.1 (Controllability) The system

x(t) = A(t)x(t) +B(t)u(t) (12.5)

is said to be (completely) controllable if for any initial time t0 andstates x0 ∈ Rn and x1 ∈ Rn, there exists a finite time t1 > t0 anda piecewise continuous control u(·) ∈ PWC(t0, t1;Rm) such thatthe solution to (12.5) satisfying the initial condition

x(t0) = x0

also satisfiesx(t1) = x1.

In particular, (12.5) is controllable if and only if x0 can be steeredto x1 in finite time t1 − t0.

We shall also be interested in the concept of observability.Loosely speaking, observability implies that one can reconstructthe initial state of the system from given input - output pairs.

Page 480: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 461

Definition 12.2 (Observability) The systemx(t) = A(t)x(t) +B(t)u(t)y(t) = C(t)x(t)

is (completely) observable if for all t0, there exists a t1 > t0 suchthat if

y(t; t0,x0,u(·)) = C(t)x(t; t0,x0,u(·)) = y(t; t0, x0,u(·))= C(t)x(t; t0, x0,u(·))

for all t ∈ [t0, t1] and all controls u(·) ∈ PWC(t0, t1;Rm), then

x0 = x0.

Here, x(t; t0,x0,u(·)) denotes the solution to (12.5) with initialcondition x(t0) = x0. In particular, the initial state x0 at time t0 isuniquely determined by the output y(t; t0,x0,u(·)) on the interval[t0, t1].

Remark 12.1 In the special case where the matrix C(t) is ann× n nonsingular matrix, then

y(t0) = C(t0)x(t0)

implies x0 = x(t0) = [C(t0)]−1y(t0) so that the system is triviallyobservable. Note that controllability is determined by the matricesA(·) and B(·) and observability is determined by the matrices A(·)and C(·). Thus, we shall simply say that the pair (A(·), B(·)) iscontrollable or that (A(·), C(·)) is observable.

In the previous section we saw where linear control systemsoccur either naturally or from linearization. In this section we con-sider the LQ control problem with time varying matrix coefficients.Most of the following material can be found in the standard books[6], [5], [7], [16], [45], [71] and [73], [81], [115] and [187].

Consider the linear system on the time interval [t0, t1] given by

x(t) = A(t)x(t) +B(t)u(t), x(t0) = x0 ∈ Rn. (12.6)

Page 481: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

462 Chapter 12. Linear Control Systems

Recall that the state transition matrix is defined by

d

dtΦ(t, s) = A(t)Φ(t, s), Φ(τ, τ) = In×n (12.7)

and the solution to (12.6) is given by the Variation of ParametersFormula

x(t) = Φ(t, t0)x0 +

∫ t

t0

Φ(t, s)B(s)u(s)ds. (12.8)

Let

Wc(t0, t1) ,∫ t1

t0

Φ(t0, s)B(s)BT (s)ΦT (t0, s)ds

and

Wr(t0, t1) ,∫ t1

t0

Φ(t1, s)B(s)BT (s)ΦT (t1, s)ds.

The matrices Wc(t0, t1) and Wr(t0, t1) are called the controlla-bility Gramian and reachability Gramian, respectively (see Chap-ter 3 in [7]). Observe that

Φ(t1, t0)Wc(t0, t1)ΦT (t1, t0)

= Φ(t1, t0)

[∫ t1

t0

Φ(t0, s)B(s)BT (s)ΦT (t0, s)ds

]ΦT (t1, t0)

=

[∫ t1

t0

Φ(t1, t0)Φ(t0, s)B(s)BT (s)ΦT (t0, s)ΦT (t1, t0)ds

]=

[∫ t1

t0

[Φ(t1, t0)Φ(t0, s)]B(s)BT (s)[Φ(t1, t0)Φ(t0, s)]Tds

]=

[∫ t1

t0

[Φ(t1, s)]B(s)BT (s)[Φ(t1, s)]Tds

]= Wr(t0, t1),

so thatWr(t0, t1) = Φ(t1, t0)Wc(t0, t1)ΦT (t1, t0).

Moreover, since Φ(t1, t0) is non-singular it follows that Wr(t0, t1)and Wc(t0, t1) have the same range.

Page 482: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 463

Theorem 12.1 (Minimum Energy Theorem) The state x0

at time t0 can be transferred to the state x1 at time t1 if andonly if

w = x1 − Φ(t1, t0)x0 ∈ Range(Wr(t0, t1)).

If w = x1−Φ(t1, t0)x0 ∈ Range(Wr(t0, t1)), then there is a vectorv ∈ Rn such that w = x1−Φ(t1, t0)x0 = Wr(t0, t1)v and a controlthat transfers x0 at time t0 to x1 at time t1 is given by

u(t) = BT (t)ΦT (t1, t)v. (12.9)

If uo(·) is any other controller that steers x0 at time t0 to the statex1 at time t1, then

1

2

∫ t1

t0

uT (s)u(s)ds ≤ 1

2

∫ t1

t0

uTo (s)uo(s)ds (12.10)

and u(t) = BT (t)ΦT (t1, t)v minimizes the “energy” functional de-fined by

E = J(u(·)) =1

2

∫ t

t0

〈u(s),u(s)〉 ds =1

2

∫ t

t0

u(s)Tu(s)ds.

Moreover, if [Wr(t0, t1)]−1 exists, then the energy required to makethis transfer is given by

E =1

2

∫ t1

t0

uT (s)u(s)ds =1

2wT [Wr(t0, t1)]−1w

=1

2

⟨[Wr(t0, t1)]−1w,w

⟩. (12.11)

Proof : Note that if x1 − Φ(t1, t0)x0 = Wr(t0, t1)v and u(t) =BT (s)ΦT (t1, t)v, the Variation of Parameters formula implies

x(t) = Φ(t, t0)x0 +

∫ t

t0

Φ(t, s)B(s)u(s)ds.

= Φ(t, t0)x0 +

∫ t

t0

Φ(t, s)B(s)BT (s)ΦT (t1, s)vds

Page 483: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

464 Chapter 12. Linear Control Systems

so that

x(t1) = Φ(t1, t0)x0 +

[∫ t1

t0

Φ(t1, s)B(s)BT (s)ΦT (t1, s)ds

]v

= Φ(t1, t0)x0 +Wr(t0, t1)v.

Hence, we have

x(t1) = Φ(t1, t0)x0 +Wr(t0, t1)v = x1

so that u(t) = BT (s)ΦT (t1, t)v steers x0 at time t0 to x1 at timet1.

Assume now that uo(·) is any other controller that steers x0

at time t0 to the state x1 at time t1. The Variation of Parametersformula implies

x1 = Φ(t1, t0)x0 +

∫ t1

t0

Φ(t1, s)B(s)uo(s)ds.

Likewise,

x1 = Φ(t1, t0)x0 +

∫ t1

t0

Φ(t1, s)B(s)u(s)ds

and subtracting the two we obtain∫ t1

t0

Φ(t1, s)B(s)[uo(s)− u(s)]ds = 0.

Pre-multiplying by vT yields

0 = vT∫ t1

t0

Φ(t0, s)B(s)[uo(s)− u(s)]ds

=

∫ t1

t0

vTΦ(t0, s)B(s)[uo(s)− u(s)]ds

=

∫ t1

t0

[BT (s)ΦT (t1, t)v

]T[uo(s)− u(s)]ds

and sinceu(t) = BT (t)ΦT (t1, t)v,

Page 484: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 465

it follows that

0 =

∫ t1

t0

[u(t)]T [uo(s)− u(s)]ds.

Therefore, we have

0 ≤ 1

2

∫ t1

t0

[uo(s)− u(s)]T [uo(s)− u(s)]ds

=1

2

∫ t1

t0

[uTo (s)uo(s)− 2uT (s)uo(s) + uT (s)u(s)

]ds

=1

2

∫ t1

t0

[uTo (s)uo(s)− 2uT (s)uo(s) + 2uT (s)u(s)− uT (s)u(s)

]ds

=1

2

∫ t1

t0

uTo (s)uo(s)ds−∫ t1

t0

[uT (s)uo(s)− uT (s)u(s)

]ds

− 1

2

∫ t1

t0

uT (s)u(s)ds

=1

2

∫ t1

t0

uTo (s)uo(s)ds−∫ t1

t0

uT (s) [uo(s)− u(s)] ds

− 1

2

∫ t1

t0

uT (s)u(s)ds

=1

2

∫ t1

t0

uTo (s)uo(s)ds−1

2

∫ t1

t0

uT (s)u(s)ds

and hence

1

2

∫ t1

t0

uT (s)u(s)ds ≤ 1

2

∫ t1

t0

uTo (s)uo(s)ds,

so that u(t) = BT (t)ΦT (t1, t)v minimizes the energy needed totransfer x0 at time t0 to the state x1 at time t1.

Finally, if [Wr(t0, t1)]−1 exists then

v = [Wr(t0, t1)]−1[x1 − Φ(t1, t0)x0] = [Wr(t0, t1)]−1w]

and the control is given by

u(t) = BT (t)ΦT (t1, t) v = BT (t)ΦT (t1, t)[Wr(t0, t1)]−1w.

Page 485: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

466 Chapter 12. Linear Control Systems

Thus, the minimum energy required to transfer x0 at time t0 tothe state x1 at time t1 is given by

E =1

2

∫ t

t0

〈u(s),u(s)〉 ds =1

2

∫ t

t0

u(s)Tu(s)ds

=1

2

∫ t

t0

[BT (s)ΦT (t1, s)[Wr(t0, t1)]−1w]T

× [BT (s)ΦT (t1, s)[Wr(t0, t1)]−1w]ds

=1

2

∫ t

t0

wT [[Wr(t0, t1)]−1]TΦ(t1, s)BT (s)

×BT (s)ΦT (t1, s)[Wr(t0, t1)]−1w]ds

=1

2wT [[Wr(t0, t1)]−1]T

[∫ t

t0

Φ(t1, s)BT (s)BT (s)ΦT (t1, s)ds

]× [Wr(t0, t1)]−1w]

=1

2wT [[Wr(t0, t1)]−1]T [Wr(t0, t1)] [Wr(t0, t1)]−1w]

=1

2wT [[Wr(t0, t1)]−1]Tw].

Since [Wr(t0, t1)]T = Wr(t0, t1), it follows that [[Wr(t0, t1)]−1]T =[Wr(t0, t1)]−1 and hence

E =1

2wT [[Wr(t0, t1)]−1]Tw =

1

2wT [Wr(t0, t1)]−1w (12.12)

and this completes the proof. Observe that the previous theorem provides a characterization

of the (complete) controllability of the system (12.6). In particular,we have the following result.

Theorem 12.2 The system (12.6) is completely controllable attime t0 if and only if there is a time t1 > t0 such that Wr(t0, t1)has maximal rank n. Since rank[Wr(t0, t1)] = rank[Wc(t0, t1)], italso follows that (12.6) is completely controllable at time t0 if andonly if there is a time t1 > t0 such that Wc(t0, t1) has maximalrank n.

Page 486: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 467

For time invariant systems

x(t) = Ax(t) +Bu(t) (12.13)

one has the following result (see [7], [45], [110] and [115]).

Theorem 12.3 For the time invariant system (12.13),

Φ(t, s) = eA(t−s)

and Range(CM) = Range(Wr(0, t)) for all time t > 0, where

CM = [B,AB,A2B, . . . , An−1B].

Remark 12.2 It is important to note that for time invariant sys-tems where

Φ(t, s) = eA(t−s)

the transpose of Φ(t, s) is given by

[Φ(t, s)]T = eAT (t−s)

so thatd

dt[Φ(t, s)]T = AT eA

T (t−s) = AT [Φ(t, s)]T .

However, in general one has

d

dt[Φ(t, s)]T =

[d

dt[Φ(t, s)]

]T= [A(t)Φ(t, s)]T = [Φ(t, s)]TA(t)T .

Moreover, (in general) Φ(t, s) and A(t) do not commute so that

[Φ(t, s)]TA(t)T 6= A(t)T [Φ(t, s)]T

and one cannot imply that ddt

[Φ(t, s)]T = A(t)T [Φ(t, s)]T . For ex-ample, consider the problem with

A(t) =

[0 1−2t2

2t

]for t > 1. The state transition matrix is given by

Φ(t, s) =

[(2ts− t2

s2) ( t

2

s− t)

(2s− 2t2

s2) (2t

s− 1)

]and it is easy to show that

[Φ(t, s)]A(t) 6= A(t)[Φ(t, s)].

Page 487: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

468 Chapter 12. Linear Control Systems

Example 12.1 Consider the model of the rocket sled

d

dt

[x1(t)x2(t)

]=

[0 10 0

] [x1(t)x2(t)

]+

[0

1/m

]u(t).

A direct calculation yields

Φ(t, s) = eA(t−s) =

[1 (t− s)0 1

]and

Φ(t, s)BBT [Φ(t, s)]T =1

m2

[(t− s)2 (t− s)(t− s) 1

].

Therefore,

Wr(0, t) =

∫ t

0

Φ(t1, s)B(s)BT (s)ΦT (t1, s)ds =1

m2

[t33

t22

t22

t

]is non-singular.

Again, in the special case where the system is autonomous (i.e.time-invariant) and has the form

x(t) = Ax(t) +Bu(t) ∈ Rny(t) = Cx(t) ∈ Rp , (12.14)

one has the following results which can be found in [7], [45], [110],[115] and [119].

Theorem 12.4 Consider the linear autonomous system (12.14).(i) The autonomous linear system (12.14) is controllable if andonly if the n× nm controllability matrix

CM = [B,AB,A2B, . . . , An−1B] (12.15)

has rank n.

(ii) The linear autonomous system (12.14) is observable if andonly if the n× np observability matrix

OM = [CT , ATCT , [AT ]2CT , . . . , [AT ]n−1CT ] (12.16)

has rank n.

Page 488: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 469

The previous results point to an important dual relationshipbetween controllability and observability. Given the system (12.14)we define the dual system by

z(t) = ATz(t) + CTv(t) ∈ Rnζ(t) = BTz(t) ∈ Rm (12.17)

where CT is now an n × p matrix and BT is an m × n matrix.Note that the dual system has inputs from the output space of thesystem (12.14) and outputs from the input space of (12.14). Thus,we have the following duality result.

Theorem 12.5 The system (12.14) is controllable if and only ifthe dual system (12.17) is observable and (12.14) is observable ifand only if the dual system (12.17) is controllable.

Example 12.2 Consider the rocket control problem defined by thesystem

d

dt

[x1(t)x2(t)

]=

[0 10 0

] [x1(t)x2(t)

]+

[01/m

]u(t), (12.18)

orx(t) = Ax(t) +Bu(t), (12.19)

where the matrices A and B are defined by

A =

[0 10 0

]and B =

[01/m

],

respectively. Here, x(t) =[x1(t) x2(t)

]T, where x1(t) is the po-

sition of the sled and x1(t) is the velocity. Since n = 2, the con-trollability matrix is given by

CM = [B,AB] =

[0

1/m1/m

0

]and clearly rank(CM) = 2 so that the system is controllable. Con-sider the output given by

y(t) = Cx(t) =[

0 1] [ x1(t)

x2(t)

]= x2(t).

Page 489: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

470 Chapter 12. Linear Control Systems

Here C =[

0 1]

and hence

CT =

[01

].

Moreover,

AT =

[0 01 0

]so that the observability matrix is given by

OM = [CT , ATCT ] =

[01

01

].

Since rank(OM) = 1 the system is not observable. On the otherhand, consider the case where the the output is given by

y(t) = Cx(t) =[

1 0] [ x1(t)

x2(t)

]= x1(t).

Here C =[

1 0]

and hence

CT =

[10

]so that the observability matrix is given by

OM = [CT , ATCT ] =

[10

01

]and rank(OM) = 2. In this case the system is observable.

We see that sensing (measuring) the velocity leads to an unob-servable system while sensing the position produces an observablesystem.

Example 12.3 Consider the system

d

dt

[x1(t)x2(t)

]=

[−1 00 1

] [x1(t)x2(t)

]+

[01

]u(t), (12.20)

with output

y(t) = y(t) = Cx(t) =[

1 1] [ x1(t)

x2(t)

]= x1(t) + x2(t).

Page 490: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 471

Here,

A =

[−1 00 1

], B =

[01

]and C =

[1 1

]so that

CM = [B,AB] =

[01

01

]and

OM = [CT , ATCT ] =

[11−11

].

Since rank(CM) = 1 the system is not controllable, but sincerank(OM) = 2 the system is observable.

Stability is of primary importance in many control systems andensuring stability often is a major goal in feedback design. Werecall the basic results for stability of linear autonomous systems.

Theorem 12.6 Let A be a constant n × n matrix and supposethat all the eigenvalues λi, i = 1, 2, . . . , n of A have negative realpart. Then every solution of x(t) = Ax(t) is exponentially asymp-totically stable. In particular, there exists constants M ≥ 1 andγ > 0 such that

‖x(t)‖ ≤Me−γt ‖x(0)‖ (12.21)

for all t ≥ 0. On the other hand, if A has one eigenvalue withpositive real part, then every solution of x(t) = Ax(t) is unstable.

Since the stability of the system

x(t) = Ax(t)

is determined by the eigenvalues of A, we say that A is a stablematrix if and only if

<(λi) < 0 (12.22)

for all eigenvalues λi, i = 1, 2, . . . , n of A.As observed in the previous chapter (see Section 11.5), linear

quadratic optimal control problems yield optimal controllers thatare of the form of linear state feedback

u∗(t) = −K(t)x∗(t).

Page 491: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

472 Chapter 12. Linear Control Systems

We shall see later that in some cases the gain matrix K(t) is con-stant so that the feedback controller has the form

u∗(t) = −Kx∗(t). (12.23)

If we apply a feedback control law of the form (12.23) to the controlsystem

x(t) = Ax(t) +Bu(t),

then the closed-loop system has the form

x(t) = Ax(t)−BKx(t) = [A−BK]x(t). (12.24)

Consequently, the closed-loop system is stable if and only if thematrix [A − BK] is a stable matrix. Thus, we are interested inthe issue of finding a gain matrix such that [A − BK] is a stablematrix and this motivates the following definitions.

Definition 12.3 The control systemx(t) = Ax(t) +Bu(t) ∈ Rny(t) = Cx(t) ∈ Rp (12.25)

is stabilizable if there exists a m×n matrix K such that [A−BK]is a stable matrix.

Definition 12.4 The control systemx(t) = Ax(t) +Bu(t) ∈ Rny(t) = Cx(t) ∈ Rp (12.26)

is detectable if there exists a n × p matrix F such that [A − FC]is a stable matrix.

Observe that if one considers the dual system (12.17)z(t) = ATz(t) + CTv(t) ∈ Rnζ(t) = BTz(t) ∈ Rm , (12.27)

then the dual system is stabilizable if there is a matrix K suchthat [AT − CT K] is a stable matrix. However, since a ma-trix and its transpose have the same eigenvalues it follows that

Page 492: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 473

[AT − CT K]T = [A − KTC] is a stable matrix. Thus, if we setF = KT , then [A−FC] is a stable matrix and the system (12.26)is detectable. Likewise, we see that if the dual system (12.27) isdetectable, then the system (12.26) is stabilizable. Consequently,like controllability and observability, stabilizability and detectabil-ity are dual concepts for autonomous systems.

Consider the example above defined by the system

d

dt

[x1(t)x2(t)

]=

[−1 00 1

] [x1(t)x2(t)

]+

[01

]u(t). (12.28)

Recall that (12.28) is not controllable. However, let K =[

0 4]

and compute

[A−BK] =

[−1 00 1

]−[

01

] [0 4

]=

[−1 00 1

]−[

0 00 4

]=

[−1 00 −3

].

Since the eigenvalues of

[A−BK] =

[−1 00 −3

]are λ1 = −1 and λ2 = −3, [A − BK] is a stable matrix andhence (12.28) is stabilizable. Thus, uncontrollable systems can bestabilizable. Likewise, there are detectable systems that are notobservable so that stabilizability and detectability are weaker con-ditions than controllability and observability.

12.2 Linear Control Systems Arising

from Nonlinear Problems

Although linear control systems occur naturally in many applica-tions, linear systems also play a central role in control problemsgoverned by nonlinear systems. In some cases, linearization offersthe only practical approach to the analysis and design of controlsystems. Also, of immense importance is the issue of robustnessand sensitivity of optimal controllers with respect to parametersand disturbances. We briefly discuss these processes below.

Page 493: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

474 Chapter 12. Linear Control Systems

12.2.1 Linearized Systems

Consider the case where u(·) ∈ PWC(t0, t1) and x(·) ∈PWS(t0, t1) is the response to the nonlinear control system

(NS)d

dtx(t) = f(x(t), u(t)), (12.29)

with initial conditionx(t0) = x0.

We call u(·) the nominal control and x(·) the nominal trajectory.In many cases the nominal control could be an optimal controller

for a given problem, but for the moment we simply assume it is agiven control. Let

u(t) = u(t) + v(t)

where we assume that v(t) is “small” and let

x(t) = x(t) + z(t)

be the response to the control u(t) = u(t) + v(t) with initialcondition

x(t0) = x(t0) + z(t0) = x0 + z0.

Assuming that both v(t) and z0 are small, one would expectthat z(t) is small. Applying Taylor’s theorem we have

d

dtx(t) =

d

dtx(t) +

d

dtz(t) = f(x(t) + z(t), u(t) + v(t))

= f(x(t), u(t)) + [Jxf(x(t), u(t))]z(t)

+ [Juf(x(t), u(t))]v(t) +HOT (t).

Here, [Jxf(x(t), u(t))] and [Juf(x(t), u(t))] are the Jacobian ma-trices and the higher order terms HOT (t) are “small” with respectto the control v(t) and response z(t). Hence,

d

dtx(t) +

d

dtz(t) = f(x(t), u(t)) + [Jxf(x(t), u(t))]z(t)

+ [Juf(x(t), u(t))]v(t) +HOT (t),

Page 494: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 475

and sinced

dtx(t) = f(x(t), u(t)),

it follows that

d

dtz(t) = [Jxf(x(t), u(t))]z(t) + [Juf(x(t), u(t))]v(t)

+HOT (t).

Therefore, the “variation” z(t) is an approximate solution tothe linear system

d

dtz(t) = [Jxf(x(t), u(t))]z(t) + [Juf(x(t), u(t))]v(t)

with input v(t). Defining

A(t) , [Jxf(x(t), u(t))] and B(t) , [Juf(x(t), u(t))]

leads to the linearized system

z(t) = A(t)z(t) +B(t)v(t). (12.30)

The theory of ordinary differential equations can be used to pro-vide a rigorous basis of the linearization (see [58], [138] and [153]).

In the special case where u(t) ≡ u and x(t) ≡ x are constants,then

A , [Jxf(x, u)] and B , [Juf(x, u)]

are time independent and the linearized system (12.30) is an au-tonomous system

z(t) = Az(t) +Bv(t). (12.31)

12.2.2 Sensitivity Systems

Assume the control is parameterized, say u(t) = u(t, q) whereq ∈ Rp. For a fixed parameter q let the nominal control be definedby u(t) = u(t, q) with nominal trajectory x(t) = x(t, q). The rawsensitivity of x(t, q) at q = q is defined by the partial derivative

∂x(t, q)

∂q= ∂qx(t, q) = xq(t, q)|q=q = xq(t, q).

Page 495: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

476 Chapter 12. Linear Control Systems

Observe that for a fixed t, the raw sensitivity is nothing more thanthe derivative of the nonlinear mapping G(t) : Rp −→ Rn definedby

G(t)q , x(t, q).

Hence, for each t and q the derivative ∂qx(t, q) is a linear operator∂qx(t, q) : Rp −→ Rn which has the matrix representation

Jqx(t, q) =

∂x1(t,q)∂q1

∂x1(t,q)∂q2

· · · ∂x1(t,q)∂qp

∂x2(t,q)∂q1

∂x2(t,q)∂q2

· · · ∂x2(t,q)∂qp

......

. . ....

∂xn(t,q)∂q1

∂xn(t,q)∂q2

· · · ∂xn(t,q)∂qp

. (12.32)

We are interested in computing the raw sensitivity as a functionof time.

Given a parameter q ∈ Rp and the control u(t) = u(t, q), thenthe corresponding trajectory x(t) = x(t, q) satisfies the nonlinearsystem (12.29) which explicitly has the form

x(t, q) =d

dtx(t, q) = f(x(t, q),u(t, q)).

It is important to note that x(t, q) is a function of two variables tand q so that when we write d

dtx(t, q) or x(t, q) we mean

d

dtx(t, q) =

∂tx(t, q)

so that∂

∂tx(t, q) = f(x(t, q),u(t, q)). (12.33)

Consequently, taking the partial derivatives of both sides of (12.33)with respect to q yields

∂q

[∂

∂tx(t, q)

]=

∂q[f(x(t, q),u(t, q))]

Page 496: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 477

and applying the chain rule we obtain

∂q

[∂

∂tx(t, q)

]=

∂q[f(x(t, q),u(t, q))]

=∂

∂xf(x(t, q),u(t, q)) ∂

∂qx(t, q)

+∂

∂uf(x(t, q),u(t, q)) ∂

∂qu(t, q),

where “” denotes function composition. Assuming that x(t, q)is smooth in both parameters, we can interchange the order ofintegration to produce

∂t

[∂

∂qx(t, q)

]=

∂q[f(x(t, q),u(t, q))]

=∂

∂xf(x(t, q),u(t, q)) ∂

∂qx(t, q)

+∂

∂uf(x(t, q),u(t, q)) ∂

∂qu(t, q).

In terms of the matrix representations of the operators, we havethat

∂t[Jqx(t, q)] = [Jxf(x(t, q),u(t, q))] [Jqx(t, q)]

+ [Juf(x(t, q),u(t, q))] [Jqu(t, q)] .

Again, we define

A(t) , Jxf(x(t, q),u(t, q)) and B(t) , Juf(x(t, q),u(t, q))

and lets(t, q) , Jqx(t, q).

It now follows that the sensitivity (matrix) s(t, q) satisfies thelinear matrix equation

∂t[s(t, q)] = A(t)s(t, q) + B(t) [Jqu(t, q)] , (12.34)

where Jqu(t, q) is the Jacobian matrix for u(t, q).

Page 497: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

478 Chapter 12. Linear Control Systems

Therefore, the linear matrix differential equation (12.34) de-fines the dynamics of the time evolution of the sensitivitiess(t, q) , Jqx(t, q) at q = q. Equation (12.34) is called the Con-tinuous Sensitivity Equation. If we introduce the additionalnotation

S(t) , s(t, q) = Jqx(t, q)

andV (t) , Jqu(t, q),

then the continuous sensitivity equation has the form

S(t) = A(t)S(t) + B(t)V (t). (12.35)

Observe that if the initial state x(t0) = x0 does not depend onq, then the initial condition for the continuous sensitivity equation(12.35) is given by

S(t0) = 0.

However, if x(t0) = x0(q) also depends on the parameter q, thenthe appropriate initial condition for the matrix sensitivity equation(12.35) is given by

S(t0) = Jqx0(q). (12.36)

Even in the special case where x(t) = x(t, q) is a scalar, thesensitivity s(t, q) is a vector,

s(t, q) , Jqx(t, q) = ∇qx(t, q)

and the sensitivity system has the form

∂t[∇qx(t, q)] = A(t)[∇qx(t, q)] + B(t) [∇qu(t, q)] .

12.3 Linear Quadratic Optimal Con-

trol

In the previous sections we saw where linear control systems occurnaturally as the basic model, from linearization of a nonlinearsystem and as the sensitivity equations for a nonlinear problem. In

Page 498: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 479

many cases these systems can be time dependent (nonautonomous)even if the nonlinear system is autonomous. In this section wefocus on the LQ optimal control problem with time varying matrixcoefficients and a terminal cost. Thus, we consider nonautonomousproblems of Bolza type. Most of the following material can befound in the standard books [6], [5], [16], [45], [71] and [73], [81],[115] and [187].

We begin with the LQ optimal control problem defined on afixed time interval [t0, t1] which is governed by the system

x(t) = A(t)x(t) +B(t)u(t), x(t0) = x0 (12.37)

with cost function of Bolza type

J(u(·),x0) = 〈Sx(t1),x(t1)〉

+

∫ t1

t0

[〈Q(s)x(s),x(s)〉+ 〈R(s)u(s),u(s)〉]ds.

(12.38)

Here, we assume that S = ST ≥ 0, Q(t) = Q(t)T ≥ 0 and R(t) =R(t)T ≥ αIm×m > 0 for some α > 0.

The results in Section 11.8 can be extended to this Bolza prob-lem. In particular, a direct combination of the Maximum Princi-ple for the Problem of Bolza (Theorem 11.3) and the MaximumPrinciple for nonautonomous systems (Theorem 11.4) yields thefollowing result.

Theorem 12.7 If an optimal control exists, then the optimalityconditions are given by

ddtx∗(t) = A(t)x∗(t) + [B(t)][R(t)]−1[B(t)]Tλ∗(t), x∗(t0) = x0,

ddtλ∗(t) = Q(t)x∗(t)− [A(t)]Tλ∗(t), λ∗(t1) = −Sx∗(t1),

(12.39)where the optimal control is defined by

u∗(t) = R−1(t)BT (t)λ∗(t). (12.40)

We can write the optimality system as

d

dt

[x∗(t)λ∗(t)

]=

[A(t) [B(t)][R(t)]−1[B(t)]T

Q(t) −[A(t)]T

] [x∗(t)λ∗(t)

],

(12.41)

Page 499: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

480 Chapter 12. Linear Control Systems

with boundary conditions

x∗(t0) =[In×n 0n×n

] [ x∗(t0)λ∗(t0)

]= x0, (12.42)

and

λ∗(t1) =[−S 0n×n

] [ x∗(t1)λ∗(t1)

]. (12.43)

Thus, if one solves the two point boundary value problem (12.41)- (12.43), the optimal control is defined by (12.40).

We will show that the optimal control has the form

u∗(t) = −K(t)x∗(t), (12.44)

where the gain matrix K(t) is given by

K(t) = R(t)−1B(t)TP (t) (12.45)

and P (t) is a non-negative definite solution to the Riccati differ-ential equation (RDE)

− P (t) = P (t)A(t)+A(t)TP (t)−P (t)B(t)R(t)−1B(t)TP (t)+Q(t)(12.46)

with final dataP (t1) = S. (12.47)

In addition, the optimal cost is given by

J(u(·),x0) = 〈P (t0)x(t0),x(t0)〉 . (12.48)

In order to establish this result, we follow the basic approach firstused in Sections 11.5 and 11.8.

12.4 The Riccati Differential Equation

for a Problem of Bolza

Here we consider the Bolza type LQ optimal control problem forthe general nonautonomous system discussed in the previous sec-tion. We extend the results on Riccati equations presented in Sec-tion 11.5.2 to this case. Again, the starting point is the optimality

Page 500: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 481

system defined by the two point boundary value problem (12.41) -(12.43) and the goal is to show that one can transform the “statevariable” to the “adjoint variable” by a matrix that satisfies aRiccati differential equation.

First write (12.41) as the linear system

d

dt

[x(t)λ(t)

]=

[A(t) B(t)[R(t)]−1[B(t)]T

Q(t) −[A(t)]T

] [x(t)λ(t)

], H(t)

[x(t)λ(t)

](12.49)

where H(t) is the 2n× 2n matrix

H(t) =

[A(t) B(t)[R(t)]−1[B(t)]T

Q(t) −[A(t)]T

]. (12.50)

Let Ψ(t, τ) be the fundamental matrix for H(t). In particular,Ψ(t, τ) satisfies the matrix differential equation

d

dtΨ(t, τ) = H(t)Ψ(t, τ), (12.51)

with initial conditions

Ψ(τ, τ) = I2n×2n. (12.52)

The solution to (12.41) - (12.43) has the form[x(t)λ(t)

]= Ψ(t, t1)

[x(t1)λ(t1)

]= Ψ(t, t1)

[x(t1)−Sx(t1)

],

since λ(t1) = −Sx(t1). Setting

Ψ(t, τ) =

[ψ11(t, τ) ψ12(t, τ)ψ21(t, τ) ψ22(t, τ)

],

it follows that[x(t)λ(t)

]= Ψ(t, t1)

[x(t1)−Sx(t1)

]=

[ψ11(t, t1) ψ12(t, t1)ψ21(t, t1) ψ22(t, t1)

] [x(t1)−Sx(t1)

]

Page 501: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

482 Chapter 12. Linear Control Systems

so that

x(t) = ψ11(t, t1)x(t1)− ψ12(t, t1)Sx(t1)

= [ψ11(t, t1)− ψ12(t, t1)S]x(t1), (12.53)

and

λ(t) = ψ21(t− t1)x(t1)− ψ22(t, t1)Sx(t1)

= [ψ21(t− t1)− ψ22(t, t1)S]x(t1). (12.54)

If [ψ11(t, t1) − ψ12(t, t1)S] is non-singular for 0 ≤ t ≤ t1, thenwe can solve (12.53) for x(t1). In particular,

x(t1) = [ψ11(t, t1)− ψ12(t, t1)S]−1x(t)

which, when substituted into (12.54), yields

λ(t) = [ψ21(t− t1)− ψ22(t, t1)S][ψ11(t, t1)− ψ12(t, t1)S]−1x(t).

If P (t) is the n× n matrix defined by

P (t) , −[ψ21(t− t1)− ψ22(t, t1)S][ψ11(t, t1)− ψ12(t, t1)S]−1,(12.55)

then we have that λ(t) and x(t) are linearly related by the matrixP (t) and the relationship is given by

λ(t) = −P (t)x(t).

The choice of the negative sign in defining P (·) is made to beconsistent with much of the existing literature. In order to makethis step rigorous, one needs to prove that [ψ11(t, t1)−ψ12(t, t1)S]is non-singular for t0 ≤ t ≤ t1. On the other hand, we could simplyask the question:

Is there a matrix P (t) so that λ(t) = −P (t)x(t) andhow can P (t) be computed?

We will address the issue of the existence of P (t) later. How-ever, assume for the moment that x(·) and λ(t) satisfying (12.41)- (12.43) and

λ(t) = −P (t)x(t), (12.56)

Page 502: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 483

with P (t) differentiable. Differentiating the equation (12.56) oneobtains

d

dtλ(t) = −

[d

dtP (t)

]x(t)− P (t)

[d

dtx(t)

]= −

[d

dtP (t)

]x(t)

− P (t)[A(t)x(t) +B(t)[R(t)]−1[B(t)]Tλ(t)

]= −

[d

dtP (t)

]x(t)

− P (t)[A(t)x(t)−B(t)[R(t)]−1[B(t)]TP (t)x(t)

]= −

[d

dtP (t)

]x(t)

− P (t)A(t)x(t) + P (t)B(t)[R(t)]−1[B(t)]TP (t)x(t).

However, from (12.49) we have

d

dtλ(t) = Q(t)x(t)− [A(t)]Tλ(t)

= Q(t)x(t) + [A(t)]TP (t)x(t)

so that

Q(t)x(t) + [A(t)]TP (t)x(t) = −[d

dtP (t)

]x(t)− P (t)A(t)x(t)

+ P (t)B(t)[R(t)]−1[B(t)]TP (t)x(t).

Rearranging the terms we have

−[d

dtP (t)

]x(t) = [A(t)]TP (t)x(t) + P (t)A(t)x(t) (12.57)

− P (t)B(t)[R(t)]−1[B(t)]TP (t)x(t) +Q(t)x(t).

Consequently, P (t) satisfies (12.57) along the trajectory x(t). Ob-serve that (12.57) is satisfied for any solution of the system (12.49)with λ(t1) = −Sx(t1) and all values of x(t1). Therefore, if

λ(t) = −P (t)x(t),

Page 503: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

484 Chapter 12. Linear Control Systems

then P (t) satisfies the matrix Riccati differential equation

− P (t) = ATP (t) + P (t)A− P (t)BR−1BTP (t) +Q, t0 ≤ t < t1,(12.58)

with terminal condition

P (t1) = S, (12.59)

since−P (t1)x(t1) = λ(t1) = −Sx(t1)

and x(t1) can be any vector in Rn.We shall show below that under the assumption that there is a

solution P (t) to the Riccati differential equation (12.58) satisfying(12.59), then the LQ optimal control problem has a solution andthe optimal control is given by

u∗(t) = −[R(t)]−1[B(t)]TP (t)x∗(t). (12.60)

In order to provide a rigorous treatment of this problem, wepresent two lemmas. These results relate the existence of a solu-tion to the Riccati equation (12.58) to the existence of an optimalcontrol for the LQ optimal control problem. First we note that anysolution to the Riccati differential equation must be symmetric. Inparticular, P (t) = [P (t)]T for all t.

Lemma 12.1 Suppose that P (t) = [P (t)]T is any n × n matrixfunction with P (t) differentiable on the interval [t0, t1]. If u(·) ∈PWC(t0, t1;Rm) and

x(t) = A(t)x(t) +B(t)u(t), t0 ≤ t ≤ t1,

then

〈P (s)x(s),x(s)〉 |t1t0

=

∫ t1

t0

⟨[P (s) + P (s)A(s) + [A(s)]TP (s)

]x(s),x(s)

⟩ds

(12.61)

+

∫ t1

t0

〈P (s)B(s)u(s),x(s)〉 ds (12.62)

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds

Page 504: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 485

Proof: Observe that

〈P (s)x(s),x(s)〉 |t1t0 =

∫ t1

t0

d

ds〈P (s)x(s),x(s)〉 ds

=

∫ t1

t0

⟨P (s)x(s),x(s)

⟩ds

+

∫ t1

t0

〈P (s)x(s),x(s)〉 ds

+

∫ t1

t0

〈P (s)x(s), x(s)〉 ds

and by substituting Ax(s) +Bu(s) for x(s) we obtain

〈P (s)x(s),x(s)〉 |t1t0 =

∫ t1

t0

⟨P (s)x(s),x(s)

⟩ds

+

∫ t1

t0

〈P (s) [Ax(s) +Bu(s)] ,x(s)〉 ds

+

∫ t1

t0

〈P (s)x(s), [Ax(s) +Bu(s)]〉 ds.

Simplifying this expression we obtain

〈P (s)x(s),x(s)〉 |t1t0 =

∫ t1

t0

⟨P (s)x(s),x(s)

⟩ds

+

∫ t1

t0

〈P (s) [A(s)x(s)] ,x(s)〉 ds

+

∫ t1

t0

〈P (s) [B(s)u(s)] ,x(s)〉 ds

+

∫ t1

t0

〈P (s)x(s), [A(s)x(s)]〉 ds

+

∫ t1

t0

〈P (s)x(s), [B(s)u(s)]〉 ds

Page 505: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

486 Chapter 12. Linear Control Systems

and collecting terms yields

〈P (s)x(s),x(s)〉 |t1t0

=

∫ t1

t0

⟨[P (s) + P (s)A(s) + [A(s)]TP (s)]x(s),x(s)

⟩ds

+

∫ t1

t0

〈P (s)B(s)u(s),x(s)〉 ds

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds

which establishes (12.62).

Lemma 12.2 Assume that the Riccati differential equation(12.58) has a solution P (t) = [P (t)]T for t0 ≤ t < t1 andP (t1) = 0n×n. If u(·) ∈ PWC(t0, t1;Rm) and

x(t) = A(t)x(t) +B(t)u(t), t0 ≤ t ≤ t1,

then the cost function J(u(·)) = 〈Sx(t1),x(t1)〉 +∫ t1t0

[〈Q(s)x(s),x(s)〉+ 〈R(s)u(s),u(s)〉]ds has the representation

J(u(·)) =

∫ t1

t0

∥∥[R(s)]1/2u(s) + [R(s)]−1/2[B(s)]TP (s)x(s)∥∥2ds

+ 〈P (t0)x(t0),x(t0)〉 .Proof: Let

N(x(·),u(·))=

∫ t1

t0

∥∥[R(s)]1/2u(s)+[R(s)]−1/2[B(s)]TP (s)x(s)∥∥2ds

and expanding N(x(·),u(·)) we obtain

N(x(·),u(·)) =

∫ t1

t0

⟨[R(s)]1/2u(s), [R(s)]1/2u(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]1/2u(s), [R(s)]−1/2[B(s)]TP (s)x(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]−1/2[B(s)]TP (s)x(s), [R(s)]1/2u(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]−1/2[B(s)]TP (s)x(s), [R(s)]−1/2

× [B(s)]TP (s)x(s)⟩ds.

Page 506: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 487

Simplifying each term we have

N(x(·),u(·)) =∫ t1

t0

⟨[R(s)]1/2[R(s)]1/2u(s),u(s)

⟩ds

+

∫ t1

t0

⟨u(s), [R(s)]1/2[R(s)]−1/2[B(s)]TP (s)x(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]1/2[R(s)]−1/2[B(s)]TP (s)x(s),u(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]−1/2[R(s)]−1/2[B(s)]TP (s)x(s), [B(s)]TP (s)x(s)

⟩ds,

which implies

N(x(·),u(·)) =

∫ t1

t0

〈R(s)u(s),u(s)〉 ds

+

∫ t1

t0

⟨u(s), [B(s)]TP (s)x(s)

⟩ds

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds

+

∫ t1

t0

⟨[R(s)]−1[B(s)]TP (s)x(s), [B(s)]TP (s)x(s)

⟩ds,

or equivalently

N(x(·),u(·)) =

∫ t1

t0

〈R(s)u(s),u(s)〉

+

∫ t1

t0

⟨u(s), [B(s)]TP (s)x(s)

⟩ds (12.63)

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds

+

∫ t1

t0

⟨P (s)B(s)[R(s)]−1[B(s)]TP (s)x(s),x(s)

⟩ds.

Since the matrix P (s) satisfies the Riccati equation (12.58), it

Page 507: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

488 Chapter 12. Linear Control Systems

follows that

P (s)B(s)[R(s)]−1[B(s)]TP (s)x(s)

=[P (s) + [A(s)]TP (s) + P (s)A(s) +Q(s)

]x(s)

and the last term above becomes∫ t1

t0

⟨[P (s) + [A(s)]TP (s) + P (s)A(s) +Q(s)

]x(s),x(s)

⟩ds.

Substituting this expression into (11.131) and rearranging yields

N(x(·),u(·)) =∫ t1

t0

〈R(s)u(s),u(s)〉 ds+

∫ t1

t0

〈Q(s)x(s),x(s)〉 ds

+

∫ t1

t0

⟨ [P (s) + [A(s)]TP (s) + P (s)A(s)

]x(s),x(s)

⟩ds

+

∫ t1

t0

⟨u(s), [B(s)]TP (s)x(s)

⟩ds

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds,

which implies

N(x(·),u(·)) =

J(u(·))− 〈Sx(t1),x(t1)〉

+

∫ t1

t0

⟨ [P (s) + [A(s)]TP (s) + P (s)A(s)

]x(s),x(s)

⟩ds

+

∫ t1

t0

⟨u(s), [B(s)]TP (s)x(s)

⟩ds

+

∫ t1

t0

⟨[B(s)]TP (s)x(s),u(s)

⟩ds.

Applying (12.62) from the previous Lemma yields

N(x(·),u(·)) = J(u(·))− 〈Sx(t1),x(t1)〉+ 〈P (s)x(s),x(s)〉 |t1t0= J(u(·))− 〈Sx(t1),x(t1)〉+ 〈P (t1)x(t1),x(t1)〉− 〈P (t0)x(t0),x(t0)〉 ,

Page 508: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

Chapter 12. Linear Control Systems 489

or equivalently

J(u(·)) = N(x(·),u(·)) + 〈Sx(t1),x(t1)〉 − 〈P (t1)x(t1),x(t1)〉+ 〈P (t0)x(t0),x(t0)〉 .

Since (12.59) holds, we have that P (t1) = S and hence

J(u(·)) = N(x(·),u(·)) + 〈Sx(t1),x(t1)〉 − 〈Sx(t1),x(t1)〉+ 〈P (t0)x(t0),x(t0)〉

= N(x(·),u(·)) + 〈P (t0)x(t0),x(t0)〉 .

We conclude that

J(u(·)) =

∫ t1

t0

∥∥R1/2u(s) +R−1/2BTP (s)x(s)∥∥2ds

+ 〈P (t0)x(t0),x(t0)〉 (12.64)

which completes the proof. We now have the fundamental result on the relationship be-

tween solutions to the Riccati equation and the existence of anoptimal control for the LQ optimal problem.

Theorem 12.8 (Existence of LQ Optimal Control) If theRiccati differential equation (12.58) has a solution P (t) = [P (t)]T

for t0 ≤ t < t1 and P (t1) = S, then there is a control u∗(·) ∈PWC(t0, t1;Rm) such that u∗(·) minimizes

J(u(·)) = 〈Sx(t1),x(t1)〉

+

∫ t1

t0

〈Q(s)x(s),x(s)〉+ 〈R(s)u(s),u(s)〉ds

on the set PWC(t0, t1;Rm). In addition, the optimal control is alinear feedback law

u∗(t) = −[R(t)]−1[B(t)]TP (t)x∗(t) (12.65)

and the minimum value of J(·) is

J(u∗(·)) = 〈P (t0)x(t0),x(t0)〉 . (12.66)

Page 509: Introduction to the Calculus of Variations and Control with Modern Applications

ii

“K16538” — 2013/7/25 — 10:41 — ii

ii

ii

490 Chapter 12. Linear Control Systems

Proof : The identity (12.64) above implies that J(·) is mini-mized when the quadratic term∫ t1

t0

∥∥[R(s)]1/2u(s) + [R(s)]−1/2[B(s)]TP (s)x(s)∥∥2ds ≥ 0

is minimized. If u∗(s) = −[R(s)]−1[B(s)]TP (s)x∗(s), then

[R(s)]^{1/2}u∗(s) + [R(s)]^{-1/2}[B(s)]^T P(s)x∗(s) = 0

and

J(u∗(·)) = ∫_{t0}^{t1} ‖[R(s)]^{1/2}u∗(s) + [R(s)]^{-1/2}[B(s)]^T P(s)x∗(s)‖² ds
         + ⟨P(t0)x∗(t0), x∗(t0)⟩
         = ⟨P(t0)x∗(t0), x∗(t0)⟩.

Consequently, if u(·) ∈ PWC(t0, t1;Rm), then

J(u∗(·)) = ⟨P(t0)x0, x0⟩ ≤ ⟨P(t0)x0, x0⟩
         + ∫_{t0}^{t1} ‖[R(s)]^{1/2}u(s) + [R(s)]^{-1/2}[B(s)]^T P(s)x(s)‖² ds
         = J(u(·)),

which completes the proof.

The important observation here is that the optimal control is linear state feedback with gain operator K(t) defined by (12.45). Thus, to implement the control law one needs the full state x∗(t). However, when the full state is not available one must develop a method to estimate those states that cannot be sensed. This leads to the construction of state estimators or so-called observers.
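To make the construction concrete, the following sketch (Python with NumPy/SciPy; the constant matrices A, B, Q, R, S and the horizon below are hypothetical stand-ins for the time-varying data of this section) integrates the Riccati equation (12.58) backward from P(t1) = S and evaluates the feedback gain K(t) = [R(t)]^{-1}[B(t)]^T P(t):

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical time-invariant data (2-state, 1-input example).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)          # Q = D^T D >= 0
R = np.array([[1.0]])  # R = R^T > 0
S = np.zeros((2, 2))   # terminal weight, P(t1) = S
t0, t1 = 0.0, 2.0

def riccati_rhs(t, p):
    # Riccati ODE (12.58): Pdot = -A^T P - P A + P B R^{-1} B^T P - Q.
    P = p.reshape(2, 2)
    Pdot = -A.T @ P - P @ A + P @ B @ np.linalg.solve(R, B.T) @ P - Q
    return Pdot.ravel()

# Integrate backward from t1 to t0 with terminal condition P(t1) = S.
sol = solve_ivp(riccati_rhs, (t1, t0), S.ravel(), dense_output=True)

def gain(t):
    # Feedback gain K(t) = R^{-1} B^T P(t), as in (12.65).
    P = sol.sol(t).reshape(2, 2)
    return np.linalg.solve(R, B.T @ P)

print(gain(t0))  # optimal gain at the initial time

For genuinely time-varying A(t), B(t), Q(t), R(t), the same scheme applies with the matrices evaluated inside riccati_rhs.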

12.5 Estimation and Observers

Assume that the system is described by the differential equation

ẋ(t) = A(t)x(t) + B(t)u(t), x(t0) = x0 ∈ Rn, t ∈ [t0, t1], (12.67)


with sensed or measured output

y(t) = C(t)x(t) ∈ Rp. (12.68)

If one has a feedback control law of the form

u(t) = −K(t)x(t) (12.69)

for some control gain matrix K(t), then in order to compute the control u(t) at a time t one must have the full state x(t) at time t. Since we can only sense y(t) and the control law requires x(t), we are led to the problem of finding a method to estimate x(t) from the data y(t). In particular, we look for a system of the form

ẋe(t) = A(t)xe(t) + F(t)[y(t) − C(t)xe(t)] + B(t)u(t), xe(t0) = xe,0 ∈ Rn, (12.70)

such that the solution xe(t) provides an estimate of the state x(t) on the interval of interest. Thus, we seek an observer gain matrix F(t) so that the system (12.70), which is driven by the measured (sensed) output y(t), produces values xe(t) that are as close to x(t) as possible. In this case, one replaces the full state feedback law (12.69) with

u(t) = ue(t) = −K(t)xe(t). (12.71)

In order for this to work and to be practical, one needs to know that xe(t) ≈ x(t) and that the equation (12.70) can be solved in real time. Under suitable conditions we shall say that the system (12.70) is an observer (or state estimator) for the system (12.67)-(12.68).

The selection of the system (12.70) as the form of an observer is not arbitrary, but follows naturally. A “good” observer should have the same structure as the system and be driven by the measured output y(t) = C(t)x(t). Moreover, we require that if the initial data for the estimator is exactly the initial data for the system, then the estimated state should equal the actual state. Therefore, if xe(t0) = xe,0 = x0 = x(t0), then xe(t) = x(t) for all t ≥ t0. Since the system (12.67)-(12.68) is linear, it is natural to think that the


observer equation should also be linear, so that an observer would have the form

ẋe(t) = Ae(t)xe(t) + F(t)y(t) + G(t)u(t), xe(t0) = xe,0 ∈ Rn, (12.72)

for some matrices Ae(t), F(t) and G(t). The corresponding error e(t) ≜ xe(t) − x(t) would satisfy the non-homogeneous linear differential equation

ė(t) = ẋe(t) − ẋ(t)
     = Ae(t)xe(t) + F(t)y(t) + G(t)u(t) − A(t)x(t) − B(t)u(t)
     = Ae(t)xe(t) − Ae(t)x(t) + Ae(t)x(t) + F(t)y(t) − A(t)x(t) + [G(t) − B(t)]u(t)
     = Ae(t)[xe(t) − x(t)] + Ae(t)x(t) + F(t)y(t) − A(t)x(t) + [G(t) − B(t)]u(t)
     = Ae(t)e(t) + Ae(t)x(t) + F(t)C(t)x(t) − A(t)x(t) + [G(t) − B(t)]u(t)
     = Ae(t)e(t) + [Ae(t) + F(t)C(t) − A(t)]x(t) + [G(t) − B(t)]u(t)

with initial data given by the error vector

e(t0) = xe(t0)− x(t0).

The requirement that if e(t0) = xe(t0) − x(t0) = 0, then e(t) = xe(t) − x(t) ≡ 0 for all states x(t) and controls u(t) implies that the matrices [Ae(t) + F(t)C(t) − A(t)] and [G(t) − B(t)] should be zero. Thus,

[Ae(t) + F (t)C(t)− A(t)] = 0

and

[G(t) − B(t)] = 0,

which implies that

Ae(t) = A(t)− F (t)C(t)

and

G(t) = B(t),


respectively. Therefore, the observer (12.72) must have the form

ẋe(t) = Ae(t)xe(t) + F(t)y(t) + G(t)u(t)
      = [A(t) − F(t)C(t)]xe(t) + F(t)y(t) + B(t)u(t)
      = A(t)xe(t) + F(t)[y(t) − C(t)xe(t)] + B(t)u(t),

which is the form of the observer (12.70).

Let e(t) ≜ xe(t) − x(t) denote the error vector and note that

ė(t) = ẋe(t) − ẋ(t)
     = [A(t)xe(t) + F(t)[y(t) − C(t)xe(t)] + B(t)u(t)] − [A(t)x(t) + B(t)u(t)]
     = A(t)[xe(t) − x(t)] + F(t)C(t)x(t) − F(t)C(t)xe(t)
     = [A(t) − F(t)C(t)][xe(t) − x(t)]
     = [A(t) − F(t)C(t)]e(t).

Hence, the error satisfies the linear differential equation

ė(t) = [A(t) − F(t)C(t)]e(t) (12.73)

with initial data

e(t0) = xe(t0)− x(t0) = xe,0 − x0. (12.74)

Since the error equation (12.73) is linear and homogeneous, if the error in the initial estimate xe,0 of x0 is zero (i.e. xe,0 = x0), then

e(t) = xe(t)− x(t) ≡ 0

and the estimate of x(t) is exact as desired. Moreover, it follows that

e(t) = ΨF (t, t0)e(t0),

where ΨF(t, τ) is the fundamental matrix operator for (12.73). In particular, ΨF(t, τ) satisfies the matrix differential equation

(d/dt) ΨF(t, τ) = [A(t) − F(t)C(t)]ΨF(t, τ)


with initial condition

ΨF (τ, τ) = In×n.

This leads to a bound on the error of the form

‖e(t)‖ = ‖ΨF (t, t0)e(t0)‖ ≤ ‖ΨF (t, t0)‖ ‖e(t0)‖ (12.75)

for t0 ≤ t ≤ t1. The selection of the observer gain F(t) determines this error, and hence one approach might be to select F(t) so that

sup_{t0≤t≤t1} ‖ΨF(t, t0)‖

is minimized. We shall return to this issue later.

There are many approaches to “designing” a good observer, but we focus on the Luenberger Observer and the Kalman Filter.
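As a numerical illustration of this criterion (not a method from the text), the sketch below (Python; the matrices A, C and the candidate gain F are hypothetical) integrates (d/dt)ΨF(t, t0) = [A − FC]ΨF(t, t0) and approximates sup_{t0≤t≤t1} ‖ΨF(t, t0)‖ on a time grid:

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical time-invariant data and a candidate observer gain.
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.5], [0.5]])
t0, t1 = 0.0, 5.0

def rhs(t, psi):
    # d/dt Psi_F = [A - F C] Psi_F, with Psi_F(t0, t0) = I.
    Psi = psi.reshape(2, 2)
    return ((A - F @ C) @ Psi).ravel()

sol = solve_ivp(rhs, (t0, t1), np.eye(2).ravel(), dense_output=True)
ts = np.linspace(t0, t1, 200)
sup_norm = max(np.linalg.norm(sol.sol(t).reshape(2, 2), 2) for t in ts)
print(sup_norm)  # approximates sup ||Psi_F(t, t0)|| over [t0, t1]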

12.5.1 The Luenberger Observer

As noted above, many optimal controllers require full state feedback which could be implemented if all of x(t) were available. However, in most complex systems the entire state is not available through sensing and hence such control laws cannot be implemented in practice. Thus, one must either consider an alternate form of the control or else devise a method for approximating the state x(t) using only knowledge about the system dynamics and sensed measurements. In 1964 Luenberger first considered this problem from a deterministic point of view (see [130], [132] and [129]). Although Luenberger considered only time invariant systems, the main results can be modified and extended to nonautonomous and even non-linear systems.

One of the basic results concerns the question of observing an output to the uncontrolled system given by

ẋ(t) = A(t)x(t), x(t0) = x0 (12.76)

using only sensed measurements

y(t) = C(t)x(t), (12.77)


where A(t) and C(t) are given n × n and p × n matrix functions, respectively. The idea is to use the output y(t) to drive another linear system

ż(t) = Ae(t)z(t) + F(t)y(t), z(t0) = z0, (12.78)

so that one matches a desired output

r(t) = T (t)x(t), (12.79)

where T(t) is a given q × n matrix function. Observe that if Ae(t) is an l × l matrix, then F(t) must be an l × p matrix since there are p measured outputs. Moreover, since

F (t)y(t) = F (t)C(t)x(t)

then H(t) ≜ F(t)C(t) must be an l × n matrix. The goal is to find Ae(t) and F(t) so that if z(t0) = r(t0) = T(t0)x(t0), then z(t) = r(t) for all t > t0. Clearly this implies that l = q, so that the size of the system (12.78) is q. The following result is due to Luenberger and may be found in [132].

Theorem 12.9 Let x(t) be a solution to the system (12.76) which drives the system (12.78). Suppose the matrix function T(t) is differentiable and there exist Ae(t) and F(t) such that

Ṫ(t) + [T(t)A(t) − Ae(t)T(t)] = H(t) = F(t)C(t) (12.80)

on the interval t0 < t < t1. If z(t0) = r(t0) = T(t0)x(t0), then z(t) = r(t) = T(t)x(t) for all t0 < t < t1.

Proof: Let the error vector be defined by e(t) = z(t) − r(t) = z(t) − T(t)x(t) and note that

ė(t) = ż(t) − (d/dt)[T(t)x(t)] = ż(t) − Ṫ(t)x(t) − T(t)ẋ(t)
     = Ae(t)z(t) + F(t)y(t) − Ṫ(t)x(t) − T(t)[A(t)x(t)]
     = Ae(t)z(t) + F(t)C(t)x(t) − Ṫ(t)x(t) − T(t)A(t)x(t)
     = Ae(t)z(t) + H(t)x(t) − Ṫ(t)x(t) − T(t)A(t)x(t).


However, since H(t) = Ṫ(t) + T(t)A(t) − Ae(t)T(t), it follows that

ė(t) = Ae(t)z(t) + [Ṫ(t) + T(t)A(t) − Ae(t)T(t)]x(t) − Ṫ(t)x(t) − T(t)A(t)x(t)
     = Ae(t)z(t) − Ae(t)T(t)x(t)
     = Ae(t)[z(t) − T(t)x(t)]
     = Ae(t)e(t).

Therefore, e(t) is given by

e(t) = Ψe(t, t0)e(t0)

where Ψe(t, s) is the state transition matrix satisfying

∂tΨe(t, s) = Ae(t)Ψe(t, s), Ψe(s, s) = Iq×q.

Hence we have

z(t)− T (t)x(t) = Ψe(t, t0)[z(t0)− T (t0)x(t0)],

so that

z(t) = T(t)x(t) + Ψe(t, t0)[z0 − r0]. (12.81)

Consequently, if z(t0) = r(t0) then z(t) = r(t), and this completes the proof.

Observe that equation (12.81) implies that the error e(t) = z(t) − r(t) = z(t) − T(t)x(t) is equal to Ψe(t, t0)[z0 − r0], so that

‖e(t)‖ = ‖z(t) − T(t)x(t)‖ = ‖Ψe(t, t0)[z0 − r0]‖ ≤ ‖Ψe(t, t0)‖ ‖z0 − r0‖

is bounded by the norm of the transition matrix. Consequently,

‖z(t) − T(t)x(t)‖ ≤ [ sup_{t0≤t≤t1} ‖Ψe(t, t0)‖ ] ‖z0 − r0‖.

Since Ψe(t, t0) depends on Ae(t), and hence implicitly on F(t) through the equation (12.80), the error can be made “small” if one can choose Ae(t) and F(t) satisfying (12.80) such that

sup_{t0≤t≤t1} ‖Ψe(t, t0)‖

is small. Note that in the special autonomous case one has the following result, which was first given in [130].


Corollary 12.1 Let x(t) be a solution to the system ẋ(t) = Ax(t) which drives the system ż(t) = Aez(t) + Fy(t), where y(t) = Cx(t). Suppose there exist Ae and F such that

[TA− AeT ] = H = FC. (12.82)

If z(t0) = r(t0) = Tx(t0), then z(t) = r(t) = Tx(t) for all t0 < t < t1.

Remark 12.3 It is important to note that for a given T(t) (or T in the time invariant case) it may not be possible to find Ae(t) and F(t) satisfying (12.80) (or (12.82)). See [130], [132] and [129] for more details. Luenberger actually posed the following question:

Given H = FC and Ae, what conditions on T will ensure that z(t0) = r(t0) = Tx(t0) implies z(t) = r(t) = Tx(t) for all t0 < t < t1?

Clearly, the answer is that T must satisfy [TA− AeT ] = H.

Remark 12.4 Note that if T is an n × n nonsingular matrix so that T^{-1} exists, then given any F one can solve (12.82) for Ae. In particular,

AeT = TA−H = TA− FC

and hence

Ae = [TA−H]T−1 = TAT−1 − FCT−1

satisfies (12.82). If T = In×n, then Ae = A− FC and

ż(t) = Aez(t) + Fy(t)

is called the identity observer.
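As a small illustration, the following sketch (Python; the matrices and the gain F are hypothetical choices, with F chosen so that A − FC is stable, which should be verified) simulates the identity observer alongside the uncontrolled system ẋ = Ax:

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical LTI system and identity-observer gain (T = I, Ae = A - F C).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[4.0], [2.0]])   # here A - F C is stable (an assumption to check)

def coupled(t, w):
    # Stack the true state x and the observer state z; y = C x drives the observer.
    x, z = w[:2], w[2:]
    xdot = A @ x
    zdot = (A - F @ C) @ z + (F @ C) @ x   # z' = Ae z + F y
    return np.concatenate([xdot, zdot])

w0 = np.array([1.0, -1.0, 0.0, 0.0])       # x(0) and a (wrong) initial estimate z(0)
sol = solve_ivp(coupled, (0.0, 10.0), w0, dense_output=True)
err = sol.y[2:] - sol.y[:2]
print(np.abs(err[:, -1]))                  # estimation error decays as t grows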

We will return to these issues later and consider related problems of asymptotic tracking. However, we turn now to a stochastic version of this problem which produces an “optimal observer” called the Kalman Filter.


12.5.2 An Optimal Observer: The Kalman Filter

A full development of this topic requires a background in stochastic systems and control and is outside the scope of these notes. However, Section 1.10 in the book [115] by Kwakernaak and Sivan contains a very nice and brief discussion of the basic material. It is essential, though, that we discuss the definition of a stochastic process and present a few fundamental definitions and results.

A Short Review of Random Variables and Stochastic Processes

We review the concepts needed to precisely define a stochastic process. This requires that we discuss measures and some measure theory concepts. We begin with the concept of a σ-algebra.

Definition 12.5 Given a set Ω, F is called a σ-algebra of subsets of Ω if F is a collection of subsets of Ω and:
(1) F is non-empty: i.e. there is at least one set E ⊆ Ω in F;
(2) F is closed under complementation: i.e. if E is in F, then so is its complement Ω\E, where Ω\E is defined by

Ω\E ≜ {x ∈ Ω : x ∉ E};

(3) F is closed under countable unions: i.e. if E1, E2, E3, ... are in F, then ∪_{i=1}^{+∞} Ei is in F.

Definition 12.6 A pair (Ω, F), where Ω is a set and F is a σ-algebra of subsets of Ω, is called a measurable space. A function µ : F → [0, +∞] is a (countably additive) measure on (Ω, F) provided that:
(1) µ(∅) = 0, where ∅ is the empty subset of Ω;
(2) if Ei ∈ F for each i = 1, 2, 3, ... are disjoint sets, then

µ(∪_{i=1}^{+∞} Ei) = Σ_{i=1}^{+∞} µ(Ei).


If µ : F → [0, +∞), then µ is called a finite measure. If Ω can be decomposed into a countable union of measurable sets of finite measure, then µ is called a σ-finite measure. The triple (Ω, F, µ) is called a measure space, and sets in F are called measurable sets.

Although these definitions may seem rather abstract, consider the simple example where Ω = I is a subinterval of the real line (I could be all of R¹) and let FB be the “smallest” collection of sets containing all open and closed subintervals and their countable unions and intersections. The σ-algebra FB is called the Borel σ-algebra. If m : FB → [0, +∞) is the usual Lebesgue measure (see [176]), then (I, FB, m) is a measure space. If I is a finite subinterval, then m is a finite measure. If I is an infinite interval, then m is a σ-finite measure. This measure space plays the central role in the development of modern integration theory. We will make use of the standard measure space (Rn, FB, m) where again FB is the Borel σ-algebra on Rn and m is the usual Lebesgue measure.

Definition 12.7 A probability space is a measure space (Ω, F, p) where the probability measure p satisfies:
(1) 0 ≤ p(E) ≤ 1;
(2) p(Ω) = 1.
The set Ω is called the sample space, F is called the event space and p is called the probability measure.

Definition 12.8 Let (Ω1, F1, µ1) and (Ω2, F2, µ2) be measure spaces. A function X(·) : Ω1 → Ω2 is said to be a measurable function if for every measurable set S ∈ F2, the inverse image

X^{-1}(S) = {x ∈ Ω1 : X(x) ∈ S}

is measurable, i.e. X^{-1}(S) ∈ F1. If (Ω1, F1, µ1) = (Ω, F, p) is a probability space, then a measurable function X(·) : Ω → Ω2 is called a random variable. In particular, random variables are measurable functions from a probability space into a measure space.


Remark 12.5 Note that we have used capital letters to denote random variables. This is done to distinguish between the random variable X(·) and the value of this function at a particular x ∈ Ω. Thus, X(x) is not the same as X(·). This is the same notation we have consistently used before. For example, recall that we write u(·) ∈ PWS(t0, t1) to mean that u(·) is a function and, for a given fixed t ∈ [t0, t1], u(t) is the value of u(·) at t. However, this precise notation is often abused so that one sees things like “consider the function u = t”. Of course we know that this means that u(·) is a function such that u(·) evaluated at t is just t, or u(t) = t. This same abuse of notation occurs in probability and statistics where you will see statements like “consider the random variable x”. What this really means is that one is talking about the random variable X(·) where X(x) = x.

Definition 12.9 Let (Ω, F, p) be a probability space and (Rn, FB, m) be the standard Lebesgue measure space. Let X denote the set of all random variables X(·) : Ω → Rn. In particular,

X = {X(·) : Ω → Rn : X(·) is measurable}.

Let I be a subinterval of the real line. A (vector-valued) stochastic process on I is a family of Rn-valued random variables

{[Xt](·) ∈ X : t ∈ I},

indexed by t ∈ I. In particular, for each t ∈ I, Xt(·) is a random variable (i.e. a measurable function). Thus, a stochastic process is a function from I to X. We use the notation

X(t, ·) = Xt(·)

to indicate that for each fixed t ∈ I, X(t, ·) is the random variable with range in Rn. In this setting, we may consider a stochastic process as a function of two variables

X(·, ·) : I × Ω→ Rn

and if ω ∈ Ω, we write

X(t, ω) = Xt(ω).


If X(t, ·) = Xt(·) is a vector-valued stochastic process on I, then

Xt(·) = [ [Xt]1(·)  [Xt]2(·)  [Xt]3(·)  · · ·  [Xt]n(·) ]^T,

where each [Xt]i(·) is a real-valued random variable. Thus, we write

Xt(·) = X(t, ·) = [ X1(t, ·)  X2(t, ·)  X3(t, ·)  · · ·  Xn(t, ·) ]^T

so that

Xt(ω) = X(t, ω) = [ [X1(t)](ω)  [X2(t)](ω)  [X3(t)](ω)  · · ·  [Xn(t)](ω) ]^T,

or equivalently,

X(t, ω) = [ X1(t, ω)  X2(t, ω)  X3(t, ω)  · · ·  Xn(t, ω) ]^T.

At this point, one really starts to abuse notation. In particular, the statement that

Vt(·) = [ [Vt]1(·)  [Vt]2(·)  [Vt]3(·)  · · ·  [Vt]n(·) ]^T

is an Rn-valued stochastic process on I, is often written by stating that

v(t) = [ v1(t)  v2(t)  v3(t)  · · ·  vn(t) ]^T

is an Rn-valued stochastic process on I. It is important that one fully appreciates what is meant by this simplification of notation. In particular, always keep in mind that vi(t) is “representing” a random variable (i.e. a function) [Vt]i(·). If this is well understood, then one has the background required to read Section 1.10 in the book [115].

The Kalman Filter and an Optimal Observer

We return now to the problem of constructing an observer that is optimal in some sense. As noted above, one use of an observer is to provide state estimates from measured (sensed) outputs that can be used in a practical feedback law. In particular,

u(t) = −K(t)x(t)


is replaced by

ue(t) = −K(t)xe(t),

and if xe(t) ≈ x(t), then one would expect that under suitable conditions the controllers satisfy ue(t) ≈ u(t). In this application, the selection of an observer for a given system depends on the choice of the gain matrix K(t). However, we can formulate an “optimal observer” problem independent of the control problem and focus on the problem of finding an optimal observer. In addition, this approach allows us to deal with some practical issues where the control system is subject to disturbances and the sensors produce noisy signals. This is done by making specific assumptions concerning these disturbances and measurement errors that occur in the system that is to be observed. We follow the approach in [115].

It is assumed that the system equations are described by the stochastic differential equation on t0 ≤ t ≤ T given by

ẋ(t) = A(t)x(t) + B(t)u(t) + G(t)w(t), x(t0) = x0 ∈ Rn, (12.83)

with sensed output

y(t) = C(t)x(t) + E(t)v(t) ∈ Rp, (12.84)

where w(t) and v(t) represent system disturbances and sensor noise, respectively.

A Comment on Notation: If z is a random variable, then we use E for the expectation operator, so that E[z] denotes the expected value of z and E[zz^T] is the covariance matrix.

We assume that the disturbance w(t) and sensor noise v(t) are white, Gaussian, have zero mean and are independent. Mathematically, this implies that for all t, τ ∈ [t0, T],

E[w(t)w(τ)^T] = W(t)δ(t − τ), E[v(t)] ≡ 0, (12.85)

E[v(t)v(τ)^T] = V(t)δ(t − τ), E[w(t)] ≡ 0, (12.86)

and

E[v(t)w(τ)^T] = 0, (12.87)


where W(t) = W(t)^T ≥ 0 and V(t) = V(t)^T ≥ 0 are the so-called intensity matrices. We define the n × n and p × p matrices M(t) and N(t) by

M(t) = G(t)W(t)[G(t)]^T and N(t) = E(t)V(t)[E(t)]^T, (12.88)

respectively. In addition, we make the standing assumption that there is an α > 0 such that

N(t) = E(t)V (t)[E(t)]T = N(t)T ≥ αI > 0. (12.89)

The initial condition x0 is assumed to be a Gaussian random variable with mean m0 and covariance Π0, so that

E{[x0 − m0][x0 − m0]^T} = Π0 and E[x0] = m0. (12.90)

Finally, we assume that x0 is independent of the noise terms, so that

E[x0v(t)^T] = E[x0w(t)^T] = 0 for all t ∈ [t0, T]. (12.91)
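Under these assumptions one can generate approximate sample paths of (12.83)-(12.84) numerically; the sketch below (Python; all data are hypothetical, and a crude Euler-Maruyama step stands in for a proper stochastic integrator) shows how the intensities W and V enter a discrete-time simulation:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical time-invariant data for (12.83)-(12.84), uncontrolled (u = 0).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
G = np.array([[0.0], [1.0]])
Cm = np.array([[1.0, 0.0]])
Em = np.array([[1.0]])
W = np.array([[0.1]])   # disturbance intensity
V = np.array([[0.01]])  # sensor-noise intensity
dt, nsteps = 1e-3, 5000

x = np.array([1.0, 0.0])
ys = []
for _ in range(nsteps):
    # Euler-Maruyama: white noise of intensity W gives increments with covariance W*dt.
    dw = rng.multivariate_normal(np.zeros(1), W * dt)
    x = x + A @ x * dt + G @ dw
    # Sampled white sensor noise: a discrete stand-in with covariance V/dt.
    v = rng.multivariate_normal(np.zeros(1), V / dt)
    ys.append(Cm @ x + Em @ v)
ys = np.asarray(ys)  # recorded measurements y(t_k)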

Suppose we are seeking an observer of the form

ẋe(t) = A(t)xe(t) + B(t)u(t) + F(t)[y(t) − C(t)xe(t)], xe(t0) = xe,0 ∈ Rn, (12.92)

to minimize the expected value of the error between the state and the estimate of the state. In particular, let

e(t) = xe(t)− x(t) (12.93)

denote the error.

The optimal estimation problem is to find the observer gain F(t) and the initial estimate xe,0 to minimize the weighted error

E[⟨Σ(t)e(t), e(t)⟩] = E[e(t)^T Σ(t)e(t)]
                    = E{[xe(t) − x(t)]^T Σ(t)[xe(t) − x(t)]}, (12.94)

where Σ(t) = Σ(t)^T ≥ 0. The following theorem provides the answer to the optimal estimation problem and is known as the Kalman Filter.


Theorem 12.10 (Kalman Filter) The optimal observer is obtained by choosing the observer gain matrix to be

F^{opt}(t) = Π(t)C^T(t)N(t)^{-1}, (12.95)

where Π(t) = [Π(t)]^T is the solution to the matrix Riccati differential equation

Π̇(t) = A(t)Π(t) + Π(t)A(t)^T − Π(t)C(t)^T N(t)^{-1}C(t)Π(t) + M(t), (12.96)

with initial condition

Π(t0) = Π0. (12.97)

The optimal initial condition should be

xe(t0) = m0 = E [x(t0)] = E [x0]. (12.98)

If conditions (12.95) - (12.98) hold, then

E{[xe(t) − x(t)][xe(t) − x(t)]^T} = Π(t), (12.99)

while the mean square reconstruction error is

E[⟨Σ(t)e(t), e(t)⟩] = E{[xe(t) − x(t)]^T Σ(t)[xe(t) − x(t)]}
                    = trace[Π(t)Σ(t)]. (12.100)
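As a concrete illustration of Theorem 12.10, the sketch below (Python; the constant matrices are hypothetical stand-ins for A(t), C(t), M(t) and N(t)) integrates the filter Riccati equation (12.96) forward from Π(t0) = Π0 and forms the gain (12.95):

import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical time-invariant data.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
C = np.array([[1.0, 0.0]])
M = np.diag([0.0, 0.1])        # M = G W G^T
N = np.array([[0.01]])         # N = E V E^T > 0
Pi0 = np.eye(2)                # initial covariance Pi(t0)

def rhs(t, p):
    # Filter Riccati ODE (12.96).
    Pi = p.reshape(2, 2)
    Pidot = A @ Pi + Pi @ A.T - Pi @ C.T @ np.linalg.solve(N, C @ Pi) + M
    return Pidot.ravel()

sol = solve_ivp(rhs, (0.0, 5.0), Pi0.ravel(), dense_output=True)

def kalman_gain(t):
    # F_opt(t) = Pi(t) C^T N^{-1}, as in (12.95).
    Pi = sol.sol(t).reshape(2, 2)
    return Pi @ C.T @ np.linalg.inv(N)

print(kalman_gain(5.0))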

In addition to the properties given in Theorem 12.10, one can show that Π(t) is the “minimal” nonnegative solution to (12.96) in the following sense. If F(t) is any observer gain and Φ(t) is a solution to the Lyapunov equation

Φ̇(t) = [A(t) − F(t)C(t)]Φ(t) + Φ(t)[A(t) − F(t)C(t)]^T + F(t)N(t)F(t)^T + M(t) (12.101)

with initial data (12.97), then

0 ≤ trace[Π(t)Σ(t)] ≤ trace[Φ(t)Σ(t)]. (12.102)

Let Φ(t) denote the variance matrix of the error e(t) and let em(t) denote the mean of e(t), so that

E{[e(t) − em(t)][e(t) − em(t)]^T} = Φ(t) and E[e(t)] = em(t). (12.103)


It can be shown (see [115], Section 4.3) that em(t) satisfies the ordinary differential equation

ėm(t) = [A(t) − F(t)C(t)]em(t)

and Φ(t) is the solution to the Lyapunov equation (12.101) with initial condition Φ(t0) = Π0. To see how this is related to the estimation problem, assume that

F (t) = Φ(t)CT (t)N(t)−1 (12.104)

and Φ(t) satisfies the Lyapunov equation (12.101) above. Then, by substitution

Φ̇(t) = [A(t) − F(t)C(t)]Φ(t) + Φ(t)[A(t) − F(t)C(t)]^T + F(t)N(t)F(t)^T + M(t)
     = A(t)Φ(t) − [Φ(t)C^T(t)N(t)^{-1}]C(t)Φ(t) + Φ(t)A(t)^T
       − Φ(t)[Φ(t)C^T(t)N(t)^{-1}C(t)]^T
       + [Φ(t)C^T(t)N(t)^{-1}]N(t)[N(t)^{-1}C(t)Φ(t)] + M(t)
     = A(t)Φ(t) + Φ(t)A(t)^T − 2Φ(t)C^T(t)N(t)^{-1}C(t)Φ(t)
       + Φ(t)C^T(t)N(t)^{-1}C(t)Φ(t) + M(t)
     = A(t)Φ(t) + Φ(t)A(t)^T − Φ(t)C^T(t)N(t)^{-1}C(t)Φ(t) + M(t),

and hence Φ(t) satisfies the observer Riccati differential equation (12.96). If Φ(t) satisfies the initial condition Φ(t0) = Π0, then Φ(t) = Π(t) and we have the optimal filter.

Note that the time averaged error is given by

(1/(T − t0)) ∫_{t0}^{T} E[⟨Σ(s)e(s), e(s)⟩] ds = (1/(T − t0)) ∫_{t0}^{T} trace[Π(s)Σ(s)] ds. (12.105)

This cost has been used for optimal sensor location problems (see [10], [39], [50], [51], [49], [69], [70], [78], [152], [181]). A similar approach has been used for optimal actuator location problems (see [66], [139]).


12.6 The Time Invariant Infinite Interval Problem

In the case where t1 = +∞ and the linear system is time invariant, the linear quadratic optimal control problem takes the form

ẋ(t) = Ax(t) + Bu(t), x(0) = x0, (12.106)

with cost functional

J(u(·)) = ∫_0^{+∞} ⟨Qx(s), x(s)⟩ + ⟨Ru(s), u(s)⟩ ds. (12.107)

Here, Q = D^T D ≥ 0 and R = R^T > 0. The linear quadratic optimal control problem defined by (12.106)-(12.107) is called the Linear Quadratic Regulator Problem, or LQR Problem for short. If an optimal control exists, then one can show (see [115]) that it has the form of state feedback

u∗(t) = −Koptx∗(t), (12.108)

where the constant gain matrix Kopt is given by

Kopt = R^{-1}B^T Popt (12.109)

and Popt is a non-negative definite solution to the Algebraic Riccati Equation (ARE)

PA + A^T P − PBR^{-1}B^T P + Q = 0. (12.110)

The proof of this statement follows by considering LQ optimal control problems on 0 ≤ t1 < +∞ and then taking the limit as t1 → +∞ (see [115] and [119]). In particular, Popt is the minimal nonnegative solution to (12.110) in the sense that if P is any non-negative definite solution of (12.110), then

P ≥ Popt ≥ 0. (12.111)

Moreover, the optimal cost is given by

J(u∗(·)) = ∫_0^{+∞} ⟨Qx∗(s), x∗(s)⟩ + ⟨Ru∗(s), u∗(s)⟩ ds
         = ⟨Popt x0, x0⟩, (12.112)


where x(0) = x0 is the initial state.

Some important questions are:

1. When does the LQR problem defined by (12.106)-(12.107) have a solution?

2. When is this solution unique?

3. How is this related to solutions of the ARE (12.110)?

As noted above, we consider the case where the weighting matrix Q can be factored as

Q = DTD (12.113)

for a q×n matrix D. Hence, we define the controlled output by

z(t) = Dx(t), (12.114)

where the matrix D represents the mapping D : Rn → Rq. Thus, the cost function (12.107) can be written as

J(u(·)) = ∫_0^{+∞} ⟨Qx(s), x(s)⟩_{Rn} + ⟨Ru(s), u(s)⟩ ds
        = ∫_0^{+∞} ⟨D^T Dx(s), x(s)⟩ + ⟨Ru(s), u(s)⟩ ds
        = ∫_0^{+∞} ⟨Dx(s), Dx(s)⟩ + ⟨Ru(s), u(s)⟩ ds
        = ∫_0^{+∞} ⟨z(s), z(s)⟩_{Rq} + ⟨Ru(s), u(s)⟩ ds.

In order to guarantee the existence of an optimal controller we must place additional conditions on the system. The following result can be found in many control theory books; in particular, see Section 3.4.3 in [115].

Theorem 12.11 (Existence for LQR Problem) If (A, B) is stabilizable and (A, D) is detectable, then the optimal LQR control problem has a unique solution, given by (12.108)-(12.110). Moreover, there is only one non-negative definite solution Popt to (12.110), and (12.112) holds.
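In practice the ARE (12.110) is solved numerically; a minimal sketch (Python with SciPy; the matrices are hypothetical and chosen so the stabilizability and detectability hypotheses hold) is:

import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical stabilizable/detectable data (double integrator).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)           # Q = D^T D with D = I
R = np.array([[1.0]])

# Solve P A + A^T P - P B R^{-1} B^T P + Q = 0 for the stabilizing P >= 0.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # Kopt = R^{-1} B^T Popt, as in (12.109)

print(P)
print(K)
print(np.linalg.eigvals(A - B @ K))  # closed-loop eigenvalues in the open left half plane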


Now assume that the full state is not available for feedback. Thus, we have a sensed output

y(t) = Cx(t) + Ev(t) ∈ Rp, (12.115)

and we must build a state estimator (observer). We will employ a Kalman filter of the form

ẋe(t) = Axe(t) + Bu(t) + F[y(t) − Cxe(t)], xe(t0) = xe,0 ∈ Rn, (12.116)

where the disturbances v(t) and w(t) are white, Gaussian, have zero mean and are independent. In particular, we assume that for all t > 0 and τ > 0,

E[v(t)v(τ)^T] = Ip×pδ(t − τ), E[v(t)] ≡ 0, (12.117)

E[w(t)w(τ)^T] = In×nδ(t − τ), E[w(t)] ≡ 0, (12.118)

and

E[v(t)w(τ)^T] = 0. (12.119)

Hence, M = GG^T and condition (12.89) implies N = EE^T = N^T ≥ αIp×p > 0, so that the optimal filter is defined by the observer gain F given by

F = ΠC^T N^{-1}, (12.120)

where

AΠ + ΠA^T − ΠC^T N^{-1}CΠ + M = 0. (12.121)

Now consider the closed-loop system obtained by using the filter defined by (12.120)-(12.121). It follows that if one sets

u(t) = −Kxe(t), (12.122)

then the controlled system becomes

ẋ(t) = Ax(t) + Bu(t) + Gw(t)
     = Ax(t) − BKxe(t) + Gw(t)


and the observer equation (12.116) becomes

ẋe(t) = Axe(t) + Bu(t) + F[y(t) − Cxe(t)]
      = Axe(t) − BKxe(t) + Fy(t) − FCxe(t)
      = Axe(t) − BKxe(t) + F[Cx(t) + Ev(t)] − FCxe(t)
      = [Axe(t) − BKxe(t) − FCxe(t)] + FCx(t) + FEv(t)
      = [A − BK − FC]xe(t) + FCx(t) + FEv(t).

Consequently, the closed-loop system has the form

d/dt [ x(t)  ]   [ A    −BK ] [ x(t)  ]   [ Gw(t)  ]
     [ xe(t) ] = [ FC    Ae ] [ xe(t) ] + [ FEv(t) ],     (12.123)

where the observer matrix is given by

Ae = A−BK − FC. (12.124)
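Assembling the closed-loop system (12.123)-(12.124) is mechanical once K and F are available; the sketch below (Python; the system data and the two gains are hypothetical placeholders for the Riccati-based quantities) builds the block matrices:

import numpy as np

# Hypothetical data standing in for the system and the two Riccati-based gains.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
G = np.array([[0.0], [1.0]])
E = np.array([[1.0]])
K = np.array([[1.0, 1.7]])    # assumed LQR gain R^{-1} B^T P
F = np.array([[1.5], [0.9]])  # assumed Kalman gain Pi C^T N^{-1}

n = A.shape[0]
# State matrix of (12.123), with Ae = A - B K - F C from (12.124).
Acl = np.block([[A, -B @ K], [F @ C, A - B @ K - F @ C]])
# Noise-input matrix multiplying the stacked noise [w; v].
Bcl = np.block([[G, np.zeros((n, E.shape[1]))],
                [np.zeros((n, G.shape[1])), F @ E]])
print(np.linalg.eigvals(Acl))  # all eigenvalues should have negative real part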

The controller defined by the observer (12.116) with feedback law (12.122) is called the Linear Quadratic Gaussian Regulator, or LQG Controller for short. The rigorous development of these concepts can be found in any modern control book. The closed-loop system defined by (12.123) is stable, and for many years it was thought to be robust. However, it is well known that this feedback control is not robust (see the famous paper by John Doyle [72]), so there needs to be a way to deal with both performance and robustness. One approach is to apply the so-called Min-Max theory (see [16]) to obtain a more robust controller. This theory is based on a game theory approach to controller design and will not be discussed here. However, in the next section we summarize the basic approach for the linear regulator problem above.

12.7 The Time Invariant Min-Max Controller

Observe that the solution to the LQG problem involves solving two uncoupled Riccati equations (12.110) and (12.121). In particular, there is no relationship between the optimal controller determined


by the solution of the Riccati equation (12.110) and the optimal observer determined by the Riccati equation (12.121). Thus, the controller design does not consider disturbance and sensor noise. This is one source of the lack of robustness. In the papers [155] and [175] an optimal control problem based on a game theoretic formulation is used to develop an optimal controller and estimator where the control, disturbances and noise are included in the formulation. The cost function is formulated as a quadratic function of the state history, the initial state, the system disturbances, sensor noise and the control. We shall not discuss the derivation of this control here, but briefly summarize the approach. The books by Basar and Bernhard [16] and [17] contain a very nice presentation of the theory of H∞ and Min-Max control.

Let θ² ≥ 0 and let Q = D^T D, M = GG^T, N = EE^T and R = R^T > 0 be as above. Consider the Riccati equations

PA + A^T P − P[BR^{-1}B^T − θ²M]P + Q = 0 (12.125)

and

AΠ + ΠA^T − Π[C^T N^{-1}C − θ²Q]Π + M = 0 (12.126)

and let Pθ and Πθ be the minimal nonnegative definite solutions of (12.125) and (12.126), respectively. If Πθ > 0 is positive definite, then for θ² sufficiently small, [Πθ]^{-1} − θ²Pθ > 0 and [I − θ²ΠθPθ]^{-1} exists. Define

Kθ = R^{-1}B^T Pθ, (12.127)

Fθ = [I − θ²ΠθPθ]^{-1}ΠθC^T N^{-1}, (12.128)

Aθ = A − BKθ − FθC + θ²MPθ (12.129)

and consider the closed-loop system

d/dt [ x(t)  ]   [ A     −BKθ ] [ x(t)  ]   [ Gw(t)   ]
     [ xθ(t) ] = [ FθC     Aθ ] [ xθ(t) ] + [ FθEv(t) ].     (12.130)


This system is called the Min-Max closed-loop system and provides some robustness when it exists (see [16]). One can show that there is a θ̄² > 0 such that if 0 ≤ θ² < θ̄², then there exist minimal positive definite solutions Pθ and Πθ to the Riccati equations (12.125) and (12.126), [I − θ²ΠθPθ] is non-singular and the closed-loop system (12.130) is (robustly) stable. Roughly speaking, larger θ² implies more robustness. Note that when θ² = 0, the Min-Max controller reduces to the LQG controller.
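Given Pθ and Πθ (computed, say, by an H∞ Riccati solver, which we simply assume here), forming the Min-Max gains (12.127)-(12.129) takes only a few lines; the sketch below (Python, hypothetical inputs) also guards against [I − θ²ΠθPθ] becoming singular:

import numpy as np

def minmax_gains(A, B, C, P, Pi, R, N, M, theta2):
    # Gains (12.127)-(12.129); assumes P = P_theta and Pi = Pi_theta
    # solve the Riccati equations (12.125)-(12.126).
    I = np.eye(A.shape[0])
    # Spectral radius of theta^2 Pi P below one is sufficient for invertibility.
    if np.max(np.abs(np.linalg.eigvals(theta2 * Pi @ P))) >= 1.0:
        raise ValueError("I - theta^2 Pi P is (nearly) singular: theta^2 too large")
    K = np.linalg.solve(R, B.T @ P)                                        # (12.127)
    F = np.linalg.solve(I - theta2 * Pi @ P, Pi) @ C.T @ np.linalg.inv(N)  # (12.128)
    Ae = A - B @ K - F @ C + theta2 * M @ P                                # (12.129)
    return K, F, Ae

Setting theta2 = 0 recovers the LQG gains, mirroring the remark above.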

Remark 12.6 The short treatment of LQ optimal control presented here is clearly incomplete. The primary purpose of discussing this topic was to illustrate how optimal control and the Maximum Principle can be used to address problems of this type. There are numerous excellent books devoted to these topics and to more advanced applications involving systems governed by delay and partial differential equations. Hopefully, this brief introduction will encourage and motivate the reader to dive more deeply into these exciting areas.

12.8 Problem Set for Chapter 12

Consider the linear control system

ẋ(t) = A(t)x(t) + B(t)u(t), (12.131)

with measured output

y(t) = C(t)x(t) (12.132)

and controlled output

z(t) = D(t)x(t). (12.133)

Here, we assume that A : [t0, t1] → Rn×n, B : [t0, t1] → Rn×m, C : [t0, t1] → Rp×n and D : [t0, t1] → Rq×n are piecewise continuous matrix valued functions.

Let Q(t) = [D(t)]^T D(t) and assume R : [t0, t1] → Rm×m is a piecewise continuous symmetric matrix valued function such that

0 < αIm×m ≤ R(t),


for some α > 0. Finally, we assume that the controls belong to the space PWC(t0, t1; Rm) and the quadratic cost functional is defined by

J(u(·)) = ⟨Sx(t1), x(t1)⟩ (12.134)
        + (1/2) ∫_{t0}^{t1} ⟨Q(s)x(s), x(s)⟩ + ⟨R(s)u(s), u(s)⟩ ds, (12.135)

where S = S^T ≥ 0 is a constant symmetric matrix. All the following problems concern these LQ control problems.

Problem 12.1 Consider the control system

d/dt [ x1(t) ]   [ −1  Re ] [ x1(t) ]   [ 0 ]
     [ x2(t) ] = [  0  −2 ] [ x2(t) ] + [ 1 ] u(t),

with output

y(t) = x1(t) + x2(t) = [ 1  1 ] [ x1(t); x2(t) ],

where Re ≥ 0 is a constant. (a) For what values of Re is this system controllable? (b) For what values of Re is the system observable?
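For a numerical check of questions like these, the Kalman rank tests can be coded directly; a minimal sketch (Python; Re = 1 is just a sample value, not the answer) is:

import numpy as np

Re = 1.0  # sample parameter value
A = np.array([[-1.0, Re], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 1.0]])

# Kalman rank tests (n = 2): rank [B, AB] = 2 for controllability,
# rank [C; CA] = 2 for observability.
ctrb = np.hstack([B, A @ B])
obsv = np.vstack([C, C @ A])
print(np.linalg.matrix_rank(ctrb), np.linalg.matrix_rank(obsv))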

Problem 12.2 Consider the control system

d/dt [ x1(t) ]   [ −1  Re ] [ x1(t) ]   [ α ]
     [ x2(t) ] = [  0  −2 ] [ x2(t) ] + [ β ] u(t),

with output

y(t) = x1(t) + x2(t) = [ 1  1 ] [ x1(t); x2(t) ].

(a) For what values of Re, α and β is this system controllable? (b) For what values of Re, α and β is the system stabilizable?


Problem 12.3 Consider the control system

d/dt [ x1(t) ]   [ −1  10 ] [ x1(t) ]   [ 0 ]
     [ x2(t) ] = [  0   2 ] [ x2(t) ] + [ 1 ] u(t)

with output

y(t) = αx1(t) + βx2(t) = [ α  β ] [ x1(t); x2(t) ].

(a) Is this system controllable? (b) For what values of α and β is the system observable? (c) Is this system stabilizable? (d) For what values of α and β is the system detectable?

Problem 12.4 Consider the control system

d/dt [ x1(t) ]   [ 1   0 ] [ x1(t) ]   [ δ ]
     [ x2(t) ] = [ 0  −1 ] [ x2(t) ] + [ ε ] u(t)

with output

y(t) = x1(t) + x2(t) = [ 1  1 ] [ x1(t); x2(t) ].

For what values of δ and ε is this system (a) controllable, (b) observable, (c) stabilizable, (d) detectable?

Problem 12.5 Consider the LQR control problem for the system

d/dt [ x1(t) ]   [ 1   0 ] [ x1(t) ]   [ δ ]
     [ x2(t) ] = [ 0  −1 ] [ x2(t) ] + [ ε ] u(t)

with quadratic cost

J(u(·)) = ∫_0^{+∞} {[x1(s)]² + [x2(s)]² + [u(s)]²} ds.

For each δ and ε, find the solution P = P(δ, ε) of the algebraic Riccati equation

ATP + PA− PBBTP +Q = 0


and compute the optimal feedback gain

K(δ, ε) = BTP (δ, ε).

Discuss these solutions as δ −→ 0 and ε −→ 0.

Problem 12.6 Let Re ≥ 0 be a constant and consider the LQR control problem for the system

d/dt [ x1(t) ]   [ −1  Re ] [ x1(t) ]   [ 1 ]
     [ x2(t) ] = [  0  −2 ] [ x2(t) ] + [ 1 ] u(t)

with quadratic cost

J(u(·)) = ∫_0^{+∞} {[x1(s)]² + [x2(s)]² + [u(s)]²} ds.

For each Re > 0, find the solution P = P(Re) of the algebraic Riccati equation

ATP + PA− PBBTP +Q = 0

and compute the optimal feedback gain

K(Re) = BTP (Re).

Discuss these solutions as Re −→ 0 and as Re −→ +∞.

Problem 12.7 Consider the control system

d/dt [ x1(t) ]   [ 0  1 ] [ x1(t) ]   [ 0 ]
     [ x2(t) ] = [ 0  0 ] [ x2(t) ] + [ 1 ] u(t).

Show that this system is controllable. Find the minimum energy control that transfers a state x1 to zero in time t1 = 2. Hint: the matrix exponential is given by

e^{At} = [ 1  t ]
         [ 0  1 ].


Problem 12.8 Consider the control system

d/dt [ x1(t) ]   [ −1  e^{2t} ] [ x1(t) ]   [ 0      ]
     [ x2(t) ] = [  0     −1  ] [ x2(t) ] + [ e^{−t} ] u(t).

Show that this system is completely controllable for all t1 > 0. Find the minimum energy control that transfers the state x1 = [1 1]^T to zero in time t1 = 2. Hint: the state transition matrix is given by

Φ(t, s) = [ e^{−(t−s)}   (1/2)(e^{t+s} − e^{−(t−3s)}) ]
          [ 0             e^{−(t−s)}                  ].

Problem 12.9 Consider the control system

d/dt [ x1(t) ]   [ 0  1 ] [ x1(t) ]
     [ x2(t) ] = [ 0  0 ] [ x2(t) ]

with measured output

y(t) = [ 2x1(t) ] = [ 2  0 ] [ x1(t) ]
       [  x2(t) ]   [ 0  1 ] [ x2(t) ].

Construct a Luenberger observer for this system.

Problem 12.10 Consider the control system

d/dt [ x1(t) ]   [ 0  1 ] [ x1(t) ]   [ 0 ]
     [ x2(t) ] = [ 0  0 ] [ x2(t) ] + [ 1 ] u(t),

with measured output

y(t) = [ 2x1(t) ] = [ 2  0 ] [ x1(t) ]
       [  x2(t) ]   [ 0  1 ] [ x2(t) ].

Let

Q = [ 2  0; 0  1 ], R = 1, M = [ 1  0; 0  1 ], and N = 1.

Construct the Kalman (LQG) observer for this system.


Advanced Problems

These problems require Matlab™ or some other numerical toolbox.

Problem 12.11 Consider the control system

d/dt [ x1(t) ]   [ −0.4    Re ] [ x1(t) ]   [ 0 ]
     [ x2(t) ] = [    0  −0.2 ] [ x2(t) ] + [ 1 ] u(t),

with output

y(t) = x1(t) + x2(t) = [ 1  1 ] [ x1(t); x2(t) ],

where Re ≥ 0 is a constant. Let

Q = [ 1  0; 0  1 ], R = 1, M = [ 1  0; 0  1 ], and N = 1.

For each Re = 1 and 2, compute the LQR gain K(Re) = R^{-1}B^T P(Re) and the Kalman filter gain F(Re) = ΠC^T N^{-1}, where P(Re) and Π(Re) satisfy

A^T P + PA − PBR^{-1}B^T P + Q = 0

and

AΠ + ΠA^T − ΠC^T N^{-1}CΠ + M = 0,

respectively. Construct the optimal LQG closed-loop system and simulate the response to the initial conditions

x0 = [1 0]T , x0 = [0 0]T and x0 = [0 1]T ,

with initial state estimates xe,0 = (0.95) · x0.

Problem 12.12 Consider the nonlinear control system

d/dt [ x1(t) ]   [ −0.4    Re ] [ x1(t) ]
     [ x2(t) ] = [    0  −0.2 ] [ x2(t) ]

  + √([x1(t)]² + [x2(t)]²) [ 0  −1 ] [ x1(t) ]   [ 0 ]
                           [ +1  0 ] [ x2(t) ] + [ 1 ] u(t),


where Re = 2. Use the LQR optimal controller from Problem 12.11 above and simulate the nonlinear response to the initial conditions

x0 = [1 0]T , x0 = [0 0]T and x0 = [0 1]T .

Problem 12.13 Consider the nonlinear control system

d/dt [ x1(t) ]   [ −0.4    Re ] [ x1(t) ]
     [ x2(t) ] = [    ε  −0.2 ] [ x2(t) ]

  + √([x1(t)]² + [x2(t)]²) [ 0  −1 ] [ x1(t) ]   [ 0 ]
                           [ +1  0 ] [ x2(t) ] + [ 1 ] u(t),

where Re = 2 and ε = 0.05. Use the LQR optimal controller from Problem 12.11 above and simulate the nonlinear response to the initial conditions

x0 = [1 0]T , x0 = [0 0]T and x0 = [0 1]T .

Problem 12.14 Consider the system

ẋN(t) = ANxN(t) + BNu(t),

where for each N = 2, 4, 8, 16, ..., the N × N system matrix AN is given by

AN = (N + 1)² [ −2   1   0  · · ·  0   0 ]
              [  1  −2   1   0  · · ·  0 ]
              [  0   1  −2   1   ⋱    ⋮ ]
              [  ⋮   ⋮   ⋱   ⋱   ⋱    0 ]
              [  0   0  · · ·  1  −2   1 ]
              [  0   0  · · ·  0   1  −2 ]  (N × N)


and

BN = [ 1/(N + 1) ]
     [ 2/(N + 1) ]
     [ 3/(N + 1) ]
     [     ⋮     ]
     [ (N − 1)/(N + 1) ]
     [ N/(N + 1) ].

For N = 2, 4, 8, use Matlab™ to solve the LQR problem where Q = IN×N is the N × N identity matrix and R = 1. (This system comes from approximating an optimal control problem for heat transfer in a rod.)
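For readers using a different numerical toolbox, a minimal sketch (Python with SciPy, offered as an assumed alternative to Matlab) that builds AN and BN and solves the corresponding ARE is:

import numpy as np
from scipy.linalg import solve_continuous_are

def heat_lqr(N):
    # Tridiagonal finite-difference matrix A_N and input vector B_N of Problem 12.14.
    h2 = (N + 1) ** 2
    A = h2 * (np.diag(-2.0 * np.ones(N))
              + np.diag(np.ones(N - 1), 1)
              + np.diag(np.ones(N - 1), -1))
    B = (np.arange(1, N + 1) / (N + 1)).reshape(N, 1)
    Q = np.eye(N)
    R = np.array([[1.0]])
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)   # LQR feedback gain
    return P, K

for N in (2, 4, 8):
    P, K = heat_lqr(N)
    print(N, K)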


Bibliography

[1] B. M. Adams, H. T. Banks, H. D. Kwon, and H. T. Tran. Dynamic multidrug therapies for HIV: Optimal and STI control approaches. Mathematical Biosciences and Engineering, 1(2):223–241, 2004.

[2] R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975.

[3] J. H. Ahlberg, E. N. Nilson, and J. L. Walsh. The Theory of Splines and Their Applications. Academic Press, 1967.

[4] P. S. Aleksandrov, V. G. Boltyanskii, R. V. Gamkrelidze, and E. F. Mishchenko. Lev Semenovich Pontryagin (on his sixtieth birthday). Uspekhi Mat. Nauk, 23:187–196, 1968.

[5] B. D. Anderson and J. B. Moore. Linear Optimal Control. Prentice-Hall, Englewood Cliffs, 1971.

[6] B. D. O. Anderson and J. B. Moore. Optimal Control: Linear Quadratic Methods. Prentice-Hall, Inc., Upper Saddle River, 1990.

[7] P. J. Antsaklis and A. N. Michel. Linear Systems. Birkhauser, Boston, 2005.

[8] J. Appel, E. Cliff, M. Gunzburger, and A. Godfrey. Optimization-based design in high-speed flows. In CFD for Design and Optimization, ASME, New York, pages 61–68, 1995.


[9] E. Arian and S. Ta’asan. Multigrid one shot methods for optimal control problems: Infinite dimensional control. Technical Report ICASE Report No. 94–52, NASA, Langley Research Center, Hampton VA 23681–0001, 1994.

[10] A. Armaou and M. A. Demetriou. Optimal actuator/sensor placement for linear parabolic PDEs using spatial H2 norm. Chemical Engineering Science, 61(22):7351–7367, 2006.

[11] B. S. Attili and M. I. Syam. Efficient shooting method for solving two point boundary value problems. Chaos, Solitons and Fractals, 35(5):895–903, 2008.

[12] J. P. Aubin. Approximation of Elliptic Boundary-Value Problems. RE Krieger Pub. Co., 1972.

[13] S. Badrinarayanan and N. Zabaras. A sensitivity analysis for the optimal design of metal-forming processes. Computer Methods in Applied Mechanics and Engineering, 129(4):319–348, 1996.

[14] H. T. Banks, S. Hu, T. Jang, and H. D. Kwon. Modelling and optimal control of immune response of renal transplant recipients. Journal of Biological Dynamics, 6(2):539–567, 2012.

[15] R. G. Bartle. The Elements of Real Analysis: Second Edition. John Wiley & Sons, New York, 1976.

[16] T. Basar and P. Bernhard. H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. Birkhauser, 1995.

[17] T. Basar and P. Bernhard. H-Infinity Optimal Control and Related Minimax Design Problems: a Dynamic Game Approach. Birkhauser, Boston, 2008.

[18] J. Z. Ben-Asher. Optimal Control Theory with Aerospace Applications. AIAA, 2010.


[19] J. A. Bennett and M. E. Botkin. Structural shape optimization with geometric description and adaptive mesh refinement. AIAA Journal, 23(3):458–464, 1985.

[20] A. Bensoussan and A. Friedman. Nonlinear variational inequalities and differential games with stopping times. Journal of Functional Analysis, 16(3):305–352, 1974.

[21] A. Bensoussan and J. L. Lions. Applications of Variational Inequalities in Stochastic Control, volume 12. North-Holland, 1982.

[22] A. Bensoussan, G. Da Prato, M. C. Delfour, and S. K. Mitter. Representation and Control of Infinite Dimensional Systems - I. Systems & Control: Foundations & Applications. Birkhauser, Boston, 1992.

[23] A. Bensoussan, G. Da Prato, M. C. Delfour, and S. K. Mitter. Representation and Control of Infinite Dimensional Systems - II. Systems & Control: Foundations & Applications. Birkhauser, Boston, 1992.

[24] L. D. Berkovitz. Optimal Control Theory, volume 12 of Applied Mathematical Sciences. Springer-Verlag, New York, 1974.

[25] J. T. Betts. Practical Methods for Optimal Control Using Nonlinear Programming. Advances in Design and Control. Society for Industrial and Applied Mathematics, Philadelphia, 2001.

[26] F. Billy, J. Clairambault, and O. Fercoq. Optimisation of cancer drug treatments using cell population dynamics. Mathematical Methods and Models in Biomedicine, pages 263–299, 2012.

[27] G. A. Bliss. Calculus of Variations. Carus Mathematical Monographs, Chicago, 1925.

[28] G. A. Bliss. Normality and abnormality in the calculus of variations. Trans. Amer. Math. Soc., 43:365–376, 1938.


[29] G. A. Bliss. Lectures on the Calculus of Variations. Phoenix Science Series, University of Chicago Press, Chicago, 1963.

[30] V. G. Boltyanski. The maximum principle - how it came to be? Report 526. Technical report, Mathematisches Institut, 1994.

[31] O. Bolza. Lectures on the Calculus of Variations. University of Chicago Press, Chicago, 1904.

[32] J. Borggaard. The Sensitivity Equation Method for Optimal Design. PhD thesis, Virginia Tech, Blacksburg, VA, December 1994.

[33] J. Borggaard. On the presence of shocks in domain optimization of Euler flows. In M. Gunzburger, editor, Flow Control, volume 68 of Proceedings of the IMA. Springer-Verlag, 1995.

[34] J. Borggaard and J. Burns. A sensitivity equation approach to optimal design of nozzles. In Proc. 5th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 232–241, 1994.

[35] J. Borggaard and J. Burns. A sensitivity equation approach to shape optimization in fluid flows. In M. Gunzburger, editor, Flow Control, volume 68 of Proceedings of the IMA. Springer-Verlag, 1995.

[36] J. Borggaard and J. Burns. Asymptotically consistent gradients in optimal design. In N. Alexandrov and M. Hussaini, editors, Multidisciplinary Design Optimization: State of the Art, pages 303–314, Philadelphia, 1997. SIAM Publications.

[37] J. Borggaard and J. Burns. A PDE sensitivity equation method for optimal aerodynamic design. Journal of Computational Physics, 136(2):366–384, 1997.

[38] J. Borggaard, J. Burns, E. Cliff, and M. Gunzburger. Sensitivity calculations for a 2D, inviscid, supersonic forebody problem. In H. T. Banks, R. Fabiano, and K. Ito, editors, Identification and Control of Systems Governed by Partial Differential Equations, pages 14–24, Philadelphia, 1993. SIAM Publications.

[39] J. Borggaard, J. Burns, and L. Zietsman. On using LQG performance metrics for sensor placement. In American Control Conference (ACC), 2011, pages 2381–2386. IEEE, 2011.

[40] J. Borggaard, E. Cliff, J. Burkardt, M. Gunzburger, H. Kim, H. Lee, J. Peterson, A. Shenoy, and X. Wu. Algorithms for flow control and optimization. In J. Borggaard, J. Burkardt, M. Gunzburger, and J. Peterson, editors, Optimal Design and Control, pages 97–116. Birkhauser, 1995.

[41] J. Borggaard and D. Pelletier. Computing design sensitivities using an adaptive finite element method. In Proceedings of the 27th AIAA Fluid Dynamics Conference, 1996.

[42] J. Borggaard and D. Pelletier. Observations in adaptive refinement strategies for optimal design. In J. Borggaard, J. Burns, E. Cliff, and S. Schreck, editors, Computational Methods for Optimal Design and Control, pages 59–76. Birkhauser, 1998.

[43] J. Borggaard and D. Pelletier. Optimal shape design in forced convection using adaptive finite elements. In Proc. 36th AIAA Aerospace Sciences Meeting and Exhibit, 1998. AIAA Paper 98-0908.

[44] F. Brezzi and M. Fortin. Mixed and Hybrid Finite Elements. Springer-Verlag, New York, 1991.

[45] R. W. Brockett. Finite Dimensional Linear Systems. Wiley, New York, 1970.

[46] A. E. Bryson and Y. C. Ho. Applied Optimal Control. John Wiley & Sons, New York, 1975.

[47] A. E. Bryson Jr. Optimal control - 1950 to 1985. Control Systems Magazine, IEEE, 16(3):26–33, 1996.


[48] C. Buck. Advanced Calculus. McGraw Hill, New York, 1978.

[49] J. A. Burns, E. M. Cliff, C. Rautenberg, and L. Zietsman. Optimal sensor design for estimation and optimization of PDE systems. In American Control Conference (ACC), 2010, pages 4127–4132. IEEE, 2010.

[50] J. A. Burns and B. B. King. Optimal sensor location for robust control of distributed parameter systems. In Proceedings of the 33rd IEEE Conference on Decision and Control, volume 4, pages 3967–3972. IEEE, 1994.

[51] J. A. Burns, B. B. King, and Y. R. Ou. A computational approach to sensor/actuator location for feedback control of fluid flow systems. In J. D. Paduano, editor, Sensing, Actuation, and Control in Aeropropulsion, volume 2494 of Proc. International Society for Optical Engineering, pages 60–69, 1995.

[52] D. Bushaw. Optimal discontinuous forcing terms. In Contributions to the Theory of Non-linear Oscillations, pages 29–52, Princeton, 1958. Princeton University Press.

[53] C. Caratheodory. Die Methode der geodätischen Äquidistanten und das Problem von Lagrange. Acta Mathematica, 47(3):199–236, 1926.

[54] C. Caratheodory. Calculus of Variations and Partial Differential Equations of First Order. American Mathematical Society, Providence, 1999.

[55] P. G. Ciarlet. The Finite Element Method for Elliptic Problems. North-Holland, Amsterdam, 1978.

[56] P. G. Ciarlet. Introduction to Numerical Linear Algebra and Optimisation. Cambridge Press, Cambridge, 1989.

[57] F. H. Clarke. Necessary Conditions for Nonsmooth Problems in Optimal Control and the Calculus of Variations. PhD thesis, University of Washington, 1973.


[58] E. A. Coddington and N. Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, New York, 1972.

[59] R. Courant and K. L. Friedrichs. Supersonic Flow and Shock Waves. Interscience Publishers, New York, 1948.

[60] M. G. Crandall, H. Ishii, and P. L. Lions. User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc., 27(1):1–67, 1992.

[61] M. G. Crandall and P. L. Lions. Viscosity solutions of Hamilton-Jacobi equations. Trans. Amer. Math. Soc., 277(1):1–42, 1983.

[62] J. J. Crivelli, J. Foldes, P. S. Kim, and J. R. Wares. A mathematical model for cell cycle-specific cancer virotherapy. Journal of Biological Dynamics, 6(sup1):104–120, 2012.

[63] R. F. Curtain and H. Zwart. An Introduction to Infinite-Dimensional Linear Systems Theory. Springer, New York-Berlin-Heidelberg, 1995.

[64] B. Dacorogna. Direct Methods in the Calculus of Variations, volume 78. Springer, 2007.

[65] B. Dacorogna. Introduction to the Calculus of Variations. Imperial College Press, London, 2nd edition, 2009.

[66] N. Darivandi, K. Morris, and A. Khajepour. LQ optimal actuator location in structures. In American Control Conference (ACC), 2012, pages 646–651. IEEE, 2012.

[67] M. C. Delfour and J. P. Zolesio. Shape sensitivity analysis via MinMax differentiability. SIAM Journal of Control and Optimization, 26(4), July 1988.

[68] M. C. Delfour and J. P. Zolesio. Velocity method and Lagrangian formulation for the computation of the shape Hessian. SIAM Journal of Control and Optimization, 29(6):1414–1442, November 1991.


[69] M. A. Demetriou. Integrated actuator-sensor placement and hybrid controller design of flexible structures under worst case spatiotemporal disturbance variations. Journal of Intelligent Material Systems and Structures, 15(12):901, 2004.

[70] M. A. Demetriou and J. Borggaard. Optimization of a joint sensor placement and robust estimation scheme for distributed parameter processes subject to worst case spatial disturbance distributions. In Proceedings of the 2004 American Control Conference, volume 3, pages 2239–2244. IEEE, 2004.

[71] P. Dorato, V. Cerone, and C. T. Abdallah. Linear-Quadratic Control. Prentice Hall, Englewood Cliffs, 1995.

[72] J. C. Doyle. Guaranteed margins for LQG regulators. IEEE Transactions on Automatic Control, 23(4):756–757, 1978.

[73] J. C. Doyle, B. A. Francis, and A. R. Tannenbaum. Feedback Control Theory. Macmillan Publishing Company, New York, 1992.

[74] M. Engelhart, D. Lebiedz, and S. Sager. Optimal control for selected cancer chemotherapy ODE models: A view on the potential of optimal schedules and choice of objective function. Mathematical Biosciences, 229(1):123–134, 2011.

[75] L. C. Evans. On solving certain nonlinear partial differential equations by accretive operator methods. Israel Journal of Mathematics, 36(3):225–247, 1980.

[76] L. C. Evans. Partial Differential Equations. Graduate Studies in Mathematics: Vol. 19. American Mathematical Society, Providence, 1998.

[77] G. M. Ewing. Calculus of Variations with Applications. Dover Publications, New York, 1985.


[78] A. L. Faulds and B. B. King. Sensor location in feedback control of partial differential equation systems. In Proceedings of the 2000 IEEE International Conference on Control Applications, pages 536–541. IEEE, 2000.

[79] W. H. Fleming and R. W. Rishel. Deterministic and Stochastic Optimal Control. Springer, New York, 1975.

[80] A. R. Forsyth. Calculus of Variations. Cambridge University Press, New York, 1927.

[81] B. A. Francis. A Course in H∞-Control Theory, Lecture Notes in Control, Vol. 88. Springer-Verlag, New York, 1987.

[82] P. D. Frank and G. R. Shubin. A comparison of optimization-based approaches for a model computational aerodynamics design problem. Journal of Computational Physics, 98:74–89, 1992.

[83] R. H. Gallagher and O. C. Zienkiewicz, editors. Optimum Structural Design: Theory and Application. John Wiley & Sons, New York, 1973.

[84] R. V. Gamkrelidze. Discovery of the maximum principle. Journal of Dynamical and Control Systems, 5(4):437–451, 1999.

[85] R. V. Gamkrelidze. Discovery of the maximum principle. Mathematical Events of the Twentieth Century, pages 85–99, 2006.

[86] I. M. Gelfand and S. V. Fomin. Calculus of Variations. Prentice-Hall, Inc., Englewood Cliffs, 1963.

[87] I. M. Gelfand and S. V. Fomin. Calculus of Variations. Prentice-Hall, Inc., Englewood Cliffs, 1963.

[88] I. M. Gelfand, G. E. Shilov, E. Saletan, N. I. A. Vilenkin, and M. I. Graev. Generalized Functions, volume 1. Academic Press, New York, 1968.


[89] M. Giaquinta. Multiple Integrals in the Calculus of Variations and Nonlinear Elliptic Systems (AM-105), volume 105. Princeton University Press, 1983.

[90] A. Godfrey. Using sensitivities for flow analysis. In J. Borggaard, J. Burns, E. Cliff, and S. Schreck, editors, Computational Methods for Optimal Design and Control, pages 181–196. Birkhauser, 1998.

[91] H. H. Goldstine. A History of the Calculus of Variations from the 17th through the 19th Century. Springer-Verlag, 1980.

[92] G. H. Golub and C. F. Van Loan. Matrix Computations: Third Edition. Johns Hopkins University Press, Baltimore, 1996.

[93] L. M. Graves. The Derivative as Independent Function in the Calculus of Variations. PhD thesis, University of Chicago, Department of Mathematics, 1924.

[94] L. M. Graves. A transformation of the problem of Lagrange in the calculus of variations. Transactions of the American Mathematical Society, 35(3):675–682, 1933.

[95] L. M. Graves. Calculus of Variations and Its Applications. McGraw-Hill, New York, 1958.

[96] M. Gunzburger. Finite Element Methods for Viscous Incompressible Flows. Academic Press, 1989.

[97] H. Halkin. A generalization of LaSalle’s bang-bang principle. Journal of the Society for Industrial & Applied Mathematics, Series A: Control, 2(2):199–202, 1964.

[98] J. Haslinger and P. Neittaanmaki. Finite Element Approximation for Optimal Shape Design: Theory and Application. John Wiley & Sons, 1988.


[99] E. J. Haug, K. K. Choi, and V. Komkov. Design Sensitivity Analysis of Structural Systems, volume 177 of Mathematics in Science and Engineering. Academic Press, Orlando, FL, 1986.

[100] H. Hermes and J. P. LaSalle. Functional Analysis and Time Optimal Control. Number 56 in Mathematics in Science and Engineering. Academic Press, New York, 1969.

[101] M. R. Hestenes. Calculus of Variations and Optimal Control Theory. John Wiley & Sons, New York, 1966.

[102] M. R. Hestenes. Optimization Theory: The Finite Dimensional Case. Wiley, 1975.

[103] R. Holsapple, R. Venkataraman, and D. Doman. A modified simple shooting method for solving two-point boundary-value problems. In Proc. of the IEEE Aerospace Conference, Big Sky, MT, 2003.

[104] R. Holsapple, R. Venkataraman, and D. Doman. New, fast numerical method for solving two-point boundary-value problems. Journal of Guidance, Control, and Dynamics, 27(2):301–303, 2004.

[105] A. Isidori. Nonlinear Control Systems. Springer, 1995.

[106] M. Itik, M. U. Salamci, and S. P. Banks. Optimal control of drug therapy in cancer treatment. Nonlinear Analysis: Theory, Methods & Applications, 71(12):e1473–e1486, 2009.

[107] V. G. Ivancevic and T. T. Ivancevic. Applied Differential Geometry: A Modern Introduction. World Scientific Publishing Company, 2007.

[108] A. Jameson. Aerodynamic design via control theory. Journal of Scientific Computing, 3:233–260, 1988.

[109] A. Jameson and J. Reuther. Control theory based airfoil design using the Euler equations. In Proc. 5th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 206–222, 1994.

[110] T. Kailath. Linear Systems. Prentice-Hall, 1980.

[111] J. H. Kane and S. Saigal. Design sensitivity analysis of solids using BEM. Journal of Engineering Mechanics, 114:1703–1722, 1988.

[112] H. B. Keller. Numerical Methods for Two-Point Boundary-Value Problems. Blaisdell, 1968.

[113] C. T. Kelley and E. W. Sachs. Quasi-Newton methods and unconstrained optimal control problems. SIAM Journal on Control and Optimization, 25:1503, 1987.

[114] H. W. Knobloch, A. Isidori, and D. Flockerzi. Topics in Control Theory. Birkhäuser Verlag, 1993.

[115] H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley-Interscience, New York, 1972.

[116] S. Lang. Fundamentals of Differential Geometry, volume 160. Springer-Verlag, 1999.

[117] U. Ledzewicz, M. Naghnaeian, and H. Schättler. Optimal response to chemotherapy for a mathematical model of tumor–immune dynamics. Journal of Mathematical Biology, 64(3):557–577, 2012.

[118] U. Ledzewicz and H. Schättler. Multi-input optimal control problems for combined tumor anti-angiogenic and radiotherapy treatments. Journal of Optimization Theory and Applications, pages 1–30, 2012.

[119] E. B. Lee and L. Markus. Foundations of Optimal Control Theory, The SIAM Series in Applied Mathematics. John Wiley & Sons, 1967.

[120] G. Leitmann. The Calculus of Variations and Optimal Control. Springer, Berlin, 1981.


[121] N. Levinson. Minimax, Liapunov and “bang-bang.” Journal of Differential Equations, 2:218–241, 1966.

[122] D. Liberzon. Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, Princeton, 2012.

[123] J. L. Lions. Optimal Control of Systems Governed by Partial Differential Equations, vol. 170 of Grundlehren Math. Wiss. Springer-Verlag, Berlin, 1971.

[124] J. L. Lions. Optimal Control of Systems Governed by Partial Differential Equations, vol. 170 of Grundlehren Math. Wiss. Springer-Verlag, 1971.

[125] J.-L. Lions. Control of Distributed Singular Systems. Gauthier-Villars, Kent, 1985.

[126] J. L. Lions and E. Magenes. Non-homogeneous Boundary Value Problems and Applications, Vol. 1, 2. Springer, 1972.

[127] J. L. Lions and E. Magenes. Non-homogeneous Boundary Value Problems and Applications, Volume 3 (translation of Problèmes aux limites non homogènes et applications). 1973.

[128] J. L. Lions, E. Magenes, and P. Kenneth. Non-homogeneous Boundary Value Problems and Applications, volume 1. Springer-Verlag, Berlin, 1973.

[129] D. G. Luenberger. Observing the state of a linear system. IEEE Transactions on Military Electronics, 8(2):74–80, 1964.

[130] D. G. Luenberger. Observers for multivariable systems. IEEE Transactions on Automatic Control, 11(2):190–197, 1966.

[131] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, New York, 1969.


[132] D. G. Luenberger. An introduction to observers. IEEE Transactions on Automatic Control, 16(6):596–602, 1971.

[133] E. J. McShane. The calculus of variations from the beginning through optimal control theory. In W. G. Kelley, A. B. Schwarzkopf, and S. B. Eliason, editors, Optimal Control and Differential Equations, pages 3–49. Academic Press, New York, 1978.

[134] E. J. McShane. The calculus of variations from the beginning through optimal control theory. SIAM Journal on Control and Optimization, 27:916, 1989.

[135] M. Mesterton-Gibbons. A Primer on the Calculus of Variations and Optimal Control Theory, volume 50. American Mathematical Society, Providence, 2009.

[136] A. Miele. Theory of Optimum Aerodynamic Shapes. Academic Press, New York, 1965.

[137] A. Miele and H. Y. Huang. Missile shapes of minimum ballistic factor. Journal of Optimization Theory and Applications, 1(2):151–164, 1967.

[138] R. K. Miller and A. N. Michel. Ordinary Differential Equations. Dover Publications, New York, 2007.

[139] K. Morris. Linear-quadratic optimal actuator location. IEEE Transactions on Automatic Control, 56(1):113–124, 2011.

[140] J. M. Murray. Optimal control for a cancer chemotherapy problem with general growth and loss functions. Mathematical Biosciences, 98(2):273–287, 1990.

[141] J. M. Murray. Some optimal control problems in cancer chemotherapy with a toxicity limit. Mathematical Biosciences, 100(1):49–67, 1990.


[142] R. Narducci, B. Grossman, and R. Haftka. Design sensitivity algorithms for an inverse design problem involving a shock wave. Inverse Problems in Engineering, 2:49–83, 1995.

[143] M. Z. Nashed. Some remarks on variations and differentials. American Mathematical Monthly, pages 63–76, 1966.

[144] L. W. Neustadt. Optimization: A Theory of Necessary Conditions. Princeton University Press, 1976.

[145] G. Newbury. A Numerical Study of a Delay Differential Equation Model for Breast Cancer. PhD thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, 2007.

[146] K. O. Okosun, O. D. Makinde, and I. Takaidza. Impact of optimal control on the treatment of HIV/AIDS and screening of unaware infectives. Applied Mathematical Modelling, 2012.

[147] B. O’Neill. Elementary Differential Geometry. Academic Press, second edition, 1997.

[148] J. C. Panetta and K. R. Fister. Optimal control applied to cell-cycle-specific cancer chemotherapy. SIAM Journal on Applied Mathematics, 60(3):1059–1072, 2000.

[149] H. J. Pesch and M. Plail. The maximum principle of optimal control: A history of ingenious ideas and missed opportunities. Control and Cybernetics, 38(4A):973–995, 2009.

[150] O. Pironneau. Optimal Shape Design for Elliptic Systems. Springer-Verlag, 1983.

[151] B. N. Pshenichny. Convex Analysis and Optimization. Nauka, Moscow, 1980.

[152] C. W. Ray and B. A. Batten. Sensor placement for flexible wing shape control. In American Control Conference (ACC), 2012, pages 1430–1435. IEEE, 2012.


[153] W. T. Reid. Ordinary Differential Equations. John Wiley & Sons, New York, 1971.

[154] W. T. Reid. A historical note on the maximum principle. SIAM Review, 20:580, 1978.

[155] I. Rhee and J. L. Speyer. A game theoretic controller and its relationship to H∞ and linear-exponential-Gaussian synthesis. In Proceedings of the 28th IEEE Conference on Decision and Control, pages 909–915. IEEE, 1989.

[156] A. C. Robinson. A survey of optimal control of distributed-parameter systems. Automatica, 7(3):371–388, 1971.

[157] D. L. Russell. Mathematics of Finite Dimensional Control Systems: Theory and Design. M. Dekker, 1979.

[158] H. Sagan. Introduction to the Calculus of Variations. Courier Dover Publications, 1992.

[159] M. H. Schultz. Spline Analysis. Prentice-Hall, 1973.

[160] L. Schumaker. Spline Functions: Basic Theory. Cambridge University Press, 2007.

[161] L. Schwartz. Théorie des distributions, Vol. II. Hermann, Paris, 1959.

[162] L. F. Shampine, J. Kierzenka, and M. W. Reichelt. Solving boundary value problems for ordinary differential equations in MATLAB with bvp4c. Manuscript, available at ftp://ftp.mathworks.com/pub/doc/papers/bvp, 2000.

[163] L. L. Sherman, A. C. Taylor III, L. L. Green, P. A. Newman, G. J.-W. Hou, and V. M. Korivi. First- and second-order aerodynamic sensitivity derivatives via automatic differentiation with incremental iterative methods. In Proc. 5th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 87–120, 1994.


[164] D. B. Silin. On the variation and Riemann integrability of an optimal control in linear systems. Dokl. Akad. Nauk SSSR, 257(3):548–550, 1981. English translation in Soviet Math. Doklady, 23(2):309–311, 1981.

[165] M. H. N. Skandari, H. R. Erfanian, A. V. Kamyad, and S. Mohammadi. Optimal control of bone marrow in cancer chemotherapy. European Journal of Experimental Biology, 2(3):562–569, 2012.

[166] D. R. Smith. Variational Methods in Optimization. Prentice-Hall, Englewood Cliffs, 1974.

[167] D. R. Smith. Variational Methods in Optimization. Courier Dover Publications, 1998.

[168] D. Stewart. Numerical Methods for Accurate Computation of Design Sensitivities. PhD thesis, Virginia Tech, 1998.

[169] J. Stoer, R. Bulirsch, W. Gautschi, and C. Witzgall. Introduction to Numerical Analysis. Springer-Verlag, 2002.

[170] G. Strang and G. Fix. An Analysis of the Finite Element Method. Prentice-Hall, Englewood Cliffs, 1973.

[171] G. Strang and G. Fix. An Analysis of the Finite Element Method. Wellesley-Cambridge Press, Wellesley, 1988.

[172] H. J. Sussmann and J. C. Willems. 300 years of optimal control: from the brachystochrone to the maximum principle. IEEE Control Systems Magazine, 17(3):32–44, 1997.

[173] G. W. Swan. Role of optimal control theory in cancer chemotherapy. Mathematical Biosciences, 101(2):237–284, 1990.

[174] S. Ta’asan. One shot methods for optimal control of distributed parameter systems I: Finite dimensional control. Technical Report ICASE Report No. 91-2, NASA Langley Research Center, Hampton, VA, 1991.


[175] M. Tahk and J. Speyer. Modeling of parameter variations and asymptotic LQG synthesis. IEEE Transactions on Automatic Control, 32(9):793–801, 1987.

[176] A. E. Taylor. General Theory of Functions and Integration. Dover Publications, 2010.

[177] A. C. Taylor III, G. W. Hou, and V. M. Korivi. A methodology for determining aerodynamic sensitivity derivatives with respect to variation of geometric shape. In Proc. AIAA/ASME/ASCE/AHS/ASC 32nd Structures, Structural Dynamics, and Materials Conference, Baltimore, MD, April 1991. AIAA paper 91-1101.

[178] A. C. Taylor III, G. W. Hou, and V. M. Korivi. Sensitivity analysis, approximate analysis and design optimization for internal and external viscous flows. In Proc. AIAA Aircraft Design Systems and Operations Meeting, 1991. AIAA paper 91-3083.

[179] R. Temam. Navier-Stokes Equations: Theory and Numerical Analysis, volume 2 of Studies in Mathematics and its Applications. North-Holland Publishing Company, New York, 1979.

[180] J. L. Troutman. Variational Calculus with Elementary Convexity. Springer-Verlag, 1983.

[181] D. Ucinski. Optimal sensor location for parameter estimation of distributed processes. International Journal of Control, 73:1235–1248, 2000.

[182] F. A. Valentine. Convex Sets. McGraw-Hill, New York, 1964.

[183] F. Y. M. Wan. Introduction to the Calculus of Variations and Its Applications. Chapman & Hall/CRC, 1995.

[184] T. J. Willmore. An Introduction to Differential Geometry. Dover Publications, 2012.


[185] W. Y. Yang, W. Cao, T. S. Chung, and J. Morris. Applied Numerical Methods Using MATLAB. Wiley-Interscience, 2005.

[186] L. C. Young. Lectures on the Calculus of Variations and Optimal Control Theory. Chelsea, 1969.

[187] K. Zhou, J. C. Doyle, and K. Glover. Robust and Optimal Control. Prentice Hall, Englewood Cliffs, NJ, 1996.


Index

Bellman, Richard F., 14
Bernoulli, James, 11
Bernoulli, John, 8, 11
Bliss, Gilbert A., 14
Boltyanskii, Vladimir G., 16
Bolza, Oskar, 13
Boundary Condition
    Natural, 218, 220
Boundary Value Problem
    Strong Solution, 77
    Weak Solution, 77
Brachistochrone Curve, 7
Bushaw, Donald W., 13

Carathéodory, Constantin, 14
Condition
    Erdmann Corner, 109, 139, 206
    Legendre, 144
    Strengthened Legendre, 144
    Transversality, 227, 349, 377, 443, 445
Conjugate Point, 151, 210
Conjugate Value, 151, 210
Constraint
    Control, 314, 341, 374
    Finite, 263
    Isoperimetric, 255
Control
    Bang-Bang, 327
    Feedback, 331
    Nominal, 474
    Optimal, 314, 317, 319, 343, 375, 379
    Steering, 75
Controllable
    Completely, 460
Controller
    Admissible, 314, 317, 318, 342, 374, 378
    Min-Max, 511
Corner, 62, 110

Darboux, Jean G., 14
Derivative
    Left-Hand, 31
    Partial, 30
    Right-Hand, 31
    Second Weak, 119
    Weak, 118
Differential
    Weak, 271
    Weak Second, 271
Direction
    Admissible, 68
    Right Admissible, 68
Distance
    Hausdorff, 321
    Point to Set, 320
Du Bois-Reymond, Paul D. G., 12

e.f., 62
Energy
    Kinetic, 289
    Potential, 289
    Total, 289
Equation
    Adjoint, 347
    Algebraic Riccati, 506
    Continuous Sensitivity, 478
    Euler’s
        Differential Form, 221
        Integral Form, 221
    Heat, 296
    Hilbert’s, 185
    Jacobi’s
        Differential Form, 149
        Integral Form, 149, 210
    Riccati Differential, 419, 481, 484, 505
    State, 318, 342, 374, 378
    Stochastic Differential, 502
Estimator
    State, 490
Euler’s Equation
    Differential Form, 108, 138
    Integral Form, 108, 138
Euler, Leonhard, 11
Ewing, George M., 15
Excess Function Expansion, 194
Extremal, 109, 205, 221
    Non-singular, 112, 206
    Regular, 112, 144, 206
    Secondary, 149, 210

Field
    Conservative, 288
    of Extremals, 188
Figurative, 110
Filter
    Kalman, 494, 497, 498, 503
Finite Element Method, 26, 76, 295
Formula
    Variation of Parameters, 320, 462
Function
    Constraint, 42, 312
    Continuous, 33
    Control, 341, 373
    Convex, 276
        Strictly, 276
    Cost, 42, 312
    Differentiable, 34
    Hat, 79, 298
    Locally Piecewise Smooth, 225
    Measurable, 499
    Piecewise Continuous, 57
    Piecewise Smooth, 59
    Slope, 187
    Smooth, 33
    Weierstrass Excess, 155, 192, 206
Function Spaces
    C^k(I; R^1), 33
    C^k(I; R^n), 34
    PWC(t_0, +∞), 63
    PWC(t_0, t_1), 62
    PWC(t_0, t_1; R^n), 245
    PWS(t_0, +∞), 63, 225
    PWS(t_0, t_1), 62
    PWS(t_0, t_1; R^n), 246
    PWS^p(t_0, t_1), 242
    PWS_0(t_0, t_1; R^n), 246
    V_0(t_0, t_1), 95
    V_0(t_0, t_1; R^n), 246
    V^p_0(t_0, t_1), 242
    V_L(t_0, t_1), 217
Functional, 93, 216, 315
    Cost, 274, 317, 319, 343, 374, 378
    Farmer’s Allocation Cost, 395
    General High Order, 253
    Higher Order, 241
    Vector Form, 240, 250

Galilei, Galileo, 11
Gamkrelidze, Revaz V., 16
Gramian
    Controllability, 462
    Observability, 462
Graves, Lawrence M., 13

Hamilton, William R., 12
Hamiltonian, 293
    Augmented, 347
    Time Dependent Augmented, 441
Hestenes, Magnus R., 13
Hilbert Integral, 190
Hilbert, David, 13

Inequality
    Gronwall-Reid-Bellman, 15
    Triangle, 30
Integral
    Action, 289
Integrand
    Regular, 206

Jacobi, Karl G. J., 12, 146

Kneser, Adolf, 12

Lagrange, Joseph L., 11
Lagrangian, 44, 258, 289
LaSalle, Joseph P., 13
Legendre, Adrien-Marie, 12
Leibniz’s Formula, 104, 213
Lemma
    Du Bois-Reymond, 95
    Fundamental
        Calculus of Variations, 95
Limit
    Left-Hand, 31
    Right-Hand, 31

Matrix
    Control Gain, 491
    Controllability, 468
    Covariance, 502
    Mass, 82, 300
    Non-singular, 84
    Observability, 468
    Observer Gain, 491
    Stable, 471
    Stiffness, 82, 300
    Transition, 462
Mayer, Christian G. A., 15
McShane, Edward J., 13
Measure, 498
    Finite, 499
    Probability, 499
    Sigma-Finite, 499
Metric
    Strong, 64
    Weak, 65, 133
Minimizer
    Normal, 45, 259
Minimum
    Global, 35, 134, 205
    Local, 35
    Proper Local, 35
    Strong Local, 134, 205
    Weak Local, 134, 205
Mischenko, Evgenii F., 16
Morse, Harold C. M., 15

Necessary Condition
    Abstract, 267
    Euler, 107, 138, 205, 220
    Jacobi, 146, 151
    Legendre, 143, 144
    Second Erdmann Corner, 158
    Second Order, 139
    Weak Weierstrass, 176
    Weierstrass, 155, 207
Neighborhood
    Delta, 30
    Strong, 65, 132, 133
Newton’s Laws, 287
    First, 287
    Second, 287
    Third, 287
Newton, Sir Isaac, 11
Noise
    Sensor, 460, 502
    White, 460, 502
Norm
    p-norm, 29
    Euclidean, 29
    General, 30

Observable
    Completely, 461
Observer, 490
    Luenberger, 494
    Optimal, 498
Output
    Controlled, 507
    Sensed, 460, 491

Perron’s Paradox, 40
Point
    Conjugate, 151, 210
    Core, 270
    Internal, 270
    Radial, 270
Pontryagin, Lev S., 14, 16
Principle
    Hamilton’s, 291
    Least Action, 288, 289
    Maximum, 341, 343, 347, 436
        Nonautonomous, 442
    Pontryagin Maximum, 347
        Nonautonomous, 442
    Stationary Action, 291
    Weierstrass Maximum, 157, 208
Problem
    Accessory Minimum, 147, 210
    Brachistochrone, 17, 69
    Bushaw, 333, 351
    Double Pendulum, 20
    Equality Constrained Optimization, 42, 312
    Estimation, 503
        Optimal, 503
    Farmer’s Allocation, 393
    Fixed-Time Optimal Control, 373
    Free Endpoint, 215, 216
    General Optimization, 265
    Higher Order, 238, 239
    Isoperimetric, 42, 255
    Linear Quadratic Control, 404, 414, 446
    Linear Quadratic Gaussian, 509
    LQR, 506
    Minimum Surface Area, 72
    Newton’s Minimum Drag, 3, 8
    Non-singular, 110, 206
    Point to Curve, 225
    Queen Dido, 7
    Regular, 110, 206
    Relaxed, 41
    River Crossing, 73
    Rocket Sled, 21, 313
    Secondary Minimum, 147
    Simplest
        Calculus of Variations, 93, 316
        Higher Order Form, 241, 253
        Optimal Control, 341
        Vector Form, 240, 249, 250
    Time Optimal, 74
    Vector, 238
Process
    Stochastic, 500

Reid, William T., 15
Riccati
    Differential Equation, 422
Riccati, Jacopo F., 11

Sensitivity
    Raw, 475
Sensitivity Equation Method, 9
Sequence
    Minimizing, 36
Set
    Attainable, 320
    Constraint, 54, 312
        Equality, 42
    Convex, 275
    Initial, 341, 373
    Level, 54
    Measurable, 499
    Terminal, 341, 373
Sigma-algebra, 498
Solution
    Strong, 77, 296
    Weak, 77, 78, 297
Space
    Event, 499
    Measurable, 498
    Measure, 499
    Probability, 499
    Sample, 499
Stability
    Asymptotic, 471
    Exponential, 471
State
    Adjoint, 347
    Augmented, 343
Sufficient Condition
    Abstract, 275
    Sufficient Condition (1), 196, 211
    Sufficient Condition (2), 196, 212
Support Plane, 322
System
    Adjoint, 327
    Augmented Control, 345
    Autonomous, 475
    Closed-Loop, 472
    Co-state, 327
    Control, 342
    Detectable, 472
    Disturbance, 502
    Dual, 469
    Linearized, 475
    Stabilizable, 472

Theorem
    Fundamental Field, 194
    Fundamental Sufficiency, 193
    Hilbert’s Differentiability, 111, 139, 206
    Implicit Function, 229
    Inverse Function, 49
    Inverse Mapping, 48
    Lagrange Multiplier, 42–44, 48, 257
Trajectory
    Nominal, 474

Valentine, Frederick A., 15
Variable
    Gaussian Random, 503
    Random, 499, 503
Variation
    Admissible, 270
    First, 68, 103, 213, 271
    Gateaux, 271
    Second, 69, 271
    Second Gateaux, 271

Weierstrass, Karl T. W., 12

Young, Lawrence C., 13

Zermelo, Ernst F. F., 15
