Technische Universität München
Department of Electrical Engineering and Information Technology
Institute for Electronic Design Automation
Optimization Methods for Circuit Design
Compendium
H. Graeb
Version 2.8 (WS 12/13): Michael Zwerger
Version 2.0 - 2.7 (WS 08/09 - SS 12): Michael Eick
Version 1.0 - 1.2 (SS 07 - SS 08): Husni Habal
Presentation follows:
H. Graeb, Analog Design Centering and Sizing, Springer, 2007.
R. Fletcher, Practical Methods of Optimization, John Wiley & Sons, 2nd Edition, 2000.
Status: October 12, 2012
Copyright 2008 - 2012
Optimization Methods for Circuit Design - Compendium
H. Graeb
Technische Universität München
Institute for Electronic Design Automation
Arcisstr. 21
80333 Munich, Germany
[email protected]
Tel.: +49-89-289-23679
All rights reserved.
Contents

1 Introduction
1.1 Parameters, performance, simulation
1.2 Performance specification
1.3 Minimum, minimization
1.4 Unconstrained optimization
1.5 Constrained optimization
1.6 Classification of optimization problems
1.7 Classification of constrained optimization problems
1.8 Structure of an iterative optimization process
1.8.1 ... without constraints
1.8.2 ... with constraints
1.8.3 Trust-region approach

2 Optimality conditions
2.1 Optimality conditions – unconstrained optimization
2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem
2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem
2.1.3 Sufficient and necessary conditions for the second-order derivative ∇²f(x*) to be positive definite
2.2 Optimality conditions – constrained optimization
2.2.1 Feasible descent direction r
2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem
2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem
2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

3 Unconstrained optimization
3.1 Univariate unconstrained optimization, line search
3.1.1 Wolfe-Powell conditions
3.1.2 Backtracking line search
3.1.3 Bracketing
3.1.4 Sectioning
3.1.5 Golden Sectioning
3.1.6 Line search by quadratic model
3.1.7 Unimodal function
3.2 Multivariate unconstrained optimization without derivatives
3.2.1 Coordinate search
3.2.2 Polytope method (Nelder-Mead simplex method)
3.3 Multivariate unconstrained optimization with derivatives
3.3.1 Steepest descent
3.3.2 Newton approach
3.3.3 Quasi-Newton approach
3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)
3.3.5 Least-squares (plus trust-region) approach
3.3.6 Conjugate-gradient (CG) approach

4 Constrained optimization – problem formulations
4.1 Quadratic Programming (QP)
4.1.1 QP – linear equality constraints
4.1.2 QP – inequality constraints
4.1.3 Example
4.2 Sequential Quadratic Programming (SQP), Lagrange–Newton
4.2.1 SQP – equality constraints
4.2.2 Penalty function

5 Statistical parameter tolerances
5.1 Univariate Gaussian distribution (normal distribution)
5.2 Multivariate normal distribution
5.3 Transformation of statistical distributions
5.3.1 Example

6 Expectation values and their estimators
6.1 Expectation values
6.1.1 Definitions
6.1.2 Linear transformation of expectation value
6.1.3 Linear transformation of variance
6.1.4 Translation law of variances
6.1.5 Normalizing a random variable
6.1.6 Linear transformation of a normal distribution
6.2 Estimation of expectation values
6.2.1 Expectation value estimator
6.2.2 Variance estimator
6.2.3 Variance of the expectation value estimator
6.2.4 Linear transformation of estimated expectation value
6.2.5 Linear transformation of estimated variance
6.2.6 Translation law of estimated variance

7 Worst-case analysis
7.1 Task
7.2 Typical tolerance regions
7.3 Classical worst-case analysis
7.3.1 Task
7.3.2 Linear performance model
7.3.3 Performance type f ↑ "good", specification type f > fL
7.3.4 Performance type f ↓ "good", specification type f < fU
7.4 Realistic worst-case analysis
7.4.1 Task
7.4.2 Performance type f ↑ "good", specification type f ≥ fL
7.4.3 Performance type f ↓ "good", specification type f ≤ fU
7.5 General worst-case analysis
7.5.1 Task
7.5.2 Performance type f ↑ "good", specification type f ≥ fL
7.5.3 Performance type f ↓ "good", specification type f ≤ fU
7.5.4 General worst-case analysis with tolerance box
7.6 Input/output of worst-case analysis
7.7 Summary of discussed worst-case analysis problems

8 Yield analysis
8.1 Task
8.1.1 Acceptance function
8.1.2 Parametric yield
8.2 Statistical yield analysis/Monte-Carlo analysis
8.2.1 Variance of yield estimator
8.2.2 Estimated variance of yield estimator
8.2.3 Importance sampling
8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")
8.3.1 Yield partition
8.3.2 Defining worst-case distance βWL as difference from nominal performance to specification bound as multiple of standard deviation σf of linearized performance feature (βWL-sigma design)
8.3.3 Yield partition as a function of worst-case distance βWL
8.3.4 Worst-case distance βWL defines tolerance region
8.3.5 Specification type f ≤ fU
8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")
8.4.1 Problem formulation
8.4.2 Advantages of geometric yield analysis
8.4.3 Lagrange function and first-order optimality conditions of problem (336)
8.4.4 Lagrange function of problem (337)
8.4.5 Second-order optimality condition of problem (336)
8.4.6 Worst-case distance
8.4.7 Remarks
8.4.8 Overall yield
8.4.9 Consideration of range parameters

9 Yield optimization/design centering/nominal design
9.1 Optimization objectives
9.2 Derivatives of optimization objectives
9.3 Problem formulations of analog optimization
9.4 Analysis, synthesis
9.5 Sizing
9.6 Nominal design, tolerance design
9.7 Optimization without/with constraints

10 Sizing rules for analog circuit optimization
10.1 Single (NMOS) transistor
10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)
10.2 Transistor pair: current mirror (NMOS)
10.2.1 Sizing rules for current mirror

A Matrix and vector notations
A.1 Vector
A.2 Matrix
A.3 Addition
A.4 Multiplication
A.5 Special cases
A.6 Determinant of a quadratic matrix
A.7 Inverse of a quadratic non-singular matrix
A.8 Some properties

B Abbreviated notations of derivatives using the nabla symbol

C Norms

D Pseudo-inverse, singular value decomposition (SVD)
D.1 Moore-Penrose conditions
D.2 Singular value decomposition

E Linear equation system, rectangular system matrix with full rank
E.1 Underdetermined system of equations
E.2 Overdetermined system of equations
E.3 Determined system of equations

F Partial derivatives of linear, quadratic terms in matrix/vector notation

G Probability space

H Convexity
H.1 Convex set K ⊆ Rn
H.2 Convex function
1 Introduction
1.1 Parameters, performance, simulation
design parameters: x_d ∈ R^(n_xd), e.g., transistor widths, capacitances

statistical parameters: x_s ∈ R^(n_xs), e.g., oxide thickness, threshold voltage

range parameters: x_r ∈ R^(n_xr), e.g., operational parameters: supply voltage, temperature

(circuit) parameters: x = [x_d^T x_s^T x_r^T]^T

performance feature: f_i, e.g., gain, bandwidth, slew rate, phase margin, delay, power

(circuit) performance: f = [... f_i ...]^T ∈ R^(n_f)

(circuit) simulation: x ↦ f(x), e.g., SPICE

abstraction from the physical level!

A design parameter and a statistical parameter may refer to the same physical parameter. E.g., an actual CMOS transistor width is the sum of a design parameter W_k and a statistical parameter ∆W. W_k is the specific width of transistor T_k, while ∆W is a width reduction that varies globally and equally for all the transistors on a die.

A design parameter and a statistical parameter may be identical.
1.2 Performance specification
performance specification feature (upper or lower limit on a performance):

f_i ≥ f_L,i  or  f_i ≤ f_U,i    (1)

number of performance specification features:

n_f ≤ n_PSF ≤ 2·n_f    (2)

performance specification:

f_L,1 ≤ f_1(x) ≤ f_U,1
  ...
f_L,nf ≤ f_nf(x) ≤ f_U,nf

⟺  f_L ≤ f(x) ≤ f_U    (3)
Figure 1. Smooth function (a), i.e., continuous and differentiable at least several times on a closed region of the domain, with global minimum, weak local minimum, and strong local minimum marked. Non-smooth continuous function (b).
1.3 Minimum, minimization

without loss of generality: optimum ≡ minimum

because: max f ≡ −min −f    (4)

"min" stands both for

the minimum, i.e., a result, and
to minimize, i.e., a process    (5)

min f(x) ≡ f(x) !→ min    (6)

min f(x) → x*, f(x*) = f*    (7)

1.4 Unconstrained optimization

f* = min f(x) ≡ min_x f ≡ min_x f(x) ≡ min {f(x)}
x* = argmin f(x) ≡ argmin_x f ≡ argmin_x f(x) ≡ argmin {f(x)}    (8)
1.5 Constrained optimization
min f(x) s.t. c_i(x) = 0, i ∈ E; c_i(x) ≥ 0, i ∈ I    (9)

E: set of equality constraints
I: set of inequality constraints

Alternative formulations:

min_x f s.t. x ∈ Ω    (10)

min_{x∈Ω} f    (11)

min { f(x) | x ∈ Ω }    (12)

where Ω = { x | c_i(x) = 0, i ∈ E; c_i(x) ≥ 0, i ∈ I }

Lagrange function

combines objective function and constraints in a single expression:

L(x, λ) = f(x) − Σ_{i∈E∪I} λ_i · c_i(x)    (13)

λ_i: Lagrange multiplier associated with constraint i
1.6 Classification of optimization problems
deterministic, stochastic: The iterative search process is deterministic or random.

continuous, discrete: Optimization variables can take an infinite number of values, e.g., the set of real numbers, or take a finite set of values or states.

local, global: The objective value at a local optimal point is better than the objective values of other points in its vicinity. The objective value at a global optimal point is better than the objective value of any other point.

scalar, vector: In a vector optimization problem, multiple objective functions shall be optimized simultaneously (multiple-criteria optimization, MCO). Usually, objectives have to be traded off against each other. A Pareto-optimal point is characterized in that one objective can only be improved at the cost of another. Pareto optimization determines the set of all Pareto-optimal points. Scalar optimization refers to a single objective. A vector optimization problem is scalarized by combining the multiple objectives into a single overall objective, e.g., by a weighted sum, least-squares, or min/max.

constrained, unconstrained: Besides the objective function that has to be optimized, constraints on the optimization variables may be given as inequalities or equalities.

with or without derivatives: The optimization process may be based on gradients (first derivative), on gradients and Hessians (second derivative), or it may not require any derivatives of the objective/constraint functions.
1.7 Classification of constrained optimization problems
objective function    constraint functions                problem class
linear             ∧  linear                              linear programming
quadratic          ∧  linear                              quadratic programming
nonlinear          ∨  nonlinear                           nonlinear programming
convex                linear equality constraints,        convex programming
                      concave inequality constraints      (local ≡ global minimum)
1.8 Structure of an iterative optimization process
1.8.1 . . . without constraints
Taylor series of a function f about iteration point x^(κ):

f(x) = f(x^(κ)) + ∇f(x^(κ))^T · (x − x^(κ)) + (1/2) · (x − x^(κ))^T · ∇²f(x^(κ)) · (x − x^(κ)) + ...    (14)
     = f^(κ) + g^(κ)T · (x − x^(κ)) + (1/2) · (x − x^(κ))^T · H^(κ) · (x − x^(κ)) + ...    (15)

f^(κ): value of the function f at point x^(κ)
g^(κ): gradient (first derivative, direction of steepest ascent) at point x^(κ)
H^(κ): Hessian matrix (second derivative) at point x^(κ)

Taylor series along a search direction r starting from point x^(κ):

x(r) = x^(κ) + r    (16)

f(x^(κ) + r) ≡ f(r) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r + ...    (17)

Taylor series in the step length α along search direction r^(κ) starting from point x^(κ):

x(α) = x^(κ) + α · r^(κ)    (18)

f(x^(κ) + α · r^(κ)) ≡ f(α) = f^(κ) + g^(κ)T · r^(κ) · α + (1/2) · r^(κ)T · H^(κ) · r^(κ) · α² + ...    (19)
    = f^(κ) + ∇f(α=0) · α + (1/2) · ∇²f(α=0) · α² + ...    (20)

∇f(α=0): slope of f along direction r^(κ)
∇²f(α=0): curvature of f along r^(κ)

repeat
  determine the search direction r^(κ)
  determine the step length α^(κ) (line search)
  x^(κ+1) = x^(κ) + α^(κ) · r^(κ)
  κ := κ + 1
until termination criteria are fulfilled

Steepest-descent approach

search direction: direction of steepest descent, i.e., r^(κ) = −g^(κ)
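A minimal Python sketch of this iteration (not part of the compendium; names are illustrative), combining the steepest-descent direction with the backtracking line search of Sec. 3.1.2, applied to Rosenbrock's function from Figure 2:

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def steepest_descent(f, grad, x0, c1=0.7, c3=0.6, tol=1e-6, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # termination criterion
            break
        r = -g                                 # steepest-descent direction
        alpha = 1.0
        # backtracking: shrink alpha until the Armijo condition (58) holds
        while f(x + alpha * r) > f(x) + alpha * c1 * (g @ r):
            alpha *= c3
        x = x + alpha * r                      # x^(k+1) = x^(k) + alpha^(k) * r^(k)
    return x

print(steepest_descent(rosenbrock, rosenbrock_grad, [-1.0, 0.8]))
```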
Optimization of Analog Circuits 5
Figure 2. Visual illustration of the steepest-descent approach for Rosenbrock's function f(x1, x2) = 100·(x2 − x1²)² + (1 − x1)², from x^(0) to x*. A backtracking line search is applied (see Sec. 3.1.2) with initial x^(0) = [−1.0, 0.8]^T and α^(0) = 1, α := c3 · α. The search terminates when the Armijo condition is satisfied with c1 = 0.7, c3 = 0.6.
1.8.2 . . . with constraints
• Constraint functions and objective function are combined into an unconstrained optimization problem in each iteration step:

– Lagrange formulation
– penalty function
– Sequential Quadratic Programming (SQP)

• Projection onto the active constraints, i.e., into the subspace of an unconstrained optimization problem, in each iteration step:

– active-set methods
1.8.3 Trust-region approach
model of the objective function:

f(x) ≈ m(x^(κ) + r)    (21)

min_r m(x^(κ) + r) s.t. r ∈ trust region    (22)

e.g., ‖r‖ ≤ ∆

search direction and step length are computed simultaneously; the trust region accounts for the model accuracy
2 Optimality conditions
2.1 Optimality conditions – unconstrained optimization
Taylor series of the objective function around the optimum point x*:

f(x) = f(x*) + ∇f(x*)^T · (x − x*) + (1/2) · (x − x*)^T · ∇²f(x*) · (x − x*) + ...    (23)

with the abbreviations

f* = f(x*): value of the function at the optimum x*
g* = ∇f(x*): gradient at the optimum x*
H* = ∇²f(x*): Hessian matrix at the optimum x*

For x = x* + r close to the optimum:

f(r) = f* + g*^T · r + (1/2) · r^T · H* · r + ...    (24)

x* is optimal ⟺ there is no descent direction r such that f(r) < f*.

Figure 3. Descent directions from x^(κ) (shaded area): directions r with ∇f(x^(κ))^T · r < 0, bounded by the level set f(x) = f(x^(κ)) separating f(x) < f(x^(κ)) from f(x) > f(x^(κ)); the gradient ∇f(x^(κ)) and the steepest-descent direction are normal to the level set.
2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem

∀ r≠0: g*^T · r ≥ 0    (25)

⟹ g* = ∇f(x*) = 0    (26)

x*: stationary point

descent direction r: ∇f(x^(κ))^T · r < 0
steepest-descent direction: r = −∇f(x^(κ))

Figure 4. Quadratic functions: (a) minimum at x*, (b) maximum at x*, (c) saddle point at x*, (d) positive semidefinite with multiple minima along a trench.
2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem

∀ r≠0: r^T · ∇²f(x*) · r ≥ 0 ⟺ ∇²f(x*) is positive semidefinite ⟺ f has non-negative curvature    (27)

sufficient:

∀ r≠0: r^T · ∇²f(x*) · r > 0 ⟺ ∇²f(x*) is positive definite ⟺ f has positive curvature    (28)

Figure 5. Contour plots of quadratic functions that are (a),(b) positive or negative definite, (c) indefinite (saddle point), (d) positive or negative semidefinite.
2.1.3 Sufficient and necessary conditions for the second-order derivative ∇²f(x*) to be positive definite

• all eigenvalues > 0

• a Cholesky decomposition exists:

∇²f(x*) = L · L^T with l_ii > 0, or
∇²f(x*) = L · D · L^T with l_ii = 1 and d_ii > 0    (29)

• all pivot elements during Gaussian elimination without pivoting are > 0

• all principal minors are > 0
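A small numerical sketch (assuming NumPy; not part of the compendium) of the first two criteria; in practice, attempting a Cholesky decomposition is the usual test:

```python
import numpy as np

def is_positive_definite(H):
    """Check two of the equivalent criteria above on a symmetric matrix H."""
    # criterion 1: all eigenvalues > 0 (eigvalsh assumes H is symmetric)
    eig_ok = bool(np.all(np.linalg.eigvalsh(H) > 0))
    # criterion 2: a Cholesky decomposition H = L L^T with l_ii > 0 exists
    try:
        np.linalg.cholesky(H)
        chol_ok = True
    except np.linalg.LinAlgError:
        chol_ok = False
    assert eig_ok == chol_ok        # the criteria agree
    return chol_ok

H = np.array([[2.0, -1.0], [-1.0, 2.0]])   # positive definite example
print(is_positive_definite(H))             # True
```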
Figure 6. Dark shaded area: unconstrained directions according to (35); light shaded area: descent directions according to (34); overlap: unconstrained descent directions (sketch with contours f = const, c_i = const, gradients ∇f(x^(κ)) and ∇c_i(x^(κ)) at x^(κ)). When no direction satisfies both (34) and (35), the cross section is empty and the current point is a local minimum of the function.
2.2 Optimality conditions – constrained optimization
2.2.1 Feasible descent direction r
descent direction: ∇f(x^(κ))^T · r < 0    (30)

feasible direction: c_i(x^(κ) + r) ≈ c_i(x^(κ)) + ∇c_i(x^(κ))^T · r ≥ 0    (31)

Inactive constraint: i is inactive ⟺ c_i(x^(κ)) > 0

then each r with ‖r‖ < ε satisfies (31), e.g.,

r = − c_i(x^(κ)) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) · ∇f(x^(κ))    (32)

(32) in (31) gives:

c_i(x^(κ)) · [ 1 − (∇c_i(x^(κ))^T · ∇f(x^(κ))) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) ] ≥ 0    (33)

where −1 ≤ (∇c_i(x^(κ))^T · ∇f(x^(κ))) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) ≤ 1

Active constraint (Fig. 6): i is active ⟺ c_i(x^(κ)) = 0

then (30) and (31) become:

∇f(x^(κ))^T · r < 0    (34)

∇c_i(x^(κ))^T · r ≥ 0    (35)

no feasible descent direction exists, i.e., no vector r satisfies both (34) and (35) at x*:

∇f(x*) = λ_i* · ∇c_i(x*) with λ_i* ≥ 0    (36)

no statement about the sign of λ_i* can be made in case of an equality constraint (c_i = 0 ⟺ c_i ≤ 0 ∧ c_i ≥ 0)
2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem

x*, f* = f(x*), λ*, L* = L(x*, λ*)

Karush-Kuhn-Tucker (KKT) conditions:

∇L(x*) = 0    (37)

c_i(x*) = 0, i ∈ E    (38)

c_i(x*) ≥ 0, i ∈ I    (39)

λ_i* ≥ 0, i ∈ I    (40)

λ_i* · c_i(x*) = 0, i ∈ E ∪ I    (41)

(37) is analogous to (26); (13) and (37) give:

∇f(x*) − Σ_{i∈A(x*)} λ_i* · ∇c_i(x*) = 0    (42)

A(x*) is the set of active constraints at x*:

A(x*) = E ∪ { i ∈ I | c_i(x*) = 0 }    (43)

(41) is called the complementarity condition: either the Lagrange multiplier is 0 (inactive constraint) or the constraint c_i(x*) is 0 (active constraint).

From (41) and (13): L* = f*    (44)
2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem

f(x* + r) = L(x* + r, λ*)    (45)
  = L(x*, λ*) + r^T · ∇L(x*) + (1/2) · r^T · ∇²L(x*) · r + ...  (with ∇L(x*) = 0)    (46)
  = f* + (1/2) · r^T · [ ∇²f(x*) − Σ_{i∈A(x*)} λ_i* · ∇²c_i(x*) ] · r + ...    (47)

for each feasible stationary direction r at x*, i.e.,

F_r = { r | r ≠ 0, ∇c_i(x*)^T · r ≥ 0 for i ∈ A(x*)\A+*, ∇c_i(x*)^T · r = 0 for i ∈ A+* },
with A+* = { j ∈ A(x*) | j ∈ E ∨ λ_j* > 0 }    (48)

necessary:

∀ r ∈ F_r: r^T · ∇²L(x*) · r ≥ 0    (49)

sufficient:

∀ r ∈ F_r: r^T · ∇²L(x*) · r > 0    (50)
2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

perturbation of an active constraint at x* by ∆_i ≥ 0:

c_i(x) ≥ 0 → c_i(x) ≥ ∆_i    (51)

L(x, λ, ∆) = f(x) − Σ_i λ_i · (c_i(x) − ∆_i)    (52)

∇f*(∆_i) = ∇L*(∆_i) = ( ∂L/∂x^T · ∂x/∂∆_i + ∂L/∂λ^T · ∂λ/∂∆_i + ∂L/∂∆_i ) |_{x*,λ*}

At (x*, λ*) the first two terms vanish (∂L/∂x = 0^T, ∂L/∂λ = 0^T), hence

∇f*(∆_i) = ∂L/∂∆_i |_{x*,λ*} = λ_i*    (53)

Lagrange multiplier: sensitivity of the optimum to a change in an active constraint close to x*
3 Unconstrained optimization
3.1 Univariate unconstrained optimization, line search
single optimization parameter, e.g., the step length α

f(α) ≡ f(x^(κ) + α·r) = f(x^(κ)) + ∇f(x^(κ))^T·r · α + (1/2) · r^T·∇²f(x^(κ))·r · α² + ...    (54)

with f^(κ) = f(x^(κ)), ∇f(α=0) = ∇f(x^(κ))^T·r, and ∇²f(α=0) = r^T·∇²f(x^(κ))·r

error vector:

ε^(κ) = x^(κ) − x*    (55)

global convergence:

lim_{κ→∞} ε^(κ) = 0    (56)

convergence rate:

‖ε^(κ+1)‖ = L · ‖ε^(κ)‖^p    (57)

p = 1: linear convergence, L < 1
p = 2: quadratic convergence

exact line search is expensive; therefore, only find an α that

• obtains a sufficient reduction in f
• is a big enough step length
3.1.1 Wolfe-Powell conditions
Figure 7. Wolfe-Powell conditions for the line search f(α): the sketch marks the regions of α where (58), (59), and (60) are satisfied around αopt; reference lines with slopes ∇f(α=0), c1·∇f(α=0), and c2·∇f(α=0) are shown.

• Armijo condition: sufficient objective reduction

f(α) ≤ f(0) + α · c1 · ∇f(α=0), with ∇f(α=0) = ∇f(x^(κ))^T · r    (58)

• curvature condition: sufficient gradient increase

∇f(α) ≥ c2 · ∇f(α=0), with ∇f(α) = ∇f(x^(κ) + α·r)^T · r    (59)

• strong curvature condition: step close to the optimum

|∇f(α)| ≤ c2 · |∇f(α=0)|    (60)

0 < c1 < c2 < 1    (61)
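A Python sketch (not from the compendium) of evaluating (58)-(60) for a given step length; the default values of c1, c2 are common illustrative choices, not prescribed here:

```python
import numpy as np

def wolfe_conditions(f, grad, x, r, alpha, c1=1e-4, c2=0.9):
    """Evaluate the Armijo (58), curvature (59), and strong curvature (60)
    conditions for a step of length alpha along direction r."""
    slope0 = grad(x) @ r                     # slope of f at alpha = 0
    slope  = grad(x + alpha * r) @ r         # slope of f at alpha
    armijo    = f(x + alpha * r) <= f(x) + alpha * c1 * slope0   # (58)
    curvature = slope >= c2 * slope0                             # (59)
    strong    = abs(slope) <= c2 * abs(slope0)                   # (60)
    return armijo, curvature, strong

# example: quadratic objective, step along the steepest-descent direction
f = lambda x: 0.5 * x @ x
g = lambda x: x
print(wolfe_conditions(f, g, np.array([1.0, 1.0]), np.array([-1.0, -1.0]), 0.9))
```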
3.1.2 Backtracking line search
0 < c3 < 1

determine an initial step length αt /* big enough, not too big */
WHILE αt violates (58)
  αt := c3 · αt
3.1.3 Bracketing
finding an interval [αlo, αhi] that contains the minimum

/* take αlo as the previous αhi and find a new, larger αhi until the minimum has been passed; if necessary exchange αlo, αhi such that f(αlo) < f(αhi) */
αhi := 0
REPEAT
  αlo := αhi
  determine αhi ∈ [αlo, αmax]
  IF αhi violates (58) OR f(αhi) ≥ f(αlo)
    /* minimum has been significantly passed, because the Armijo condition is violated or because the new objective value is larger than that at the lower border of the bracketing interval (Fig. 8) */
    GOTO sectioning
  IF αhi satisfies (60)
    /* strong curvature condition satisfied and objective value smaller than at αlo, i.e., step length found (Fig. 9) */
    α* = αhi
    STOP
  IF ∇f(αhi) ≥ 0
    /* αhi has a lower objective value than αlo and has passed the minimum, as the objective gradient has switched sign (Fig. 10) */
    exchange αhi and αlo
    GOTO sectioning
Figure 8. Bracketing case 1: the Armijo condition is violated at αhi or f(αhi) ≥ f(αlo); αhi lies somewhere beyond the minimum.

Figure 9. Bracketing case 2: (60) is satisfied at αhi; the step length has been found.

Figure 10. Bracketing case 3: ∇f(αhi) ≥ 0; αhi has passed the minimum, αlo and αhi are exchanged.
3.1.4 Sectioning
sectioning starts with an interval [αlo, αhi] with the following properties:

• the minimum is included
• f(αlo) < f(αhi)
• αlo may lie left or right of αhi (αlo ≷ αhi)
• ∇f(αlo) · (αhi − αlo) < 0
REPEAT
  determine αt ∈ [αlo, αhi]
  IF αt violates (58) OR f(αt) ≥ f(αlo)
    /* new αhi found, αlo remains (Fig. 11) */
    αhi := αt
  ELSE /* i.e., f(αt) < f(αlo), new αlo found */
    IF αt satisfies (60)
      /* step length found */
      α* = αt
      STOP
    IF ∇f(αt) · (αhi − αlo) ≥ 0
      /* αt is on the same side of the minimum as αhi, so αlo must become the new αhi (Fig. 12) */
      αhi := αlo
    αlo := αt /* Fig. 13 */
Figure 11. Case αhi := αt: the new interval is [α'lo, α'hi] = [αlo, αt] (shown for both orderings of αlo and αhi).

Figure 12. Case αhi := αlo: αt lies on the same side of the minimum as αhi, so αlo becomes the new α'hi.

Figure 13. Case αlo := αt: αt becomes the new α'lo of the new interval.
3.1.5 Golden Sectioning
golden section ratio:

(1 − τ)/τ = τ/1 ⟺ τ² + τ − 1 = 0:  τ = (−1 + √5)/2 ≈ 0.618    (62)

/* left inner point */
α1 := L + (1 − τ) · |R − L|
/* right inner point */
α2 := L + τ · |R − L|
REPEAT
  IF f(α1) < f(α2)
    /* α* ∈ [L, α2], α1 is the new right inner point */
    R := α2
    α2 := α1
    α1 := L + (1 − τ) · |R − L|
  ELSE
    /* α* ∈ [α1, R], α2 is the new left inner point */
    L := α1
    α1 := α2
    α2 := L + τ · |R − L|
UNTIL ∆ < ∆min
Figure 14. Golden sectioning: interval [L, R] of length ∆ with inner points α1, α2 splitting it into parts (1 − τ)·∆ and τ·∆; the new interval [L^(2), R^(2)] of length ∆^(2) is split with the same ratios, so one of the previous inner points is reused.
Accuracy after the κ-th step

interval size: ∆^(κ) = τ^κ · ∆^(0)    (63)

required number of iterations κ to reduce the interval from ∆^(0) to ∆^(κ):

κ = ⌈ −(1/log τ) · log(∆^(0)/∆^(κ)) ⌉ ≈ ⌈ 4.78 · log(∆^(0)/∆^(κ)) ⌉    (64)
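A runnable Python version of the pseudocode above (a sketch, not from the compendium); it caches the function values so that, as the golden ratio allows, each iteration costs only one new function evaluation:

```python
import numpy as np

TAU = (np.sqrt(5.0) - 1.0) / 2.0           # golden section ratio (62), ~0.618

def golden_section(f, L, R, delta_min=1e-6):
    """Interval reduction for a unimodal f on [L, R]."""
    a1, a2 = L + (1.0 - TAU) * (R - L), L + TAU * (R - L)
    f1, f2 = f(a1), f(a2)
    while R - L > delta_min:
        if f1 < f2:                        # minimum in [L, a2]
            R, a2, f2 = a2, a1, f1
            a1 = L + (1.0 - TAU) * (R - L)
            f1 = f(a1)
        else:                              # minimum in [a1, R]
            L, a1, f1 = a1, a2, f2
            a2 = L + TAU * (R - L)
            f2 = f(a2)
    return 0.5 * (L + R)

print(golden_section(lambda a: (a - 1.3)**2, 0.0, 4.0))   # ~1.3
```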
3.1.6 Line search by quadratic model
Figure 15. Quadratic model m(α) through the sampling points L^(κ), E^(κ), R^(κ) with values f(L^(κ)), f(E^(κ)), f(R^(κ)); αt is the minimizer of m(α).

quadratic model by, e.g.,

• 3 sampling points
• first and second-order derivatives

univariate Newton approach:

m(α) = f0 + g · α + (1/2) · h · α²    (65)

first-order condition:

min m(α) → ∇m(α) = g + h · αt = 0 → αt = −g/h    (66)

second-order condition:

h > 0    (67)
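A sketch of one step of this line search in Python (assuming NumPy; illustrative only): fit the quadratic model through three samples and return its minimizer; the sketch rejects the step if the curvature condition (67) is violated:

```python
import numpy as np

def quadratic_step(aL, aE, aR, fL, fE, fR):
    """Fit m(a) = f0 + g*a + (h/2)*a^2 through three samples and return
    the minimizer a_t = -g/h from (66); requires h > 0 per (67)."""
    # solve the 3x3 Vandermonde system for the coefficients f0, g, h/2
    A = np.array([[1.0, aL, aL**2],
                  [1.0, aE, aE**2],
                  [1.0, aR, aR**2]])
    f0, g, h2 = np.linalg.solve(A, np.array([fL, fE, fR]))
    h = 2.0 * h2
    if h <= 0.0:
        raise ValueError("quadratic model has no minimum (h <= 0)")
    return -g / h

# exact for a quadratic objective f(a) = (a - 1.3)^2: recovers 1.3
print(quadratic_step(0.0, 1.0, 3.0, 1.69, 0.09, 2.89))
```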
3.1.7 Unimodal function
f is unimodal over the interval [L, R]: there exists exactly one value αopt ∈ [L, R] for which:

∀ L < α1 < α2 < R:  α2 < αopt → f(α1) > f(α2)  (Fig. 16 (a))    (68)

and

∀ L < α1 < α2 < R:  α1 > αopt → f(α1) < f(α2)  (Fig. 16 (b))    (69)

Figure 16. Unimodal function over [L, R] with minimum at αopt: (a) sampling points α1 < α2 left of αopt, (b) sampling points right of αopt.
Figure 17. Univariate unimodal function (a). Univariate monotone function (b).
Remarks

minimization of a univariate unimodal function: interval reduction with f(L) > f(αt) < f(R), αt ∈ [L, R]; 3 sampling points

root finding for a univariate monotone function: interval reduction with sign(f(L)) = −sign(f(R)) on [L, R]; 2 sampling points

∆^(κ) = (1/2)^κ · ∆^(0)    (70)

κ = ⌈ −(1/log 2) · log(∆^(0)/∆^(κ)) ⌉ ≈ ⌈ 3.32 · log(∆^(0)/∆^(κ)) ⌉    (71)
3.2 Multivariate unconstrained optimization without derivatives
3.2.1 Coordinate search
optimize one parameter at a time, alternating through the coordinates

e_j = [0 ... 0 1 0 ... 0]^T  (1 at the j-th position)

REPEAT
  FOR j = 1, ..., n_x
    αt := argmin_α f(x + α · e_j)
    x := x + αt · e_j
UNTIL convergence or maximum allowed iterations reached

good for "loosely coupled", "uncorrelated" parameters, e.g., ∇²f(x) a diagonal matrix:
Figure 18. Coordinate search with exact line search for a quadratic objective function with diagonal Hessian matrix: the first run of the REPEAT loop already reaches x*.
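A Python sketch of coordinate search (not from the compendium); SciPy's scalar minimizer stands in for the exact line search along e_j:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_search(f, x0, n_sweeps=50):
    """One-parameter-at-a-time minimization, alternating coordinates."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_sweeps):
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = 1.0                                  # unit vector e_j
            alpha = minimize_scalar(lambda a: f(x + a * e)).x
            x = x + alpha * e                           # x := x + alpha_t e_j
    return x

# quadratic objective with diagonal Hessian: one sweep already suffices
f = lambda x: 2.0 * x[0]**2 + 0.5 * x[1]**2
print(coordinate_search(f, [3.0, -2.0], n_sweeps=1))    # ~[0, 0]
```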
Figure 19. Coordinate search with exact line search for a quadratic objective function with general Hessian (a): the first and second runs of the REPEAT loop are shown. Steepest-descent method for the same quadratic objective function (b).
3.2.2 Polytope method (Nelder-Mead simplex method)
Figure 20. Simplex of vertices x1, x2, x3 in the parameter space over contours f = const, and the corresponding objective values (sketch).

An n_x-simplex is the convex hull of a set of parameter vector vertices x_i, i = 1, ..., n_x+1. The vertices are ordered such that f_i = f(x_i), f_1 ≤ f_2 ≤ ... ≤ f_{n_x+1}.
Cases of an iteration step

Each iteration step starts with a reflection, giving xr with fr = f(xr):

• fr < f1: expansion, giving xe with fe = f(xe); if fe < fr then x_{n_x+1} := xe, else x_{n_x+1} := xr
• f1 ≤ fr < f_{n_x}: insert xr, delete x_{n_x+1}, reorder
• f_{n_x} ≤ fr < f_{n_x+1}: outer contraction, giving xc with fc = f(xc); if fc ≤ fr then x_{n_x+1} := xc, else reduction to x1, v2, ..., v_{n_x+1}
• f_{n_x+1} ≤ fr: inner contraction, giving xcc with fcc = f(xcc); if fcc < f_{n_x+1} then x_{n_x+1} := xcc, else reduction to x1, v2, ..., v_{n_x+1}
Reflection

Figure 21. Reflection of x3 through the centroid x0 to xr (sketch).

x0 = (1/n_x) · Σ_{i=1}^{n_x} x_i    (72)

xr = x0 + ρ · (x0 − x_{n_x+1})    (73)

ρ: reflection coefficient, ρ > 0 (default: ρ = 1)

Expansion

Figure 22. Expansion beyond xr to xe (sketch).

xe = x0 + χ · (xr − x0)    (74)

χ: expansion coefficient, χ > 1, χ > ρ (default: χ = 2)

Outer contraction

Figure 23. Outer contraction to xc between x0 and xr (sketch).

xc = x0 + γ · (xr − x0)    (75)

γ: contraction coefficient, 0 < γ < 1 (default: γ = 1/2)

Inner contraction

Figure 24. Inner contraction to xcc between x0 and x3 (sketch).

xcc = x0 − γ · (x0 − x_{n_x+1})    (76)

Reduction (shrink)

Figure 25. Reduction of the simplex towards the best vertex x1, giving v2, v3 (sketch).

v_i = x1 + σ · (x_i − x1), i = 2, ..., n_x+1    (77)

σ: reduction coefficient, 0 < σ < 1 (default: σ = 1/2)
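In practice the polytope method is rarely re-implemented; SciPy's Nelder-Mead implementation uses the standard coefficients listed above as defaults. A minimal usage sketch (not from the compendium) on Rosenbrock's function from Figure 2:

```python
import numpy as np
from scipy.optimize import minimize

# derivative-free minimization of Rosenbrock's function
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

res = minimize(f, x0=np.array([-1.0, 0.8]), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
print(res.x)   # ~[1, 1]
```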
3.3 Multivariate unconstrained optimization with derivatives
3.3.1 Steepest descent
(Sec. 1.8.1) search direction: negative of the gradient at the current point in the parameter space (i.e., based on a linear model of the objective function)
3.3.2 Newton approach
quadratic model of the objective function:

f(x^(κ) + r) ≈ m(x^(κ) + r) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r    (78)

g^(κ) = ∇f(x^(κ)), H^(κ) = ∇²f(x^(κ))    (79)

minimize m(r)

First-order optimality condition

∇m(r) = 0 → H^(κ) · r = −g^(κ) → r^(κ)    (80)

r^(κ): search direction for the line search; r^(κ) is obtained by solving a system of linear equations

Second-order optimality condition

∇²m(r) positive (semi)definite; if not:

• steepest descent: r^(κ) = −g^(κ)
• switch the signs of negative eigenvalues
• Levenberg-Marquardt approach: (H^(κ) + λ · I) · r = −g^(κ)  (λ → ∞: steepest-descent approach)
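A sketch of computing the Newton search direction from (80), falling back to the Levenberg-Marquardt regularization mentioned above when H^(κ) is not positive definite (assuming NumPy; the size of the shift is an illustrative choice):

```python
import numpy as np

def newton_direction(g, H, tau=1e-3):
    """Solve H r = -g (80); if H is not positive definite, shift its
    spectrum so that (H + lam*I) r = -g has a descent solution."""
    try:
        np.linalg.cholesky(H)              # succeeds iff H is positive definite
        return np.linalg.solve(H, -g)
    except np.linalg.LinAlgError:
        lam = -np.linalg.eigvalsh(H).min() + tau   # make H + lam*I pos. def.
        return np.linalg.solve(H + lam * np.eye(len(g)), -g)

H = np.array([[2.0, 0.0], [0.0, -1.0]])    # indefinite Hessian
g = np.array([1.0, 1.0])
print(newton_direction(g, H))
```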
3.3.3 Quasi-Newton approach
second derivative not available → successive approximation B ≈ H from gradients in the course of the optimization process, e.g., B^(0) = I

∇f(x^(κ+1)) = ∇f(x^(κ) + α^(κ) · r^(κ)) = g^(κ+1)    (81)

approximate g^(κ+1) from the quadratic model (78):

g^(κ+1) ≈ g^(κ) + H^(κ) · α^(κ) · r^(κ)    (82)

α^(κ) · r^(κ) = x^(κ+1) − x^(κ)    (83)

Quasi-Newton condition for the approximation B:

y^(κ) = B^(κ+1) · s^(κ), with y^(κ) = g^(κ+1) − g^(κ) and s^(κ) = x^(κ+1) − x^(κ)    (84)
3.3.3.1 Symmetric rank-1 update (SR1)
approach:

B^(κ+1) = B^(κ) + u · v^T    (85)

substituting (85) in (84):

(B^(κ) + u · v^T) · s^(κ) = y^(κ)
u · v^T · s^(κ) = y^(κ) − B^(κ) · s^(κ)
u = (y^(κ) − B^(κ) · s^(κ)) / (v^T · s^(κ))    (86)

substituting (86) in (85):

B^(κ+1) = B^(κ) + ((y^(κ) − B^(κ) · s^(κ)) · v^T) / (v^T · s^(κ))    (87)

because of the symmetry of B:

B^(κ+1) = B^(κ) + ((y^(κ) − B^(κ) · s^(κ)) · (y^(κ) − B^(κ) · s^(κ))^T) / ((y^(κ) − B^(κ) · s^(κ))^T · s^(κ))    (88)

if y^(κ) − B^(κ) · s^(κ) = 0, then B^(κ+1) = B^(κ)

SR1 does not guarantee positive definiteness of B; alternatives: Davidon-Fletcher-Powell (DFP), Broyden-Fletcher-Goldfarb-Shanno (BFGS)

Variants approximate B^(−1) instead of B, or a decomposition of the system matrix of the linear equation system (80), to compute the search direction r^(κ).

If s^(κ), κ = 1, ..., n_x, are linearly independent and f is quadratic with Hessian H, then Quasi-Newton with SR1 terminates after no more than n_x + 1 steps with B^(κ+1) = H.
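A sketch of the SR1 update (88) in Python (assuming NumPy), including the safeguard for a vanishing denominator; on a quadratic f it reconstructs H from n_x independent steps, as stated above:

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-1 update (88): B + (y-Bs)(y-Bs)^T / ((y-Bs)^T s);
    skipped when the denominator is (near) zero, so B^(k+1) = B^(k)."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) < tol * np.linalg.norm(v) * np.linalg.norm(s):
        return B
    return B + np.outer(v, v) / denom

# f quadratic with Hessian H: for quadratic f, y = H s exactly (82)
H = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)                              # B^(0) = I
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    B = sr1_update(B, s, H @ s)
print(B)                                   # ~H
```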
3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)
min m(r) s.t. ‖r‖² = r^T · r ≤ ∆    (89)

constraint: trust region of the model m(r), e.g., concerning positive definiteness of B^(κ), H^(κ)

corresponding Lagrange function:

L(r, λ) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r − λ · (∆ − r^T · r)    (90)

stationary point ∇L(r) = 0:

(H^(κ) + 2 · λ · I) · r = −g^(κ)    (91)

compare the Newton approach for H^(κ) indefinite
3.3.5 Least-squares (plus trust-region) approach
min_r ‖f(r) − f_target‖² s.t. ‖r‖² ≤ ∆    (92)

linear model:

f(r) ≈ f^(κ)(r) = f0^(κ) + S^(κ) · r, with f0^(κ) = f(x^(κ)) and S^(κ) = ∇f(x^(κ)T) (sensitivity matrix)    (93)

∆: trust region of the linear model

least-squares difference of the linearized objective value to the target value (index (κ) left out):

‖ε(r)‖² = ‖f(r) − f_target‖²    (94)
  = ε^T(r) · ε(r) = (ε0 + S·r)^T · (ε0 + S·r), with ε0 = f0 − f_target    (95)
  = ε0^T·ε0 + 2·r^T·S^T·ε0 + r^T·S^T·S·r    (96)

S^T·S: part of the second derivative of ‖ε(r)‖²

min_r ‖ε(r)‖² s.t. ‖r‖² ≤ ∆    (97)

corresponding Lagrange function:

L(r, λ) = ε0^T·ε0 + 2·r^T·S^T·ε0 + r^T·S^T·S·r − λ·(∆ − r^T·r)    (98)

stationary point ∇L(r) = 0:

2·S^T·ε0 + 2·S^T·S·r + 2·λ·r = 0    (99)

(S^T·S + λ·I) · r = −S^T·ε0    (100)

λ = 0: Gauss-Newton method

(100) yields r^(κ) for a given λ

determine the Pareto front ‖ε^(κ)(r^(κ))‖ vs. ‖r^(κ)‖ for 0 ≤ λ < ∞ (Figs. 26, 27)

select r^(κ) with small step length and small error (iterative process): x^(κ+1) = x^(κ) + r^(κ)
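A sketch of one least-squares step (100) for a given λ, sweeping λ to trace the characteristic boundary curve of Fig. 26 (assuming NumPy; the matrix S and ε0 are illustrative):

```python
import numpy as np

def lm_step(S, eps0, lam):
    """Solve (S^T S + lam I) r = -S^T eps0, eq. (100). lam = 0 is the
    Gauss-Newton step; lam -> inf shortens the step towards the
    steepest-descent direction."""
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), -S.T @ eps0)

S = np.array([[1.0, 0.0], [1.0, 1e-3]])    # ill-conditioned sensitivity matrix
eps0 = np.array([1.0, -1.0])
for lam in (0.0, 1e-4, 1e-2, 1.0):         # sweep the Pareto front (Fig. 26)
    r = lm_step(S, eps0, lam)
    print(lam, np.linalg.norm(r), np.linalg.norm(eps0 + S @ r))
```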
Figure 26. Characteristic boundary curve ‖ε^(κ)(r^(κ))‖ vs. ‖r^(κ)‖ = ∆, traced from λ → ∞ (small step length) to λ = 0 (small error); typical sharp bend for ill-conditioned problems. A small step length is wanted, e.g., due to the limited trust region of the linear model.

Figure 27. Characteristic boundary curve in a two-dimensional parameter space: r^(κ)(∆) traces from λ → ∞ to λ = 0 across contours ‖ε^(κ)(r)‖ = const and circles ‖r‖ = ∆ around x^(κ).
3.3.6 Conjugate-gradient (CG) approach
3.3.6.1 Optimization problem with a quadratic objective function
fq(x) = b^T · x + (1/2) · x^T · H · x    (101)

min fq(x)    (102)

n_x large, H sparse

the first-order optimality condition leads to a linear equation system for the stationary point x*:

g(x) = ∇fq(x) = 0  →(101)→  g(x) = b + H · x = 0    (103)

H · x = −b → x*    (104)

x* is computed by solving the linear equation system, not by matrix inversion

second-order condition:

∇²fq(x*) = H positive definite    (105)
3.3.6.2 Eigenvalue decomposition of a symmetric positive definite matrix H
H = U · D^(1/2) · D^(1/2) · U^T    (106)

D^(1/2): diagonal matrix of the square roots of the eigenvalues
U: columns are orthonormal eigenvectors, i.e.,

U^(−1) = U^T, U · U^T = I    (107)

substituting (106) in (101):

fq(x) = b^T · x + (1/2) · (D^(1/2) · U^T · x)^T · (D^(1/2) · U^T · x)    (108)

coordinate transformation:

x' = D^(1/2) · U^T · x,  x = U · D^(−1/2) · x'    (109)

substituting (109) in (108):

f'q(x') = b'^T · x' + (1/2) · x'^T · x'    (110)

b' = D^(−1/2) · U^T · b    (111)

the Hessian of the transformed quadratic function (110) is the unity matrix, H' = I

gradient of f'q:

g' = ∇f'q(x') = b' + x'    (112)

transformation of the gradient from (103), (106), (109), and (111):

∇f(x) = b + H · x = U · D^(1/2) · b' + U · D^(1/2) · x'    (113)

from (112):

g = ∇f(x) = U · D^(1/2) · ∇f'q(x') = U · D^(1/2) · g'    (114)

min fq(x) ≡ min f'q(x')    (115)

solution in transformed coordinates (g' = 0):

x'* = −b'    (116)
3.3.6.3 Conjugate directions
∀ i≠j: r^(i)T · H · r^(j) = 0  ("H-orthogonal")    (117)

from (106):

⟺ ∀ i≠j: (D^(1/2) · U^T · r^(i))^T · (D^(1/2) · U^T · r^(j)) = 0
⟺ ∀ i≠j: r^(i)'T · r^(j)' = 0    (118)

i.e., transformed conjugate directions are orthogonal, with

r^(i)' = D^(1/2) · U^T · r^(i)    (119)
3.3.6.4 Step length
substitute x = x^(κ) + α · r^(κ) in (101):

fq(α) = b^T · (x^(κ) + α · r^(κ)) + (1/2) · (x^(κ) + α · r^(κ))^T · H · (x^(κ) + α · r^(κ))    (120)

∇fq(α) = b^T · r^(κ) + (x^(κ) + α · r^(κ))^T · H · r^(κ)    (121)
  =(103)=  g^(κ)T · r^(κ) + α · r^(κ)T · H · r^(κ), with g^(κ) = ∇fq(x^(κ))    (122)

∇fq(α) = 0:

α^(κ) = −(g^(κ)T · r^(κ)) / (r^(κ)T · H · r^(κ))    (123)
3.3.6.5 New iterative solution
x^(κ+1) = x^(κ) + α^(κ) · r^(κ) = x^(0) + Σ_{i=0}^{κ} α^(i) · r^(i), with α^(κ) from (123) and r^(κ) conjugate per (117)    (124)

r^(0) = −g^(0), with g^(0) from (103)    (125)

new search direction: a combination of the current gradient and the previous search direction:

r^(κ+1) = −g^(κ+1) + β^(κ+1) · r^(κ)    (126)

substituting (126) into (117):

−g^(κ+1)T · H · r^(κ) + β^(κ+1) · r^(κ)T · H · r^(κ) = 0    (127)

β^(κ+1) = (g^(κ+1)T · H · r^(κ)) / (r^(κ)T · H · r^(κ))    (128)
Figure 28. First two CG steps in the original coordinates: starting from x^(0) with r^(0) = −g^(0), the second direction r^(1) satisfies r^(0)T · H · r^(1) = 0 (and differs from −g^(1)); it points to x*.

Figure 29. The same steps in the transformed coordinates: from x^(0)' with r^(0)' = −g^(0)', the second direction r^(1)' satisfies r^(0)'T · r^(1)' = 0, i.e., the directions are orthogonal.
3.3.6.6 Some properties
g^(κ+1) = b + H · x^(κ+1) = b + H · x^(κ) + α^(κ) · H · r^(κ) = g^(κ) + α^(κ) · H · r^(κ)    (129)

g^(κ+1)T · r^(κ) =(129)= g^(κ)T · r^(κ) + α^(κ) · r^(κ)T · H · r^(κ) =(123)= g^(κ)T · r^(κ) − g^(κ)T · r^(κ) = 0    (130)

i.e., the current gradient is orthogonal to the previous search direction

−g^(κ)T · r^(κ) =(126)= −g^(κ)T · (−g^(κ) + β^(κ) · r^(κ−1)) = g^(κ)T · g^(κ) − β^(κ) · g^(κ)T · r^(κ−1) = g^(κ)T · g^(κ)  (the last term is 0 by (130))    (131)

i.e., a conjugate direction is a descent direction

g^(κ+1)T · g^(κ) =(129)= g^(κ)T · g^(κ) + α^(κ) · r^(κ)T · H · g^(κ)
  =(131),(126)= −g^(κ)T · r^(κ) + α^(κ) · r^(κ)T · H · (−r^(κ) + β^(κ) · r^(κ−1))
  =(117)= −g^(κ)T · r^(κ) − α^(κ) · r^(κ)T · H · r^(κ)
  =(123)= −g^(κ)T · r^(κ) + g^(κ)T · r^(κ) = 0    (132)

i.e., the current gradient is orthogonal to the previous gradient

substituting (126) into (130):

g^(κ+1)T · (−g^(κ) + β^(κ) · r^(κ−1)) = 0  ⟺(132)  g^(κ+1)T · r^(κ−1) = 0    (133)

i.e., the current gradient is orthogonal to all previous search directions

substituting (129) into (132):

g^(κ+1)T · (g^(κ−1) + α^(κ−1) · H · r^(κ−1)) = 0
⟺(126)  g^(κ+1)T · g^(κ−1) + α^(κ−1) · (−r^(κ+1) + β^(κ+1) · r^(κ))^T · H · r^(κ−1) = 0
⟺(117)  g^(κ+1)T · g^(κ−1) = 0    (134)

i.e., the current gradient is orthogonal to all previous gradients
3.3.6.7 Simplified computation of CG step length
substituting (131) into (123):

α^(κ) = (g^(κ)T · g^(κ)) / (r^(κ)T · H · r^(κ))    (135)

3.3.6.8 Simplified computation of CG search direction

substituting H · r^(κ) from (129) into (128):

β^(κ+1) = (g^(κ+1)T · (1/α^(κ)) · (g^(κ+1) − g^(κ))) / (r^(κ)T · H · r^(κ))    (136)

from (135) and (132):

β^(κ+1) = (g^(κ+1)T · g^(κ+1)) / (g^(κ)T · g^(κ))    (137)
3.3.6.9 CG algorithm
solves the optimization problem (102), i.e., the linear equation system (103), in at most n_x steps

g^(0) = b + H · x^(0)
r^(0) = −g^(0)
κ = 0
WHILE r^(κ) ≠ 0
  step length: α^(κ) = (g^(κ)T · g^(κ)) / (r^(κ)T · H · r^(κ))   ⇒ line search in nonlinear programming
  new point: x^(κ+1) = x^(κ) + α^(κ) · r^(κ)
  new gradient: g^(κ+1) = g^(κ) + α^(κ) · H · r^(κ)   ⇒ gradient computation in nonlinear programming
  Fletcher-Reeves: β^(κ+1) = (g^(κ+1)T · g^(κ+1)) / (g^(κ)T · g^(κ))
  new search direction: r^(κ+1) = −g^(κ+1) + β^(κ+1) · r^(κ)
  κ := κ + 1
END WHILE

Polak-Ribière (alternative to Fletcher-Reeves):

β^(κ+1) = (g^(κ+1)T · (g^(κ+1) − g^(κ))) / (g^(κ)T · g^(κ))    (138)
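A direct transcription of the CG algorithm into Python (assuming NumPy; a sketch with a tolerance instead of the exact test r^(κ) ≠ 0):

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize fq(x) = b^T x + 0.5 x^T H x, i.e., solve H x = -b (103),
    for symmetric positive definite H; at most nx iterations."""
    x = np.asarray(x0, dtype=float)
    g = b + H @ x                           # g^(0)
    r = -g                                  # r^(0)
    while np.linalg.norm(r) > tol:
        Hr = H @ r
        alpha = (g @ g) / (r @ Hr)          # step length (135)
        x = x + alpha * r                   # new point
        g_new = g + alpha * Hr              # new gradient (129)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves (137)
        r = -g_new + beta * r               # new search direction (126)
        g = g_new
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
x = conjugate_gradient(H, b, np.zeros(2))
print(x, H @ x + b)                         # residual ~0
```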
4 Constrained optimization – problem formulations
4.1 Quadratic Programming (QP)
4.1.1 QP – linear equality constraints
min fq(x) s.t. A · x = c    (139)

with fq(x) = b^T · x + (1/2) · x^T · H · x
H: symmetric, positive definite
A ∈ R^(nc×nx), nc ≤ nx, rank(A) = nc

A · x = c    (140)

is an underdetermined linear equation system with full rank (Appendix E).
4.1.1.1 Transformation of (139) into an unconstrained optimization problem by coordinate transformation

approach:

x = [Y Z] · [c; y] = Y · c + Z · y    (141)

with Y ∈ R^(nx×nc), Z ∈ R^(nx×(nx−nc)), [Y Z] non-singular; Y · c is a feasible point of (140), Z · y represents the degrees of freedom in (140)

A · x = A · Y · c + A · Z · y = c    (142)

A · Y = I ∈ R^(nc×nc)    (143)

A · Z = 0 ∈ R^(nc×(nx−nc))    (144)

φ(y) = fq(x(y))    (145)
  = b^T · (Y·c + Z·y) + (1/2) · (Y·c + Z·y)^T · H · (Y·c + Z·y)    (146)
  = (b + (1/2)·H·Y·c)^T · Y·c + (b + H·Y·c)^T · Z·y + (1/2) · y^T · Z^T·H·Z · y    (147)

min fq(x) s.t. A·x = c  ≡  min φ(y)    (148)

y*: solution of the unconstrained optimization problem (148); Z·y* lies in the kernel of A

stationary point of optimization problem (148), ∇φ(y) = 0:

(Z^T·H·Z) · y = −Z^T · (b + H·Y·c)  →  y*  →  x* = Y·c + Z·y*    (149)

Z^T·H·Z: reduced Hessian; Z^T·(b + H·Y·c): reduced gradient

Computation of Y, Z by QR-decomposition:

from (143) and (483): R^T·Q^T·Y = I → Y = Q·R^(−T)    (150)

from (144) and (483): R^T·Q^T·Z = 0 → Z = Q⊥    (151)
Figure 30. Feasible space A · x = c in the parameter space: Y · c is the "minimum length solution", i.e., orthogonal to the plane A · x = c; z_i, i = 1, ..., nx − nc, form an orthonormal basis of the kernel (null-space) of A, i.e., A · (Z · y) = 0.
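A sketch of the null-space method with Y, Z obtained from a QR decomposition of A^T, following (149)-(151) and (154) (assuming NumPy; the small QP is illustrative):

```python
import numpy as np

def equality_qp(H, b, A, c):
    """min b^T x + 0.5 x^T H x s.t. A x = c, via the coordinate
    transformation x = Y c + Z y with A Y = I, A Z = 0."""
    nc = A.shape[0]
    Q, R = np.linalg.qr(A.T, mode="complete")    # A^T = Q[:, :nc] R[:nc, :]
    Y = Q[:, :nc] @ np.linalg.inv(R[:nc, :].T)   # (150): Y = Q R^-T
    Z = Q[:, nc:]                                # (151): orthonormal ker(A)
    Yc = Y @ c                                   # feasible point of A x = c
    # reduced system (149): (Z^T H Z) y = -Z^T (b + H Y c)
    y = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (b + H @ Yc))
    x = Yc + Z @ y
    lam = Y.T @ (b + H @ x)                      # Lagrange multipliers (154)
    return x, lam

H = np.eye(2); b = np.array([-2.0, 0.0])
A = np.array([[1.0, 1.0]]); c = np.array([1.0])
print(equality_qp(H, b, A, c))                   # x = [1.5, -0.5]
```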
4.1.1.3 Computation of the Lagrange factor λ*

required for QP with inequality constraints

Lagrange function of problem (139):

L(x, λ) = fq(x) − λ^T · (A·x − c)    (152)

∇L(x) = 0:  A^T·λ = ∇fq(x) (overdetermined)

→ λ* =(143)= Y^T · ∇fq(x*)    (153)

λ* = Y^T · (b + H·x*)    (154)

or, in case of a QR-decomposition:

A^T·λ = b + H·x*    (155)

from (483):

Q·R·λ = b + H·x*    (156)

R·λ = Q^T·(b + H·x*)  →  backward substitution  →  λ*    (157)
4.1.1.4 Computation of Y · c in case of a QR-decomposition
A·x = c  ⟺(483)  R^T · (Q^T·x) = c, with u = Q^T·x

R^T·u = c  →  forward substitution  →  u

Y·c =(150)= Q·R^(−T)·c = Q·u (insert u)    (158)

4.1.1.5 Transformation of (139) into an unconstrained optimization problem by Lagrange multipliers
first-order optimality conditions of (139) according to the Lagrange function (152):

∇L(x) = 0:  b + H·x − A^T·λ = 0,  A·x − c = 0    (159)

⟺  [H  −A^T; −A  0] · [x; λ] = [−b; −c]  →  x*, λ*    (160)

[H  −A^T; −A  0]: Lagrange matrix L, symmetric; positive definite iff Z^T·H·Z is positive definite

e.g.,

L^(−1) = [K  −T; −T^T  U]    (161)

K = Z · (Z^T·H·Z)^(−1) · Z^T    (162)

T = Y − K·H·Y    (163)

U = Y^T·H·K·H·Y − Y^T·H·Y    (164)

factorization of L^(−1) by factorization of Z^T·H·Z and computation of Y, Z
4.1.2 QP - inequality constraints
min fq(x) s.t. a_i^T·x = c_i, i ∈ E;  a_j^T·x ≥ c_j, j ∈ I    (165)

with fq(x) = b^T·x + (1/2)·x^T·H·x
H: symmetric, positive definite
A ∈ R^(nc×nx), nc ≤ nx, rank(A) = nc

A collects the constraint vectors row-wise:

A = [ ...; a_i^T, i ∈ E; ...; a_j^T, j ∈ I; ... ]
compute an initial solution x^(0) that satisfies all constraints, e.g., by linear programming

κ := 0
REPEAT
  /* problem formulation at the current iteration point x^(κ) + δ for the active constraints A^(κ) (active-set method) */
  min fq(δ) s.t. a_i^T·δ = 0, i ∈ A^(κ)  →  δ^(κ)

  case I: δ^(κ) = 0 /* i.e., x^(κ) optimal for the current A^(κ) */
    compute λ^(κ), e.g., by (154) or (157)
    case Ia: ∀ i ∈ A^(κ)∩I: λ_i^(κ) ≥ 0 /* i.e., first-order optimality conditions satisfied */
      x* = x^(κ); STOP
    case Ib: ∃ i ∈ A^(κ)∩I: λ_i^(κ) < 0 /* i.e., a constraint becomes inactive, further improvement of f possible */
      q = argmin_{i ∈ A^(κ)∩I} λ_i^(κ) /* i.e., q is the constraint most sensitive to becoming inactive */
      A^(κ+1) = A^(κ) \ {q}
      x^(κ+1) = x^(κ)

  case II: δ^(κ) ≠ 0 /* i.e., f can be reduced for the current A^(κ); step-length computation: if no new constraint becomes active, then α^(κ) = 1 (case IIa); else find the inequality constraint that becomes active first and the corresponding α^(κ) < 1 (case IIb) */
    α^(κ) = min( 1, min_{i ∉ A^(κ), a_i^T·δ^(κ) < 0} (c_i − a_i^T·x^(κ)) / (a_i^T·δ^(κ)) )
    /* the inner minimum is over the inactive constraints that approach their bound; c_i − a_i^T·x^(κ) is the safety margin to the bound, a_i^T·δ^(κ) the safety margin consumed at α^(κ) = 1 */
    q' = argmin (–"–)
    case IIa: α^(κ) = 1
      A^(κ+1) = A^(κ)
    case IIb: α^(κ) < 1
      A^(κ+1) = A^(κ) ∪ {q'}
    x^(κ+1) = x^(κ) + α^(κ) · δ^(κ)

  κ := κ + 1
4.1.3 Example
Figure 31. Example of the active-set method: iterates x^(1) = x^(2), x^(3) = x^(4), x^(5), x^(6) with steps δ^(2), δ^(4), δ^(5) on contours fq = const, subject to constraints (A), (B), (C).

(κ)   A^(κ)    case
(1)   {A, B}   Ib
(2)   {B}      IIa
(3)   {B}      Ib
(4)   {}       IIb
(5)   {C}      IIa
(6)   {C}      Ia

Concerning q = argmin_{i∈A^(1)∩I} λ_i^(κ) for κ = 1:

Figure 32. Gradients ∇A, ∇B, and ∇fq at x^(1).

∇fq = λ_A·∇A + λ_B·∇B  →  λ_A < 0, λ_B = 0
4.2 Sequential Quadratic Programming (SQP), Lagrange–Newton
4.2.1 SQP – equality constraints
min f(x) s.t. c(x) = 0, with c ∈ R^(nc), nc ≤ nx    (166)

4.2.1.1 Lagrange function of (166)

L(x, λ) = f(x) − λ^T·c(x) = f(x) − Σ_i λ_i·c_i(x)    (167)

∇L(x, λ) = [∇L(x); ∇L(λ)] = [∇f(x) − Σ_i λ_i·∇c_i(x); −c(x)] = [g(x) − A^T(x)·λ; −c(x)],
with A^T(x) = [... ∇c_i(x) ...]    (168)

∇²L(x, λ) = [∇²L(x), ∂²L/(∂x·∂λ^T); ∂²L/(∂λ·∂x^T), ∇²L(λ)]
  = [∇²f(x) − Σ_i λ_i·∇²c_i(x), −A^T(x); −A(x), 0]    (169)
4.2.1.2 Newton approach to (167) in κ-th iteration step
∇L(x^(κ) + δ^(κ), λ^(κ) + ∆λ^(κ)) ≈ ∇L(x^(κ), λ^(κ)) + ∇²L(x^(κ), λ^(κ)) · [δ^(κ); ∆λ^(κ)] = 0    (170)

with x^(κ+1) = x^(κ) + δ^(κ), λ^(κ+1) = λ^(κ) + ∆λ^(κ), and the abbreviations ∇L^(κ), ∇²L^(κ)

with (168) and (169):

[W^(κ)  −A^(κ)T; −A^(κ)  0] · [δ^(κ); ∆λ^(κ)] = [−g^(κ) + A^(κ)T·λ^(κ); c^(κ)],
with W^(κ) = ∇²f(x^(κ)) − Σ_i λ_i^(κ)·∇²c_i(x^(κ))    (171)

[W^(κ)  −A^(κ)T; −A^(κ)  0] · [δ^(κ); λ^(κ+1)] = [−g^(κ); c^(κ)]    (172)

(172) represents the first-order optimality conditions of the Lagrange function

L(δ, λ) = g^(κ)T·δ + (1/2)·δ^T·W^(κ)·δ − λ^T·(A^(κ)·δ + c^(κ))    (173)

i.e., of the QP problem

min g^(κ)T·δ + (1/2)·δ^T·W^(κ)·δ s.t. A^(κ)·δ = −c^(κ)    (174)

QP problem with W^(κ) according to (171), including the quadratic parts of objective and constraints, and with the linearized equality constraints from (166)

• Quasi-Newton approach possible
• inequality constraints, A^(κ)·δ ≥ −c_I^(κ), treated as in QP
4.2.2 Penalty function
transformation into a penalty function of an unconstrained optimization problem with penalty parameter µ > 0

quadratic:

P_quad(x, µ) = f(x) + (1/(2µ)) · c^T(x) · c(x)    (175)

logarithmic:

P_log(x, µ) = f(x) − µ · Σ_i log c_i(x)    (176)

l1 exact:

P_l1(x, µ) = µ · f(x) + Σ_{i∈E} |c_i(x)| + Σ_{i∈I} max(−c_i(x), 0)    (177)
5 Statistical parameter tolerances
modeling of manufacturing variations through a multivariate continuous distribution function of the statistical parameters x_s

cumulative distribution function (cdf):

cdf(x_s) = ∫_{−∞}^{x_s,1} ... ∫_{−∞}^{x_s,nxs} pdf(t) · dt    (178)

dt = dt_1 · dt_2 · ... · dt_nxs

(discrete: cumulative relative frequencies)

probability density function (pdf):

pdf(x_s) = ∂^nxs cdf(x_s) / (∂x_s,1 ... ∂x_s,nxs)    (179)

(discrete: relative frequencies)

x_s,i denotes the random number value of a random variable X_s,i
5.1 Univariate Gaussian distribution (normal distribution)
x_s ~ N(x_s,0, σ²)    (180)

x_s,0: mean value
σ²: variance
σ: standard deviation

probability density function of the univariate normal distribution:

pdf_N(x_s, x_s,0, σ²) = 1/(√(2π) · σ) · e^(−(1/2)·((x_s − x_s,0)/σ)²)    (181)

Figure 33. Probability density function, pdf_N, and corresponding cdf_N of a univariate Gaussian distribution over x_s − x_s,0 (axis marked from −3σ to 3σ). The area of the shaded region under the pdf is the value of the cdf as shown.

Standard normal distribution table:

x_s − x_s,0:       −3σ    −2σ    −σ     0     σ      2σ     3σ     4σ
cdf(x_s − x_s,0):  0.1%   2.2%   15.8%  50%   84.1%  97.7%  99.8%  99.99%
5.2 Multivariate normal distribution
x_s ~ N(x_s,0, C)    (182)

x_s,0: vector of mean values of the statistical parameters x_s
C: covariance matrix of the statistical parameters x_s, symmetric, positive definite

probability density function of the multivariate normal distribution:

pdf_N(x_s, x_s,0, C) = 1/(√(2π)^nxs · √det(C)) · e^(−(1/2)·β²(x_s, x_s,0, C))    (183)

β²(x_s, x_s,0, C) = (x_s − x_s,0)^T · C^(−1) · (x_s − x_s,0)    (184)

C = Σ · R · Σ    (185)

Σ = diag(σ_1, ..., σ_nxs)

R =
[ 1          ρ_1,2     ...  ρ_1,nxs ]
[ ρ_1,2      1         ...  ρ_2,nxs ]
[ ...                  ...  ...     ]
[ ρ_1,nxs    ρ_2,nxs   ...  1       ]    (186)

C =
[ σ_1²             σ_1·ρ_1,2·σ_2    ...  σ_1·ρ_1,nxs·σ_nxs ]
[ σ_1·ρ_1,2·σ_2    σ_2²             ...  σ_2·ρ_2,nxs·σ_nxs ]
[ ...              ...              ...  ...               ]
[ σ_1·ρ_1,nxs·σ_nxs  ...            ...  σ_nxs²            ]    (187)

R: correlation matrix of the statistical parameters
σ_k: standard deviation of component x_s,k, σ_k > 0
σ_k²: variance of component x_s,k
σ_k·ρ_k,l·σ_l: covariance of components x_s,k, x_s,l
ρ_k,l: correlation coefficient of components x_s,k and x_s,l, −1 < ρ_k,l < 1
ρ_k,l = 0: uncorrelated, and also independent if jointly normal
|ρ_k,l| = 1: strongly correlated components
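A sketch (not from this section) of how samples of N(x_s,0, C) are commonly realized: a standard construction uses the Cholesky factor of C, here built from Σ and R as in (185) and checked empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

# mean vector and covariance C = Sigma * R * Sigma from (185)
xs0   = np.array([1.0, 2.0])
Sigma = np.diag([0.1, 0.3])
R     = np.array([[1.0, 0.8], [0.8, 1.0]])
C     = Sigma @ R @ Sigma

# sample via the Cholesky factor C = L L^T: xs = xs0 + L u, u ~ N(0, I)
L = np.linalg.cholesky(C)
u = rng.standard_normal((2, 100000))
xs = xs0[:, None] + L @ u

print(np.cov(xs))          # ~C
print(np.corrcoef(xs))     # ~R
```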
Figure 34. Level sets β²(x_s) = const of a two-dimensional normal pdf around the mean x_s,0 = (x_s,0,1, x_s,0,2).
Figure 35. Level sets β²(x_s) = const of a two-dimensional normal pdf with general covariance matrix C (a), with uncorrelated components, R = I (b), and with uncorrelated components of equal spread, C = σ² · I (c).
Figure 36. Level set with β² = const = a² of a two-dimensional normal pdf for different values of the correlation coefficient ρ; the level ellipses are inscribed in a box with side lengths a·2·σ_1 and a·2·σ_2 (the case ρ = 0 is axis-aligned).
5.3 Transformation of statistical distributions
y ∈ R^ny, z ∈ R^nz, ny = nz

z = z(y), y = y(z), such that the mapping from y to z is smooth and bijective (precisely z = φ(y), y = φ^(−1)(z))

cdf_y(y) = ∫...∫_{−∞}^{y} pdf_y(y') · dy' = ∫...∫_{−∞}^{z(y)} pdf_y(y'(z')) · |det(∂y/∂z^T)| · dz' = ∫...∫_{−∞}^{z} pdf_z(z') · dz' = cdf_z(z)    (188)

∫...∫_{−∞}^{y} pdf_y(y') · dy' = ∫...∫_{−∞}^{z} pdf_z(z') · dz'    (189)

pdf_z(z) = pdf_y(y(z)) · |det(∂y/∂z^T)|    (190)

univariate case: pdf_z(z) = pdf_y(y(z)) · |∂y/∂z|    (191)

In the simple univariate case, the function pdf_z has a domain that is a scaled version of the domain of pdf_y; |∂y/∂z| determines the scaling factor. In higher-order cases, the random variable space is scaled and rotated, with the Jacobian matrix ∂y/∂z^T determining the scaling and rotation.

Figure 37. A univariate pdf of the random number y is transformed to a new pdf of the new random number z = z(y). According to (188), the shaded areas as well as the hatched areas under the two curves are equal.
5.3.1 Example
Given: probability density function pdf_U(z), here a uniform distribution:

pdf_U(z) = 1 for 0 < z < 1, 0 otherwise    (192)

probability density function pdf_y(y), y ∈ R

random number z

Find: random number y

boundary values: z = 0 ↔ y = −∞, z = 1 ↔ y = +∞

from (188):

∫_0^z pdf_z(z') · dz' = ∫_{−∞}^y pdf_y(y') · dy'    (193)

with pdf_z(z') = 1 for 0 ≤ z' ≤ 1 from (192), hence

z = ∫_{−∞}^y pdf_y(y') · dy' = cdf_y(y)    (194)

y = cdf_y^(−1)(z)    (195)

This example details a method to generate sample values of a random variable y with an arbitrary pdf, pdf_y, if sample values are available from a uniform distribution pdf_z:

• insert pdf_y(y) in (194)
• compute cdf_y by integration
• compute the inverse cdf_y^(−1)
• create a uniform random number z and insert it into (195) to get a sample value y according to pdf_y(y)
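A sketch of this recipe for a case where cdf_y^(−1) is known in closed form (an exponential distribution, chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# target pdf_y: exponential with rate 1 on y >= 0, cdf_y(y) = 1 - exp(-y),
# so the inverse (195) is y = -log(1 - z)
z = rng.uniform(0.0, 1.0, size=100000)     # uniform samples, pdf_U from (192)
y = -np.log(1.0 - z)                       # y = cdf_y^{-1}(z)

print(y.mean(), y.var())                   # both ~1 for this exponential
```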
6 Expectation values and their estimators
6.1 Expectation values
6.1.1 Definitions
h(z): function of a random number z with probability density function pdf(z)

Expectation value

E_pdf(z){h(z)} = E{h(z)} = ∫...∫_{−∞}^{+∞} h(z) · pdf(z) · dz    (196)

Moment of order κ

m^(κ) = E{z^κ}    (197)

Mean value (first-order moment)

m^(1) = m = E{z}    (198)

m = E{z} = [E{z_1}, ..., E{z_nz}]^T    (199)

Central moment of order κ

c^(κ) = E{(z − m)^κ}, c^(1) = 0    (200)

Variance (second-order central moment)

c^(2) = E{(z − m)²} = σ² = V{z}    (201)

σ: standard deviation

Covariance

cov{z_i, z_j} = E{(z_i − m_i) · (z_j − m_j)}    (202)

Variance/covariance matrix

C = V{z} = E{(z − m) · (z − m)^T}
  =
[ V{z_1}         cov{z_1, z_2}  ...  cov{z_1, z_nz} ]
[ cov{z_2, z_1}  V{z_2}         ...  cov{z_2, z_nz} ]
[ ...                           ...  ...            ]
[ cov{z_nz, z_1} cov{z_nz, z_2} ...  V{z_nz}        ]    (203)

V{h(z)} = E{(h(z) − E{h(z)}) · (h(z) − E{h(z)})^T}    (204)
6.1.2 Linear transformation of expectation value
E{A·h(z) + b} = A · E{h(z)} + b    (205)

special cases:

E{c} = c, c a constant
E{c·h(z)} = c · E{h(z)}
E{h1(z) + h2(z)} = E{h1(z)} + E{h2(z)}

6.1.3 Linear transformation of variance

V{A·h(z) + b} = A · V{h(z)} · A^T    (206)

special cases:

V{a^T·h(z) + b} = a^T · V{h(z)} · a
V{a·h(z) + b} = a² · V{h(z)}

Gaussian error propagation:

V{a^T·z + b} = a^T·C·a = Σ_{i,j} a_i·a_j·σ_i·ρ_i,j·σ_j = Σ_i a_i²·σ_i²  (if ρ_i,j = 0 for all i ≠ j)

6.1.4 Translation law of variances

V{h(z)} = E{(h(z) − a) · (h(z) − a)^T} − (E{h(z)} − a) · (E{h(z)} − a)^T    (207)

special cases:

V{h(z)} = E{(h(z) − a)²} − (E{h(z)} − a)²
V{h(z)} = E{h(z)·h^T(z)} − E{h(z)} · E{h^T(z)}
V{h(z)} = E{h²(z)} − (E{h(z)})²

6.1.5 Normalizing a random variable

z' = (z − E{z}) / √V{z} = (z − m_z)/σ_z    (208)

E{z'} = (E{z} − m_z)/σ_z = 0    (209)

V{z'} = E{(z' − 0)²} = E{(z − m_z)²}/σ_z² = 1    (210)
6.1.6 Linear transformation of a normal distribution
x ~ N(x_0, C)

f(x) = f_a + g^T · (x − x_a)    (211)

mean value µ_f of f:

µ_f = E{f} = E{f_a + g^T·(x − x_a)} = E{f_a} + g^T · (E{x} − E{x_a})

µ_f = f_a + g^T · (x_0 − x_a)    (212)

variance σ_f² of f:

σ_f² = E{(f − µ_f)²} = E{(g^T·(x − x_0))²} = E{g^T·(x − x_0)·(x − x_0)^T·g} = g^T · E{(x − x_0)·(x − x_0)^T} · g

σ_f² = g^T · C · g    (213)
6.2 Estimation of expectation values
6.2.1 Expectation value estimator
Ê{h(x)} = m̂_h = (1/n_MC) · Σ_{µ=1}^{n_MC} h(x^(µ))    (214)

x^(µ) ~ D(pdf(x)), µ = 1, ..., n_MC

sample of the population with n_MC sample elements, i.e., sample size n_MC

sample elements x^(µ), µ = 1, ..., n_MC, are independently and identically distributed, i.e.,

E{h(x^(µ))} = E{h(x)} = m_h    (215)

V{h(x^(µ))} = V{h(x)},  cov{h(x^(µ)), h(x^(ν))} = 0 for µ ≠ ν    (216)

φ̂(x) = φ̂(x^(1), ..., x^(n_MC)): estimator function of φ(x)    (217)

6.2.2 Variance estimator

V̂{h(x)} = (1/(n_MC − 1)) · Σ_{µ=1}^{n_MC} (h(x^(µ)) − m̂_h) · (h(x^(µ)) − m̂_h)^T    (218)

x^(µ) ~ D(pdf(x)), µ = 1, ..., n_MC

with the exact mean m_h known:

V̂{h(x)} = (1/n_MC) · Σ_{µ=1}^{n_MC} (h(x^(µ)) − m_h) · (h(x^(µ)) − m_h)^T    (219)

estimator bias:

b_φ̂ = E{φ̂(x)} − φ(x)    (220)

unbiased estimator:

E{φ̂(x)} = φ(x) ⟺ b_φ̂ = 0    (221)

consistent estimator:

lim_{n_MC→∞} P{ ‖φ̂ − φ‖ < |ε| } = 1    (222)

strongly consistent: ε → 0    (223)

variance of an estimator (quality):

Q_φ̂ = E{(φ̂ − φ)·(φ̂ − φ)^T} = V{φ̂} + b_φ̂ · b_φ̂^T    (224)

b_φ̂ = 0:  Q_φ̂ = V{φ̂}    (225)
6.2.3 Variance of the expectation value estimator
Q_m̂h = V{m̂_h} = V{Ê{h(x)}} = V{ (1/n_MC) · Σ_{µ=1}^{n_MC} h(x^(µ)) }

=(206)= (1/n_MC²) · V{ Σ_{µ=1}^{n_MC} h^(µ) },  with h^(µ) = h(x^(µ))

Writing the sum as [I ... I] · [h^(1); ...; h^(n_MC)], with I the identity matrix of size n_h of h^(µ):

=(206)= (1/n_MC²) · [I ... I] · V{ [h^(1); ...; h^(n_MC)] } · [I; ...; I]

=(216)= (1/n_MC²) · [I ... I] · diag(V{h}, ..., V{h}) · [I; ...; I] = (1/n_MC²) · n_MC · V{h}

Q_m̂h = V{m̂_h} = (1/n_MC) · V{h}    (226)

Replacing Q_m̂h by Q̂_m̂h and V{h} by V̂{h} (and using (229) instead of (206)) gives the estimated variance of the expectation value estimator:

Q̂_m̂h = V̂{m̂_h} = (1/n_MC) · V̂{h}    (227)

The standard deviation of the mean estimator decreases with 1/√n_MC: e.g., 100 times more sample elements are needed for a 10 times smaller standard deviation of the expectation value estimator.
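A sketch of (214), (218), and (227) in Python, illustrating the 1/√n_MC behavior stated above (the function h and the distribution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_mean(h, sample):
    """Expectation value estimator (214) and the estimated standard
    deviation of that estimator via (218) and (227)."""
    values = h(sample)
    m_hat = values.mean()                        # (214)
    v_hat = values.var(ddof=1)                   # (218), 1/(n-1) normalization
    return m_hat, np.sqrt(v_hat / len(values))   # sqrt of (227)

h = lambda x: x**2                               # E{x^2} = 1 for x ~ N(0, 1)
for n in (100, 10000):                           # 100x the sample size gives
    m, s = mc_mean(h, rng.standard_normal(n))    # ~10x smaller std. deviation
    print(n, m, s)
```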
6.2.4 Linear transformation of estimated expectation value

Ê{A · h(z) + b} = A · Ê{h(z)} + b (228)

6.2.5 Linear transformation of estimated variance

V̂{A · h(z) + b} = A · V̂{h(z)} · A^T (229)

6.2.6 Translation law of estimated variance

V̂{h(z)} = (nMC/(nMC − 1)) · [Ê{h(z) · h^T(z)} − Ê{h(z)} · Ê{h^T(z)}] (230)
7 Worst-case analysis

7.1 Task

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

Given: tolerance region T of parameters

Find: worst-case performance value fW that the circuit takes over T and corresponding worst-case parameter vectors xW

optimization | specification        | "good" | "bad" | worst-case performance
max f        | lower bound: f ≥ fL  | f ↑    | f ↓   | fWL = f(xWL)
min f        | upper bound: f ≤ fU  | f ↓    | f ↑   | fWU = f(xWU)
7.2 Typical tolerance regions

box: TB = {x | xL ≤ x ≤ xU}

ellipsoid: TE = {x | β²(x) = (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²}, C symmetric, positive definite

Figure 38. Tolerance box TB (bounds xL,1, xU,1, xL,2, xU,2) and tolerance ellipsoid TE (center x0, level β = βW).
7.3 Classical worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.3.1 Task

Given: hyper-box tolerance region

TB = {x | xL ≤ x ≤ xU} (231)

linear performance model

f(x) = fa + g^T · (x − xa) (232)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 39. Classical worst-case analysis with tolerance box and linear performance model.
7.3.2 Linear performance model

sensitivity analysis:

fa = f(xa) (233)

g = ∇f(xa) (234)

forward finite-difference approximation:

fa = f(xa) (235)

∇f(xa)|i ≈ gi = (f(xa + ∆xi · ei) − f(xa)) / ∆xi (236)

ei = [0 . . . 0 1 0 . . . 0]^T, 1 at the i-th position (237)

Figure 40. Linear performance model based on the gradient (a), and based on the forward finite-difference approximation of the gradient, g = (f(xa + ∆x) − f(xa))/∆x (b).
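A minimal sketch of (235)–(237), assuming numpy and using a made-up performance function in place of a circuit simulator:

```python
# Sketch: forward finite-difference gradient, eqs. (235)-(237).
import numpy as np

def grad_forward(f, xa, dx=1e-6):
    fa = f(xa)                                     # (235)
    g = np.empty_like(xa)
    for i in range(xa.size):
        e = np.zeros_like(xa); e[i] = 1.0          # unit vector e_i, (237)
        g[i] = (f(xa + dx * e) - fa) / dx          # (236)
    return g

f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]        # example performance
print(grad_forward(f, np.array([1.0, 2.0])))       # ~ [8, 3]
```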
7.3.3 Performance type f ↑ "good", specification type f ≥ fL

min f(x)  (i.e., min g^T · x)  s.t.  x ≥ xL, x ≤ xU  →  xWL, fWL = f(xWL) (238)

specific linear programming problem with analytical solution

corresponding Lagrange function:

L(x, λL, λU) = g^T · x − λL^T · (x − xL) − λU^T · (xU − x) (239)

first-order optimality conditions:

∇L(x) = 0 :  g − λL* + λU* = 0 (240)

λ*L/U ≥ 0
xU − xWL ≥ 0, xWL − xL ≥ 0
λ*L,j · (xWL,j − xL,j) = 0, j = 1, . . . , nx
λ*U,j · (xU,j − xWL,j) = 0, j = 1, . . . , nx

equivalently:  λL*^T · (xWL − xL) = 0,  λU*^T · (xU − xWL) = 0 (241)

because:  ∀i ai · bi = 0 ⇒ a^T · b = Σ_i ai · bi = 0,  and  a^T · b = 0, ai ≥ 0, bi ≥ 0 ⇒ ∀i ai · bi = 0

second-order optimality condition holds because ∇²L(x) = 0

either constraint xL,j or constraint xU,j is active, never both; therefore, from (240) and (241):

either: gj = λ*L,j > 0 (242)
or:     gj = −λ*U,j < 0 (243)

component of the worst-case parameter vector xWL:

xWL,j = { xL,j,      gj > 0
          xU,j,      gj < 0
          undefined, gj = 0 }  (244)

worst-case performance value:

fWL = fa + g^T · (xWL − xa) = fa + Σ_j gj · (xWL,j − xa,j) (245)
7.3.4 Performance type f ↓ "good", specification type f ≤ fU

min −f(x)  (i.e., min −g^T · x)  s.t.  x ≥ xL, x ≤ xU  →  xWU, fWU = f(xWU) (246)

component of the worst-case parameter vector xWU:

xWU,j = { xL,j,      gj < 0
          xU,j,      gj > 0
          undefined, gj = 0 }  (247)

worst-case performance value:

fWU = fa + g^T · (xWU − xa) = fa + Σ_j gj · (xWU,j − xa,j) (248)
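A sketch of (244)–(248) in a few lines, assuming numpy; the box and gradient values are made-up example data (components with gj = 0, undefined in (244)/(247), are set to xa here as a placeholder):

```python
# Sketch: classical worst-case analysis, eqs. (244)-(248).
import numpy as np

def classical_wc(fa, g, xa, xL, xU):
    xWL = np.where(g > 0, xL, np.where(g < 0, xU, xa))   # (244)
    xWU = np.where(g > 0, xU, np.where(g < 0, xL, xa))   # (247)
    fWL = fa + g @ (xWL - xa)                            # (245)
    fWU = fa + g @ (xWU - xa)                            # (248)
    return xWL, fWL, xWU, fWU

g = np.array([2.0, -1.0]); xa = np.zeros(2)
xL = np.array([-1.0, -1.0]); xU = np.array([1.0, 1.0])
print(classical_wc(10.0, g, xa, xL, xU))                 # fWL = 7, fWU = 13
```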
7.4 Realistic worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.4.1 Task

Given: ellipsoid tolerance region

TE = {x | (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²} (249)

linear performance model

f(x) = fa + g^T · (x − xa) (250)
     = f0 + g^T · (x − x0) with f0 = fa + g^T · (x0 − xa) (251)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 41. Realistic worst-case analysis with tolerance ellipsoid and linear performance model.
7.4.2 Performance type f ↑ "good", specification type f ≥ fL

min f(x)  (i.e., min g^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²  →  xWL, fWL = f(xWL) (252)

specific nonlinear programming problem (linear objective function, quadratic constraint function) with analytical solution

corresponding Lagrange function:

L(x, λ) = g^T · x − λ · (βW² − (x − x0)^T · C⁻¹ · (x − x0)) (253)

first-order optimality conditions:

∇L = 0 :  g + 2λWL · C⁻¹ · (xWL − x0) = 0 (254)

due to the linear function f, the solution is on the border of TE, i.e., the constraint is active:

(xWL − x0)^T · C⁻¹ · (xWL − x0) = βW² (255)

λWL > 0 (256)

second-order optimality condition holds because ∇²L(x) = 2λWL · C⁻¹, C⁻¹ is positive definite, and because of (256)

(254) gives:

xWL − x0 = −(1/(2λWL)) · C · g (257)

substituting (257) into (255) gives:

(1/(4λWL²)) · g^T · C · g = βW² (258)

inserting λWL from (258) into (257) to eliminate λWL gives a worst-case parameter vector in terms of the performance gradient and tolerance region constants:

xWL − x0 = −(βW/√(g^T · C · g)) · C · g = −(βW/σf) · C · g (259)

substituting (259) in (251) gives the corresponding worst-case performance value:

fWL = f(xWL) = f0 + g^T · (xWL − x0) = f0 − βW · √(g^T · C · g) = f0 − βW · σf (260)

βW-sigma design, worst-case distance βW
7.4.3 Performance type f ↓ "good", specification type f ≤ fU

(252) becomes

min −f(x)  (i.e., min −g^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²  →  xWU, fWU = f(xWU) (261)

(259) becomes

xWU − x0 = +(βW/√(g^T · C · g)) · C · g = +(βW/σf) · C · g (262)

(260) becomes

fWU = f(xWU) = f0 + g^T · (xWU − x0) = f0 + βW · √(g^T · C · g) = f0 + βW · σf (263)
7.5 General worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.5.1 Task

Given: ellipsoid tolerance region

TE = {x | (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²} (264)

general smooth performance f(x)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 42. General worst-case analysis with tolerance ellipsoid and nonlinear performance function: level contours f = f0 = f(x0), f = fWL, f = fWU; gradients ∇f(xWL), ∇f(xWU) at the worst-case points on the border of TE.
7.5.2 Performance type f ↑ "good", specification type f ≥ fL

min_x f(x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (265)

specific nonlinear programming problem (nonlinear objective function, one quadratic inequality constraint) with numerical solution, e.g., by Sequential Quadratic Programming (SQP), which yields ∇f(xWL)

assumption: unique solution on the border of TE

Linearization of the objective function f at the worst-case point xWL, i.e., after solution of (265):

f̄^(WL)(x) = fWL + ∇f(xWL)^T · (x − xWL) (266)

substituting (266) in (265) gives:

min f̄^(WL)(x)  (i.e., min ∇f(xWL)^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (267)

structure identical to the realistic worst-case analysis (252), replace g by ∇f(xWL)

worst-case parameter vector:

xWL − x0 = −(βW/√(∇f(xWL)^T · C · ∇f(xWL))) · C · ∇f(xWL) = −(βW/σf̄(WL)) · C · ∇f(xWL) (268)

worst-case performance value:

f̄^(WL)(x0) = f̄0^(WL) = fWL + ∇f(xWL)^T · (x0 − xWL)

fWL = f̄0^(WL) + ∇f(xWL)^T · (xWL − x0) (269)

substituting (268) in (269) gives the corresponding worst-case performance value:

fWL = f̄0^(WL) − βW · √(∇f(xWL)^T · C · ∇f(xWL)) = f̄0^(WL) − βW · σf̄(WL) (270)

βW-sigma design, worst-case distance βW
7.5.3 Performance type f ↓ "good", specification type f ≤ fU

(265) becomes

min −f(x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (271)

(266) becomes

f̄^(WU)(x) = fWU + ∇f(xWU)^T · (x − xWU) (272)

(267) becomes

min −∇f(xWU)^T · x  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (273)

worst-case parameter vector:

xWU − x0 = +(βW/√(∇f(xWU)^T · C · ∇f(xWU))) · C · ∇f(xWU) = +(βW/σf̄(WU)) · C · ∇f(xWU) (274)

worst-case performance value:

fWU = f̄0^(WU) + βW · √(∇f(xWU)^T · C · ∇f(xWU)) = f̄0^(WU) + βW · σf̄(WU) (275)
7.5.4 General worst-case analysis with tolerance box

lower worst-case:

min f(x)  s.t.  x ≥ xL, x ≤ xU (276)

upper worst-case:

min −f(x)  s.t.  x ≥ xL, x ≤ xU (277)
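A sketch of the general worst-case analysis (265)/(271) with an off-the-shelf SQP-type solver (scipy's SLSQP); the nonlinear performance function and the ellipsoid data are made-up examples, not from the compendium:

```python
# Sketch: general worst-case analysis over a tolerance ellipsoid, cf. (265), (271).
import numpy as np
from scipy.optimize import minimize

x0 = np.zeros(2)
Cinv = np.linalg.inv(np.array([[0.04, 0.01], [0.01, 0.09]]))
beta_w = 3.0
f = lambda x: x[0] ** 2 - x[1] + 3.0          # example performance function

# inequality constraint in scipy form: fun(x) >= 0 inside the ellipsoid
ellipsoid = {"type": "ineq",
             "fun": lambda x: beta_w ** 2 - (x - x0) @ Cinv @ (x - x0)}

lo = minimize(f, x0, method="SLSQP", constraints=[ellipsoid])
up = minimize(lambda x: -f(x), x0, method="SLSQP", constraints=[ellipsoid])
print(lo.x, lo.fun)        # xWL, fWL  (lower worst case)
print(up.x, -up.fun)       # xWU, fWU  (upper worst case)
```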
7.6 Input/output of worst-case analysis

classical worst-case analysis: inputs are the nominal point x with performance f, the tolerance box TB(xL, xU), and a sensitivity analysis (Jacobian J, computed once); for each performance feature fi it returns the worst-case parameter vectors x(i)WL,C, x(i)WU,C and the worst-case performance values f(i)WL,C, f(i)WU,C.

realistic worst-case analysis: inputs are x, f, the tolerance ellipsoid TE(C, x0, βW) ("βW-sigma design"), and a sensitivity analysis (J, computed once); for each fi it returns x(i)WL,R, x(i)WU,R and f(i)WL,R, f(i)WU,R.

general worst-case analysis: inputs are x, f and a tolerance ellipsoid and/or tolerance box; a sensitivity analysis is performed in each of the nit iterations (x(κ), f(κ), J(κ)); for each fi it returns x(i)WL,G, x(i)WU,G and f(i)WL,G, f(i)WU,G.

sensitivity analysis: simulation (netlist, technology, simulation bench) at the parameter vector x = [xd; xs; xr] yields the performances f; nx further simulations at x + ∆xi · ei yield ∆fi = f(x + ∆xi · ei) − f; the Jacobian (sensitivity) matrix is

J = ∂f/∂x^T = [ ∂f1/∂x1    ∂f1/∂x2    · · ·  ∂f1/∂xnx
                ∂f2/∂x1    ∂f2/∂x2    · · ·  ∂f2/∂xnx
                ...        ...        . . .  ...
                ∂fnf/∂x1   ∂fnf/∂x2   · · ·  ∂fnf/∂xnx ]

with rows g^(i)T = ∂fi/∂x^T, approximated column-wise by J ≈ [ · · · (1/∆xi) · ∆fi · · · ].
7.7 Summary of discussed worst-case analysis problems

• For each performance feature fi there exists a worst-case parameter vector xWL,i and/or xWU,i, respectively, and a corresponding worst-case performance value fWL,i and/or fWU,i.

• xWL,i and xWU,i are unique in the classical and realistic worst-case analysis. Several worst-case parameter vectors may exist in the general worst-case analysis.

worst-case analysis | feasible region | objective function | good for
classical           | hyper-box       | linear             | uniform distribution, unknown distribution
general             | hyper-box       | non-linear         | discrete circuits, range parameters
realistic           | ellipsoid       | linear             | normal distribution
general             | ellipsoid       | non-linear         | IC transistor parameters

• Worst-case analysis requires design technology (circuit, performance features) and process technology (statistical parameters, parameter distribution).
8 Yield analysis

8.1 Task

Given:

• statistical parameters with normal distribution, possibly obtained through transformation

• performance specification

Find: percentage/proportion of circuits that fulfill the specifications

statistical parameter distribution (manufacturing process):

pdf(xs) ≜ pdfN(xs) = (1/(√(2π)^nxs · √det(C))) · e^(−β²(xs)/2) (278)

β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) (279)

performance acceptance region, performance specification (customer):

Af = {f | fL ≤ f ≤ fU} (280)

solution requires either a non-normal distribution pdff(f) or a non-linear parameter acceptance region As = {xs | f(xs) ∈ Af} (dashed lines in Fig. 43)

Figure 43. Parameter acceptance region As in the statistical parameter space (nominal point xs,0, level contour β = const) and performance acceptance region Af with bounds fL,1, fU,1, fL,2, fU,2; xs,0 maps to f(xs,0).
8.1.1 Acceptance function

δ(xs) = { 1, f(xs) ∈ Af ; 0, f(xs) ∉ Af } = { 1, xs ∈ As ("circuit functions") ; 0, xs ∉ As ("circuit malfunctions") } (281)

8.1.2 Parametric yield

Y = ∫· · ·∫_{As} pdf(xs) · dxs (282)

  = ∫· · ·∫_{−∞}^{+∞} δ(xs) · pdf(xs) · dxs (283)

  = E{δ(xs)} (284)

yield: expected value of the acceptance function
8.2 Statistical yield analysis/Monte-Carlo analysis

• sample of statistical parameter vectors according to the given distribution

xs^(µ) ∼ N(xs,0, C), µ = 1, . . . , nMC (285)

• (numerical) circuit simulation of each sample element (simulation of the stochastic manufacturing process on circuit level)

xs^(µ) ↦ f^(µ) = f(xs^(µ)), µ = 1, . . . , nMC (286)

• evaluation of the acceptance function (simulation of a production test)

δ^(µ) = δ(xs^(µ)) = { 1, f^(µ) ∈ Af ; 0, f^(µ) ∉ Af }, µ = 1, . . . , nMC (287)

• statistical yield estimation

Ŷ = Ê{δ(xs)} = (1/nMC) · Σ_{µ=1}^{nMC} δ^(µ) (288)

  = number of functioning circuits / sample size (289)

  = n+/nMC = #"+" / (#"+" + #"-")  (Fig. 44) (290)

Figure 44. Monte-Carlo sample in the statistical parameter space with acceptance region As and level contour β = const; elements inside As count as "+", elements outside as "-".
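A sketch of (285)–(288), with a made-up linear performance model standing in for circuit simulation and an example specification, numpy assumed:

```python
# Sketch: statistical yield estimation, eqs. (285)-(288).
import numpy as np

rng = np.random.default_rng(3)
xs0 = np.zeros(2)
C = np.array([[0.04, 0.01], [0.01, 0.09]])
n_mc = 100_000

xs = rng.multivariate_normal(xs0, C, size=n_mc)   # (285)
f = 10.0 + xs @ np.array([2.0, -1.0])             # (286), example performance
delta = (f >= 9.0) & (f <= 11.0)                  # (287), spec fL = 9, fU = 11
print(delta.mean())                               # yield estimate (288)
```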
8.2.1 Variance of yield estimator

V{Ŷ} (288)= V{Ê{δ(xs)}} (226)= (1/nMC) · V{δ(xs)} = σŶ² (291)

V{δ(xs)} = E{δ²(xs)} − (E{δ(xs)})² = Y − Y² = Y · (1 − Y) (292)

(using δ²(xs) = δ(xs) and E{δ(xs)} = Y)
8.2.2 Estimated variance of yield estimator

V̂{Ŷ} (288)= V̂{Ê{δ(xs)}} (227)= (1/nMC) · V̂{δ(xs)} (293)

      (230)= (1/nMC) · (nMC/(nMC − 1)) · [Ê{δ²(xs)} − (Ê{δ(xs)})²] (294)

           = Ŷ · (1 − Ŷ)/(nMC − 1) = σ̂Ŷ² (295)

(using δ²(xs) = δ(xs) and Ê{δ(xs)} = Ŷ)

Ŷ is binomially distributed: probability that n+ of nMC circuits are functioning

nMC → ∞: Ŷ is normally distributed (central limit theorem)

in practice: n+ > 4, nMC − n+ > 4 and nMC > 10

Figure 45. Estimator variance σŶ² over Ŷ ∈ [0%, 100%], with maximum ≈ 0.25/nMC at Ŷ = 50%; example values for Ŷ = 85%:

nMC   10     50    100   500   1000
σŶ    11.9%  5.1%  3.6%  1.6%  1.1%
• confidence interval, confidence level

P(Y ∈ [Ŷ − kζ · σŶ, Ŷ + kζ · σŶ]) = ∫_{−kζ}^{+kζ} (1/√(2π)) · e^(−t²/2) · dt (296)

(the interval is the confidence interval, the right-hand side the confidence level)

e.g., nMC = 1000, Ŷ = 85% → σŶ = 1.1%; kζ = 3: P(Y ∈ [81.7%, 88.3%]) = 99.7%

• given: yield estimator Ŷ, confidence interval Y ∈ [Ŷ − ∆Y, Ŷ + ∆Y], confidence level ζ%

find: nMC

ζ = cdfN(Ŷ + kζ · σŶ) − cdfN(Ŷ − kζ · σŶ)  →  kζ (297)

∆Y = kζ · σŶ  →  σŶ (298)

σŶ² = Ŷ · (1 − Ŷ)/(nMC − 1) (299)

nMC = 1 + Ŷ · (1 − Ŷ) · kζ²/∆Y² (300)

number nMC for various confidence intervals and confidence levels:

Ŷ ± ∆Y          ζ = 90%       ζ = 95%       ζ = 99%       ζ = 99.9%
                (kζ = 1.645)  (kζ = 1.960)  (kζ = 2.576)  (kζ = 3.291)
85% ± 10%       36            50            86            140
85% ± 5%        140           197           340           554
85% ± 1%        3,452         4,900         8,462         13,811
99.99% ± 0.01%                              66,352
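A sketch of (297)–(300) with scipy's normal quantile function; the printed case reproduces, up to rounding, the 85% ± 1% / ζ = 95% entry of the table:

```python
# Sketch: required sample size for a target confidence interval, eq. (300).
from scipy.stats import norm

def n_mc_required(y_hat, delta_y, conf_level):
    k = norm.ppf(0.5 + conf_level / 2.0)     # two-sided k_zeta, cf. (297)
    return int(1 + y_hat * (1.0 - y_hat) * k ** 2 / delta_y ** 2)

print(n_mc_required(0.85, 0.01, 0.95))       # ~4,899; the table lists 4,900
```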
• given: required Y > Ymin, significance level α

find: nMC

null hypothesis H0: Y < Ymin; H0 is rejected if all circuits are functioning, n+ = nMC

assuming H0 holds, the probability of falsely rejecting H0 (i.e., although Y < Ymin) is

P("rejection") = P("n+ = nMC")  (test definition) (301)

               = Y^nMC < Ymin^nMC  (binomial distribution) (302)

required: Ymin^nMC < α, hence

nMC > log α / log Ymin (303)
number nMC for a minimum yield and significance level:
Ymin α = 5% α = 1%
95% 60 92
99% 300 460
99.9% 3,000 4,600
99.99% 30,000 46,000
8.2.3 Importance sampling

Y = ∫· · ·∫_{−∞}^{+∞} δ(xs) · pdf(xs) · dxs (304)

  = ∫· · ·∫_{−∞}^{+∞} δ(xs) · (pdf(xs)/pdfIS(xs)) · pdfIS(xs) · dxs (305)

  = EpdfIS{δ(xs) · pdf(xs)/pdfIS(xs)} = EpdfIS{δ(xs) · w(xs)} (306)

sample created according to a separate, specific distribution pdfIS

EpdfIS{w(xs)} = ∫· · ·∫_{−∞}^{+∞} (pdf(xs)/pdfIS(xs)) · pdfIS(xs) · dxs = 1 (307)

goal: reduction of the estimator variance,

VpdfIS{ÊpdfIS{δ(xs) · w(xs)}} = VpdfIS{δ(xs) · w(xs)}/nIS  !<  V{δ(xs)}/nMC = V{Ê{δ(xs)}} (308)

with

VpdfIS{δ(xs) · w(xs)} = ∫· · ·∫_{−∞}^{+∞} δ(xs) · (pdf(xs)/pdfIS(xs)) · pdf(xs) · dxs − Y² (309)

Eq. (308) is satisfied for nMC = nIS if:

∀ xs with δ(xs) = 1 :  pdfIS(xs) > pdf(xs) (310)
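A one-dimensional sketch of the weighted estimator (306): the tail probability P(x > 4) for x ∼ N(0,1) plays the role of a fail probability that plain Monte-Carlo can hardly resolve; the shifted density N(4,1) is a made-up choice of pdfIS (scipy/numpy assumed):

```python
# Sketch: importance sampling estimate of a rare-event probability, eq. (306).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 100_000
x_is = rng.normal(4.0, 1.0, size=n)              # sample from pdf_IS = N(4,1)
w = norm.pdf(x_is) / norm.pdf(x_is, 4.0, 1.0)    # weights w = pdf/pdf_IS
est = np.mean((x_is > 4.0) * w)                  # E_pdfIS{delta * w}
print(est, norm.sf(4.0))                         # vs. exact tail probability
```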
8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")

yield in the case that As is a half-space of R^ns defined by a hyperplane

linear model for one single performance feature, index i left out:

f̄ = fL + g^T · (xs − xs,WL) (311)

e.g., after solution according to (336) or (337):

fL = f(xs,WL), g = ∇f(xs,WL) (312)

specification type:

f ≥ fL (313)

Figure 46. Linearized performance feature: level hyperplanes f̄ = f̄0 and f̄ = fL, gradient g, half-space A′s,L (f̄ ≥ fL), nominal point xs,0, tolerance ellipsoids β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) = const and β²(xs) = βWL² touching the hyperplane at xs,a with gradient ∇β²(xs,a).
8.3.1 Yield partition

Ȳ_L = ∫· · ·∫_{xs ∈ A′s,L} pdfN(xs) · dxs = ∫_{f̄ ≥ fL} pdff̄(f̄) · df̄ (314)

the linearized performance feature is normally distributed, (212), (213):

f̄ ∼ N(f̄0, σf̄²)  with  f̄0 = f̄(xs,0) = fL + g^T · (xs,0 − xs,WL),  σf̄² = g^T · C · g (315)

yield written in terms of the pdf of the linearized performance feature:

Ȳ_L = ∫_{fL}^{∞} (1/(√(2π) · σf̄)) · e^(−((f̄ − f̄0)/σf̄)²/2) · df̄  (Fig. 47) (316)

Figure 47. pdff̄ of the linearized performance feature with mean f̄0; the area for f̄ ≥ fL is the yield partition Ȳ_L.
8.3.2 Defining the worst-case distance βWL as the difference between nominal performance and specification bound as a multiple of the standard deviation σf̄ of the linearized performance feature (βWL-sigma design)

f̄0 − fL = g^T · (xs,0 − xs,WL) (317)

f̄0 − fL = { +βWL · σf̄, f̄0 > fL   "circuit functions"
            −βWL · σf̄, f̄0 < fL   "circuit malfunctions" }  (318)

variable substitution:

t = (f̄ − f̄0)/σf̄,  df̄/dt = σf̄ (319)

tL = (fL − f̄0)/σf̄ = { −βWL, f̄0 > fL ; +βWL, f̄0 < fL }  (320)

Ȳ_L = ∫_{∓βWL}^{∞} (1/√(2π)) · e^(−t²/2) · dt = −∫_{±βWL}^{−∞} (1/√(2π)) · e^(−t′²/2) · dt′   (t′ = −t, dt/dt′ = −1)
8.3.3 Yield partition as a function of worst-case distance βWL

Ȳ_L = ∫_{−∞}^{±βWL} (1/√(2π)) · e^(−t′²/2) · dt′   with  +βWL for f̄0 > fL,  −βWL for f̄0 < fL (321)

standard normal distribution, statistical tables, exact within the given digits, no estimation
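In code, (321) is a single call to the standard normal cdf; a sketch assuming scipy:

```python
# Sketch of (321): yield partition from the signed worst-case distance.
from scipy.stats import norm

def yield_partition(beta_wl, nominally_fulfilled=True):
    # +beta for f0 > fL ("circuit functions"), -beta otherwise
    return norm.cdf(beta_wl if nominally_fulfilled else -beta_wl)

print(yield_partition(3.0))   # beta_WL = 3  ->  ~0.9987
```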
8.3.4 Worst-case distance βWL defines tolerance region

f̄(xs,a) = fL  (311)⇐⇒  g^T · (xs,a − xs,WL) = 0 (322)

f̄(xs) = fL + g^T · (xs − xs,WL) − g^T · (xs,a − xs,WL) = fL + g^T · (xs − xs,a) (323)

i.e., xs,WL in (311) can be replaced with any point of the level set f̄ = fL because of (322)

from visual inspection of Fig. 46:

∇β²(xs,a) = 2 · C⁻¹ · (xs,a − xs,0) = { −λa · g, f̄0 > fL ; +λa · g, f̄0 < fL },  λa > 0 (324)

⇐⇒ (xs,a − xs,0) = ∓(λa/2) · C · g (325)

⇒ g^T · (xs,a − xs,0) = ∓(λa/2) · σf̄²  (323)= fL − f̄0  (318)= ∓βWL · σf̄ (326)

λa/2 = βWL/σf̄ (327)

substituting (327) in (325):

(xs,a − xs,0) = ∓(βWL/σf̄) · C · g (328)

(xs,a − xs,0)^T · C⁻¹ · (xs,a − xs,0) = βWL² · (g^T · C · g)/σf̄²,  with (g^T · C · g)/σf̄² = 1 (329)

(xs,a − xs,0)^T · C⁻¹ · (xs,a − xs,0) = βWL² (330)

βWL² is the level parameter ("radius") of the ellipsoid that touches the level hyperplane f̄ = fL
8.3.5 Specification type f ≤ fU

(318) becomes

fU − f̄0 = { +βWU · σf̄, f̄0 < fU   "circuit functions"
            −βWU · σf̄, f̄0 > fU   "circuit malfunctions" }  (331)

(321) becomes

Ȳ_U = ∫_{−∞}^{±βWU} (1/√(2π)) · e^(−t′²/2) · dt′   with  +βWU for f̄0 < fU,  −βWU for f̄0 > fU (332)

(323) becomes

f̄(xs) = fU + g^T · (xs − xs,a) (333)
8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")

8.4.1 Problem formulation

Figure 48. General geometric yield analysis: nonlinear specification border f = fL of the acceptance region As,L, its linearization f̄^(WL) = fL at the worst-case point xs,WL with gradient ∇f(xs,WL), linearized acceptance region A′s,L, nominal point xs,0, tolerance ellipsoids β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) = const.
lower bound, nominally fulfilled / upper bound, nominally violated:

f(xs,0) > fL/U :  max_xs pdfN(xs)  s.t.  f(xs) ≤ fL/U (334)

lower bound, nominally violated / upper bound, nominally fulfilled:

f(xs,0) < fL/U :  max_xs pdfN(xs)  s.t.  f(xs) ≥ fL/U (335)

statistical parameter vector with the highest probability density "on the other side" of the acceptance region border

lower bound, nominally fulfilled / upper bound, nominally violated:

f(xs,0) > fL/U :  min_xs β²(xs)  s.t.  f(xs) ≤ fL/U (336)

lower bound, nominally violated / upper bound, nominally fulfilled:

f(xs,0) < fL/U :  min_xs β²(xs)  s.t.  f(xs) ≥ fL/U (337)

statistical parameter vector with the smallest distance (weighted according to the pdf) to the acceptance region border

specific form of a nonlinear programming problem (quadratic objective function, one nonlinear inequality constraint); iterative solution with SQP yields:

worst-case parameter set xs,WL/U
worst-case distance βWL/U = β(xs,WL/U)
performance gradient ∇f(xs,WL/U)
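A sketch of the geometric yield analysis problem (336) with scipy's SLSQP; the performance function, data, and start point are made-up examples (the nominal value fulfils the bound, f(xs,0) > fL):

```python
# Sketch: solve (336), min beta^2 s.t. f(xs) <= fL, with an SQP-type solver.
import numpy as np
from scipy.optimize import minimize

xs0 = np.zeros(2)
Cinv = np.linalg.inv(np.array([[0.04, 0.01], [0.01, 0.09]]))
f = lambda x: x[0] ** 2 - x[1] + 3.0     # example performance, f(xs0) = 3
fL = 2.0                                 # lower bound, nominally fulfilled

beta2 = lambda x: (x - xs0) @ Cinv @ (x - xs0)
res = minimize(beta2, np.array([0.0, 1.5]), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: fL - f(x)}])
xs_wl, beta_wl = res.x, np.sqrt(res.fun)  # worst-case point and distance
print(xs_wl, beta_wl)
```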
8.4.2 Advantages of geometric yield analysis

• Y_L/U = ∫· · ·∫_{As,L/U} pdf(xs) · dxs; the larger the pdf value, the larger the error in Ȳ if the border of As is approximated inaccurately; A′s is exact at the point xs,WL/U with the highest pdf value, and A′s differs from As the more, the smaller the pdf value

• Y_L/U ↑ → accuracy of Ȳ_L/U ↑

• systematic error, depends on the curvature of the performance f in xs,WL/U

• duality principle in minimum-norm problems: the minimum distance between a point and a convex set equals the maximum distance between the point and any separating hyperplane

case 1: Y(A′s) is the greatest lower bound of Y(As) over all tangent hyperplanes

Figure 49. Case 1: convex acceptance region, A′s ⊆ As.

case 2: Y(A′s) is the least upper bound of Y(As) over all tangent hyperplanes

Figure 50. Case 2: As ⊆ A′s.

• in practice, error |Ȳ_L/U − Y_L/U| ≈ 1% . . . 2%
8.4.3 Lagrange function and first-order optimality conditions of problem (336)

L(xs, λ) = β²(xs) − λ · (fL − f(xs)) (338)

∇L(xs) = 0 :  2 · C⁻¹ · (xs,WL − xs,0) + λWL · ∇f(xs,WL) = 0 (339)

λWL · (fL − f(xs,WL)) = 0 (340)

βWL² = (xs,WL − xs,0)^T · C⁻¹ · (xs,WL − xs,0) (341)

assumption: λWL = 0, i.e., the constraint fL ≥ f(xs,WL) is inactive; then, from (339), xs,WL = xs,0 and f(xs,WL) = f(xs,0) > fL (from (336)), which contradicts the constraint; therefore:

λWL > 0 (342)

f(xs,WL) = fL (343)

from (339):

xs,WL − xs,0 = −(λWL/2) · C · ∇f(xs,WL) (344)

substituting (344) in (341):

(λWL/2)² · ∇f(xs,WL)^T · C · ∇f(xs,WL) = βWL² (345)

λWL/2 = βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL)) (346)

substituting (346) in (344):

xs,WL − xs,0 = −(βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL))) · C · ∇f(xs,WL) (347)

(347) corresponds to (268) of the general worst-case analysis

8.4.4 Lagrange function of problem (337)

L(xs, λ) = β²(xs) − λ · (f(xs) − fL) (348)

(347) becomes

xs,WL − xs,0 = +(βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL))) · C · ∇f(xs,WL) (349)
8.4.5 Second-order optimality condition of problem (336)

∇²L(xs,WL) = 2 · C⁻¹ + λWL · ∇²f(xs,WL) (350)

curvature of β² stronger than curvature of f

8.4.6 Worst-case distance

linearization of f at xs,WL/U:

f̄^(WL/U)(xs) = fL/U + ∇f(xs,WL/U)^T · (xs − xs,WL/U) (351)

f̄^(WL/U)(xs,0) − fL/U = ∇f(xs,WL/U)^T · (xs,0 − xs,WL/U) (352)

substituting (347) in (352):

lower bound, nominally fulfilled / upper bound, nominally violated:

βWL/U = (f̄^(WL/U)(xs,0) − fL/U) / √(∇f(xs,WL/U)^T · C · ∇f(xs,WL/U))
      = ∇f(xs,WL/U)^T · (xs,0 − xs,WL/U) / σf̄(WL/U) (353)

lower bound, nominally violated / upper bound, nominally fulfilled:

βWL/U = (fL/U − f̄^(WL/U)(xs,0)) / √(∇f(xs,WL/U)^T · C · ∇f(xs,WL/U))
      = ∇f(xs,WL/U)^T · (xs,WL/U − xs,0) / σf̄(WL/U) (354)

in (353) and (354), the numerator is the "performance safety margin" and the denominator the "performance variability"
8.4.7 Remarks

• worst-case distance: yield approximation for each individual performance bound

• "realistic" geometric yield analysis by linearization in the nominal parameter vector xs,0

• geometric yield analysis and general worst-case analysis are related by an exchange of objective function and constraint:

geometric yield analysis: min_xs β²(xs) s.t. fi(xs) "on the other side" of the given bound fL/U,i → βWL/U,i / Ȳ_L/U,i

general worst-case analysis: min/max_xs fi(xs) s.t. β²(xs) ≤ βW² with given βW / YW → fWL/U,i

8.4.8 Overall yield

upper bound: Y ≤ min_i Ȳ_L/U,i

lower bound: Y ≥ 1 − Σ_i (1 − Ȳ_L/U,i)

Monte-Carlo analysis with A′s = ∩_i A′s,L/U,i (Fig. 51), no circuit simulation required

Figure 51. Intersection A′s of the linearized half-spaces A′s,L/U,i as an approximation of the acceptance region As.
8.4.9 Consideration of range parameters

parameter acceptance regions and their complements (overline):

As,L,i = {xs | ∀ xr ∈ Tr : fi(xs, xr) ≥ fL,i} (355)

As,U,i = {xs | ∀ xr ∈ Tr : fi(xs, xr) ≤ fU,i} (356)

Ās,L,i = {xs | ∃ xr ∈ Tr : fi(xs, xr) < fL,i} (357)

Ās,U,i = {xs | ∃ xr ∈ Tr : fi(xs, xr) > fU,i} (358)

As,L,i ∩ Ās,L,i = ∅ and As,L,i ∪ Ās,L,i = R^ns (359)

geometric yield analysis, problem formulation:

xs,0 ∈ As,L,i :  max_xs pdfN(xs)  s.t.  xs ∈ Ās,L,i (360)

xs,0 ∈ Ās,L,i :  max_xs pdfN(xs)  s.t.  xs ∈ As,L,i (361)

from (355) – (358):

xs ∈ As,L,i ⇐⇒ min_{xr∈Tr} fi(xs, xr) ≥ fL,i  "smallest value still inside" (362)

xs ∈ As,U,i ⇐⇒ max_{xr∈Tr} fi(xs, xr) ≤ fU,i  "largest value still inside" (363)

xs ∈ Ās,L,i ⇐⇒ min_{xr∈Tr} fi(xs, xr) < fL,i  "smallest value already outside" (364)

xs ∈ Ās,U,i ⇐⇒ max_{xr∈Tr} fi(xs, xr) > fU,i  "largest value already outside" (365)

(360), (361) with (355) – (358):

xs,0 ∈ As,L :  min_xs β²(xs)  s.t.  min_{xr∈Tr} f(xs, xr) ≤ fL  "nominal inside" (366)

xs,0 ∈ Ās,L :  min_xs β²(xs)  s.t.  min_{xr∈Tr} f(xs, xr) ≥ fL  "nominal outside" (367)

xs,0 ∈ As,U :  min_xs β²(xs)  s.t.  max_{xr∈Tr} f(xs, xr) ≥ fU  "nominal inside" (368)

xs,0 ∈ Ās,U :  min_xs β²(xs)  s.t.  max_{xr∈Tr} f(xs, xr) ≤ fU  "nominal outside" (369)
9 Yield optimization/design centering/nominal design

9.1 Optimization objectives

Yield Y(xd)

Worst-case distances

βWL/U,i = |f̄^(WL/U,i)(xs,0) − fL/U,i| / √(∇fi(xs,WL/U,i)^T · C · ∇fi(xs,WL/U,i)) (370)

        = |∇fi(xs,WL/U,i)^T · (xs,0 − xs,WL/U,i) + ∇fi(xd,0)^T · (xd − xd,0)| / σf̄(WL/U,i) (371)

i = 1, . . . , nPSF (372)

Performance features fi, i = 1, . . . , nf

Worst-case performance features fWL/U,i, i = 1, . . . , nf

Performance distances

αWL/U,i = |fi(xd,0) − fL/U,i| / √(∇fi(xd,0)^T · Axd² · ∇fi(xd,0))   ("performance safety margin" / "sensitivity") (373)

with

f̄i(xd) = fi(xd,0) + ∇fi(xd,0)^T · (xd − xd,0) (374)

Axd = diag(axd,1, . . . , axd,nxd)   (scaling of individual design parameters) (375)
9.2 Derivatives of optimization objectives

Statistical yield derivatives

first derivative:

∇Y(xs,0) = Y · C⁻¹ · (xs,0,δ − xs,0) (376)

xs,0,δ = Epdfδ{xs},  pdfδ(xs) = (1/Y) · δ(xs) · pdf(xs) (377)

"center of gravity" of the truncated normal distribution

second derivative:

∇²Y(xs,0) = Y · C⁻¹ · [Cδ + (xs,0,δ − xs,0) · (xs,0,δ − xs,0)^T − C] · C⁻¹ (378)

Cδ = Epdfδ{(xs − xs,0,δ) · (xs − xs,0,δ)^T} (379)

difficult: ∇Y(xd)

Worst-case distance derivatives

first derivatives:

∇βWL/U,i(xs,0) = (±1/σf̄(WL/U,i)) · ∇fi(xs,WL/U,i) (380)

∇βWL/U,i(xd^(κ)) = (±1/σf̄(WL/U,i)) · ∇fi(xd^(κ)) (381)

Performance distance derivative

first derivative:

∇αWL/U,i(xd,0) = (±1/√(∇f(xd,0)^T · Axd² · ∇f(xd,0))) · ∇f(xd,0) (382)
9.3 Problem formulations of analog optimization

Design centering/yield optimization

max_xd Y(xd)  s.t.  c(xd) ≥ 0 (383)

geometric:

max_xd ±βWL/U,i(xd), i = 1, . . . , nPSF  s.t.  c(xd) ≥ 0 (384)

"nominal inside": +, "nominal outside": −   (multicriteria optimization problem, MCO)

Nominal design

min_xd ±fi(xd), i = 1, . . . , nf  s.t.  c(xd) ≥ 0 (385)

max_xd ±αWL/U,i(xd), i = 1, . . . , nPSF  s.t.  c(xd) ≥ 0 (386)

"nominal inside": +, "nominal outside": −   (multicriteria optimization problem, MCO)

Scalarization of MCO

min oi(xd), i = 1, . . . , no  →  min t(xd) (387)

weighted sum: t_l1(x) = Σ_{i=1}^{no} wi · oi(x) (388)

weighted least squares: t_l2(x) = Σ_i wi · (oi(x) − oi,target)² (389)

weighted min/max: t_l∞(x) = max_i wi · oi (390)

exponential function: t_exp(x) = Σ_i e^(−wi·oi(x)) (391)

with wi > 0, i = 1, . . . , no,  Σ_{i=1}^{no} wi = 1

Inscribing the largest ellipsoid in the linearized acceptance region by linear programming in iteration step κ:

max_{tβ,xd} tβ  s.t.  βw,i^(κ) + ∇βw,i^T(xd^(κ)) · (xd − xd^(κ)) ≥ tβ, i = 1, . . . , nPSF  [, c^(κ)(xd) ≥ 0] (392)
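A sketch of the four scalarizations (388)–(391) as plain functions of an objective vector o(xd); the weights and example values are made-up data, numpy assumed:

```python
# Sketch: scalarization functions (388)-(391) for a vector of objectives o.
import numpy as np

def t_weighted_sum(o, w):              # (388)
    return w @ o

def t_least_squares(o, w, o_target):   # (389)
    return w @ (o - o_target) ** 2

def t_minmax(o, w):                    # (390)
    return np.max(w * o)

def t_exp(o, w):                       # (391)
    return np.sum(np.exp(-w * o))

o = np.array([1.0, 4.0]); w = np.array([0.5, 0.5])
print(t_weighted_sum(o, w), t_minmax(o, w), t_exp(o, w))
```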
9.4 Analysis, synthesis

analysis: simulation (deterministic/stochastic) of the optimization objective
synthesis: design/sizing by numerical optimization (deterministic/stochastic)

9.5 Sizing

analog optimization, circuit sizing:

nominal design: without statistical distribution, without parameter ranges; statistical and range parameters at some fixed point

tolerance design: with statistical parameter distribution, with tolerance regions of range parameters

9.6 Nominal design, tolerance design

nominal design
  analysis: sensitivity analysis (objective: gradient), simulation (objective: performance values)
  synthesis: nominal optimization

tolerance design
  analysis: worst-case analysis (objective: worst-case performance values and gradients); yield analysis (objective: yield or worst-case distances and gradients)
  synthesis: worst-case optimization; yield optimization
9.7 Optimization without/with constraints

Optimization without constraints

search direction:
– Newton direction
– Quasi-Newton direction
– coordinate direction
– gradient
– least squares: Levenberg–Marquardt, Gauss–Newton

line search:
1. bracketing: start interval
2. sectioning: interval refinement
   – Newton step
   – Golden section
   – backtracking

• Polytope (Nelder–Mead) method
• Trust region

Optimization with constraints

nonlinear problem
  ↓ sequence of
Quadratic programming (QP) with inequality constraints
  ↓ active set
QP with equality constraints
– Newton direction
– step-length limitation: new constraint
– active set update
10 Sizing rules for analog circuit optimization

10.1 Single (NMOS) transistor

Figure 52. NMOS transistor with terminals G, D, S, voltages vGS, vDS and current iDS; output characteristics iDS(vDS) with triode and saturation regions.

drain-source current (simple model):

iDS = { 0,                                                             vGS − Vth ≤ 0
        µn · Cox · (W/L) · (vGS − Vth − vDS/2) · vDS · (1 + λ · vDS),  0 ≤ vDS ≤ vGS − Vth
        (1/2) · µn · Cox · (W/L) · (vGS − Vth)² · (1 + λ · vDS),       vGS − Vth ≤ vDS }  (393)

W, L [m]:        transistor width and length
µn [m²/(V·s)]:   electron mobility in silicon
Cox [F/cm²]:     capacitance per area due to oxide
Vth [V]:         threshold voltage
λ [1/V]:         channel length modulation factor
(394)
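A sketch of (393) as a function; the numeric parameter values are illustrative only, not from any particular technology:

```python
# Sketch: simple drain-current model, eq. (393).
def i_ds(v_gs, v_ds, W, L, mu_cox=2e-4, v_th=0.5, lam=0.1):
    if v_gs - v_th <= 0.0:                       # cutoff
        return 0.0
    if v_ds <= v_gs - v_th:                      # triode region
        return (mu_cox * W / L * (v_gs - v_th - v_ds / 2.0)
                * v_ds * (1.0 + lam * v_ds))
    return (0.5 * mu_cox * W / L                 # saturation region
            * (v_gs - v_th) ** 2 * (1.0 + lam * v_ds))

print(i_ds(1.2, 1.0, W=10e-6, L=1e-6))           # saturation example
```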
saturation region, no channel length modulation:

iDS = K · (W/L) · (vGS − Vth)²  with  K = (1/2) · µn · Cox (395)

derivatives:

∇iDS(K) = iDS/K
∇iDS(W) = iDS/W
∇iDS(L) = −iDS/L
∇iDS(Vth) = −(2/(vGS − Vth)) · iDS
(396)

variances:

(σK/K)² = AK/(W · L) (397)

(σVth)² = AVth/(W · L) (398)

area law: Lakshmikumar et al., IEEE Journal of Solid-State Circuits, SC-21, Dec. 1986

iDS variance; assumptions: K, W, L, Vth variations statistically independent, saturation region, no channel length modulation, linear transformation of variances:

σ²iDS ≈ Σ_{x ∈ {K,W,L,Vth}} [∇iDS(x)]² · σx² (399)

σ²iDS/i²DS ≈ AK/(W · L) + σW²/W² + σL²/L² + (4/(vGS − Vth)²) · AVth/(W · L) (400)

W ↑, L ↑, W · L ↑ → σiDS ↓

larger transistor geometries reduce the iDS variance due to manufacturing variations in K, W, L, Vth

10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)

                                      type         origin
(1) vDS ≥ 0                           electrical   (DC) transistor
(2) vGS − Vth ≥ 0                     function     in saturation
(3) vDS − (vGS − Vth) ≥ V*satmin
(4) W ≥ W*min                         geometry     manufacturing
(5) L ≥ L*min                         robustness   variations affect iDS
(6) W · L ≥ A*min

* technology-specific value
10.2 Transistor pair: current mirror (NMOS)

Figure 53. NMOS current mirror: transistors T1 (current i1) and T2 (current i2) with common gate-source voltage vGS.

function: I2 = x · I1

assumptions: K1 = K2; λ1 = λ2 = 0; Vth1 = Vth2; L1 = L2; T1, T2 in saturation

x = I2/I1 = (W2/W1) · ((vGS − Vth2)²/(vGS − Vth1)²) · ((1 + λ2 · vDS2)/(1 + λ1 · vDS1)) (401)

10.2.1 Sizing rules for current mirror

in addition to sec. 10.1.1, (1) – (6):

                                      type         origin
(7) L1 = L2                           geometric    current
(8) x = W2/W1                         function     ratio
(9) |vDS1 − vDS2| ≤ ∆V*DSmax          electrical   influence of channel
                                      function     length modulation
(10) vGS − Vth1,2 ≥ V*GSmin           electrical   influence of local variations
                                      robustness   in threshold voltages

* technology-specific value
Appendix A Matrix and vector notations

A.1 Vector

b = [b1; b2; . . . ; bm],  bi ∈ R, b ∈ R^m, b<m> (402)

b: column vector

b^T = [b1 b2 · · · bm]: transposed vector, row vector (403)

A.2 Matrix

A = [ a11 a12 · · · a1n
      a21 a22 · · · a2n
      ...  ...  . . . ...
      am1 am2 · · · amn ],  aij ∈ R, A ∈ R^{m×n}, A<m×n> (404)

i: row index, j: column index

partition into columns: A = [a1 a2 · · · an] (405)

partition into rows: A = [a1^T; a2^T; . . . ; am^T] (406)

transposed matrix A^T<n×m>: the columns of A become the rows of A^T, and the rows of A become the columns of A^T,

A^T = [ a11 a21 · · · am1
        a12 a22 · · · am2
        ...  ...  . . . ...
        a1n a2n · · · amn ] (407)
A.3 Addition

A<m×n> + B<m×n> = C<m×n> (408)

cij = aij + bij (409)

A.4 Multiplication

A<m×n> · B<n×p> = C<m×p> (410)

cij = Σ_{k=1}^{n} aik · bkj (411)

cij = ai^T · bj :  i-th row of A times j-th column of B in (410) (412)
A.5 Special cases

a = A<m×1> (413)

a^T = A<1×m> (414)

a = A<1×1> = a<1> = a^T<1> (415)

a^T<m> · b<m> = c : scalar product (416)

a<m> · b^T<n> = C<m×n> : dyadic product (417)

A<m×n> · b<n> = c<m> (418)

b^T<m> · A<m×n> = c^T<n> (419)

b^T<m> · A<m×n> · c<n> = d (420)

identity matrix: I<m×m> = diag(1, . . . , 1) (421)

diagonal matrix: diag(a<m>) = matrix<m×m> with a1, a2, . . . , am on the diagonal, zeros elsewhere (422)
A.6 Determinant of a quadratic matrix

det(A<m×m>) = |A| = Σ_{j=1}^{m} aij · αij, i ∈ {1, . . . , m} = Σ_{i=1}^{m} aij · αij, j ∈ {1, . . . , m} (423)

adjugate matrix adj(A): transposed matrix of the cofactors αij,

adj(A) = [· · · αij · · ·]^T (424)

cofactor: αij = (−1)^(i+j) · (minor determinant of A, i.e., determinant with row i and column j deleted)

A.7 Inverse of a quadratic non-singular matrix

A⁻¹<m×m> · A<m×m> = A · A⁻¹ = I (425)

A⁻¹ = (1/det(A)) · adj(A) (426)
A.8 Some properties

(A · B) · C = A · (B · C) (427)

A · (B + C) = A · B + A · C (428)

(A · B)^T = B^T · A^T (429)

(A^T)^T = A (430)

(A + B)^T = A^T + B^T (431)

A symmetric ⇐⇒ A = A^T (432)

A positive (semi)definite ⇐⇒ ∀ x ≠ 0 :  x^T · A · x > (≥) 0 (433)

A negative (semi)definite ⇐⇒ ∀ x ≠ 0 :  x^T · A · x < (≤) 0 (434)

I^T = I (435)

A · I = I · A = A (436)

A · adj(A) = det(A) · I = adj(A) · A (437)

det(A^T) = det(A) (438)

|a1 · · · b · aj · · · am| = b · det(A) (439)

det(b · A^T) = b^m · det(A) (440)

|a1 · · · aj^(1) · · · am| + |a1 · · · aj^(2) · · · am| = |a1 · · · aj^(1) + aj^(2) · · · am| (441)

|a2 a1 a3 · · · am| = −det(A) (442)

aj = ak (A rank deficient) :  det(A) = 0 (443)

|a1 · · · aj + b · ak · · · am| = det(A) (444)

det(A · B) = det(A) · det(B) (445)

(A · B)⁻¹ = B⁻¹ · A⁻¹ (446)

(A^T)⁻¹ = (A⁻¹)^T (447)

(A⁻¹)⁻¹ = A (448)

det(A⁻¹) = 1/det(A) (449)

A⁻¹ = A^T ⇐⇒ A orthogonal (450)
Appendix B Abbreviated notations of derivatives using the nabla symbol

Gradient vector: vector of first partial derivatives

∇f(xs)<nxs> = [∂f/∂xs,1; ∂f/∂xs,2; . . . ; ∂f/∂xs,nxs] evaluated at x = [xd; xs; xr] = ∂f/∂xs |x (451)

∇f(xs*) = ∂f/∂xs evaluated at [xd; xs*; xr] (452)
Hessian matrix: matrix of second partial derivatives

∇²f(xs)<nxs×nxs> = [ ∂²f/∂xs,1²          ∂²f/(∂xs,1·∂xs,2)   · · ·  ∂²f/(∂xs,1·∂xs,nxs)
                     ∂²f/(∂xs,2·∂xs,1)   ∂²f/∂xs,2²          · · ·  ∂²f/(∂xs,2·∂xs,nxs)
                     ...                 ...                 . . .  ...
                     ∂²f/(∂xs,nxs·∂xs,1) ∂²f/(∂xs,nxs·∂xs,2) · · ·  ∂²f/∂xs,nxs² ]
                   evaluated at x = [xd; xs; xr] = ∂²f/(∂xs · ∂xs^T) |x (453)

∇²f(xs*) = ∂²f/(∂xs · ∂xs^T) evaluated at [xd; xs*; xr] (454)

∇²f(xd*, xs*)<(nxd+nxs)×(nxd+nxs)> = [ ∂²f/(∂xd · ∂xd^T)  ∂²f/(∂xd · ∂xs^T)
                                       ∂²f/(∂xs · ∂xd^T)  ∂²f/(∂xs · ∂xs^T) ]
                                     evaluated at x = [xd*; xs*; xr] (455)

                                   = [ ∇²f(xd)            ∂²f/(∂xd · ∂xs^T)
                                       ∂²f/(∂xs · ∂xd^T)  ∇²f(xs) ] |x (456)
Jacobian matrix

∇f(xs*^T)<nf×nxs> = ∂f/∂xs^T evaluated at [xd; xs*; xr] (457)

= [ ∂f1/∂xs,1    ∂f1/∂xs,2    · · ·  ∂f1/∂xs,nxs
    ∂f2/∂xs,1    ∂f2/∂xs,2    · · ·  ∂f2/∂xs,nxs
    ...          ...          . . .  ...
    ∂fnf/∂xs,1   ∂fnf/∂xs,2   · · ·  ∂fnf/∂xs,nxs ] evaluated at [xd; xs*; xr] (458)
Appendix C Norms

Quadratic model

f(x) = f(x0) + g^T · (x − x0) + (1/2) · (x − x0)^T · H · (x − x0) (459)

     = f0 + Σ_{i=1}^{nx} gi · (xi − x0,i) + (1/2) · Σ_{i=1}^{nx} Σ_{j=1}^{nx} hij · (xi − x0,i) · (xj − x0,j) (460)

H: symmetric

nx = 2 :  f(x1, x2) = f0 + g1 · (x1 − x0,1) + g2 · (x2 − x0,2)
                      + (1/2) · h11 · (x1 − x0,1)² + h12 · (x1 − x0,1) · (x2 − x0,2) + (1/2) · h22 · (x2 − x0,2)² (461)
Vector norm

l1-norm:  ‖x‖1 = Σ_{i=1}^{nx} |xi| (462)

l2-norm:  ‖x‖2 = √(Σ_{i=1}^{nx} xi²) = √(x^T · x) (463)

l∞-norm:  ‖x‖∞ = max_i |xi| (464)

lp-norm:  ‖x‖p = (Σ_{i=1}^{nx} |xi|^p)^(1/p) (465)

Matrix norm

max norm:       ‖A‖M = nx · max_{i,j} |aij| (467)

row norm:       ‖A‖Z = max_i Σ_j |aij| (468)

column norm:    ‖A‖S = max_j Σ_i |aij| (469)

Euclidean norm: ‖A‖E = √(Σ_i Σ_j |aij|²) (470)

spectral norm:  ‖A‖λ = √(λmax(A^T · A)) (471)
Appendix D Pseudo-inverse, singular value decomposition (SVD)

D.1 Moore-Penrose conditions

For every matrix A<m×n> there exists a unique matrix A+<n×m> that satisfies the following conditions:

A · A+ · A = A,        A · A+ maps each column of A to itself (472)

A+ · A · A+ = A+,      A+ · A maps each column of A+ to itself (473)

(A · A+)^T = A · A+,   A · A+ is symmetric (474)

(A+ · A)^T = A+ · A,   A+ · A is symmetric (475)
D.2 Singular value decomposition

Every matrix A<m×n> with rank(A) = r ≤ min(m, n) has a singular value decomposition

A<m×n> = V<m×m> · Ā<m×n> · U^T<n×n> (476)

V: matrix of the m left singular vectors (eigenvectors of A · A^T)

U: matrix of the n right singular vectors (eigenvectors of A^T · A)

U, V orthogonal: U⁻¹ = U^T, V⁻¹ = V^T

Ā = [ D<r×r>  0
      0       0 ]<m×n>,  r = rank(A) (477)

D = diag(d1, · · · , dr),  d1, . . . , dr: singular values (478)

A+<n×m> = U<n×n> · Ā+<n×m> · V^T<m×m> (479)

Ā+ = [ D⁻¹<r×r>  0
       0         0 ]<n×m> (480)

D⁻¹ = diag(1/d1, . . . , 1/dr) (481)

singular values: positive roots of the eigenvalues of A^T · A

columns of V: basis for R^m
columns 1, . . . , r of V: basis for the column space of A
columns (r + 1), . . . , m of V: basis for the kernel/null-space of A^T

columns of U: basis for R^n
columns 1, . . . , r of U: basis for the row space of A
columns (r + 1), . . . , n of U: basis for the kernel/null-space of A

Ā · Ā+ = [ I<r×r>  0
           0       0 ]<m×m>,        Ā+ · Ā = [ I<r×r>  0
                                               0       0 ]<n×n>

A · A+ = V · [ I<r×r>  0
               0       0 ]<m×m> · V^T,        A+ · A = U · [ I<r×r>  0
                                                             0       0 ]<n×n> · U^T
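A numpy sketch of (479)–(481); numpy's svd returns the factors in the order (left, singular values, right-transposed), renamed below to match the V, D, U notation, and the matrix A is example data:

```python
# Sketch: pseudo-inverse via SVD, eqs. (479)-(481).
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])                 # rank-1 example
V, d, Uh = np.linalg.svd(A)                     # A = V @ diag-block(d) @ Uh
r = int(np.sum(d > 1e-12 * d[0]))               # numerical rank
d_plus = np.zeros((A.shape[1], A.shape[0]))
d_plus[:r, :r] = np.diag(1.0 / d[:r])           # D^-1 block, (481)
A_plus = Uh.T @ d_plus @ V.T                    # (479)
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```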
Appendix E Linear equation system, rectangular system matrix with full rank

A<m×n> · x<n> = c<m> (482)

rank(A) = min(m, n)

E.1 m < n, rank(A) = m, underdetermined system of equations

A · A^T is invertible

A · x = c  →  A · A^T · w′ = c with x = A^T · w′  →  w′ = (A · A^T)⁻¹ · c

x = A^T · (A · A^T)⁻¹ · c = A+ · c :  minimum-length solution

Solving (482) by SVD:

A = V · [D<m×m> 0]<m×n> · U^T

V · [D 0] · U^T · x = c  with  w″ = U^T · x = [w″a<m>; w″b]<n>

[D 0] · w″ = V^T · c  →  D · w″a = V^T · c  →  w″a by element-wise division

x = U · [w″a; 0]

Solving (482) by QR-decomposition:

A^T<n×m> = [Q<n×m> Q⊥<n×(n−m)>] · [R<m×m>; 0<(n−m)×m>],  [Q Q⊥] orthogonal, i.e., Q^T · Q⊥ = 0 (483)

A+ = A^T · (A · A^T)⁻¹ = Q · R^(−T)

R^T · w‴ = c with w‴ = Q^T · x  →  forward substitution → w‴  →  x = Q · w‴
E.2 m > n, rank(A) = n, overdetermined system of equations

A^T · A is invertible

A · x = c  →  A^T · A · x = A^T · c

x = (A^T · A)⁻¹ · A^T · c = A+ · c :  least-squares solution

Solving (482) by SVD:

A = V · [D<n×n>; 0]<m×n> · U^T

V · [D; 0] · U^T · x = c  with  w″ = U^T · x

[D; 0] · w″ = V^T · c = [va<n>; vb]<m>

D · w″ = va  →  element-wise division → w″  →  x = U · w″

Solving (482) by QR-decomposition:

A<m×n> = [Q<m×n> Q⊥]<m×m> · [R<n×n>; 0]<m×n> = Q · R,  [Q Q⊥] orthogonal

A+ = (A^T · A)⁻¹ · A^T = R⁻¹ · Q^T

Q · R · x = c ⇔ R · x = Q^T · c  →  backward substitution → x
E.3 m = n, rank(A) = m = n, determined system of equations

A is invertible

A+ = A⁻¹

Solving (482) by SVD:

A = V · D · U^T

D · w″ = V^T · c with w″ = U^T · x  →  element-wise division → w″  →  x = U · w″

Solving (482) by QR-decomposition:

A = Q · R,  A+ = R⁻¹ · Q^T

Q · R · x = c ⇔ R · x = Q^T · c  →  backward substitution → x

Solving (482) by LU-decomposition:

A = L · U,  L: lower left triangular matrix, U: upper right triangular matrix

L · w⁗ = c with w⁗ = U · x  →  forward substitution → w⁗

U · x = w⁗  →  backward substitution → x

A orthogonal: A^T · A = A · A^T = I,  A+ = A⁻¹ = A^T

A diagonal: A+ii = 1/Aii  (if rank deficient, i.e., ∃i Aii = 0, then A+ii = 0)
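The three full-rank cases in numpy form; a sketch with small example matrices:

```python
# Sketch: the three cases of (482) solved numerically.
import numpy as np

A_under = np.array([[1.0, 2.0, 0.0]])            # m < n: minimum-length solution
c1 = np.array([5.0])
x1 = A_under.T @ np.linalg.solve(A_under @ A_under.T, c1)

A_over = np.array([[1.0], [1.0], [1.0]])         # m > n: least-squares solution
c2 = np.array([1.0, 2.0, 4.0])
x2, *_ = np.linalg.lstsq(A_over, c2, rcond=None)

A_sq = np.array([[2.0, 1.0], [1.0, 3.0]])        # m = n: direct solve (LU inside)
x3 = np.linalg.solve(A_sq, np.array([3.0, 5.0]))
print(x1, x2, x3)
```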
Appendix F Partial derivatives of linear, quadratic terms in matrix/vector notation

                          ∂/∂x                     ∂/∂x^T
a^T · x                   a                        a^T
x^T · a                   a                        a^T
x^T<n> · A<n×n> · x<n>    A · x + A^T · x          x^T · A^T + x^T · A
                          (A = A^T: 2 · A · x)     (A = A^T: 2 · x^T · A^T)
x^T · x                   2 · x                    2 · x^T

A<m×n> · x<n> = a1 · x1 + a2 · x2 + . . . + an · xn = [a1^T · x; a2^T · x; . . . ; am^T · x],
with column partition A = [a1 a2 · · · an] and row partition A = [a1^T; a2^T; . . . ; am^T]

x^T<n> · A^T<n×m> = (A · x)^T = [a1^T · x  a2^T · x  · · ·  am^T · x]
stacking conventions (⊗ denotes a scalar entry of the differentiated expression):

derivative of a column<m> with respect to a column<n>: entries ∂⊗/∂[ ]<n> stacked into a column<mn> (484)

derivative of a column<m> with respect to a row<n>: matrix<m×n> (485)

derivative of a row<m> with respect to a column<n>: matrix<n×m> (486)

derivative of a row<m> with respect to a row<n>: row<mn> (487)
Appendix G Probability space

event B: e.g., performance value f is less than or equal to fU, i.e., B = {f | f ≤ fU}

probability P: e.g., P(B) = P({f | f ≤ fU}) = cdff(fU) = ∫_{−∞}^{fU} pdff(f) · df

event B ⊂ Ω: subset of the sample set Ω of possible outcomes of a probabilistic experiment

elementary event: event with one element (if event sets are countable)

event set 𝔅: system of subsets of the sample set, B ∈ 𝔅

definitions:

Ω ∈ 𝔅 (488)

B ∈ 𝔅 ⇒ B̄ ∈ 𝔅 (489)

Bi ∈ 𝔅, i ∈ N ⇒ ∪i Bi ∈ 𝔅 (490)

event set 𝔅: "σ-algebra"

Kolmogorov axioms:

P(B) ≥ 0 for all B ∈ 𝔅 (491)

P(Ω) = 1,  "Ω: certain event" (492)

Bi ∩ Bj = ∅, Bi, Bj ∈ 𝔅 ⇒ P(Bi ∪ Bj) = P(Bi) + P(Bj) (493)

corollaries:

P(∅) = 0,  "∅: impossible event" (494)

Bi ⊆ Bj ⇒ P(Bi) ≤ P(Bj) (495)

P(B̄) = 1 − P(B) (496)

0 ≤ P(B) ≤ 1 (497)

∫_{−∞}^{+∞} · · · ∫_{−∞}^{+∞} pdf(t) · dt = 1 (498)
Appendix H Convexity

H.1 Convex set K ⊆ R^n

∀ p0, p1 ∈ K, a ∈ [0, 1] :  pa ≜ (1 − a) · p0 + a · p1 ∈ K (499)

Figure 54. (a) convex set, (b) non-convex set.

H.2 Convex function

∇²f is positive definite

∀ p0, p1 ∈ K, a ∈ [0, 1], pa = (1 − a) · p0 + a · p1 :

f(pa) ≤ (1 − a) · f(p0) + a · f(p1) (500)

Figure 55. Convex function.

c(p) is a concave function ⇐⇒ {p | c(p) ≥ c0} is a convex set

convex optimization problem: local minima are global minima

strict convexity: unique global solution