Technische Universität München
Department of Electrical Engineering and Information Technology
Institute for Electronic Design Automation
Optimization Methods for Circuit Design
Compendium
H. Graeb
Version 2.8 (WS 12/13): Michael Zwerger
Version 2.0 - 2.7 (WS 08/09 - SS 12): Michael Eick
Version 1.0 - 1.2 (SS 07 - SS 08): Husni Habal
Presentation follows:
H. Graeb, Analog Design Centering and Sizing, Springer, 2007.
R. Fletcher, Practical Methods of Optimization, John Wiley & Sons, 2nd Edition, 2000.
Status: October 12, 2012
Copyright 2008 - 2012
Optimization Methods for Circuit Design - Compendium
H. Graeb
Technische Universität München
Institute for Electronic Design Automation
Arcisstr. 21
80333 Munich, Germany
[email protected]
Tel.: +49-89-289-23679
All rights reserved.
Contents

1 Introduction
1.1 Parameters, performance, simulation
1.2 Performance specification
1.3 Minimum, minimization
1.4 Unconstrained optimization
1.5 Constrained optimization
1.6 Classification of optimization problems
1.7 Classification of constrained optimization problems
1.8 Structure of an iterative optimization process
1.8.1 ... without constraints
1.8.2 ... with constraints
1.8.3 Trust-region approach

2 Optimality conditions
2.1 Optimality conditions – unconstrained optimization
2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem
2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem
2.1.3 Sufficient and necessary conditions for the second-order derivative ∇²f(x*) to be positive definite
2.2 Optimality conditions – constrained optimization
2.2.1 Feasible descent direction r
2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem
2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem
2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

3 Unconstrained optimization
3.1 Univariate unconstrained optimization, line search
3.1.1 Wolfe-Powell conditions
3.1.2 Backtracking line search
3.1.3 Bracketing
3.1.4 Sectioning
3.1.5 Golden Sectioning
3.1.6 Line search by quadratic model
3.1.7 Unimodal function
3.2 Multivariate unconstrained optimization without derivatives
3.2.1 Coordinate search
3.2.2 Polytope method (Nelder-Mead simplex method)
3.3 Multivariate unconstrained optimization with derivatives
3.3.1 Steepest descent
3.3.2 Newton approach
3.3.3 Quasi-Newton approach
3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)
3.3.5 Least-squares (plus trust-region) approach
3.3.6 Conjugate-gradient (CG) approach

4 Constrained optimization – problem formulations
4.1 Quadratic Programming (QP)
4.1.1 QP – linear equality constraints
4.1.2 QP – inequality constraints
4.1.3 Example
4.2 Sequential Quadratic Programming (SQP), Lagrange–Newton
4.2.1 SQP – equality constraints
4.2.2 Penalty function

5 Statistical parameter tolerances
5.1 Univariate Gaussian distribution (normal distribution)
5.2 Multivariate normal distribution
5.3 Transformation of statistical distributions
5.3.1 Example

6 Expectation values and their estimators
6.1 Expectation values
6.1.1 Definitions
6.1.2 Linear transformation of expectation value
6.1.3 Linear transformation of variance
6.1.4 Translation law of variances
6.1.5 Normalizing a random variable
6.1.6 Linear transformation of a normal distribution
6.2 Estimation of expectation values
6.2.1 Expectation value estimator
6.2.2 Variance estimator
6.2.3 Variance of the expectation value estimator
6.2.4 Linear transformation of estimated expectation value
6.2.5 Linear transformation of estimated variance
6.2.6 Translation law of estimated variance

7 Worst-case analysis
7.1 Task
7.2 Typical tolerance regions
7.3 Classical worst-case analysis
7.3.1 Task
7.3.2 Linear performance model
7.3.3 Performance type f ↑ "good", specification type f > fL
7.3.4 Performance type f ↓ "good", specification type f < fU
7.4 Realistic worst-case analysis
7.4.1 Task
7.4.2 Performance type f ↑ "good", specification type f ≥ fL
7.4.3 Performance type f ↓ "good", specification type f ≤ fU
7.5 General worst-case analysis
7.5.1 Task
7.5.2 Performance type f ↑ "good", specification type f ≥ fL
7.5.3 Performance type f ↓ "good", specification type f ≤ fU
7.5.4 General worst-case analysis with tolerance box
7.6 Input/output of worst-case analysis
7.7 Summary of discussed worst-case analysis problems

8 Yield analysis
8.1 Task
8.1.1 Acceptance function
8.1.2 Parametric yield
8.2 Statistical yield analysis/Monte-Carlo analysis
8.2.1 Variance of yield estimator
8.2.2 Estimated variance of yield estimator
8.2.3 Importance sampling
8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")
8.3.1 Yield partition
8.3.2 Defining worst-case distance βWL as difference from nominal performance to specification bound as multiple of standard deviation σf of linearized performance feature (βWL-sigma design)
8.3.3 Yield partition as a function of worst-case distance βWL
8.3.4 Worst-case distance βWL defines tolerance region
8.3.5 Specification type f ≤ fU
8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")
8.4.1 Problem formulation
8.4.2 Advantages of geometric yield analysis
8.4.3 Lagrange function and first-order optimality conditions of problem (336)
8.4.4 Lagrange function of problem (337)
8.4.5 Second-order optimality condition of problem (336)
8.4.6 Worst-case distance
8.4.7 Remarks
8.4.8 Overall yield
8.4.9 Consideration of range parameters

9 Yield optimization/design centering/nominal design
9.1 Optimization objectives
9.2 Derivatives of optimization objectives
9.3 Problem formulations of analog optimization
9.4 Analysis, synthesis
9.5 Sizing
9.6 Nominal design, tolerance design
9.7 Optimization without/with constraints

10 Sizing rules for analog circuit optimization
10.1 Single (NMOS) transistor
10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)
10.2 Transistor pair: current mirror (NMOS)
10.2.1 Sizing rules for current mirror

A Matrix and vector notations
A.1 Vector
A.2 Matrix
A.3 Addition
A.4 Multiplication
A.5 Special cases
A.6 Determinant of a quadratic matrix
A.7 Inverse of a quadratic non-singular matrix
A.8 Some properties

B Abbreviated notations of derivatives using the nabla symbol

C Norms

D Pseudo-inverse, singular value decomposition (SVD)
D.1 Moore-Penrose conditions
D.2 Singular value decomposition

E Linear equation system, rectangular system matrix with full rank
E.1 Underdetermined system of equations
E.2 Overdetermined system of equations
E.3 Determined system of equations

F Partial derivatives of linear, quadratic terms in matrix/vector notation

G Probability space

H Convexity
H.1 Convex set K ⊆ Rn
H.2 Convex function
1 Introduction
1.1 Parameters, performance, simulation
design parameters: x_d ∈ R^(n_xd), e.g., transistor widths, capacitances

statistical parameters: x_s ∈ R^(n_xs), e.g., oxide thickness, threshold voltage

range parameters: x_r ∈ R^(n_xr), e.g., operational parameters: supply voltage, temperature

(circuit) parameters: x = [x_d^T x_s^T x_r^T]^T

performance feature: f_i, e.g., gain, bandwidth, slew rate, phase margin, delay, power

(circuit) performance: f = [... f_i ...]^T ∈ R^(n_f)

(circuit) simulation: x ↦ f(x), e.g., SPICE

abstraction from the physical level!

A design parameter and a statistical parameter may refer to the same physical parameter. E.g., an actual CMOS transistor width is the sum of a design parameter W_k and a statistical parameter ∆W. W_k is the specific width of transistor T_k, while ∆W is a width reduction that varies globally and equally for all the transistors on a die.

A design parameter and a statistical parameter may be identical.
1.2 Performance specification
performance specification feature (upper or lower limit on a performance):

f_i ≥ f_L,i  or  f_i ≤ f_U,i    (1)

number of performance specification features:

n_f ≤ n_PSF ≤ 2·n_f    (2)

performance specification:

f_L,1 ≤ f_1(x) ≤ f_U,1
  ...
f_L,nf ≤ f_nf(x) ≤ f_U,nf

⟺  f_L ≤ f(x) ≤ f_U    (3)
Figure 1. Smooth function (a), i.e., continuous and differentiable at least several times on a closed region of the domain, with global minimum, weak local minimum, and strong local minimum marked. Non-smooth continuous function (b).
1.3 Minimum, minimization

without loss of generality: optimum ≡ minimum

because: max f ≡ −min −f    (4)

"min" stands both for

the minimum, i.e., a result, and
to minimize, i.e., a process    (5)

min f(x) ≡ f(x) !→ min    (6)

min f(x) → x*, f(x*) = f*    (7)

1.4 Unconstrained optimization

f* = min f(x) ≡ min_x f ≡ min_x f(x) ≡ min {f(x)}
x* = argmin f(x) ≡ argmin_x f ≡ argmin_x f(x) ≡ argmin {f(x)}    (8)
1.5 Constrained optimization
min f(x) s.t. c_i(x) = 0, i ∈ E; c_i(x) ≥ 0, i ∈ I    (9)

E: set of equality constraints
I: set of inequality constraints

Alternative formulations:

min_x f s.t. x ∈ Ω    (10)

min_{x∈Ω} f    (11)

min { f(x) | x ∈ Ω }    (12)

where Ω = { x | c_i(x) = 0, i ∈ E; c_i(x) ≥ 0, i ∈ I }

Lagrange function

combines objective function and constraints in a single expression:

L(x, λ) = f(x) − Σ_{i∈E∪I} λ_i · c_i(x)    (13)

λ_i: Lagrange multiplier associated with constraint i
1.6 Classification of optimization problems
deterministic, stochastic: The iterative search process is deterministic or random.

continuous, discrete: Optimization variables can take an infinite number of values, e.g., the set of real numbers, or take a finite set of values or states.

local, global: The objective value at a local optimal point is better than the objective values of other points in its vicinity. The objective value at a global optimal point is better than the objective value of any other point.

scalar, vector: In a vector optimization problem, multiple objective functions shall be optimized simultaneously (multiple-criteria optimization, MCO). Usually, objectives have to be traded off against each other. A Pareto-optimal point is characterized in that one objective can only be improved at the cost of another. Pareto optimization determines the set of all Pareto-optimal points. Scalar optimization refers to a single objective. A vector optimization problem is scalarized by combining the multiple objectives into a single overall objective, e.g., by a weighted sum, least-squares, or min/max.

constrained, unconstrained: Besides the objective function that has to be optimized, constraints on the optimization variables may be given as inequalities or equalities.

with or without derivatives: The optimization process may be based on gradients (first derivative), on gradients and Hessians (second derivative), or it may not require any derivatives of the objective/constraint functions.
1.7 Classification of constrained optimization problems
objective function    constraint functions                problem class
linear             ∧  linear                              linear programming
quadratic          ∧  linear                              quadratic programming
nonlinear          ∨  nonlinear                           nonlinear programming
convex                linear equality constraints,        convex programming
                      concave inequality constraints      (local ≡ global minimum)
1.8 Structure of an iterative optimization process
1.8.1 . . . without constraints
Taylor series of a function f about iteration point x^(κ):

f(x) = f(x^(κ)) + ∇f(x^(κ))^T · (x − x^(κ)) + (1/2) · (x − x^(κ))^T · ∇²f(x^(κ)) · (x − x^(κ)) + ...    (14)
     = f^(κ) + g^(κ)T · (x − x^(κ)) + (1/2) · (x − x^(κ))^T · H^(κ) · (x − x^(κ)) + ...    (15)

f^(κ): value of the function f at point x^(κ)
g^(κ): gradient (first derivative, direction of steepest ascent) at point x^(κ)
H^(κ): Hessian matrix (second derivative) at point x^(κ)

Taylor series along a search direction r starting from point x^(κ):

x(r) = x^(κ) + r    (16)

f(x^(κ) + r) ≡ f(r) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r + ...    (17)

Taylor series in the step length α along search direction r^(κ) starting from point x^(κ):

x(α) = x^(κ) + α · r^(κ)    (18)

f(x^(κ) + α · r^(κ)) ≡ f(α) = f^(κ) + g^(κ)T · r^(κ) · α + (1/2) · r^(κ)T · H^(κ) · r^(κ) · α² + ...    (19)
    = f^(κ) + ∇f(α=0) · α + (1/2) · ∇²f(α=0) · α² + ...    (20)

∇f(α=0): slope of f along direction r^(κ)
∇²f(α=0): curvature of f along r^(κ)

repeat
  determine the search direction r^(κ)
  determine the step length α^(κ) (line search)
  x^(κ+1) = x^(κ) + α^(κ) · r^(κ)
  κ := κ + 1
until termination criteria are fulfilled

Steepest-descent approach

search direction: direction of steepest descent, i.e., r^(κ) = −g^(κ)
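A minimal Python sketch of this iteration (not part of the compendium; names are illustrative), combining the steepest-descent direction with the backtracking line search of Sec. 3.1.2, applied to Rosenbrock's function from Figure 2:

```python
import numpy as np

def rosenbrock(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_grad(x):
    return np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                     200.0 * (x[1] - x[0]**2)])

def steepest_descent(f, grad, x0, c1=0.7, c3=0.6, tol=1e-6, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:            # termination criterion
            break
        r = -g                                 # steepest-descent direction
        alpha = 1.0
        # backtracking: shrink alpha until the Armijo condition (58) holds
        while f(x + alpha * r) > f(x) + alpha * c1 * (g @ r):
            alpha *= c3
        x = x + alpha * r                      # x^(k+1) = x^(k) + alpha^(k) * r^(k)
    return x

print(steepest_descent(rosenbrock, rosenbrock_grad, [-1.0, 0.8]))
```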
Optimization of Analog Circuits 5
Figure 2. Visual illustration of the steepest-descent approach for Rosenbrock's function f(x1, x2) = 100·(x2 − x1²)² + (1 − x1)², from x^(0) to x*. A backtracking line search is applied (see Sec. 3.1.2) with initial x^(0) = [−1.0, 0.8]^T and α^(0) = 1, α := c3 · α. The search terminates when the Armijo condition is satisfied with c1 = 0.7, c3 = 0.6.
1.8.2 . . . with constraints
• Constraint functions and objective function are combined into an unconstrained optimization problem in each iteration step:

– Lagrange formulation
– penalty function
– Sequential Quadratic Programming (SQP)

• Projection onto the active constraints, i.e., into the subspace of an unconstrained optimization problem, in each iteration step:

– active-set methods
1.8.3 Trust-region approach
model of the objective function:

f(x) ≈ m(x^(κ) + r)    (21)

min_r m(x^(κ) + r) s.t. r ∈ trust region    (22)

e.g., ‖r‖ ≤ ∆

search direction and step length are computed simultaneously; the trust region accounts for the model accuracy
2 Optimality conditions
2.1 Optimality conditions – unconstrained optimization
Taylor series of the objective function around the optimum point x*:

f(x) = f(x*) + ∇f(x*)^T · (x − x*) + (1/2) · (x − x*)^T · ∇²f(x*) · (x − x*) + ...    (23)

with the abbreviations

f* = f(x*): value of the function at the optimum x*
g* = ∇f(x*): gradient at the optimum x*
H* = ∇²f(x*): Hessian matrix at the optimum x*

For x = x* + r close to the optimum:

f(r) = f* + g*^T · r + (1/2) · r^T · H* · r + ...    (24)

x* is optimal ⟺ there is no descent direction r such that f(r) < f*.

Figure 3. Descent directions from x^(κ) (shaded area): directions r with ∇f(x^(κ))^T · r < 0, bounded by the level set f(x) = f(x^(κ)) separating f(x) < f(x^(κ)) from f(x) > f(x^(κ)); the gradient ∇f(x^(κ)) and the steepest-descent direction are normal to the level set.
2.1.1 Necessary first-order condition for a local minimum of an unconstrained optimization problem

∀ r≠0: g*^T · r ≥ 0    (25)

⟹ g* = ∇f(x*) = 0    (26)

x*: stationary point

descent direction r: ∇f(x^(κ))^T · r < 0
steepest-descent direction: r = −∇f(x^(κ))

Figure 4. Quadratic functions: (a) minimum at x*, (b) maximum at x*, (c) saddle point at x*, (d) positive semidefinite with multiple minima along a trench.
2.1.2 Necessary second-order condition for a local minimum of an unconstrained optimization problem

∀ r≠0: r^T · ∇²f(x*) · r ≥ 0 ⟺ ∇²f(x*) is positive semidefinite ⟺ f has non-negative curvature    (27)

sufficient:

∀ r≠0: r^T · ∇²f(x*) · r > 0 ⟺ ∇²f(x*) is positive definite ⟺ f has positive curvature    (28)

Figure 5. Contour plots of quadratic functions that are (a),(b) positive or negative definite, (c) indefinite (saddle point), (d) positive or negative semidefinite.
2.1.3 Sufficient and necessary conditions for the second-order derivative ∇²f(x*) to be positive definite

• all eigenvalues > 0

• a Cholesky decomposition exists:

∇²f(x*) = L · L^T with l_ii > 0, or
∇²f(x*) = L · D · L^T with l_ii = 1 and d_ii > 0    (29)

• all pivot elements during Gaussian elimination without pivoting are > 0

• all principal minors are > 0
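A small numerical sketch (assuming NumPy; not part of the compendium) of the first two criteria; in practice, attempting a Cholesky decomposition is the usual test:

```python
import numpy as np

def is_positive_definite(H):
    """Check two of the equivalent criteria above on a symmetric matrix H."""
    # criterion 1: all eigenvalues > 0 (eigvalsh assumes H is symmetric)
    eig_ok = bool(np.all(np.linalg.eigvalsh(H) > 0))
    # criterion 2: a Cholesky decomposition H = L L^T with l_ii > 0 exists
    try:
        np.linalg.cholesky(H)
        chol_ok = True
    except np.linalg.LinAlgError:
        chol_ok = False
    assert eig_ok == chol_ok        # the criteria agree
    return chol_ok

H = np.array([[2.0, -1.0], [-1.0, 2.0]])   # positive definite example
print(is_positive_definite(H))             # True
```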
Figure 6. Dark shaded area: unconstrained directions according to (35); light shaded area: descent directions according to (34); overlap: unconstrained descent directions (sketch with contours f = const, c_i = const, gradients ∇f(x^(κ)) and ∇c_i(x^(κ)) at x^(κ)). When no direction satisfies both (34) and (35), the cross section is empty and the current point is a local minimum of the function.
2.2 Optimality conditions – constrained optimization
2.2.1 Feasible descent direction r
descent direction: ∇f(x^(κ))^T · r < 0    (30)

feasible direction: c_i(x^(κ) + r) ≈ c_i(x^(κ)) + ∇c_i(x^(κ))^T · r ≥ 0    (31)

Inactive constraint: i is inactive ⟺ c_i(x^(κ)) > 0

then each r with ‖r‖ < ε satisfies (31), e.g.,

r = − c_i(x^(κ)) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) · ∇f(x^(κ))    (32)

(32) in (31) gives:

c_i(x^(κ)) · [ 1 − (∇c_i(x^(κ))^T · ∇f(x^(κ))) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) ] ≥ 0    (33)

where −1 ≤ (∇c_i(x^(κ))^T · ∇f(x^(κ))) / (‖∇c_i(x^(κ))‖ · ‖∇f(x^(κ))‖) ≤ 1

Active constraint (Fig. 6): i is active ⟺ c_i(x^(κ)) = 0

then (30) and (31) become:

∇f(x^(κ))^T · r < 0    (34)

∇c_i(x^(κ))^T · r ≥ 0    (35)

no feasible descent direction exists, i.e., no vector r satisfies both (34) and (35) at x*:

∇f(x*) = λ_i* · ∇c_i(x*) with λ_i* ≥ 0    (36)

no statement about the sign of λ_i* can be made in case of an equality constraint (c_i = 0 ⟺ c_i ≤ 0 ∧ c_i ≥ 0)
2.2.2 Necessary first-order conditions for a local minimum of a constrained optimization problem

x*, f* = f(x*), λ*, L* = L(x*, λ*)

Karush-Kuhn-Tucker (KKT) conditions:

∇L(x*) = 0    (37)

c_i(x*) = 0, i ∈ E    (38)

c_i(x*) ≥ 0, i ∈ I    (39)

λ_i* ≥ 0, i ∈ I    (40)

λ_i* · c_i(x*) = 0, i ∈ E ∪ I    (41)

(37) is analogous to (26); (13) and (37) give:

∇f(x*) − Σ_{i∈A(x*)} λ_i* · ∇c_i(x*) = 0    (42)

A(x*) is the set of active constraints at x*:

A(x*) = E ∪ { i ∈ I | c_i(x*) = 0 }    (43)

(41) is called the complementarity condition: either the Lagrange multiplier is 0 (inactive constraint) or the constraint c_i(x*) is 0 (active constraint).

From (41) and (13): L* = f*    (44)
2.2.3 Necessary second-order condition for a local minimum of a constrained optimization problem

f(x* + r) = L(x* + r, λ*)    (45)
  = L(x*, λ*) + r^T · ∇L(x*) + (1/2) · r^T · ∇²L(x*) · r + ...  (with ∇L(x*) = 0)    (46)
  = f* + (1/2) · r^T · [ ∇²f(x*) − Σ_{i∈A(x*)} λ_i* · ∇²c_i(x*) ] · r + ...    (47)

for each feasible stationary direction r at x*, i.e.,

F_r = { r | r ≠ 0, ∇c_i(x*)^T · r ≥ 0 for i ∈ A(x*)\A+*, ∇c_i(x*)^T · r = 0 for i ∈ A+* },
with A+* = { j ∈ A(x*) | j ∈ E ∨ λ_j* > 0 }    (48)

necessary:

∀ r ∈ F_r: r^T · ∇²L(x*) · r ≥ 0    (49)

sufficient:

∀ r ∈ F_r: r^T · ∇²L(x*) · r > 0    (50)
2.2.4 Sensitivity of the optimum with regard to a change in an active constraint

perturbation of an active constraint at x* by ∆_i ≥ 0:

c_i(x) ≥ 0 → c_i(x) ≥ ∆_i    (51)

L(x, λ, ∆) = f(x) − Σ_i λ_i · (c_i(x) − ∆_i)    (52)

∇f*(∆_i) = ∇L*(∆_i) = ( ∂L/∂x^T · ∂x/∂∆_i + ∂L/∂λ^T · ∂λ/∂∆_i + ∂L/∂∆_i ) |_{x*,λ*}

At (x*, λ*) the first two terms vanish (∂L/∂x = 0^T, ∂L/∂λ = 0^T), hence

∇f*(∆_i) = ∂L/∂∆_i |_{x*,λ*} = λ_i*    (53)

Lagrange multiplier: sensitivity of the optimum to a change in an active constraint close to x*
3 Unconstrained optimization
3.1 Univariate unconstrained optimization, line search
single optimization parameter, e.g., the step length α

f(α) ≡ f(x^(κ) + α·r) = f(x^(κ)) + ∇f(x^(κ))^T·r · α + (1/2) · r^T·∇²f(x^(κ))·r · α² + ...    (54)

with f^(κ) = f(x^(κ)), ∇f(α=0) = ∇f(x^(κ))^T·r, and ∇²f(α=0) = r^T·∇²f(x^(κ))·r

error vector:

ε^(κ) = x^(κ) − x*    (55)

global convergence:

lim_{κ→∞} ε^(κ) = 0    (56)

convergence rate:

‖ε^(κ+1)‖ = L · ‖ε^(κ)‖^p    (57)

p = 1: linear convergence, L < 1
p = 2: quadratic convergence

exact line search is expensive; therefore, only find an α that

• obtains a sufficient reduction in f
• is a big enough step length
3.1.1 Wolfe-Powell conditions
Figure 7. Wolfe-Powell conditions for the line search f(α): the sketch marks the regions of α where (58), (59), and (60) are satisfied around αopt; reference lines with slopes ∇f(α=0), c1·∇f(α=0), and c2·∇f(α=0) are shown.

• Armijo condition: sufficient objective reduction

f(α) ≤ f(0) + α · c1 · ∇f(α=0), with ∇f(α=0) = ∇f(x^(κ))^T · r    (58)

• curvature condition: sufficient gradient increase

∇f(α) ≥ c2 · ∇f(α=0), with ∇f(α) = ∇f(x^(κ) + α·r)^T · r    (59)

• strong curvature condition: step close to the optimum

|∇f(α)| ≤ c2 · |∇f(α=0)|    (60)

0 < c1 < c2 < 1    (61)
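A Python sketch (not from the compendium) of evaluating (58)-(60) for a given step length; the default values of c1, c2 are common illustrative choices, not prescribed here:

```python
import numpy as np

def wolfe_conditions(f, grad, x, r, alpha, c1=1e-4, c2=0.9):
    """Evaluate the Armijo (58), curvature (59), and strong curvature (60)
    conditions for a step of length alpha along direction r."""
    slope0 = grad(x) @ r                     # slope of f at alpha = 0
    slope  = grad(x + alpha * r) @ r         # slope of f at alpha
    armijo    = f(x + alpha * r) <= f(x) + alpha * c1 * slope0   # (58)
    curvature = slope >= c2 * slope0                             # (59)
    strong    = abs(slope) <= c2 * abs(slope0)                   # (60)
    return armijo, curvature, strong

# example: quadratic objective, step along the steepest-descent direction
f = lambda x: 0.5 * x @ x
g = lambda x: x
print(wolfe_conditions(f, g, np.array([1.0, 1.0]), np.array([-1.0, -1.0]), 0.9))
```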
3.1.2 Backtracking line search
0 < c3 < 1

determine an initial step length αt /* big enough, not too big */
WHILE αt violates (58)
  αt := c3 · αt
3.1.3 Bracketing
finding an interval [αlo, αhi] that contains the minimum

/* take αlo as the previous αhi and find a new, larger αhi until the minimum has been passed; if necessary exchange αlo, αhi such that f(αlo) < f(αhi) */
αhi := 0
REPEAT
  αlo := αhi
  determine αhi ∈ [αlo, αmax]
  IF αhi violates (58) OR f(αhi) ≥ f(αlo)
    /* minimum has been significantly passed, because the Armijo condition is violated or because the new objective value is larger than that at the lower border of the bracketing interval (Fig. 8) */
    GOTO sectioning
  IF αhi satisfies (60)
    /* strong curvature condition satisfied and objective value smaller than at αlo, i.e., step length found (Fig. 9) */
    α* = αhi
    STOP
  IF ∇f(αhi) ≥ 0
    /* αhi has a lower objective value than αlo and has passed the minimum, as the objective gradient has switched sign (Fig. 10) */
    exchange αhi and αlo
    GOTO sectioning
Figure 8. Bracketing case 1: the Armijo condition is violated at αhi or f(αhi) ≥ f(αlo); αhi lies somewhere beyond the minimum.

Figure 9. Bracketing case 2: (60) is satisfied at αhi; the step length has been found.

Figure 10. Bracketing case 3: ∇f(αhi) ≥ 0; αhi has passed the minimum, αlo and αhi are exchanged.
3.1.4 Sectioning
sectioning starts with an interval [αlo, αhi] with the following properties:

• the minimum is included
• f(αlo) < f(αhi)
• αlo may lie left or right of αhi (αlo ≷ αhi)
• ∇f(αlo) · (αhi − αlo) < 0
REPEAT
  determine αt ∈ [αlo, αhi]
  IF αt violates (58) OR f(αt) ≥ f(αlo)
    /* new αhi found, αlo remains (Fig. 11) */
    αhi := αt
  ELSE /* i.e., f(αt) < f(αlo), new αlo found */
    IF αt satisfies (60)
      /* step length found */
      α* = αt
      STOP
    IF ∇f(αt) · (αhi − αlo) ≥ 0
      /* αt is on the same side of the minimum as αhi, so αlo must become the new αhi (Fig. 12) */
      αhi := αlo
    αlo := αt /* Fig. 13 */
Figure 11. Case αhi := αt: the new interval is [α'lo, α'hi] = [αlo, αt] (shown for both orderings of αlo and αhi).

Figure 12. Case αhi := αlo: αt lies on the same side of the minimum as αhi, so αlo becomes the new α'hi.

Figure 13. Case αlo := αt: αt becomes the new α'lo of the new interval.
3.1.5 Golden Sectioning
golden section ratio:

(1 − τ)/τ = τ/1 ⟺ τ² + τ − 1 = 0:  τ = (−1 + √5)/2 ≈ 0.618    (62)

/* left inner point */
α1 := L + (1 − τ) · |R − L|
/* right inner point */
α2 := L + τ · |R − L|
REPEAT
  IF f(α1) < f(α2)
    /* α* ∈ [L, α2], α1 is the new right inner point */
    R := α2
    α2 := α1
    α1 := L + (1 − τ) · |R − L|
  ELSE
    /* α* ∈ [α1, R], α2 is the new left inner point */
    L := α1
    α1 := α2
    α2 := L + τ · |R − L|
UNTIL ∆ < ∆min
Figure 14. Golden sectioning: interval [L, R] of length ∆ with inner points α1, α2 splitting it into parts (1 − τ)·∆ and τ·∆; the new interval [L^(2), R^(2)] of length ∆^(2) is split with the same ratios, so one of the previous inner points is reused.
Accuracy after the κ-th step

interval size: ∆^(κ) = τ^κ · ∆^(0)    (63)

required number of iterations κ to reduce the interval from ∆^(0) to ∆^(κ):

κ = ⌈ −(1/log τ) · log(∆^(0)/∆^(κ)) ⌉ ≈ ⌈ 4.78 · log(∆^(0)/∆^(κ)) ⌉    (64)
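A runnable Python version of the pseudocode above (a sketch, not from the compendium); it caches the function values so that, as the golden ratio allows, each iteration costs only one new function evaluation:

```python
import numpy as np

TAU = (np.sqrt(5.0) - 1.0) / 2.0           # golden section ratio (62), ~0.618

def golden_section(f, L, R, delta_min=1e-6):
    """Interval reduction for a unimodal f on [L, R]."""
    a1, a2 = L + (1.0 - TAU) * (R - L), L + TAU * (R - L)
    f1, f2 = f(a1), f(a2)
    while R - L > delta_min:
        if f1 < f2:                        # minimum in [L, a2]
            R, a2, f2 = a2, a1, f1
            a1 = L + (1.0 - TAU) * (R - L)
            f1 = f(a1)
        else:                              # minimum in [a1, R]
            L, a1, f1 = a1, a2, f2
            a2 = L + TAU * (R - L)
            f2 = f(a2)
    return 0.5 * (L + R)

print(golden_section(lambda a: (a - 1.3)**2, 0.0, 4.0))   # ~1.3
```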
3.1.6 Line search by quadratic model
Figure 15. Quadratic model m(α) through the sampling points L^(κ), E^(κ), R^(κ) with values f(L^(κ)), f(E^(κ)), f(R^(κ)); αt is the minimizer of m(α).

quadratic model by, e.g.,

• 3 sampling points
• first and second-order derivatives

univariate Newton approach:

m(α) = f0 + g · α + (1/2) · h · α²    (65)

first-order condition:

min m(α) → ∇m(α) = g + h · αt = 0 → αt = −g/h    (66)

second-order condition:

h > 0    (67)
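A sketch of one step of this line search in Python (assuming NumPy; illustrative only): fit the quadratic model through three samples and return its minimizer; the sketch rejects the step if the curvature condition (67) is violated:

```python
import numpy as np

def quadratic_step(aL, aE, aR, fL, fE, fR):
    """Fit m(a) = f0 + g*a + (h/2)*a^2 through three samples and return
    the minimizer a_t = -g/h from (66); requires h > 0 per (67)."""
    # solve the 3x3 Vandermonde system for the coefficients f0, g, h/2
    A = np.array([[1.0, aL, aL**2],
                  [1.0, aE, aE**2],
                  [1.0, aR, aR**2]])
    f0, g, h2 = np.linalg.solve(A, np.array([fL, fE, fR]))
    h = 2.0 * h2
    if h <= 0.0:
        raise ValueError("quadratic model has no minimum (h <= 0)")
    return -g / h

# exact for a quadratic objective f(a) = (a - 1.3)^2: recovers 1.3
print(quadratic_step(0.0, 1.0, 3.0, 1.69, 0.09, 2.89))
```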
3.1.7 Unimodal function
f is unimodal over the interval [L, R]: there exists exactly one value αopt ∈ [L, R] for which:

∀ L < α1 < α2 < R:  α2 < αopt → f(α1) > f(α2)  (Fig. 16 (a))    (68)

and

∀ L < α1 < α2 < R:  α1 > αopt → f(α1) < f(α2)  (Fig. 16 (b))    (69)

Figure 16. Unimodal function over [L, R] with minimum at αopt: (a) sampling points α1 < α2 left of αopt, (b) sampling points right of αopt.
Figure 17. Univariate unimodal function (a). Univariate monotone function (b).
Remarks

minimization of a univariate unimodal function: interval reduction with f(L) > f(αt) < f(R), αt ∈ [L, R]; 3 sampling points

root finding for a univariate monotone function: interval reduction with sign(f(L)) = −sign(f(R)) on [L, R]; 2 sampling points

∆^(κ) = (1/2)^κ · ∆^(0)    (70)

κ = ⌈ −(1/log 2) · log(∆^(0)/∆^(κ)) ⌉ ≈ ⌈ 3.32 · log(∆^(0)/∆^(κ)) ⌉    (71)
3.2 Multivariate unconstrained optimization without derivatives
3.2.1 Coordinate search
optimize one parameter at a time, alternating through the coordinates

e_j = [0 ... 0 1 0 ... 0]^T  (1 at the j-th position)

REPEAT
  FOR j = 1, ..., n_x
    αt := argmin_α f(x + α · e_j)
    x := x + αt · e_j
UNTIL convergence or maximum allowed iterations reached

good for "loosely coupled", "uncorrelated" parameters, e.g., ∇²f(x) a diagonal matrix:
Figure 18. Coordinate search with exact line search for a quadratic objective function with diagonal Hessian matrix: the first run of the REPEAT loop already reaches x*.
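A Python sketch of coordinate search (not from the compendium); SciPy's scalar minimizer stands in for the exact line search along e_j:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_search(f, x0, n_sweeps=50):
    """One-parameter-at-a-time minimization, alternating coordinates."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_sweeps):
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = 1.0                                  # unit vector e_j
            alpha = minimize_scalar(lambda a: f(x + a * e)).x
            x = x + alpha * e                           # x := x + alpha_t e_j
    return x

# quadratic objective with diagonal Hessian: one sweep already suffices
f = lambda x: 2.0 * x[0]**2 + 0.5 * x[1]**2
print(coordinate_search(f, [3.0, -2.0], n_sweeps=1))    # ~[0, 0]
```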
Figure 19. Coordinate search with exact line search for a quadratic objective function with general Hessian (a): the first and second runs of the REPEAT loop are shown. Steepest-descent method for the same quadratic objective function (b).
3.2.2 Polytope method (Nelder-Mead simplex method)
Figure 20. Simplex of vertices x1, x2, x3 in the parameter space over contours f = const, and the corresponding objective values (sketch).

An n_x-simplex is the convex hull of a set of parameter vector vertices x_i, i = 1, ..., n_x+1. The vertices are ordered such that f_i = f(x_i), f_1 ≤ f_2 ≤ ... ≤ f_{n_x+1}.
Cases of an iteration step

Each iteration step starts with a reflection, giving xr with fr = f(xr):

• fr < f1: expansion, giving xe with fe = f(xe); if fe < fr then x_{n_x+1} := xe, else x_{n_x+1} := xr
• f1 ≤ fr < f_{n_x}: insert xr, delete x_{n_x+1}, reorder
• f_{n_x} ≤ fr < f_{n_x+1}: outer contraction, giving xc with fc = f(xc); if fc ≤ fr then x_{n_x+1} := xc, else reduction to x1, v2, ..., v_{n_x+1}
• f_{n_x+1} ≤ fr: inner contraction, giving xcc with fcc = f(xcc); if fcc < f_{n_x+1} then x_{n_x+1} := xcc, else reduction to x1, v2, ..., v_{n_x+1}
Reflection

Figure 21. Reflection of x3 through the centroid x0 to xr (sketch).

x0 = (1/n_x) · Σ_{i=1}^{n_x} x_i    (72)

xr = x0 + ρ · (x0 − x_{n_x+1})    (73)

ρ: reflection coefficient, ρ > 0 (default: ρ = 1)

Expansion

Figure 22. Expansion beyond xr to xe (sketch).

xe = x0 + χ · (xr − x0)    (74)

χ: expansion coefficient, χ > 1, χ > ρ (default: χ = 2)

Outer contraction

Figure 23. Outer contraction to xc between x0 and xr (sketch).

xc = x0 + γ · (xr − x0)    (75)

γ: contraction coefficient, 0 < γ < 1 (default: γ = 1/2)

Inner contraction

Figure 24. Inner contraction to xcc between x0 and x3 (sketch).

xcc = x0 − γ · (x0 − x_{n_x+1})    (76)

Reduction (shrink)

Figure 25. Reduction of the simplex towards the best vertex x1, giving v2, v3 (sketch).

v_i = x1 + σ · (x_i − x1), i = 2, ..., n_x+1    (77)

σ: reduction coefficient, 0 < σ < 1 (default: σ = 1/2)
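In practice the polytope method is rarely re-implemented; SciPy's Nelder-Mead implementation uses the standard coefficients listed above as defaults. A minimal usage sketch (not from the compendium) on Rosenbrock's function from Figure 2:

```python
import numpy as np
from scipy.optimize import minimize

# derivative-free minimization of Rosenbrock's function
f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

res = minimize(f, x0=np.array([-1.0, 0.8]), method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-8, "maxiter": 5000})
print(res.x)   # ~[1, 1]
```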
3.3 Multivariate unconstrained optimization with derivatives
3.3.1 Steepest descent
(Sec. 1.8.1) search direction: negative of the gradient at the current point in the parameter space (i.e., based on a linear model of the objective function)
3.3.2 Newton approach
quadratic model of the objective function:

f(x^(κ) + r) ≈ m(x^(κ) + r) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r    (78)

g^(κ) = ∇f(x^(κ)), H^(κ) = ∇²f(x^(κ))    (79)

minimize m(r)

First-order optimality condition

∇m(r) = 0 → H^(κ) · r = −g^(κ) → r^(κ)    (80)

r^(κ): search direction for the line search; r^(κ) is obtained by solving a system of linear equations

Second-order optimality condition

∇²m(r) positive (semi)definite; if not:

• steepest descent: r^(κ) = −g^(κ)
• switch the signs of negative eigenvalues
• Levenberg-Marquardt approach: (H^(κ) + λ · I) · r = −g^(κ)  (λ → ∞: steepest-descent approach)
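A sketch of computing the Newton search direction from (80), falling back to the Levenberg-Marquardt regularization mentioned above when H^(κ) is not positive definite (assuming NumPy; the size of the shift is an illustrative choice):

```python
import numpy as np

def newton_direction(g, H, tau=1e-3):
    """Solve H r = -g (80); if H is not positive definite, shift its
    spectrum so that (H + lam*I) r = -g has a descent solution."""
    try:
        np.linalg.cholesky(H)              # succeeds iff H is positive definite
        return np.linalg.solve(H, -g)
    except np.linalg.LinAlgError:
        lam = -np.linalg.eigvalsh(H).min() + tau   # make H + lam*I pos. def.
        return np.linalg.solve(H + lam * np.eye(len(g)), -g)

H = np.array([[2.0, 0.0], [0.0, -1.0]])    # indefinite Hessian
g = np.array([1.0, 1.0])
print(newton_direction(g, H))
```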
3.3.3 Quasi-Newton approach
second derivative not available → successive approximation B ≈ H from gradients in the course of the optimization process, e.g., B^(0) = I

∇f(x^(κ+1)) = ∇f(x^(κ) + α^(κ) · r^(κ)) = g^(κ+1)    (81)

approximate g^(κ+1) from the quadratic model (78):

g^(κ+1) ≈ g^(κ) + H^(κ) · α^(κ) · r^(κ)    (82)

α^(κ) · r^(κ) = x^(κ+1) − x^(κ)    (83)

Quasi-Newton condition for the approximation B:

y^(κ) = B^(κ+1) · s^(κ), with y^(κ) = g^(κ+1) − g^(κ) and s^(κ) = x^(κ+1) − x^(κ)    (84)
3.3.3.1 Symmetric rank-1 update (SR1)
approach:

B^(κ+1) = B^(κ) + u · v^T    (85)

substituting (85) in (84):

(B^(κ) + u · v^T) · s^(κ) = y^(κ)
u · v^T · s^(κ) = y^(κ) − B^(κ) · s^(κ)
u = (y^(κ) − B^(κ) · s^(κ)) / (v^T · s^(κ))    (86)

substituting (86) in (85):

B^(κ+1) = B^(κ) + ((y^(κ) − B^(κ) · s^(κ)) · v^T) / (v^T · s^(κ))    (87)

because of the symmetry of B:

B^(κ+1) = B^(κ) + ((y^(κ) − B^(κ) · s^(κ)) · (y^(κ) − B^(κ) · s^(κ))^T) / ((y^(κ) − B^(κ) · s^(κ))^T · s^(κ))    (88)

if y^(κ) − B^(κ) · s^(κ) = 0, then B^(κ+1) = B^(κ)

SR1 does not guarantee positive definiteness of B; alternatives: Davidon-Fletcher-Powell (DFP), Broyden-Fletcher-Goldfarb-Shanno (BFGS)

Variants approximate B^(−1) instead of B, or a decomposition of the system matrix of the linear equation system (80), to compute the search direction r^(κ).

If s^(κ), κ = 1, ..., n_x, are linearly independent and f is quadratic with Hessian H, then Quasi-Newton with SR1 terminates after no more than n_x + 1 steps with B^(κ+1) = H.
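A sketch of the SR1 update (88) in Python (assuming NumPy), including the safeguard for a vanishing denominator; on a quadratic f it reconstructs H from n_x independent steps, as stated above:

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-1 update (88): B + (y-Bs)(y-Bs)^T / ((y-Bs)^T s);
    skipped when the denominator is (near) zero, so B^(k+1) = B^(k)."""
    v = y - B @ s
    denom = v @ s
    if abs(denom) < tol * np.linalg.norm(v) * np.linalg.norm(s):
        return B
    return B + np.outer(v, v) / denom

# f quadratic with Hessian H: for quadratic f, y = H s exactly (82)
H = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)                              # B^(0) = I
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    B = sr1_update(B, s, H @ s)
print(B)                                   # ~H
```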
3.3.4 Levenberg-Marquardt approach (Newton direction plus trust region)
min m(r) s.t. ‖r‖² = r^T · r ≤ ∆    (89)

constraint: trust region of the model m(r), e.g., concerning positive definiteness of B^(κ), H^(κ)

corresponding Lagrange function:

L(r, λ) = f^(κ) + g^(κ)T · r + (1/2) · r^T · H^(κ) · r − λ · (∆ − r^T · r)    (90)

stationary point ∇L(r) = 0:

(H^(κ) + 2 · λ · I) · r = −g^(κ)    (91)

compare the Newton approach for H^(κ) indefinite
3.3.5 Least-squares (plus trust-region) approach
min_r ‖f(r) − f_target‖² s.t. ‖r‖² ≤ ∆    (92)

linear model:

f(r) ≈ f^(κ)(r) = f0^(κ) + S^(κ) · r, with f0^(κ) = f(x^(κ)) and S^(κ) = ∇f(x^(κ)T) (sensitivity matrix)    (93)

∆: trust region of the linear model

least-squares difference of the linearized objective value to the target value (index (κ) left out):

‖ε(r)‖² = ‖f(r) − f_target‖²    (94)
  = ε^T(r) · ε(r) = (ε0 + S·r)^T · (ε0 + S·r), with ε0 = f0 − f_target    (95)
  = ε0^T·ε0 + 2·r^T·S^T·ε0 + r^T·S^T·S·r    (96)

S^T·S: part of the second derivative of ‖ε(r)‖²

min_r ‖ε(r)‖² s.t. ‖r‖² ≤ ∆    (97)

corresponding Lagrange function:

L(r, λ) = ε0^T·ε0 + 2·r^T·S^T·ε0 + r^T·S^T·S·r − λ·(∆ − r^T·r)    (98)

stationary point ∇L(r) = 0:

2·S^T·ε0 + 2·S^T·S·r + 2·λ·r = 0    (99)

(S^T·S + λ·I) · r = −S^T·ε0    (100)

λ = 0: Gauss-Newton method

(100) yields r^(κ) for a given λ

determine the Pareto front ‖ε^(κ)(r^(κ))‖ vs. ‖r^(κ)‖ for 0 ≤ λ < ∞ (Figs. 26, 27)

select r^(κ) with small step length and small error (iterative process): x^(κ+1) = x^(κ) + r^(κ)
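A sketch of one least-squares step (100) for a given λ, sweeping λ to trace the characteristic boundary curve of Fig. 26 (assuming NumPy; the matrix S and ε0 are illustrative):

```python
import numpy as np

def lm_step(S, eps0, lam):
    """Solve (S^T S + lam I) r = -S^T eps0, eq. (100). lam = 0 is the
    Gauss-Newton step; lam -> inf shortens the step towards the
    steepest-descent direction."""
    return np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), -S.T @ eps0)

S = np.array([[1.0, 0.0], [1.0, 1e-3]])    # ill-conditioned sensitivity matrix
eps0 = np.array([1.0, -1.0])
for lam in (0.0, 1e-4, 1e-2, 1.0):         # sweep the Pareto front (Fig. 26)
    r = lm_step(S, eps0, lam)
    print(lam, np.linalg.norm(r), np.linalg.norm(eps0 + S @ r))
```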
Figure 26. Characteristic boundary curve ‖ε^(κ)(r^(κ))‖ vs. ‖r^(κ)‖ = ∆, traced from λ → ∞ (small step length) to λ = 0 (small error); typical sharp bend for ill-conditioned problems. A small step length is wanted, e.g., due to the limited trust region of the linear model.

Figure 27. Characteristic boundary curve in a two-dimensional parameter space: r^(κ)(∆) traces from λ → ∞ to λ = 0 across contours ‖ε^(κ)(r)‖ = const and circles ‖r‖ = ∆ around x^(κ).
3.3.6 Conjugate-gradient (CG) approach
3.3.6.1 Optimization problem with a quadratic objective function
fq(x) = b^T · x + (1/2) · x^T · H · x    (101)

min fq(x)    (102)

n_x large, H sparse

the first-order optimality condition leads to a linear equation system for the stationary point x*:

g(x) = ∇fq(x) = 0  →(101)→  g(x) = b + H · x = 0    (103)

H · x = −b → x*    (104)

x* is computed by solving the linear equation system, not by matrix inversion

second-order condition:

∇²fq(x*) = H positive definite    (105)
3.3.6.2 Eigenvalue decomposition of a symmetric positive definite matrix H
H = U · D^(1/2) · D^(1/2) · U^T    (106)

D^(1/2): diagonal matrix of the square roots of the eigenvalues
U: columns are orthonormal eigenvectors, i.e.,

U^(−1) = U^T, U · U^T = I    (107)

substituting (106) in (101):

fq(x) = b^T · x + (1/2) · (D^(1/2) · U^T · x)^T · (D^(1/2) · U^T · x)    (108)

coordinate transformation:

x' = D^(1/2) · U^T · x,  x = U · D^(−1/2) · x'    (109)

substituting (109) in (108):

f'q(x') = b'^T · x' + (1/2) · x'^T · x'    (110)

b' = D^(−1/2) · U^T · b    (111)

the Hessian of the transformed quadratic function (110) is the unity matrix, H' = I

gradient of f'q:

g' = ∇f'q(x') = b' + x'    (112)

transformation of the gradient from (103), (106), (109), and (111):

∇f(x) = b + H · x = U · D^(1/2) · b' + U · D^(1/2) · x'    (113)

from (112):

g = ∇f(x) = U · D^(1/2) · ∇f'q(x') = U · D^(1/2) · g'    (114)

min fq(x) ≡ min f'q(x')    (115)

solution in transformed coordinates (g' = 0):

x'* = −b'    (116)
3.3.6.3 Conjugate directions
∀ i≠j: r^(i)T · H · r^(j) = 0  ("H-orthogonal")    (117)

from (106):

⟺ ∀ i≠j: (D^(1/2) · U^T · r^(i))^T · (D^(1/2) · U^T · r^(j)) = 0
⟺ ∀ i≠j: r^(i)'T · r^(j)' = 0    (118)

i.e., transformed conjugate directions are orthogonal, with

r^(i)' = D^(1/2) · U^T · r^(i)    (119)
3.3.6.4 Step length
substitute x = x^(κ) + α · r^(κ) in (101):

fq(α) = b^T · (x^(κ) + α · r^(κ)) + (1/2) · (x^(κ) + α · r^(κ))^T · H · (x^(κ) + α · r^(κ))    (120)

∇fq(α) = b^T · r^(κ) + (x^(κ) + α · r^(κ))^T · H · r^(κ)    (121)
  =(103)=  g^(κ)T · r^(κ) + α · r^(κ)T · H · r^(κ), with g^(κ) = ∇fq(x^(κ))    (122)

∇fq(α) = 0:

α^(κ) = −(g^(κ)T · r^(κ)) / (r^(κ)T · H · r^(κ))    (123)
3.3.6.5 New iterative solution
x^(κ+1) = x^(κ) + α^(κ) · r^(κ) = x^(0) + Σ_{i=0}^{κ} α^(i) · r^(i), with α^(κ) from (123) and r^(κ) conjugate per (117)    (124)

r^(0) = −g^(0), with g^(0) from (103)    (125)

new search direction: a combination of the current gradient and the previous search direction:

r^(κ+1) = −g^(κ+1) + β^(κ+1) · r^(κ)    (126)

substituting (126) into (117):

−g^(κ+1)T · H · r^(κ) + β^(κ+1) · r^(κ)T · H · r^(κ) = 0    (127)

β^(κ+1) = (g^(κ+1)T · H · r^(κ)) / (r^(κ)T · H · r^(κ))    (128)
Figure 28. First two CG steps in the original coordinates: starting from x^(0) with r^(0) = −g^(0), the second direction r^(1) satisfies r^(0)T · H · r^(1) = 0 (and differs from −g^(1)); it points to x*.

Figure 29. The same steps in the transformed coordinates: from x^(0)' with r^(0)' = −g^(0)', the second direction r^(1)' satisfies r^(0)'T · r^(1)' = 0, i.e., the directions are orthogonal.
3.3.6.6 Some properties
g^(κ+1) = b + H · x^(κ+1) = b + H · x^(κ) + α^(κ) · H · r^(κ) = g^(κ) + α^(κ) · H · r^(κ)    (129)

g^(κ+1)T · r^(κ) =(129)= g^(κ)T · r^(κ) + α^(κ) · r^(κ)T · H · r^(κ) =(123)= g^(κ)T · r^(κ) − g^(κ)T · r^(κ) = 0    (130)

i.e., the current gradient is orthogonal to the previous search direction

−g^(κ)T · r^(κ) =(126)= −g^(κ)T · (−g^(κ) + β^(κ) · r^(κ−1)) = g^(κ)T · g^(κ) − β^(κ) · g^(κ)T · r^(κ−1) = g^(κ)T · g^(κ)  (the last term is 0 by (130))    (131)

i.e., a conjugate direction is a descent direction

g^(κ+1)T · g^(κ) =(129)= g^(κ)T · g^(κ) + α^(κ) · r^(κ)T · H · g^(κ)
  =(131),(126)= −g^(κ)T · r^(κ) + α^(κ) · r^(κ)T · H · (−r^(κ) + β^(κ) · r^(κ−1))
  =(117)= −g^(κ)T · r^(κ) − α^(κ) · r^(κ)T · H · r^(κ)
  =(123)= −g^(κ)T · r^(κ) + g^(κ)T · r^(κ) = 0    (132)

i.e., the current gradient is orthogonal to the previous gradient

substituting (126) into (130):

g^(κ+1)T · (−g^(κ) + β^(κ) · r^(κ−1)) = 0  ⟺(132)  g^(κ+1)T · r^(κ−1) = 0    (133)

i.e., the current gradient is orthogonal to all previous search directions

substituting (129) into (132):

g^(κ+1)T · (g^(κ−1) + α^(κ−1) · H · r^(κ−1)) = 0
⟺(126)  g^(κ+1)T · g^(κ−1) + α^(κ−1) · (−r^(κ+1) + β^(κ+1) · r^(κ))^T · H · r^(κ−1) = 0
⟺(117)  g^(κ+1)T · g^(κ−1) = 0    (134)

i.e., the current gradient is orthogonal to all previous gradients
3.3.6.7 Simplified computation of CG step length
substituting (131) into (123):

α^(κ) = (g^(κ)T · g^(κ)) / (r^(κ)T · H · r^(κ))    (135)

3.3.6.8 Simplified computation of CG search direction

substituting H · r^(κ) from (129) into (128):

β^(κ+1) = (g^(κ+1)T · (1/α^(κ)) · (g^(κ+1) − g^(κ))) / (r^(κ)T · H · r^(κ))    (136)

from (135) and (132):

β^(κ+1) = (g^(κ+1)T · g^(κ+1)) / (g^(κ)T · g^(κ))    (137)
3.3.6.9 CG algorithm
solves the optimization problem (102), i.e., the linear equation system (103), in at most n_x steps

g^(0) = b + H · x^(0)
r^(0) = −g^(0)
κ = 0
WHILE r^(κ) ≠ 0
  step length: α^(κ) = (g^(κ)T · g^(κ)) / (r^(κ)T · H · r^(κ))   ⇒ line search in nonlinear programming
  new point: x^(κ+1) = x^(κ) + α^(κ) · r^(κ)
  new gradient: g^(κ+1) = g^(κ) + α^(κ) · H · r^(κ)   ⇒ gradient computation in nonlinear programming
  Fletcher-Reeves: β^(κ+1) = (g^(κ+1)T · g^(κ+1)) / (g^(κ)T · g^(κ))
  new search direction: r^(κ+1) = −g^(κ+1) + β^(κ+1) · r^(κ)
  κ := κ + 1
END WHILE

Polak-Ribière (alternative to Fletcher-Reeves):

β^(κ+1) = (g^(κ+1)T · (g^(κ+1) − g^(κ))) / (g^(κ)T · g^(κ))    (138)
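A direct transcription of the CG algorithm into Python (assuming NumPy; a sketch with a tolerance instead of the exact test r^(κ) ≠ 0):

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10):
    """Minimize fq(x) = b^T x + 0.5 x^T H x, i.e., solve H x = -b (103),
    for symmetric positive definite H; at most nx iterations."""
    x = np.asarray(x0, dtype=float)
    g = b + H @ x                           # g^(0)
    r = -g                                  # r^(0)
    while np.linalg.norm(r) > tol:
        Hr = H @ r
        alpha = (g @ g) / (r @ Hr)          # step length (135)
        x = x + alpha * r                   # new point
        g_new = g + alpha * Hr              # new gradient (129)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves (137)
        r = -g_new + beta * r               # new search direction (126)
        g = g_new
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
x = conjugate_gradient(H, b, np.zeros(2))
print(x, H @ x + b)                         # residual ~0
```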
4 Constrained optimization – problem formulations
4.1 Quadratic Programming (QP)
4.1.1 QP – linear equality constraints
min fq(x) s.t. A · x = c    (139)

with fq(x) = b^T · x + (1/2) · x^T · H · x
H: symmetric, positive definite
A ∈ R^(nc×nx), nc ≤ nx, rank(A) = nc

A · x = c    (140)

is an underdetermined linear equation system with full rank (Appendix E).
4.1.1.1 Transformation of (139) into an unconstrained optimization problem by coordinate transformation

approach:

x = [Y Z] · [c; y] = Y · c + Z · y    (141)

with Y ∈ R^(nx×nc), Z ∈ R^(nx×(nx−nc)), [Y Z] non-singular; Y · c is a feasible point of (140), Z · y represents the degrees of freedom in (140)

A · x = A · Y · c + A · Z · y = c    (142)

A · Y = I ∈ R^(nc×nc)    (143)

A · Z = 0 ∈ R^(nc×(nx−nc))    (144)

φ(y) = fq(x(y))    (145)
  = b^T · (Y·c + Z·y) + (1/2) · (Y·c + Z·y)^T · H · (Y·c + Z·y)    (146)
  = (b + (1/2)·H·Y·c)^T · Y·c + (b + H·Y·c)^T · Z·y + (1/2) · y^T · Z^T·H·Z · y    (147)

min fq(x) s.t. A·x = c  ≡  min φ(y)    (148)

y*: solution of the unconstrained optimization problem (148); Z·y* lies in the kernel of A

stationary point of optimization problem (148), ∇φ(y) = 0:

(Z^T·H·Z) · y = −Z^T · (b + H·Y·c)  →  y*  →  x* = Y·c + Z·y*    (149)

Z^T·H·Z: reduced Hessian; Z^T·(b + H·Y·c): reduced gradient

Computation of Y, Z by QR-decomposition:

from (143) and (483): R^T·Q^T·Y = I → Y = Q·R^(−T)    (150)

from (144) and (483): R^T·Q^T·Z = 0 → Z = Q⊥    (151)
Figure 30. Feasible space A · x = c in the parameter space: Y · c is the "minimum length solution", i.e., orthogonal to the plane A · x = c; z_i, i = 1, ..., nx − nc, form an orthonormal basis of the kernel (null-space) of A, i.e., A · (Z · y) = 0.
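A sketch of the null-space method with Y, Z obtained from a QR decomposition of A^T, following (149)-(151) and (154) (assuming NumPy; the small QP is illustrative):

```python
import numpy as np

def equality_qp(H, b, A, c):
    """min b^T x + 0.5 x^T H x s.t. A x = c, via the coordinate
    transformation x = Y c + Z y with A Y = I, A Z = 0."""
    nc = A.shape[0]
    Q, R = np.linalg.qr(A.T, mode="complete")    # A^T = Q[:, :nc] R[:nc, :]
    Y = Q[:, :nc] @ np.linalg.inv(R[:nc, :].T)   # (150): Y = Q R^-T
    Z = Q[:, nc:]                                # (151): orthonormal ker(A)
    Yc = Y @ c                                   # feasible point of A x = c
    # reduced system (149): (Z^T H Z) y = -Z^T (b + H Y c)
    y = np.linalg.solve(Z.T @ H @ Z, -Z.T @ (b + H @ Yc))
    x = Yc + Z @ y
    lam = Y.T @ (b + H @ x)                      # Lagrange multipliers (154)
    return x, lam

H = np.eye(2); b = np.array([-2.0, 0.0])
A = np.array([[1.0, 1.0]]); c = np.array([1.0])
print(equality_qp(H, b, A, c))                   # x = [1.5, -0.5]
```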
4.1.1.3 Computation of the Lagrange factor λ*

required for QP with inequality constraints

Lagrange function of problem (139):

L(x, λ) = fq(x) − λ^T · (A·x − c)    (152)

∇L(x) = 0:  A^T·λ = ∇fq(x) (overdetermined)

→ λ* =(143)= Y^T · ∇fq(x*)    (153)

λ* = Y^T · (b + H·x*)    (154)

or, in case of a QR-decomposition:

A^T·λ = b + H·x*    (155)

from (483):

Q·R·λ = b + H·x*    (156)

R·λ = Q^T·(b + H·x*)  →  backward substitution  →  λ*    (157)
4.1.1.4 Computation of Y · c in case of a QR-decomposition
A·x = c  ⟺(483)  R^T · (Q^T·x) = c, with u = Q^T·x

R^T·u = c  →  forward substitution  →  u

Y·c =(150)= Q·R^(−T)·c = Q·u (insert u)    (158)

4.1.1.5 Transformation of (139) into an unconstrained optimization problem by Lagrange multipliers
first-order optimality conditions of (139) according to the Lagrange function (152):

∇L(x) = 0:  b + H·x − A^T·λ = 0,  A·x − c = 0    (159)

⟺  [H  −A^T; −A  0] · [x; λ] = [−b; −c]  →  x*, λ*    (160)

[H  −A^T; −A  0]: Lagrange matrix L, symmetric; positive definite iff Z^T·H·Z is positive definite

e.g.,

L^(−1) = [K  −T; −T^T  U]    (161)

K = Z · (Z^T·H·Z)^(−1) · Z^T    (162)

T = Y − K·H·Y    (163)

U = Y^T·H·K·H·Y − Y^T·H·Y    (164)

factorization of L^(−1) by factorization of Z^T·H·Z and computation of Y, Z
4.1.2 QP - inequality constraints
min fq(x) s.t. a_i^T·x = c_i, i ∈ E;  a_j^T·x ≥ c_j, j ∈ I    (165)

with fq(x) = b^T·x + (1/2)·x^T·H·x
H: symmetric, positive definite
A ∈ R^(nc×nx), nc ≤ nx, rank(A) = nc

A collects the constraint vectors row-wise:

A = [ ...; a_i^T, i ∈ E; ...; a_j^T, j ∈ I; ... ]
compute an initial solution x^(0) that satisfies all constraints, e.g., by linear programming

κ := 0
REPEAT
  /* problem formulation at the current iteration point x^(κ) + δ for the active constraints A^(κ) (active-set method) */
  min fq(δ) s.t. a_i^T·δ = 0, i ∈ A^(κ)  →  δ^(κ)

  case I: δ^(κ) = 0 /* i.e., x^(κ) optimal for the current A^(κ) */
    compute λ^(κ), e.g., by (154) or (157)
    case Ia: ∀ i ∈ A^(κ)∩I: λ_i^(κ) ≥ 0 /* i.e., first-order optimality conditions satisfied */
      x* = x^(κ); STOP
    case Ib: ∃ i ∈ A^(κ)∩I: λ_i^(κ) < 0 /* i.e., a constraint becomes inactive, further improvement of f possible */
      q = argmin_{i ∈ A^(κ)∩I} λ_i^(κ) /* i.e., q is the constraint most sensitive to becoming inactive */
      A^(κ+1) = A^(κ) \ {q}
      x^(κ+1) = x^(κ)

  case II: δ^(κ) ≠ 0 /* i.e., f can be reduced for the current A^(κ); step-length computation: if no new constraint becomes active, then α^(κ) = 1 (case IIa); else find the inequality constraint that becomes active first and the corresponding α^(κ) < 1 (case IIb) */
    α^(κ) = min( 1, min_{i ∉ A^(κ), a_i^T·δ^(κ) < 0} (c_i − a_i^T·x^(κ)) / (a_i^T·δ^(κ)) )
    /* the inner minimum is over the inactive constraints that approach their bound; c_i − a_i^T·x^(κ) is the safety margin to the bound, a_i^T·δ^(κ) the safety margin consumed at α^(κ) = 1 */
    q' = argmin (–"–)
    case IIa: α^(κ) = 1
      A^(κ+1) = A^(κ)
    case IIb: α^(κ) < 1
      A^(κ+1) = A^(κ) ∪ {q'}
    x^(κ+1) = x^(κ) + α^(κ) · δ^(κ)

  κ := κ + 1
4.1.3 Example
Figure 31. Example of the active-set method: iterates x^(1) = x^(2), x^(3) = x^(4), x^(5), x^(6) with steps δ^(2), δ^(4), δ^(5) on contours fq = const, subject to constraints (A), (B), (C).

(κ)   A^(κ)    case
(1)   {A, B}   Ib
(2)   {B}      IIa
(3)   {B}      Ib
(4)   {}       IIb
(5)   {C}      IIa
(6)   {C}      Ia

Concerning q = argmin_{i∈A^(1)∩I} λ_i^(κ) for κ = 1:

Figure 32. Gradients ∇A, ∇B, and ∇fq at x^(1).

∇fq = λ_A·∇A + λ_B·∇B  →  λ_A < 0, λ_B = 0
4.2 Sequential Quadratic Programming (SQP), Lagrange–Newton
4.2.1 SQP – equality constraints
min f(x) s.t. c(x) = 0, with c ∈ R^(nc), nc ≤ nx    (166)

4.2.1.1 Lagrange function of (166)

L(x, λ) = f(x) − λ^T·c(x) = f(x) − Σ_i λ_i·c_i(x)    (167)

∇L(x, λ) = [∇L(x); ∇L(λ)] = [∇f(x) − Σ_i λ_i·∇c_i(x); −c(x)] = [g(x) − A^T(x)·λ; −c(x)],
with A^T(x) = [... ∇c_i(x) ...]    (168)

∇²L(x, λ) = [∇²L(x), ∂²L/(∂x·∂λ^T); ∂²L/(∂λ·∂x^T), ∇²L(λ)]
  = [∇²f(x) − Σ_i λ_i·∇²c_i(x), −A^T(x); −A(x), 0]    (169)
4.2.1.2 Newton approach to (167) in κ-th iteration step
∇L(x^(κ) + δ^(κ), λ^(κ) + ∆λ^(κ)) ≈ ∇L(x^(κ), λ^(κ)) + ∇²L(x^(κ), λ^(κ)) · [δ^(κ); ∆λ^(κ)] = 0    (170)

with x^(κ+1) = x^(κ) + δ^(κ), λ^(κ+1) = λ^(κ) + ∆λ^(κ), and the abbreviations ∇L^(κ), ∇²L^(κ)

with (168) and (169):

[W^(κ)  −A^(κ)T; −A^(κ)  0] · [δ^(κ); ∆λ^(κ)] = [−g^(κ) + A^(κ)T·λ^(κ); c^(κ)],
with W^(κ) = ∇²f(x^(κ)) − Σ_i λ_i^(κ)·∇²c_i(x^(κ))    (171)

[W^(κ)  −A^(κ)T; −A^(κ)  0] · [δ^(κ); λ^(κ+1)] = [−g^(κ); c^(κ)]    (172)

(172) represents the first-order optimality conditions of the Lagrange function

L(δ, λ) = g^(κ)T·δ + (1/2)·δ^T·W^(κ)·δ − λ^T·(A^(κ)·δ + c^(κ))    (173)

i.e., of the QP problem

min g^(κ)T·δ + (1/2)·δ^T·W^(κ)·δ s.t. A^(κ)·δ = −c^(κ)    (174)

QP problem with W^(κ) according to (171), including the quadratic parts of objective and constraints, and with the linearized equality constraints from (166)

• Quasi-Newton approach possible
• inequality constraints, A^(κ)·δ ≥ −c_I^(κ), treated as in QP
4.2.2 Penalty function
transformation into a penalty function of an unconstrained optimization problem with penalty parameter µ > 0

quadratic:

P_quad(x, µ) = f(x) + (1/(2µ)) · c^T(x) · c(x)    (175)

logarithmic:

P_log(x, µ) = f(x) − µ · Σ_i log c_i(x)    (176)

l1 exact:

P_l1(x, µ) = µ · f(x) + Σ_{i∈E} |c_i(x)| + Σ_{i∈I} max(−c_i(x), 0)    (177)
5 Statistical parameter tolerances
modeling of manufacturing variations through a multivariate continuous distribution function of the statistical parameters x_s

cumulative distribution function (cdf):

cdf(x_s) = ∫_{−∞}^{x_s,1} ... ∫_{−∞}^{x_s,nxs} pdf(t) · dt    (178)

dt = dt_1 · dt_2 · ... · dt_nxs

(discrete: cumulative relative frequencies)

probability density function (pdf):

pdf(x_s) = ∂^nxs cdf(x_s) / (∂x_s,1 ... ∂x_s,nxs)    (179)

(discrete: relative frequencies)

x_s,i denotes the random number value of a random variable X_s,i
5.1 Univariate Gaussian distribution (normal distribution)
x_s ~ N(x_s,0, σ²)    (180)

x_s,0: mean value
σ²: variance
σ: standard deviation

probability density function of the univariate normal distribution:

pdf_N(x_s, x_s,0, σ²) = 1/(√(2π) · σ) · e^(−(1/2)·((x_s − x_s,0)/σ)²)    (181)

Figure 33. Probability density function, pdf_N, and corresponding cdf_N of a univariate Gaussian distribution over x_s − x_s,0 (axis marked from −3σ to 3σ). The area of the shaded region under the pdf is the value of the cdf as shown.

Standard normal distribution table:

x_s − x_s,0:       −3σ    −2σ    −σ     0     σ      2σ     3σ     4σ
cdf(x_s − x_s,0):  0.1%   2.2%   15.8%  50%   84.1%  97.7%  99.8%  99.99%
5.2 Multivariate normal distribution
x_s ~ N(x_s,0, C)    (182)

x_s,0: vector of mean values of the statistical parameters x_s
C: covariance matrix of the statistical parameters x_s, symmetric, positive definite

probability density function of the multivariate normal distribution:

pdf_N(x_s, x_s,0, C) = 1/(√(2π)^nxs · √det(C)) · e^(−(1/2)·β²(x_s, x_s,0, C))    (183)

β²(x_s, x_s,0, C) = (x_s − x_s,0)^T · C^(−1) · (x_s − x_s,0)    (184)

C = Σ · R · Σ    (185)

Σ = diag(σ_1, ..., σ_nxs)

R =
[ 1          ρ_1,2     ...  ρ_1,nxs ]
[ ρ_1,2      1         ...  ρ_2,nxs ]
[ ...                  ...  ...     ]
[ ρ_1,nxs    ρ_2,nxs   ...  1       ]    (186)

C =
[ σ_1²             σ_1·ρ_1,2·σ_2    ...  σ_1·ρ_1,nxs·σ_nxs ]
[ σ_1·ρ_1,2·σ_2    σ_2²             ...  σ_2·ρ_2,nxs·σ_nxs ]
[ ...              ...              ...  ...               ]
[ σ_1·ρ_1,nxs·σ_nxs  ...            ...  σ_nxs²            ]    (187)

R: correlation matrix of the statistical parameters
σ_k: standard deviation of component x_s,k, σ_k > 0
σ_k²: variance of component x_s,k
σ_k·ρ_k,l·σ_l: covariance of components x_s,k, x_s,l
ρ_k,l: correlation coefficient of components x_s,k and x_s,l, −1 < ρ_k,l < 1
ρ_k,l = 0: uncorrelated, and also independent if jointly normal
|ρ_k,l| = 1: strongly correlated components
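A sketch (not from this section) of how samples of N(x_s,0, C) are commonly realized: a standard construction uses the Cholesky factor of C, here built from Σ and R as in (185) and checked empirically:

```python
import numpy as np

rng = np.random.default_rng(0)

# mean vector and covariance C = Sigma * R * Sigma from (185)
xs0   = np.array([1.0, 2.0])
Sigma = np.diag([0.1, 0.3])
R     = np.array([[1.0, 0.8], [0.8, 1.0]])
C     = Sigma @ R @ Sigma

# sample via the Cholesky factor C = L L^T: xs = xs0 + L u, u ~ N(0, I)
L = np.linalg.cholesky(C)
u = rng.standard_normal((2, 100000))
xs = xs0[:, None] + L @ u

print(np.cov(xs))          # ~C
print(np.corrcoef(xs))     # ~R
```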
Figure 34. Level sets β²(x_s) = const of a two-dimensional normal pdf around the mean x_s,0 = (x_s,0,1, x_s,0,2).
Figure 35. Level sets β²(x_s) = const of a two-dimensional normal pdf with general covariance matrix C (a), with uncorrelated components, R = I (b), and with uncorrelated components of equal spread, C = σ² · I (c).
Figure 36. Level set with β² = const = a² of a two-dimensional normal pdf for different values of the correlation coefficient ρ; the level ellipses are inscribed in a box with side lengths a·2·σ_1 and a·2·σ_2 (the case ρ = 0 is axis-aligned).
5.3 Transformation of statistical distributions
y ∈ R^ny, z ∈ R^nz, ny = nz

z = z(y), y = y(z), such that the mapping from y to z is smooth and bijective (precisely z = φ(y), y = φ^(−1)(z))

cdf_y(y) = ∫...∫_{−∞}^{y} pdf_y(y') · dy' = ∫...∫_{−∞}^{z(y)} pdf_y(y'(z')) · |det(∂y/∂z^T)| · dz' = ∫...∫_{−∞}^{z} pdf_z(z') · dz' = cdf_z(z)    (188)

∫...∫_{−∞}^{y} pdf_y(y') · dy' = ∫...∫_{−∞}^{z} pdf_z(z') · dz'    (189)

pdf_z(z) = pdf_y(y(z)) · |det(∂y/∂z^T)|    (190)

univariate case: pdf_z(z) = pdf_y(y(z)) · |∂y/∂z|    (191)

In the simple univariate case, the function pdf_z has a domain that is a scaled version of the domain of pdf_y; |∂y/∂z| determines the scaling factor. In higher-order cases, the random variable space is scaled and rotated, with the Jacobian matrix ∂y/∂z^T determining the scaling and rotation.

Figure 37. A univariate pdf of the random number y is transformed to a new pdf of the new random number z = z(y). According to (188), the shaded areas as well as the hatched areas under the two curves are equal.
5.3.1 Example
Given: probability density function pdf_U(z), here a uniform distribution:

pdf_U(z) = 1 for 0 < z < 1, 0 otherwise    (192)

probability density function pdf_y(y), y ∈ R

random number z

Find: random number y

boundary values: z = 0 ↔ y = −∞, z = 1 ↔ y = +∞

from (188):

∫_0^z pdf_z(z') · dz' = ∫_{−∞}^y pdf_y(y') · dy'    (193)

with pdf_z(z') = 1 for 0 ≤ z' ≤ 1 from (192), hence

z = ∫_{−∞}^y pdf_y(y') · dy' = cdf_y(y)    (194)

y = cdf_y^(−1)(z)    (195)

This example details a method to generate sample values of a random variable y with an arbitrary pdf, pdf_y, if sample values are available from a uniform distribution pdf_z:

• insert pdf_y(y) in (194)
• compute cdf_y by integration
• compute the inverse cdf_y^(−1)
• create a uniform random number z and insert it into (195) to get a sample value y according to pdf_y(y)
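A sketch of this recipe for a case where cdf_y^(−1) is known in closed form (an exponential distribution, chosen here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# target pdf_y: exponential with rate 1 on y >= 0, cdf_y(y) = 1 - exp(-y),
# so the inverse (195) is y = -log(1 - z)
z = rng.uniform(0.0, 1.0, size=100000)     # uniform samples, pdf_U from (192)
y = -np.log(1.0 - z)                       # y = cdf_y^{-1}(z)

print(y.mean(), y.var())                   # both ~1 for this exponential
```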
6 Expectation values and their estimators
6.1 Expectation values
6.1.1 Definitions
h(z): function of a random number z with probability density function pdf(z)

Expectation value

E_pdf(z){h(z)} = E{h(z)} = ∫...∫_{−∞}^{+∞} h(z) · pdf(z) · dz    (196)

Moment of order κ

m^(κ) = E{z^κ}    (197)

Mean value (first-order moment)

m^(1) = m = E{z}    (198)

m = E{z} = [E{z_1}, ..., E{z_nz}]^T    (199)

Central moment of order κ

c^(κ) = E{(z − m)^κ}, c^(1) = 0    (200)

Variance (second-order central moment)

c^(2) = E{(z − m)²} = σ² = V{z}    (201)

σ: standard deviation

Covariance

cov{z_i, z_j} = E{(z_i − m_i) · (z_j − m_j)}    (202)

Variance/covariance matrix

C = V{z} = E{(z − m) · (z − m)^T}
  =
[ V{z_1}         cov{z_1, z_2}  ...  cov{z_1, z_nz} ]
[ cov{z_2, z_1}  V{z_2}         ...  cov{z_2, z_nz} ]
[ ...                           ...  ...            ]
[ cov{z_nz, z_1} cov{z_nz, z_2} ...  V{z_nz}        ]    (203)

V{h(z)} = E{(h(z) − E{h(z)}) · (h(z) − E{h(z)})^T}    (204)
6.1.2 Linear transformation of expectation value
E{A·h(z) + b} = A · E{h(z)} + b    (205)

special cases:

E{c} = c, c a constant
E{c·h(z)} = c · E{h(z)}
E{h1(z) + h2(z)} = E{h1(z)} + E{h2(z)}

6.1.3 Linear transformation of variance

V{A·h(z) + b} = A · V{h(z)} · A^T    (206)

special cases:

V{a^T·h(z) + b} = a^T · V{h(z)} · a
V{a·h(z) + b} = a² · V{h(z)}

Gaussian error propagation:

V{a^T·z + b} = a^T·C·a = Σ_{i,j} a_i·a_j·σ_i·ρ_i,j·σ_j = Σ_i a_i²·σ_i²  (if ρ_i,j = 0 for all i ≠ j)

6.1.4 Translation law of variances

V{h(z)} = E{(h(z) − a) · (h(z) − a)^T} − (E{h(z)} − a) · (E{h(z)} − a)^T    (207)

special cases:

V{h(z)} = E{(h(z) − a)²} − (E{h(z)} − a)²
V{h(z)} = E{h(z)·h^T(z)} − E{h(z)} · E{h^T(z)}
V{h(z)} = E{h²(z)} − (E{h(z)})²

6.1.5 Normalizing a random variable

z' = (z − E{z}) / √V{z} = (z − m_z)/σ_z    (208)

E{z'} = (E{z} − m_z)/σ_z = 0    (209)

V{z'} = E{(z' − 0)²} = E{(z − m_z)²}/σ_z² = 1    (210)
6.1.6 Linear transformation of a normal distribution
x ~ N(x_0, C)

f(x) = f_a + g^T · (x − x_a)    (211)

mean value µ_f of f:

µ_f = E{f} = E{f_a + g^T·(x − x_a)} = E{f_a} + g^T · (E{x} − E{x_a})

µ_f = f_a + g^T · (x_0 − x_a)    (212)

variance σ_f² of f:

σ_f² = E{(f − µ_f)²} = E{(g^T·(x − x_0))²} = E{g^T·(x − x_0)·(x − x_0)^T·g} = g^T · E{(x − x_0)·(x − x_0)^T} · g

σ_f² = g^T · C · g    (213)
6.2 Estimation of expectation values
6.2.1 Expectation value estimator
Ê{h(x)} = m̂_h = (1/n_MC) · Σ_{µ=1}^{n_MC} h(x^(µ))    (214)

x^(µ) ~ D(pdf(x)), µ = 1, ..., n_MC

sample of the population with n_MC sample elements, i.e., sample size n_MC

sample elements x^(µ), µ = 1, ..., n_MC, are independently and identically distributed, i.e.,

E{h(x^(µ))} = E{h(x)} = m_h    (215)

V{h(x^(µ))} = V{h(x)},  cov{h(x^(µ)), h(x^(ν))} = 0 for µ ≠ ν    (216)

φ̂(x) = φ̂(x^(1), ..., x^(n_MC)): estimator function of φ(x)    (217)

6.2.2 Variance estimator

V̂{h(x)} = (1/(n_MC − 1)) · Σ_{µ=1}^{n_MC} (h(x^(µ)) − m̂_h) · (h(x^(µ)) − m̂_h)^T    (218)

x^(µ) ~ D(pdf(x)), µ = 1, ..., n_MC

with the exact mean m_h known:

V̂{h(x)} = (1/n_MC) · Σ_{µ=1}^{n_MC} (h(x^(µ)) − m_h) · (h(x^(µ)) − m_h)^T    (219)

estimator bias:

b_φ̂ = E{φ̂(x)} − φ(x)    (220)

unbiased estimator:

E{φ̂(x)} = φ(x) ⟺ b_φ̂ = 0    (221)

consistent estimator:

lim_{n_MC→∞} P{ ‖φ̂ − φ‖ < |ε| } = 1    (222)

strongly consistent: ε → 0    (223)

variance of an estimator (quality):

Q_φ̂ = E{(φ̂ − φ)·(φ̂ − φ)^T} = V{φ̂} + b_φ̂ · b_φ̂^T    (224)

b_φ̂ = 0:  Q_φ̂ = V{φ̂}    (225)
6.2.3 Variance of the expectation value estimator
Q_m̂h = V{m̂_h} = V{Ê{h(x)}} = V{ (1/n_MC) · Σ_{µ=1}^{n_MC} h(x^(µ)) }

=(206)= (1/n_MC²) · V{ Σ_{µ=1}^{n_MC} h^(µ) },  with h^(µ) = h(x^(µ))

Writing the sum as [I ... I] · [h^(1); ...; h^(n_MC)], with I the identity matrix of size n_h of h^(µ):

=(206)= (1/n_MC²) · [I ... I] · V{ [h^(1); ...; h^(n_MC)] } · [I; ...; I]

=(216)= (1/n_MC²) · [I ... I] · diag(V{h}, ..., V{h}) · [I; ...; I] = (1/n_MC²) · n_MC · V{h}

Q_m̂h = V{m̂_h} = (1/n_MC) · V{h}    (226)

Replacing Q_m̂h by Q̂_m̂h and V{h} by V̂{h} (and using (229) instead of (206)) gives the estimated variance of the expectation value estimator:

Q̂_m̂h = V̂{m̂_h} = (1/n_MC) · V̂{h}    (227)

The standard deviation of the mean estimator decreases with 1/√n_MC: e.g., 100 times more sample elements are needed for a 10 times smaller standard deviation of the expectation value estimator.
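A sketch of (214), (218), and (227) in Python, illustrating the 1/√n_MC behavior stated above (the function h and the distribution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_mean(h, sample):
    """Expectation value estimator (214) and the estimated standard
    deviation of that estimator via (218) and (227)."""
    values = h(sample)
    m_hat = values.mean()                        # (214)
    v_hat = values.var(ddof=1)                   # (218), 1/(n-1) normalization
    return m_hat, np.sqrt(v_hat / len(values))   # sqrt of (227)

h = lambda x: x**2                               # E{x^2} = 1 for x ~ N(0, 1)
for n in (100, 10000):                           # 100x the sample size gives
    m, s = mc_mean(h, rng.standard_normal(n))    # ~10x smaller std. deviation
    print(n, m, s)
```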
6.2.4 Linear transformation of estimated expectation value

Ê{A · h(z) + b} = A · Ê{h(z)} + b (228)

6.2.5 Linear transformation of estimated variance

V̂{A · h(z) + b} = A · V̂{h(z)} · A^T (229)

6.2.6 Translation law of estimated variance

V̂{h(z)} = (nMC/(nMC − 1)) · [Ê{h(z) · h^T(z)} − Ê{h(z)} · Ê{h^T(z)}] (230)
7 Worst-case analysis

7.1 Task

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

Given: tolerance region T of parameters

Find: worst-case performance value fW that the circuit takes over T and corresponding worst-case parameter vectors xW

optimization | specification        | "good" | "bad" | worst-case performance
max f        | lower bound: f ≥ fL  | f ↑    | f ↓   | fWL = f(xWL)
min f        | upper bound: f ≤ fU  | f ↓    | f ↑   | fWU = f(xWU)
7.2 Typical tolerance regions

box: TB = {x | xL ≤ x ≤ xU}

ellipsoid: TE = {x | β²(x) = (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²}, C symmetric, positive definite

Figure 38. Tolerance box TB (bounds xL,1, xU,1, xL,2, xU,2) and tolerance ellipsoid TE (center x0, level β = βW).
7.3 Classical worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.3.1 Task

Given: hyper-box tolerance region

TB = {x | xL ≤ x ≤ xU} (231)

linear performance model

f(x) = fa + g^T · (x − xa) (232)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 39. Classical worst-case analysis with tolerance box and linear performance model.
7.3.2 Linear performance model

sensitivity analysis:

fa = f(xa) (233)

g = ∇f(xa) (234)

forward finite-difference approximation:

fa = f(xa) (235)

∇f(xa)|i ≈ gi = (f(xa + ∆xi · ei) − f(xa)) / ∆xi (236)

ei = [0 . . . 0 1 0 . . . 0]^T, 1 at the i-th position (237)

Figure 40. Linear performance model based on the gradient (a), and based on the forward finite-difference approximation of the gradient, g = (f(xa + ∆x) − f(xa))/∆x (b).
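A minimal sketch of (235)–(237), assuming numpy and using a made-up performance function in place of a circuit simulator:

```python
# Sketch: forward finite-difference gradient, eqs. (235)-(237).
import numpy as np

def grad_forward(f, xa, dx=1e-6):
    fa = f(xa)                                     # (235)
    g = np.empty_like(xa)
    for i in range(xa.size):
        e = np.zeros_like(xa); e[i] = 1.0          # unit vector e_i, (237)
        g[i] = (f(xa + dx * e) - fa) / dx          # (236)
    return g

f = lambda x: x[0] ** 2 + 3.0 * x[0] * x[1]        # example performance
print(grad_forward(f, np.array([1.0, 2.0])))       # ~ [8, 3]
```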
7.3.3 Performance type f ↑ "good", specification type f ≥ fL

min f(x)  (i.e., min g^T · x)  s.t.  x ≥ xL, x ≤ xU  →  xWL, fWL = f(xWL) (238)

specific linear programming problem with analytical solution

corresponding Lagrange function:

L(x, λL, λU) = g^T · x − λL^T · (x − xL) − λU^T · (xU − x) (239)

first-order optimality conditions:

∇L(x) = 0 :  g − λL* + λU* = 0 (240)

λ*L/U ≥ 0
xU − xWL ≥ 0, xWL − xL ≥ 0
λ*L,j · (xWL,j − xL,j) = 0, j = 1, . . . , nx
λ*U,j · (xU,j − xWL,j) = 0, j = 1, . . . , nx

equivalently:  λL*^T · (xWL − xL) = 0,  λU*^T · (xU − xWL) = 0 (241)

because:  ∀i ai · bi = 0 ⇒ a^T · b = Σ_i ai · bi = 0,  and  a^T · b = 0, ai ≥ 0, bi ≥ 0 ⇒ ∀i ai · bi = 0

second-order optimality condition holds because ∇²L(x) = 0

either constraint xL,j or constraint xU,j is active, never both; therefore, from (240) and (241):

either: gj = λ*L,j > 0 (242)
or:     gj = −λ*U,j < 0 (243)

component of the worst-case parameter vector xWL:

xWL,j = { xL,j,      gj > 0
          xU,j,      gj < 0
          undefined, gj = 0 }  (244)

worst-case performance value:

fWL = fa + g^T · (xWL − xa) = fa + Σ_j gj · (xWL,j − xa,j) (245)
7.3.4 Performance type f ↓ "good", specification type f ≤ fU

min −f(x)  (i.e., min −g^T · x)  s.t.  x ≥ xL, x ≤ xU  →  xWU, fWU = f(xWU) (246)

component of the worst-case parameter vector xWU:

xWU,j = { xL,j,      gj < 0
          xU,j,      gj > 0
          undefined, gj = 0 }  (247)

worst-case performance value:

fWU = fa + g^T · (xWU − xa) = fa + Σ_j gj · (xWU,j − xa,j) (248)
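A sketch of (244)–(248) in a few lines, assuming numpy; the box and gradient values are made-up example data (components with gj = 0, undefined in (244)/(247), are set to xa here as a placeholder):

```python
# Sketch: classical worst-case analysis, eqs. (244)-(248).
import numpy as np

def classical_wc(fa, g, xa, xL, xU):
    xWL = np.where(g > 0, xL, np.where(g < 0, xU, xa))   # (244)
    xWU = np.where(g > 0, xU, np.where(g < 0, xL, xa))   # (247)
    fWL = fa + g @ (xWL - xa)                            # (245)
    fWU = fa + g @ (xWU - xa)                            # (248)
    return xWL, fWL, xWU, fWU

g = np.array([2.0, -1.0]); xa = np.zeros(2)
xL = np.array([-1.0, -1.0]); xU = np.array([1.0, 1.0])
print(classical_wc(10.0, g, xa, xL, xU))                 # fWL = 7, fWU = 13
```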
7.4 Realistic worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.4.1 Task

Given: ellipsoid tolerance region

TE = {x | (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²} (249)

linear performance model

f(x) = fa + g^T · (x − xa) (250)
     = f0 + g^T · (x − x0) with f0 = fa + g^T · (x0 − xa) (251)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 41. Realistic worst-case analysis with tolerance ellipsoid and linear performance model.
7.4.2 Performance type f ↑ "good", specification type f ≥ fL

min f(x)  (i.e., min g^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²  →  xWL, fWL = f(xWL) (252)

specific nonlinear programming problem (linear objective function, quadratic constraint function) with analytical solution

corresponding Lagrange function:

L(x, λ) = g^T · x − λ · (βW² − (x − x0)^T · C⁻¹ · (x − x0)) (253)

first-order optimality conditions:

∇L = 0 :  g + 2λWL · C⁻¹ · (xWL − x0) = 0 (254)

due to the linear function f, the solution is on the border of TE, i.e., the constraint is active:

(xWL − x0)^T · C⁻¹ · (xWL − x0) = βW² (255)

λWL > 0 (256)

second-order optimality condition holds because ∇²L(x) = 2λWL · C⁻¹, C⁻¹ is positive definite, and because of (256)

(254) gives:

xWL − x0 = −(1/(2λWL)) · C · g (257)

substituting (257) into (255) gives:

(1/(4λWL²)) · g^T · C · g = βW² (258)

inserting λWL from (258) into (257) to eliminate λWL gives a worst-case parameter vector in terms of the performance gradient and tolerance region constants:

xWL − x0 = −(βW/√(g^T · C · g)) · C · g = −(βW/σf) · C · g (259)

substituting (259) in (251) gives the corresponding worst-case performance value:

fWL = f(xWL) = f0 + g^T · (xWL − x0) = f0 − βW · √(g^T · C · g) = f0 − βW · σf (260)

βW-sigma design, worst-case distance βW
7.4.3 Performance type f ↓ "good", specification type f ≤ fU

(252) becomes

min −f(x)  (i.e., min −g^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²  →  xWU, fWU = f(xWU) (261)

(259) becomes

xWU − x0 = +(βW/√(g^T · C · g)) · C · g = +(βW/σf) · C · g (262)

(260) becomes

fWU = f(xWU) = f0 + g^T · (xWU − x0) = f0 + βW · √(g^T · C · g) = f0 + βW · σf (263)
7.5 General worst-case analysis

indexes d, s, r for parameter types xd, xs, xr left out
index i for performance feature fi left out

7.5.1 Task

Given: ellipsoid tolerance region

TE = {x | (x − x0)^T · C⁻¹ · (x − x0) ≤ βW²} (264)

general smooth performance f(x)

Find: worst-case parameter vectors xWL/U and corresponding worst-case performance values fWL/U = f(xWL/U)

Figure 42. General worst-case analysis with tolerance ellipsoid and nonlinear performance function: level contours f = f0 = f(x0), f = fWL, f = fWU; gradients ∇f(xWL), ∇f(xWU) at the worst-case points on the border of TE.
7.5.2 Performance type f ↑ "good", specification type f ≥ fL

min_x f(x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (265)

specific nonlinear programming problem (nonlinear objective function, one quadratic inequality constraint) with numerical solution, e.g., by Sequential Quadratic Programming (SQP), which yields ∇f(xWL)

assumption: unique solution on the border of TE

Linearization of the objective function f at the worst-case point xWL, i.e., after solution of (265):

f̄^(WL)(x) = fWL + ∇f(xWL)^T · (x − xWL) (266)

substituting (266) in (265) gives:

min f̄^(WL)(x)  (i.e., min ∇f(xWL)^T · x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (267)

structure identical to the realistic worst-case analysis (252), replace g by ∇f(xWL)

worst-case parameter vector:

xWL − x0 = −(βW/√(∇f(xWL)^T · C · ∇f(xWL))) · C · ∇f(xWL) = −(βW/σf̄(WL)) · C · ∇f(xWL) (268)

worst-case performance value:

f̄^(WL)(x0) = f̄0^(WL) = fWL + ∇f(xWL)^T · (x0 − xWL)

fWL = f̄0^(WL) + ∇f(xWL)^T · (xWL − x0) (269)

substituting (268) in (269) gives the corresponding worst-case performance value:

fWL = f̄0^(WL) − βW · √(∇f(xWL)^T · C · ∇f(xWL)) = f̄0^(WL) − βW · σf̄(WL) (270)

βW-sigma design, worst-case distance βW
7.5.3 Performance type f ↓ "good", specification type f ≤ fU

(265) becomes

min −f(x)  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (271)

(266) becomes

f̄^(WU)(x) = fWU + ∇f(xWU)^T · (x − xWU) (272)

(267) becomes

min −∇f(xWU)^T · x  s.t.  (x − x0)^T · C⁻¹ · (x − x0) ≤ βW² (273)

worst-case parameter vector:

xWU − x0 = +(βW/√(∇f(xWU)^T · C · ∇f(xWU))) · C · ∇f(xWU) = +(βW/σf̄(WU)) · C · ∇f(xWU) (274)

worst-case performance value:

fWU = f̄0^(WU) + βW · √(∇f(xWU)^T · C · ∇f(xWU)) = f̄0^(WU) + βW · σf̄(WU) (275)
7.5.4 General worst-case analysis with tolerance box

lower worst-case:

min f(x)  s.t.  x ≥ xL, x ≤ xU (276)

upper worst-case:

min −f(x)  s.t.  x ≥ xL, x ≤ xU (277)
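A sketch of the general worst-case analysis (265)/(271) with an off-the-shelf SQP-type solver (scipy's SLSQP); the nonlinear performance function and the ellipsoid data are made-up examples, not from the compendium:

```python
# Sketch: general worst-case analysis over a tolerance ellipsoid, cf. (265), (271).
import numpy as np
from scipy.optimize import minimize

x0 = np.zeros(2)
Cinv = np.linalg.inv(np.array([[0.04, 0.01], [0.01, 0.09]]))
beta_w = 3.0
f = lambda x: x[0] ** 2 - x[1] + 3.0          # example performance function

# inequality constraint in scipy form: fun(x) >= 0 inside the ellipsoid
ellipsoid = {"type": "ineq",
             "fun": lambda x: beta_w ** 2 - (x - x0) @ Cinv @ (x - x0)}

lo = minimize(f, x0, method="SLSQP", constraints=[ellipsoid])
up = minimize(lambda x: -f(x), x0, method="SLSQP", constraints=[ellipsoid])
print(lo.x, lo.fun)        # xWL, fWL  (lower worst case)
print(up.x, -up.fun)       # xWU, fWU  (upper worst case)
```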
7.6 Input/output of worst-case analysis

classical worst-case analysis: inputs are the nominal point x with performance f, the tolerance box TB(xL, xU), and a sensitivity analysis (Jacobian J, computed once); for each performance feature fi it returns the worst-case parameter vectors x(i)WL,C, x(i)WU,C and the worst-case performance values f(i)WL,C, f(i)WU,C.

realistic worst-case analysis: inputs are x, f, the tolerance ellipsoid TE(C, x0, βW) ("βW-sigma design"), and a sensitivity analysis (J, computed once); for each fi it returns x(i)WL,R, x(i)WU,R and f(i)WL,R, f(i)WU,R.

general worst-case analysis: inputs are x, f and a tolerance ellipsoid and/or tolerance box; a sensitivity analysis is performed in each of the nit iterations (x(κ), f(κ), J(κ)); for each fi it returns x(i)WL,G, x(i)WU,G and f(i)WL,G, f(i)WU,G.

sensitivity analysis: simulation (netlist, technology, simulation bench) at the parameter vector x = [xd; xs; xr] yields the performances f; nx further simulations at x + ∆xi · ei yield ∆fi = f(x + ∆xi · ei) − f; the Jacobian (sensitivity) matrix is

J = ∂f/∂x^T = [ ∂f1/∂x1    ∂f1/∂x2    · · ·  ∂f1/∂xnx
                ∂f2/∂x1    ∂f2/∂x2    · · ·  ∂f2/∂xnx
                ...        ...        . . .  ...
                ∂fnf/∂x1   ∂fnf/∂x2   · · ·  ∂fnf/∂xnx ]

with rows g^(i)T = ∂fi/∂x^T, approximated column-wise by J ≈ [ · · · (1/∆xi) · ∆fi · · · ].
7.7 Summary of discussed worst-case analysis problems

• For each performance feature fi there exists a worst-case parameter vector xWL,i and/or xWU,i, respectively, and a corresponding worst-case performance value fWL,i and/or fWU,i.

• xWL,i and xWU,i are unique in the classical and realistic worst-case analysis. Several worst-case parameter vectors may exist in the general worst-case analysis.

worst-case analysis | feasible region | objective function | good for
classical           | hyper-box       | linear             | uniform distribution, unknown distribution
general             | hyper-box       | non-linear         | discrete circuits, range parameters
realistic           | ellipsoid       | linear             | normal distribution
general             | ellipsoid       | non-linear         | IC transistor parameters

• Worst-case analysis requires design technology (circuit, performance features) and process technology (statistical parameters, parameter distribution).
8 Yield analysis

8.1 Task

Given:

• statistical parameters with normal distribution, possibly obtained through transformation

• performance specification

Find: percentage/proportion of circuits that fulfill the specifications

statistical parameter distribution (manufacturing process):

pdf(xs) ≜ pdfN(xs) = (1/(√(2π)^nxs · √det(C))) · e^(−β²(xs)/2) (278)

β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) (279)

performance acceptance region, performance specification (customer):

Af = {f | fL ≤ f ≤ fU} (280)

solution requires either a non-normal distribution pdff(f) or a non-linear parameter acceptance region As = {xs | f(xs) ∈ Af} (dashed lines in Fig. 43)

Figure 43. Parameter acceptance region As in the statistical parameter space (nominal point xs,0, level contour β = const) and performance acceptance region Af with bounds fL,1, fU,1, fL,2, fU,2; xs,0 maps to f(xs,0).
8.1.1 Acceptance function

δ(xs) = { 1, f(xs) ∈ Af ; 0, f(xs) ∉ Af } = { 1, xs ∈ As ("circuit functions") ; 0, xs ∉ As ("circuit malfunctions") } (281)

8.1.2 Parametric yield

Y = ∫· · ·∫_{As} pdf(xs) · dxs (282)

  = ∫· · ·∫_{−∞}^{+∞} δ(xs) · pdf(xs) · dxs (283)

  = E{δ(xs)} (284)

yield: expected value of the acceptance function
8.2 Statistical yield analysis/Monte-Carlo analysis

• sample of statistical parameter vectors according to the given distribution

xs^(µ) ∼ N(xs,0, C), µ = 1, . . . , nMC (285)

• (numerical) circuit simulation of each sample element (simulation of the stochastic manufacturing process on circuit level)

xs^(µ) ↦ f^(µ) = f(xs^(µ)), µ = 1, . . . , nMC (286)

• evaluation of the acceptance function (simulation of a production test)

δ^(µ) = δ(xs^(µ)) = { 1, f^(µ) ∈ Af ; 0, f^(µ) ∉ Af }, µ = 1, . . . , nMC (287)

• statistical yield estimation

Ŷ = Ê{δ(xs)} = (1/nMC) · Σ_{µ=1}^{nMC} δ^(µ) (288)

  = number of functioning circuits / sample size (289)

  = n+/nMC = #"+" / (#"+" + #"-")  (Fig. 44) (290)

Figure 44. Monte-Carlo sample in the statistical parameter space with acceptance region As and level contour β = const; elements inside As count as "+", elements outside as "-".
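A sketch of (285)–(288), with a made-up linear performance model standing in for circuit simulation and an example specification, numpy assumed:

```python
# Sketch: statistical yield estimation, eqs. (285)-(288).
import numpy as np

rng = np.random.default_rng(3)
xs0 = np.zeros(2)
C = np.array([[0.04, 0.01], [0.01, 0.09]])
n_mc = 100_000

xs = rng.multivariate_normal(xs0, C, size=n_mc)   # (285)
f = 10.0 + xs @ np.array([2.0, -1.0])             # (286), example performance
delta = (f >= 9.0) & (f <= 11.0)                  # (287), spec fL = 9, fU = 11
print(delta.mean())                               # yield estimate (288)
```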
8.2.1 Variance of yield estimator

V{Ŷ} (288)= V{Ê{δ(xs)}} (226)= (1/nMC) · V{δ(xs)} = σŶ² (291)

V{δ(xs)} = E{δ²(xs)} − (E{δ(xs)})² = Y − Y² = Y · (1 − Y) (292)

(using δ²(xs) = δ(xs) and E{δ(xs)} = Y)
8.2.2 Estimated variance of yield estimator

V̂{Ŷ} (288)= V̂{Ê{δ(xs)}} (227)= (1/nMC) · V̂{δ(xs)} (293)

      (230)= (1/nMC) · (nMC/(nMC − 1)) · [Ê{δ²(xs)} − (Ê{δ(xs)})²] (294)

           = Ŷ · (1 − Ŷ)/(nMC − 1) = σ̂Ŷ² (295)

(using δ²(xs) = δ(xs) and Ê{δ(xs)} = Ŷ)

Ŷ is binomially distributed: probability that n+ of nMC circuits are functioning

nMC → ∞: Ŷ is normally distributed (central limit theorem)

in practice: n+ > 4, nMC − n+ > 4 and nMC > 10

Figure 45. Estimator variance σŶ² over Ŷ ∈ [0%, 100%], with maximum ≈ 0.25/nMC at Ŷ = 50%; example values for Ŷ = 85%:

nMC   10     50    100   500   1000
σŶ    11.9%  5.1%  3.6%  1.6%  1.1%
• confidence interval, confidence level

P(Y ∈ [Ŷ − kζ · σŶ, Ŷ + kζ · σŶ]) = ∫_{−kζ}^{+kζ} (1/√(2π)) · e^(−t²/2) · dt (296)

(the interval is the confidence interval, the right-hand side the confidence level)

e.g., nMC = 1000, Ŷ = 85% → σŶ = 1.1%; kζ = 3: P(Y ∈ [81.7%, 88.3%]) = 99.7%

• given: yield estimator Ŷ, confidence interval Y ∈ [Ŷ − ∆Y, Ŷ + ∆Y], confidence level ζ%

find: nMC

ζ = cdfN(Ŷ + kζ · σŶ) − cdfN(Ŷ − kζ · σŶ)  →  kζ (297)

∆Y = kζ · σŶ  →  σŶ (298)

σŶ² = Ŷ · (1 − Ŷ)/(nMC − 1) (299)

nMC = 1 + Ŷ · (1 − Ŷ) · kζ²/∆Y² (300)

number nMC for various confidence intervals and confidence levels:

Ŷ ± ∆Y          ζ = 90%       ζ = 95%       ζ = 99%       ζ = 99.9%
                (kζ = 1.645)  (kζ = 1.960)  (kζ = 2.576)  (kζ = 3.291)
85% ± 10%       36            50            86            140
85% ± 5%        140           197           340           554
85% ± 1%        3,452         4,900         8,462         13,811
99.99% ± 0.01%                              66,352
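A sketch of (297)–(300) with scipy's normal quantile function; the printed case reproduces, up to rounding, the 85% ± 1% / ζ = 95% entry of the table:

```python
# Sketch: required sample size for a target confidence interval, eq. (300).
from scipy.stats import norm

def n_mc_required(y_hat, delta_y, conf_level):
    k = norm.ppf(0.5 + conf_level / 2.0)     # two-sided k_zeta, cf. (297)
    return int(1 + y_hat * (1.0 - y_hat) * k ** 2 / delta_y ** 2)

print(n_mc_required(0.85, 0.01, 0.95))       # ~4,899; the table lists 4,900
```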
• given: required Y > Ymin, significance level α

find: nMC

null hypothesis H0: Y < Ymin; H0 is rejected if all circuits are functioning, n+ = nMC

assuming H0 holds, the probability of falsely rejecting H0 (i.e., although Y < Ymin) is

P("rejection") = P("n+ = nMC")  (test definition) (301)

               = Y^nMC < Ymin^nMC  (binomial distribution) (302)

required: Ymin^nMC < α, hence

nMC > log α / log Ymin (303)
number nMC for a minimum yield and significance level:
Ymin α = 5% α = 1%
95% 60 92
99% 300 460
99.9% 3,000 4,600
99.99% 30,000 46,000
8.2.3 Importance sampling

Y = ∫· · ·∫_{−∞}^{+∞} δ(xs) · pdf(xs) · dxs (304)

  = ∫· · ·∫_{−∞}^{+∞} δ(xs) · (pdf(xs)/pdfIS(xs)) · pdfIS(xs) · dxs (305)

  = EpdfIS{δ(xs) · pdf(xs)/pdfIS(xs)} = EpdfIS{δ(xs) · w(xs)} (306)

sample created according to a separate, specific distribution pdfIS

EpdfIS{w(xs)} = ∫· · ·∫_{−∞}^{+∞} (pdf(xs)/pdfIS(xs)) · pdfIS(xs) · dxs = 1 (307)

goal: reduction of the estimator variance,

VpdfIS{ÊpdfIS{δ(xs) · w(xs)}} = VpdfIS{δ(xs) · w(xs)}/nIS  !<  V{δ(xs)}/nMC = V{Ê{δ(xs)}} (308)

with

VpdfIS{δ(xs) · w(xs)} = ∫· · ·∫_{−∞}^{+∞} δ(xs) · (pdf(xs)/pdfIS(xs)) · pdf(xs) · dxs − Y² (309)

Eq. (308) is satisfied for nMC = nIS if:

∀ xs with δ(xs) = 1 :  pdfIS(xs) > pdf(xs) (310)
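A one-dimensional sketch of the weighted estimator (306): the tail probability P(x > 4) for x ∼ N(0,1) plays the role of a fail probability that plain Monte-Carlo can hardly resolve; the shifted density N(4,1) is a made-up choice of pdfIS (scipy/numpy assumed):

```python
# Sketch: importance sampling estimate of a rare-event probability, eq. (306).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 100_000
x_is = rng.normal(4.0, 1.0, size=n)              # sample from pdf_IS = N(4,1)
w = norm.pdf(x_is) / norm.pdf(x_is, 4.0, 1.0)    # weights w = pdf/pdf_IS
est = np.mean((x_is > 4.0) * w)                  # E_pdfIS{delta * w}
print(est, norm.sf(4.0))                         # vs. exact tail probability
```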
8.3 Geometric yield analysis for linearized performance feature ("realistic geometric yield analysis")

yield in the case that As is a half-space of R^ns defined by a hyperplane

linear model for one single performance feature, index i left out:

f̄ = fL + g^T · (xs − xs,WL) (311)

e.g., after solution according to (336) or (337):

fL = f(xs,WL), g = ∇f(xs,WL) (312)

specification type:

f ≥ fL (313)

Figure 46. Linearized performance feature: level hyperplanes f̄ = f̄0 and f̄ = fL, gradient g, half-space A′s,L (f̄ ≥ fL), nominal point xs,0, tolerance ellipsoids β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) = const and β²(xs) = βWL² touching the hyperplane at xs,a with gradient ∇β²(xs,a).
8.3.1 Yield partition

Ȳ_L = ∫· · ·∫_{xs ∈ A′s,L} pdfN(xs) · dxs = ∫_{f̄ ≥ fL} pdff̄(f̄) · df̄ (314)

the linearized performance feature is normally distributed, (212), (213):

f̄ ∼ N(f̄0, σf̄²)  with  f̄0 = f̄(xs,0) = fL + g^T · (xs,0 − xs,WL),  σf̄² = g^T · C · g (315)

yield written in terms of the pdf of the linearized performance feature:

Ȳ_L = ∫_{fL}^{∞} (1/(√(2π) · σf̄)) · e^(−((f̄ − f̄0)/σf̄)²/2) · df̄  (Fig. 47) (316)

Figure 47. pdff̄ of the linearized performance feature with mean f̄0; the area for f̄ ≥ fL is the yield partition Ȳ_L.
8.3.2 Defining the worst-case distance βWL as the difference between nominal performance and specification bound as a multiple of the standard deviation σf̄ of the linearized performance feature (βWL-sigma design)

f̄0 − fL = g^T · (xs,0 − xs,WL) (317)

f̄0 − fL = { +βWL · σf̄, f̄0 > fL   "circuit functions"
            −βWL · σf̄, f̄0 < fL   "circuit malfunctions" }  (318)

variable substitution:

t = (f̄ − f̄0)/σf̄,  df̄/dt = σf̄ (319)

tL = (fL − f̄0)/σf̄ = { −βWL, f̄0 > fL ; +βWL, f̄0 < fL }  (320)

Ȳ_L = ∫_{∓βWL}^{∞} (1/√(2π)) · e^(−t²/2) · dt = −∫_{±βWL}^{−∞} (1/√(2π)) · e^(−t′²/2) · dt′   (t′ = −t, dt/dt′ = −1)
8.3.3 Yield partition as a function of worst-case distance βWL

Ȳ_L = ∫_{−∞}^{±βWL} (1/√(2π)) · e^(−t′²/2) · dt′   with  +βWL for f̄0 > fL,  −βWL for f̄0 < fL (321)

standard normal distribution, statistical tables, exact within the given digits, no estimation
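In code, (321) is a single call to the standard normal cdf; a sketch assuming scipy:

```python
# Sketch of (321): yield partition from the signed worst-case distance.
from scipy.stats import norm

def yield_partition(beta_wl, nominally_fulfilled=True):
    # +beta for f0 > fL ("circuit functions"), -beta otherwise
    return norm.cdf(beta_wl if nominally_fulfilled else -beta_wl)

print(yield_partition(3.0))   # beta_WL = 3  ->  ~0.9987
```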
8.3.4 Worst-case distance βWL defines tolerance region

f̄(xs,a) = fL  (311)⇐⇒  g^T · (xs,a − xs,WL) = 0 (322)

f̄(xs) = fL + g^T · (xs − xs,WL) − g^T · (xs,a − xs,WL) = fL + g^T · (xs − xs,a) (323)

i.e., xs,WL in (311) can be replaced with any point of the level set f̄ = fL because of (322)

from visual inspection of Fig. 46:

∇β²(xs,a) = 2 · C⁻¹ · (xs,a − xs,0) = { −λa · g, f̄0 > fL ; +λa · g, f̄0 < fL },  λa > 0 (324)

⇐⇒ (xs,a − xs,0) = ∓(λa/2) · C · g (325)

⇒ g^T · (xs,a − xs,0) = ∓(λa/2) · σf̄²  (323)= fL − f̄0  (318)= ∓βWL · σf̄ (326)

λa/2 = βWL/σf̄ (327)

substituting (327) in (325):

(xs,a − xs,0) = ∓(βWL/σf̄) · C · g (328)

(xs,a − xs,0)^T · C⁻¹ · (xs,a − xs,0) = βWL² · (g^T · C · g)/σf̄²,  with (g^T · C · g)/σf̄² = 1 (329)

(xs,a − xs,0)^T · C⁻¹ · (xs,a − xs,0) = βWL² (330)

βWL² is the level parameter ("radius") of the ellipsoid that touches the level hyperplane f̄ = fL
8.3.5 Specification type f ≤ fU

(318) becomes

fU − f̄0 = { +βWU · σf̄, f̄0 < fU   "circuit functions"
            −βWU · σf̄, f̄0 > fU   "circuit malfunctions" }  (331)

(321) becomes

Ȳ_U = ∫_{−∞}^{±βWU} (1/√(2π)) · e^(−t′²/2) · dt′   with  +βWU for f̄0 < fU,  −βWU for f̄0 > fU (332)

(323) becomes

f̄(xs) = fU + g^T · (xs − xs,a) (333)
8.4 Geometric yield analysis for nonlinear performance feature ("general geometric yield analysis")

8.4.1 Problem formulation

Figure 48. General geometric yield analysis: nonlinear specification border f = fL of the acceptance region As,L, its linearization f̄^(WL) = fL at the worst-case point xs,WL with gradient ∇f(xs,WL), linearized acceptance region A′s,L, nominal point xs,0, tolerance ellipsoids β²(xs) = (xs − xs,0)^T · C⁻¹ · (xs − xs,0) = const.
lower bound, nominally fulfilled / upper bound, nominally violated:

f(xs,0) > fL/U :  max_xs pdfN(xs)  s.t.  f(xs) ≤ fL/U (334)

lower bound, nominally violated / upper bound, nominally fulfilled:

f(xs,0) < fL/U :  max_xs pdfN(xs)  s.t.  f(xs) ≥ fL/U (335)

statistical parameter vector with the highest probability density "on the other side" of the acceptance region border

lower bound, nominally fulfilled / upper bound, nominally violated:

f(xs,0) > fL/U :  min_xs β²(xs)  s.t.  f(xs) ≤ fL/U (336)

lower bound, nominally violated / upper bound, nominally fulfilled:

f(xs,0) < fL/U :  min_xs β²(xs)  s.t.  f(xs) ≥ fL/U (337)

statistical parameter vector with the smallest distance (weighted according to the pdf) to the acceptance region border

specific form of a nonlinear programming problem (quadratic objective function, one nonlinear inequality constraint); iterative solution with SQP yields:

worst-case parameter set xs,WL/U
worst-case distance βWL/U = β(xs,WL/U)
performance gradient ∇f(xs,WL/U)
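A sketch of the geometric yield analysis problem (336) with scipy's SLSQP; the performance function, data, and start point are made-up examples (the nominal value fulfils the bound, f(xs,0) > fL):

```python
# Sketch: solve (336), min beta^2 s.t. f(xs) <= fL, with an SQP-type solver.
import numpy as np
from scipy.optimize import minimize

xs0 = np.zeros(2)
Cinv = np.linalg.inv(np.array([[0.04, 0.01], [0.01, 0.09]]))
f = lambda x: x[0] ** 2 - x[1] + 3.0     # example performance, f(xs0) = 3
fL = 2.0                                 # lower bound, nominally fulfilled

beta2 = lambda x: (x - xs0) @ Cinv @ (x - xs0)
res = minimize(beta2, np.array([0.0, 1.5]), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda x: fL - f(x)}])
xs_wl, beta_wl = res.x, np.sqrt(res.fun)  # worst-case point and distance
print(xs_wl, beta_wl)
```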
8.4.2 Advantages of geometric yield analysis

• Y_L/U = ∫· · ·∫_{As,L/U} pdf(xs) · dxs; the larger the pdf value, the larger the error in Ȳ if the border of As is approximated inaccurately; A′s is exact at the point xs,WL/U with the highest pdf value, and A′s differs from As the more, the smaller the pdf value

• Y_L/U ↑ → accuracy of Ȳ_L/U ↑

• systematic error, depends on the curvature of the performance f in xs,WL/U

• duality principle in minimum-norm problems: the minimum distance between a point and a convex set equals the maximum distance between the point and any separating hyperplane

case 1: Y(A′s) is the greatest lower bound of Y(As) over all tangent hyperplanes

Figure 49. Case 1: convex acceptance region, A′s ⊆ As.

case 2: Y(A′s) is the least upper bound of Y(As) over all tangent hyperplanes

Figure 50. Case 2: As ⊆ A′s.

• in practice, error |Ȳ_L/U − Y_L/U| ≈ 1% . . . 2%
8.4.3 Lagrange function and first-order optimality conditions of problem (336)

L(xs, λ) = β²(xs) − λ · (fL − f(xs)) (338)

∇L(xs) = 0 :  2 · C⁻¹ · (xs,WL − xs,0) + λWL · ∇f(xs,WL) = 0 (339)

λWL · (fL − f(xs,WL)) = 0 (340)

βWL² = (xs,WL − xs,0)^T · C⁻¹ · (xs,WL − xs,0) (341)

assumption: λWL = 0, i.e., the constraint fL ≥ f(xs,WL) is inactive; then, from (339), xs,WL = xs,0 and f(xs,WL) = f(xs,0) > fL (from (336)), which contradicts the constraint; therefore:

λWL > 0 (342)

f(xs,WL) = fL (343)

from (339):

xs,WL − xs,0 = −(λWL/2) · C · ∇f(xs,WL) (344)

substituting (344) in (341):

(λWL/2)² · ∇f(xs,WL)^T · C · ∇f(xs,WL) = βWL² (345)

λWL/2 = βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL)) (346)

substituting (346) in (344):

xs,WL − xs,0 = −(βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL))) · C · ∇f(xs,WL) (347)

(347) corresponds to (268) of the general worst-case analysis

8.4.4 Lagrange function of problem (337)

L(xs, λ) = β²(xs) − λ · (f(xs) − fL) (348)

(347) becomes

xs,WL − xs,0 = +(βWL/√(∇f(xs,WL)^T · C · ∇f(xs,WL))) · C · ∇f(xs,WL) (349)
8.4.5 Second-order optimality condition of problem (336)

∇²L(xs,WL) = 2 · C⁻¹ + λWL · ∇²f(xs,WL) (350)

curvature of β² stronger than curvature of f

8.4.6 Worst-case distance

linearization of f at xs,WL/U:

f̄^(WL/U)(xs) = fL/U + ∇f(xs,WL/U)^T · (xs − xs,WL/U) (351)

f̄^(WL/U)(xs,0) − fL/U = ∇f(xs,WL/U)^T · (xs,0 − xs,WL/U) (352)

substituting (347) in (352):

lower bound, nominally fulfilled / upper bound, nominally violated:

βWL/U = (f̄^(WL/U)(xs,0) − fL/U) / √(∇f(xs,WL/U)^T · C · ∇f(xs,WL/U))
      = ∇f(xs,WL/U)^T · (xs,0 − xs,WL/U) / σf̄(WL/U) (353)

lower bound, nominally violated / upper bound, nominally fulfilled:

βWL/U = (fL/U − f̄^(WL/U)(xs,0)) / √(∇f(xs,WL/U)^T · C · ∇f(xs,WL/U))
      = ∇f(xs,WL/U)^T · (xs,WL/U − xs,0) / σf̄(WL/U) (354)

in (353) and (354), the numerator is the "performance safety margin" and the denominator the "performance variability"
8.4.7 Remarks

• worst-case distance: yield approximation for each individual performance bound

• "realistic" geometric yield analysis by linearization in the nominal parameter vector xs,0

• geometric yield analysis and general worst-case analysis are related by an exchange of objective function and constraint:

geometric yield analysis: min_xs β²(xs) s.t. fi(xs) "on the other side" of the given bound fL/U,i → βWL/U,i / Ȳ_L/U,i

general worst-case analysis: min/max_xs fi(xs) s.t. β²(xs) ≤ βW² with given βW / YW → fWL/U,i

8.4.8 Overall yield

upper bound: Y ≤ min_i Ȳ_L/U,i

lower bound: Y ≥ 1 − Σ_i (1 − Ȳ_L/U,i)

Monte-Carlo analysis with A′s = ∩_i A′s,L/U,i (Fig. 51), no circuit simulation required

Figure 51. Intersection A′s of the linearized half-spaces A′s,L/U,i as an approximation of the acceptance region As.
8.4.9 Consideration of range parameters

parameter acceptance regions and their complements (overline):

As,L,i = {xs | ∀ xr ∈ Tr : fi(xs, xr) ≥ fL,i} (355)

As,U,i = {xs | ∀ xr ∈ Tr : fi(xs, xr) ≤ fU,i} (356)

Ās,L,i = {xs | ∃ xr ∈ Tr : fi(xs, xr) < fL,i} (357)

Ās,U,i = {xs | ∃ xr ∈ Tr : fi(xs, xr) > fU,i} (358)

As,L,i ∩ Ās,L,i = ∅ and As,L,i ∪ Ās,L,i = R^ns (359)

geometric yield analysis, problem formulation:

xs,0 ∈ As,L,i :  max_xs pdfN(xs)  s.t.  xs ∈ Ās,L,i (360)

xs,0 ∈ Ās,L,i :  max_xs pdfN(xs)  s.t.  xs ∈ As,L,i (361)

from (355) – (358):

xs ∈ As,L,i ⇐⇒ min_{xr∈Tr} fi(xs, xr) ≥ fL,i  "smallest value still inside" (362)

xs ∈ As,U,i ⇐⇒ max_{xr∈Tr} fi(xs, xr) ≤ fU,i  "largest value still inside" (363)

xs ∈ Ās,L,i ⇐⇒ min_{xr∈Tr} fi(xs, xr) < fL,i  "smallest value already outside" (364)

xs ∈ Ās,U,i ⇐⇒ max_{xr∈Tr} fi(xs, xr) > fU,i  "largest value already outside" (365)

(360), (361) with (355) – (358):

xs,0 ∈ As,L :  min_xs β²(xs)  s.t.  min_{xr∈Tr} f(xs, xr) ≤ fL  "nominal inside" (366)

xs,0 ∈ Ās,L :  min_xs β²(xs)  s.t.  min_{xr∈Tr} f(xs, xr) ≥ fL  "nominal outside" (367)

xs,0 ∈ As,U :  min_xs β²(xs)  s.t.  max_{xr∈Tr} f(xs, xr) ≥ fU  "nominal inside" (368)

xs,0 ∈ Ās,U :  min_xs β²(xs)  s.t.  max_{xr∈Tr} f(xs, xr) ≤ fU  "nominal outside" (369)
9 Yield optimization/design centering/nominal design

9.1 Optimization objectives

Yield Y(xd)

Worst-case distances

βWL/U,i = |f̄^(WL/U,i)(xs,0) − fL/U,i| / √(∇fi(xs,WL/U,i)^T · C · ∇fi(xs,WL/U,i)) (370)

        = |∇fi(xs,WL/U,i)^T · (xs,0 − xs,WL/U,i) + ∇fi(xd,0)^T · (xd − xd,0)| / σf̄(WL/U,i) (371)

i = 1, . . . , nPSF (372)

Performance features fi, i = 1, . . . , nf

Worst-case performance features fWL/U,i, i = 1, . . . , nf

Performance distances

αWL/U,i = |fi(xd,0) − fL/U,i| / √(∇fi(xd,0)^T · Axd² · ∇fi(xd,0))   ("performance safety margin" / "sensitivity") (373)

with

f̄i(xd) = fi(xd,0) + ∇fi(xd,0)^T · (xd − xd,0) (374)

Axd = diag(axd,1, . . . , axd,nxd)   (scaling of individual design parameters) (375)
9.2 Derivatives of optimization objectives

Statistical yield derivatives

first derivative:

∇Y(xs,0) = Y · C⁻¹ · (xs,0,δ − xs,0) (376)

xs,0,δ = Epdfδ{xs},  pdfδ(xs) = (1/Y) · δ(xs) · pdf(xs) (377)

"center of gravity" of the truncated normal distribution

second derivative:

∇²Y(xs,0) = Y · C⁻¹ · [Cδ + (xs,0,δ − xs,0) · (xs,0,δ − xs,0)^T − C] · C⁻¹ (378)

Cδ = Epdfδ{(xs − xs,0,δ) · (xs − xs,0,δ)^T} (379)

difficult: ∇Y(xd)

Worst-case distance derivatives

first derivatives:

∇βWL/U,i(xs,0) = (±1/σf̄(WL/U,i)) · ∇fi(xs,WL/U,i) (380)

∇βWL/U,i(xd^(κ)) = (±1/σf̄(WL/U,i)) · ∇fi(xd^(κ)) (381)

Performance distance derivative

first derivative:

∇αWL/U,i(xd,0) = (±1/√(∇f(xd,0)^T · Axd² · ∇f(xd,0))) · ∇f(xd,0) (382)
9.3 Problem formulations of analog optimization

Design centering/yield optimization

max_xd Y(xd)  s.t.  c(xd) ≥ 0 (383)

geometric:

max_xd ±βWL/U,i(xd), i = 1, . . . , nPSF  s.t.  c(xd) ≥ 0 (384)

"nominal inside": +, "nominal outside": −   (multicriteria optimization problem, MCO)

Nominal design

min_xd ±fi(xd), i = 1, . . . , nf  s.t.  c(xd) ≥ 0 (385)

max_xd ±αWL/U,i(xd), i = 1, . . . , nPSF  s.t.  c(xd) ≥ 0 (386)

"nominal inside": +, "nominal outside": −   (multicriteria optimization problem, MCO)

Scalarization of MCO

min oi(xd), i = 1, . . . , no  →  min t(xd) (387)

weighted sum: t_l1(x) = Σ_{i=1}^{no} wi · oi(x) (388)

weighted least squares: t_l2(x) = Σ_i wi · (oi(x) − oi,target)² (389)

weighted min/max: t_l∞(x) = max_i wi · oi (390)

exponential function: t_exp(x) = Σ_i e^(−wi·oi(x)) (391)

with wi > 0, i = 1, . . . , no,  Σ_{i=1}^{no} wi = 1

Inscribing the largest ellipsoid in the linearized acceptance region by linear programming in iteration step κ:

max_{tβ,xd} tβ  s.t.  βw,i^(κ) + ∇βw,i^T(xd^(κ)) · (xd − xd^(κ)) ≥ tβ, i = 1, . . . , nPSF  [, c^(κ)(xd) ≥ 0] (392)
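A sketch of the four scalarizations (388)–(391) as plain functions of an objective vector o(xd); the weights and example values are made-up data, numpy assumed:

```python
# Sketch: scalarization functions (388)-(391) for a vector of objectives o.
import numpy as np

def t_weighted_sum(o, w):              # (388)
    return w @ o

def t_least_squares(o, w, o_target):   # (389)
    return w @ (o - o_target) ** 2

def t_minmax(o, w):                    # (390)
    return np.max(w * o)

def t_exp(o, w):                       # (391)
    return np.sum(np.exp(-w * o))

o = np.array([1.0, 4.0]); w = np.array([0.5, 0.5])
print(t_weighted_sum(o, w), t_minmax(o, w), t_exp(o, w))
```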
9.4 Analysis, synthesis

analysis: simulation (deterministic/stochastic) of the optimization objective
synthesis: design/sizing by numerical optimization (deterministic/stochastic)

9.5 Sizing

analog optimization, circuit sizing:

nominal design: without statistical distribution, without parameter ranges; statistical and range parameters at some fixed point

tolerance design: with statistical parameter distribution, with tolerance regions of range parameters

9.6 Nominal design, tolerance design

nominal design
  analysis: sensitivity analysis (objective: gradient), simulation (objective: performance values)
  synthesis: nominal optimization

tolerance design
  analysis: worst-case analysis (objective: worst-case performance values and gradients); yield analysis (objective: yield or worst-case distances and gradients)
  synthesis: worst-case optimization; yield optimization
9.7 Optimization without/with constraints

Optimization without constraints

search direction:
– Newton direction
– Quasi-Newton direction
– coordinate direction
– gradient
– least squares: Levenberg–Marquardt, Gauss–Newton

line search:
1. bracketing: start interval
2. sectioning: interval refinement
   – Newton step
   – Golden section
   – backtracking

• Polytope (Nelder–Mead) method
• Trust region

Optimization with constraints

nonlinear problem
  ↓ sequence of
Quadratic programming (QP) with inequality constraints
  ↓ active set
QP with equality constraints
– Newton direction
– step-length limitation: new constraint
– active set update
10 Sizing rules for analog circuit optimization

10.1 Single (NMOS) transistor

Figure 52. NMOS transistor with terminals G, D, S, voltages vGS, vDS and current iDS; output characteristics iDS(vDS) with triode and saturation regions.

drain-source current (simple model):

iDS = { 0,                                                             vGS − Vth ≤ 0
        µn · Cox · (W/L) · (vGS − Vth − vDS/2) · vDS · (1 + λ · vDS),  0 ≤ vDS ≤ vGS − Vth
        (1/2) · µn · Cox · (W/L) · (vGS − Vth)² · (1 + λ · vDS),       vGS − Vth ≤ vDS }  (393)

W, L [m]:        transistor width and length
µn [m²/(V·s)]:   electron mobility in silicon
Cox [F/cm²]:     capacitance per area due to oxide
Vth [V]:         threshold voltage
λ [1/V]:         channel length modulation factor
(394)
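A sketch of (393) as a function; the numeric parameter values are illustrative only, not from any particular technology:

```python
# Sketch: simple drain-current model, eq. (393).
def i_ds(v_gs, v_ds, W, L, mu_cox=2e-4, v_th=0.5, lam=0.1):
    if v_gs - v_th <= 0.0:                       # cutoff
        return 0.0
    if v_ds <= v_gs - v_th:                      # triode region
        return (mu_cox * W / L * (v_gs - v_th - v_ds / 2.0)
                * v_ds * (1.0 + lam * v_ds))
    return (0.5 * mu_cox * W / L                 # saturation region
            * (v_gs - v_th) ** 2 * (1.0 + lam * v_ds))

print(i_ds(1.2, 1.0, W=10e-6, L=1e-6))           # saturation example
```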
saturation region, no channel length modulation:

iDS = K · (W/L) · (vGS − Vth)²  with  K = (1/2) · µn · Cox (395)

derivatives:

∇iDS(K) = iDS/K
∇iDS(W) = iDS/W
∇iDS(L) = −iDS/L
∇iDS(Vth) = −(2/(vGS − Vth)) · iDS
(396)

variances:

(σK/K)² = AK/(W · L) (397)

(σVth)² = AVth/(W · L) (398)

area law: Lakshmikumar et al., IEEE Journal of Solid-State Circuits, SC-21, Dec. 1986

iDS variance; assumptions: K, W, L, Vth variations statistically independent, saturation region, no channel length modulation, linear transformation of variances:

σ²iDS ≈ Σ_{x ∈ {K,W,L,Vth}} [∇iDS(x)]² · σx² (399)

σ²iDS/i²DS ≈ AK/(W · L) + σW²/W² + σL²/L² + (4/(vGS − Vth)²) · AVth/(W · L) (400)

W ↑, L ↑, W · L ↑ → σiDS ↓

larger transistor geometries reduce the iDS variance due to manufacturing variations in K, W, L, Vth

10.1.1 Sizing rules for a single transistor that acts as a voltage-controlled current source (VCCS)

                                      type         origin
(1) vDS ≥ 0                           electrical   (DC) transistor
(2) vGS − Vth ≥ 0                     function     in saturation
(3) vDS − (vGS − Vth) ≥ V*satmin
(4) W ≥ W*min                         geometry     manufacturing
(5) L ≥ L*min                         robustness   variations affect iDS
(6) W · L ≥ A*min

* technology-specific value
10.2 Transistor pair: current mirror (NMOS)

Figure 53. NMOS current mirror: transistors T1 (current i1) and T2 (current i2) with common gate-source voltage vGS.

function: I2 = x · I1

assumptions: K1 = K2; λ1 = λ2 = 0; Vth1 = Vth2; L1 = L2; T1, T2 in saturation

x = I2/I1 = (W2/W1) · ((vGS − Vth2)²/(vGS − Vth1)²) · ((1 + λ2 · vDS2)/(1 + λ1 · vDS1)) (401)

10.2.1 Sizing rules for current mirror

in addition to sec. 10.1.1, (1) – (6):

                                      type         origin
(7) L1 = L2                           geometric    current
(8) x = W2/W1                         function     ratio
(9) |vDS1 − vDS2| ≤ ∆V*DSmax          electrical   influence of channel
                                      function     length modulation
(10) vGS − Vth1,2 ≥ V*GSmin           electrical   influence of local variations
                                      robustness   in threshold voltages

* technology-specific value
Appendix A Matrix and vector notations

A.1 Vector

b = [b1; b2; . . . ; bm],  bi ∈ R, b ∈ R^m, b<m> (402)

b: column vector

b^T = [b1 b2 · · · bm]: transposed vector, row vector (403)

A.2 Matrix

A = [ a11 a12 · · · a1n
      a21 a22 · · · a2n
      ...  ...  . . . ...
      am1 am2 · · · amn ],  aij ∈ R, A ∈ R^{m×n}, A<m×n> (404)

i: row index, j: column index

partition into columns: A = [a1 a2 · · · an] (405)

partition into rows: A = [a1^T; a2^T; . . . ; am^T] (406)

transposed matrix A^T<n×m>: the columns of A become the rows of A^T, and the rows of A become the columns of A^T,

A^T = [ a11 a21 · · · am1
        a12 a22 · · · am2
        ...  ...  . . . ...
        a1n a2n · · · amn ] (407)
A.3 Addition

A<m×n> + B<m×n> = C<m×n> (408)

cij = aij + bij (409)

A.4 Multiplication

A<m×n> · B<n×p> = C<m×p> (410)

cij = Σ_{k=1}^{n} aik · bkj (411)

cij = ai^T · bj :  i-th row of A times j-th column of B in (410) (412)
A.5 Special cases

a = A<m×1> (413)

a^T = A<1×m> (414)

a = A<1×1> = a<1> = a^T<1> (415)

a^T<m> · b<m> = c : scalar product (416)

a<m> · b^T<n> = C<m×n> : dyadic product (417)

A<m×n> · b<n> = c<m> (418)

b^T<m> · A<m×n> = c^T<n> (419)

b^T<m> · A<m×n> · c<n> = d (420)

identity matrix: I<m×m> = diag(1, . . . , 1) (421)

diagonal matrix: diag(a<m>) = matrix<m×m> with a1, a2, . . . , am on the diagonal, zeros elsewhere (422)
A.6 Determinant of a quadratic matrix

det(A<m×m>) = |A| = Σ_{j=1}^{m} aij · αij, i ∈ {1, . . . , m} = Σ_{i=1}^{m} aij · αij, j ∈ {1, . . . , m} (423)

adjugate matrix adj(A): transposed matrix of the cofactors αij,

adj(A) = [· · · αij · · ·]^T (424)

cofactor: αij = (−1)^(i+j) · (minor determinant of A, i.e., determinant with row i and column j deleted)

A.7 Inverse of a quadratic non-singular matrix

A⁻¹<m×m> · A<m×m> = A · A⁻¹ = I (425)

A⁻¹ = (1/det(A)) · adj(A) (426)
A.8 Some properties

(A · B) · C = A · (B · C) (427)

A · (B + C) = A · B + A · C (428)

(A · B)^T = B^T · A^T (429)

(A^T)^T = A (430)

(A + B)^T = A^T + B^T (431)

A symmetric ⇐⇒ A = A^T (432)

A positive (semi)definite ⇐⇒ ∀ x ≠ 0 :  x^T · A · x > (≥) 0 (433)

A negative (semi)definite ⇐⇒ ∀ x ≠ 0 :  x^T · A · x < (≤) 0 (434)

I^T = I (435)

A · I = I · A = A (436)

A · adj(A) = det(A) · I = adj(A) · A (437)

det(A^T) = det(A) (438)

|a1 · · · b · aj · · · am| = b · det(A) (439)

det(b · A^T) = b^m · det(A) (440)

|a1 · · · aj^(1) · · · am| + |a1 · · · aj^(2) · · · am| = |a1 · · · aj^(1) + aj^(2) · · · am| (441)

|a2 a1 a3 · · · am| = −det(A) (442)

aj = ak (A rank deficient) :  det(A) = 0 (443)

|a1 · · · aj + b · ak · · · am| = det(A) (444)

det(A · B) = det(A) · det(B) (445)

(A · B)⁻¹ = B⁻¹ · A⁻¹ (446)

(A^T)⁻¹ = (A⁻¹)^T (447)

(A⁻¹)⁻¹ = A (448)

det(A⁻¹) = 1/det(A) (449)

A⁻¹ = A^T ⇐⇒ A orthogonal (450)
Appendix B Abbreviated notations of derivatives using the nabla symbol

Gradient vector: vector of first partial derivatives

∇f(xs)<nxs> = [∂f/∂xs,1; ∂f/∂xs,2; . . . ; ∂f/∂xs,nxs] evaluated at x = [xd; xs; xr] = ∂f/∂xs |x (451)

∇f(xs*) = ∂f/∂xs evaluated at [xd; xs*; xr] (452)
Hessian matrix: matrix of second partial derivatives

∇²f(xs)<nxs×nxs> = [ ∂²f/∂xs,1²          ∂²f/(∂xs,1·∂xs,2)   · · ·  ∂²f/(∂xs,1·∂xs,nxs)
                     ∂²f/(∂xs,2·∂xs,1)   ∂²f/∂xs,2²          · · ·  ∂²f/(∂xs,2·∂xs,nxs)
                     ...                 ...                 . . .  ...
                     ∂²f/(∂xs,nxs·∂xs,1) ∂²f/(∂xs,nxs·∂xs,2) · · ·  ∂²f/∂xs,nxs² ]
                   evaluated at x = [xd; xs; xr] = ∂²f/(∂xs · ∂xs^T) |x (453)

∇²f(xs*) = ∂²f/(∂xs · ∂xs^T) evaluated at [xd; xs*; xr] (454)

∇²f(xd*, xs*)<(nxd+nxs)×(nxd+nxs)> = [ ∂²f/(∂xd · ∂xd^T)  ∂²f/(∂xd · ∂xs^T)
                                       ∂²f/(∂xs · ∂xd^T)  ∂²f/(∂xs · ∂xs^T) ]
                                     evaluated at x = [xd*; xs*; xr] (455)

                                   = [ ∇²f(xd)            ∂²f/(∂xd · ∂xs^T)
                                       ∂²f/(∂xs · ∂xd^T)  ∇²f(xs) ] |x (456)
Jacobian matrix

∇f(xs*^T)<nf×nxs> = ∂f/∂xs^T evaluated at [xd; xs*; xr] (457)

= [ ∂f1/∂xs,1    ∂f1/∂xs,2    · · ·  ∂f1/∂xs,nxs
    ∂f2/∂xs,1    ∂f2/∂xs,2    · · ·  ∂f2/∂xs,nxs
    ...          ...          . . .  ...
    ∂fnf/∂xs,1   ∂fnf/∂xs,2   · · ·  ∂fnf/∂xs,nxs ] evaluated at [xd; xs*; xr] (458)
Appendix C Norms

Quadratic model

f(x) = f(x0) + g^T · (x − x0) + (1/2) · (x − x0)^T · H · (x − x0) (459)

     = f0 + Σ_{i=1}^{nx} gi · (xi − x0,i) + (1/2) · Σ_{i=1}^{nx} Σ_{j=1}^{nx} hij · (xi − x0,i) · (xj − x0,j) (460)

H: symmetric

nx = 2 :  f(x1, x2) = f0 + g1 · (x1 − x0,1) + g2 · (x2 − x0,2)
                      + (1/2) · h11 · (x1 − x0,1)² + h12 · (x1 − x0,1) · (x2 − x0,2) + (1/2) · h22 · (x2 − x0,2)² (461)
Vector norm

l1-norm:  ‖x‖1 = Σ_{i=1}^{nx} |xi| (462)

l2-norm:  ‖x‖2 = √(Σ_{i=1}^{nx} xi²) = √(x^T · x) (463)

l∞-norm:  ‖x‖∞ = max_i |xi| (464)

lp-norm:  ‖x‖p = (Σ_{i=1}^{nx} |xi|^p)^(1/p) (465)

Matrix norm

max norm:       ‖A‖M = nx · max_{i,j} |aij| (467)

row norm:       ‖A‖Z = max_i Σ_j |aij| (468)

column norm:    ‖A‖S = max_j Σ_i |aij| (469)

Euclidean norm: ‖A‖E = √(Σ_i Σ_j |aij|²) (470)

spectral norm:  ‖A‖λ = √(λmax(A^T · A)) (471)
Appendix D Pseudo-inverse, singular value decomposition (SVD)

D.1 Moore-Penrose conditions

For every matrix A<m×n> there exists a unique matrix A+<n×m> that satisfies the following conditions:

A · A+ · A = A,        A · A+ maps each column of A to itself (472)

A+ · A · A+ = A+,      A+ · A maps each column of A+ to itself (473)

(A · A+)^T = A · A+,   A · A+ is symmetric (474)

(A+ · A)^T = A+ · A,   A+ · A is symmetric (475)
D.2 Singular value decomposition

Every matrix A<m×n> with rank(A) = r ≤ min(m, n) has a singular value decomposition

A<m×n> = V<m×m> · Ā<m×n> · U^T<n×n> (476)

V: matrix of the m left singular vectors (eigenvectors of A · A^T)

U: matrix of the n right singular vectors (eigenvectors of A^T · A)

U, V orthogonal: U⁻¹ = U^T, V⁻¹ = V^T

Ā = [ D<r×r>  0
      0       0 ]<m×n>,  r = rank(A) (477)

D = diag(d1, · · · , dr),  d1, . . . , dr: singular values (478)

A+<n×m> = U<n×n> · Ā+<n×m> · V^T<m×m> (479)

Ā+ = [ D⁻¹<r×r>  0
       0         0 ]<n×m> (480)

D⁻¹ = diag(1/d1, . . . , 1/dr) (481)

singular values: positive roots of the eigenvalues of A^T · A

columns of V: basis for R^m
columns 1, . . . , r of V: basis for the column space of A
columns (r + 1), . . . , m of V: basis for the kernel/null-space of A^T

columns of U: basis for R^n
columns 1, . . . , r of U: basis for the row space of A
columns (r + 1), . . . , n of U: basis for the kernel/null-space of A

Ā · Ā+ = [ I<r×r>  0
           0       0 ]<m×m>,        Ā+ · Ā = [ I<r×r>  0
                                               0       0 ]<n×n>

A · A+ = V · [ I<r×r>  0
               0       0 ]<m×m> · V^T,        A+ · A = U · [ I<r×r>  0
                                                             0       0 ]<n×n> · U^T
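A numpy sketch of (479)–(481); numpy's svd returns the factors in the order (left, singular values, right-transposed), renamed below to match the V, D, U notation, and the matrix A is example data:

```python
# Sketch: pseudo-inverse via SVD, eqs. (479)-(481).
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])                 # rank-1 example
V, d, Uh = np.linalg.svd(A)                     # A = V @ diag-block(d) @ Uh
r = int(np.sum(d > 1e-12 * d[0]))               # numerical rank
d_plus = np.zeros((A.shape[1], A.shape[0]))
d_plus[:r, :r] = np.diag(1.0 / d[:r])           # D^-1 block, (481)
A_plus = Uh.T @ d_plus @ V.T                    # (479)
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```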
Appendix E Linear equation system, rectangular system matrix with full rank

A<m×n> · x<n> = c<m> (482)

rank(A) = min(m, n)

E.1 m < n, rank(A) = m, underdetermined system of equations

A · A^T is invertible

A · x = c  →  A · A^T · w′ = c with x = A^T · w′  →  w′ = (A · A^T)⁻¹ · c

x = A^T · (A · A^T)⁻¹ · c = A+ · c :  minimum-length solution

Solving (482) by SVD:

A = V · [D<m×m> 0]<m×n> · U^T

V · [D 0] · U^T · x = c  with  w″ = U^T · x = [w″a<m>; w″b]<n>

[D 0] · w″ = V^T · c  →  D · w″a = V^T · c  →  w″a by element-wise division

x = U · [w″a; 0]

Solving (482) by QR-decomposition:

A^T<n×m> = [Q<n×m> Q⊥<n×(n−m)>] · [R<m×m>; 0<(n−m)×m>],  [Q Q⊥] orthogonal, i.e., Q^T · Q⊥ = 0 (483)

A+ = A^T · (A · A^T)⁻¹ = Q · R^(−T)

R^T · w‴ = c with w‴ = Q^T · x  →  forward substitution → w‴  →  x = Q · w‴
E.2 m > n, rank(A) = n, overdetermined system of equations

A^T · A is invertible

A · x = c  →  A^T · A · x = A^T · c

x = (A^T · A)⁻¹ · A^T · c = A+ · c :  least-squares solution

Solving (482) by SVD:

A = V · [D<n×n>; 0]<m×n> · U^T

V · [D; 0] · U^T · x = c  with  w″ = U^T · x

[D; 0] · w″ = V^T · c = [va<n>; vb]<m>

D · w″ = va  →  element-wise division → w″  →  x = U · w″

Solving (482) by QR-decomposition:

A<m×n> = [Q<m×n> Q⊥]<m×m> · [R<n×n>; 0]<m×n> = Q · R,  [Q Q⊥] orthogonal

A+ = (A^T · A)⁻¹ · A^T = R⁻¹ · Q^T

Q · R · x = c ⇔ R · x = Q^T · c  →  backward substitution → x
E.3 m = n, rank(A) = m = n, determined system of equations

A is invertible

A+ = A⁻¹

Solving (482) by SVD:

A = V · D · U^T

D · w″ = V^T · c with w″ = U^T · x  →  element-wise division → w″  →  x = U · w″

Solving (482) by QR-decomposition:

A = Q · R,  A+ = R⁻¹ · Q^T

Q · R · x = c ⇔ R · x = Q^T · c  →  backward substitution → x

Solving (482) by LU-decomposition:

A = L · U,  L: lower left triangular matrix, U: upper right triangular matrix

L · w⁗ = c with w⁗ = U · x  →  forward substitution → w⁗

U · x = w⁗  →  backward substitution → x

A orthogonal: A^T · A = A · A^T = I,  A+ = A⁻¹ = A^T

A diagonal: A+ii = 1/Aii  (if rank deficient, i.e., ∃i Aii = 0, then A+ii = 0)
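The three full-rank cases in numpy form; a sketch with small example matrices:

```python
# Sketch: the three cases of (482) solved numerically.
import numpy as np

A_under = np.array([[1.0, 2.0, 0.0]])            # m < n: minimum-length solution
c1 = np.array([5.0])
x1 = A_under.T @ np.linalg.solve(A_under @ A_under.T, c1)

A_over = np.array([[1.0], [1.0], [1.0]])         # m > n: least-squares solution
c2 = np.array([1.0, 2.0, 4.0])
x2, *_ = np.linalg.lstsq(A_over, c2, rcond=None)

A_sq = np.array([[2.0, 1.0], [1.0, 3.0]])        # m = n: direct solve (LU inside)
x3 = np.linalg.solve(A_sq, np.array([3.0, 5.0]))
print(x1, x2, x3)
```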
Appendix F Partial derivatives of linear, quadratic terms in matrix/vector notation

                          ∂/∂x                     ∂/∂x^T
a^T · x                   a                        a^T
x^T · a                   a                        a^T
x^T<n> · A<n×n> · x<n>    A · x + A^T · x          x^T · A^T + x^T · A
                          (A = A^T: 2 · A · x)     (A = A^T: 2 · x^T · A^T)
x^T · x                   2 · x                    2 · x^T

A<m×n> · x<n> = a1 · x1 + a2 · x2 + . . . + an · xn = [a1^T · x; a2^T · x; . . . ; am^T · x],
with column partition A = [a1 a2 · · · an] and row partition A = [a1^T; a2^T; . . . ; am^T]

x^T<n> · A^T<n×m> = (A · x)^T = [a1^T · x  a2^T · x  · · ·  am^T · x]
stacking conventions (⊗ denotes a scalar entry of the differentiated expression):

derivative of a column<m> with respect to a column<n>: entries ∂⊗/∂[ ]<n> stacked into a column<mn> (484)

derivative of a column<m> with respect to a row<n>: matrix<m×n> (485)

derivative of a row<m> with respect to a column<n>: matrix<n×m> (486)

derivative of a row<m> with respect to a row<n>: row<mn> (487)
Appendix G Probability space

event B: e.g., performance value f is less than or equal to fU, i.e., B = {f | f ≤ fU}

probability P: e.g., P(B) = P({f | f ≤ fU}) = cdff(fU) = ∫_{−∞}^{fU} pdff(f) · df

event B ⊂ Ω: subset of the sample set Ω of possible outcomes of a probabilistic experiment

elementary event: event with one element (if event sets are countable)

event set 𝔅: system of subsets of the sample set, B ∈ 𝔅

definitions:

Ω ∈ 𝔅 (488)

B ∈ 𝔅 ⇒ B̄ ∈ 𝔅 (489)

Bi ∈ 𝔅, i ∈ N ⇒ ∪i Bi ∈ 𝔅 (490)

event set 𝔅: "σ-algebra"

Kolmogorov axioms:

P(B) ≥ 0 for all B ∈ 𝔅 (491)

P(Ω) = 1,  "Ω: certain event" (492)

Bi ∩ Bj = ∅, Bi, Bj ∈ 𝔅 ⇒ P(Bi ∪ Bj) = P(Bi) + P(Bj) (493)

corollaries:

P(∅) = 0,  "∅: impossible event" (494)

Bi ⊆ Bj ⇒ P(Bi) ≤ P(Bj) (495)

P(B̄) = 1 − P(B) (496)

0 ≤ P(B) ≤ 1 (497)

∫_{−∞}^{+∞} · · · ∫_{−∞}^{+∞} pdf(t) · dt = 1 (498)
Appendix H Convexity

H.1 Convex set K ⊆ R^n

∀ p0, p1 ∈ K, a ∈ [0, 1] :  pa ≜ (1 − a) · p0 + a · p1 ∈ K (499)

Figure 54. (a) convex set, (b) non-convex set.

H.2 Convex function

∇²f is positive definite

∀ p0, p1 ∈ K, a ∈ [0, 1], pa = (1 − a) · p0 + a · p1 :

f(pa) ≤ (1 − a) · f(p0) + a · f(p1) (500)

Figure 55. Convex function.

c(p) is a concave function ⇐⇒ {p | c(p) ≥ c0} is a convex set

convex optimization problem: local minima are global minima

strict convexity: unique global solution