Source: stanford.edu/~yinbin/TEACHING/Optimization.pdf
Basic concepts in Linear Algebra and Optimization
Yinbin Ma
GEOPHYS 211
Outline
Basic Concepts on Linear Algebra
  - vector space
  - norm
  - linear mapping, range, null space
  - matrix multiplication
Iterative Methods for Linear Optimization
  - normal equation
  - steepest descent
  - conjugate gradient
Unconstrained Nonlinear Optimization
  - optimality condition
  - methods based on a local quadratic model
  - line search methods
Basic concepts - vector space
A vector space is any set V for which two operations are defined:
1) Vector addition: any vectors x_1 and x_2 in V can be added to form x = x_1 + x_2, and x is also in V.
2) Scalar multiplication: any vector x in V can be multiplied ("scaled") by a real number c ∈ R to produce a vector cx, which is also in V.

In this class, we only discuss the case V ⊂ R^n, meaning each vector x in the space is an n-dimensional column vector.
Basic concepts - norm
The "model space" and "data space" we mentioned in class are normed vector spaces. A norm is a function ‖·‖ : R^n → R that maps a vector to a real number. A norm must satisfy the following:
1) ‖x‖ ≥ 0, and ‖x‖ = 0 if and only if x = 0
2) ‖x + y‖ ≤ ‖x‖ + ‖y‖
3) ‖ax‖ = |a| ‖x‖
where x and y are vectors in the vector space V and a ∈ R.
Basic concepts - norm

We will see the following norms in this course:
1) L2 norm: for a vector x, the L2 norm is defined as
\[ \|x\|_2 \equiv \sqrt{\sum_{i=1}^{n} x_i^2} \]
2) L1 norm: for a vector x, the L1 norm is defined as
\[ \|x\|_1 \equiv \sum_{i=1}^{n} |x_i| \]
3) L∞ norm: for a vector x, the L∞ norm is defined as
\[ \|x\|_\infty \equiv \max_{i=1,\cdots,n} |x_i| \]
The norm of a matrix is induced as
\[ \|A\|_\alpha = \sup_{x \neq 0} \frac{\|Ax\|_\alpha}{\|x\|_\alpha} \]
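As a quick numerical illustration (a NumPy sketch, not part of the slides), all three vector norms and the induced matrix 2-norm are available through numpy.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l2 = np.linalg.norm(x, 2)        # sqrt(3^2 + 4^2) = 5
l1 = np.linalg.norm(x, 1)        # |3| + |-4| + |0| = 7
linf = np.linalg.norm(x, np.inf) # max |x_i| = 4

print(l2, l1, linf)  # 5.0 7.0 4.0

# The induced matrix 2-norm equals the largest singular value of A
A = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(np.linalg.norm(A, 2), np.linalg.svd(A, compute_uv=False)[0])
```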
Basic concepts - linear mapping, range and null space

We say a map x → Ax is linear if for any x, y ∈ R^n and any a ∈ R,
\[ A(x + y) = Ax + Ay, \qquad A(ax) = aAx. \]
It can be proved that each linear mapping from R^n to R^m can be expressed as multiplication by an m×n matrix.
The range of a linear operator A ∈ R^{m×n} is the space spanned by the columns of A,
\[ \mathrm{range}(A) = \{\, y \mid y = Ax,\ x \in \mathbb{R}^n \,\}. \]
The null space of a linear operator A ∈ R^{m×n} is the space
\[ \mathrm{null}(A) = \{\, x \mid Ax = 0 \,\}. \]
It is "obvious" that range(A) is perpendicular to null(A^T). (exercise)
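The exercise can be checked numerically: an SVD gives an orthonormal basis for null(A^T), and any vector in range(A) is orthogonal to it (a sketch on an assumed random A):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))   # rank 2, so null(A^T) has dimension 2

# The columns of U beyond rank(A) form an orthonormal basis of null(A^T)
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
N = U[:, r:]                      # basis of null(A^T)
assert np.allclose(A.T @ N, 0)    # A^T N = 0, so N indeed spans null(A^T)

# Any vector in range(A) is orthogonal to null(A^T)
y = A @ rng.standard_normal(2)    # y = Ax lies in range(A)
assert np.allclose(N.T @ y, 0)
```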
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 1:
\[ b_{ij} = \sum_{k=1}^{m} a_{ik} c_{kj} \]
Here b_{ij}, a_{ik}, and c_{kj} are entries of B, A, and C.
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 2:
\[ B = [\, b_1 \mid b_2 \mid \cdots \mid b_n \,] \]
Here b_i is the i-th column of matrix B. Then,
\[ B = [\, Ac_1 \mid Ac_2 \mid \cdots \mid Ac_n \,], \qquad b_i = Ac_i. \]
Each column of B is in the range (we will talk about it later) of A. Thus, the range of B is a subset of the range of A.
Basic concepts - four ways of matrix multiplication

For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 3:
\[ B = \begin{bmatrix} b_1^T \\ b_2^T \\ \vdots \\ b_l^T \end{bmatrix} \]
Here b_i^T is the i-th row of matrix B. Then,
\[ B = \begin{bmatrix} a_1^T C \\ a_2^T C \\ \vdots \\ a_l^T C \end{bmatrix}, \qquad b_i^T = a_i^T C. \]
This form is not commonly used.
Basic concepts - four ways matrix multiplication
For the matrix-matrix product B = AC: if A is l×m and C is m×n, then B is l×n.
Matrix multiplication, method 4:
\[ B = \sum_{i=1}^{m} a_i c_i^T \]
where a_i is the i-th column of matrix A and c_i^T is the i-th row of matrix C. Each term a_i c_i^T is a rank-one matrix.
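The four formulations can be verified to agree on a small example (a NumPy sketch; the shapes l, m, n are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
l, m, n = 3, 4, 5
A = rng.standard_normal((l, m))
C = rng.standard_normal((m, n))
B = A @ C

# Method 1: entrywise, b_ij = sum_k a_ik c_kj
B1 = np.zeros((l, n))
for i in range(l):
    for j in range(n):
        B1[i, j] = sum(A[i, k] * C[k, j] for k in range(m))

# Method 2: column by column, b_j = A c_j
B2 = np.column_stack([A @ C[:, j] for j in range(n)])

# Method 3: row by row, b_i^T = a_i^T C
B3 = np.vstack([A[i, :] @ C for i in range(l)])

# Method 4: sum of rank-one outer products, B = sum_k a_k c_k^T
B4 = sum(np.outer(A[:, k], C[k, :]) for k in range(m))

assert np.allclose(B1, B) and np.allclose(B2, B)
assert np.allclose(B3, B) and np.allclose(B4, B)
```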
Outline
Basic Concepts on Linear AlgbraI vector spaceI normI linear mapping, range, null spaceI matrix multiplication
Iterative Methods for Linear OptimizationI normal equationI steepest descentI conjugate gradient
Unconstrainted Nonlinear OptimizationI Optimality conditionI Search directionI Line search
Linear Optimization - normal equation

We solve a linear system with n unknowns and m > n equations. We want to find a vector m ∈ R^n that satisfies
\[ Fm = d \]
where d ∈ R^m and F ∈ R^{m×n}.
Reformulate the problem: define the residual r = d − Fm, and find the m that minimizes ‖r‖_2 = ‖Fm − d‖_2.
It can be proved that the residual norm is minimized when F^* r = 0. This is equivalent to an n×n system,
\[ F^* F m = F^* d, \]
which is the normal equation. We can solve the normal equation using direct methods such as LU, QR, SVD, Cholesky decomposition, etc.
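A sketch of solving the normal equation via Cholesky factorization of F^*F, compared against NumPy's SVD-based least-squares solver (the random F, d are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((10, 3))   # m = 10 equations, n = 3 unknowns
d = rng.standard_normal(10)

# Normal equation (F^* F) m = F^* d, solved here via Cholesky: G = L L^T
G = F.T @ F
L = np.linalg.cholesky(G)
y = np.linalg.solve(L, F.T @ d)
m_normal = np.linalg.solve(L.T, y)

# Reference solution from NumPy's least-squares solver
m_lstsq, *_ = np.linalg.lstsq(F, d, rcond=None)
assert np.allclose(m_normal, m_lstsq)

# At the minimizer the residual satisfies F^* r = 0
r = d - F @ m_normal
assert np.allclose(F.T @ r, 0)
```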
Linear Optimization - steepest descent method

For the unconstrained linear optimization problem:
\[ \min\ J(m) = \|Fm - d\|_2^2 \]
To find the minimum of the objective function J(m) iteratively with the steepest descent method, at the current point m_k we update the model by moving along the negative direction of the gradient,
\[ m_{k+1} = m_k - \alpha \nabla J(m_k), \qquad \nabla J(m_k) = F^* (F m_k - d). \]
The gradient can be evaluated exactly, and we have an analytical formula for the optimal α.
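A minimal NumPy sketch of steepest descent with the exact step length: for this quadratic objective, minimizing J(m_k − α g) over α gives α = (gᵀg)/‖Fg‖², with g = ∇J(m_k). The random F and d here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.standard_normal((10, 3))
d = rng.standard_normal(10)

m = np.zeros(3)
for _ in range(2000):
    g = F.T @ (F @ m - d)           # gradient direction (up to a constant factor)
    if np.linalg.norm(g) < 1e-12:
        break
    Fg = F @ g
    alpha = (g @ g) / (Fg @ Fg)     # exact minimizer of J(m - alpha * g)
    m = m - alpha * g

m_star, *_ = np.linalg.lstsq(F, d, rcond=None)
assert np.allclose(m, m_star, atol=1e-6)
```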
Linear Optimization - conjugate gradient method

For the unconstrained linear optimization problem:
\[ \min\ J(m) = \|Fm - d\|_2^2 \]
Starting from m_0, we have a series of search directions Δm_i, i = 1, 2, ⋯, k, and update the model iteratively,
\[ m_i = m_{i-1} - \alpha_{i-1} \Delta m_{i-1}, \quad i = 1, \ldots, k. \]
We look for the next search direction Δm_k in the space span{Δm_0, ⋯, Δm_{k−1}, ∇J(m_k)},
\[ \Delta m_k = \sum_{i=0}^{k-1} c_i \Delta m_i + c_k \nabla J(m_k). \]
The "magic" is that for a linear problem c_0 = c_1 = ⋯ = c_{k−2} = 0. We end up with the conjugate gradient method,
\[ \Delta m_k = c_{k-1} \Delta m_{k-1} + c_k \nabla J(m_k) \]
\[ \alpha_k = \arg\min_\alpha J(m_k + \alpha \Delta m_k) \]
\[ m_{k+1} = m_k + \alpha_k \Delta m_k \]
In the CG method we are searching within the space span{Δm_0, ⋯, Δm_{k−1}, ∇J(m_k)}, though it looks like we are doing a plane search.
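A sketch of this idea applied to the normal equations F^*Fm = F^*d, i.e. CG on the least-squares problem (often called CGNR); the β coefficient plays the role of the c_{k−1} above, and the random test problem is an assumed example:

```python
import numpy as np

def cgnr(F, d, iters=50):
    """Conjugate gradient on the normal equations F^T F m = F^T d."""
    m = np.zeros(F.shape[1])
    r = F.T @ (d - F @ m)          # negative gradient (up to a constant factor)
    p = r.copy()                   # first search direction
    for _ in range(iters):
        Fp = F @ p
        alpha = (r @ r) / (Fp @ Fp)          # exact line search along p
        m = m + alpha * p
        r_new = r - alpha * (F.T @ Fp)
        if np.linalg.norm(r_new) < 1e-12:
            break
        beta = (r_new @ r_new) / (r @ r)     # mixes old direction with new gradient
        p = r_new + beta * p
        r = r_new
    return m

rng = np.random.default_rng(4)
F = rng.standard_normal((10, 3))
d = rng.standard_normal(10)
m_star, *_ = np.linalg.lstsq(F, d, rcond=None)
assert np.allclose(cgnr(F, d), m_star, atol=1e-8)
```

In exact arithmetic this converges in at most n iterations (n = 3 here), which is the payoff of searching the full span rather than just the gradient direction.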
Unconstrained Nonlinear Optimization - Optimality condition

For the unconstrained nonlinear optimization problem:
\[ \min_m J(m) \]
where J(m) is a real-valued function, how should we determine whether m^* is a local minimizer?
Theorem (first-order necessary condition for a local minimum):
\[ \nabla J(m^*) = 0 \]
Theorem (second-order necessary condition for a local minimum):
\[ s^T \nabla^2 J(m^*) s \ge 0, \quad \forall s \in \mathbb{R}^n \]
Unconstrained Nonlinear Optimization - Search direction

For the unconstrained nonlinear optimization problem:
\[ \min_m J(m) \]
Given a model point m_k, we want to find a search direction Δm_k and a real number α_k such that J(m_k + α_k Δm_k) < J(m_k).
How do we choose the search direction Δm_k?
1) Gradient-based methods:
\[ J(m_k + \alpha_k \Delta m_k) - J(m_k) \approx \alpha_k \nabla J(m_k)^T \Delta m_k + O(\|\Delta m_k\|_2^2) \]
Thus,
\[ \Delta m_k = -\nabla J(m_k) \]
is a search direction. We can also use a technique similar to the CG method,
\[ \Delta m_k = -c_1 \nabla J(m_k) + c_2 \Delta m_{k-1} \]
where c_1, c_2 ∈ R.
Unconstrained Nonlinear Optimization - Search direction

For the unconstrained nonlinear optimization problem:
\[ \min_m J(m) \]
Given a model point m_k, we want to find a search direction Δm_k and a real number α_k such that J(m_k + α_k Δm_k) < J(m_k).
How do we choose the search direction Δm_k?
2) Methods based on a local quadratic model:
\[ J(m_k + \alpha_k \Delta m_k) - J(m_k) \approx \alpha_k \nabla J(m_k)^T \Delta m_k + \frac{\alpha_k^2}{2} \Delta m_k^T \nabla^2 J(m_k) \Delta m_k \]
We solve the approximated problem,
\[ \min_{p_k}\ \psi(p_k) \equiv \nabla J(m_k)^T p_k + \frac{1}{2} p_k^T \nabla^2 J(m_k) p_k, \qquad p_k = \alpha_k \Delta m_k. \]
The approximated problem reduces to the linear system ∇²J(m_k) p_k = −∇J(m_k) and can be solved exactly. Then, update the model,
\[ m_{k+1} = m_k + p_k. \]
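A minimal sketch of the resulting Newton iteration, on an assumed strictly convex test function J(m) = (m_0 − 1)² + m_0⁴ + 10(m_1 + 2)² (not from the slides); each step solves the linear system ∇²J(m_k) p_k = −∇J(m_k):

```python
import numpy as np

# Gradient and Hessian of the assumed J(m) = (m0 - 1)^2 + m0^4 + 10 (m1 + 2)^2
def grad(m):
    return np.array([2 * (m[0] - 1) + 4 * m[0]**3, 20 * (m[1] + 2)])

def hess(m):
    return np.array([[2 + 12 * m[0]**2, 0.0], [0.0, 20.0]])

m = np.array([0.0, 0.0])
for _ in range(50):
    p = np.linalg.solve(hess(m), -grad(m))  # Newton step: (grad^2 J) p = -grad J
    m = m + p
    if np.linalg.norm(grad(m)) < 1e-12:
        break

# First-order condition holds at the limit: 2(m0 - 1) + 4 m0^3 = 0 and m1 = -2
assert np.allclose(grad(m), 0, atol=1e-10)
assert np.isclose(m[1], -2.0)
```

Note that the quadratic coordinate m_1 is solved exactly in one Newton step, while the quartic coordinate takes a few iterations.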
Unconstrained Nonlinear Optimization - Line search

For the unconstrained nonlinear optimization problem:
\[ \min_m J(m) \]
Given a model point m_k, we want to find a search direction Δm_k and a real number α_k such that J(m_k + α_k Δm_k) < J(m_k).
How do we choose α_k for a given search direction Δm_k? Can we choose an arbitrary α_k such that J(m_k + α_k Δm_k) < J(m_k)?
The answer is no. For example, take J(m) = m², m ∈ R¹. We can find a sequence such that
\[ m_0 = 2, \qquad \Delta m_k = -m_k, \qquad \alpha_k = \frac{2 + 3 \cdot 2^{-(k+1)}}{1 + 2^{-k}}. \]
Then,
\[ m_k = (-1)^k (1 + 2^{-k}), \qquad J(m_k) = (1 + 2^{-k})^2 \to 1. \]
J decreases at every step, yet the iterates converge to J = 1 rather than to the minimum J = 0.
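The counterexample can be checked numerically (a short sketch; the sequence is exactly the one above):

```python
import numpy as np

# Failing line search on J(m) = m^2: each step decreases J, yet J -> 1, not 0
m = 2.0
J_prev = m**2
for k in range(40):
    dm = -m                                    # search direction Delta m_k = -m_k
    alpha = (2 + 3 * 2**-(k + 1)) / (1 + 2**-k)
    m = m + alpha * dm
    J = m**2
    assert J < J_prev                          # J decreases at every step...
    J_prev = J

assert np.isclose(abs(m), 1 + 2**-40)          # m_k = (-1)^k (1 + 2^-k)
assert J_prev > 1.0                            # ...but J converges to 1, not 0
```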
Unconstrained Nonlinear Optimization - Line search

For the unconstrained nonlinear optimization problem:
\[ \min_m J(m) \]
Given a model point m_k, we want to find a search direction Δm_k and a real number α_k such that J(m_k + α_k Δm_k) < J(m_k).
How do we choose α_k for a given search direction Δm_k? A popular set of conditions that guarantees convergence is the Wolfe conditions:
\[ J(m_k + \alpha_k \Delta m_k) \le J(m_k) + c_1 \alpha_k \nabla J(m_k)^T \Delta m_k \]
\[ \nabla J(m_k + \alpha_k \Delta m_k)^T \Delta m_k \ge c_2 \nabla J(m_k)^T \Delta m_k \]
where 0 < c_1 < c_2 < 1.
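A small sketch that checks the two Wolfe conditions for a candidate step length (the quadratic J and the constants c_1 = 10⁻⁴, c_2 = 0.9 are illustrative choices, the latter being common defaults):

```python
import numpy as np

def satisfies_wolfe(J, gradJ, m, dm, alpha, c1=1e-4, c2=0.9):
    """Check the sufficient-decrease and curvature (Wolfe) conditions."""
    g0 = gradJ(m) @ dm                       # directional derivative at m
    sufficient_decrease = J(m + alpha * dm) <= J(m) + c1 * alpha * g0
    curvature = gradJ(m + alpha * dm) @ dm >= c2 * g0
    return sufficient_decrease and curvature

# Example on J(m) = m^T m with the steepest descent direction
J = lambda m: m @ m
gradJ = lambda m: 2 * m
m = np.array([2.0, -1.0])
dm = -gradJ(m)

# alpha = 0.5 lands exactly on the minimizer: both conditions hold
assert satisfies_wolfe(J, gradJ, m, dm, 0.5)
# a tiny step makes too little progress: the curvature condition fails
assert not satisfies_wolfe(J, gradJ, m, dm, 1e-6)
```

The curvature condition is what rules out the vanishing steps of the previous slide's counterexample; the sufficient-decrease condition rules out overly long steps.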
Reference
- Numerical Linear Algebra, by Lloyd N. Trefethen and David Bau III.
- Numerical Optimization, by Jorge Nocedal and Stephen J. Wright.
- Lecture notes from Prof. Walter Murray: http://web.stanford.edu/class/cme304/