Introduction to optimization

Olga Galinina (olga.galinina@tut.fi), TG412
ELT-53656 Network Analysis and Dimensioning II
Tampere University of Technology, Tampere, Finland
February 10, 2016


Outline

1. Introduction
2. Mathematical optimization
3. Appendix


Content to be discussed during the next 5 weeks...

This is not an Optimization Theory course... and we do not pretend it is.

Our goal is to (re)visit the basics of optimization methods, to broaden the horizons of good engineers:

Basics of mathematical optimization (today)

Linear programming

Convex programming

Mixed-integer and integer programming

Shortest-path algorithms (which came from graph theory)


History...

Leonhard Euler (1707-1783): "nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear"


Where Do Optimization Problems Arise?

Methods of optimization deal with finding the optimum solution for a mathematical model; in most cases, this means finding maxima and minima of functions subject to some constraints.

Manufacturing and transportation systems

Scheduling of goods for manufacturing

Transportation of goods over transportation networks

Scheduling of fleets of airplanes

Communication systems

Design of communication systems

Flow of information across networks

Financial systems, and much more...


Examples

Wireless network optimization

variables: power for wireless, investments in infrastructure

constraints: budget, max/min Tx power, distance, rate...

objective: system/user throughput, energy efficiency, etc.

Data fitting

variables: model parameters

constraints: prior information, parameter limits

objective: measure of prediction error


Mathematical Model

Consider a (mathematical) optimization problem:

minimize f(x), x ∈ R^n
subject to x ∈ Ω ⊂ R^n.

Definition

The function f(x) : R^n → R is a real-valued function, called the objective function, or cost function.

Definition

The variables x = [x1, ..., xn] are the optimization variables.

Definition

An optimal point x0 satisfies f(x0) ≤ f(x), ∀x ∈ Ω.
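As a minimal executable sketch of the model above (the quadratic objective and its minimizer are assumptions for illustration, and `scipy.optimize.minimize` stands in for a generic solver):

```python
# Unconstrained case (Omega = R^n): minimize an assumed objective over R^2.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # illustrative smooth objective with a unique minimizer at (1, 2)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# note: scipy's x0 argument is the starting point, not the optimal point
result = minimize(f, x0=np.zeros(2))
print(result.x)  # approaches the optimal point [1, 2]
```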


Constraint Set

Definition

The set Ω ⊂ R^n is the constraint set, or feasible set/region.

Definition

The above problem is the general form of a constrained optimization problem. If Ω = R^n, we refer to the problem as unconstrained.

Ω typically takes the form Ω = {x : hi(x) = 0, gj(x) ≤ 0},

where hi(x), gj(x) are the constraint functions.

Definition

If Ω = ∅, the problem is infeasible; otherwise it is feasible.
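Feasibility is easy to illustrate numerically; the particular h and g below are assumed examples, not taken from the slides.

```python
# Feasibility check for an assumed set Omega = {x : h(x) = 0, g(x) <= 0}.
import numpy as np

def h(x):
    # equality constraint: points must lie on the line x1 + x2 = 1
    return x[0] + x[1] - 1.0

def g(x):
    # inequality constraint: require x1 >= 0
    return -x[0]

def is_feasible(x, tol=1e-9):
    return abs(h(x)) <= tol and g(x) <= tol

print(is_feasible(np.array([0.3, 0.7])))   # True: on the line, x1 >= 0
print(is_feasible(np.array([-0.5, 1.5])))  # False: violates g(x) <= 0
```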


Classification

Problems:

Linear/nonlinear (defined by the form of f(x), hi(x), gj(x))

Convex/non-convex (defined by the properties of f(x) and Ω)

Discrete/continuous/integer/mixed-integer/...

One-dimensional/multi-dimensional

One extremum/multiple extrema

Methods:

Direct (zero-order) methods use information only about f(x)

allow analyzing functions defined algorithmically

First-order methods use information about f(x) and ∇f(x)

e.g., gradient methods

Second-order methods use f(x), ∇f(x), ∇²f(x)

e.g., Newton's method and its modifications
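A first-order method can be sketched in a few lines; the fixed step size, the iteration budget, and the quadratic test function are assumptions for illustration, not course material.

```python
# Gradient descent: x_{k+1} = x_k - step * grad f(x_k), using only the gradient.
import numpy as np

c = np.array([3.0, -1.0])       # assumed minimizer of f(x) = ||x - c||^2

def grad_f(x):
    return 2.0 * (x - c)        # gradient of the quadratic objective

x = np.zeros(2)
step = 0.1                      # fixed step size (assumed small enough here)
for _ in range(200):
    x = x - step * grad_f(x)

print(x)  # converges to the minimizer [3, -1]
```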


Solving Optimization Problems

General optimization problem

very difficult to solve

methods involve some compromise, e.g., very long computation time, or not always finding the solution

Exceptions: certain problem classes can be solved efficiently and reliably

least-squares problems

linear programming problems

convex optimization problems


Least-squares problem

Example: linear regression bi = ∑j aij xj + ε

minimize ‖Ax − b‖₂² = ∑i (∑j aij xj − bi)²

Solving least-squares problems

analytical solution: x0 = (AᵀA)⁻¹Aᵀb

reliable and efficient algorithms and software

computation time proportional to n²k (A ∈ R^{k×n}); less if structured

a mature technology

Using least-squares

least-squares problems are easy to recognize

a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
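The analytical solution above can be checked against a library solver; the random data here are purely illustrative.

```python
# Compare x0 = (A^T A)^{-1} A^T b with numpy's least-squares routine.
import numpy as np

rng = np.random.default_rng(0)
k, n = 20, 3                                  # A in R^{k x n}, k >= n
A = rng.standard_normal((k, n))
b = rng.standard_normal(k)

x_analytic = np.linalg.inv(A.T @ A) @ A.T @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_analytic, x_lstsq))  # True: both give the same minimizer
```

In practice `lstsq` (based on a QR or SVD factorization) is preferred over forming AᵀA explicitly, which can be numerically ill-conditioned.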


Linear programming

minimize cᵀx = ∑j cj xj
subject to aiᵀx ≤ bi, i = 1, ..., m

Solving linear programs

no analytical formula for the solution

reliable and efficient algorithms and software

computation time proportional to n²m if m ≥ n; less with structure

a mature technology

Using linear programming

easy to recognize, but sometimes not as easy as LS problems

a few standard tricks are used to convert problems into linear programs (e.g., problems involving ℓ1- or ℓ∞-norms, piecewise-linear functions)
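A toy linear program in the form above can be handed to `scipy.optimize.linprog`; the numbers are assumptions, and note that linprog additionally imposes x ≥ 0 by default.

```python
# Small LP: minimize c^T x subject to a_i^T x <= b_i (and x >= 0 by default).
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0])           # maximize x1 + 2*x2, written as a minimization
A_ub = np.array([[1.0, 1.0],         # x1 + x2 <= 4
                 [0.0, 1.0]])        # x2 <= 3
b_ub = np.array([4.0, 3.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x)  # optimal vertex [1, 3]
```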


Convex optimization problem

minimize f(x)
subject to gi(x) ≤ bi, i = 1, ..., m

objective and constraint functions are convex:

gi(α1x + α2y) ≤ α1gi(x) + α2gi(y),

if α1 + α2 = 1, α1 ≥ 0, α2 ≥ 0.

includes least-squares problems and linear programs as special cases
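A tiny convex instance of this form (the quadratic objective and the single linear constraint are assumed for illustration) can be solved with scipy's SLSQP method:

```python
# Convex problem: minimize ||x||^2 subject to x1 + x2 >= 2 (a convex feasible set).
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0] ** 2 + x[1] ** 2      # convex quadratic objective

# scipy's "ineq" convention is fun(x) >= 0, i.e. g(x) = 2 - x1 - x2 <= 0
cons = {"type": "ineq", "fun": lambda x: x[0] + x[1] - 2.0}

res = minimize(f, x0=np.array([5.0, 5.0]), method="SLSQP", constraints=cons)
print(res.x)  # closest point of the half-plane to the origin: [1, 1]
```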


Convex optimization problems

Solving convex optimization problems

no analytical solution

reliable and efficient algorithms

computation time (roughly) proportional to max{n³, n²m, F}, where F is the cost of evaluating f(x) and its first and second derivatives

almost a technology

Using convex optimization

often difficult to recognize (requires a proof of convexity)

many tricks for transforming problems into convex form

surprisingly many problems can be solved via convex optimization


Nonlinear optimization

Traditional techniques for general nonconvex problems involve compromises

Local optimization methods (nonlinear programming)

find a point that minimizes f among feasible points near it

fast, can handle large problems

require an initial guess

provide no information about the distance to the (global) optimum

Global optimization methods

find the (global) solution

worst-case complexity grows exponentially with problem size

These algorithms are often based on solving convex subproblems
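The initial-guess dependence of local methods can be seen on an assumed one-dimensional nonconvex function with two basins of attraction:

```python
# Two runs of a local method from different starting points on a nonconvex f.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # two local minima near x = -1 and x = +1; the tilt makes only the left global
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]

left = minimize(f, x0=np.array([-2.0]))
right = minimize(f, x0=np.array([2.0]))
print(left.x, left.fun)    # the global minimum, near x = -1
print(right.x, right.fun)  # typically a local, non-global minimum near x = +1
```

A local method simply reports the stationary point its basin leads to; it provides no certificate of global optimality.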


Local and global extrema

The following theorem helps define whether a local solution is a global one:

Theorem (Weierstrass)

A continuous real-valued function f(x) defined on a non-empty compact set S attains a maximum and a minimum at points xmin, xmax ∈ S.

Reminder: a compact set S ⊂ R^n is a closed and bounded set.
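The theorem can be illustrated numerically on the compact set S = [0, 1]; the function x(1 − x) is an assumed example of a continuous f.

```python
# f is continuous on the compact set [0, 1], so it attains both extrema there.
import numpy as np

f = lambda x: x * (1.0 - x)
grid = np.linspace(0.0, 1.0, 100001)  # dense sample of S = [0, 1]
values = f(grid)

print(grid[np.argmax(values)])  # maximizer near x = 0.5
print(values.min())             # minimum value 0.0, attained at the endpoints
```

On a non-compact set (e.g., the open interval (0, 1) with f(x) = x) the infimum need not be attained, which is why compactness matters.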


Affine and Convex Sets

Definition

S ⊂ R^n is affine if [x, y ∈ S, α ∈ R] ⇒ αx + (1 − α)y ∈ S.

Definition

S ⊂ R^n is convex if for all [x, y ∈ S, 0 < α < 1] ⇒ z = αx + (1 − α)y ∈ S (z is a convex combination of x and y).

If x1, ..., xm ∈ R^n, ∑j αj = 1, αj > 0, then x = α1x1 + ... + αmxm is a convex combination of x1, ..., xm. The intersection of (any number of) convex sets is convex.
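A quick randomized check of the definition on an assumed convex set, the closed unit ball in R^2:

```python
# Verify that convex combinations of points of S = {x : ||x|| <= 1} stay in S.
import numpy as np

rng = np.random.default_rng(1)

def in_ball(x):
    return np.linalg.norm(x) <= 1.0 + 1e-12

for _ in range(1000):
    # draw two points of S (projecting into the ball) and a coefficient in (0, 1)
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    x = x / max(1.0, np.linalg.norm(x))
    y = y / max(1.0, np.linalg.norm(y))
    a = rng.uniform(0.0, 1.0)
    z = a * x + (1.0 - a) * y        # convex combination
    assert in_ball(z)                # never leaves the convex set

print("all convex combinations stayed in S")
```

The assertion is exactly the triangle inequality ‖z‖ ≤ α‖x‖ + (1 − α)‖y‖ ≤ 1 checked on samples.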


Compact Sets

Let Bδ(x0) denote the open ball of radius δ centered at the point x0: Bδ(x0) = {x : ‖x − x0‖ < δ}.

Definition

A set S ⊂ R^n is said to be open if for each point x0 ∈ S there is δ > 0 such that Bδ(x0) ⊂ S. A set S ⊂ R^n is said to be closed if its complement R^n \ S is open.

Definition

A set S is compact if each of its open covers has a finite subcover: for every family {Ci}i∈A with S ⊂ ∪i∈A Ci there exists a finite J ⊂ A such that S ⊂ ∪j∈J Cj.

Alternative: every sequence in S has a convergent subsequence whose limit lies in S. Note: if S ⊂ R^n is closed and bounded, then S is compact (Heine-Borel theorem).

Page 44: Introduction to optimization · Introduction to optimization Introduction Mathematical optimization Appendix Content to be discussed during the next 5 weeks... This is not Optimization

Introductionto

optimization

Introduction

Mathematicaloptimization

Appendix

Compact Sets

Let Bδ(x0) denote the open ball of radius δ centered at thepoint x : Bδ(x0) = x : ||x − x0|| < δ.

Definition

Set S ⊂ Rn is said to be open if for each point x0 ∈ S there isδ such that Bδ(x0). A set S ⊂ Rn is said to be closed if itscomplement Rn \ S is open.

Definition

Set S is compact if each of its open covers has a finitesubcover: ∀Cii∈A, S ⊂ Cii∈A ∃ finite J : S ⊂ Cjj∈J .

Alternative: every sequence in S has a convergentsubsequence, whose limit lies in S .Note: If S ⊂ Rn, closed and bounded, then S - compact(Heine-Borel theorem).

Page 45: Introduction to optimization · Introduction to optimization Introduction Mathematical optimization Appendix Content to be discussed during the next 5 weeks... This is not Optimization

Introductionto

optimization

Introduction

Mathematicaloptimization

Appendix

Compact Sets

Let Bδ(x0) denote the open ball of radius δ centered at thepoint x : Bδ(x0) = x : ||x − x0|| < δ.

Definition

Set S ⊂ Rn is said to be open if for each point x0 ∈ S there isδ such that Bδ(x0). A set S ⊂ Rn is said to be closed if itscomplement Rn \ S is open.

Definition

Set S is compact if each of its open covers has a finitesubcover: ∀Cii∈A, S ⊂ Cii∈A ∃ finite J : S ⊂ Cjj∈J .

Alternative: every sequence in S has a convergentsubsequence, whose limit lies in S .Note: If S ⊂ Rn, closed and bounded, then S - compact(Heine-Borel theorem).

Page 46: Introduction to optimization · Introduction to optimization Introduction Mathematical optimization Appendix Content to be discussed during the next 5 weeks... This is not Optimization

Introductionto

optimization

Introduction

Mathematicaloptimization

Appendix

Compact Sets

Let Bδ(x0) denote the open ball of radius δ centered at thepoint x : Bδ(x0) = x : ||x − x0|| < δ.

Definition

Set S ⊂ Rn is said to be open if for each point x0 ∈ S there isδ such that Bδ(x0). A set S ⊂ Rn is said to be closed if itscomplement Rn \ S is open.

Definition

Set S is compact if each of its open covers has a finitesubcover: ∀Cii∈A, S ⊂ Cii∈A ∃ finite J : S ⊂ Cjj∈J .

Alternative: every sequence in S has a convergentsubsequence, whose limit lies in S .Note: If S ⊂ Rn, closed and bounded, then S - compact(Heine-Borel theorem).

Convex functions

Definition
Let C ⊂ Rn be a nonempty convex set. Then f : C → R is convex (on C) if for all x, y ∈ C and all α ∈ (0, 1):
f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).

If strict inequality holds whenever x ≠ y, then f is said to be strictly convex. The negative of a (strictly) convex function is called a (strictly) concave function.

Convex functions

Operations preserving convexity:

nonnegative multiple: αf is convex if f is convex and α ≥ 0

sum: f1 + f2 is convex if f1, f2 are convex (extends to infinite sums and integrals)

composition with an affine function: f(Ax + b) is convex if f is convex

Some univariate convex functions:
1. exponential: f(x) = e^(αx) for all real α
2. powers: f(x) = x^p for x ≥ 0 and 1 ≤ p < ∞
3. powers: f(x) = x^p for x > 0 and −∞ < p ≤ 0
Concave:
1. powers: f(x) = x^p for x ≥ 0 and 0 ≤ p ≤ 1
2. logarithm: f(x) = log x for x > 0.
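The defining inequality can be spot-checked numerically. Here is a minimal Python sketch (the function name and sampling scheme are mine, not from the slides): it draws random points and a random convex combination and checks the inequality directly. A violation disproves convexity; passing only suggests it, since finitely many samples prove nothing.

```python
import math
import random

def is_convex_on_samples(f, lo, hi, trials=1000, tol=1e-9):
    """Randomized test of the defining inequality
        f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y)
    on [lo, hi].  Returns False on any (tolerance-adjusted) violation."""
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        a = random.random()
        lhs = f(a * x + (1 - a) * y)
        rhs = a * f(x) + (1 - a) * f(y)
        if lhs > rhs + tol:
            return False
    return True

random.seed(0)
print(is_convex_on_samples(lambda x: math.exp(2 * x), -2.0, 2.0))  # True: e^(2x) is convex
print(is_convex_on_samples(lambda x: math.log(x), 0.1, 5.0))       # False: log is concave
```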

Differentials

f is differentiable if the domain of f is open and the gradient of f,

∇f(x) ≡ (∂f/∂x1, ..., ∂f/∂xn)^T,

exists for all x ∈ dom(f).

f is twice differentiable if the domain of f is open and the Hessian of f,

H ≡ D²f(x) = [∂²f/∂xi∂xj], i, j = 1, ..., n,

i.e. the n × n matrix with entry ∂²f/∂xi∂xj in row i, column j, exists for all x ∈ dom(f).

Note: not all convex functions are differentiable.
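When closed-form derivatives are inconvenient, the gradient and Hessian can be approximated by central finite differences. A minimal Python sketch (function names and step sizes are illustrative assumptions, not from the slides):

```python
def num_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x,
    one coordinate at a time (x is a list of floats)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def num_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x:
    entry (i, j) estimates the mixed partial d^2 f / dxi dxj."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = []
            for si, sj in ((h, h), (h, -h), (-h, h), (-h, -h)):
                z = list(x)
                z[i] += si
                z[j] += sj
                vals.append(f(z))
            H[i][j] = (vals[0] - vals[1] - vals[2] + vals[3]) / (4 * h * h)
    return H

# f(x) = x1^2 + 3*x1*x2 has gradient (2*x1 + 3*x2, 3*x1) and Hessian [[2, 3], [3, 0]].
f = lambda v: v[0] ** 2 + 3 * v[0] * v[1]
print(num_gradient(f, [1.0, 2.0]))  # ≈ [8.0, 3.0]
print(num_hessian(f, [1.0, 2.0]))   # ≈ [[2.0, 3.0], [3.0, 0.0]]
```

For a quadratic the central differences are exact up to floating-point roundoff, which is why the test function above is a convenient sanity check.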

First-order condition

Theorem (gradient inequality)
A differentiable f is convex on a convex set C ⊂ Rn if and only if for all x, y ∈ C:
f(y) ≥ f(x) + (∇f(x))^T (y − x).

Theorem
Minimizing a differentiable convex function f(x) s.t. x ∈ C
⇔
Find x* ∈ C such that (∇f(x*))^T (y − x*) ≥ 0 for all y ∈ C (variational inequality problem).
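The gradient inequality says a convex function lies above all of its tangent hyperplanes. A minimal Python sketch of a randomized spot-check (function names and the box-shaped sample region are mine, not from the slides):

```python
import random

def gradient_inequality_holds(f, grad, lo, hi, trials=500, tol=1e-9):
    """Sample pairs (x, y) from the box prod_i [lo[i], hi[i]] and check
    f(y) >= f(x) + grad(x)^T (y - x).  For differentiable f, this holding
    for ALL pairs is equivalent to convexity; here it is only spot-checked."""
    n = len(lo)
    for _ in range(trials):
        x = [random.uniform(lo[i], hi[i]) for i in range(n)]
        y = [random.uniform(lo[i], hi[i]) for i in range(n)]
        g = grad(x)
        linear = f(x) + sum(g[i] * (y[i] - x[i]) for i in range(n))
        if f(y) < linear - tol:
            return False
    return True

# Convex quadratic: f(y) - [f(x) + grad(x)^T (y - x)] = ||y - x||^2 >= 0 exactly.
f = lambda v: v[0] ** 2 + v[1] ** 2
grad = lambda v: [2 * v[0], 2 * v[1]]
random.seed(1)
print(gradient_inequality_holds(f, grad, [-3.0, -3.0], [3.0, 3.0]))  # True
```

For the concave function −(x1² + x2²) the same check fails almost immediately, since the surplus term flips sign.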

Second-order condition

Theorem
A twice differentiable f is convex on C ⊂ Rn if and only if the Hessian matrix ∇²f(x) is positive semidefinite for all x ∈ C.

Note: if ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex on C. The converse is false.
Example: f(x) = x^4 is strictly convex on R, yet f''(0) = 0, so the Hessian is not positive definite at x = 0.
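Positive semidefiniteness can be tested numerically by attempting a Cholesky factorization of the slightly shifted matrix H + tol·I, which succeeds exactly when the shifted matrix is positive definite. A minimal pure-Python sketch (the function name and tolerance scheme are mine, not from the slides):

```python
def is_psd(H, tol=1e-10):
    """Test whether the symmetric matrix H (list of lists) is positive
    semidefinite by attempting a Cholesky factorization of H + tol*I;
    the tiny diagonal shift turns 'PSD' into 'PD up to tolerance'."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = H[i][i] + tol - s
                if d <= 0.0:
                    return False          # nonpositive pivot: not PSD
                L[i][i] = d ** 0.5
            else:
                L[i][j] = (H[i][j] - s) / L[j][j]
    return True

print(is_psd([[2.0, 0.0], [0.0, 2.0]]))   # True  (positive definite)
print(is_psd([[2.0, 0.0], [0.0, -2.0]]))  # False (indefinite, a saddle)
print(is_psd([[0.0]]))                    # True  (PSD but not PD, like f''(0) for x^4)
```

The last call mirrors the note above: a semidefinite-but-not-definite Hessian at a point does not rule out strict convexity.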
