Transcript of Lecture 12 - University of Pittsburgh
Lecture 12
Econ 2001
2015 August 25
Lecture 12 Outline
1 Critical Points and Quadratic Forms
2 Unconstrained Optimization
  1 First Order Conditions
  2 Second Order Conditions
3 Inverse Function Theorem
4 Easy Implicit Function Theorem
First, we use Taylor's theorem to connect quadratic forms to optimization theory.
Then, we develop tools to study how changes in parameters affect the variables of interest while in equilibrium (same idea as Berge's theorem, but a different point of view).
Local and Global Extrema
We start with definitions.
Definitions
Let f : X ⊂ Rⁿ → R. We say that
x∗ is a local maximizer if and only if ∃ ε > 0 such that f(x∗) ≥ f(x) ∀ x ∈ Bε(x∗);
x∗ is a local minimizer if and only if ∃ ε > 0 such that f(x∗) ≤ f(x) ∀ x ∈ Bε(x∗);
x∗ is a global maximizer if and only if f(x∗) ≥ f(x) ∀ x ∈ X;
x∗ is a global minimizer if and only if f(x∗) ≤ f(x) ∀ x ∈ X.
Critical Points
Definition
Let f : Rⁿ → R be a continuously differentiable function. A point x ∈ Rⁿ is a critical point of f if all the partial derivatives of f equal zero at x (Df(x) = 0).
REMARK
Using Taylor's theorem, at a critical point x we have:

f(x + h) = f(x) + Df(x)h + (1/2) hᵀ(D²f(x))h + o(|h|²) as h → 0
         = f(x) + (1/2) hᵀ(D²f(x))h + o(|h|²) as h → 0

or

f(x + h) − f(x) = (1/2) hᵀ(D²f(x))h + o(|h|²) as h → 0.

The sign of the expression on the right-hand side could help us determine if x is a local maximum or minimum.
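To make the remark concrete, here is a quick numerical sketch (mine, not part of the lecture): for a function with a known critical point at the origin, the gap between f(x + h) − f(x) and the quadratic form (1/2) hᵀ(D²f(x))h is o(|h|²), so it is tiny relative to |h|².

```python
import numpy as np

# f(x, y) = cos(x) + cos(y) has a critical point at the origin: Df(0) = 0.
def f(x):
    return np.cos(x[0]) + np.cos(x[1])

hess = -np.eye(2)  # D^2 f(0), known in closed form for this f

h = np.array([1e-3, -2e-3])
gap = abs((f(h) - f(np.zeros(2))) - 0.5 * h @ hess @ h)
ratio = gap / (h @ h)  # the o(|h|^2) remainder, relative to |h|^2
print(ratio)           # far smaller than 1 for small h
```

Shrinking h further makes the ratio smaller still, which is exactly what the o(|h|²) term asserts.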
Definition
We say f has a saddle at x if x is a critical point that is neither a local maximum nor a local minimum.
Characterizing Critical Points
Theorem
Suppose X ⊂ Rⁿ is open and x ∈ X. If f : X → R is C², there is an orthonormal basis {v1, . . . , vn} and corresponding eigenvalues λ1, . . . , λn ∈ R such that

f(x + h) = f(x + γ1v1 + · · · + γnvn) = f(x) + ∑_{i=1}^n (Df(x)vi) γi + (1/2) ∑_{i=1}^n λi γi² + o(|γ|²)

where γi = h · vi. (If f ∈ C³, we may strengthen o(|γ|²) to O(|γ|³).)

1 If f has a local maximum or local minimum at x, then Df(x) = 0.
2 If Df(x) = 0, then
  λ1, . . . , λn > 0 ⇒ f has a local minimum at x
  λ1, . . . , λn < 0 ⇒ f has a local maximum at x
  λi < 0 for some i, λj > 0 for some j ⇒ f has a saddle at x
  λ1, . . . , λn ≥ 0, λi > 0 for some i ⇒ f has a local minimum or a saddle at x
  λ1, . . . , λn ≤ 0, λi < 0 for some i ⇒ f has a local maximum or a saddle at x
  λ1 = · · · = λn = 0 gives no information.
Characterizing Critical Points
If f : X → R is C², there is an orthonormal basis {v1, . . . , vn} and corresponding eigenvalues λ1, . . . , λn ∈ R such that

f(x + h) = f(x + γ1v1 + · · · + γnvn) = f(x) + ∑_{i=1}^n (Df(x)vi) γi + (1/2) ∑_{i=1}^n λi γi² + o(|γ|²)

where γi = h · vi.
If f has a local maximum or local minimum at x, then Df(x) = 0.
If Df(x) = 0, things depend on the eigenvalues of D²f(x):
all strictly positive imply a minimum, all strictly negative imply a maximum;
mixed signs (some positive, some negative) imply a saddle;
all zeros give no information.
Where does this come from?
The term (1/2) ∑_{i=1}^n λi γi² comes from the diagonalization of D²f(x).
The rest comes from previous results on quadratic forms: definiteness is determined by the eigenvalues.
If λi = 0 for some i, then the quadratic form arising from the second partial derivatives is identically zero in the direction vi, and the higher derivatives will determine the behavior of the function f in that direction vi.
For example, if f(x) = x³, then f′(0) = 0 and f′′(0) = 0; we know that f has a saddle at x = 0. However, if f(x) = x⁴, then again f′(0) = 0 and f′′(0) = 0, but f has a local (and global) minimum at x = 0.
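A tiny numerical illustration of this degenerate case (my own sketch, not from the lecture): both functions have a vanishing first and second derivative at 0, so the eigenvalue test is silent, and the behavior near 0 is decided by higher-order terms.

```python
# Both f(x) = x^3 and f(x) = x^4 satisfy f'(0) = f''(0) = 0,
# so the second-order test gives no information at 0.
cube = lambda x: x**3    # saddle: f is below f(0) on one side, above on the other
quart = lambda x: x**4   # local (and global) minimum: f is above f(0) on both sides

t = 1e-2
saddle = cube(-t) < cube(0) < cube(t)
minimum = quart(-t) > quart(0) < quart(t)
print(saddle, minimum)  # True True
```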
Unconstrained Extrema: First-Order Conditions
Given a function f : Rⁿ → R, we want to find local (and possibly global) maximizers or minimizers.
Start by looking at the conditions an extremum imposes on the first derivatives of the function.

Theorem (First Order Conditions)
If f is differentiable at x∗, and x∗ is a local maximizer or minimizer, then Df(x∗) = 0. That is,

∂f/∂xi (x∗) = 0, ∀ i = 1, 2, . . . , n.
If x∗ is a local maximizer or minimizer, then it must be a critical point. The proof is (essentially) a one-variable proof.
If x∗ is a local maximum, then the one-variable function you obtain by restricting x to move along a fixed line through x∗ (in the direction v) must also have a local maximum. Hence all directional derivatives are zero.
If x∗ is a local maximizer, then Df(x∗) = 0 (do the proof for a minimizer as an exercise).
Proof.
If x∗ is a local maximizer, then ∃ ε > 0 such that f(x∗) ≥ f(x) for all x ∈ Bε(x∗).
Fix a direction v ≠ 0, and define h : R → R by h(t) ≡ f(x∗ + tv).
Notice that for t small (so that |t| < ε/‖v‖, hence x∗ + tv ∈ Bε(x∗)) we have:
f(x∗ + tv) = h(t) ≤ f(x∗)
Therefore, h : R → R is maximized locally at t∗ = 0. The corresponding (one-variable) first-order condition is:
h′(0) = 0
Using the chain rule, the derivative of h(·) at 0 is ∇f(x∗) · v = 0.
This equation must hold for any v ∈ Rⁿ; therefore we must have ∇f(x∗) = 0.
f is differentiable, so the directional derivatives are given by the matrix of partial derivatives; hence Df(x∗) = 0.
More On First-Order Conditions

In the proof, we saw that at a local maximum the restriction of f to any line through the critical point also has a local maximum there.
The first-derivative test, however, cannot distinguish between local minima and local maxima.
Moreover, critical points may fail to be minima or maxima.
In the one-variable case, the function may decrease as x moves one way (suggesting a local maximum) and increase as x moves the other way (suggesting a local minimum). In the multivariable case, this kind of behavior can occur across directions: the function restricted to one direction has a local maximum, but it has a local minimum with respect to another direction.
Hence: it is "hard" for a critical point of a multivariable function to be a local extremum, since it must behave like one in every direction.
Unconstrained Extrema: Second Order Conditions
These tell us what kind of critical point we have found.
Using Taylor's Theorem at a critical point x∗, one gets

[f(x∗ + h) − f(x∗)] / ‖h‖² = (1/2) hᵀ D²f(z) h / ‖h‖²

Write h = tv, where ‖v‖ = 1, and this is equivalent to

[f(x∗ + tv) − f(x∗)] / t² = (1/2) t² vᵀ D²f(z) v / t² = (1/2) vᵀ D²f(z) v

for some z ∈ B‖h‖(x∗). Observe that
a local maximizer must have f(x∗ + tv) − f(x∗) ≤ 0,
a local minimizer must have f(x∗ + tv) − f(x∗) ≥ 0.
Unconstrained Extrema: Sufficient and Necessary Conditions

Sufficient Conditions
If f is twice continuously differentiable and x∗ is a critical point of f, then:
1 x∗ is a local minimizer whenever the quadratic form vᵀ D²f(x∗) v is positive definite.
2 x∗ is a local maximizer whenever the quadratic form vᵀ D²f(x∗) v is negative definite.

Necessary Conditions
If D²f(x∗) exists, then:
1 if x∗ is a local maximizer, then vᵀ D²f(x∗) v is a negative semi-definite quadratic form;
2 if x∗ is a local minimizer, then vᵀ D²f(x∗) v is a positive semi-definite quadratic form.

What if the quadratic form is indefinite? Then the critical point is neither a maximum nor a minimum: it is a saddle.
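Since definiteness of D²f(x∗) is determined by the signs of its eigenvalues, the second-order test can be checked numerically. The `classify` helper below is my own illustration (not from the lecture):

```python
import numpy as np

def classify(hessian, tol=1e-12):
    """Second-order test at a critical point, via eigenvalues of D^2 f(x*)."""
    lam = np.linalg.eigvalsh(hessian)       # real eigenvalues of a symmetric matrix
    if np.all(lam > tol):
        return "local minimizer"            # positive definite
    if np.all(lam < -tol):
        return "local maximizer"            # negative definite
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle"                     # indefinite
    return "inconclusive"                   # semi-definite with a zero eigenvalue

# f(x, y) = x^2 - y^2 has Hessian diag(2, -2) at its critical point (0, 0)
print(classify(np.diag([2.0, -2.0])))   # saddle
print(classify(np.diag([2.0, 4.0])))    # local minimizer
```

The "inconclusive" branch corresponds to the semi-definite cases of the theorem, where higher-order terms decide.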
Endogenous Variables and Parameters
Comparative Statics
In economics, the variables of interest often form a solution to a parameterized family of equations (the equilibrium conditions); typically, we are interested in how these endogenously determined variables are affected by changes in exogenously given parameters.

Let X ⊂ Rⁿ and A ⊂ Rᵖ be open, and let f : X × A → Rᵐ. Given an a ∈ A, suppose x ∈ X solves the family of equations f(x, a) = 0 ∈ Rᵐ.

OBJECTIVE
Characterize the set of solutions to this system (all the xs such that the system above is satisfied), and study how this set depends on the parameters a.

Let x(a) be the function implicitly defined by f(x(a), a) = 0.
We want to know how x(a) changes with a.
Things are easier when one can write f as f(x) − a = 0; in that case, what we need is the inverse of f.
Even if an inverse does not exist, all we need is to know what happens to x(a) for small changes in a.
Inverse Function Theorem: Introduction
We want conditions for functions to be invertible close to a point that "solves" our model, so that we can figure out how a change in parameters will affect the solution.

Examples
1 One variable, linear case: f(x) = ax ⇒ f is invertible if a ≠ 0.
2 Many variables, linear case: f(x) = Ax ⇒ f is invertible if det A ≠ 0.
In general, a function f : X → Y may not be invertible, as f⁻¹ : f(X) → X is not necessarily a function.
Idea
If f is strictly monotone in a neighborhood of some point x0, it is one-to-one at least locally and therefore locally invertible.
Inverse Function: Reals
Local invertibility is fairly straightforward for functions from R to R. (Pictures)

Theorem
Suppose f : R → R is C¹ and f′(x0) ≠ 0. Then ∃ ε > 0 such that f is strictly monotone on the open interval (x0 − ε, x0 + ε).

In general, a differentiable function whose derivative is nonzero at a point is invertible locally around that point.
If the derivative is continuous and everywhere nonzero, then the inverse can be defined over the entire range.
Local Invertibility
Definition
The function f : Rⁿ → Rⁿ is locally invertible at x0 if there is an ε > 0 and a function g : Bε(f(x0)) → Rⁿ such that

f ∘ g(y) ≡ y for y ∈ Bε(f(x0))
and
g ∘ f(x) ≡ x for x ∈ Bε(x0)

If f is locally invertible at x0, then we can define g close to x0 such that g(f(x)) = x, at least locally.
When is f : Rⁿ → Rⁿ locally invertible?
Linear functions are good local approximations for differentiable functions.
A linear function can be represented by a square matrix, so inverting the function is equivalent to inverting the corresponding matrix: a linear function is invertible (globally) if and only if its matrix representation is invertible.
Local invertibility of f thus hinges on differentiability at a point.
Rank of the Differential
Suppose X ⊂ Rⁿ is open and f : X → Rᵐ is differentiable at x ∈ X.
Let W = {e1, . . . , en} denote the standard basis of Rⁿ.

Remember from last week: given a linear transformation T,

Im T = {y ∈ Y : y = T(x) for some x ∈ X}, Rank T = dim(Im T), ker T = {x ∈ X : T(x) = 0}

theorem: dim X = dim ker T + Rank T
theorem: a linear transformation T is invertible if and only if ker T = {0}.

Let dfx ∈ L(Rⁿ, Rᵐ); then

Rank dfx = dim Im(dfx)
         = dim span {dfx(e1), . . . , dfx(en)}
         = dim span {Df(x)e1, . . . , Df(x)en}
         = dim span {column 1 of Df(x), . . . , column n of Df(x)}
         = Rank Df(x)

Thus, Rank dfx ≤ min{m, n}.

Definition
dfx has full rank if Rank dfx = min{m, n}; that is, dfx has the maximum possible rank.
Regular and Critical Points and Values
DefinitionSuppose X ⊂ Rn is open, and f : X → Rm is differentiable on X .
x is a critical point of f if Rank dfx < min{m, n}.
y is a critical value of f if there exists x ∈ f⁻¹(y) such that x is a critical point of f.
x is a regular point of f if it is not a critical point (Rank dfx = min{m, n}).
y is a regular value of f if y is not a critical value of f.
Notice that when m ≤ n, dfx is onto (and therefore x is a regular point) if and only if Df(x) has full rank. In the case m = n, this means det Df(x) ≠ 0.

REMARK
Is this different from the standard definition of critical point (Df(x) = 0)?
No: for f : Rⁿ → R we have m = 1, so Rank dfx < min{m, n} = 1 can only be true when ∇f(x) = 0.
Inverse Function Theorem
Theorem (Inverse Function Theorem)
Suppose X ⊂ Rⁿ is open, f : X → Rⁿ is C¹ on X, and x0 ∈ X. If det Df(x0) ≠ 0, then there are open neighborhoods U of x0 and V of f(x0) such that

f : U → V is one-to-one and onto
f⁻¹ : V → U is C¹
Df⁻¹(f(x0)) = [Df(x0)]⁻¹

If, in addition, f ∈ Cᵏ, then f⁻¹ ∈ Cᵏ.

In words: if the linear approximation to a function is invertible, then the function is invertible locally and the derivative of the inverse equals the inverse of the derivative.
The condition det Df(x0) ≠ 0 means x0 is a regular point of f.

REMARK
f is one-to-one and onto only on U; it need not be one-to-one and onto globally. Thus f⁻¹ is only a local inverse.
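The formula Df⁻¹(f(x0)) = [Df(x0)]⁻¹ can be checked numerically. The map f below is my own example (any C¹ map with det Df(x0) ≠ 0 would do): we compute the local inverse by Newton's method and compare its finite-difference Jacobian with the inverse of the Jacobian.

```python
import numpy as np

def f(x):
    return np.array([x[0] + x[1]**3, x[0]**3 + x[1]])

def Df(x):  # Jacobian of f
    return np.array([[1.0, 3*x[1]**2],
                     [3*x[0]**2, 1.0]])

def f_inv(y, x_start, steps=50):
    """Local inverse via Newton's method, started near the known preimage."""
    x = x_start.copy()
    for _ in range(steps):
        x = x - np.linalg.solve(Df(x), f(x) - y)
    return x

x0 = np.array([1.0, 0.5])
y0 = f(x0)
J_inv_theorem = np.linalg.inv(Df(x0))   # [Df(x0)]^{-1}

# finite-difference Jacobian of the local inverse at y0
eps = 1e-6
J_inv_numeric = np.column_stack([
    (f_inv(y0 + eps*e, x0) - f_inv(y0 - eps*e, x0)) / (2*eps)
    for e in np.eye(2)
])
print(np.max(np.abs(J_inv_numeric - J_inv_theorem)))  # close to zero
```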
Inverse Function Theorem: Idea of Proof
If f : Rⁿ → Rⁿ is differentiable at x0 and Df(x0) is invertible, then f is locally invertible at x0. Moreover, the inverse function g is differentiable at f(x0) and Dg(f(x0)) = (Df(x0))⁻¹.

A rough sketch of the proof
The idea is that since det Df(x0) ≠ 0, the differential dfx0 : Rⁿ → Rⁿ is one-to-one and onto.
You then find a neighborhood U of x0 sufficiently small such that the Contraction Mapping Theorem implies that f is one-to-one and onto locally.
This is hard.
To see the formula for Df⁻¹, let idU denote the identity function from U to U and I denote the n × n identity matrix. Then

Df⁻¹(f(x0)) Df(x0) = D(f⁻¹ ∘ f)(x0) = D(idU)(x0) = I
⇒ Df⁻¹(f(x0)) = [Df(x0)]⁻¹
Inverse Function Theorem: Comments
Inverse Function Theorem
If f : Rⁿ → Rⁿ is differentiable at x0 and Df(x0) is invertible, then f is locally invertible at x0. Moreover, the inverse function g is differentiable at f(x0) and Dg(f(x0)) = (Df(x0))⁻¹.

In words: if the linear approximation of a function is invertible, then the function is invertible locally.
Locally means in an open neighborhood of x0.
In one dimension, g′(y0) = 1/f′(x0), so the formula implies the one-variable formula.
Unlike the one-variable case, the assumption that Df is invertible everywhere does not imply the existence of a global inverse.

REMARK
This gets us only part of the way. Why? We want to express the endogenous variables as functions of the exogenous ones while satisfying some equation of interest. The theorem does not guarantee this last part.
Implicit Function Theorem: Prelude
Given f : Rᵐ⁺ⁿ → Rⁿ, let x ∈ Rⁿ and a ∈ Rᵐ. Suppose

f(x, a) = 0

is a system of n equations in n + m variables that describes something of interest.

Can we say how some of the variables are related to each other close to a point that solves the system?
For example, what happens to x as a is affected by something outside the system?
The ultimate goal is to find a solution to the equation that gives x as a function of a, at least close to some point that solves the system.
The problem of finding an inverse is a special case where n = m and f(x, a) = f(x) − a.
Implicit Function Theorem: Motivation
A bunch of equations characterize the solution to an economic model (equilibrium, market clearing, prices, first-order conditions, and so on).
We have a solution for some value of the parameters.
What happens to this solution when the parameters change?
Implicit function theorem
Under some assumptions, if you can solve the system at a given point, then you can solve the system in a neighborhood of that point. Furthermore, we have expressions for the derivatives of the 'solution function'.
One can describe how the solution changes when the parameters change.
We will then apply this result to the solution of a maximization problem that is characterized by the system of equations given by the first-order conditions (envelope theorem).
Why the name Implicit Function Theorem?
If you could write down the system of equations and solve it directly to get an explicit representation of the solution function, great...
...you could then get an explicit solution and a formula for its derivatives.
But life is usually not that easy.
IFT to the rescue: it describes "sensitivity" properties even when an explicit formula is not available.
Simple Motivating Example
Let f : R² → R, let f(x, z) = 0 be an identity relating x and z, and let (x0, z0) be a point satisfying it.
How does z depend on x?

Assuming f(x, z) = 0 can be solved for z around (x0, z0), define g : (x0 − ε, x0 + ε) → (z0 − ε, z0 + ε) by

f(x, g(x)) = 0, ∀ x ∈ (x0 − ε, x0 + ε)

and define h : (x0 − ε, x0 + ε) → R by

h(x) = f(x, g(x))

By construction, h(x) = 0 for all x close to x0; thus h′(x) = 0 close to x0.
Using the Chain Rule:

h′(x) = D1 f(x, g(x)) + D2 f(x, g(x)) Dg(x)

where Di denotes the derivative with respect to the i-th argument.
The left-hand side is 0 (why?); hence, provided D2 f(x, g(x)) ≠ 0, we have a formula for Dg(x):

Dg(x) = − D1 f(x, g(x)) / D2 f(x, g(x))
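A quick numerical check of this formula on a concrete case (my own choice, not from the lecture): the unit circle f(x, z) = x² + z² − 1, whose upper branch is g(x) = √(1 − x²). The formula gives g′(x) = −(2x)/(2z) = −x/z.

```python
import math

# Unit circle: f(x, z) = x^2 + z^2 - 1 = 0; near a point with z0 > 0
# the upper branch solves it: g(x) = sqrt(1 - x^2).
def g(x):
    return math.sqrt(1.0 - x**2)

x0 = 0.6
z0 = g(x0)                       # 0.8

# IFT formula: g'(x0) = -D1 f / D2 f = -(2 x0) / (2 z0) = -x0 / z0
slope_ift = -x0 / z0             # -0.75

# finite-difference check of the same slope
eps = 1e-6
slope_fd = (g(x0 + eps) - g(x0 - eps)) / (2 * eps)
print(slope_ift, slope_fd)
```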
Calculation of Simple IFT Formula

Let f : R² → R, let f(x, z) = 0 be an identity relating x and z, and (x0, z0) a point satisfying it. How does z depend on x?
Suppose f and g are differentiable. We calculate: write

y = (x, z)ᵀ, G(x) = (x, g(x))ᵀ, and define h(x) = [f ∘ G](x)

Then

h′(x0) = Df(x0, z0) DG(x0)
       = ( ∂f/∂x (x0, z0), ∂f/∂z (x0, z0) ) · ( 1, g′(x0) )ᵀ
       = ∂f/∂x (x0, z0) + ∂f/∂z (x0, z0) g′(x0)
       = 0

which gives us

g′(x0) = − [∂f/∂x (x0, z0)] / [∂f/∂z (x0, z0)]
Implicit Function Theorem: Simple Case
One 'parameter' and many 'endogenous' variables together satisfy a system of equations:

f : Rⁿ × R → Rⁿ

f(x, a) = ( f¹(x, a), f²(x, a), . . . , fⁿ(x, a) )ᵀ = (0, 0, . . . , 0)ᵀ = 0

where a ∈ R and x ∈ Rⁿ. Let (x0, a0) denote a solution to this system.
We define g : R → Rⁿ as x = g(a) and see what we can learn about g close to (x0, a0).
Formal statement next.
Simple Implicit Function Theorem
Theorem (Simple IFT)
Suppose f : Rⁿ × R → Rⁿ is C¹ and write f(x, a) where a ∈ R and x ∈ Rⁿ. Assume f(x0, a0) = 0 and

det(Dxf(x0, a0)) =
| ∂f¹/∂x1 · · · ∂f¹/∂xn |
|   ...           ...   |
| ∂fⁿ/∂x1 · · · ∂fⁿ/∂xn |  ≠ 0.

Then, there exists a neighborhood of (x0, a0) and a function g : R → Rⁿ defined on the neighborhood of a0, such that x = g(a) uniquely solves f(x, a) = 0 on this neighborhood. Furthermore, the derivative of g is given by

Dg(a0) = −[Dxf(x0, a0)]⁻¹ Daf(x0, a0)
 (n×1)        (n×n)          (n×1)

where Di denotes the derivative with respect to i.

Hard to prove (contraction mappings again). The hard part is the existence of the unique function g that gives x in terms of a.
Computing the derivatives of g is a simple application of the chain rule.
Simple IFT: Easy Part of the Proof
Proof.
Since f(g(a), a) = 0 ∈ Rⁿ (with g(a) ∈ Rⁿ and a ∈ R), define H : R → Rⁿ by H(a) ≡ f(g(a), a).
Note that, by definition, H(a) = 0 for any a in a neighborhood of a0.
Use the Chain Rule to obtain

DaH(a0) = Daf(x0, a0) + Dxf(x0, a0) Dag(a0) = 0
 (n×1)      (n×1)          (n×n)     (n×1)   (n×1)

Therefore

Dxf(x0, a0) Dag(a0) = −Daf(x0, a0)

Premultiply both sides by the inverse (this exists since Dxf(x0, a0) is non-singular):

[Dxf(x0, a0)]⁻¹ [Dxf(x0, a0)] Dag(a0) = −[Dxf(x0, a0)]⁻¹ Daf(x0, a0)
        (the product on the left is Iₙ)

Hence

Dag(a0) = −[Dxf(x0, a0)]⁻¹ Daf(x0, a0)
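To see the formula at work, here is a small numerical sketch (the two-equation system below is my own hypothetical example): compare Dg(a0) from the IFT formula with a finite-difference derivative of the numerically computed solution function g(a).

```python
import numpy as np

# Hypothetical system f(x, a) = 0 with x in R^2 and one parameter a:
#   f1 = x1^2 + x2 - a,   f2 = x1 + x2^2 - 2
# (x0, a0) = ((1, 1), 2) solves it, and Dxf(x0, a0) is non-singular.
def f(x, a):
    return np.array([x[0]**2 + x[1] - a, x[0] + x[1]**2 - 2.0])

def Dxf(x, a):
    return np.array([[2*x[0], 1.0], [1.0, 2*x[1]]])

def Daf(x, a):
    return np.array([-1.0, 0.0])

x0, a0 = np.array([1.0, 1.0]), 2.0

# IFT formula: Dg(a0) = -[Dxf]^{-1} Daf
dg_ift = -np.linalg.solve(Dxf(x0, a0), Daf(x0, a0))   # (2/3, -1/3)

# numeric solution function g(a), via Newton's method started at x0
def g(a, steps=50):
    x = x0.copy()
    for _ in range(steps):
        x = x - np.linalg.solve(Dxf(x, a), f(x, a))
    return x

eps = 1e-6
dg_numeric = (g(a0 + eps) - g(a0 - eps)) / (2 * eps)
print(dg_ift, dg_numeric)
```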
Simple IFT: Comments
Theorem (Simple IFT)
Suppose f : Rⁿ × R → Rⁿ is C¹, where a ∈ R and x ∈ Rⁿ. Assume f(x0, a0) = 0 and det(Dxf(x0, a0)) ≠ 0.
Then, there exists a neighborhood of (x0, a0) and a function g : R → Rⁿ defined on the neighborhood of a0, such that x = g(a) uniquely solves f(x, a) = 0 in this neighborhood. Furthermore, the first derivative of g is given by

Dg(a0) = −[Dxf(x0, a0)]⁻¹ Daf(x0, a0)
The implicit function theorem guarantees that you can locally solve a systemof equations in terms of parameters...
...if a solution to the original system of equations exists.
This is a local version of a result about linear systems.
So far, only one parameter; in general, we have p parameters.
Implicit Function Theorem: An Example
A monopolist produces q units of output at a cost of C(q) = q + q²/2 dollars. She can sell q units for a price of P(q) = 4 − q⁵/6 dollars per unit. The monopolist must pay a tax of one dollar per unit sold.

1 Show that q∗ = 1 maximizes profit (revenue minus tax payments minus production cost) when the tax rate is t = 1.
2 Show how q∗ changes when the tax rate changes by a small amount.

The monopolist picks q to maximize profit; she solves:

max_q q(4 − q⁵/6) − tq − q − q²/2

This is a function of one variable, and the first-order condition is

q⁵ + q − 3 + t = 0

Since the second derivative of profit is −5q⁴ − 1 < 0, we know that there is at most one solution to this equation and it is a (global) maximum.
One can easily verify that q = 1 satisfies the first-order condition when t = 1.
Implicit Function Theorem: An Example
How does the solution q(t) to

q⁵ + q − 3 + t = 0 (A)

vary as a function of t when t is close to one?
We know that q(1) = 1 satisfies the equation.
The LHS of (A) is (strictly) increasing in q, so the IFT applies, and the change in q near the optimum is

q′(t) = − 1 / (5q⁴ + 1).

In particular, q′(1) = −1/6. Where does this come from?
Find the equation for q′(t) using the IFT formula; that is, differentiate the identity

[q(t)]⁵ + q(t) − 3 + t ≡ 0

with respect to the one exogenous variable (t). You get

5[q(t)]⁴ q′(t) + q′(t) + 1 = 0,

which is linear in q′, and solving for q′(t):

q′(t) = − 1 / (5[q(t)]⁴ + 1)

gives you the answer.
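This comparative static can be checked numerically (a quick sketch, not part of the lecture): solve the first-order condition for q(t) near t = 1 and compare the finite-difference slope with the IFT value −1/6.

```python
# FOC from the example: F(q, t) = q^5 + q - 3 + t = 0
def F(q, t):
    return q**5 + q - 3 + t

def q_of_t(t, steps=60):
    """Solve F(q, t) = 0 by Newton's method; F is strictly increasing in q."""
    q = 1.0
    for _ in range(steps):
        q = q - F(q, t) / (5*q**4 + 1)   # F_q = 5 q^4 + 1 > 0
    return q

# q(1) = 1 solves the FOC at t = 1
print(q_of_t(1.0))                       # 1.0

# IFT: q'(t) = -1 / (5 q^4 + 1), so q'(1) = -1/6
eps = 1e-6
slope_fd = (q_of_t(1.0 + eps) - q_of_t(1.0 - eps)) / (2 * eps)
print(slope_fd)                          # about -0.1667
```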
Tomorrow
Generalize the implicit function theorem to more parameters, and look at the particular version of it which applies to optimization problems: what happens to the optimizer and the optimized value when some parameters of the optimization problem change?

1 Implicit Function Theorem (General)
2 Envelope Theorem