Economics of Optimal Redistributive Taxation - CREST
Economics of Optimal Redistributive Taxation
Preliminary Handout / Preliminary Lecture Notes [1]
Master Economie Théorique et Empirique (ETE),
Ecole d’Economie de Paris et Paris 1
Etienne LEHMANN
CREST, Laboratoire de Macroéconomie
[email protected]
www.crest.fr/pageperso/lehmann/lehmann.htm
10th February 2009
[1] This is a preliminary text. Please email me your comments and remarks.
Contents
1 The Mirrlees model
  I The Problem
    I.1 Individuals
    I.2 The government's preferences
  II The First-Best Problem
  III The Taxation Principle
  IV The Stiglitz (1982) model
  V Continuum of skills
    V.1 The incentive constraints
    V.2 The resolution
    V.3 The reinterpretation of the optimality conditions
    V.4 Properties of the second-best optimum
  VI Empirical implications
Chapter 1
The Mirrlees model
I The Problem
In this Section, we describe the redistribution problem and solve it in the perfect information case.
I.1 Individuals
Individuals have preferences over consumption $C$ and labor supply $L$. Here, labor supply is a shortcut for many aspects of individuals' behavior in the labor market, such as in-work effort, investment in education or hours of work. Preferences are represented by a utility function $U(C, L)$. We assume $U(\cdot,\cdot)$ is twice differentiable with $U'_C > 0 > U'_L$. We furthermore assume that $U(\cdot,\cdot)$ is strictly concave.
People differ by their exogenous productivity endowment $w$. We also refer to $w$ as the skill level. With labor supply $L$, a worker of productivity $w$ gets gross earnings $Y = w \cdot L$. The cumulative distribution of $w$ is $F(\cdot)$ over a support denoted $\Omega$ within $\mathbb{R}_+$. [1]
One key assumption is that individuals are free to choose their labor supply optimally. This assumption means first that we neglect the existence of many fixed costs that make the labor supply choice discrete rather than continuous. It also ignores potential externalities. For instance, one may think that an individual values her leisure more, the higher the leisure of her friends. Finally, there are in reality many choices that are constrained by the demand side of the labor market. Involuntary unemployment is the most apparent one, but it is not the only one. However, the assumption of a perfectly competitive and frictionless labor market is a good starting point to study the theory of optimal direct taxation.
[1] If the support is unbounded, then we have to assume that $\int_\Omega w \, dF(w)$ is finite.

In a first-best setting, taxation can be conditioned on earnings and skill, $T_w(Y)$, whereas in a second-best setting, taxation can be conditioned on earnings only. To keep generality, we provisionally keep the notation $T_w(Y)$ for both settings, while keeping in mind that in a second-best setting, for any $w \neq w'$, $T_w(Y) \equiv T_{w'}(Y)$. An individual of skill $w$ thus faces the budget constraint $C = Y - T_w(Y)$. She therefore chooses her labor supply by solving:

$$\max_{L} \; U\left(w \cdot L - T_w(w \cdot L), \, L\right) \quad \Leftrightarrow \quad \max_{Y} \; U\left(Y - T_w(Y), \, \frac{Y}{w}\right) \tag{1.1}$$
Let $U_w$ be the value of this program. If $L_w$ is a solution, so that $Y_w = w \cdot L_w$ and $C_w = Y_w - T_w(Y_w)$, one has $U_w = U(C_w, L_w)$. It is perfectly equivalent to assume that individuals choose their work effort $L$, their consumption level $C$ or their gross earnings $Y$. We henceforth favor this latter interpretation. If $T_w(\cdot)$ is differentiable in earnings $Y$, the first-order condition associated with Program (1.1) implies:

$$1 - T'_w(Y_w) = -\frac{1}{w} \, \frac{U'_L}{U'_C}\left(C_w, \frac{Y_w}{w}\right) \tag{1.2}$$
The right-hand side of (1.2) equals the marginal rate of substitution between earnings and consumption (MRS for short). For an individual of skill $w$, the MRS at a bundle $(Y, C)$ equals

$$M(C, Y, w) \stackrel{\text{def}}{\equiv} -\frac{U'_L(C, Y/w)}{w \cdot U'_C(C, Y/w)} \tag{1.3}$$

It corresponds to the marginal cost of earnings in terms of consumption. A worker of skill $w$ has to receive $M(C, Y, w)$ additional units of consumption to be compensated for working so much harder that her earnings increase by one unit. When the tax schedule is differentiable, the MRS equals one minus the marginal tax rate that an individual of skill $w$ faces at her (gross) earnings $Y_w$. Hence, we will frequently interpret the MRS in terms of the marginal tax rate.
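To illustrate condition (1.2), here is a minimal numerical sketch. It is not part of the original notes: the quasi-linear utility $U(C, L) = C - L^{1+1/e}/(1+1/e)$, the linear schedule $T(Y) = \tau Y - b$, and the names `utility`, `mrs` and `best_earnings` are all assumptions chosen for illustration. (Quasi-linear utility is concave but not strictly concave; it is used here only because it yields a closed form.) The grid search picks the earnings level that maximizes utility and compares the MRS at that point with $1 - \tau$.

```python
import numpy as np

def utility(c, l, e=0.5):
    """Assumed quasi-linear utility U(C, L) = C - L**(1 + 1/e) / (1 + 1/e)."""
    k = 1.0 + 1.0 / e
    return c - l**k / k

def mrs(y, w, e=0.5):
    """M(C, Y, w) = -U_L / (w * U_C); here U_C = 1 and U_L = -(Y/w)**(1/e)."""
    return (y / w) ** (1.0 / e) / w

def best_earnings(w, tau, b, e=0.5):
    """Grid search for the earnings level maximizing U(Y - T(Y), Y/w)
    under the assumed linear schedule T(Y) = tau * Y - b."""
    grid = np.linspace(0.0, 10.0, 200_001)
    values = utility(grid - (tau * grid - b), grid / w, e)
    return grid[np.argmax(values)]

# Hypothetical parameter values, for illustration only.
w, tau, b = 2.0, 0.3, 1.0
y_star = best_earnings(w, tau, b)
print("chosen earnings:", y_star)
print("MRS at the optimum:", mrs(y_star, w), "vs 1 - tau:", 1.0 - tau)
```

Under these assumptions the closed-form solution is $Y_w = w^{1+e}(1-\tau)^e$, which the grid search should reproduce up to the grid step.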
We now adopt a geometric representation of individuals' preferences. Gross earnings $Y$ will be displayed on the horizontal axis, while consumption $C$ will be displayed on the vertical axis. To depict workers' indifference curves, let us define the function $\Gamma(\cdot,\cdot)$ by

$$U = U(C, L) \stackrel{\text{def}}{\Leftrightarrow} C = \Gamma(U, L)$$

From the implicit function theorem, one has

$$\Gamma'_U(U, L) = \frac{1}{U'_C\left(\Gamma(U, L), L\right)} \quad \text{and} \quad \Gamma'_L(U, L) = -\frac{U'_L}{U'_C}\left(\Gamma(U, L), L\right) \tag{1.4}$$
In particular, the slope in the $(Y, C)$ plane of the indifference curve of a worker of skill $w$ that passes through a bundle $(C, L)$ equals $M(C, wL, w)$. The indifference curve of a worker of skill $w$ associated with the utility level $U$ verifies $C = \Gamma(U, Y/w)$. Since, for a given skill level, utility $U(C, Y/w)$ is increasing in consumption and decreasing in earnings, indifference curves are upward-sloping. Higher utility levels correspond to lower earnings or higher consumption. Hence indifference curves correspond to higher utility levels as we move to the North-West in the $(Y, C)$ plane.
Finally, from the concavity of $U(\cdot,\cdot)$, along a given indifference curve, $M(\Gamma(U, Y/w), Y, w)$ increases with the earnings level $Y$. [2] Therefore, indifference curves are convex (see Figure 1.1).
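[Figure 1.1: Indifference curves. Horizontal axis: gross earnings Y; vertical axis: consumption C. The figure shows the indifference curves UH = U(C, Y/wH) and UL = U(C, Y/wL) together with the bundles (YL, CL) and (YH, CH).]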
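As a worked illustration of (1.3) and (1.4), consider the specification $U(C, L) = \ln C - L^2/2$. This functional form is an assumption made for the example only; it is not the one used in the notes.

```latex
\begin{align*}
U = \ln C - \tfrac{1}{2}L^{2}
  &\iff C = \Gamma(U,L) = \exp\!\left(U + \tfrac{1}{2}L^{2}\right),\\
\Gamma'_U(U,L) &= \exp\!\left(U + \tfrac{1}{2}L^{2}\right) = C = \frac{1}{U'_C(C,L)},\\
\Gamma'_L(U,L) &= L\,\exp\!\left(U + \tfrac{1}{2}L^{2}\right) = L\,C = -\frac{U'_L}{U'_C}(C,L),\\
M(C,Y,w) &= -\frac{U'_L(C,Y/w)}{w\,U'_C(C,Y/w)} = \frac{C\,Y}{w^{2}}.
\end{align*}
```

Since $C = \Gamma(U, Y/w)$ rises with $Y$ along an indifference curve, $M$ rises with $Y$ there, so the indifference curves of this example are upward-sloping and convex.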
An important question is how the MRS varies with the skill level. We expect that getting one more euro of earnings requires a smaller additional effort for a more skilled worker. Therefore, more skilled workers should require less compensation, and one should have:

$$\text{For all } C, Y, w: \qquad M'_w(C, Y, w) < 0 \tag{1.5}$$
From a geometric viewpoint, this assumption implies that, between two indifference curves associated with two skill levels $w_L < w_H$ that intersect at a given bundle $(C, Y)$, the indifference curve of the less skilled worker is always steeper than that of the more skilled worker (see Figure 1.1). This is the reason why this assumption is often referred to as the strict single-crossing condition. It is also referred to as the Spence-Mirrlees condition.
Which specifications of individuals' utility function $U(\cdot,\cdot)$ are consistent with the Spence-Mirrlees condition (1.5)? A first example is the case where $U(\cdot,\cdot)$ is additively separable, so $U(C, L) = u(C) - v(L)$, with $u'(C) > 0 \geq u''(C)$, $v'(L) > 0$ and $v''(L) > 0$. Equation (1.3) then implies:

$$M(C, Y, w) = \frac{v'(Y/w)}{w \cdot u'(C)}$$

The convexity of $v(\cdot)$ then ensures that (1.5) is satisfied. One can also verify that whenever consumption is a normal good, (1.5) is also verified. Hence, while (1.5) is clearly a restriction on preferences, it does not seem to be a strong assumption to make. Therefore we henceforth adopt this assumption.

[2] Using (1.4), the derivative of $M(\Gamma(U, Y/w), Y, w)$ with respect to $Y$ equals
$$-\frac{1}{w^2} \left(U'_C\right)^{-3} \left\{ \left(U'_L\right)^2 U''_{CC} - 2\, U'_C U'_L U''_{CL} + \left(U'_C\right)^2 U''_{LL} \right\}.$$
The term in braces is negative by concavity of $U(\cdot,\cdot)$, so the derivative is positive.
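The separable example can be spot-checked numerically. The sketch below assumes $u(C) = \ln C$ and $v(L) = L^{1+1/e}/(1+1/e)$ with $e = 0.5$; these functional forms and the helper names `mrs` and `single_crossing_holds` are illustrative assumptions, and a check on a handful of bundles is of course not a proof of (1.5).

```python
def mrs(c, y, w, e=0.5):
    """M(C, Y, w) = v'(Y/w) / (w * u'(C)) for the assumed forms
    u(C) = log(C) and v(L) = L**(1 + 1/e) / (1 + 1/e)."""
    marginal_disutility = (y / w) ** (1.0 / e)   # v'(Y/w)
    marginal_utility = 1.0 / c                   # u'(C)
    return marginal_disutility / (w * marginal_utility)

def single_crossing_holds(c, y, skills):
    """True if M(C, Y, w) strictly decreases along an increasing list of skills."""
    slopes = [mrs(c, y, w) for w in skills]
    return all(a > b for a, b in zip(slopes, slopes[1:]))

# Hypothetical skill levels and bundles (C, Y), for illustration only.
skills = [0.5, 1.0, 2.0, 4.0]
bundles = [(c, y) for c in (0.5, 1.0, 3.0) for y in (0.2, 1.0, 5.0)]
print(all(single_crossing_holds(c, y, skills) for c, y in bundles))
```

At any common bundle, the slope of the indifference curve falls as the skill level rises, which is the geometric content of the single-crossing condition.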
I.2 The government’s preferences
The problem of redistribution arises because of two ingredients. First, individuals are different. Second, in one way or another, the “government” considers the induced inequality as “unfair”. There are nevertheless several problems that we have to deal with to give a precise meaning to this.
First, at best, we can rationalize some ordinal preferences in an axiomatic way. For instance, under the axioms of von Neumann and Morgenstern, one can represent preferences by one utility function among a class that is invariant to increasing linear transformations. We have however no non-arbitrary way to choose one utility function among all of these. Given this problem, the issue of comparing the utility levels reached by different individuals is even worse. However, the problem of redistribution is simply meaningless if we do not admit the existence of one cardinal representation of individuals' utility that is furthermore comparable across individuals of different types. Although this assumption is a very strong one, we can do nothing other than work with it.
Second, even if individuals' utility levels are comparable, there are different ways to quantify these comparisons. Let us assume that, due to some desire for horizontal equity, different individuals of the same skill level are treated identically. Then individuals' utility can be aggregated in different ways.
1. One may think a good representation is simply to sum individuals' levels of utility:

$$\int_\Omega U(C_w, L_w) \, dF(w) \tag{1.6}$$

I will call this approach the Benthamite one. Its drawback is that it ignores the inequality in utility levels.

2. At the other extreme, one may want to consider only the well-being of the worst-off in society. This is the maximin criterion

$$\min_{w} \; U(C_w, L_w) \tag{1.7}$$

that many economists associate with John Rawls' (1971) theory of justice.

3. In general, I will consider the sum of some increasing and concave transformation $\Phi(\cdot)$ of the individuals' utility levels

$$\int_\Omega \Phi\left(U(C_w, L_w)\right) dF(w) \tag{1.8}$$

with $\Phi'(\cdot) > 0$ and $\Phi''(\cdot) < 0$. The concavity of $\Phi(\cdot)$ then captures the government's aversion towards inequality.
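The three criteria can be compared on a toy example. The sketch below is illustrative only: it assumes a discrete population with two equally weighted groups, hypothetical utility levels, and $\Phi = \sqrt{\cdot}$ as one possible concave transformation; the function names are mine, not the notes'.

```python
import math

def benthamite(utilities, weights):
    """Criterion (1.6): population-weighted sum of utility levels."""
    return sum(p * u for u, p in zip(utilities, weights))

def maximin(utilities):
    """Criterion (1.7): utility of the worst-off individual."""
    return min(utilities)

def generalized(utilities, weights, phi):
    """Criterion (1.8): weighted sum of a concave transformation phi of utilities."""
    return sum(p * phi(u) for u, p in zip(utilities, weights))

weights = [0.5, 0.5]       # hypothetical population shares
equal = [2.0, 2.0]         # hypothetical utility levels, equal across groups
unequal = [1.0, 3.0]       # same average utility, but unequally spread
phi = math.sqrt            # assumed concave transformation (positive utilities only)

for name, profile in (("equal", equal), ("unequal", unequal)):
    print(name,
          benthamite(profile, weights),
          maximin(profile),
          generalized(profile, weights, phi))
```

The two profiles have the same sum of utilities, so the Benthamite criterion is indifferent between them, whereas the maximin criterion and the concave transformation both rank the equal profile higher.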
What this discussion suggests is that choosing a specific social welfare function is always an arbitrary exercise. In a sense, this is reassuring since it leaves room for value judgements and political controversy in the social debate. Hence, we have to be clear that the notion of any “social optimum” is always based on the value judgements that are behind the choice of a social welfare function.
Harsanyi (1955) has nevertheless proposed a rather nice story, whose application to the problem of direct taxation is the following. Let us assume that individuals are born identical. Then, they draw at random a skill level $w$ (according to the distribution $F(\cdot)$). Therefore, apart from their heterogeneous skill levels, they are identical. One can consider their utility functions to be identical, and thereby comparable. Moreover, one can retain as a social welfare function the common expected utility of individuals before they draw their productivity level. The Benthamite criterion therefore coincides with the maximization of agents' expected utility “behind the veil of ignorance” concerning their skill level $w$.
Finally, what matters at the end of the day is not the functions $\Phi(\cdot)$ and $U(\cdot,\cdot)$ by themselves, but the composition $\Phi \circ U$. Therefore, many scholars assume a Benthamite criterion and consider that the government's aversion towards inequality is already included in the concavity of $U(\cdot,\cdot)$. This argument is however only correct provided that the objective is not maximin. Moreover, this way of proceeding hides the specific role of the government's taste for redistribution. The latter is specifically linked to the concavity of $\Phi(\cdot)$, and not to that of $U(\cdot,\cdot)$. Finally, even if we ignore involuntary unemployment in this chapter, its existence breaks this equivalence. For instance, if $U(\cdot,\cdot)$ is linear in $C$ and $\Phi(\cdot)$ is concave, this would mean that individuals are risk-neutral about the unemployment risk, whereas the government negatively values the inequality induced by unemployment.
The government faces a budget constraint. Let $E \geq 0$ be an exogenous amount of public expenditures to finance. The government's budget constraint writes:

$$\int_\Omega \left\{ w L_w - C_w \right\} dF(w) \geq E \tag{1.9}$$

since an individual of skill $w$ pays taxes $Y_w - C_w$. We can re-express this constraint as a resource constraint, in the vein of Walras' law:

$$\int_\Omega C_w \, dF(w) + E \leq \int_\Omega Y_w \, dF(w) \tag{1.10}$$
II The First-Best Problem
Let us first consider the case where the government can freely control individuals' behavior. The first-best problem of redistribution then consists in choosing bundles $w \mapsto (Y_w, C_w)$ to maximize a social welfare function such as (1.8), subject to the resource constraint (1.10):

$$\max_{\{Y_w, C_w\}_{w \in \Omega}} \; \int_\Omega \Phi\left(U\left(C_w, \frac{Y_w}{w}\right)\right) dF(w) \tag{1.11}$$
$$\text{s.t.:} \quad \int_\Omega \left\{ Y_w - C_w \right\} dF(w) \geq E$$
Let $\lambda$ be the Lagrange multiplier of the budget constraint. $\lambda$ stands for the social marginal value of public funds. Then, denoting $\{(C^1_w, Y^1_w)\}_{w \in \Omega}$ the solution to this program, the first-order conditions for the first-best problem (1.11) write: [3]

$$\frac{\lambda}{\Phi'(U^1_w)} = U'_C\left(C^1_w, \frac{Y^1_w}{w}\right) = -\frac{U'_L\left(C^1_w, \frac{Y^1_w}{w}\right)}{w} \tag{1.12}$$

where obviously $U^1_w = U\left(C^1_w, Y^1_w / w\right)$ and $L^1_w = Y^1_w / w$.
We now investigate how the government can induce individuals to behave in such a way that the equilibrium allocation $(C_w, Y_w)$ defined by (1.1) and the resource constraint (1.10) coincides with the first-best optimum $\{(C^1_w, Y^1_w)\}_{w \in \Omega}$ defined by (1.12).
If the government is perfectly informed about individuals' productivity, it can implement skill-specific tax schedules $T_w(\cdot) : Y \mapsto T_w(Y)$. Now, consider a specific skill level $w$ and assume for simplicity that the skill-specific tax schedule is linear. One can assume it takes the form:

$$T_w(Y) = Y^1_w - C^1_w + \tau_w \left(Y - Y^1_w\right)$$
In other words, individuals of skill $w$ face a constant marginal tax rate $\tau_w$, and the schedule ensures that they get the consumption level $C^1_w$ if they work $L^1_w$, so that their earnings are $Y^1_w$. Because the utility function $U(\cdot,\cdot)$ is concave and the tax function is linear, the program (1.1) of an individual of skill $w$ is well-behaved. In particular, the concavity of $U(\cdot,\cdot)$ ensures there exists a unique solution $Y_w$. Hence the equilibrium coincides with the first-best allocation if and only if the first-order condition (1.2) of the individual's program (1.1) coincides with the first-order conditions (1.12) of the first-best program (1.11). Since (1.12) implies:

$$1 = -\frac{U'_L\left(C^1_w, \frac{Y^1_w}{w}\right)}{w \cdot U'_C\left(C^1_w, \frac{Y^1_w}{w}\right)}$$

the first-best allocation is decentralized if and only if $\tau_w = 0$. Hence, the first-best is decentralized by a set of transfers that vary with the exogenous skill level $w$, but not with the endogenous earnings level $Y$. In other words, differentiated lump-sum transfers are required. This result is an application to the present context of the second theorem of welfare economics. The intuition is straightforward. When $\tau_w = 0$, the transfer does not depend on earnings and therefore does not affect the marginal return to effort. Doing so does not introduce any distortion in the labor supply choice. However, since the intercept of the tax schedule is conditioned on the skill level, the government can transfer as much income as it wishes among agents without any labor supply distortion.

[3] We here consider only interior solutions for which $Y_w > 0$ and $C_w > 0$.
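A small worked example may help fix ideas. The sketch below is not from the notes: it assumes $U(C, L) = \ln C - L^2/2$, a Benthamite planner ($\Phi$ linear) and a discrete skill distribution; the function name `first_best` and the numbers are hypothetical. Under these assumptions, (1.12) gives $\lambda = 1/C_w = L_w / w$, so that $C_w = 1/\lambda$ and $Y_w = \lambda w^2$, and the resource constraint (1.10) at equality pins down $\lambda$ as the positive root of $\lambda^2 \sum_w \pi_w w^2 - E \lambda - 1 = 0$.

```python
import math

def first_best(skills, weights, spending=0.0):
    """First-best allocation for the assumed utility U = log(C) - L**2 / 2
    and a Benthamite planner. Condition (1.12) gives lam = 1/C_w = L_w / w;
    the resource constraint at equality determines lam."""
    m2 = sum(p * w**2 for w, p in zip(skills, weights))
    lam = (spending + math.sqrt(spending**2 + 4.0 * m2)) / (2.0 * m2)
    rows = []
    for w in skills:
        c, y = 1.0 / lam, lam * w**2
        rows.append({"w": w, "C": c, "Y": y,
                     "lump_sum_tax": y - c,   # skill-specific transfer, tau_w = 0
                     "U": math.log(c) - 0.5 * (y / w) ** 2})
    return lam, rows

# Hypothetical two-skill economy with no public spending.
lam, rows = first_best([1.0, 2.0], [0.5, 0.5], spending=0.0)
for row in rows:
    print(row)
```

In this particular example, consumption is equalized across skills while more skilled workers work more, so the lump-sum tax $Y_w - C_w$ rises with $w$ and utility falls with $w$. This is a feature of the assumed preferences and criterion, not a general statement from the notes.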
III The Taxation Principle
This first-best optimum needs differentiated lump-sum transfers to be decentralized. In particular, this requires that the government can condition the tax schedule on skill levels. We have however many reasons to think that, in the real world, such conditioning is not feasible.
1. The first reason is that the government does not observe skill levels. It is stricto sensu an informational constraint on the government's ability to intervene.
2. However, one might argue that many determinants of workers' productivity are observable. For instance, education or work experience is typically observed. So many scholars have argued that taxation should be conditioned not only on earnings but on all the individual characteristics that the government observes (Akerlof 1978). Following this road, Alesina et al. (2008) defend the idea that taxation should differ for men and women (gender-based taxation), whereas Mankiw and Weinzierl (2008) indicate that there could be potential gains from conditioning taxes on individuals' height. In the same vein, one may argue that taxation should also be conditioned on other exogenous characteristics that are correlated with a worker's skill. However, doing so hurts the conception of horizontal equity. Hence, even if some information that is correlated with workers' productivity is available, using this information is typically not allowed by “constitutions”.
Therefore, we now simply assume that taxation can only be conditioned on earnings $Y$ and not on skill levels $w$. However, doing so implies that redistributive taxation has to be distortionary. To get the intuition, consider the case where more productive workers get higher earnings and where the government wishes to transfer income from high skilled to low skilled workers. This means that the government wishes to make the function $w \mapsto T(Y_w)$ increasing. However, in doing so, marginal tax rates $T'(Y_w)$ become positive, which reduces the marginal return to earnings in terms of consumption. So labor supply is distorted downwards. Hence, it is not possible to make tax liabilities increase with the skill level without discouraging work effort. The informational constraints are thus essential to include in the analysis if one wishes to consider the distortions induced by taxation in the design of the optimal redistributive policy.
We now wonder how to incorporate this informational constraint in the redistributive program. Let $T(\cdot)$ be the tax function that depends on earnings only. Whatever their skill level, all individuals face the budget constraint $C = Y - T(Y)$. Hence, an individual of skill level $w$ solves

$$\max_{Y} \; U\left(Y - T(Y), \, \frac{Y}{w}\right) \tag{1.13}$$

The key difference with Program (1.1) is that now, all individuals face the same tax schedule, whatever their skill level. Let $Y_w$ be a solution to this program and let $L_w = Y_w / w$, $C_w = Y_w - T(Y_w)$ and $U_w = U(C_w, L_w)$. Then let $x \neq w$ be another skill level. Because $Y_w$ solves (1.13) and $C_x = Y_x - T(Y_x)$, one has that

$$\forall (w, x) \in \Omega^2 \qquad U\left(C_w, \frac{Y_w}{w}\right) \geq U\left(C_x, \frac{Y_x}{w}\right) \tag{1.14}$$
This constraint means that, when making their labor supply decision, workers of skill $w$ prefer the bundle $(C_w, Y_w)$ designed for them rather than the bundle $(C_x, Y_x)$ designed for workers of any other skill level $x$.
The assumption that the government can condition taxation on earnings only and not on skill level implies that the allocations $w \mapsto (C_w, Y_w)$ that can be reached by the government have to be consistent with constraints (1.14). Therefore, one should add these restrictions to the government's problem (1.11). One may then wonder whether adding these restrictions is sufficient to fully characterize the set of allocations that can be obtained by a government. The Taxation principle (Hammond 1979, Rochet 1985 and Guesnerie 1995) answers this question positively.
More precisely, it ensures that an unsophisticated government which can only implement income taxation $Y \mapsto T(Y)$ has the same possibilities as a sophisticated government that can implement a direct truthful mechanism $w \mapsto (C_w, Y_w)$ subject to (1.14). We have already seen that for any income tax schedule $Y \mapsto T(Y)$, the induced allocation $w \mapsto (C_w, Y_w)$ verifies (1.14). The taxation principle ensures the converse. Let $w \mapsto (C_w, Y_w)$ be an allocation that verifies (1.14). Then one can build an income tax schedule $Y \mapsto T(Y)$ such that any individual of skill $w$ confronted with it effectively chooses the earnings level $Y_w$ and the consumption level $C_w = Y_w - T(Y_w)$ designed for her skill level.

To show this converse, let $w \mapsto (C_w, Y_w)$ be an allocation that verifies (1.14) and let $\mathcal{Y} = \{Y \text{ such that there exists } w \in \Omega \text{ for which } Y = Y_w\}$. We now build step by step a tax function $T : Y \mapsto T(Y)$ such that for all $w \in \Omega$, $C_w = Y_w - T(Y_w)$ and $Y_w$ solves (1.13).

1. Let $Y \in \mathcal{Y}$.
(a) If there exists a single skill level $w \in \Omega$ for which $Y = Y_w$, then one must simply have $T(Y_w) = Y_w - C_w$.
(b) Consider now the case where there exists $(w, x) \in \Omega^2$ with $w \neq x$ such that $Y = Y_w = Y_x$. Then applying (1.14), one has
$$U\left(C_w, \frac{Y}{w}\right) \geq U\left(C_x, \frac{Y}{w}\right)$$
so one must have $C_w \geq C_x$. Symmetrically, inverting the roles of $w$ and $x$ in (1.14), one has that
$$U\left(C_x, \frac{Y}{x}\right) \geq U\left(C_w, \frac{Y}{x}\right)$$
which is only possible if $C_x \geq C_w$. Hence one must have $C_w = C_x$ if $Y_w = Y_x$. We can therefore define $T(Y) = Y - C_w$ without any ambiguity.
2. If $Y \notin \mathcal{Y}$, we define $T(Y) = +\infty$.
Given such a tax function, we now have to verify that for any skill level $w \in \Omega$, $Y_w$ solves (1.13). Choosing an earnings level $Y \notin \mathcal{Y}$ implies $T(Y) = +\infty$, so such a choice is suboptimal. Now, choosing $Y \in \mathcal{Y}$ with $Y \neq Y_w$ amounts to choosing a skill level $x$ such that $Y = Y_x$. However, the incentive constraints (1.14) and $C_t = Y_t - T(Y_t)$ for $t = w, x$ then imply that

$$U\left(Y_w - T(Y_w), \frac{Y_w}{w}\right) \geq U\left(Y - T(Y), \frac{Y}{w}\right) = U\left(C_x, \frac{Y_x}{w}\right)$$

which ends the proof that $Y_w$ is a solution to (1.13). It is worth noting that this proof only uses the assumptions that utility $U(\cdot, \cdot / w)$ increases in consumption and decreases in earnings. In particular, the Spence-Mirrlees assumption is not needed here.
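The construction used in this proof can be sketched in a few lines for a finite set of skills. Everything below is illustrative: the utility $U(C, Y/w) = \ln C - (Y/w)^2/2$, the two-skill allocation and the helper names are assumptions of mine, not objects defined in the notes.

```python
import math

def utility(c, y, w):
    """Assumed utility U(C, Y/w) = log(C) - (Y/w)**2 / 2."""
    return math.log(c) - 0.5 * (y / w) ** 2

def is_incentive_compatible(allocation):
    """Check constraints (1.14): each skill w weakly prefers its own bundle."""
    return all(utility(cw, yw, w) >= utility(cx, yx, w)
               for w, (cw, yw) in allocation.items()
               for (cx, yx) in allocation.values())

def build_tax(allocation):
    """Tax function of the proof: T(Y) = Y - C on assigned earnings levels,
    and +infinity at every other earnings level."""
    table = {y: y - c for (c, y) in allocation.values()}
    return lambda y: table.get(y, math.inf)

def chosen_earnings(w, tax, candidates):
    """Earnings a worker of skill w picks among the candidate levels.
    Levels with an infinite tax are never optimal, so they are dropped first."""
    feasible = [y for y in candidates if math.isfinite(tax(y))]
    return max(feasible, key=lambda y: utility(y - tax(y), y, w))

# Hypothetical allocation {skill: (C, Y)}; the numbers are illustrative only.
allocation = {1.0: (1.0, 0.8), 2.0: (2.2, 2.5)}
tax = build_tax(allocation)
candidates = [0.5, 0.8, 1.5, 2.5, 3.0]
print(is_incentive_compatible(allocation))
print({w: chosen_earnings(w, tax, candidates) for w in allocation})
```

Facing the constructed schedule, each skill level picks the earnings level designed for it, as the proof asserts for any allocation satisfying (1.14).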
Finally, it is important to note that the restriction of the function $Y \mapsto Y - T(Y)$ to $\mathcal{Y}$ has to be increasing. To show this statement, let $x$ and $w$ be two skill levels such that $Y_w < Y_x$. If we assume by contradiction that $C_x \leq C_w$, then one would have $U\left(C_w, \frac{Y_w}{x}\right) > U\left(C_w, \frac{Y_x}{x}\right) \geq U\left(C_x, \frac{Y_x}{x}\right)$, so a worker of skill $x$ would strictly prefer the bundle $(C_w, Y_w)$ to her own and the incentive constraint (1.14) would be violated. So $Y_w < Y_x$ implies $C_w < C_x$. This statement implies that whenever the tax function is differentiable, the marginal tax rate $T'(Y)$ cannot exceed one at any earnings level that is actually chosen. Moreover, it can equal 1 only pointwise. [4] It means in particular that if $T'(Y) \geq 1$ for some earnings level $Y$, then $Y$ will not be chosen by any worker, whatever her skill level. Moreover, this result has nothing to do with optimized tax schedules. It is a property of any incentive-compatible allocation.

[4] Recall that an increasing and differentiable function may have a derivative that equals 0 pointwise, as illustrated by the function $x \mapsto x^3$.
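In sum, the redistributive problem consists in solving

$$\max_{\{Y_w, C_w\}_{w \in \Omega}} \; \int_\Omega \Phi\left(U\left(C_w, \frac{Y_w}{w}\right)\right) dF(w) \tag{1.15}$$
$$\text{s.t.:} \quad \int_\Omega \left\{ Y_w - C_w \right\} dF(w) \geq E$$
$$\forall (w, x) \in \Omega^2 \qquad U\left(C_w, \frac{Y_w}{w}\right) \geq U\left(C_x, \frac{Y_x}{w}\right)$$

A solution to this program corresponds to an optimal (second-best) allocation. From an optimal allocation $w \mapsto (C_w, Y_w)$, we now know how to retrieve a tax function $Y \mapsto T(Y)$ that decentralizes it.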
IV The Stiglitz (1982) model
Program (1.15) is in general very complex. Stiglitz (1982) has proposed to focus on a version of this problem with only two skill levels $0 < w_L < w_H$. In other words, the support $\Omega$ of the distribution $F(\cdot)$ is reduced to two mass points at $w_L$ and $w_H$. This restriction on the skill distribution has proved to be very fruitful in understanding how the informational constraints (1.14) affect the optimal allocations.
Moreover, Stiglitz has proposed to describe the full set of second-best Pareto optima, instead of restricting attention to the optimal allocation according to a specific social welfare function. Doing so avoids deriving results that depend on the specific “value judgements” that determine a specific social welfare function. Let $\pi_L$ and $\pi_H$ be the masses of workers of skill $L$ and $H$, with $0 < \pi_L, \pi_H < 1$ and $\pi_L + \pi_H = 1$. The set of second-best Pareto optima is described as the set of solutions of

$$\max_{Y_H, C_H, Y_L, C_L} \; \Phi\left(U\left(C_L, \frac{Y_L}{w_L}\right)\right)$$
$$\text{s.t.:} \quad \pi_L \left\{ Y_L - C_L \right\} + \pi_H \left\{ Y_H - C_H \right\} \geq E$$
$$\Phi\left(U\left(C_H, \frac{Y_H}{w_H}\right)\right) \geq U_H$$
$$U\left(C_H, \frac{Y_H}{w_H}\right) \geq U\left(C_L, \frac{Y_L}{w_H}\right)$$
$$U\left(C_L, \frac{Y_L}{w_L}\right) \geq U\left(C_H, \frac{Y_H}{w_L}\right)$$
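when the parameter $U_H$ varies. Instead, we find it more fruitful to think of the dual program of maximizing tax revenues for given levels of utility $U_H$ and $U_L$. [5] Program (1.15) then becomes

$$\max_{Y_H, C_H, Y_L, C_L} \; \pi_L \left\{ Y_L - C_L \right\} + \pi_H \left\{ Y_H - C_H \right\} - E \tag{1.16a}$$
$$\text{s.t.:} \quad \Phi\left(U\left(C_L, \frac{Y_L}{w_L}\right)\right) \geq U_L \quad \text{and} \quad \Phi\left(U\left(C_H, \frac{Y_H}{w_H}\right)\right) \geq U_H \tag{1.16b}$$
$$U\left(C_H, \frac{Y_H}{w_H}\right) \geq U\left(C_L, \frac{Y_L}{w_H}\right) \tag{1.16c}$$
$$U\left(C_L, \frac{Y_L}{w_L}\right) \geq U\left(C_H, \frac{Y_H}{w_L}\right) \tag{1.16d}$$

[5] Program (1.16) is defined only for combinations of $(U_L, U_H)$ such that the solution to (1.16) induces tax revenues $\pi_L \{Y_L - C_L\} + \pi_H \{Y_H - C_H\}$ higher than $E$.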
In Program (1.16), constraints (1.16c) and (1.16d) are the restrictions that may generate distortions in the redistributive policy.
To better understand the working of the Stiglitz model, a geometric approach has proved to be useful. Indifference curves are depicted in Figure 1.1. They are increasing and convex. Following the Spence-Mirrlees assumption (1.5), the indifference curves of low skilled workers are everywhere steeper than those of high skilled workers. Figure 1.1 also displays one bundle $(C_L, Y_L)$ designed for low skilled workers and one bundle $(C_H, Y_H)$ for high skilled workers. To be consistent with (1.16c), the bundle $(C_L, Y_L)$ designed for low skilled workers must be dominated by the other bundle from the high skilled workers' viewpoint. Therefore, the bundle $(C_L, Y_L)$ has to be located below the high skilled workers' indifference curve that passes through the bundle $(C_H, Y_H)$. Symmetrically, to be consistent with (1.16d), the bundle $(C_H, Y_H)$ designed for high skilled workers has to be below the low skilled workers' indifference curve that passes through the bundle $(C_L, Y_L)$. This is the case in Figure 1.1.
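We can now turn to the resolution of the government's problem. Substituting $\Gamma(U_i, Y_i / w_i)$ for the consumption levels $C_i$, and denoting $\mu_H$ the Lagrange multiplier associated with the incentive constraint (1.16c) and $\mu_L$ the one associated with (1.16d), the Lagrangian of Problem (1.16) writes:

$$\mathcal{L}(U_H, U_L, Y_H, Y_L, \lambda, \mu_H, \mu_L) \equiv \pi_L \left[ Y_L - \Gamma\left(U_L, \frac{Y_L}{w_L}\right) \right] + \pi_H \left[ Y_H - \Gamma\left(U_H, \frac{Y_H}{w_H}\right) \right] - E$$
$$\qquad + \mu_H \left\{ U_H - U\left( \Gamma\left(U_L, \frac{Y_L}{w_L}\right), \frac{Y_L}{w_H} \right) \right\} + \mu_L \left\{ U_L - U\left( \Gamma\left(U_H, \frac{Y_H}{w_H}\right), \frac{Y_H}{w_L} \right) \right\}$$

Given (1.4), one has $\Gamma'_L(U_i, Y_i / w_i) = M(C_i, Y_i, w_i) \cdot w_i$, so the necessary conditions are:

$$1 - M(C_H, Y_H, w_H) = \frac{\mu_L \cdot U'_C\left(C_H, \frac{Y_H}{w_L}\right)}{\pi_H} \left[ M(C_H, Y_H, w_H) - M(C_H, Y_H, w_L) \right] \tag{1.17a}$$
$$1 - M(C_L, Y_L, w_L) = \frac{\mu_H \cdot U'_C\left(C_L, \frac{Y_L}{w_H}\right)}{\pi_L} \left[ M(C_L, Y_L, w_L) - M(C_L, Y_L, w_H) \right] \tag{1.17b}$$

where $\lambda \geq 0$, $\mu_H \geq 0$ and $\mu_L \geq 0$.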
Consider as a benchmark the unrealistic case where the government observes workers' skill levels, so that constraints (1.16c) and (1.16d) can be ignored (so $\mu_H = \mu_L = 0$ in (1.17a) and (1.17b)). The problem would then consist in choosing, for each skill level $i = H, L$, a bundle $(C_i, Y_i)$ that maximizes tax revenues $Y_i - C_i$ subject to $U_i = U(C_i, Y_i / w_i)$. The necessary condition implies $M(C_i, Y_i, w_i) = 1$. Graphically, this optimal bundle corresponds to the point of the indifference curve $U_i = U(C_i, Y_i / w_i)$ where the tangent is parallel to the 45° line. This earnings level is denoted $Y^*_i$ in Figures 1.2, 1.3 and 1.4. [6] The corresponding consumption level is $C^*_i = \Gamma(U_i, Y^*_i / w_i)$.
Let us now turn back to the case where the government does not observe skill levels, so that constraints (1.16c) and (1.16d) now matter. Three cases should then be distinguished, depending on the earnings level where the two indifference curves intersect. In the first case, this level is between $Y^*_L$ and $Y^*_H$ (see Figure 1.2). In the second case, the two indifference curves intersect at an earnings level below both $Y^*_L$ and $Y^*_H$ (see Figure 1.3), whereas in the third case, they intersect at an earnings level above both $Y^*_H$ and $Y^*_L$ (see Figure 1.4).
Case 1: First-best optimal taxation is fully revealing
If the indifference curves intersect at an earnings level that is between $Y^*_L$ and $Y^*_H$ (see Figure 1.2), then the allocation $\{(C^*_L, Y^*_L), (C^*_H, Y^*_H)\}$ verifies the incentive constraints (1.16c) and (1.16d), which are thus not binding. In such a configuration, the Lagrange multipliers $\mu_H$ and $\mu_L$ are zero in Equations (1.17a) and (1.17b), and the MRS equals 1 for both skill levels. Recalling from (1.2) the interpretation of $M(C_i, Y_i, w_i)$ as one minus the marginal tax rate at earnings level $Y_i$, marginal tax rates are zero. Therefore the first-best taxation is fully revealing. An example of such a second-best Pareto optimum is the laissez-faire allocation with no taxes.

[Figure 1.2: First-Best Taxation is fully revealing. Horizontal axis: gross earnings Y; vertical axis: consumption C. The figure shows the indifference curves UH = U(C, Y/wH) and UL = U(C, Y/wL), with YL = YL* and YH = YH* and the corresponding consumption levels CL and CH.]

[6] This is an abuse of notation. In the presence of income effects, $Y^*_i$ depends on the level of the promised utility $U_i$.

Case 2: The “normal” case
In the second case, depicted in Figure 1.3, the two indifference curves intersect at an earnings level that is lower than $Y^*_L$. In such a situation, absent constraint (1.16c), the bundle $(C^*_L, Y^*_L)$ would maximize the tax revenues of the government for a given level of low skilled workers' utility. However, this bundle violates (1.16c). High skilled workers then prefer to get gross earnings $Y^*_L$ and consumption $C^*_L$, rather than the bundle $(C^*_H, Y^*_H)$ that maximizes the tax revenues paid by the high skilled given the utility constraint $U(C, Y/w_H) = U_H$. To prevent this “mimicking”, the government has to reduce both earnings and consumption designed for low skilled workers along the low skilled workers' indifference curve. In doing so, the utility of low skilled workers remains identical. However, because of the single-crossing condition (1.5), the incentives for high skilled workers to choose the bundle designed for low skilled workers decrease. However, the tax revenues $Y_L - C_L$ paid by low skilled workers decrease too. So the reduction in earnings and consumption along the low skilled workers' indifference curve continues only until high skilled workers become indifferent between the two bundles. Therefore, the optimum corresponds to the intersection of the two indifference curves, implying that the incentive constraint (1.16c) is binding. Conversely, the incentive constraint (1.16d) remains slack. Hence, this configuration corresponds to the case where the Lagrange multipliers verify $\mu_H > 0 = \mu_L$. Equations (1.17a) and (1.17b) thus become:

$$1 - M(C_H, Y_H, w_H) = 0$$
$$1 - M(C_L, Y_L, w_L) = \frac{\mu_H \cdot U'_C\left(C_L, \frac{Y_L}{w_H}\right)}{\pi_L} \left[ M(C_L, Y_L, w_L) - M(C_L, Y_L, w_H) \right] > 0$$

High skilled workers' labor supply remains undistorted and $Y_H = Y^*_H$. Conversely, low skilled workers' labor supply is distorted downwards since $M(C_L, Y_L, w_L) < 1$. Recalling from (1.2) the interpretation of $M(C_i, Y_i, w_i)$ as one minus the marginal tax rate at earnings level $Y_i$, this implies that the marginal tax rate is zero for high skilled workers and positive for low skilled workers.
Stiglitz (1982) labelled this configuration the normal case. The following experiment gives a rationale for this denomination. If the utility $U_H$ granted to high skilled workers increases, the flatter indifference curve (that of high skilled workers) shifts upwards. Therefore, $Y_L$ can be raised without violating the incentive constraint (1.16c), $Y_H$ increases and the distortions of low skilled workers' labor supply are reduced. However, doing so implies that the tax revenues paid by low skilled workers increase (they are taxed more efficiently for an unchanged level of utility) whereas the tax revenues paid by high skilled workers are reduced (they are still taxed efficiently, but they pay less taxes since they get a higher utility). Hence, the rise in low skilled workers' labor supply comes along with a rise in inequality.

[Figure 1.3: The normal case. Horizontal axis: gross earnings Y; vertical axis: consumption C. The figure shows the indifference curves UH = U(C, Y/wH) and UL = U(C, Y/wL), with YL below YL* and YH = YH*, and the corresponding consumption levels CL and CH.]
Case 3: The antiredistributive case
The polar case where the two curves intersect at an earnings level above $Y_H^*$ is depicted in Figure 1.4. Then the allocation $(C_L^*, Y_L^*)$, $(C_H^*, Y_H^*)$ violates (1.16d): low skilled workers prefer the bundle $(C_H^*, Y_H^*)$ designed for high skilled workers to the bundle $(C_L^*, Y_L^*)$ designed for them. The government should therefore distort upwards the bundle designed for high skilled workers until (1.16d) is met with equality. Hence, this configuration corresponds to the case where the Lagrange multipliers verify $\mu_L > 0 = \mu_H$. Equations (1.17a) and (1.17b) thus become:
$$1 - M(C_H, Y_H, w_H) = \frac{\mu_L \cdot U'_C\left(C_H, \frac{Y_H}{w_L}\right)}{\pi_H} \left[ M(C_H, Y_H, w_H) - M(C_H, Y_H, w_L) \right]$$
$$1 - M(C_L, Y_L, w_L) = 0$$
Low skilled workers' labor supply is now undistorted and $Y_L = Y_L^*$. Conversely, high skilled workers' labor supply is distorted upwards since $M(C_H, Y_H, w_H) > 1$. Recalling from (1.2) the interpretation of $M(C_i, Y_i)$ as one minus the marginal tax rate at earnings level $i$, this implies that the marginal tax rate is nil for low skilled workers and negative for high skilled workers: one has a zero marginal tax rate at earnings $Y_L$ and a negative marginal tax rate at $Y_H$.
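The classification into cases rests on checking which incentive constraint a candidate pair of bundles violates. The following is a minimal sketch of that check, assuming an illustrative separable utility log(C) - (Y/w)^2/2 that is not taken from the text; the function names and the numerical bundles are hypothetical.

```python
import math


def utility(consumption, earnings, skill):
    """Illustrative separable utility (an assumption): log(C) - (Y/w)^2 / 2."""
    return math.log(consumption) - 0.5 * (earnings / skill) ** 2


def incentive_check(bundle_low, bundle_high, w_low, w_high, tol=1e-9):
    """Report whether each type weakly prefers its own (C, Y) bundle.

    bundle_low and bundle_high are (consumption, earnings) pairs.
    """
    c_l, y_l = bundle_low
    c_h, y_h = bundle_high
    # High skilled workers must not prefer the bundle designed for the low skilled.
    high_ok = utility(c_h, y_h, w_high) >= utility(c_l, y_l, w_high) - tol
    # Low skilled workers must not prefer the bundle designed for the high skilled.
    low_ok = utility(c_l, y_l, w_low) >= utility(c_h, y_h, w_low) - tol
    return {"high_type_ic": high_ok, "low_type_ic": low_ok}


# Equal consumption, much higher earnings for the high skilled:
# the high skilled would mimic the low skilled.
redistributive = incentive_check((1.0, 0.5), (1.0, 2.0), w_low=1.0, w_high=2.0)

# Much higher consumption for slightly higher earnings:
# the low skilled would mimic the high skilled, as in the antiredistributive case.
antiredistributive = incentive_check((1.0, 1.0), (5.0, 1.2), w_low=1.0, w_high=2.0)
```

In the second call only the low skilled workers' constraint fails, which is the configuration where the bundle of high skilled workers has to be distorted upwards.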
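Figure 1.4: The Anti-redistributive case (axes: earnings Y and consumption C; indifference curves UL = U(C, Y/wL) and UH = U(C, Y/wH); bundles (CL, YL) and (CH, YH) with YL = YL*; YH* is marked on the earnings axis)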
V Continuum of skills
The two-skill model of Stiglitz is a useful preliminary step to understand how the incentive constraints (1.14) distort the optimal allocations, and thereby the structure of second-best optima. However, it only gives information about the tax function at two earnings levels. We now consider problem (1.15) with a continuum of skill levels. Hence the distribution $F(.)$ of skill is supposed to be continuous on a connected support $[\underline{w}, \overline{w}]$, with $0 < \underline{w} < \overline{w} \le +\infty$. To ease the exposition, I restrict in all this Section the utility function to be additively separable and of the form:
$$U(C, L) \equiv u(C) - v(L) \quad \text{where} \quad u'(C) > 0 \ge u''(C), \quad v'(L) > 0 \quad \text{and} \quad v''(L) > 0$$
V.1 The incentive constraints
The first issue is technical. How to deal with constraints (1.14), that is with a double continuum of inequalities? Mirrlees (1971) has shown that under the Spence-Mirrlees condition (1.5), these constraints are equivalent to a differential equation and a monotonicity constraint. Let $U_w$ be the value function associated to the optimization program of workers of skill $w$, that is
$$U_w = \max_Y \; u(Y - T(Y)) - v\left(\frac{Y}{w}\right) = u(C_w) - v\left(\frac{Y_w}{w}\right) \tag{1.19}$$
The usual envelope argument applied to the latter program then suggests that
$$\dot{U}_w = \frac{Y_w}{w^2} \cdot v'\left(\frac{Y_w}{w}\right) > 0 \tag{1.20}$$
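The envelope condition (1.20) can be checked numerically on a case with a closed-form solution. The sketch below assumes a quasilinear utility $u(C) = C$, an isoelastic disutility $v(L) = L^{1+k}/(1+k)$ and a linear tax $T(Y) = tY - R$; these functional forms and all function names are illustrative assumptions, not part of the text.

```python
def earnings(w, t, k):
    """Earnings chosen by skill w: solves (1 - t) = v'(Y/w) / w with v'(L) = L**k."""
    return (1.0 - t) ** (1.0 / k) * w ** (1.0 + 1.0 / k)


def indirect_utility(w, t, R, k):
    """Value function U_w = C_w - v(Y_w / w) under the linear tax T(Y) = t*Y - R."""
    y = earnings(w, t, k)
    return (1.0 - t) * y + R - (y / w) ** (1.0 + k) / (1.0 + k)


def envelope_rhs(w, t, k):
    """Right-hand side of (1.20): (Y_w / w^2) * v'(Y_w / w)."""
    y = earnings(w, t, k)
    return (y / w ** 2) * (y / w) ** k


def numerical_slope(w, t, R, k, h=1e-6):
    """Central finite difference of w -> U_w."""
    up = indirect_utility(w + h, t, R, k)
    down = indirect_utility(w - h, t, R, k)
    return (up - down) / (2.0 * h)


slope = numerical_slope(2.0, 0.3, 1.0, 2.0)
rhs = envelope_rhs(2.0, 0.3, 2.0)
```

The two numbers agree up to the finite-difference error, although earnings themselves respond to the skill level: this is the sense in which the indirect effect through $Y_w$ vanishes in (1.20).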
It is then possible to show that whenever the Spence-Mirrlees condition (1.5) holds, the following two sets of allocations coincide: the set of allocations $w \mapsto (C_w, Y_w, U_w)$ that verify $U_w = U(C_w, Y_w/w)$ and (1.14), and the set of allocations $w \mapsto (C_w, Y_w, U_w)$ that verify $U_w = U(C_w, Y_w/w)$, (1.20) almost everywhere, and the requirements that $w \mapsto Y_w$ is nondecreasing and $w \mapsto U_w$ is continuous. This result is very useful because it allows us to solve program (1.15) with optimal control techniques, taking $U_w$ as the state variable and earnings $Y_w$ as the control.
Let $w \mapsto (C_w, Y_w, U_w)$ be an allocation that verifies $U_w = U(C_w, Y_w/w)$ and (1.14). We will first show that $w \mapsto Y_w$ is nondecreasing. Then we will show that $w \mapsto U_w$ is continuous and finally, that (1.20) holds almost everywhere.
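1. Since for any $x$, one has $U_x = u(C_x) - v(Y_x/x)$, the incentive constraint (1.14) implies
$$U_w \ge u(C_x) - v\left(\frac{Y_x}{w}\right) = U_x + v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right)$$
Inverting the roles of $x$ and $w$, one obtains that
$$v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right) \le U_w - U_x \le v\left(\frac{Y_w}{x}\right) - v\left(\frac{Y_w}{w}\right) \tag{1.21}$$
Now, assume without loss of generality that $x < w$. Then the inequality between the two extremities implies (see footnote 7):
$$\int_x^w \left\{ \frac{Y_w}{t^2} v'\left(\frac{Y_w}{t}\right) - \frac{Y_x}{t^2} v'\left(\frac{Y_x}{t}\right) \right\} dt \ge 0$$
By the convexity of $v(.)$, $Y \mapsto (Y/t^2)\, v'(Y/t)$ is increasing. Therefore, the last inequality implies $Y_w \ge Y_x$, which ends the proof that $w \mapsto Y_w$ has to be nondecreasing (see footnote 8).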
2. To show the continuity of $w \mapsto U_w$, take a skill level $w$. Both extremes of (1.21) tend to 0 as $x$ tends to $w$. Therefore $U_x$ tends to $U_w$ as $x$ tends to $w$, which ensures the continuity at $w$ of $w \mapsto U_w$.
3. We now turn to the differentiability and the derivative of $w \mapsto U_w$. We here use the mathematical result that any nondecreasing function is continuous everywhere except on a set that is at worst countable. Accordingly, $w \mapsto Y_w$ is henceforth said to be “almost everywhere” continuous. Now, let $w$ be a skill level at which $w \mapsto Y_w$ is continuous. Then for $x < w$, (1.21) implies
$$\frac{v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right)}{w - x} \le \frac{U_w - U_x}{w - x} \le \frac{v\left(\frac{Y_w}{x}\right) - v\left(\frac{Y_w}{w}\right)}{w - x}$$
By continuity at $w$ of $w \mapsto Y_w$, both extremes tend to $(Y_w/w^2)\, v'(Y_w/w)$ as $x$ tends to $w$. So $w \mapsto U_w$ admits a left-derivative at $w$ that equals $(Y_w/w^2)\, v'(Y_w/w)$. When $x > w$, a symmetric reasoning shows that $w \mapsto U_w$ admits a right-derivative at $w$ that equals $(Y_w/w^2)\, v'(Y_w/w)$ too. Hence, (1.20) holds almost everywhere.
7 We use here that $v\left(\frac{Y}{x}\right) - v\left(\frac{Y}{w}\right) = \int_x^w \frac{Y}{t^2}\, v'\left(\frac{Y}{t}\right) dt$.
8 It is worth noting here that we did not use any assumption about the support of the skill levels distribution. Hence the proof can be translated to the Stiglitz case to show that one must have $Y_H \ge Y_L$.
We now verify the converse. Let $w \mapsto (C_w, Y_w, U_w)$ be an allocation such that $U_w = U(C_w, Y_w/w)$, $w \mapsto Y_w$ is nondecreasing, $w \mapsto U_w$ is continuous and (1.20) holds almost everywhere. We have to verify that such an allocation verifies (1.14). Take two skill levels $w$ and $x$. By the continuity of $w \mapsto U_w$ and the fact that (1.20) holds almost everywhere, we have that
$$U_w - U_x = \int_x^w \frac{Y_t}{t^2}\, v'\left(\frac{Y_t}{t}\right) dt$$
Now since $v(.)$ is increasing and convex, for any $t$ the function $Y \mapsto (Y/t^2)\, v'(Y/t)$ is increasing. If $x < w$ (resp. $x > w$), then for all $t \in (x, w)$ (resp. $t \in (w, x)$) one has $(Y_t/t^2)\, v'(Y_t/t) \ge (Y_x/t^2)\, v'(Y_x/t)$ (resp. $\le$), since $w \mapsto Y_w$ is nondecreasing. In both cases (when $x > w$, the reversed inequality between the integrands is offset by the reversed orientation of the integral), one has:
$$\int_x^w \frac{Y_t}{t^2}\, v'\left(\frac{Y_t}{t}\right) dt \ge \int_x^w \frac{Y_x}{t^2}\, v'\left(\frac{Y_x}{t}\right) dt$$
Therefore, one has
$$U_w - U_x \ge \int_x^w \frac{Y_x}{t^2}\, v'\left(\frac{Y_x}{t}\right) dt$$
Integrating the right-hand side, which equals $v(Y_x/x) - v(Y_x/w)$, and using $U_x = u(C_x) - v(Y_x/x)$ gives (1.14).
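Hence, using that $C_w = \Gamma(U_w, Y_w/w)$, the second-best problem (1.15) can be rewritten as
$$\max_{\{Y_w, U_w\}_{w \in \Omega}} \int_\Omega \Phi(U_w)\, dF(w) \quad \text{s.t.:} \quad w \mapsto Y_w \text{ is nondecreasing} \tag{1.22}$$
$$\int_\Omega \left\{ Y_w - \Gamma\left(U_w, \frac{Y_w}{w}\right) \right\} dF(w) \ge E \qquad (\lambda)$$
$$\dot{U}_w = \frac{Y_w}{w^2}\, v'\left(\frac{Y_w}{w}\right) \qquad (q_w)$$
where the multiplier or co-state variable attached to each constraint is indicated in parentheses.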
V.2 The resolution
We now present the resolution of Program (1.15). The idea is to take $Y_w$ as the control variable, $U_w$ as the state variable, and to define the Hamiltonian as
$$H(Y, U, w, q, \lambda) = \left\{ \Phi(U) + \lambda \left[ Y - \Gamma\left(U, \frac{Y}{w}\right) \right] \right\} f(w) + q\, \frac{Y}{w^2}\, v'\left(\frac{Y}{w}\right) \tag{1.23}$$
where we have assumed that the distribution of skill has no mass points and admits a continuous density $f(.)$. The co-state variable is denoted $q$ and $\lambda$ is the Lagrange multiplier associated to the government's budget constraint. Then one can apply the optimal control tools to get a set of necessary conditions that the second-best optimal allocation has to verify. In doing this, several difficulties may appear, that we should be aware of.
The first difficulty is the treatment of the monotonicity constraint on $w \mapsto Y_w$. This constraint makes it difficult to consider $Y_w$ as a “true” control variable. If the monotonicity constraint is binding on an interval $[w_1, w_2]$, then $Y_w$ is constant over this interval, and the same holds for consumption. Hence, the same bundle $(C_w, Y_w)$ is offered to a bunch of skills. It is then said that bunching (of types) occurs over $[w_1, w_2]$. An example of such bunching is suggested in Figure 1.5. There, the tax function $Y \mapsto T(Y)$ has a kink at earnings level $Y$ with a sudden increase in the marginal tax rate. Therefore, the function $Y \mapsto Y - T(Y)$ has a “downward” kink at earnings level $Y$. Workers of skill $w_1$ (resp. $w_2$) have an indifference curve that is tangent to $Y \mapsto Y - T(Y)$ just before (resp. after) the kink. Workers of skill $w'$ between $w_1$ and $w_2$ face a too low marginal tax rate just before earnings $Y$, which gives them an incentive to work more than $Y$, whereas they face a too high marginal tax rate just after the kink, which gives them an incentive to work less than $Y$. Consequently, they are “stuck” at exactly the earnings level $Y$ and bunching occurs between $w_1$ and $w_2$.
Figure 1.5: An example of bunching (axes: earnings Y and consumption C; budget curve C = Y – T(Y) with a kink; indifference curves U1 = U(C, Y/w1), U' = U(C, Y/w') and U2 = U(C, Y/w2), with w1 < w' < w2)
The easiest and classical way to deal with bunching is to solve a “relaxed” version of Program (1.15) where the monotonicity constraint is ignored. Then, one has to verify (analytically, or numerically through simulations) that the solution to the relaxed program verifies the monotonicity constraint, so that this allocation also solves the full program. This is the so-called first-order approach that we follow (see footnote 9).
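The numerical verification step of the first-order approach can be sketched as follows. The helper below is hypothetical: it takes a relaxed solution given on a grid of skills, which is assumed to be sorted in increasing order, and lists the grid intervals on which earnings decrease.

```python
def monotonicity_violations(skills, earnings, tol=1e-9):
    """Return the (w_i, w_{i+1}) grid intervals on which earnings strictly decrease.

    skills is assumed to be sorted in increasing order; earnings[i] is the
    earnings level assigned to skills[i] by the relaxed program.
    """
    if len(skills) != len(earnings):
        raise ValueError("skills and earnings must have the same length")
    violations = []
    for i in range(len(skills) - 1):
        # A drop in earnings between two consecutive skills breaks monotonicity.
        if earnings[i + 1] < earnings[i] - tol:
            violations.append((skills[i], skills[i + 1]))
    return violations


valid = monotonicity_violations([1, 2, 3], [1.0, 2.0, 3.0])
invalid = monotonicity_violations([1, 2, 3, 4], [1.0, 2.5, 2.0, 3.0])
```

An empty list means the relaxed solution also solves the full program on that grid; a non-empty list signals that bunching has to be handled explicitly.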
The second problem is the case where $w \mapsto Y_w$ is discontinuous. It is in a sense the opposite problem, with $w \mapsto Y_w$ increasing “infinitely rapidly” at some skill level. Figure 1.6 illustrates this possibility. The function $Y \mapsto Y - T(Y)$ becomes locally more convex than the indifference curve of workers of skill $w_2$. Consequently there are two tangency points at earnings levels $Y_2^L < Y_2^H$ and the function $w \mapsto Y_w$ “jumps” from $Y_2^L$ to $Y_2^H$ at skill level $w_2$.
Figure 1.6: An example of discontinuous allocation (axes: earnings Y and consumption C; budget curve C = Y – T(Y); indifference curves U1 = U(C, Y/w1), U2 = U(C, Y/w2) and U3 = U(C, Y/w3), with w1 < w2 < w3; the two tangency points of U2 are at the earnings levels Y2L and Y2H)
Surprisingly, the possibility of a discontinuity of the optimal allocation has received less attention in the literature than bunching. The usual attitude consists in simply ignoring that possibility. From a technical viewpoint, the necessary conditions derived from optimal control techniques are only available if there is a finite number of discontinuity points. However, since $w \mapsto Y_w$ is nondecreasing, the set of points of discontinuity is at worst countable, thereby of zero measure (since we have assumed no mass points in the skill distribution). We therefore assume that this set is finite (see footnote 10).
9 See Lollivier and Rochet (1983), Guesnerie and Laffont (1984), Ebert (1993) or Hellwig (2008) for alternative methods that consider the possibility of bunching.
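We can now apply optimal control. For all skill levels where $w \mapsto Y_w$ is continuous (that is “almost everywhere”), $w \mapsto U_w$ is differentiable and (1.20) holds. Moreover, the necessary conditions
$$0 = \frac{\partial H}{\partial Y}(Y_w, U_w, w, q_w, \lambda) \quad \text{and} \quad -\dot{q}_w = \frac{\partial H}{\partial U}(Y_w, U_w, w, q_w, \lambda)$$
hold, which imply, given (1.4) and (1.23):
$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \overset{a.e.}{=} -\frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right] \tag{1.24a}$$
$$-\dot{q}_w \overset{a.e.}{=} \left\{ \Phi'(U_w) - \frac{\lambda}{u'(C_w)} \right\} f(w) \tag{1.24b}$$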
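Finally, we know that $w \mapsto (q_w, U_w)$ is continuous and that the transversality conditions write $q_{\underline{w}} = q_{\overline{w}} = 0$. Integrating (1.24b) between $w$ and $\overline{w}$, Equation (1.24a) becomes:
$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \overset{a.e.}{=} \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2 \cdot f(w)} \int_w^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn \tag{1.25}$$
where the Lagrange multiplier associated to the budget constraint is determined by the transversality condition $q_{\underline{w}} = 0$ through:
$$\lambda \cdot \int_{\underline{w}}^{\overline{w}} \frac{1}{u'(C_n)}\, f(n)\, dn = \int_{\underline{w}}^{\overline{w}} \Phi'(U_n)\, f(n)\, dn \tag{1.26}$$
10 This is an assumption. For instance within the set R of real numbers, the subset Q of rational numbers is countable but dense within R.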
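The integration step behind (1.25) and (1.26) can be spelled out as follows. This only restates the argument of the text, writing $u$ for the utility of consumption and using the transversality conditions at both ends of the skill distribution.

```latex
\begin{align*}
q_{\overline{w}} - q_w
  &= \int_w^{\overline{w}} \dot{q}_n \, dn
   = -\int_w^{\overline{w}} \left\{ \Phi'(U_n) - \frac{\lambda}{u'(C_n)} \right\} f(n)\, dn
   && \text{by (1.24b)} \\
-\frac{q_w}{\lambda}
  &= \int_w^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn
   && \text{since } q_{\overline{w}} = 0 \\
0 &= \int_{\underline{w}}^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn
   && \text{evaluating at } w = \underline{w} \text{ with } q_{\underline{w}} = 0
\end{align*}
```

Substituting the second line into (1.24a) gives (1.25), and multiplying the third line by $\lambda$ gives (1.26).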
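How are these conditions modified in the presence of bunching? We here follow the approach suggested by Guesnerie and Laffont (1984). Assume there is bunching over a finite number $n$ of intervals denoted $[\underline{b}_i, \overline{b}_i]$, with $\underline{w} \le \underline{b}_1 < \overline{b}_1 < ... < \overline{b}_n \le \overline{w}$, and that $w \mapsto Y_w$ is continuous everywhere and differentiable everywhere except on a finite number of points (including $\{\underline{b}_i, \overline{b}_i\}_{i=1,...,n}$). Then one can define $c_w = \dot{Y}_w$ as the control variable, impose zero as a lower bound on $c_w$ and take $U_w$ and $Y_w$ as state variables. With $\zeta$ the co-state variable associated to $Y$, and $e$ the Lagrange multiplier associated to the inequality constraint $(\dot{Y} =)\, c \ge 0$, the Hamiltonian (1.23) becomes
$$H = \left\{ \Phi(U) + \lambda \left[ Y - \Gamma\left(U, \frac{Y}{w}\right) \right] \right\} f(w) + q\, \frac{Y}{w^2}\, v'\left(\frac{Y}{w}\right) + \zeta \cdot c + e \cdot c$$
Equation (1.24b) still holds whereas (1.24a) becomes:
$$-\dot{\zeta}_w = 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} + \frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right]$$
The optimality condition on $c_w (= \dot{Y}_w)$ simply writes $0 = \zeta_w + e_w$. Outside the intervals of bunching, the monotonicity constraint is not binding, so $e_w$ is nil, and so is the co-state variable $\zeta_w$. Therefore equations (1.24a) and (1.25) still hold outside bunching intervals. Conversely, $e_w > 0$ so $\zeta_w < 0$ inside bunching intervals, for $w \in (\underline{b}_i, \overline{b}_i)$. However, the monotonicity constraints are no longer binding at $\underline{b}_i$ and $\overline{b}_i$. Hence $\zeta_{\underline{b}_i} = \zeta_{\overline{b}_i} = 0$ and $\int_{\underline{b}_i}^{\overline{b}_i} \dot{\zeta}_w\, dw = 0$. Therefore one obtains
$$\int_{\underline{b}_i}^{\overline{b}_i} \left( 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right) dw = -\int_{\underline{b}_i}^{\overline{b}_i} \frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right] dw$$
so one has, instead of (1.25):
$$\int_{\underline{b}_i}^{\overline{b}_i} \left( 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right) dw = \int_{\underline{b}_i}^{\overline{b}_i} \left( \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2 \cdot f(w)} \int_w^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn \right) dw$$
In other words, one simply has to integrate (1.25) over the skills of a bunching interval.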
V.3 The reinterpretation of the optimality conditions
Equation (1.25) is not very intuitive. Following Saez (2001), we now reinterpret this optimality condition in terms of behavioral elasticities and derive it heuristically thanks to a tax perturbation. Consider a worker of skill $w$, choosing an earnings level $Y_w$ under the optimal tax schedule $Y \mapsto T(Y)$. Now, assume the tax function is subjected to two types of tax perturbation so that, in the neighborhood of $Y_w$, the tax function becomes $Y \mapsto T(Y) - \tau (Y - Y_w) - \rho$.
• On the one hand, there is a uniform decrease in the marginal tax rate in the neighborhood of $Y_w$. The size of this change is denoted $\tau$ in Figure 1.7. This elementary reform captures a compensated change in the marginal tax rate since, if the worker keeps her earnings choice at the initial value $Y_w$, the reform does not change the level of tax. It captures the substitution effect along the optimal tax schedule for workers of skill $w$.
• On the other hand, there is a uniform decrease in the level of tax in the neighborhood of $Y_w$. The size of this change is denoted $\rho$ in Figure 1.8. This elementary reform captures the income effect along the optimal tax schedule for workers of skill $w$.
Figure 1.7: An Elementary reform on the Marginal Tax Rate (axes: earnings Y and consumption C; budget curve C = Y – T(Y) rotated by τ around the bundle (Cw, Yw) over the interval from Yw - δY to Yw + δY)
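Consider then the behavior of an individual of skill $w$. She has to solve:
$$\max_Y \; u\left[ Y - T(Y) + \tau (Y - Y_w) + \rho \right] - v\left(\frac{Y}{w}\right)$$
The first-order condition writes $\mathcal{Y}(Y, \rho, \tau, w) = 0$, where we define:
$$\mathcal{Y}(Y, \rho, \tau, w) \equiv \left( 1 - T'(Y) + \tau \right) \cdot u'\left[ Y - T(Y) + \tau (Y - Y_w) + \rho \right] - \frac{1}{w}\, v'\left(\frac{Y}{w}\right) \tag{1.27}$$
In the absence of a reform, one has $\mathcal{Y}(Y_w, 0, 0, w) = 0$, that is:
$$1 - T'(Y_w) = \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \tag{1.28}$$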
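Figure 1.8: An Elementary reform on the Level of Tax (axes: earnings Y and consumption C; budget curve C = Y – T(Y) shifted upwards by ρ around the bundle (Cw, Yw) over the interval from Yw - δY to Yw + δY)
Moreover, the partial derivatives of $\mathcal{Y}$ at $(Y_w, 0, 0, w)$ are:
$$\mathcal{Y}'_Y(Y_w, 0, 0, w) = \left( \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right)^2 \cdot u''(C_w) - \frac{v''\left(\frac{Y_w}{w}\right)}{w^2} - T''(Y_w) \cdot u'(C_w) \tag{1.29a}$$
$$\mathcal{Y}'_\tau(Y_w, 0, 0, w) = u'(C_w) > 0 \tag{1.29b}$$
$$\mathcal{Y}'_\rho(Y_w, 0, 0, w) = \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)}\, u''(C_w) \le 0 \tag{1.29c}$$
$$\mathcal{Y}'_w(Y_w, 0, 0, w) = \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2} > 0 \tag{1.29d}$$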
The second-order condition writes $\mathcal{Y}'_Y(Y_w, 0, 0, w) \le 0$. It stipulates that the function $Y \mapsto Y - T(Y)$ is either concave or less convex than the indifference curve of workers of skill $w$ at $(C_w, Y_w)$. Otherwise, there is a discontinuity of the allocation as illustrated in Figure 1.6. The second-order condition is in particular satisfied if the tax function is linear (by concavity of the utility function, which implies $u''(C_w) \le 0 < v''(Y_w/w)$). How the convexity of the tax function matters for the second-order condition depends on the term $T''(Y_w)$ in (1.29a) that captures the curvature of the tax function.
If the second-order condition holds with a strict inequality, then $\mathcal{Y}'_Y(Y_w, 0, 0, w) < 0$ and one can apply the implicit function theorem to express the earnings choice $Y_w$ as a (locally) differentiable function of the change in the marginal tax rate $\tau$, the change in the level of tax $\rho$ or the level of skill $w$. This enables us to define the following behavioral elasticities (see footnote 11).
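$$\varepsilon_w \overset{def}{\equiv} \frac{1 - T'(Y_w)}{Y_w}\, \frac{\partial Y_w}{\partial \tau} = \frac{-v'\left(\frac{Y_w}{w}\right)}{w \cdot Y_w \cdot \mathcal{Y}'_Y(Y_w, 0, 0, w)} > 0 \tag{1.30a}$$
$$\eta_w \overset{def}{\equiv} \frac{\partial Y_w}{\partial \rho} = -\frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot \frac{u''(C_w)}{\mathcal{Y}'_Y(Y_w, 0, 0, w)} \le 0 \tag{1.30b}$$
$$\alpha_w \overset{def}{\equiv} \frac{w}{Y_w}\, \frac{\partial Y_w}{\partial w} = -\frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w \cdot Y_w \cdot \mathcal{Y}'_Y(Y_w, 0, 0, w)} > 0 \tag{1.30c}$$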
• $\varepsilon_w$ stands for the compensated elasticity of the labor supply with respect to one minus the marginal tax rate. It is positive. A compensated decline in the marginal tax rate increases the marginal reward of effort in terms of additional consumption, which induces workers to substitute consumption for leisure.
• $\eta_w$ captures the income effect of the labor supply. It is negative as long as leisure is a normal good, which is the case for the additively separable utility function we take, except for the quasilinear utility function $C - v(Y/w)$.
• Finally, $\alpha_w$ captures the elasticity of earnings with respect to the skill level. It is positive thanks to the convexity of $v(.)$ (and more generally, in the absence of additive separability, due to the Spence-Mirrlees condition (1.5)). Therefore, we retrieve by studying behavioral responses the fact that along an incentive-compatible allocation, earnings have to be a nondecreasing function of skill. Moreover, bunching occurs only when $\alpha_w = 0$, that is when the curvature term $T''(Y_w)$ in $\mathcal{Y}'_Y$ (see Equation (1.29a)) tends to infinity. This corresponds to a kink of the tax function that is similar to the one depicted in Figure 1.5.
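The elasticities above can be illustrated on a closed-form case. The sketch below assumes a quasilinear utility $C - v(L)$ with $v(L) = L^{1+k}/(1+k)$ and a linear tax (so that $T'' = 0$), in which case earnings are $Y = r^{1/k} w^{1+1/k}$ with $r$ the retention rate (one minus the marginal tax rate). These functional forms and the function names are assumptions made for illustration only.

```python
import math


def earnings(w, retention, k):
    """Earnings of skill w when one minus the marginal tax rate equals `retention`."""
    return retention ** (1.0 / k) * w ** (1.0 + 1.0 / k)


def log_derivative(func, x, h=1e-6):
    """Elasticity d log func / d log x, by a central difference in logs."""
    up = math.log(func(x * math.exp(h)))
    down = math.log(func(x * math.exp(-h)))
    return (up - down) / (2.0 * h)


def elasticities(w, retention, k):
    """Return (epsilon_w, alpha_w) for the assumed specification."""
    eps = log_derivative(lambda r: earnings(w, r, k), retention)
    alpha = log_derivative(lambda s: earnings(s, retention, k), w)
    return eps, alpha


eps, alpha = elasticities(w=1.5, retention=0.7, k=2.0)
ratio = alpha / eps
```

With $k = 2$ the sketch returns values close to $\varepsilon_w = 1/k = 0.5$ and $\alpha_w = 1 + 1/k = 1.5$, so that $\alpha_w / \varepsilon_w = 1 + k$. There is no income effect in this specification, so compensated and uncompensated responses coincide.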
These three behavioral elasticities are endogenous. The first reason is that they in general depend on the bundle $(C_w, Y_w)$ where they are evaluated. The second reason is that they depend on the curvature of the tax function, as captured by the term $T''(Y_w)$ in $\mathcal{Y}'_Y$ (see Equation (1.29a)). The intuition is the following. An exogenous increase in either $\tau$, $\rho$ or $w$ induces a direct change in earnings $\Delta_1 Y$. However, this change in turn modifies the marginal tax rate by $\Delta T' = T''(Y)\, \Delta_1 Y$, inducing a second change in earnings $\Delta_2 Y$, which in turn... Therefore, a “circular process” takes place: the earnings level determines the marginal tax rate through the tax function and the marginal tax rate affects the earnings level through the substitution effect. The term $T''(Y_w) \cdot u'(C_w)$ in (1.29a) captures the indirect effects (in the words of Saez (2001)) due to this circular process. The size of these indirect effects influences the various behavioral elasticities.
11 Recall that the implicit function theorem implies that for $x = \tau, \rho, w$, $\partial Y_w / \partial x = -\mathcal{Y}'_x / \mathcal{Y}'_Y$.
We can now try to retrieve the optimal tax formula thanks to a tax perturbation method. Consider a uniform decrease $\tau$ of marginal tax rates over an interval $[Y_w - \delta Y, Y_w]$ of the earnings distribution. As a consequence, the tax function is unchanged for earnings below $Y_w - \delta Y$, while the tax function is uniformly decreased by an amount $\rho = \tau \cdot \delta Y$ for earnings above $Y_w$ (see Figure 1.9).
Figure 1.9: The Tax Perturbation (budget curve C = Y – T(Y) against earnings Y, before and after the tax perturbation; the slope changes by τ between Yw - δY and Yw, where substitution effects arise, and the curve shifts by ρ = τ × δY above Yw, where mechanical effects and income effects arise)
Workers of skill $n$ above $w$ are confronted with a lump-sum decrease of their tax by an amount of $\rho$ Euros. The consequence of this on the government's objective can be decomposed into a mechanical effect, in the absence of any behavioral response, and an income effect. For each tax payer of skill $n$ above $w$, the government receives $\rho$ Euros less of tax. However, the welfare of these individuals increases by $u'(C_n) \cdot \rho$, which is valued as a gain equivalent to $(\Phi'(U_n)/\lambda) \cdot u'(C_n) \cdot \rho$ Euros by the government. Therefore the total mechanical effect concerning all tax payers of skill $n$ above $w$ equals
$$M_w = -\rho \cdot \int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} \right\} \cdot f(n)\, dn \tag{1.31}$$
Moreover, workers of skill $n$ above $w$ change their labor supply decisions because of the income effect. From (1.30b), their earnings change by $\Delta Y_n = \eta_n \cdot \rho$. This induces a change in tax revenues that equals $T'(Y_n) \cdot \eta_n \cdot \rho$. This behavioral response has only a second-order effect on the social objective. Therefore, the total income effect concerning all tax payers of skill $n$ above $w$ equals:
$$I_w = \rho \cdot \int_w^{\overline{w}} T'(Y_n) \cdot \eta_n \cdot f(n)\, dn \tag{1.32}$$
Workers whose earnings before the reform lie in the interval $[Y_w - \delta Y, Y_w]$ of the earnings distribution have a productivity that belongs to an interval $[w - \delta w, w]$ of the skill distribution. The elasticity $\alpha_w$ of earnings with respect to skill level links the widths of these two intervals through (see (1.30c)):
$$\delta w = \frac{w}{\alpha_w \cdot Y_w}\, \delta Y$$
Therefore the number of these individuals equals
$$f(w)\, \delta w = \frac{w \cdot f(w)}{\alpha_w \cdot Y_w}\, \delta Y$$
Since each of them faces a decline $\tau$ of the marginal tax rate, they are induced to substitute consumption for leisure. So, from (1.30a) their earnings increase by:
$$\Delta Y_w = \frac{\varepsilon_w \cdot Y_w}{1 - T'(Y_w)}\, \tau$$
Hence each of them generates $T'(Y_w)\, \Delta Y_w$ of additional tax revenue for the government. Since their change of labor supply induces only a second-order effect on the social welfare function and since $\rho = \tau \cdot \delta Y$, the substitution effect is valued
$$S_w = \frac{T'(Y_w)}{1 - T'(Y_w)} \cdot \frac{\varepsilon_w}{\alpha_w} \cdot w \cdot f(w) \cdot \rho \tag{1.33}$$
by the government.
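Starting from the optimal tax schedule, a tax perturbation should have no first-order effect. Therefore, adding (1.31), (1.32) and (1.33), the optimal tax schedule has to verify:
$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \underbrace{\frac{\alpha_w}{\varepsilon_w}}_{A(w)} \cdot \underbrace{\frac{\int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\} f(n)\, dn}{1 - F(w)}}_{B(w)} \cdot \underbrace{\frac{1 - F(w)}{w \cdot f(w)}}_{C(w)} \tag{1.34}$$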
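The addition of the three effects can be written out explicitly. The lines below only rearrange (1.31), (1.32) and (1.33), writing $u$ for the utility of consumption; no additional assumption is involved.

```latex
\begin{align*}
0 &= M_w + I_w + S_w \\
  &= -\rho \int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n)\, u'(C_n)}{\lambda} \right\} f(n)\, dn
     + \rho \int_w^{\overline{w}} T'(Y_n)\, \eta_n\, f(n)\, dn
     + \frac{T'(Y_w)}{1 - T'(Y_w)} \cdot \frac{\varepsilon_w}{\alpha_w} \cdot w\, f(w)\, \rho \\
\Rightarrow \quad
\frac{T'(Y_w)}{1 - T'(Y_w)} \cdot \frac{\varepsilon_w}{\alpha_w} \cdot w\, f(w)
  &= \int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n)\, u'(C_n)}{\lambda} - \eta_n\, T'(Y_n) \right\} f(n)\, dn
\end{align*}
```

Dividing both sides by $(\varepsilon_w / \alpha_w) \cdot w \cdot f(w)$, then multiplying and dividing the right-hand side by $1 - F(w)$, isolates the three terms $A(w)$, $B(w)$ and $C(w)$.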
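The derivation of (1.34) was heuristic. Therefore, we have to verify that (1.34) is consistent with (1.25). The latter equation holds at any point where $w \mapsto Y_w$ is continuous. Using (1.30a) and (1.30c), Equation (1.24a) can be rewritten as:
$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} = \frac{\alpha_w}{\varepsilon_w} \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot \frac{X_w}{w \cdot f(w)}$$
where $X_w$ is defined as:
$$X_w = -\frac{q_w}{\lambda} \cdot u'(C_w)$$
Using the first-order condition (1.28) of the workers' decision program, one gets:
$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \frac{\alpha_w}{\varepsilon_w} \cdot \frac{X_w}{w \cdot f(w)} \tag{1.35}$$
Differentiating the definition of $X_w$ with respect to the skill level $w$, we get:
$$-\dot{X}_w = \frac{\dot{q}_w}{\lambda} \cdot u'(C_w) + \frac{q_w}{\lambda} \cdot \dot{C}_w \cdot u''(C_w)$$
Differentiating in $w$ the equality $C_w = Y_w - T(Y_w)$ and using (1.24b) and (1.28), we get:
$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) + \frac{q_w}{\lambda} \cdot \dot{Y}_w \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot u''(C_w)$$
Using (1.30a), (1.30b), (1.30c) and $\dot{Y}_w = (Y_w / w)\, \alpha_w$ leads to:
$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) + \frac{q_w}{\lambda} \cdot \frac{\alpha_w}{\varepsilon_w} \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w^2} \cdot \eta_w$$
Using (1.28) and the definition of $X_w$:
$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) - \frac{X_w}{w} \cdot \left( 1 - T'(Y_w) \right) \cdot \frac{\alpha_w}{\varepsilon_w} \cdot \eta_w$$
Using (1.35):
$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} - T'(Y_w) \cdot \eta_w \right\} f(w)$$
Using $X_{\overline{w}} = 0$, integrating this last equation between $w$ and $\overline{w}$ and inserting the result in (1.35), one finally obtains (1.34).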
• The term $A(w) = \alpha_w / \varepsilon_w$ captures the magnitude of the substitution effects (see (1.33)). It is inversely proportional to the compensated elasticity of the labor supply $\varepsilon_w$. However, the elasticity $\alpha_w$ of earnings with respect to skill $w$ also matters since it influences the number of workers concerned by the substitution effect. Note that from (1.30a) and (1.30c), the ratio $\alpha_w / \varepsilon_w$ does not depend on the curvature of the tax function, captured by the term $T''(Y_w)$.
Diamond (1998) has proposed to focus on the case where the utility function is quasilinear in consumption (i.e. $U(C, L) = C - v(L)$), so $u''(.) = 0$ and there is no income effect ($\eta_w = 0$ from (1.30b)). Therefore the term $A(w)$ summarizes how the shape of behavioral elasticities influences the shape of marginal tax rates. If behavioral elasticities are more pronounced for high skilled workers, as suggested by Gruber and Saez (2002) and Saez (2003) among others, then $A(w)$ would be decreasing, which would push marginal tax rates to be decreasing in the skill level (thereby in earnings level).
• The influence of the distribution of skill is captured by the term $C(w) = (1 - F(w)) / (w \cdot f(w))$. In the tax perturbation considered, the substitution effect is proportional to the density of workers $f(w)$ and to their skill level $w$ (see (1.33)). Therefore, the higher $w \cdot f(w)$, the larger the deadweight losses induced by a departure of marginal tax rates at $Y_w$ from lump-sum taxation. However, distorting the marginal tax rate around $Y_w$ implies that the mass $1 - F(w)$ of tax payers of skill $n$ above $w$ pays a higher level of taxes. This is the reason why, everything else being equal, marginal tax rates are increasing in $(1 - F(w)) / (w \cdot f(w))$.
Saez (2001) proposed to express marginal tax rates as a function of the earnings distribution, rather than the skill one. Let $H(.)$ and $h(.)$ be respectively the (endogenous) cumulative distribution function and the density of earnings. One obviously has for all $w$ that $H(Y_w) \equiv F(w)$. So, one obtains that $h(Y_w) \cdot \dot{Y}_w = f(w)$. Using (1.30c), one has $\dot{Y}_w = \alpha_w \cdot (Y_w / w)$, hence
$$\alpha_w \cdot \frac{1 - F(w)}{w \cdot f(w)} = \frac{1 - H(Y_w)}{Y_w \cdot h(Y_w)}$$
and equation (1.34) therefore becomes
$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \frac{1}{\varepsilon_w} \cdot \frac{\int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\} f(n)\, dn}{1 - H(Y_w)} \cdot \frac{1 - H(Y_w)}{Y_w \cdot h(Y_w)} \tag{1.36}$$
• The last term $B(w)$ equals the average of $\left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\}$ for all skill levels $n$ above $w$, weighted by their density. The term $1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n)$ captures the total cost for the government of decreasing by one unit the level of tax paid by workers of skill $n$, including their change in labor supply due to the income effect. $B(w)$ summarizes two types of influences.
- The first is the government's tastes for redistribution, as captured by $\frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda}$. The government values giving one more euro to each of the $f(n)$ individuals of skill $n$ as a gain of $\frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda}$ of government spending.
∗ Since the government is averse to inequality, $\Phi(.)$ is increasing and concave, so $\Phi'(U_n)$ is positive and decreases in $U_n$.
∗ From (1.20), $U_n$ is increasing in the skill level. More skilled workers are better off, since they can reach a given amount of earnings with less effort. Therefore $\Phi'(U_n)$ is positive and decreasing in skill level $n$.
∗ From the incentive constraints, and the monotonicity requirement they imply, consumption $C_n$ is nondecreasing in skill $n$. As $u(.)$ is increasing and weakly concave, $u'(C_n)$ is nonincreasing in skill $n$.
As a consequence, the mechanical term $1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda}$ is increasing in skill level $n$. Therefore, in the absence of income effects (i.e. if $\eta_w = 0$ following Diamond (1998)'s quasilinear specification of the utility function) the term $B(w)$ would then be increasing. This would tend to make marginal tax rates increasing in the skill level (thereby in earnings level).
- The second influence stems from the income effects. If leisure is a normal good, a higher level of tax (a lower nonlabor income, $\rho < 0$) would increase labor supply, so $\eta_w < 0$. Therefore income effects are an additional motivation for the government to increase marginal tax rates, since raising the marginal tax rate at one earnings level induces, through income effects, more labor supply from all tax payers above, therefore higher earnings and higher tax receipts. This interpretation is however only valid if marginal tax rates are positive (so that choosing higher earnings results in higher tax receipts for the government).
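The change of variables from the skill distribution to the earnings distribution, which underlies (1.36), can be checked numerically. The sketch below assumes a shifted exponential skill distribution on $[1, +\infty)$ and an earnings rule $Y_w = w^{\alpha}$ with a constant elasticity $\alpha$; both are illustrative assumptions, as are the function names.

```python
import math

ALPHA = 1.5  # assumed constant elasticity of earnings with respect to skill


def skill_cdf(w):
    """F(w) for a shifted exponential distribution on [1, +inf) (an assumption)."""
    return 1.0 - math.exp(-(w - 1.0))


def skill_pdf(w):
    """f(w), the density associated with skill_cdf."""
    return math.exp(-(w - 1.0))


def earnings(w):
    """Assumed earnings rule Y_w = w ** ALPHA, so that alpha_w = ALPHA."""
    return w ** ALPHA


def earnings_cdf(y):
    """H(y) = F(w) at the skill w whose earnings equal y."""
    return skill_cdf(y ** (1.0 / ALPHA))


def earnings_pdf(y, step=1e-6):
    """h(y), obtained by a central finite difference of H."""
    return (earnings_cdf(y + step) - earnings_cdf(y - step)) / (2.0 * step)


def skill_side(w):
    """alpha_w * (1 - F(w)) / (w * f(w))."""
    return ALPHA * (1.0 - skill_cdf(w)) / (w * skill_pdf(w))


def earnings_side(w):
    """(1 - H(Y_w)) / (Y_w * h(Y_w))."""
    y = earnings(w)
    return (1.0 - earnings_cdf(y)) / (y * earnings_pdf(y))


gaps = [abs(skill_side(w) - earnings_side(w)) for w in (1.5, 2.0, 4.0)]
```

The gaps are of the order of the finite-difference error, so the term $A(w) \cdot C(w)$ computed from skills, divided by $\alpha_w / \varepsilon_w$ and multiplied by $1/\varepsilon_w$, matches the earnings-based term of (1.36) in this example.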
V.4 Properties of the second-best optimum
After interpreting the optimality conditions and understanding the influence of their various determinants, we now derive some analytical results.
Marginal tax rates at the top
If the skill distribution is bounded above by $\overline{w}$, then the transversality condition $q_{\overline{w}} = 0$ and (1.24a) imply that
$$1 - \frac{v'\left(\frac{Y_{\overline{w}}}{\overline{w}}\right)}{\overline{w} \cdot u'(C_{\overline{w}})} = 0$$
Therefore, from (1.28), the marginal tax rate at the top of the skill distribution should be nil. Moreover, let $T_M(Y_w) = T(Y_w) / Y_w$ be the average tax rate at earnings $Y_w$. Then
$$T'_M(Y_w) = \frac{T'(Y_w) - \frac{T(Y_w)}{Y_w}}{Y_w}$$
Therefore having a marginal tax rate tending to zero at the top and positive average tax rates implies that average tax rates have to be locally decreasing at the top of the skill distribution.
Many scholars understood this zero optimal marginal tax rate at the top as a drawback of the Mirrlees model. Diamond (1998) has nevertheless argued that when the distribution of skill is unbounded the abovementioned argument fails. More specifically, Diamond (1998) argues that empirically, the distribution of skill could well be approximated for the highest skill levels by a Pareto distribution, for which $(1 - F(w)) / (w f(w))$ is constant (see Figure 1.10). Saez made a similar point about the earnings distribution and the pattern of $(1 - H(z)) / (z\, h(z))$ in (1.36) (see Figure 1.11).
Therefore, since the term $A(w)$ in (1.34) is very likely to be constant or increasing in the skill level, and since, in the absence of income effects, $B(w)$ is increasing in the skill level, this would tend to make marginal tax rates increasing for the top part of the earnings distribution.
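To illustrate the argument, the sketch below first checks that $(1 - F(w)) / (w f(w))$ is constant and equal to $1/a$ for a Pareto distribution with shape parameter $a$, and then evaluates formula (1.34) for given values of $A(w)$, $B(w)$ and $C(w)$. The function names and the numerical values ($a = 2$, $A = 5$, $B = 1$) are hypothetical and only serve the illustration.

```python
def pareto_hazard_term(w, a, w_min):
    """C(w) = (1 - F(w)) / (w * f(w)) for a Pareto distribution with shape a."""
    survival = (w_min / w) ** a                 # 1 - F(w)
    density = a * w_min ** a / w ** (a + 1.0)   # f(w)
    return survival / (w * density)


def marginal_tax_rate(a_term, b_term, c_term):
    """Solve T' / (1 - T') = A * B * C for T', as in (1.34)."""
    x = a_term * b_term * c_term
    return x / (1.0 + x)


# C(w) does not depend on w under a Pareto distribution.
c_low = pareto_hazard_term(3.0, a=2.0, w_min=1.0)
c_high = pareto_hazard_term(30.0, a=2.0, w_min=1.0)

# Illustrative values: A = 1 + k = 5 (compensated elasticity 1/k = 0.25),
# B = 1 (no social weight on high skills, no income effect).
top_rate = marginal_tax_rate(5.0, 1.0, c_high)
```

Because $C(w)$ is flat under the Pareto assumption, any variation of the marginal tax rate at the top has to come from $A(w)$ or $B(w)$.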
Marginal tax rates in the interior
What is the sign of marginal tax rates? From above, we know that $\Phi'(U_w)$ is decreasing in the skill level, whereas $u'(C_w)$ is nonincreasing. Therefore, $-1/u'(C_w)$ is nonincreasing too. Hence the term in the bracket in (1.24b) is decreasing in the skill level. Hence $\dot{q}_w / f(w)$ is increasing in $w$. Since the transversality conditions write $q_{\underline{w}} = q_{\overline{w}} = 0$, $q_w$ must be first decreasing, and then increasing. So for any interior skill level, one must have $q_w < 0$. Together with (1.24a) and (1.28), this implies that marginal tax rates have to be positive for any interior skill level $w \in (\underline{w}, \overline{w})$. Therefore the sign of marginal tax rates is driven by the shape of mechanical effects, and the income effects essentially reinforce them.
Figure 1.10: Empirical skill distribution computed by Diamond (1998)
Figure 1.11: Distribution of earnings as computed by Saez (2001).
Marginal tax rates at the bottom
It remains to analyze the sign of the marginal tax rate at $\underline{w}$. In the absence of bunching, and if the government does not have a maximin objective, then the transversality condition $q_{\underline{w}} = 0$ implies with (1.24a) and (1.28) that the marginal tax rate at the bottom of the skill distribution is nil.
However, bunching at the bottom of the skill distribution may arise, for instance because a nonnegativity constraint $Y_w \ge 0$ may be binding at the bottom. Another case is where there is a positive mass of workers with the lowest level of skill. In such cases, there is a positive mass of workers with the lowest earnings, and the highest skilled workers among them have a skill $w > \underline{w}$, and therefore face a positive marginal tax rate.
Another exception is the case of a Maximin government (see Boadway and Jacquet 2008). Then, marginal tax rates are positive at the bottom. The reason is that social weights $\frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda}$ are concentrated on the lowest skill level only. So, there is a positive mass of social weights there, even without bunching at the bottom. Put differently, under the Maximin objective, since $\frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda}$ is nil for all $n$ above $\underline{w}$, the term $B(w)$ in formula (1.34) is higher than or equal to 1 (depending on whether income effects are nil, or negative when leisure is a normal good), including for $w = \underline{w}$.
VI Empirical implications
We now explore the quantitative implications of the theory that we have just developed. We here follow Saez (2001, Section 5) very closely. As appears clearly from (1.34), there are three kinds of determinants that have to be specified to compute the optimal tax schedule:
1. The behavioral elasticities $\varepsilon_w$, $\eta_w$ and $\alpha_w$ and, behind them, the utility function $U(., .)$.
2. The density of skills, and the term $C(w)$.
3. The government's tastes for redistribution, as captured by the shape of $n \mapsto \Phi'(U_n) \cdot u'(C_n) / \lambda$.
There is a huge empirical literature that evaluates behavioral responses to tax changes. Most of these studies aim to estimate how changes in the tax system induce changes in work effort $L$. There are however serious measurement problems. Hence, it seems more reasonable to estimate instead the responses of gross income $Y$ (Gruber and Saez (2002), Saez (2003)). Two specifications of the utility function are used, namely
$$\text{Type 1:} \quad \log\left( C - \frac{L^{1+k}}{1+k} \right) \qquad \text{Type 2:} \quad \log C - \log\left( 1 - \frac{L^{1+k}}{1+k} \right)$$
Under both type-1 and type-2 utility functions the compensated elasticity of the labor supply (along a linear tax schedule) simply equals $1/k$ and is exogenous. The Type 1 utility function corresponds to Diamond's quasilinear specification with no income effects. Hence one has for all $w$ that $\alpha_w / \varepsilon_w = 1 + k$ and $\eta_w = 0$. The Type 2 utility function includes income effects. Based on the empirical literature, Saez retains two values for the compensated elasticities along a linear tax schedule (denoted $\zeta^c$ in his notations), namely 0.25 and 0.5.
The first scholars that have simulated optimal income tax schedules specifies a lognormal
distribution of skill (Mirrlees 1971) which fits the unimodality property of income distrib-
ution found in the data. However such specification is ad-hoc and fits poorly the top tail
distribution. Saez instead works with empirical distributions of earnings. For each of the
four utility functions he considered (Type 1 and Type 2, each of them for 1/k = 0.25 and
1/k = 0.5), he uses workers’ first-order condition (1.28) to recover skill levels as a function
of observed earnings levels.12 It is this skill distribution that he uses in his simulations. It is
worth noting that this procedure is conditional on a specific choice of the utility function.
The last component is the government’s taste for redistribution. This stage is difficult
since choosing a social welfare function is always a subjective exercise based on value
judgments. Saez (2001) considers two social welfare functions: maximin (i.e. Uw)
and Benthamite. Given the concavity of the type-1 and type-2 utility functions, the Benthamite
criterion is consistent with some aversion of the government towards inequality.13
Saez’s results are then given by Figure 1.12.
Figure 1.12: The numerical simulations of Saez (2001).
We learn the following.
• Optimal marginal tax rates are positive everywhere. For the top income earners, this is
due to the unbounded skill distribution inferred. Since $(1 - F(w))/(w f(w))$ is close to constant
above 200,000 dollars a year (see Figures 1.10 and 1.11), marginal tax rates are roughly
constant above that threshold. For low income earners, marginal tax rates are positive
and very high because workers with the lowest skill have zero productivity and therefore do
not work.
• For every type of utility function and social welfare function, marginal tax rates are higher the lower the compensated elasticity (along a linear tax schedule) $\zeta^c$.
• For both types of utility functions and both values of $\zeta^c$, marginal tax rates are higher under the maximin criterion, especially for the lowest part of the distribution.
• Comparing Type 1 and Type 2 utility functions for identical $\zeta^c$ and social welfare function, marginal tax rates are substantially higher in the presence of income effects.
12 Saez (2001) used tax returns data.
13 Another approach consists in using the observed tax schedule to recover the social welfare function from the optimal tax formulae (Bourguignon and Spadaro 2008).
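The role of the ratio (1 − F(w))/(w f(w)) in the first point above can be illustrated with two textbook distributions. The sketch below is an illustration only: the Pareto shape a = 2 and the lognormal parameters are arbitrary assumptions, not estimates from the data used by Saez (2001), and the function names are hypothetical. For a Pareto distribution the ratio equals 1/a at every w, whereas for a lognormal distribution it keeps falling as w grows.

```python
import math


def pareto_tail_ratio(w, a, w_min=1.0):
    """(1 - F(w)) / (w f(w)) for a Pareto distribution with shape a, for w >= w_min."""
    survival = (w_min / w) ** a
    density = a * w_min ** a / w ** (a + 1.0)
    return survival / (w * density)


def lognormal_tail_ratio(w, mu=0.0, sigma=1.0):
    """(1 - F(w)) / (w f(w)) for a lognormal distribution with parameters mu, sigma."""
    z = (math.log(w) - mu) / sigma
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z), stable in the tail
    density = math.exp(-0.5 * z * z) / (w * sigma * math.sqrt(2.0 * math.pi))
    return survival / (w * density)


if __name__ == "__main__":
    # Arbitrary skill levels, expressed in multiples of the Pareto lower bound.
    for w in (2.0, 10.0, 100.0):
        print(w, pareto_tail_ratio(w, a=2.0), lognormal_tail_ratio(w))
```

This is one way to see why a lognormal specification and an empirical distribution with a thick upper tail lead to different marginal tax rates at the top.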
Bibliography
[1] Akerlof, G., 1978, The Economics of “Tagging” as Applied to the Optimal Income Tax,
Welfare Programs, and Manpower Planning, American Economic Review, 68(1), 8-19.
[2] Alesina, A., Ichino A. and Karabarbounis L., 2008, Gender Based Taxation and the
Division of Family Chores, mimeo Harvard.
[3] Boadway, R. and L. Jacquet, 2008, Optimal Marginal and Average Income Taxation
under Maximin, Journal of Economic Theory, 143, 425-441.
[4] Bourguignon, F. and Spadaro, A. 2008, Tax-benefit Revealed Social Preferences, PSE
Working Paper 2008-37.
[5] Diamond, P., 1998, Optimal Income Taxation: An Example with a U-shaped Pattern of
Optimal Marginal Tax Rates, American Economic Review, 88(1), 83-95.
[6] Ebert, U., 1993, A reexamination of the optimal nonlinear income tax, Journal of Public
Economics, 49, 47-73.
[7] Gruber, J., and E. Saez, 2002, The Elasticity of Taxable Income: Evidence and Implications,
Journal of Public Economics, 84, 1-32.
[8] Guesnerie, R., 1995, A Contribution to the Pure Theory of Taxation, Cambridge University
Press.
[9] Guesnerie, R. and Laffont, J-J, 1984, A complete solution to a class of principal-agent
problems with an application to the control of a self-managed firm, Journal of Public
Economics, 25, 329-369.
[10] Hammond, P., 1979, Straightforward Individual Incentive Compatibility in Large
Economies, Review of Economic Studies, 46, 263-282.
[11] Harsanyi, J., 1955, Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons
of Utility, Journal of Political Economy, 63(4), 309-321.
[12] Hellwig, M., 2008, A Maximum Principle for Control Problems with Monotonicity Constraints,
Preprints of the Max Planck Institute for Research on Collective Goods, Bonn,
2008-04, http://www.coll.mpg.de/pdf_dat/2008_04online.pdf.
[13] Lollivier S. and J-C Rochet, 1983, Bunching and second-order conditions: a note on
optimal tax theory, Journal of Economic Theory, 31, 392-400.
[14] Mankiw, G. and Weinzierl, M., 2007, The Optimal Taxation of Height: A Case Study
of Utilitarian Income Redistribution, mimeo Harvard.
[15] Mirrlees, J., 1971, An Exploration in the Theory of Optimum Income Taxation, Review
of Economic Studies, 38(1), 175-208.
[16] Mirrlees, J., 1976, Optimal Tax Theory: A Synthesis, Journal of Public Economics, 6(3),
327-358.
[17] Piketty, T., La Redistribution Fiscale face au Chômage, Revue Française d’Economie,
12, 157-201.
[18] Saez, E., 2001, Using Elasticities to Derive Optimal Income Tax Rates, Review of Economic
Studies, 68, 205-229.
[19] Saez, E., 2003, The Effect of Marginal Tax Rates on Income: A Panel Study of “Bracket
Creep”, Journal of Public Economics, 87, 1231-1258.
[20] Sadka, E., 1976, On Income Distribution, Incentive Effects and Optimal Income Taxation,
Review of Economic Studies, 43, 261-267.
[21] Seade, J., 1977, On the Shape of Optimal Tax Schedules, Journal of Public Economics,
7, 203-235.
[22] Seade, J., 1982, On the Sign of the Optimum Marginal Income Tax, Review of Economic
Studies, 49, 637-643.
[23] Stiglitz, J., 1982, Self-Selection and Pareto Efficient Taxation, Journal of Public Economics,
17, 213-240.