
Economics of Optimal Redistributive Taxation

Preliminary Handout / Preliminary lecture notes [1]

Master Economie Théorique et Empirique (ETE),

Ecole d’Economie de Paris et Paris 1

Etienne LEHMANN

CREST, Laboratoire de Macroéconomie

[email protected]

http://www.crest.fr/pageperso/lehmann/lehmann.htm

10th February 2009

[1] This is a preliminary text. Please email me your comments and remarks.

Contents

1 The Mirrlees model
    I The Problem
        I.1 Individuals
        I.2 The government's preferences
    II The First-Best Problem
    III The Taxation Principle
    IV The Stiglitz (1982) model
    V Continuum of skills
        V.1 The incentive constraints
        V.2 The resolution
        V.3 The reinterpretation of the optimality conditions
        V.4 Properties of the second-best optimum
    VI Empirical implications

Chapter 1

The Mirrlees model

I The Problem

In this section, we describe the redistribution problem and solve it in the perfect-information case.

I.1 Individuals

Individuals have preferences over consumption $C$ and labor supply $L$. Here, labor supply is a shortcut for many aspects of individuals' behavior in the labor market, such as in-work effort, investment in education or hours of work. Preferences are represented by a utility function $U(C, L)$. We assume $U(\cdot,\cdot)$ is twice differentiable with $U'_C > 0 > U'_L$. We furthermore assume that $U(\cdot,\cdot)$ is strictly concave.

People differ by their exogenous productivity endowment $w$. We also refer to $w$ as the skill level. With labor supply $L$, a worker of productivity $w$ gets gross earnings $Y = w \cdot L$. The cumulative distribution of $w$ is $F(\cdot)$ over a support denoted $\Omega$ within $\mathbb{R}_+$. [1]

One key assumption is that individuals are free to choose their labor supply optimally. This assumption means first that we neglect the existence of the many fixed costs that make the labor supply choice discrete rather than continuous. It also ignores potential externalities. For instance, one may think that an individual values her leisure more, the higher the leisure of her friends. Finally, there are in reality many choices that are constrained by the demand side of the labor market. Involuntary unemployment is the most apparent one, but it is not the only one. However, the assumption of a perfectly competitive and frictionless labor market is a good starting point to study the theory of optimal direct taxation.

[1] If the support is unbounded, then we have to assume that $\int_\Omega w \, dF(w)$ is finite.

In a first-best setting, taxation can be conditioned on both earnings and skill, $T_w(Y)$, whereas in a second-best setting, taxation can be conditioned on earnings only. To keep generality, we provisionally keep the notation $T_w(Y)$ for both settings, while keeping in mind that in a second-best setting, for any $w \neq w'$, $T_w(Y) \equiv T_{w'}(Y)$. An individual of skill $w$ thus faces the budget constraint $C = Y - T_w(Y)$. She therefore chooses her labor supply by solving:

$$\max_{L} \; U\big(w \cdot L - T_w(w \cdot L),\, L\big) \quad\Leftrightarrow\quad \max_{Y} \; U\left(Y - T_w(Y),\, \frac{Y}{w}\right) \tag{1.1}$$

Let $U_w$ be the value of this program. If $L_w$ is a solution, so that $Y_w = w \cdot L_w$ and $C_w = Y_w - T_w(Y_w)$, one has $U_w = U(C_w, L_w)$. It is perfectly equivalent to assume that individuals choose their work effort $L$, their consumption level $C$ or their gross earnings $Y$. We henceforth favor this latter interpretation. If $T_w(\cdot)$ is differentiable in earnings $Y$, the first-order condition associated with Program (1.1) implies:

$$1 - T'_w(Y_w) = -\frac{1}{w}\,\frac{U'_L}{U'_C}\left(C_w,\, \frac{Y_w}{w}\right) \tag{1.2}$$

The right-hand side of (1.2) is the marginal rate of substitution between earnings and consumption (MRS for short). For an individual of skill $w$, the MRS at a bundle $(Y, C)$ equals

$$M(C, Y, w) \;\overset{\text{def}}{\equiv}\; -\frac{U'_L(C,\, Y/w)}{w \cdot U'_C(C,\, Y/w)} \tag{1.3}$$

It corresponds to the marginal cost of earnings in terms of consumption. A worker of skill $w$ has to receive $M(C, Y, w)$ additional units of consumption to be compensated for working so much harder that her earnings increase by one unit. When the tax schedule is differentiable, the MRS equals one minus the marginal tax rate that an individual of skill $w$ faces at her (gross) earnings $Y_w$. Hence, we will frequently interpret the MRS in terms of the marginal tax rate.
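The link between the MRS and the marginal tax rate can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: the utility function $U(C, L) = \log C - L^2/2$, the linear schedule $T(Y) = t \cdot Y - b$, and all parameter values are hypothetical. It solves program (1.1) by ternary search (valid here because the objective is strictly concave in $Y$) and compares the MRS at the optimum with $1 - T'$.

```python
import math

def utility(c, l):
    """Assumed example utility U(C, L) = log(C) - L**2 / 2 (not from the text)."""
    return math.log(c) - 0.5 * l ** 2

def mrs(c, y, w):
    """M(C, Y, w) = -U'_L / (w * U'_C), with U'_C = 1/C and U'_L = -Y/w here."""
    return (y / w) * c / w

def best_earnings(w, t, b, lo=0.0, hi=50.0, iters=200):
    """Ternary search for the earnings level solving program (1.1)
    under the assumed linear schedule T(Y) = t * Y - b, with b > 0."""
    def objective(y):
        return utility((1 - t) * y + b, y / w)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

# Hypothetical parameter values chosen for illustration only.
w, t, b = 2.0, 0.3, 0.5
y_star = best_earnings(w, t, b)
c_star = (1 - t) * y_star + b
print(f"Y_w = {y_star:.4f}, MRS = {mrs(c_star, y_star, w):.4f}, 1 - T' = {1 - t:.4f}")
```

At the computed optimum the MRS and one minus the marginal tax rate should agree up to the numerical tolerance of the search, which is condition (1.2) for this particular specification.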

We now adopt a geometric representation of individuals' preferences. Gross earnings $Y$ will be displayed on the horizontal axis, while consumption $C$ will be displayed on the vertical axis. To depict workers' indifference curves, let us define the function $\Gamma(\cdot,\cdot)$ by

$$U = U(C, L) \;\overset{\text{def}}{\Leftrightarrow}\; C = \Gamma(U, L)$$

From the implicit function theorem, one has

$$\Gamma'_U(U, L) = \frac{1}{U'_C\big(\Gamma(U, L),\, L\big)} \quad\text{and}\quad \Gamma'_L(U, L) = -\frac{U'_L}{U'_C}\big(\Gamma(U, L),\, L\big) \tag{1.4}$$

In particular, the slope of the indifference curve of a worker of skill $w$ that passes through a bundle $(C, L)$ equals $M(C, wL, w)$. The indifference curve of a worker of skill $w$ associated with the utility level $U$ verifies $C = \Gamma(U, Y/w)$. Since, for a given skill level, utility $U(C, Y/w)$ is increasing in consumption and decreasing in earnings, indifference curves are upward-sloping. Higher utility levels correspond to lower earnings or higher consumption. Hence indifference curves correspond to higher utility levels as we move to the north-west in the $(Y, C)$ plane.


Finally, from the concavity of $U(\cdot,\cdot)$, along a given indifference curve, $M(\Gamma(U, Y/w), Y, w)$ increases with the earnings level $Y$. [2] Therefore, indifference curves are convex (see Figure 1.1).

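[Figure 1.1: Indifference curves. Horizontal axis: gross earnings $Y$; vertical axis: consumption $C$. The figure shows the indifference curves $U_H = U(C, Y/w_H)$ and $U_L = U(C, Y/w_L)$ together with the bundles $(C_L, Y_L)$ and $(C_H, Y_H)$.]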

An important question is how the MRS varies with the skill level. We expect that getting one more euro of earnings requires a smaller additional effort for a more skilled worker. Therefore, a more skilled worker should require less compensation, and one should have:

$$\text{For all } C,\, Y,\, w: \quad M'_w(C, Y, w) < 0 \tag{1.5}$$

From a geometric viewpoint, this assumption implies that when two indifference curves associated with two skill levels $w_L < w_H$ intersect at a given bundle $(C, Y)$, the indifference curve of the less skilled worker is always steeper than that of the more skilled worker (see Figure 1.1). This is the reason why this assumption is often referred to as the strict single-crossing condition. It is also referred to as the Spence-Mirrlees condition.

Which specifications of individuals' utility function $U(\cdot,\cdot)$ are consistent with the Spence-Mirrlees condition (1.5)? A first example is the case where $U(\cdot,\cdot)$ is additively separable, so $U(C, L) = u(C) - v(L)$, with $u'(C) > 0 \geq u''(C)$, $v'(L) > 0$ and $v''(L) > 0$. Equation (1.3) then implies:

$$M(C, Y, w) = \frac{v'(Y/w)}{w \cdot u'(C)}$$

The convexity of $v(\cdot)$ then ensures that (1.5) is satisfied. One can also verify that whenever consumption is a normal good, (1.5) is also verified. Hence, while (1.5) is clearly a restriction on preferences, it does not seem to be a strong assumption to make. Therefore we henceforth adopt this assumption.

[2] Using (1.4), the derivative of $M(\Gamma(U, Y/w), Y, w)$ with respect to $Y$ equals

$$-\frac{1}{w^2}\,(U'_C)^{-3}\left\{(U'_L)^2\, U''_{CC} - 2\, U'_C\, U'_L\, U''_{CL} + (U'_C)^2\, U''_{LL}\right\}.$$

The term in braces is negative by concavity of $U(\cdot,\cdot)$, so the derivative is positive.
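The single-crossing condition (1.5) in the additively separable case can be illustrated numerically. The sketch below assumes, purely for illustration, a logarithmic consumption sub-utility and an isoelastic disutility of labor $v(L) = L^{1+1/\varepsilon}/(1+1/\varepsilon)$; the parameter `eps` and the grid of skill levels are hypothetical choices, not taken from the text.

```python
def mrs(c, y, w, eps=0.5):
    """MRS for an assumed separable utility log(C) - v(L),
    with v'(L) = L**(1/eps), so M = v'(Y/w) / (w * (1/C))."""
    return (y / w) ** (1 / eps) * c / w

# Evaluate the MRS at one fixed bundle (C, Y) for increasing skill levels.
skills = [0.5 + 0.25 * k for k in range(15)]
values = [mrs(1.0, 2.0, w) for w in skills]

# Condition (1.5) requires the MRS to fall strictly as the skill level rises.
single_crossing_holds = all(a > b for a, b in zip(values, values[1:]))
print(single_crossing_holds)
```

The check is only a spot test at one bundle and on one grid; the analytical argument above is what establishes (1.5) for every convex $v(\cdot)$.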


I.2 The government’s preferences

The problem of redistribution arises because of two ingredients. First, individuals are different. Second, in one way or another, the "government" considers the induced inequality as "unfair". There are nevertheless several problems that we have to deal with to give this a precise meaning.

First, at best, we can rationalize in an axiomatic way some ordinal preferences. For instance, under the axioms of von Neumann and Morgenstern, one can represent preferences by one utility function among a class that is invariant to increasing linear transformations. We have however no non-arbitrary way to choose one utility function among all of these. Given this problem, the issue of comparing the utility levels reached by different individuals is even worse. However, the problem of redistribution is simply meaningless if we do not admit the existence of one cardinal representation of individuals' utility that is furthermore comparable across individuals of different types. Although this assumption is a very strong one, we can do nothing else than work with it.

Second, even if individuals' utility levels are comparable, there are different ways to quantify these comparisons. Let us assume that, due to some desire for horizontal equity, different individuals of the same skill level are treated identically. Then individuals' utility can be aggregated in different ways.

1. One may think a good representation is simply to sum individuals' levels of utility:

$$\int_\Omega U(C_w, L_w)\, dF(w) \tag{1.6}$$

I will call this approach the Benthamite one. Its drawback is that it ignores the inequality in utility levels.

2. At the other extreme, one may want to consider only the well-being of the worst-off in the society. This is the maximin criterion

$$\min_{w} \; U(C_w, L_w) \tag{1.7}$$

that many economists associate with John Rawls' (1971) theory of justice.

3. In general, I will consider the sum of some increasing and concave transformation $\Phi(\cdot)$ of individuals' utility levels

$$\int_\Omega \Phi\big(U(C_w, L_w)\big)\, dF(w) \tag{1.8}$$

with $\Phi'(\cdot) > 0$ and $\Phi''(\cdot) < 0$. The concavity of $\Phi(\cdot)$ then captures the government's aversion towards inequality.
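The three criteria above can rank the same pair of allocations differently. The sketch below compares them on a two-type example; the utility profiles, the equal population masses and the transformation $\Phi(U) = -e^{-\rho U}$ (increasing and concave) are all hypothetical choices made for illustration.

```python
import math

def benthamite(utils, masses):
    """Discrete analogue of (1.6): population-weighted sum of utilities."""
    return sum(p * u for u, p in zip(utils, masses))

def maximin(utils, masses):
    """Discrete analogue of (1.7): utility of the worst-off type with positive mass."""
    return min(u for u, p in zip(utils, masses) if p > 0)

def concave_welfare(utils, masses, rho=1.0):
    """Discrete analogue of (1.8) with the assumed Phi(U) = -exp(-rho * U)."""
    return sum(p * -math.exp(-rho * u) for u, p in zip(utils, masses))

masses = [0.5, 0.5]
unequal = [1.0, 3.0]  # hypothetical utility profile with a higher sum
equal = [1.8, 2.0]    # hypothetical, more equal profile with a lower sum

for name, criterion in [("Benthamite", benthamite),
                        ("maximin", maximin),
                        ("concave", concave_welfare)]:
    preferred = "unequal" if criterion(unequal, masses) > criterion(equal, masses) else "equal"
    print(name, "prefers the", preferred, "profile")
```

In this example the Benthamite criterion favors the profile with the larger sum, whereas the maximin criterion and the concave transformation favor the more equal profile.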


What this discussion suggests is that choosing a specific social welfare function is always an arbitrary exercise. In a sense, this is reassuring since it leaves room for value judgements and political controversy in the social debate. Hence, we have to be clear that the notion of any "social optimum" is always based on the value judgements that are behind the choice of a social welfare function.

Harsanyi (1955) has nevertheless proposed a rather nice story, whose application to the problem of direct taxation is the following. Let us assume that individuals are born identical. Then, they draw at random a skill level $w$ (according to the distribution $F(\cdot)$). Therefore, apart from their heterogeneous skill levels, they are identical. One can consider their utility functions to be identical, and thereby comparable. Moreover, one can retain as a social welfare function the common expected utility of individuals before they draw their productivity level. The Benthamite criterion therefore coincides with the maximization of agents' expected utility "behind the veil of ignorance" concerning their skill level $w$.

Finally, what matters at the end of the day is not the functions $\Phi(\cdot)$ and $U(\cdot,\cdot)$ by themselves, but the composition $\Phi \circ U$. Therefore, many scholars assume a Benthamite criterion and consider that the government's aversion towards inequality is already included in the concavity of $U(\cdot,\cdot)$. This argument is however only correct provided that the objective is not maximin. Moreover, this way of proceeding hides the specific role of the government's taste for redistribution. The latter is specifically linked to the concavity of $\Phi(\cdot)$, and not to that of $U(\cdot,\cdot)$. Finally, even if we ignore involuntary unemployment in this chapter, its existence breaks this equivalence. For instance, if $U(\cdot,\cdot)$ is linear in $C$ and $\Phi(\cdot)$ is concave, this would mean that individuals are risk-neutral about the unemployment risk, whereas the government values negatively the inequality induced by unemployment.

The government faces a budget constraint. Let $E \geq 0$ be an exogenous amount of public expenditures to finance. The government's budget constraint is:

$$\int_\Omega \big(w\, L_w - C_w\big)\, dF(w) \geq E \tag{1.9}$$

since an individual of skill $w$ pays taxes $Y_w - C_w$. We can re-express this constraint as a resource constraint, in the vein of Walras' law:

$$\int_\Omega C_w\, dF(w) + E \leq \int_\Omega Y_w\, dF(w) \tag{1.10}$$

II The First-Best Problem

Let us first consider the case where the government can freely control individuals' behavior. The first-best problem of redistribution then consists in choosing bundles $w \mapsto (Y_w, C_w)$ to maximize a social welfare function such as (1.8), subject to the resource constraint (1.10):

$$\max_{\{Y_w,\, C_w\}_{w \in \Omega}} \; \int_\Omega \Phi\left(U\left(C_w,\, \frac{Y_w}{w}\right)\right) dF(w) \tag{1.11}$$

$$\text{s.t.:} \quad \int_\Omega \big(Y_w - C_w\big)\, dF(w) \geq E$$

Let $\lambda$ be the Lagrange multiplier of the budget constraint; $\lambda$ stands for the social marginal value of public funds. Then, denoting $\{(C^1_w, Y^1_w)\}_{w \in \Omega}$ the solution to this program, the first-order conditions for the first-best problem (1.11) are: [3]

$$\frac{\lambda}{\Phi'(U^1_w)} = U'_C\left(C^1_w,\, \frac{Y^1_w}{w}\right) = -\frac{U'_L\left(C^1_w,\, \frac{Y^1_w}{w}\right)}{w} \tag{1.12}$$

where obviously $U^1_w = U\left(C^1_w,\, Y^1_w / w\right)$ and $L^1_w = Y^1_w / w$.

We now investigate how the government can induce individuals to behave in such a way that the equilibrium allocation $(C_w, Y_w)$ defined by (1.1) and the resource constraint (1.10) coincides with the first-best optimum $\{(C^1_w, Y^1_w)\}_{w \in \Omega}$ defined by (1.12).

If the government is perfectly informed about individuals' productivity, it can implement skill-specific tax schedules $T_w(\cdot): Y \mapsto T_w(Y)$. Now, consider a specific skill level $w$ and assume for simplicity that the skill-specific tax schedule is linear. One can assume it takes the form:

$$T_w(Y) = Y^1_w - C^1_w + \tau_w \left(Y - Y^1_w\right)$$

In other words, individuals of skill $w$ face a constant marginal tax rate $\tau_w$, and the schedule ensures that they get the consumption level $C^1_w$ if they work $L^1_w$, so that their earnings are $Y^1_w$. Because the utility function $U(\cdot,\cdot)$ is concave and the tax function is linear, the program (1.1) of an individual of skill $w$ is well-behaved. In particular, the concavity of $U(\cdot,\cdot)$ ensures there exists a unique solution $Y_w$. Hence the equilibrium coincides with the first-best allocation if and only if the first-order condition (1.2) of the individual's program (1.1) coincides with the first-order conditions (1.12) of the first-best program (1.11). Since (1.12) implies:

$$1 = -\frac{U'_L\left(C^1_w,\, \frac{Y^1_w}{w}\right)}{w \cdot U'_C\left(C^1_w,\, \frac{Y^1_w}{w}\right)}$$

the first-best allocation is decentralized if and only if $\tau_w = 0$. Hence, the first-best is decentralized by a set of transfers that vary with the exogenous skill level $w$, but not with the endogenous earnings level $Y$. In other words, differentiated lump-sum transfers are required. This result is an application to the present context of the second theorem of welfare economics. The intuition is straightforward. When $\tau_w = 0$, the transfer does not depend on earnings and therefore does not affect the marginal return to effort. Doing so does not introduce any distortion in the labor supply choice. However, since the intercept of the tax is conditioned on the skill level, the government can transfer as much income as it wishes among agents without any labor supply distortion.

[3] We here consider only interior solutions for which $Y_w > 0$ and $C_w > 0$.
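A closed-form instance of the first-best conditions (1.12) may help fix ideas. The sketch below is an illustration under assumptions that are not in the text: two skill levels, the utility function $U(C, L) = \log C - L^2/2$, and a Benthamite objective ($\Phi$ is the identity). Under these assumptions (1.12) gives $C_w = 1/\lambda$ and $Y_w = \lambda w^2$, and the resource constraint pins down $\lambda$ as the positive root of $S\lambda^2 - E\lambda - 1 = 0$, with $S = \sum_i \pi_i w_i^2$. The function name and the parameter values are hypothetical.

```python
import math

def first_best(skills, masses, expenditure):
    """First-best allocation for the assumed utility log(C) - L**2 / 2
    and a Benthamite objective, using conditions (1.12) and (1.10)."""
    s = sum(p * w ** 2 for w, p in zip(skills, masses))
    # Positive root of s * lam**2 - expenditure * lam - 1 = 0.
    lam = (expenditure + math.sqrt(expenditure ** 2 + 4 * s)) / (2 * s)
    allocation = []
    for w in skills:
        c = 1 / lam          # U'_C = 1/C = lam
        y = lam * w ** 2     # -U'_L / w = Y / w**2 = lam
        allocation.append({"w": w, "C": c, "Y": y,
                           "tax": y - c,  # skill-specific lump-sum tax
                           "U": math.log(c) - 0.5 * (y / w) ** 2})
    return lam, allocation

# Hypothetical parameter values chosen for illustration only.
skills, masses, expenditure = [1.0, 2.0], [0.5, 0.5], 0.2
lam, allocation = first_best(skills, masses, expenditure)
for a in allocation:
    print({k: round(v, 4) for k, v in a.items()})
```

In this specification both types receive the same consumption while the more skilled type works more, so the lump-sum tax rises with skill and the MRS equals one for everybody, in line with $\tau_w = 0$.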

III The Taxation Principle

This first-best optimum needs differentiated lump-sum transfers to be decentralized. In particular, this requires that the government can condition the tax schedule on skill levels. We have however many reasons to think that, in the real world, such conditioning is not feasible.

1. The first reason is that the government does not observe skill levels. It is stricto sensu

an informational constraint on the government’s ability to intervene.

2. However, one might argue that many determinants of workers' productivity are observable. For instance, education or work experience is typically observed. So many scholars have argued that taxation should be conditioned not only on earnings but on all the individual characteristics that the government observes (Akerlof 1978). Following this road, Alesina et al. (2008) defend the idea that taxation should differ for men and women (gender-based taxation), whereas Mankiw and Weinzierl (2008) indicate there could be potential gains from conditioning taxes on individuals' height. In the same vein, one may argue that taxation should also be conditioned on different exogenous characteristics that are correlated with a worker's skill. However, doing so conflicts with the notion of horizontal equity. Hence, even if some information that is correlated with workers' productivity is available, using this information is typically not allowed by "constitutions".

Therefore, we now simply assume that taxation can only be conditioned on earnings $Y$ and not on skill levels $w$. However, this implies that redistributive taxation has to be distortionary. To get the intuition, consider the case where more productive workers get higher earnings and where the government wishes to transfer income from high-skilled to low-skilled workers. This means that the government wishes to make the function $w \mapsto T(Y_w)$ increasing. However, in doing so, marginal tax rates $T'(Y_w)$ become positive, which reduces the marginal return to earnings in terms of consumption. So labor supply is distorted downwards. Hence, it is not possible to make taxes increase with the skill level without discouraging work effort. The informational constraints are thus essential to include in the analysis if one wishes to account for the distortions induced by taxation in the design of the optimal redistributive policy.

We now wonder how to incorporate this informational constraint in the redistributive program. Let $T(\cdot)$ be the tax function that depends on earnings only. Whatever her skill level, any individual faces the budget constraint $C = Y - T(Y)$. Hence, an individual of skill level $w$ solves

$$\max_{Y} \; U\left(Y - T(Y),\, \frac{Y}{w}\right) \tag{1.13}$$

The key difference with Program (1.1) is that now all individuals face the same tax schedule, whatever their skill level. Let $Y_w$ be a solution to this program and let $L_w = Y_w / w$, $C_w = Y_w - T(Y_w)$ and $U_w = U(C_w, L_w)$. Then let $x \neq w$ be another skill level. Because $Y_w$ solves (1.13) and $C_x = Y_x - T(Y_x)$, one has that

$$\forall\, (w, x) \in \Omega^2 \qquad U\left(C_w,\, \frac{Y_w}{w}\right) \geq U\left(C_x,\, \frac{Y_x}{w}\right) \tag{1.14}$$

This constraint means that, when making their labor supply decision, workers of skill $w$ prefer the bundle $(C_w, Y_w)$ designed for them rather than the bundle $(C_x, Y_x)$ designed for workers of any other skill level $x$.

The assumption that the government can condition taxation on earnings only and not on skill level implies that allocations $w \mapsto (C_w, Y_w)$ that can be reached by the government have to be consistent with constraints (1.14). Therefore, one should add these restrictions to the government's problem (1.11). One may then wonder whether adding these restrictions is sufficient to fully characterize the set of allocations that can be obtained by a government. The Taxation Principle (Hammond 1979, Rochet 1985 and Guesnerie 1995) answers this question positively.

More precisely, it ensures that an unsophisticated government which can only implement income taxation $Y \mapsto T(Y)$ has the same possibilities as a sophisticated government that can implement a direct truthful mechanism $w \mapsto (C_w, Y_w)$ subject to (1.14). We have already seen that for any income tax schedule $Y \mapsto T(Y)$, the induced allocation $w \mapsto (C_w, Y_w)$ verifies (1.14). The taxation principle ensures the converse. Let $w \mapsto (C_w, Y_w)$ be an allocation that verifies (1.14). Then one can build an income tax schedule $Y \mapsto T(Y)$ such that any individual of skill $w$ confronted with it effectively chooses the earnings level $Y_w$ and the consumption level $C_w = Y_w - T(Y_w)$ designed for her skill level.

To show this converse, let $w \mapsto (C_w, Y_w)$ be an allocation that verifies (1.14) and let $\mathcal{Y} = \{Y \text{ such that there exists } w \in \Omega \text{ for which } Y = Y_w\}$. We now build step by step a tax function $T: Y \mapsto T(Y)$ such that for all $w \in \Omega$, $C_w = Y_w - T(Y_w)$ and $Y_w$ solves (1.13).

1. Let $Y \in \mathcal{Y}$.

(a) If there exists a single skill level $w \in \Omega$ for which $Y = Y_w$, then one must simply have $T(Y_w) = Y_w - C_w$.

(b) Consider now the case where there exists $(w, x) \in \Omega^2$ with $w \neq x$ such that $Y = Y_w = Y_x$. Then applying (1.14), one has

$$U\left(C_w,\, \frac{Y}{w}\right) \geq U\left(C_x,\, \frac{Y}{w}\right)$$

so one must have $C_w \geq C_x$. Symmetrically, inverting the roles of $w$ and $x$ in (1.14), one has that

$$U\left(C_x,\, \frac{Y}{x}\right) \geq U\left(C_w,\, \frac{Y}{x}\right)$$

which is only possible if $C_x \geq C_w$. Hence one must have $C_w = C_x$ if $Y_w = Y_x$. We can therefore define $T(Y) = Y - C_w$ without any ambiguity.

2. If $Y \notin \mathcal{Y}$, we define $T(Y) = +\infty$.

Given such a tax function, we now have to verify that for any skill level $w \in \Omega$, $Y_w$ solves (1.13). Choosing an earnings level $Y \notin \mathcal{Y}$ implies $T(Y) = +\infty$, so such a choice is suboptimal. Now choosing $Y \in \mathcal{Y}$ with $Y \neq Y_w$ amounts to choosing a skill level $x$ such that $Y = Y_x$. However, incentive constraints (1.14) and $C_t = Y_t - T(Y_t)$ for $t = w, x$ then imply that

$$U\left(Y_w - T(Y_w),\, \frac{Y_w}{w}\right) \geq U\left(Y - T(Y),\, \frac{Y}{w}\right) = U\left(C_x,\, \frac{Y_x}{w}\right)$$

which ends the proof that $Y_w$ is a solution to (1.13). It is worth noting that this proof only uses the assumptions that utility $U(\cdot,\, \cdot / w)$ increases in consumption and decreases in earnings. In particular, the Spence-Mirrlees assumption is not needed here.
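The construction used in this proof can be mimicked on a finite menu. The sketch below assumes a quasi-linear utility $U(C, L) = C - L^2/2$ and a hypothetical two-type allocation, neither of which comes from the text. It checks (1.14), builds the tax function that equals $Y - C_w$ on the offered earnings levels and $+\infty$ elsewhere, and verifies that each type picks the bundle designed for her.

```python
import math

def utility(c, y, w):
    """Assumed quasi-linear utility C - (Y/w)**2 / 2 of a type-w worker at bundle (C, Y)."""
    return c - 0.5 * (y / w) ** 2

# Hypothetical allocation: skill -> (C_w, Y_w).
allocation = {1.0: (0.9, 0.8), 2.0: (3.0, 4.0)}

def is_incentive_compatible(alloc):
    """Check the incentive constraints (1.14) for every ordered pair of types."""
    return all(utility(*alloc[w], w) >= utility(*alloc[x], w)
               for w in alloc for x in alloc)

def build_tax(alloc):
    """T(Y) = Y - C_w on offered earnings levels, +infinity elsewhere.
    If two types share an earnings level, (1.14) forces equal consumption,
    so overwriting the dictionary entry is harmless."""
    menu = {y: y - c for c, y in alloc.values()}
    return lambda y: menu.get(y, math.inf)

def chosen_earnings(w, tax, candidates):
    """Earnings level a type-w worker picks among candidate levels."""
    return max(candidates, key=lambda y: utility(y - tax(y), y, w))

tax = build_tax(allocation)
candidates = [0.0, 0.8, 1.5, 4.0, 6.0]  # includes levels outside the menu
choices = {w: chosen_earnings(w, tax, candidates) for w in allocation}
print(is_incentive_compatible(allocation), choices)
```

Earnings levels outside the menu yield a consumption of minus infinity and are therefore never chosen, which is the role played by $T(Y) = +\infty$ in the proof.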

Finally, it is important to note that the restriction of the function $Y \mapsto Y - T(Y)$ to $\mathcal{Y}$ has to be increasing. To show this statement, let $x$ and $w$ be two skill levels such that $Y_x < Y_w$. If we assume by contradiction that $C_x \geq C_w$, then one would have $U\left(C_x, \frac{Y_x}{w}\right) \geq U\left(C_w, \frac{Y_x}{w}\right) > U\left(C_w, \frac{Y_w}{w}\right)$, and the incentive constraint (1.14) would be violated. So $Y_x < Y_w$ implies $C_x < C_w$. This statement implies that whenever the tax function is differentiable, the marginal tax rate $T'(Y)$ has to be no higher than one at any earnings level that is actually chosen. Moreover, it can equal 1 only pointwise. [4] It means in particular that if $T'(Y) > 1$ at some earnings level $Y$, then $Y$ will not be chosen by any worker, whatever her skill level. Moreover, this result has nothing to do with optimized tax schedules. It is a property of any incentive-compatible allocation.

[4] Recall that an increasing and differentiable function may have a derivative that equals 0 pointwise, as illustrated by the function $x \mapsto x^3$.


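In sum, the redistributive problem consists in solving

$$\max_{\{Y_w,\, C_w\}_{w \in \Omega}} \; \int_\Omega \Phi\left(U\left(C_w,\, \frac{Y_w}{w}\right)\right) dF(w) \tag{1.15}$$

$$\text{s.t.:} \quad \int_\Omega \big(Y_w - C_w\big)\, dF(w) \geq E$$

$$\forall\, (w, x) \in \Omega^2 \qquad U\left(C_w,\, \frac{Y_w}{w}\right) \geq U\left(C_x,\, \frac{Y_x}{w}\right)$$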

A solution to this program corresponds to an optimal (second-best) allocation. From an optimal allocation $w \mapsto (C_w, Y_w)$, we now know how to retrieve a tax function $Y \mapsto T(Y)$ that decentralizes it.

IV The Stiglitz (1982) model

Program (1.15) is in general very complex. Stiglitz (1982) proposed to focus on a version of this problem with only two skill levels $0 < w_L < w_H$. In other words, the support $\Omega$ of the distribution $F(\cdot)$ is reduced to two mass points at $w_L$ and $w_H$. This restriction on the skill distribution has proved to be very fruitful in understanding how informational constraints (1.14) affect the optimal allocations.

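Moreover, Stiglitz proposed to describe the full set of second-best Pareto optima, instead of restricting attention to the optimal allocation according to a specific social welfare function. Doing so avoids deriving results that depend on the specific "value judgements" that determine a specific social welfare function. Let $\pi_L$ and $\pi_H$ be the masses of workers of skill $L$ and $H$, with $0 < \pi_L, \pi_H < 1$ and $\pi_L + \pi_H = 1$. The set of second-best Pareto optima is described as the set of solutions of

$$\begin{aligned}
\max_{Y_H,\, C_H,\, Y_L,\, C_L} \quad & \Phi\left(U\left(C_L,\, \frac{Y_L}{w_L}\right)\right) \\
\text{s.t.:} \quad & \pi_L \left(Y_L - C_L\right) + \pi_H \left(Y_H - C_H\right) \geq E \\
& \Phi\left(U\left(C_H,\, \frac{Y_H}{w_H}\right)\right) \geq U_H \\
& U\left(C_H,\, \frac{Y_H}{w_H}\right) \geq U\left(C_L,\, \frac{Y_L}{w_H}\right) \\
& U\left(C_L,\, \frac{Y_L}{w_L}\right) \geq U\left(C_H,\, \frac{Y_H}{w_L}\right)
\end{aligned}$$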

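when the parameter $U_H$ varies. Instead, we find it more fruitful to think of the dual program of maximizing tax revenues for given levels of utility $U_H$ and $U_L$. [5] Program (1.15) then becomes

$$\max_{Y_H,\, C_H,\, Y_L,\, C_L} \quad \pi_L \left(Y_L - C_L\right) + \pi_H \left(Y_H - C_H\right) - E \tag{1.16a}$$

$$\Phi\left(U\left(C_L,\, \frac{Y_L}{w_L}\right)\right) \geq U_L \quad\text{and}\quad \Phi\left(U\left(C_H,\, \frac{Y_H}{w_H}\right)\right) \geq U_H \tag{1.16b}$$

$$U\left(C_H,\, \frac{Y_H}{w_H}\right) \geq U\left(C_L,\, \frac{Y_L}{w_H}\right) \tag{1.16c}$$

$$U\left(C_L,\, \frac{Y_L}{w_L}\right) \geq U\left(C_H,\, \frac{Y_H}{w_L}\right) \tag{1.16d}$$

[5] Program (1.16) is defined only for combinations of $(U_L, U_H)$ such that the solution to (1.16) induces a budget surplus $\pi_L (Y_L - C_L) + \pi_H (Y_H - C_H)$ higher than $E$.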

In Program (1.16), constraints (1.16c) and (1.16d) are the restrictions that may generate

distortions of the redistributive policies.

To better understand the working of the Stiglitz model, a geometric approach has proved to be useful. Indifference curves are depicted in Figure 1.1. They are increasing and convex. Following the Spence-Mirrlees assumption (1.5), the indifference curves of low-skilled workers are everywhere steeper than those of high-skilled workers. Figure 1.1 also displays one bundle $(C_L, Y_L)$ designed for low-skilled workers and one bundle $(C_H, Y_H)$ for high-skilled workers. To be consistent with (1.16c), the bundle $(C_L, Y_L)$ designed for low-skilled workers must be dominated by the other bundle from the high-skilled workers' viewpoint. Therefore, the bundle $(C_L, Y_L)$ has to be located below the high-skilled workers' indifference curve that passes through the bundle $(C_H, Y_H)$. Symmetrically, to be consistent with (1.16d), the bundle $(C_H, Y_H)$ designed for high-skilled workers has to be below the low-skilled workers' indifference curve that passes through the bundle $(C_L, Y_L)$. This is the case in Figure 1.1.

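We can now turn to the resolution of the government's problem. Substituting $\Gamma(U_i, Y_i / w_i)$ for the consumption levels $C_i$, and denoting $\mu_H$ the Lagrange multiplier associated with the incentive constraint (1.16c) and $\mu_L$ the one associated with (1.16d), the Lagrangian of Problem (1.16) is:

$$\begin{aligned}
\mathcal{L}(U_H, U_L, Y_H, Y_L, \lambda, \mu_H, \mu_L) \equiv{} & \pi_L \left[Y_L - \Gamma\left(U_L,\, \frac{Y_L}{w_L}\right)\right] + \pi_H \left[Y_H - \Gamma\left(U_H,\, \frac{Y_H}{w_H}\right)\right] - E \\
& + \mu_H \left\{U_H - U\left(\Gamma\left(U_L,\, \frac{Y_L}{w_L}\right),\, \frac{Y_L}{w_H}\right)\right\} \\
& + \mu_L \left\{U_L - U\left(\Gamma\left(U_H,\, \frac{Y_H}{w_H}\right),\, \frac{Y_H}{w_L}\right)\right\}
\end{aligned}$$

Given (1.4), one has $\Gamma'_L(U_i, Y_i / w_i) = M(C_i, Y_i, w_i) \cdot w_i$, so the necessary conditions are: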

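$$1 - M(C_H, Y_H, w_H) = \frac{\mu_L \cdot U'_C\left(C_H,\, \frac{Y_H}{w_L}\right)}{\pi_H} \left\{M(C_H, Y_H, w_H) - M(C_H, Y_H, w_L)\right\} \tag{1.17a}$$

$$1 - M(C_L, Y_L, w_L) = \frac{\mu_H \cdot U'_C\left(C_L,\, \frac{Y_L}{w_H}\right)}{\pi_L} \left\{M(C_L, Y_L, w_L) - M(C_L, Y_L, w_H)\right\} \tag{1.17b}$$

where $\lambda \geq 0$, $\mu_H \geq 0$ and $\mu_L \geq 0$.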


Consider as a benchmark the unrealistic case where the government observes workers' skill levels, so that constraints (1.16c) and (1.16d) should be ignored (so $\mu_H = \mu_L = 0$ in (1.17)). The problem would then consist in choosing, for each skill level $i = H, L$, a bundle $(C_i, Y_i)$ that maximizes tax revenues $Y_i - C_i$ subject to $U_i = U(C_i, Y_i / w_i)$. The necessary condition implies $M(C_i, Y_i, w_i) = 1$. Graphically, this optimal bundle corresponds to the point of the indifference curve $U_i = U(C_i, Y_i / w_i)$ where the tangent is parallel to the 45-degree line. This earnings level is denoted $Y^*_i$ in Figures 1.2, 1.3 and 1.4. [6] The corresponding consumption level is $C^*_i = \Gamma(U_i, Y^*_i / w_i)$.

Let us now turn back to the case where the government does not observe skill levels, so that constraints (1.16c) and (1.16d) now matter. Three cases should then be distinguished depending on the earnings level where the two indifference curves intersect. In the first case, this level is between $Y^*_L$ and $Y^*_H$ (see Figure 1.2). In the second case, the two indifference curves intersect at an earnings level below $Y^*_L$ and $Y^*_H$ (see Figure 1.3), whereas in the third case, they intersect at an earnings level above $Y^*_H$ and $Y^*_L$ (see Figure 1.4).

Case 1: First-best optimal taxation is fully revealing

If the indifference curves intersect at an earnings level that is between $Y^*_L$ and $Y^*_H$ (see Figure 1.2), then the allocation $\{(C^*_L, Y^*_L), (C^*_H, Y^*_H)\}$ verifies the incentive constraints (1.16c) and (1.16d), which are thus not binding. In such a configuration, the Lagrange multipliers $\mu_H$ and $\mu_L$ are nil in Equations (1.17) and the MRS equals 1 for both skill levels. Recalling from (1.2) the interpretation of $M(C_i, Y_i, w_i)$ as one minus the marginal tax rate at earnings level $Y_i$, marginal tax rates are nil. Therefore the first-best taxation is fully revealing. An example of such a second-best Pareto optimum is the laissez-faire allocation with no taxes.

Case 2: The “normal” case

In the second case, depicted in Figure 1.3, the two indifference curves intersect at an earnings level that is lower than $Y^*_L$. In such a situation, in the absence of constraint (1.16c), the bundle $(C^*_L, Y^*_L)$ would maximize the tax revenues of the government for a given utility level of low-skilled workers. However, this bundle violates (1.16c). High-skilled workers then prefer to get gross earnings $Y^*_L$ and consumption $C^*_L$, rather than the bundle $(C_H, Y_H)$ that maximizes the tax revenues paid by the high-skilled given the utility constraint $U(C, Y / w_H) = U_H$. To prevent this "mimicking", the government has to reduce both the earnings and the consumption designed for low-skilled workers along the low-skilled workers' indifference curve. Doing so leaves the utility of low-skilled workers unchanged. However, because of the single-crossing condition (1.5), the incentives for high-skilled workers to choose the bundle designed for low-skilled workers decrease. However, the tax revenues $Y_L - C_L$ paid by low-skilled workers decrease too. So the reduction in earnings and consumption along the low-skilled workers' indifference curve proceeds only until high-skilled workers become indifferent between the two bundles. Therefore, the optimum corresponds to the intersection of the two indifference curves, implying that the incentive constraint (1.16c) is binding. Conversely, the incentive constraint (1.16d) remains slack. Hence, this configuration corresponds to the case where the Lagrange multipliers verify $\mu_H > 0 = \mu_L$. Equations (1.17a) and (1.17b) thus become:

$$1 - M(C_H, Y_H, w_H) = 0$$

$$1 - M(C_L, Y_L, w_L) = \frac{\mu_H \cdot U'_C\left(C_L,\, \frac{Y_L}{w_H}\right)}{\pi_L} \left\{M(C_L, Y_L, w_L) - M(C_L, Y_L, w_H)\right\} > 0$$

High-skilled workers' labor supply remains undistorted and $Y_H = Y^*_H$. Conversely, low-skilled workers' labor supply is distorted downwards since $M(C_L, Y_L, w_L) < 1$. Recalling from (1.2) the interpretation of $M(C_i, Y_i, w_i)$ as one minus the marginal tax rate at earnings level $Y_i$, this implies that the marginal tax rate is nil for high-skilled workers and positive for low-skilled workers.

[6] This is an abuse of notation. In the presence of income effects, $Y^*_i$ depends on the level of the promised utility $U_i$.

[Figure 1.2: First-Best Taxation is fully revealing. Horizontal axis: gross earnings $Y$; vertical axis: consumption $C$. The figure shows the indifference curves $U_H = U(C, Y/w_H)$ and $U_L = U(C, Y/w_L)$, the consumption levels $C_L$ and $C_H$, and the earnings levels $Y_L = Y^*_L$ and $Y_H = Y^*_H$.]
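The normal case can be worked out in closed form under simplifying assumptions that are not in the text: a quasi-linear utility $U(C, L) = C - L^2/2$ and $\Phi$ equal to the identity. Then $M(C, Y, w) = Y / w^2$, the undistorted earnings are $Y^*_i = w_i^2$, and constraint (1.16c) reduces to $Y_L^2 \leq 2 (U_H - U_L) / (1/w_L^2 - 1/w_H^2)$. The sketch below, whose function name and parameter values are hypothetical, computes the revenue-maximizing bundles for given $(U_L, U_H)$ and the implied marginal tax rates. It covers only Cases 1 and 2 and raises an error otherwise.

```python
def stiglitz_two_types(w_low, w_high, u_low, u_high):
    """Revenue-maximizing bundles of program (1.16) for the assumed
    quasi-linear utility C - (Y/w)**2 / 2, when (1.16d) is slack."""
    k = 1 / w_low ** 2 - 1 / w_high ** 2   # positive under single crossing
    y_high = w_high ** 2                   # undistorted: M(C_H, Y_H, w_H) = 1
    gap = u_high - u_low
    if gap < 0 or gap > 0.5 * y_high ** 2 * k:
        raise ValueError("utility pair not covered by this sketch (Cases 1 and 2 only)")
    # Largest Y_L compatible with (1.16c), capped at the undistorted level.
    y_low = min(w_low ** 2, (2 * gap / k) ** 0.5)
    c_low = u_low + 0.5 * (y_low / w_low) ** 2
    c_high = u_high + 0.5 * (y_high / w_high) ** 2
    return {"Y_L": y_low, "C_L": c_low, "Y_H": y_high, "C_H": c_high,
            "mtr_low": 1 - y_low / w_low ** 2,     # 1 - M(C_L, Y_L, w_L)
            "mtr_high": 1 - y_high / w_high ** 2}  # 1 - M(C_H, Y_H, w_H)

# Hypothetical utility levels: a small gap gives Case 2, a larger one Case 1.
normal = stiglitz_two_types(1.0, 2.0, u_low=0.5, u_high=0.7)
revealing = stiglitz_two_types(1.0, 2.0, u_low=0.5, u_high=1.0)
print(normal)
print(revealing)
```

With the smaller utility gap, the low-skilled bundle is distorted downwards and carries a positive marginal tax rate, while the high-skilled bundle stays undistorted. With the larger gap, neither incentive constraint binds and both marginal tax rates are zero.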

Stiglitz (1982) labelled this configuration the normal case. The following experiment gives a rationale for this denomination. If the utility $U_H$ granted to high-skilled workers increases, the high-skilled workers' (flatter) indifference curve shifts upwards. Therefore, $Y_L$ can be raised without violating the incentive constraint (1.16c), $Y_H$ increases and the distortions of low-skilled workers' labor supply are reduced. However, doing so implies that the tax revenues paid by low-skilled workers increase (they are taxed more efficiently for an unchanged level of utility) whereas the tax revenues paid by high-skilled workers are reduced (they are still taxed efficiently, but they pay less taxes since they get a higher utility). Hence, the rise in low-skilled workers' labor supply comes along with a rise in inequality.

[Figure 1.3: The normal case. Horizontal axis: gross earnings $Y$; vertical axis: consumption $C$. The figure shows the indifference curves $U_H = U(C, Y/w_H)$ and $U_L = U(C, Y/w_L)$, the consumption levels $C_L$ and $C_H$, and the earnings levels $Y_L$, $Y^*_L$ and $Y_H = Y^*_H$.]

Case 3: The antiredistributive case

The polar case where the two curves intersect at an earnings level above $Y_H^*$ is depicted in Figure 1.4. Then the allocation $\{(C_L^*, Y_L^*), (C_H^*, Y_H^*)\}$ violates (1.16d): low skilled workers prefer the bundle $(C_H^*, Y_H^*)$ designed for high skilled workers to the bundle $(C_L^*, Y_L^*)$ designed for them. The government should therefore distort upwards the bundle designed for high skilled workers until (1.16d) is met with equality. Hence, this configuration corresponds to the case where the Lagrange multipliers verify $\mu_L > 0 = \mu_H$. Equations (1.17a) and (1.17b) thus become:

$$1 - M(C_H, Y_H, w_H) = \frac{\mu_L \cdot U'_C\left(C_H, \frac{Y_H}{w_L}\right)}{\pi_H} \left[ M(C_H, Y_H, w_H) - M(C_H, Y_H, w_L) \right]$$

$$1 - M(C_L, Y_L, w_L) = 0$$

Low skilled workers' labor supply is now undistorted and $Y_L = Y_L^*$. Conversely, high skilled workers' labor supply is distorted upwards since $M(C_H, Y_H, w_H) > 1$. Recalling from (1.2) the interpretation of $M(C_i, Y_i)$ as one minus the marginal tax rate at earnings level $Y_i$, this implies a zero marginal tax rate at earnings $Y_L$ and a negative marginal tax rate at $Y_H$.


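Figure 1.4: The anti-redistributive case (axes: earnings $Y$ and consumption $C$; indifference curves $U_H = U(C, Y/w_H)$ and $U_L = U(C, Y/w_L)$; bundles $(C_H, Y_H)$ and $(C_L, Y_L)$ with $Y_L = Y_L^*$; first-best earnings level $Y_H^*$)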

V Continuum of skills

The two-skill model of Stiglitz is a useful preliminary step to understand how the incentive constraints (1.14) distort the optimal allocations, and thereby the structure of second-best optima. However, it only gives information about the tax function at two earnings levels. We now consider problem (1.15) with a continuum of skill levels. Hence the distribution $F(\cdot)$ of skills is supposed to be continuous on a connected support $[\underline{w}, \overline{w}]$, with $0 < \underline{w} < \overline{w} \le +\infty$. To ease the exposition, I restrict throughout this Section the utility function to be additively separable and of the form:

$$U(C, L) \equiv u(C) - v(L) \quad \text{where } u'(C) > 0 \ge u''(C),\ v'(L) > 0 \text{ and } v''(L) > 0$$

V.1 The incentive constraints

The first issue is technical. How to deal with constraints (1.14), that is, with a double continuum of inequalities? Mirrlees (1971) has shown that under the Spence-Mirrlees condition (1.5), these constraints are equivalent to a differential equation and a monotonicity constraint. Let $U_w$ be the value function associated with the optimization program of workers of skill $w$, that is

$$U_w = \max_Y \; u\left(Y - T(Y)\right) - v\left(\frac{Y}{w}\right) = u(C_w) - v\left(\frac{Y_w}{w}\right) \qquad (1.19)$$

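The usual envelope argument applied to the latter program then suggests that, with a dot denoting a derivative with respect to the skill level $w$,

$$\dot{U}_w = \frac{Y_w}{w^2} \cdot v'\left(\frac{Y_w}{w}\right) > 0 \qquad (1.20)$$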
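As a quick numerical sanity check of the envelope condition (1.20), the sketch below assumes a specification that is not in the text: quasilinear utility $u(C) = C$, isoelastic disutility $v(L) = L^{1+k}/(1+k)$ and a linear tax $T(Y) = tY - R$. All function names and parameter values are illustrative. It compares a finite-difference derivative of the value function with the right-hand side of (1.20).

```python
# Illustrative assumptions (not from the notes): u(C) = C, v(L) = L**(1+k)/(1+k),
# linear tax T(Y) = t*Y - R. Parameter values are arbitrary.

def v(L, k=2.0):
    """Disutility of labor."""
    return L ** (1 + k) / (1 + k)

def v_prime(L, k=2.0):
    """Marginal disutility of labor."""
    return L ** k

def earnings(w, t=0.3, k=2.0):
    """Optimal earnings: the first-order condition (1 - t) = v'(Y/w)/w
    gives Y = w**(1 + 1/k) * (1 - t)**(1/k)."""
    return w ** (1 + 1 / k) * (1 - t) ** (1 / k)

def indirect_utility(w, t=0.3, R=0.1, k=2.0):
    """Value function U_w of (1.19) under the assumed linear tax."""
    Y = earnings(w, t, k)
    return (1 - t) * Y + R - v(Y / w, k)

def envelope_rhs(w, t=0.3, k=2.0):
    """Right-hand side of (1.20): (Y_w / w**2) * v'(Y_w / w)."""
    Y = earnings(w, t, k)
    return Y / w ** 2 * v_prime(Y / w, k)

def central_difference(func, x, h=1e-6):
    """Two-sided finite-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

w = 1.5
lhs = central_difference(indirect_utility, w)
rhs = envelope_rhs(w)
print(f"dU/dw (numerical) = {lhs:.6f}, (Y/w^2) v'(Y/w) = {rhs:.6f}")
```

The two printed numbers should agree up to the finite-difference error, and both are positive, in line with the sign stated in (1.20).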

It is then possible to show that whenever the Spence-Mirrlees condition (1.5) holds, the following two sets of allocations coincide: the set of allocations $w \mapsto (C_w, Y_w, U_w)$ that verify $U_w = U(C_w, Y_w/w)$ and (1.14), and the set of allocations $w \mapsto (C_w, Y_w, U_w)$ that verify $U_w = U(C_w, Y_w/w)$, (1.20) almost everywhere, and the requirements that $w \mapsto Y_w$ is nondecreasing and $w \mapsto U_w$ is continuous. This result is very useful because it allows us to solve program (1.15) with optimal control techniques, taking $U_w$ as the state variable and earnings $Y_w$ as the control.

Let $w \mapsto (C_w, Y_w, U_w)$ be an allocation that verifies $U_w = U(C_w, Y_w/w)$ and (1.14). We will first show that $w \mapsto Y_w$ is nondecreasing. Then we will show that $w \mapsto U_w$ is continuous and, finally, that (1.20) holds almost everywhere.

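1. Since for any $x$, one has $U_x = u(C_x) - v(Y_x/x)$, the incentive constraint (1.14) implies

$$U_w \ge u(C_x) - v\left(\frac{Y_x}{w}\right) = U_x + v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right)$$

Inverting the roles of $x$ and $w$, one obtains that

$$v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right) \le U_w - U_x \le v\left(\frac{Y_w}{x}\right) - v\left(\frac{Y_w}{w}\right) \qquad (1.21)$$

Now, assume without loss of generality that $x < w$. Then the inequality between the two extremities implies:⁷

$$\int_x^w \left\{ \frac{Y_w}{t^2} v'\left(\frac{Y_w}{t}\right) - \frac{Y_x}{t^2} v'\left(\frac{Y_x}{t}\right) \right\} dt \ge 0$$

By the convexity of $v(\cdot)$, $Y \mapsto (Y/t^2)\, v'(Y/t)$ is increasing. Therefore, the last inequality implies $Y_w \ge Y_x$, which ends the proof that $w \mapsto Y_w$ has to be nondecreasing.⁸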

2. We now show the continuity of $w \mapsto U_w$. Take a skill level $w$. Both extremes of (1.21) tend to 0 as $x$ tends to $w$. Therefore $U_x$ tends to $U_w$ as $x$ tends to $w$, which ensures the continuity at $w$ of $w \mapsto U_w$.

3. We now turn to the differentiability and the derivative of $w \mapsto U_w$. We here use the mathematical result that any nondecreasing function is continuous everywhere except on a set that is at most countable. Accordingly, $w \mapsto Y_w$ is henceforth said to be "almost everywhere" continuous. Now, let $w$ be a skill level at which $w \mapsto Y_w$ is continuous. Then for $x < w$, (1.21) implies

$$\frac{v\left(\frac{Y_x}{x}\right) - v\left(\frac{Y_x}{w}\right)}{w - x} \le \frac{U_w - U_x}{w - x} \le \frac{v\left(\frac{Y_w}{x}\right) - v\left(\frac{Y_w}{w}\right)}{w - x}$$

By continuity at $w$ of $w \mapsto Y_w$, both extremes tend to $(Y_w / w^2)\, v'(Y_w/w)$ as $x$ tends to $w$. So $w \mapsto U_w$ admits a left-derivative at $w$ that equals $(Y_w / w^2)\, v'(Y_w/w)$. When $x > w$, a symmetric reasoning shows that $w \mapsto U_w$ admits a right-derivative at $w$ that equals $(Y_w / w^2)\, v'(Y_w/w)$ too. Hence, (1.20) holds almost everywhere.

7 We use here that $v\left(\frac{Y}{x}\right) - v\left(\frac{Y}{w}\right) = \int_x^w \frac{Y}{t^2}\, v'\left(\frac{Y}{t}\right) dt$.

8 It is worth noting here that we did not use any assumption about the support of the skill distribution. Hence the proof can be translated to the Stiglitz case to show that one must have $Y_H \ge Y_L$.

We now verify the reciprocal. Let $w \mapsto (C_w, Y_w, U_w)$ be an allocation such that $U_w = U(C_w, Y_w/w)$, $w \mapsto Y_w$ is nondecreasing, $w \mapsto U_w$ is continuous and (1.20) holds almost everywhere. We have to verify that such an allocation verifies (1.14). Consider two skill levels $w$ and $x$. By the continuity of $w \mapsto U_w$ and the fact that (1.20) holds almost everywhere, we have that

$$U_w - U_x = \int_x^w \frac{Y_t}{t^2}\, v'\left(\frac{Y_t}{t}\right) dt$$

Now since $v(\cdot)$ is increasing and convex, for any $t$ the function $Y \mapsto (Y/t^2)\, v'(Y/t)$ is increasing. If $x < w$ (resp. $x > w$), then for all $t \in (x, w)$ (resp. $t \in (w, x)$) one has $(Y_t/t^2)\, v'(Y_t/t) \ge (Y_x/t^2)\, v'(Y_x/t)$ (resp. $\le$), since $w \mapsto Y_w$ is nondecreasing. In both cases (when $x > w$, both the inequality between the integrands and the orientation of the integral are reversed), one has:

$$\int_x^w \frac{Y_t}{t^2}\, v'\left(\frac{Y_t}{t}\right) dt \ge \int_x^w \frac{Y_x}{t^2}\, v'\left(\frac{Y_x}{t}\right) dt$$

Therefore, one has

$$U_w - U_x \ge \int_x^w \frac{Y_x}{t^2}\, v'\left(\frac{Y_x}{t}\right) dt$$

Integrating the right-hand side, which equals $v(Y_x/x) - v(Y_x/w)$, and using $U_x = u(C_x) - v(Y_x/x)$ gives $U_w \ge u(C_x) - v(Y_x/w)$, that is (1.14).
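The equivalence can be illustrated on a finite grid of skills. The sketch below is only an illustration under assumptions that are not in the text: quasilinear utility $u(C) = C$, $v(L) = L^{1+k}/(1+k)$, and an allocation generated by a hypothetical linear tax $T(Y) = tY - R$. It checks the incentive constraints (1.14) pairwise, and shows that reassigning the same bundles in decreasing order of earnings violates them.

```python
# Illustrative assumptions (not from the notes): u(C) = C, v(L) = L**(1+k)/(1+k),
# bundles generated by a linear tax T(Y) = t*Y - R.

def v(L, k=2.0):
    """Disutility of labor."""
    return L ** (1 + k) / (1 + k)

def allocation(skills, t=0.3, R=0.1, k=2.0):
    """Bundles (C_w, Y_w) chosen by each skill under the linear tax."""
    bundles = []
    for w in skills:
        Y = w ** (1 + 1 / k) * (1 - t) ** (1 / k)
        C = (1 - t) * Y + R
        bundles.append((C, Y))
    return bundles

def is_incentive_compatible(skills, bundles, k=2.0, tol=1e-12):
    """Check U_w >= u(C_x) - v(Y_x / w) for every pair (w, x), as in (1.14)."""
    for w, (C_w, Y_w) in zip(skills, bundles):
        U_w = C_w - v(Y_w / w, k)
        for C_x, Y_x in bundles:
            if U_w < C_x - v(Y_x / w, k) - tol:
                return False
    return True

def is_nondecreasing(values):
    """True when the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))

skills = [0.5 + 0.1 * i for i in range(16)]
bundles = allocation(skills)
reversed_bundles = bundles[::-1]  # same bundles, but earnings now decrease with skill

print(is_incentive_compatible(skills, bundles), is_nondecreasing([Y for _, Y in bundles]))
print(is_incentive_compatible(skills, reversed_bundles))
```

The reversed assignment fails because every worker can still pick, among the offered bundles, the one she would have chosen on the budget line, which she strictly prefers to the bundle assigned to her.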

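Hence, using that $C_w = \Gamma(U_w, Y_w/w)$, the second-best problem (1.15) can be rewritten as

$$\max_{\{Y_w, U_w\}_{w \in \Omega}} \int_\Omega \Phi(U_w)\, dF(w) \quad \text{s.t.: } w \mapsto Y_w \text{ is nondecreasing} \qquad (1.22)$$

$$\int_\Omega \left\{ Y_w - \Gamma\left(U_w, \frac{Y_w}{w}\right) \right\} dF(w) \ge E \qquad (\lambda)$$

$$\dot{U}_w = \frac{Y_w}{w^2}\, v'\left(\frac{Y_w}{w}\right) \qquad (q_w)$$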

V.2 The resolution

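We now present the resolution of Program (1.15). The idea is to take $Y_w$ as the control variable and $U_w$ as the state variable, and to define the Hamiltonian as

$$H(Y, U, w, q, \lambda) = \left\{ \Phi(U) + \lambda \left[ Y - \Gamma\left(U, \frac{Y}{w}\right) \right] \right\} f(w) + q\, \frac{Y}{w^2}\, v'\left(\frac{Y}{w}\right) \qquad (1.23)$$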

where we have assumed that the distribution of skill has no mass points and admits a con-

tinuous density f (.). The co-state variable is denoted q and λ is the Lagrange multiplier

associated to the government’s budget constraint. Then one can apply the optimal control

tools to get a set of necessary conditions that the second-best optimal allocation has to verify.

In doing this, several difficulties may appear, that we should be aware of.

The first difficulty is the treatment of the monotonicity constraint on $w \mapsto Y_w$. This constraint makes it difficult to consider $Y_w$ as a "true" control variable. If the monotonicity constraint is binding on an interval $[w_1, w_2]$, then $Y_w$ is constant over this interval, and the same occurs for consumption. Hence, the same bundle $(C_w, Y_w)$ is offered to a whole range of skills. It is then said that bunching (of types) occurs over $[w_1, w_2]$. An example of such bunching is suggested in Figure 1.5. There, the tax function $Y \mapsto T(Y)$ has a kink at earnings level $Y$ with a sudden increase in the marginal tax rate. Therefore, the function $Y \mapsto Y - T(Y)$ has a "downward" kink at earnings level $Y$. Workers of skill $w_1$ (resp. $w_2$) have an indifference curve that is tangent to $Y \mapsto Y - T(Y)$ just before (resp. after) the kink. Workers of skill $w'$ between $w_1$ and $w_2$ face a marginal tax rate that is too low just before earnings $Y$, which gives them an incentive to earn more than $Y$, whereas they face a marginal tax rate that is too high just after the kink, which gives them an incentive to earn less than $Y$. Consequently, they are "stuck" at exactly the earnings level $Y$, and bunching occurs between $w_1$ and $w_2$.

Figure 1.5: An example of bunching (axes: earnings $Y$ and consumption $C$; budget curve $C = Y - T(Y)$; indifference curves $U_1 = U(C, Y/w_1)$, $U' = U(C, Y/w')$ and $U_2 = U(C, Y/w_2)$, with $w_1 < w' < w_2$)
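The bunching mechanism at a kink can be reproduced in a few lines. The sketch below is a hypothetical example, not the schedule of Figure 1.5: it assumes $u(C) = C$, $v(L) = L^{1+k}/(1+k)$ and a piecewise-linear tax whose marginal rate jumps from `t1` to `t2 > t1` at the earnings level `y_kink`. Because the budget curve is then concave, each worker's choice is given by the three cases coded below.

```python
# Illustrative assumptions (not from the notes): u(C) = C, v(L) = L**(1+k)/(1+k),
# marginal tax rate t1 below y_kink and t2 > t1 above it.

def desired_earnings(w, t, k=2.0):
    """Earnings a worker of skill w would choose facing a constant marginal rate t."""
    return w ** (1 + 1 / k) * (1 - t) ** (1 / k)

def earnings_choice(w, y_kink=1.0, t1=0.2, t2=0.5, k=2.0):
    """Optimal earnings under the kinked (concave) budget curve."""
    y_low = desired_earnings(w, t1, k)   # tangency on the first segment
    y_high = desired_earnings(w, t2, k)  # tangency on the second segment
    if y_low <= y_kink:
        return y_low
    if y_high >= y_kink:
        return y_high
    return y_kink                        # stuck at the kink: bunching

def bunching_interval(y_kink=1.0, t1=0.2, t2=0.5, k=2.0):
    """Skills w1 < w2 between which every worker earns exactly y_kink."""
    w1 = (y_kink / (1 - t1) ** (1 / k)) ** (k / (1 + k))
    w2 = (y_kink / (1 - t2) ** (1 / k)) ** (k / (1 + k))
    return w1, w2

w1, w2 = bunching_interval()
for w in (0.9 * w1, 0.5 * (w1 + w2), 1.1 * w2):
    print(f"w = {w:.3f} -> Y = {earnings_choice(w):.3f}")
```

Only the worker in the middle of the interval earns exactly `y_kink`; the two others sit at a tangency on either side of the kink.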

The easiest and classical way to deal with bunching is to solve a "relaxed" version of Program (1.15) where the monotonicity constraint is ignored. Then, one has to verify (analytically, or numerically through simulations) that the solution to the relaxed program verifies the monotonicity constraint, so that this allocation also solves the full program. This is the so-called first-order approach that we follow.⁹

The second difficulty is the case where $w \mapsto Y_w$ is discontinuous. It is in a sense the opposite problem, with $w \mapsto Y_w$ increasing "infinitely rapidly" at some skill level. Figure 1.6 illustrates this possibility. The function $Y \mapsto Y - T(Y)$ becomes locally more convex than the indifference curve of workers of skill $w_2$. Consequently there are two tangency points at earnings levels $Y_2^L < Y_2^H$, and the function $w \mapsto Y_w$ "jumps" from $Y_2^L$ to $Y_2^H$ at skill level $w_2$.

Figure 1.6: An example of discontinuous allocation (axes: earnings $Y$ and consumption $C$; budget curve $C = Y - T(Y)$; indifference curves $U_1 = U(C, Y/w_1)$, $U_2 = U(C, Y/w_2)$ and $U_3 = U(C, Y/w_3)$, with $w_1 < w_2 < w_3$; earnings levels $Y_2^L$ and $Y_2^H$)

Surprisingly, the possibility of a discontinuous optimal allocation has received less attention in the literature than bunching. The usual attitude consists in simply ignoring that possibility. From a technical viewpoint, the necessary conditions derived from optimal control techniques are only available if there is a finite number of discontinuity points. However, since $w \mapsto Y_w$ is nondecreasing, the set of points of discontinuity is at most countable, and thereby of zero measure (since we have assumed no mass points in the skill distribution). We therefore assume that this set is finite.¹⁰

9 See Lollivier and Rochet (1983), Guesnerie and Laffont (1984), Ebert (1993) or Hellwig (2008) for alternative methods that consider the possibility of bunching.

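We can now apply optimal control. For all skill levels where $w \mapsto Y_w$ is continuous (that is "almost everywhere"), $w \mapsto U_w$ is differentiable and (1.20) holds. Moreover, the necessary conditions

$$0 = \frac{\partial H}{\partial Y}(Y_w, U_w, w, q_w, \lambda) \quad \text{and} \quad -\dot{q}_w = \frac{\partial H}{\partial U}(Y_w, U_w, w, q_w, \lambda)$$

hold, which imply, given (1.4) and (1.23):

$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \overset{a.e.}{=} -\frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right] \qquad (1.24a)$$

$$-\dot{q}_w \overset{a.e.}{=} \left\{ \Phi'(U_w) - \frac{\lambda}{u'(C_w)} \right\} f(w) \qquad (1.24b)$$

Finally, we know that $w \mapsto (q_w, U_w)$ is continuous and that the transversality conditions write $q_{\underline{w}} = q_{\overline{w}} = 0$. Integrating (1.24b) between $w$ and $\overline{w}$, Equation (1.24a) becomes:

$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \overset{a.e.}{=} \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2 \cdot f(w)} \int_w^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn \qquad (1.25)$$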

10 This is an assumption. For instance, within the set $\mathbb{R}$ of real numbers, the subset $\mathbb{Q}$ of rational numbers is countable but dense within $\mathbb{R}$.

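where the Lagrange multiplier associated with the budget constraint is determined by the transversality condition $q_{\underline{w}} = 0$ through:

$$\lambda \cdot \int_{\underline{w}}^{\overline{w}} \frac{1}{u'(C_n)}\, f(n)\, dn = \int_{\underline{w}}^{\overline{w}} \Phi'(U_n)\, f(n)\, dn \qquad (1.26)$$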

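How are these conditions modified in the presence of bunching? We here follow the approach suggested by Guesnerie and Laffont (1984). Assume there is bunching over a finite number $n$ of intervals denoted $[\underline{b}_i, \overline{b}_i]$, with $\underline{w} \le \underline{b}_1 < \overline{b}_1 < \dots < \overline{b}_n \le \overline{w}$, and that $w \mapsto Y_w$ is continuous everywhere and differentiable everywhere except on a finite number of points (including $\{\underline{b}_i, \overline{b}_i\}_{i=1,\dots,n}$). Then one can define $c_w = \dot{Y}_w$ as the control variable, impose zero as a lower bound on $c_w$, and take $U_w$ and $Y_w$ as state variables. With $\zeta$ the co-state variable associated with $Y$, and $e$ the Lagrange multiplier associated with the inequality constraint $(\dot{Y} =)\, c \ge 0$, the Hamiltonian (1.23) becomes

$$H(Y, U, w, q, \lambda) = \left\{ \Phi(U) + \lambda \left[ Y - \Gamma\left(U, \frac{Y}{w}\right) \right] \right\} f(w) + q\, \frac{Y}{w^2}\, v'\left(\frac{Y}{w}\right) + \zeta \cdot c + e \cdot c$$

Equation (1.24b) still holds whereas (1.24a) becomes:

$$-\dot{\zeta}_w = 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} + \frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right]$$

The optimality condition on $c_w (= \dot{Y}_w)$ simply writes $0 = \zeta_w + e_w$. Outside the intervals of bunching, the monotonicity constraint is not binding, so $e_w$ is nil, and so is the co-state variable $\zeta_w$. Therefore equations (1.24a) and (1.25) still hold outside bunching intervals. Conversely, $e_w > 0$ so $\zeta_w < 0$ inside bunching intervals, for $w \in (\underline{b}_i, \overline{b}_i)$. However, the monotonicity constraint is no longer binding at $\underline{b}_i$ and $\overline{b}_i$. Hence $\zeta_{\underline{b}_i} = \zeta_{\overline{b}_i} = 0$ and $\int_{\underline{b}_i}^{\overline{b}_i} \dot{\zeta}_w\, dw = 0$. Therefore one obtains

$$\int_{\underline{b}_i}^{\overline{b}_i} \left\{ 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right\} dw = -\int_{\underline{b}_i}^{\overline{b}_i} \frac{q_w}{\lambda \cdot w^2 \cdot f(w)} \left[ v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right) \right] dw$$

so one has, instead of (1.25):

$$\int_{\underline{b}_i}^{\overline{b}_i} \left\{ 1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right\} dw = \int_{\underline{b}_i}^{\overline{b}_i} \left\{ \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2 \cdot f(w)} \int_w^{\overline{w}} \left\{ \frac{1}{u'(C_n)} - \frac{\Phi'(U_n)}{\lambda} \right\} f(n)\, dn \right\} dw$$

In other words, one simply has to integrate (1.25) over the skills of a bunching interval.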

V.3 The reinterpretation of the optimality conditions

Equation (1.25) is not very intuitive. Following Saez (2001), we now reinterpret this optimality condition in terms of behavioral elasticities and derive it heuristically through a tax perturbation. Consider a worker of skill $w$, choosing an earnings level $Y_w$ under the optimal tax schedule $Y \mapsto T(Y)$. Now, assume the tax function is subject to two types of tax perturbation so that, in the neighborhood of $Y_w$, the tax function becomes $Y \mapsto T(Y) - \tau (Y - Y_w) - \rho$.

• On the one hand, there is a uniform decrease in the marginal tax rate in the neighborhood of $Y_w$. The size of this change is denoted $\tau$ in Figure 1.7. This elementary reform captures a compensated change in the marginal tax rate since, if the worker keeps her earnings choice at the initial value $Y_w$, the reform does not change the level of tax. It captures the substitution effect along the optimal tax schedule for workers of skill $w$.

• On the other hand, there is a uniform decrease in the level of tax in the neighborhood of $Y_w$. The size of this change is denoted $\rho$ in Figure 1.8. This elementary reform captures the income effect along the optimal tax schedule for workers of skill $w$.

Figure 1.7: An elementary reform on the marginal tax rate (axes: earnings $Y$ and consumption $C$; budget curve $C = Y - T(Y)$ rotated by $\tau$ around $(Y_w, C_w)$ over $[Y_w - \delta Y, Y_w + \delta Y]$)

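Consider then the behavior of an individual of skill $w$. She has to solve:

$$\max_Y \; u\left[ Y - T(Y) + \tau (Y - Y_w) + \rho \right] - v\left(\frac{Y}{w}\right)$$

The first-order condition writes $\mathcal{Y}(Y, \tau, \rho, w) = 0$, where we define:

$$\mathcal{Y}(Y, \tau, \rho, w) \equiv \left(1 - T'(Y) + \tau\right) \cdot u'\left[ Y - T(Y) + \tau (Y - Y_w) + \rho \right] - \frac{1}{w}\, v'\left(\frac{Y}{w}\right) \qquad (1.27)$$

In the absence of a reform, one has $\mathcal{Y}(Y_w, 0, 0, w) = 0$, that is:

$$1 - T'(Y_w) = \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \qquad (1.28)$$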
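The first-order condition (1.28) can be checked numerically by maximizing the worker's objective directly. The sketch below is an illustration under assumed functional forms that are not taken from the text: $u(C) = \log C$ (so that income effects are present), $v(L) = L^{1+k}/(1+k)$ and a linear tax $T(Y) = tY - R$. The objective is concave in $Y$ under these assumptions, so a simple ternary search finds the maximizer.

```python
import math

# Illustrative assumptions (not from the notes): u(C) = log(C),
# v(L) = L**(1+k)/(1+k), linear tax T(Y) = t*Y - R with R > 0.

def utility(Y, w, t=0.3, R=0.2, k=2.0):
    """Worker's objective u(Y - T(Y)) - v(Y/w)."""
    C = (1 - t) * Y + R
    return math.log(C) - (Y / w) ** (1 + k) / (1 + k)

def best_earnings(w, t=0.3, R=0.2, k=2.0, y_max=10.0, iters=200):
    """Ternary search for the maximizer of a concave objective on [0, y_max]."""
    lo, hi = 0.0, y_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if utility(m1, w, t, R, k) < utility(m2, w, t, R, k):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def foc_gap(Y, w, t=0.3, R=0.2, k=2.0):
    """(1 - T'(Y)) - v'(Y/w) / (w * u'(C)), which (1.28) sets to zero; here u'(C) = 1/C."""
    C = (1 - t) * Y + R
    return (1 - t) - (Y / w) ** k / w * C

w = 1.0
Y_star = best_earnings(w)
print(f"Y_w = {Y_star:.4f}, first-order condition gap = {foc_gap(Y_star, w):.2e}")
```

The gap is zero only up to the precision of the search, which is limited by floating-point comparisons near the flat top of the objective.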

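Figure 1.8: An elementary reform on the level of tax (axes: earnings $Y$ and consumption $C$; budget curve $C = Y - T(Y)$ shifted upwards by $\rho$ around $(Y_w, C_w)$ over $[Y_w - \delta Y, Y_w + \delta Y]$)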

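Moreover, the partial derivatives of $\mathcal{Y}$ at $(Y_w, 0, 0, w)$ are:

$$\mathcal{Y}'_Y(Y_w, 0, 0, w) = \left( \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \right)^2 \cdot u''(C_w) - \frac{v''\left(\frac{Y_w}{w}\right)}{w^2} - T''(Y_w) \cdot u'(C_w) \qquad (1.29a)$$

$$\mathcal{Y}'_\tau(Y_w, 0, 0, w) = u'(C_w) > 0 \qquad (1.29b)$$

$$\mathcal{Y}'_\rho(Y_w, 0, 0, w) = \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)}\, u''(C_w) \le 0 \qquad (1.29c)$$

$$\mathcal{Y}'_w(Y_w, 0, 0, w) = \frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w^2} > 0 \qquad (1.29d)$$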

The second-order condition writes $\mathcal{Y}'_Y(Y_w, 0, 0, w) \le 0$. It stipulates that the function $Y \mapsto Y - T(Y)$ is either concave or less convex than the indifference curve of workers of skill $w$ at $(C_w, Y_w)$. Otherwise, there is a discontinuity of the allocation as illustrated in Figure 1.6. The second-order condition is in particular satisfied if the tax function is linear (by concavity of the utility function, which implies $u''(C_w) \le 0 < v''(Y_w/w)$). How the convexity of the tax function matters for the second-order condition depends on the term $T''(Y_w)$ in (1.29a), which captures the curvature of the tax function.

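If the second-order condition holds with a strict inequality, then $\mathcal{Y}'_Y(Y_w, 0, 0, w) < 0$ and one can apply the implicit function theorem to express the earnings choice $Y_w$ as a (locally) differentiable function of the change in marginal tax rate $\tau$, the change in the level of tax $\rho$ and the level of skill $w$. This enables us to define the following behavioral elasticities.¹¹

$$\varepsilon_w \overset{def}{\equiv} \frac{1 - T'(Y_w)}{Y_w}\, \frac{\partial Y_w}{\partial \tau} = \frac{-v'\left(\frac{Y_w}{w}\right)}{w \cdot Y_w \cdot \mathcal{Y}'_Y(Y_w, 0, 0, w)} > 0 \qquad (1.30a)$$

$$\eta_w \overset{def}{\equiv} \frac{\partial Y_w}{\partial \rho} = -\frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot \frac{u''(C_w)}{\mathcal{Y}'_Y(Y_w, 0, 0, w)} \le 0 \qquad (1.30b)$$

$$\alpha_w \overset{def}{\equiv} \frac{w}{Y_w}\, \frac{\partial Y_w}{\partial w} = -\frac{v'\left(\frac{Y_w}{w}\right) + \frac{Y_w}{w}\, v''\left(\frac{Y_w}{w}\right)}{w \cdot Y_w \cdot \mathcal{Y}'_Y(Y_w, 0, 0, w)} > 0 \qquad (1.30c)$$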

• $\varepsilon_w$ stands for the compensated elasticity of labor supply with respect to one minus the marginal tax rate. It is positive. A compensated decline in the marginal tax rate increases the marginal reward of effort in terms of additional consumption, which induces workers to substitute consumption for leisure.

• $\eta_w$ captures the income effect on labor supply. It is negative as long as leisure is a normal good, which is the case for the additively separable utility function we take, except for the quasilinear utility function $C - v(Y/w)$.

• Finally, $\alpha_w$ captures the elasticity of earnings with respect to the skill level. It is positive thanks to the convexity of $v(\cdot)$ (and more generally, in the absence of additive separability, due to the Spence-Mirrlees condition (1.5)). Therefore, by studying behavioral responses, we retrieve the fact that along an incentive-compatible allocation, earnings have to be a nondecreasing function of skill. Moreover, bunching occurs only when $\alpha_w = 0$, that is when the curvature term $T''(Y_w)$ in $\mathcal{Y}'_Y$ (see Equation (1.29a)) tends to infinity. This corresponds to a kink of the tax function that is similar to the one depicted in Figure 1.5.
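The three elasticities of (1.30a) to (1.30c) can be illustrated by finite differences. The sketch below assumes, for illustration only, $u(C) = C$, $v(L) = L^{1+k}/(1+k)$ and an initial linear tax with marginal rate $t$ (so $T'' = 0$); the closed-form earnings function and the parameter values are assumptions, not part of the text. Under these assumptions one expects $\varepsilon_w = 1/k$, $\eta_w = 0$ and $\alpha_w = 1 + 1/k$.

```python
# Illustrative assumptions (not from the notes): u(C) = C, v(L) = L**(1+k)/(1+k),
# initial linear tax with marginal rate t, so that T'' = 0.

def earnings(w, tau=0.0, rho=0.0, t=0.3, k=2.0):
    """Earnings chosen under the perturbed tax T(Y) - tau*(Y - Y_w) - rho.
    With quasilinear utility the lump-sum change rho does not affect the choice."""
    return w ** (1 + 1 / k) * (1 - t + tau) ** (1 / k)

def elasticities(w, t=0.3, k=2.0, h=1e-6):
    """Finite-difference versions of (1.30a), (1.30b) and (1.30c)."""
    Y = earnings(w, t=t, k=k)
    dY_dtau = (earnings(w, tau=h, t=t, k=k) - earnings(w, tau=-h, t=t, k=k)) / (2 * h)
    dY_drho = (earnings(w, rho=h, t=t, k=k) - earnings(w, rho=-h, t=t, k=k)) / (2 * h)
    dY_dw = (earnings(w + h, t=t, k=k) - earnings(w - h, t=t, k=k)) / (2 * h)
    eps = (1 - t) / Y * dY_dtau   # compensated elasticity
    eta = dY_drho                 # income effect
    alpha = w / Y * dY_dw         # elasticity of earnings with respect to skill
    return eps, eta, alpha

eps, eta, alpha = elasticities(w=1.5)
print(f"eps = {eps:.4f}, eta = {eta:.4f}, alpha = {alpha:.4f}, alpha/eps = {alpha / eps:.4f}")
```

With `k = 2` the ratio `alpha / eps` comes out close to `1 + k = 3`, whatever the skill level at which it is evaluated.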

These three behavioral elasticities are endogenous. The first reason is that they in general depend on the bundle $(C_w, Y_w)$ where they are evaluated. The second reason is that these behavioral elasticities depend on the curvature of the tax function, as captured by the term $T''(Y_w)$ in $\mathcal{Y}'_Y$ (see Equation (1.29a)). The intuition is the following. An exogenous increase in either $\tau$, $\rho$ or $w$ induces a direct change in earnings $\Delta_1 Y$. However, this change in turn modifies the marginal tax rate by $\Delta T' = T''(Y)\, \Delta_1 Y$, inducing a second change in earnings $\Delta_2 Y$, which in turn modifies the marginal tax rate again, and so on. Therefore, a "circular process" takes place: the earnings level determines the marginal tax rate through the tax function and the marginal tax rate affects the earnings level through the substitution effect. The term $T''(Y_w) \cdot u'(C_w)$ in (1.29a) captures the indirect effects (in the words of Saez (2001)) due to this circular process. The size of these indirect effects influences the various behavioral elasticities.
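To see how the curvature of the tax function feeds into the elasticities, the sketch below evaluates (1.29a) and (1.30a) directly. It assumes, for illustration only, $u(C) = C$ (so $u' = 1$ and $u'' = 0$) and $v(L) = L^{1+k}/(1+k)$; the argument `T2` stands for a hypothetical value of $T''(Y_w)$.

```python
# Illustrative assumptions (not from the notes): u(C) = C, v(L) = L**(1+k)/(1+k).

def compensated_elasticity(Y, w, T2, k=2.0):
    """Compensated elasticity (1.30a) at earnings Y, skill w and tax curvature T2 = T''(Y)."""
    L = Y / w
    v1 = L ** k                 # v'(L)
    v2 = k * L ** (k - 1)       # v''(L)
    soc = -v2 / w ** 2 - T2     # (1.29a) with u' = 1 and u'' = 0
    return -v1 / (w * Y * soc)  # (1.30a)

linear = compensated_elasticity(Y=1.0, w=1.2, T2=0.0)
convex = compensated_elasticity(Y=1.0, w=1.2, T2=0.5)
print(f"T'' = 0: eps = {linear:.4f};  T'' = 0.5: eps = {convex:.4f}")
```

Along a locally linear schedule the elasticity equals $1/k$, while a locally convex schedule ($T'' > 0$) dampens the response, which is the "circular process" described above.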

11 Recall that the implicit function theorem implies that for $x = \tau, \rho, w$, $\partial Y_w / \partial x = -\mathcal{Y}'_x / \mathcal{Y}'_Y$.

We can now try to retrieve the optimal tax formula thanks to a tax perturbation method. Consider a uniform decrease $\tau$ of marginal tax rates over an interval $[Y_w - \delta Y, Y_w]$ of the earnings distribution. As a consequence, the tax function is unchanged for earnings below $Y_w - \delta Y$, while the level of tax is uniformly decreased by an amount $\rho = \tau \cdot \delta Y$ for earnings above $Y_w$ (see Figure 1.9).

Figure 1.9: The tax perturbation (axes: earnings $Y$ and consumption $C$; budget curve $C = Y - T(Y)$ before and after the tax perturbation; substitution effects over $[Y_w - \delta Y, Y_w]$; mechanical and income effects above $Y_w$, where $\rho = \tau \times \delta Y$)

Workers of skill $n$ above $w$ are confronted with a lump-sum decrease of their tax by an amount of $\rho$ euros. The consequence for the government's objective can be decomposed into a mechanical effect (in the absence of any behavioral response) and an income effect. For each taxpayer of skill $n$ above $w$, the government receives $\rho$ euros less in tax. However, the welfare of these individuals increases by $u'(C_n) \cdot \rho$, which the government values as a gain equivalent to $(\Phi'(U_n)/\lambda) \cdot u'(C_n) \cdot \rho$ euros. Therefore the total mechanical effect concerning all taxpayers of skill $n$ above $w$ equals

$$M_w = -\rho \cdot \int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} \right\} \cdot f(n)\, dn \qquad (1.31)$$

Moreover, workers of skill $n$ above $w$ change their labor supply decisions because of the income effect. From (1.30b), their earnings change by $\Delta Y_n = \eta_n \cdot \rho$. This induces a change in tax revenues equal to $T'(Y_n) \cdot \eta_n \cdot \rho$. This behavioral response has only a second-order effect on the social objective. Therefore, the total income effect concerning all taxpayers of skill $n$ above $w$ equals:

$$I_w = \rho \cdot \int_w^{\overline{w}} T'(Y_n) \cdot \eta_n \cdot f(n)\, dn \qquad (1.32)$$

Workers whose earnings before the reform lie in the interval $[Y_w - \delta Y, Y_w]$ of the earnings distribution have a productivity that belongs to an interval $[w - \delta w, w]$ of the skill distribution. The elasticity $\alpha_w$ of earnings with respect to the skill level links the widths of these two intervals through (see (1.30c)):

$$\delta w = \frac{w}{\alpha_w \cdot Y_w}\, \delta Y$$

Therefore the number of these individuals equals

$$f(w)\, \delta w = \frac{w \cdot f(w)}{\alpha_w \cdot Y_w}\, \delta Y$$

Since each of them faces a decline $\tau$ of the marginal tax rate, they are induced to substitute consumption for leisure. So, from (1.30a), their earnings increase by:

$$\Delta Y_w = \frac{\varepsilon_w \cdot Y_w}{1 - T'(Y_w)}\, \tau$$

Hence each of them generates $T'(Y_w)\, \Delta Y_w$ of additional tax for the government. Since their change of labor supply induces only a second-order effect on the social welfare function and since $\rho = \tau \cdot \delta Y$, the substitution effect is valued

$$S_w = \frac{T'(Y_w)}{1 - T'(Y_w)} \cdot \frac{\varepsilon_w}{\alpha_w} \cdot w \cdot f(w) \cdot \rho \qquad (1.33)$$

by the government.

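Starting from the optimal tax schedule, a tax perturbation should have no first-order effect. Therefore, adding (1.31), (1.32) and (1.33), the optimal tax schedule has to verify:

$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \underbrace{\frac{\alpha_w}{\varepsilon_w}}_{A(w)} \cdot \underbrace{\frac{\int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\} f(n)\, dn}{1 - F(w)}}_{B(w)} \cdot \underbrace{\frac{1 - F(w)}{w \cdot f(w)}}_{C(w)} \qquad (1.34)$$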
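Formula (1.34) pins down the ratio $T'/(1 - T')$, so the marginal tax rate itself follows by inverting that ratio. The sketch below is only an illustration: the function name and the numerical values of the three terms are hypothetical, chosen to show the order of magnitude the formula delivers.

```python
def marginal_tax_rate(A, B, C):
    """Invert T'/(1 - T') = A * B * C, as in (1.34), to recover T'."""
    ratio = A * B * C
    return ratio / (1 + ratio)

# Hypothetical values: A = alpha/eps = 3, B = 1 (no social weight and no income
# effect above w), C = (1 - F)/(w f) = 0.5.
rate = marginal_tax_rate(A=3.0, B=1.0, C=0.5)
print(f"T' = {rate:.3f}")
```

A larger value of any of the three terms raises the marginal tax rate, and the rate tends to zero when $C(w)$ vanishes.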

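The derivation of (1.34) was heuristic. Therefore, we have to verify that (1.34) is consistent with (1.25). The latter equation holds at any point where $w \mapsto Y_w$ is continuous. Using (1.30a) and (1.30c), Equation (1.24a) can be rewritten as:

$$1 - \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} = \frac{\alpha_w}{\varepsilon_w} \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot \frac{X_w}{w \cdot f(w)}$$

where $X_w$ is defined as:

$$X_w = -\frac{q_w}{\lambda} \cdot u'(C_w)$$

Using the first-order condition (1.28) of the workers' decision program, one gets:

$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \frac{\alpha_w}{\varepsilon_w} \cdot \frac{X_w}{w \cdot f(w)} \qquad (1.35)$$

Differentiating the definition of $X_w$ with respect to the skill level $w$, we get:

$$-\dot{X}_w = \frac{\dot{q}_w}{\lambda} \cdot u'(C_w) + \frac{q_w}{\lambda} \cdot \dot{C}_w \cdot u''(C_w)$$

Differentiating in $w$ the equality $C_w = Y_w - T(Y_w)$ and using (1.24b) and (1.28), we get:

$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) + \frac{q_w}{\lambda} \cdot \dot{Y}_w \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w \cdot u'(C_w)} \cdot u''(C_w)$$

Using (1.30a), (1.30b), (1.30c) and $\dot{Y}_w = (Y_w / w)\, \alpha_w$ leads to:

$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) + \frac{q_w}{\lambda} \cdot \frac{\alpha_w}{\varepsilon_w} \cdot \frac{v'\left(\frac{Y_w}{w}\right)}{w^2} \cdot \eta_w$$

Using (1.28) and the definition of $X_w$:

$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} \right\} f(w) - \frac{X_w}{w} \cdot \left(1 - T'(Y_w)\right) \cdot \frac{\alpha_w}{\varepsilon_w} \cdot \eta_w$$

Using (1.35):

$$-\dot{X}_w = \left\{ 1 - \frac{\Phi'(U_w) \cdot u'(C_w)}{\lambda} - T'(Y_w) \cdot \eta_w \right\} f(w)$$

Using $X_{\overline{w}} = 0$, integrating this last equation between $w$ and $\overline{w}$ and inserting the result in (1.35), one finally obtains (1.34).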

• The term $\alpha_w / \varepsilon_w$ captures the magnitude of the substitution effects (see (1.33)). It is inversely proportional to the compensated elasticity of labor supply $\varepsilon_w$. However, the elasticity $\alpha_w$ of earnings with respect to skill $w$ also matters, since it influences the number of workers concerned by the substitution effect. Note that from (1.30a) and (1.30c), the ratio $\alpha_w / \varepsilon_w$ does not depend on the curvature of the tax function, captured by the term $T''(Y_w)$.

Diamond (1998) has proposed to focus on the case where the utility function is quasilinear in consumption (i.e. $U(C, L) = C - v(L)$), so $u''(\cdot) = 0$ and there is no income effect ($\eta_w = 0$ from (1.30b)). Therefore the term $A(w)$ summarizes how the shape of behavioral elasticities influences the shape of marginal tax rates. If behavioral elasticities are more pronounced for high skilled workers, as suggested by Gruber and Saez (2002) and Saez (2003) among others, then $A(w)$ would be decreasing, which would push marginal tax rates to be decreasing in the skill level (and thereby in the earnings level).

• The influence of the distribution of skills is captured by the term $C(w) = (1 - F(w)) / (w \cdot f(w))$. In the tax perturbation considered, the substitution effect is proportional to the density of workers $f(w)$ and to their skill level $w$ (see (1.33)). Therefore, the higher $w \cdot f(w)$, the larger the deadweight losses induced by a departure of marginal tax rates at $Y_w$ from lump-sum taxation. However, distorting the marginal tax rate around $Y_w$ implies that the mass $1 - F(w)$ of taxpayers of skill $n$ above $w$ pays a higher level of taxes. This is the reason why, everything else being equal, marginal tax rates are increasing in $(1 - F(w)) / (w \cdot f(w))$.

Saez (2001) proposed to express marginal tax rates as a function of the earnings distribution, rather than the skill distribution. Let $H(\cdot)$ and $h(\cdot)$ be respectively the (endogenous) cumulative distribution function and the density of earnings. One obviously has for all $w$ that $H(Y_w) \equiv F(w)$. So, one obtains that $h(Y_w) \cdot \dot{Y}_w = f(w)$. Using (1.30c), one has $\dot{Y}_w = \alpha_w \cdot (Y_w / w)$, hence

$$\alpha_w \cdot \frac{1 - F(w)}{w \cdot f(w)} = \frac{1 - H(Y_w)}{Y_w \cdot h(Y_w)}$$

and equation (1.34) therefore becomes

$$\frac{T'(Y_w)}{1 - T'(Y_w)} = \frac{1}{\varepsilon_w} \cdot \frac{\int_w^{\overline{w}} \left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\} f(n)\, dn}{1 - H(Y_w)} \cdot \frac{1 - H(Y_w)}{Y_w \cdot h(Y_w)} \qquad (1.36)$$

• The last term $B(w)$ equals the average of $\left\{ 1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n) \right\}$ over all skill levels $n$ above $w$, weighted by their density. The term $1 - \frac{\Phi'(U_n) \cdot u'(C_n)}{\lambda} - \eta_n \cdot T'(Y_n)$ captures the total cost for the government of decreasing by one unit the level of tax paid by workers of skill $n$, including their change in labor supply due to the income effect. $B(w)$ summarizes two types of influences.

— The first is the government's taste for redistribution, as captured by $\Phi'(U_n) \cdot u'(C_n) / \lambda$. The government values giving one more euro to each of the $f(n)$ individuals of skill $n$ as a gain of $\Phi'(U_n) \cdot u'(C_n) / \lambda$ euros of government spending.

∗ Since the government is averse to inequality, $\Phi(\cdot)$ is increasing and concave, so $\Phi'(U_n)$ is positive and decreases in $U_n$.

∗ From (1.20), $U_n$ is increasing in the skill level. More skilled workers are better off, since they can reach a given amount of earnings with less effort. Therefore $\Phi'(U_n)$ is positive and decreasing in the skill level $n$.

∗ From the incentive constraints, and the monotonicity requirement they imply, consumption $C_n$ is nondecreasing in skill $n$. As $u(\cdot)$ is increasing and weakly concave, $u'(C_n)$ is nonincreasing in skill $n$.

As a consequence, the mechanical term $1 - \Phi'(U_n) \cdot u'(C_n) / \lambda$ is increasing in the skill level $n$. Therefore, in the absence of income effects (i.e. if $\eta_w = 0$, following Diamond (1998)'s quasilinear specification of the utility function), the term $B(w)$ would be increasing. This would tend to make marginal tax rates increasing in the skill level (and thereby in the earnings level).

— The second influence comes from the income effects. If leisure is a normal good, a higher level of tax (a lower nonlabor income, $\rho < 0$) increases labor supply, so $\eta_w < 0$. Therefore income effects are an additional motivation for the government to increase marginal tax rates, since raising the marginal tax rate at one earnings level induces, through income effects, more labor supply from all taxpayers above, and therefore higher earnings and higher tax receipts. This interpretation is however only valid if marginal tax rates are positive (so that choosing higher earnings results in higher tax receipts for the government).
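The change of variables from the skill distribution to the earnings distribution that leads to (1.36) can be checked numerically. The sketch below assumes, purely for illustration, Pareto-distributed skills on $[1, +\infty)$ with parameter `A_PARETO` and earnings $Y_w = c \cdot w^{\alpha}$ with a constant elasticity $\alpha = 1 + 1/k$; none of these choices comes from the text. It verifies the identity $\alpha_w (1 - F(w)) / (w f(w)) = (1 - H(Y_w)) / (Y_w h(Y_w))$.

```python
# Illustrative assumptions (not from the notes): Pareto skills on [1, inf) and
# earnings Y_w = C_SCALE * w**ALPHA with a constant elasticity ALPHA.
A_PARETO, K, C_SCALE = 2.0, 2.0, 1.0
ALPHA = 1 + 1 / K

def F(w):
    """Cumulative distribution function of skills."""
    return 1 - w ** (-A_PARETO)

def f(w):
    """Density of skills."""
    return A_PARETO * w ** (-A_PARETO - 1)

def earnings(w):
    """Earnings as a function of skill."""
    return C_SCALE * w ** ALPHA

def skill(y):
    """Inverse of the earnings function."""
    return (y / C_SCALE) ** (1 / ALPHA)

def H(y):
    """Cumulative distribution of earnings, using H(Y_w) = F(w)."""
    return F(skill(y))

def h(y, step=1e-6):
    """Density of earnings, obtained by finite differences on H."""
    return (H(y + step) - H(y - step)) / (2 * step)

w = 2.0
y = earnings(w)
skill_side = ALPHA * (1 - F(w)) / (w * f(w))
earnings_side = (1 - H(y)) / (y * h(y))
print(f"skill side = {skill_side:.6f}, earnings side = {earnings_side:.6f}")
```

Both sides agree up to the finite-difference error, which is why the term $\alpha_w$ disappears when the formula is written in terms of the earnings distribution.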

V.4 Properties of the second-best optimum

After interpreting the optimality conditions and understanding the influence of their various determinants, we now derive some analytical results.

Marginal tax rates at the top

If the skill distribution is bounded above by $\overline{w}$, then the transversality condition $q_{\overline{w}} = 0$ and (1.24a) imply that

$$1 - \frac{v'\left(\frac{Y_{\overline{w}}}{\overline{w}}\right)}{\overline{w} \cdot u'(C_{\overline{w}})} = 0$$

Therefore, from (1.28), the marginal tax rate at the top of the skill distribution should be nil. Moreover, let $T_M(Y) = T(Y)/Y$ be the average tax rate at earnings $Y$. Then

$$T'_M(Y_w) = \frac{T'(Y_w) - \frac{T(Y_w)}{Y_w}}{Y_w}$$

Therefore having a marginal tax rate tending to zero at the top and positive average tax rates implies that average tax rates have to be locally decreasing at the top of the skill distribution.

Many scholars understood this zero optimal marginal tax rate at the top as a drawback of the Mirrlees model. Diamond (1998) has nevertheless argued that when the distribution of skills is unbounded, the abovementioned argument fails. More specifically, Diamond (1998) argues that empirically, the distribution of skills could well be approximated for the highest skill levels by a Pareto distribution, for which $(1 - F(w)) / (w f(w))$ is constant (see Figure 1.10). Saez made a similar point about the earnings distribution and the pattern of $(1 - H(z)) / (z\, h(z))$ in (1.36) (see Figure 1.11).

Therefore, since the term $A(w)$ in (1.34) is very likely to be constant or increasing in the skill level and since, in the absence of income effects, $B(w)$ is increasing in the skill level, this would tend to make marginal tax rates increasing for the top part of the earnings distribution.
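The constancy of $(1 - F(w)) / (w f(w))$ under a Pareto distribution is easy to verify. The sketch below uses a hypothetical Pareto parameter `a` and lower bound `w_min`; under a Pareto law the term equals $1/a$ at every skill level.

```python
def pareto_cdf(w, a=2.0, w_min=1.0):
    """F(w) for a Pareto distribution with parameter a on [w_min, +inf)."""
    return 1 - (w_min / w) ** a

def pareto_pdf(w, a=2.0, w_min=1.0):
    """f(w) for the same Pareto distribution."""
    return a * w_min ** a / w ** (a + 1)

def c_term(w, a=2.0, w_min=1.0):
    """The term C(w) = (1 - F(w)) / (w * f(w)) of formula (1.34)."""
    return (1 - pareto_cdf(w, a, w_min)) / (w * pareto_pdf(w, a, w_min))

for w in (1.5, 3.0, 10.0, 100.0):
    print(f"w = {w:6.1f}  C(w) = {c_term(w):.6f}")
```

A thinner tail (a larger `a`) lowers $C(w)$ and hence, everything else being equal, the marginal tax rate at the top.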

Marginal tax rates in the interior

What is the sign of marginal tax rates? From above, we know that $\Phi'(U_w)$ is decreasing in the skill level, and that $u'(C_w)$ is decreasing, so that $-1/u'(C_w)$ is decreasing too. Hence the term in brackets in (1.24b) is decreasing in the skill level, and $\dot{q}_w / f(w)$ is increasing in $w$. Since the transversality conditions write $q_{\underline{w}} = q_{\overline{w}} = 0$, $q_w$ must be first decreasing, and then increasing. So for any interior skill level, one must have $q_w < 0$. Together with (1.24a) and (1.28), this implies that marginal tax rates have to be positive for any interior skill level $w \in (\underline{w}, \overline{w})$. Therefore the sign of marginal tax rates is driven by the shape of mechanical effects, and the income effects essentially reinforce them.

Figure 1.10: Empirical skill distribution computed by Diamond (1998)

Figure 1.11: Distribution of earnings as computed by Saez (2001).

Marginal tax rates at the bottom

It remains to analyze the sign of the marginal tax rate at $\underline{w}$. In the absence of bunching, and if the government does not have a maximin objective, the transversality condition $q_{\underline{w}} = 0$ implies with (1.24a) and (1.28) that the marginal tax rate at the bottom of the skill distribution is nil.

However, bunching at the bottom of the skill distribution may arise, for instance because a nonnegativity constraint $Y_w \ge 0$ may be binding at the bottom. Another case is where there is a positive mass of workers with the lowest level of skill. In such cases, there is a positive mass of workers with the lowest earnings, and the highest skilled workers among them have a skill $w > \underline{w}$, and therefore face a positive marginal tax rate.

Another exception is the case of a Maximin government (see Boadway Jacquet 2008). Then, marginal tax rates are positive at the bottom. The reason is that social weights $\Phi'(U_n) \cdot u'(C_n) / \lambda$ are concentrated on the lowest skill level only. So, there is a positive mass of social weights there, even without bunching at the bottom. Put differently, under the Maximin objective, since $\Phi'(U_n) \cdot u'(C_n) / \lambda$ is nil for all $n$ above $\underline{w}$, the term $B(w)$ in formula (1.34) is higher than or equal to 1 (depending on whether income effects are nil, or negative when leisure is a normal good), including for $w = \underline{w}$.

VI Empirical implications

We now explore the quantitative implications of the theory that we have just developed. We here follow Saez (2001, Section 5) very closely. As appears clearly from (1.34), three kinds of determinants have to be specified to compute the optimal tax schedule:

1. The behavioral elasticities $\varepsilon_w$, $\eta_w$ and $\alpha_w$ and, behind them, the utility function $U(\cdot, \cdot)$.

2. The density of skills, and the term $C(w)$.

3. The government's taste for redistribution, as captured by the shape of $n \mapsto \Phi'(U_n) \cdot u'(C_n) / \lambda$.

There is a huge empirical literature that evaluates behavioral responses to tax changes. Most of these studies aim to estimate how changes in the tax system induce changes in work effort $L$. There are however serious measurement problems. Hence, it seems more reasonable to estimate instead the responses of gross income $Y$ (Gruber and Saez (2002), Saez (2003)).

Two specifications of the utility function are used, namely

$$\text{Type 1: } \log\left( C - \frac{L^{1+k}}{1+k} \right) \qquad \text{Type 2: } \log C - \log\left( 1 - \frac{L^{1+k}}{1+k} \right)$$

Under both Type 1 and Type 2 utility functions, the compensated elasticity of labor supply (along a linear tax schedule) simply equals $1/k$ and is exogenous. The Type 1 utility function corresponds to Diamond's (1998) quasilinear specification with no income effects. Hence one has, for all $w$, $\alpha_w/\varepsilon_w = 1 + k$ and $\eta_w = 0$. The Type 2 utility function includes income effects.
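The claim that the Type 1 elasticity equals $1/k$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the Type 1 utility to be $\log(C - L^{1+k}/(1+k))$ as in Saez (2001), a linear tax at rate `tau` with a lump-sum transfer `R`, so that $C = (1-\tau) w L + R$. All function names and parameter values are hypothetical. Because there are no income effects under Type 1, compensated and uncompensated elasticities coincide, so a plain finite difference in the net-of-tax rate suffices.

```python
import math

def net_consumption(L, w, tau, k, R):
    """Argument of the log in the assumed Type 1 utility:
    C - L^(1+k)/(1+k), with C = (1 - tau) * w * L + R."""
    return (1.0 - tau) * w * L + R - L ** (1.0 + k) / (1.0 + k)

def argmax_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a single-peaked f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
        if f(c) > f(d):
            b = d  # the maximiser lies in [a, d]
        else:
            a = c  # the maximiser lies in [c, b]
    return 0.5 * (a + b)

def labor_supply(w, tau, k, R=1.0):
    """Utility-maximising L. Since log is increasing, we maximise its argument.
    The bracket [0, 10] is an assumption that holds for the values used here."""
    return argmax_golden(lambda L: net_consumption(L, w, tau, k, R), 0.0, 10.0)

def labor_elasticity(w, tau, k, R=1.0, step=0.01):
    """Log-change in L divided by the log-change in the net-of-tax rate (1 - tau)."""
    base = labor_supply(w, tau, k, R)
    tau_cut = 1.0 - (1.0 - tau) * (1.0 + step)  # raises (1 - tau) by the fraction `step`
    shifted = labor_supply(w, tau_cut, k, R)
    return math.log(shifted / base) / math.log(1.0 + step)

# Illustrative values: k = 4 and k = 2, i.e. 1/k = 0.25 and 0.5.
for k in (4.0, 2.0):
    print(k, labor_elasticity(w=2.0, tau=0.3, k=k))
```

The printed elasticities should be close to $1/k$, and changing `R` should leave labor supply unchanged, which is the absence of income effects.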

Based on the empirical literature, Saez retains two values for the compensated elasticity along a linear tax schedule (denoted $\zeta^c$ in his notation), namely 0.25 and 0.5.

The first scholars who simulated optimal income tax schedules specified a lognormal distribution of skills (Mirrlees 1971), which fits the unimodality of the income distribution found in the data. However, such a specification is ad hoc and fits the top tail of the distribution poorly. Saez instead works with empirical distributions of earnings. For each of the four utility functions he considers (Type 1 and Type 2, each with $1/k = 0.25$ and $1/k = 0.5$), he uses the workers' first-order condition (1.28) to recover skill levels as a function of observed earnings levels.¹² It is this skill distribution that he uses in his simulations. It is worth noting that this procedure is conditional on a specific choice of the utility function.
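To make the inversion step concrete, here is a minimal sketch under simplifying assumptions that are not taken from the notes: Type 1 utility $\log(C - L^{1+k}/(1+k))$, a locally linear tax with marginal rate `tau`, and therefore the first-order condition $L^k = (1-\tau) w$. With $Y = wL$ this gives $w = \left(Y^k/(1-\tau)\right)^{1/(1+k)}$. The earnings figures and tax rates below are hypothetical, and Saez's actual procedure relies on his own data and tax schedule.

```python
def earnings_from_skill(w, tau, k):
    """Earnings Y = w * L when L solves the assumed condition L^k = (1 - tau) * w."""
    L = ((1.0 - tau) * w) ** (1.0 / k)
    return w * L

def skill_from_earnings(Y, tau, k):
    """Invert Y = w * ((1 - tau) * w)^(1/k) for the skill w."""
    return (Y ** k / (1.0 - tau)) ** (1.0 / (1.0 + k))

# Hypothetical observations: (annual earnings, marginal tax rate faced).
observations = [(20_000.0, 0.15), (60_000.0, 0.30), (250_000.0, 0.40)]

for k in (4.0, 2.0):  # compensated elasticities 1/k = 0.25 and 0.5
    skills = [skill_from_earnings(Y, tau, k) for Y, tau in observations]
    print(k, skills)
```

The same earnings data map into different skill levels for different values of `k`, which illustrates why the recovered skill distribution is conditional on the utility function chosen.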

The last component is the government's taste for redistribution. This stage is difficult since choosing a social welfare function is always a subjective exercise based on some value judgments. Saez (2001) chooses to consider two social welfare functions: maximin (i.e. $U_{\underline{w}}$) and Benthamite. Given the concavity of the Type 1 and Type 2 utility functions, the Benthamite criterion is consistent with some aversion of the government towards inequality.¹³

Saez’s results are then given by Figure 1.12.

Figure 1.12: The numerical simulations of Saez (2001).

¹² Saez (2001) used tax return data.

¹³ Another approach consists in using the observed tax schedule to recover the social welfare function from the optimal tax formulae (Bourguignon and Spadaro 2008).

We learn the following.

• Optimal marginal tax rates are positive everywhere. For the top income earners, this is due to the unbounded distribution inferred. Since $(1 - F(w))/(w f(w))$ is close to constant above $200,000 a year (see Figures 1.10 and 1.11), marginal tax rates are roughly constant above that threshold. For low income earners, marginal tax rates are positive and very high because workers with the lowest skill have a nil productivity and thereby do not work.

• For every type of utility function and social welfare function, marginal tax rates are higher the lower the compensated elasticity (along a linear tax schedule) $\zeta^c$.

• For both types of utility functions and both values of $\zeta^c$, marginal tax rates are higher under the maximin criterion, especially in the lowest part of the distribution.

• Comparing Type 1 and Type 2 utility functions for identical $\zeta^c$ and social welfare function, marginal tax rates are substantially higher in the presence of income effects.
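The logic behind these findings at the top of the distribution can be illustrated with a short sketch. It rests on two ingredients that are assumptions here rather than results derived in this section: a Pareto tail with parameter `a`, for which $(1-F(w))/(w f(w))$ is exactly $1/a$, and the asymptotic top-rate formula of Saez (2001), $\tau = (1-g)/(1-g+\zeta^u+\zeta^c(a-1))$, where `a` refers to the earnings tail, `g` is the social weight on top earners and $\zeta^u$ is the uncompensated elasticity. The value `a = 2` is purely illustrative.

```python
def pareto_tail_ratio(w, a, w_min=1.0):
    """(1 - F(w)) / (w * f(w)) for a Pareto distribution with shape a and scale w_min."""
    survival = (w_min / w) ** a
    density = a * w_min ** a / w ** (a + 1.0)
    return survival / (w * density)

def top_marginal_rate(zeta_c, a, g=0.0, zeta_u=None):
    """Asymptotic top rate (1 - g) / (1 - g + zeta_u + zeta_c * (a - 1)).
    With no income effects, zeta_u equals zeta_c (the default used here)."""
    if zeta_u is None:
        zeta_u = zeta_c
    return (1.0 - g) / (1.0 - g + zeta_u + zeta_c * (a - 1.0))

# The tail ratio does not depend on w under a Pareto tail.
print([pareto_tail_ratio(w, a=2.0) for w in (5.0, 50.0, 500.0)])

# A lower compensated elasticity gives a higher top rate (g = 0, no income effects).
for zeta_c in (0.25, 0.5):
    print(zeta_c, top_marginal_rate(zeta_c, a=2.0))

# Income effects make zeta_u smaller than zeta_c, which raises the top rate.
print(top_marginal_rate(0.5, a=2.0, zeta_u=0.3))
```

In this sketch a constant tail ratio translates into a constant top rate, a lower $\zeta^c$ raises it, and so does the presence of income effects, in line with the qualitative patterns listed above.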

Bibliography

[1] Akerlof, G., 1978, The Economics of “Tagging” as Applied to the Optimal Income Tax, Welfare Programs, and Manpower Planning, American Economic Review, 68(1), 8-19.

[2] Alesina, A., Ichino, A. and Karabarbounis, L., 2008, Gender Based Taxation and the Division of Family Chores, mimeo Harvard.

[3] Boadway, R. and L. Jacquet, 2008, Optimal Marginal and Average Income Taxation under Maximin, Journal of Economic Theory, 143, 425-441.

[4] Bourguignon, F. and Spadaro, A., 2008, Tax-benefit Revealed Social Preferences, PSE Working Paper 2008-37.

[5] Diamond, P., 1998, Optimal Income Taxation: An Example with a U-shaped Pattern of Optimal Marginal Tax Rates, American Economic Review, 88(1), 83-95.

[6] Ebert, U., 1993, A reexamination of the optimal nonlinear income tax, Journal of Public Economics, 49, 47-73.

[7] Gruber, J. and E. Saez, 2002, The Elasticity of Taxable Income: Evidence and Implications, Journal of Public Economics, 84, 1-32.

[8] Guesnerie, R., 1995, A Contribution to the Pure Theory of Taxation, Cambridge University Press.

[9] Guesnerie, R. and Laffont, J-J., 1984, A complete solution to a class of principal-agent problems with an application to the control of a self-managed firm, Journal of Public Economics, 25, 329-369.

[10] Hammond, P., 1979, Straightforward Individual Incentive Compatibility in Large Economies, Review of Economic Studies, 46, 263-282.

[11] Harsanyi, J., 1955, Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility, Journal of Political Economy, 63(4), 309-321.

[12] Hellwig, M., 2008, A Maximum Principle for Control Problems with Monotonicity Constraints, Preprints of the Max Planck Institute for Research on Collective Goods, Bonn, 2008-04, http://www.coll.mpg.de/pdf_dat/2008_04online.pdf.

[13] Lollivier, S. and J-C. Rochet, 1983, Bunching and second-order conditions: a note on optimal tax theory, Journal of Economic Theory, 31, 392-400.

[14] Mankiw, G. and Weinzierl, M., 2007, The Optimal Taxation of Height: A Case Study of Utilitarian Income Redistribution, mimeo Harvard.

[15] Mirrlees, J., 1971, An Exploration in the Theory of Optimum Income Taxation, Review of Economic Studies, 38(1), 175-208.

[16] Mirrlees, J., 1976, Optimal Tax Theory: A Synthesis, Journal of Public Economics, 6(3), 327-358.

[17] Piketty, T., La Redistribution Fiscale face au Chômage, Revue Française d’Economie, 12, 157-201.

[18] Saez, E., 2001, Using Elasticities to Derive Optimal Income Tax Rates, Review of Economic Studies, 68, 205-229.

[19] Saez, E., 2003, The Effect of Marginal Tax Rates on Income: A Panel Study of “Bracket Creep”, Journal of Public Economics, 87, 1231-1258.

[20] Sadka, E., 1976, On Income Distribution, Incentive Effects and Optimal Income Taxation, Review of Economic Studies, 43, 261-267.

[21] Seade, J., 1977, On the Shape of Optimal Tax Schedules, Journal of Public Economics, 7, 203-235.

[22] Seade, J., 1982, On the Sign of the Optimum Marginal Income Tax, Review of Economic Studies, 49, 637-643.

[23] Stiglitz, J., 1982, Self-Selection and Pareto Efficient Taxation, Journal of Public Economics, 17, 213-240.
