Life-Cycle Patterns of Earnings Shocks

Life-Cycle Patterns of Earnings Shocks

Martin Lopez-Daneri∗

USC-Dornsife INET

December 14, 2016To access the latest Version, Click Here.

Abstract

This paper addresses the estimation of individual income dynamics. It in-troduces a novel methodology that detects the presence of patterns in the lifecycle and the economic forces in action. I estimate a Bayesian LSTAR(1) modelwith a rich level of heterogeneity and find that there is a life-cycle pattern inearnings shocks: before age 29, workers experience shocks with higher varianceand a positive probability of having a lower persistence than older workers. Acomparison with conventional models shows the importance of modelling cor-rectly the level of heterogeneity in the innovations. The results can be used bymacroeconomists to calibrate income processes.

Key words: Income Processes, Nonlinear Time-Series Models, Bayesian Estima-tion.

JEL Classifications: C01, C11, C15, C33, D31, E27, and J31.

∗University of Southern California, Economics Department, 3620 South Vermont Ave., KaprielianHall, Room: 300. Los Angeles, California, USA 90089. Email: [email protected]. Specialthanks to Antonio Galvao, Gustavo Ventura and Charles H. Whiteman for comments and suggestionsthat substantially improved this paper. I have also benefited from remarks and comments on earlydrafts made by Kung-Sik Chan, German Cubas, Mariacristina De Nardi, Daisuke Fujii, Martin Ger-vais, Roger Moon, Hashem Pesaran, Luigi Pistaferri, Nariankadu D. Shyamalkumar, Ron Smith, Pe-tra Thiemann, Luke Tierney, and Guillaume Vandenbroucke. Seminar participants at Banco Centralde Uruguay-Montevideo Meeting, Universidad de Chile and USC-Dornsife INET Meetings providedhelpful comments. The usual disclaimer applies.

http://www.mlopezdaneri.com/MLDaneri/Research_files/IncomeShocks-paper.pdf

mailto:[email protected]

1

1. Introduction

There is a common perception that a middle-age worker faces risks, in terms of

income volatility and job tenure, which are inherently different from those experi-

enced by a twenty-year-old worker, and that these risks are distinct from those faced

by a worker close to retirement. This popular belief that there is a life-cycle pattern

in labor earning risk has not received much attention in the empirical literature on

income dynamics. If such a pattern does exist, modelling it in a world of incomplete

markets could significantly improve our understanding of consumption-saving de-

cisions in the life cycle, wealth inequality, insurance, the welfare costs of business

cycles and public policies, and other phenomena.

In this paper, I study labor income dynamics in the life cycle and ask the fol-

lowing questions: Is there a life-cycle pattern in the idiosyncratic earnings shocks?

What is the role of aging in these shocks? If there is a life-cycle pattern to earning

risk, can it be modeled parsimoniously?

The starting point for addressing these questions is a reduced-form model of

income dynamics, as it has been the standard in the literature since the early 1970s,

with the early works on the matter by McCall (1973), Shorrocks (1976), Lillard and

Willis (1978), Lillard and Weiss (1979), MaCurdy (1982), and Gottschalk (1982).

While the starting point is a reduced form, this paper introduces a novel method-

ology that is able to accommodate both the statistical fit and economic interpreta-

tion. Using a sample of male heads of households from the Panel Study of Income

Dynamics (PSID) from the years 1968 to 2011, I estimate a Logistic Smoothed Tran-

sition Autoregressive model of order 1 (LSTAR(1)), with a rich level of heterogeneity

in the innovations, that replaces the usual linear AR structure of the persistent com-

ponent of the earnings shocks with a nonlinear structure. At any point in time in the

life of a worker, the persistent shocks are a convex combination of two different AR

processes, where the weights are a function of a threshold variable that has its own

economic interpretation. This means that the model can detect the presence of

patterns in the life cycle as well as recognize the economic forces in action.

Although I limit my analysis to the effects of age on the idiosyncratic earnings

2

shocks in a Restricted Income Profile (RIP) setting, where there is no heterogene-

ity in the income growth rates of the individuals, one important contribution of this

paper is that I use a Bayesian analysis. This makes it possible to extend the results to

a Heterogeneous Income Profile (HIP), i.e. with heterogeneity in the income growth

rates, and to cover a much wider range of years in the PSID. This was not possible in

the previous models used in the literature, and it has nontrivial economic implica-

tions. My findings can be summarized as follows.

First, there is a life-cycle pattern in the idiosyncratic earnings shocks. Workers

younger than 29 experience shocks with higher variance and a positive probability

of having a lower persistence than older workers. The stationary variance of the

persistent component of the earnings shocks for a young worker is 1.3 times higher

than the stationary variance for an old worker.

Second, the usual GMM estimates for the correlation coefficient and variance

for the AR process as well as the variance of the temporary shocks, in a RIP setting,

are similar to those in a model with a rich structure of innovations and the same

linear AR structure. Including nonlinearities slightly reduces the persistence to a

range of 0.96 and 0.97.

Third, the posterior mode of the persistence of the shocks for young workers is

lower than the persistence for older workers, but the difference is not as significant

as one might expect. The posterior standard deviation of the persistence parameter

for the young workers is almost twice that of the standard deviation for the older

workers. There is an overlap in the distributions of both persistence parameters

that suggests they might be equal. Moreover, this higher dispersion in the posterior

distribution of the persistence suggests that using age as threshold variable might

not give a complete picture of what is seen in the data.

Fourth, the introduction of nonlinearities shows the importance of correctly

modelling the structure of the innovations. The results strongly suggest that there

is dispersion in the individual variances of the innovations. The distribution of the

individual-specific component of this variance points out a small proportion of in-

dividuals with a higher conditional variance than the rest of the sample. Models

that ignore this fact introduce an upward bias on the variance estimations. Also, my

3

model is in line with previous estimates of the transitory component of the earnings

shocks.

Finally, in this model, the income process is defined as a convex combination

of two AR processes and can be easily approximated with a discrete Markov pro-

cess (see Vandekerkhove (2005)). This means the model is tractable and the usual

calibration techniques can be applied.

1.1. Literature Review

The empirical literature on income dynamics has produced a variety of models with

different levels of complexity and heterogeneity. These models can be classified

in terms of the degree of heterogeneity present in the conditional mean and the

conditional variance of the income process.

Recently, it has become popular in the macro-literature to classify these mod-

els in terms of the heterogeneity in the growth rates of individual earning profiles.

An income process with a common growth rate for all individuals is a RIP, while if

these rates are different for every individual the process is a HIP. Consequently, a

RIP and a HIP are just different degrees of heterogeneity in the conditional mean

of the process. This classification is motivated by models of human capital where

agents with different levels of ability have different returns on their investments in

human capital accumulation.

Examples of HIP models with no heterogeneity in the persistence nor the con-

ditional variance are Baker (1997); Baker and Solon (2003); Gottschalk and Moffitt

(2002); Gottschalk and Moffitt (2002); Guvenen (2007); Guvenen (2009); Guvenen et

al. (2012); and Haider (2001).

Illustrations of RIP models are the papers of Abowd and Card (1989), Hryshko

(2012), Lillard and Willis (1978), Lillard and Weiss (1979), MaCurdy (1982), Storeslet-

ten et al. (2004), just to name a few.

A second group of papers aims to explain different patterns seen in the data by

modelling the conditional variance of the income process instead of the conditional

mean. In this group, the persistent component of the earnings shocks is discarded,

4

and the variance is allowed to vary across time and individuals. This is the case of

the papers of Barsky et al. (1997), Chamberlain and Hirano (1999), Jensen and Shore

(2012), Meghir and Pistaferri (2004),and Meghir and Windmeijer (1999).

This paper presents a RIP model and is in the intersection of both groups be-

cause of its focus on the conditional mean as wells as the conditional variance of

the income process, as in the papers of Browning et al. (2010) and Hospido (2012),

to name two. It also contributes to the Bayesian literature on income dynamics that

has started recently with Geweke and Keane (2000), Jensen and Shore (2012), Norets

and Schulhofer-Wohl (2010), and Shore (2011).

Even though this Bayesian literature on income dynamics is in its initial stages,

there are reasons to believe that it will experience a surge. The need for models that

capture the right level of heterogeneity and, as Nakata and Tonetti (2015) point out,

the good small-sample properties of Bayesian estimates predict a widespread use

of Bayesian methods.

There have only been few attempts to model the role of aging in the structure of

idiosyncratic earnings shocks. The most relevant papers on this matter are those of

Hause (1980), Meghir and Pistaferri (2004), and Karahan and Ozkan (2013).

Hause (1980) aimed to disentangle the effect of “on-the-job training” on the

individual earnings profiles in a sample taken from one cohort of Swedish white-

collar workers. The residual earnings is modeled with a time-varying AR structure,

and even though the variable age is not explicitly mentioned, the fact that there is

only one cohort in his sample makes this time-varying process an age-varying AR

process. Later, Meghir and Pistaferri (2004) focused on modelling the conditional

variance of labor earnings. They introduced an ARCH(1) specification for the per-

manent and transitory component of the shocks. However, the variable age seems

not to have been significant.

Karahan and Ozkan (2013) is to date the most complete paper to explicitly model

the age profile of the earnings shocks. After making some identifying assumptions,

they used GMM methods to model the residual earnings, including in their speci-

fication the individual fixed-effects, a transitory shock, and a persistent AR compo-

nent, the last two being age dependent. They found that the transitory shocks do

5

not exhibit an age profile, but the persistent component does. They estimated the

persistence parameters and variances for every age-bin considered, and in order to

simplify their model, they fit an age polynomial onto their estimates.

My work is related to Karahan and Ozkan in the sense that both of our papers are

interested in the exploration of life-cycle patterns of the earnings shocks, and both

papers are the first to focus on the effects of aging on the structure of these shocks.

However, there are methodological and conceptual differences that distinguish my

paper from theirs.

First, I follow a Bayesian approach. This enables me to deal with a level of het-

erogeneity that is not present in Karahan and Ozkan’s frequentist model. Moreover,

my model can easily be extended to a setting where the individual income growth

rates are different for each individual (HIP), while their model cannot. In addi-

tion, my model delivers a simple story of two distinct regimes that are connected

by a smooth transition function instead of the multiple estimates that they had for

which they needed to reduce their number by fitting an age polynomial. Also, the

results are different: while they show that there are significant differences in the

persistence and the variances, my results indicate that even though the persistence

is lower, the bulk of the action is in the innovations.

Finally, the most important distinction between this paper and Karahan and

Ozkan’s is that my model could take into account other variables besides age to

explain the patterns seen in the data. This allows me to take a stand on different

economic theories.

The paper continues with a description of the statistical model. In Section III the

estimation strategy is explained. Section IV describes the data set, and the selection

criteria for the sample used. Section V presents the results and Section VI concludes.

2. Statistical Model

Let yih(t),t be the logarithm of labor earnings for an individual i of age h (t) at time t.

Individual i first appears in the PSID at time ti, and leaves the sample after time Ti,

so t ∈ ti, ti + 1, . . . , Ti. Labor age h (t) is normalized to 0 for calendar age 18, and

6

h (t) ∈ 1, . . . ,H, where H is the maximum normalized labor age in the sample

(which corresponds to the calendar age 64).

The logarithm of labor earnings yih(t),t depends on the vector xih(t),t that controls

for the time or cohort effects, and the observable characteristics of an individual i

of age h (t) at time t, a temporary shock ωih(t),t, and a persistent shock εih(t),t:

yih(t),t = xi′h(t),tβ + εih(t),t + ωih(t),t

where β ∈ <k. The temporary shock reflects the measurement error present in the

sample, and the temporary changes in the worker labor productivity. In the case of

the persistent component of the innovations, the simplest assumption is to assume

that εih(t),t follows an ordinary AR process:

εih(t+1),t+1 = ρεih(t),t + ηit+1

I am interested in studying an alternative to this simple specification: the possibility

that the persistent changes over the life cycle follow a nonlinear structure given by

the presence of two regimes in εih(t+1),t+1:

εih(t+1),t+1 =

ρ1ε

ih(t),t + η1,it+1, when τ it ∈ (−∞, c]

ρ2εih(t),t + η2,it+1, when τ it ∈ (c,+∞)

or equivalently,

εih(t+1),t+1 = I(τ it ∈ (−∞, c]

) (ρ1ε

ih(t),t + η1,it+1

)+ I

(τ it ∈ (c,+∞)

) (ρ2ε

ih(t),t + η2,it+1

)(1)

The variable τ it is an individual specific variable and, depending on its value with

respect to the threshold, c, triggers the switch of regimes.1 The nuisance terms η1,it+1

and η2,it+1 are independent, with the same mean but different precision.

The model 1 is a Threshold Autoregressive Model (TAR), and was introduced by

1Some examples of τ it are age, level of income, or number of weeks unemployed, to a name a fewpossible possibilities.

7

Tong (1978), and further developed by Tong and Lim (1980), Tsay (1989), and Tong

(1993).

The abrupt change between regimes that happens in a TAR model adds an extra

degree of difficulty in maximum likelihood estimation, because the function is not

differentiable and the usual maximization techniques cannot be applied. What is

more, from an economic point of view, it is reasonable to think of a smooth function

that connects both regimes. Therefore, I use the following formulation:

εih(t+1),t+1 =(1−G

(γ, c, τ it

))ρ1ε

ih(t),t +G

(γ, c, τ it

)ρ2ε

ih(t),t + ηit+1 (2)

The specification 2 is called a Smoothed Transition Autoregressive model of

order 1 (STAR(1)). It was first proposed by Chan and Tong (1986) and extensively

studied by Luukkonen et al. (1988), and Terasvirta (1994).2

My choice for the transition function is a logistic cumulative distribution and, as

a result, my statistical model is also called a Logistic Smoothed Transition Autore-

gressive model of order 1 (LSTAR(1)).

Let G (γ, c, τ) be the logistic cumulative distribution function:

G (γ, c, τ) =1

1 + exp[−γ(τ it − c

)]where γ ∈ (0,∞) is a smoothing parameter that has the following property: the

higher is γ, the less smooth the transition function. For instance, if γ → 0, the weight

function G (γ, c, τ) → 1/2 and there is an indeterminacy problem, because ρ2 − ρ1can take any value. On the other hand, if γ → +∞, then G (γ, c, τ) → I(τ it − c > 0),

and G (γ, c, τ) becomes an indicator function, and the model is reduced to a TAR

model.

I can rewrite the autoregressive component as:

εih(t+1),t+1 = εi (γ, c)h(t),t ρ+ ηit+1.

2For a good review on nonlinear time series, see Bauwens et al. (2000), Dijk et al. (2002), and Ko-renok (2009).

8

where εi (γ, c)h(t),t =(εih(t),t, G

(γ, c, τ it

)εih(t),t

)and ρ = (ρ1, ρ2 − ρ1)′ . Then, the

LSTAR(1) model can be expressed as:


εih(t+1),t+1 = εi (γ, c)h(t),t ρ+ ηit+1

(3)

The nuisance term ηit+1 in 3 is a linear combination of η1,it+1 and η2,it+1 from 1. I

want the variance of ηit+1 to be a convex combination of the variances of the two

regimes. Therefore,

ηit = (κi)−.5 ξit

√1−G

(γ, c, τ it

)+ φG

(γ, c, τ it

)with φ being the ratio of the variance of the second regime with respect to the first

regime. The distribution of ηit needs to be flexible enough to accommodate the par-

ticular characteristics of the PSID (see Lillard and Willis (1978)).

As any continuous distribution can be well approximated with a finite Gaussian

mixture,3 I assume that

ξit ∼m∑j=1

pgjN(

0, g−1j

)Let sit ∈ 1, . . . ,m be an indicator function for the individual i at time t, which

shows the normal distribution from which the shock received has been drawn from.

Therefore, the variance of ηit can be expressed as:

V ar(ηit/s

it

)=

(κigsit

)−1 [(1−G

(γ, c, τ it

))+ φG

(γ, c, τ it

)]V ar

(ηit/s

it

)=

(κigsit

)−1kit (γ, c, φ) .

where kit (γ, c, φ) ≡(1−G

(γ, c, τ it

))+φG

(γ, c, τ it

). I assume that the first age-period

3For a detailed exposition of different applications of finite mixture models, see Everitt and Hand(1981), Titterington et al. (1985), and McLachlan and Peel (2004).

9

shocks are given by:

ηi1,t = λ−.5(κ−.5i ξit

√1−G

(γ, c, τ it

)+ φG

(γ, c, τ it

))where the factor λ allows me to capture the fact that a considerable size of the

lifetime earnings inequality is realized before the individual enters into the labor

market (see Keane and Wolpin (1997); Storesletten et al. (2004); and Huggett et al.

(2011)).

As can be seen from 3, at any point in time, the resulting process is a moving

weighted average of the different regimes, with the weights changing over the life

cycle.

2.1. RIP versus HIP

In the empirical income macro-literature, two types of models are widely used to

model the labor income dynamics: the Restricted Income Profile (RIP), and the

Heterogeneous Income Profile (HIP) model. The distinction between the two lies

in the growth rates of income: a RIP model assumes that the effect on income of an

extra year of labor market experience is common to all individuals, while a HIP as-

sumes that the effect is individual-specific.4 This unobservable individual-specific

effect gives a random-effect structure to the model instead of the RIP fixed-effect

structure.

This individual heterogeneity can be easily extended to the cubic polynomial of

labor market experience, but as pointed out by Baker (1997) and Guvenen (2009),

it does not substantially improve the model fit. The consensus is to keep the HIP

model with a simple structure and assume that the effects on income of the powers

of labor market experience are common to all agents.

The result is that a HIP model has lower persistence in the AR(1) process than

a RIP model, and the consequences are not trivial for the calibration of a macro

model with incomplete markets.

4Alternatively, age can be used instead of labor market experience.

10

A RIP model can be summarized as follows:


yih(t),t = α+ h (t)β1 + xi′h(t),tβ2 + εih(t),t + ωih(t),t

where xih(t),t =(

1, hi, xi′h(t),t

)′, and β = (α, β1, β2)

′ .

While a HIP is defined as:

yih(t),t = αi + h (t)βi1 + xi′h(t),tβ2 + εih(t),t + ωih(t),t

yih(t),t = xi′h(t),tβi + εih(t),t + ωih(t),t

where βi =(αi, βi1, β2

)′.Even though the economic rationale favors a HIP model,5 the verdict is not con-

clusive from a statistical point of view (see Abowd and Card (1989) and Hryshko

(2012)). In this paper, I focus my analysis on a RIP model with age-dependent

shocks.

3. Estimation

I study three models in this paper: a conventional RIP model, a RIP model with

constant persistence and heterogeneity in the innovations, and a RIP model with

heterogeneity in both the persistence and the innovations.

In each model, for the treatment effects of the logarithm of annual earnings I

include a cubic polynomial of experience to capture the hump shape of mean earn-

ings in the life cycle, time-effects, level of education, marital status, race and an

indicator on whether the individual is partially-retired (see Casanova (2013)).6

In the case of the conventional RIP, after controlling for the effects of the observ-

able characteristics, I identify the relevant moment conditions in the autocovari-

5In a model with human capital accumulation, the differences in ability translate into individual-specific growth rates of income.

6For individuals older than 55 years old, I signal them as partially-retired if their hourly wage isbelow the 70% of the average hourly wage in ages 50-55, and their wage remain below that thresholduntil the age 64.

11

ance matrix and proceed with the GMM estimation in the usual manner. For the

last two models, I follow a Bayesian approach and sample their posterior distribu-

tions with Markov Chain Monte Carlo techniques.

3.1. Priors

My choices for prior distributions are centered in conjugate and uninformative pri-

ors, depending on the information that I have available. The selection is standard.

Appendix B includes a complete description and references to the values of the hy-

perparameters. This is the list of the prior distributions:

1. ωih(t),t includes the measurement error and the transitory component of the

agent’s productivity. I assume that ωih(t),t/σ2t ∼ N

(0, σ2t

), and s2ω

(σ2t)−1 ∼

χ2 (νω) .

2. For the threshold variable c and the factor of proportionality φ, I choose an

uninformative prior distribution: p (c) ∝ 1 and p (φ) ∝ 1φ .

3. The smoothing parameter γ is distributed as a truncated Cauchy distribution:

p (γ) ∝

(1 + γ2

)−1 , γ > 0

0 , o.w.

.

4. I choose conjugate priors for sit and p: sit/p ∼ Multinomial(p), and p ∼

Dir (α) , where α=(α1, . . . , αm)′ . Along the same lines, ξit/sit = k ∼ N

(0, g−1k

),

with 1 ≤ k ≤ m. s2gj ∼ χ2(νg).

5. The persistence ρ has a normal distribution with mean µρ

and precision hρI,

i.e. ρ ∼ N(µρ, h−1ρ I

).

6. β/ (µ,H) ∼ N(µ,H−1

), where µ ∼ N

(µµ, H−1µ

)and H ∼Wishart (νH , SH)

7. s2κκi ∼ χ2 (νκ) and s2λλ ∼ χ2 (νλ) .

12

3.2. Posterior Distribution

The full posterior distribution of this model is:

p

(y∗,εih(t),t

I,Tii=1,t=ti

, β,σ2tTt=1

, ρ1, ρ2, γ, c, κiIi=1 , λ, µ,H,p, gjmj=1 ,

sitI,Tii=1,t=ti

)/y

(4)

where yi =(yih(ti)ti , . . . , y

ih(Ti)Ti

)′and y = (y′1, . . . , y

′I)′ , as well as y∗i =

(y∗,ih(ti)ti , . . . , y

∗,ih(Ti)Ti

)′and y∗ = (y′∗1 , . . . , y

′∗I )′, with y∗,ih(t),t being an imputed value, in case the value is miss-

ing or top-coded, i.e.

yih(t),t =

yih(t),t, when the observation is present in the sample;

y∗,ih(t),t, when the observation is not present in the sample;

The object 4 has all the necessary information to infer the income dynamics in

the sample.

3.3. Markov Chain Monte Carlo Methods

The Markov Chain Monte Carlo Methods (MCMC) are simulation-based techniques

that consist of generating random draws that are neither independent nor identi-

cally distributed, but rather form a Markov Chain that converges to an invariant

distribution that is the one under investigation (see Robert and Casella (2013)). Two

widely used MCMC methods to analyze a posterior distribution are the Gibbs Sam-

pler and the Metropolis-Hastings algorithm.

3.3.1. The Gibbs Sampler

The Gibbs Sampler (Geman and Geman (1984)) consists of partitioning 4, and sam-

pling each partition conditional on the other ones. A simple example will illustrate

the concept.

Suppose that the object of interest is p (θ | y) , the posterior distribution of θ, with

θ ∈ <k the parameter vector, and y ∈ <n the vector of observations. The vector θ

13

is split into the partitions θ0 and θ1. Let θ(0) =(θ(0)0 , θ

(0)1

)′be the starting point

of the algorithm, and for each iteration m ∈ 1, . . . , B of the Gibbs sampler, new

values are drawn and the partitions are updated iteratively in the following way:

θ(m)0 ∼ p

(θ0 | θ(m−1)1 , y

)and θ(m)

1 ∼ p(θ1 | θ(m)

0 , y)

. Then,

p(θ(m) | θ(m−1), y

)= p

(θ(m)0 | θ(m−1)1 , y

)p(θ(m)1 | θ(m)

0 , y)

For a sufficiently large B, this Markov Chain converges to an invariant distribution

that coincides with the distribution of interest p (θ | y).

Given the parameters of this model, I find it convenient to consider 15 blocks. Of

these, 12 are amenable to Gibbs sampling because the conditional distributions are

simple and follow known distributions. However, the other three blocks have distri-

butions from which random samples cannot be generated directly, and a Metropolis-

Hastings step within the Gibbs sampler is required. For complete algebraic deriva-

tions of the posterior distributions of each partition, see Appendix A.

The 12 Gibbs Sampling steps are:

1. Sample y∗ conditional on β,σ2tTt=1

,xih(t),t

i,t,sitI,Tii=1,t=ti

, gjmj=1 , ρ1, ρ2, γ,

c, φ, and y.

2. Sampleεih(t),t

I,Tii=1,t=ti

conditional on y, y∗, β,xih(t),t

i,t,σ2tTt=1

, ρ1, ρ2, γ, c, κiIi=1 , λ, φ,sitI,Tii=1,t=ti

, and gjmj=1 .

3. Sample β conditional on y, y∗,xih(t),t

i,t,εih(t),t

I,Tii=1,t=ti

, µ,H, andσ2tTt=1

.

4. Sampleσ2tTt=1

conditional on y, y∗,xih(t),t

i,t,εih(t),t

I,Tii=1,t=ti

, and β.

5. Sample ρ conditional onεih(t),t

I,Tii=1,t=ti

, γ, c, λ, κiIi=1 ,sitI,Tii=1,t=ti

, φ, and gjmj=1.

6. Sample κiIi=1 conditional onεih(t),t

I,Tii=1,t=ti

, (ρ1, ρ2) , λ, (γ, c) , φ,sitI,Tii=1,t=ti

,

and gjmj=1.

7. Sample λ conditional onεi1,ti

Ii=1

, κiIi=1 ,sitiIi=1

, and gjmj=1.

14

8. Sample µ conditional on β, and H .

9. Sample H conditional on β, and µ.

10. Sample p conditional onsitI,Tii=1,t=ti

.

11. SamplesitI,Tii=1,t=ti

conditional onp,εih(t),t

I,Tii=1,t=ti

, κiIi=1 , λ, ρ1, ρ2, γ, c, φ,and

gjmj=1.

12. Sample gjmj=1 conditional onsitI,Tii=1,t=ti

,εih(t),t

I,Tii=1,t=ti

, κiIi=1 , ρ1, ρ1, λ, γ, c,

and φ.

3.3.2. Metropolis-Hastings within the Gibbs sampler.

There are three posterior distributions (3 blocks) that still need to be sampled in

order to have the full posterior distribution of the model. One way to do it is with a

Metropolis-Hastings algorithm (Metropolis et al. (1953) and Hastings (1970)).

The Metropolis-Hastings is a MCMC method that consists of drawing values

from an arbitrary density q (·), and accepting only those draws that are more likely to

be generated from the density p (·) under study. Under some regularity conditions,

the collection of accepted draws approximates well p (·).

Formally, let D ∈ <k be the support of the target distribution p (x) with x ∈ D,

and let q (x∗/x) with x∗ ∈ D be the arbitrary proposal density. At the iteration m ∈

1, . . . , N of the algorithm and given x = x(m−1), the probability of accepting a

candidate draw x∗ is

α (x∗, x) = min

p (x∗)

p (x)

q (x/x∗)

q (x∗/x), 1

. (5)

If x∗ is accepted, x(m) = x∗, otherwise x(m) = x(m−1). A closer inspection of 5 shows

that a draw x∗ is more likely to be accepted if the chances to move from x to x∗ are

lower, and vice versa.

The Metropolis-Hastings acceptance criterion 5 ensures that a reversibility con-

dition is satisfied, and this is a sufficient condition to prove the convergence of the

15

Markov Chain, given by the collection of x(m) with m ∈ 1, . . . , N. For a text-

book treatment of this MCMC method, see Chib and Greenberg (1995), and Geweke

(2005).

In the solution of this model, I sample these three posterior distributions with

proposal densities that have the common property that q (x∗/x) is independent

from x.7 This means that q (x∗/x) = q (x∗), and the acceptance criterion 5 is re-

duced to

α (x∗, x) = min

p (x∗)

p (x)

q (x)

q (x∗), 1

.

The Independent Metropolis-Hastings algorithms for the objects

(γ/εih(t),t

I,Tii=1,t=ti

, c, (ρ1, ρ2) , λ, φ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)′and (

φ/εih(t),t

I,Tii=1,t=ti

, ρ1, ρ2, γ, c, λ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)′consists of draws taken from truncated normal distributions, with γ∗ ∼ N

(µγ , σ

2γ

)I (aγ , bγ)

and φ∗ ∼ N(µφ, σ

2φ

)I (aφ, bφ).8

In the case of(c/εih(t),t

I,Tii=1,t=ti

, ρ1, ρ2, γ, φ, λ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)′,

the variable c∗ is sampled from a discrete distribution in the support [ac, bc], with

the vector of probabilities pc. The parameters of the proposal densities are chosen

in such way that the acceptance probabilities are close to 30%. This ensures a good

mixing of the Markov Chain (see Muller (1991), and Canova (2007)).

7A Metropolis-Hastings algorithm with this feature is also referred to as “Independent Metropolis-Hastings”

8There are other ways to sample these objects. For instance, Lopes and Salazar (2006) use a

random-walk Metropolis chain to update (γ, φ, c), with γ∗ ∼ Γ

((γ(i))

2

∆γ, γ

(i)

∆γ

), φ∗ ∼ Γ

((φ(i))

2

∆φ, φ

(i)

∆φ

),

and c∗ ∼ N(c(i),4c

)I (ac, bc).

16

4. Panel Study of Income Dynamics (PSID)

The PSID is a rich panel dataset that has information for more than 9,000 fami-

lies and 70,000 individuals, who have been followed since 1968. It is the largest

and longest household-based survey in the world and an excellent starting point to

study income dynamics in the United States.

My sample consists of male heads of households in the time period between

1968 and 2011, including those individuals satisfying the following conditions:

1. Has at least two consecutive earning observations;

2. Is between 25 and 64 years old at the time of the survey;

3. Does not belong to the Survey of Economic Opportunities sample (SEO);

4. Has positive values for hours and labor income;9

5. Has hourly labor income between $2.63 and $600 in 2007 dollars;10

6. Has worked between 456 and 4,992 hours every year;11

7. Has no missing values for years of education, nor any inconsistencies;

One particular feature of the PSID that needs to be taken into account: after

1997, the PSID starts to have a biannual periodicity and in order to avoid this short-

coming, I impute the missing values for income (see Appendix).

After checking for any inconsistency, my final sample is reduced to 5,398 indi-

viduals with 61,163 individual-year observations (see Tables 1 and 2). On average,

an individual is followed for 11 years. As shown in Table 1, the mean and median

9The variables taken from the PSID for labor income are: V74, V514, V1196, V1897, V2498,V3051, V3463, V3863, V5031, V5627, V6174, V6767, V7413, V8066, V9376, V8690, V11023, V12372,V13624, V14671, V16145, V17534, V18878, V20178, V21484, V23323, (ER4140+ER4117+ER4119),(ER6980+ER6957+ER6959), (ER9231+ER9208+ER9210), and (ER12080+ER12065+ER12193).

10To deflate the labor income series, I use the CPI from the US Department Of Labor, Bureau ofLabor Statistics available at ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt

11The purpose of having an extreme value such as 4,992 hours is to discard those observations thatare certainly misreported. To work 4,992 in a year implies working 16 hours a day, 6 days a week. It isunlikely that anyone can work that much.

17

hourly wage in the sample are $26.51 and $31.66, with a standard deviation and

interquantile range of $22.15 and $16.37 in 2007 dollars.

The earnings distribution has the typical right asymmetry for an income vari-

able, and a closer inspection of the medians in the boxplots, depicted in red in Fig-

ure 1, shows the typical hump shape for earnings in the life cycle. The interquantile

range, given by the size of the boxplots, is a robust measure of the income variability

along the life cycle. It grows until individuals are in their mid-30s, remains approxi-

mately constant until they reach their 40s, and starts to shrink thereafter.

A plausible economic interpretation is that these differences in the interquantile

ranges are a consequence of the individuals’ decisions to accumulate human capi-

tal in the life cycle (see Ben-Porath (1967)). Another hypothesis is that this variabil-

ity in the interquantile ranges is due to occupational mobility in the life cycle, with

occupation-specific human capital (see Kambourov and Manovskii (2009)) and oc-

cupations having a direct relationship between risk and return (see Cubas and Silos

(2012)). Changes in the intra-household allocation (see Greenwood et al. (2003)

and Knowles (2013)) or the “boomerang” of moving back to one’s parent house as

means of insurance against bad shocks(Kaplan (2012)) could be other possible ex-

planations.

Whether acquiring human capital, modulating risk, or intra-family dynamics

explains Figure 1, it is clear that to model income dynamics, the forces operating in

the life cycle must be taken into account.

5. Results

The size of the MCMC sample for the LSTAR(1) model is 50000 and seems to con-

verge in 5000 iterations. After discarding the initial 5000, a visual inspection of the

different trace plots shows a good mixing (see Figures 2 and 3). In the case of the RIP

model with constant persistence and heterogeneity in the innovations, the MCMC

sample is 20000 and seems to converge after 5000 iterations. These initial 5000 it-

erations are the burn-in phase of the algorithm and are thrown away. I repeat the

sampling from the posterior distributions several times, at different dispersed start-

18

ing points (Gelman and Rubin (1992)) to test for convergence, and the results are

favorable. I apply the Geweke’s convergence test (see Geweke et al. (1991)), and

confirm the convergence.

5.1. Constant persistence and heterogeneity in the innovations

The benchmark and starting point is a conventional RIP model estimated with GMM

techniques. This is a standard model in the empirical macro-literature and is gen-

erally used for the calibration of dynamic general equilibrium models.

Table 3 shows that the idiosyncratic earnings shocks in the life cycle are persis-

tent with a persistence parameter ρ equal to 0.97, a variance of the persistent shock

σ2η equal to 0.029 and a variance for the temporary shock σ2ω equal to 0.095. This

stationary AR(1) process is close to a unit root process, and this high persistence

has the practical implication that an arbitrary shock is halved after 17 periods. This

means that, assuming a working life of 40 years, a worker spends more than 40% of

his working life with the effects of a shock that has not yet reduced in half.

Even though this model assumes that the income growth rate is common to all

agents, the starting point in the level of earnings is different for all workers. This is

represented by an interceptαwhich has a variance σ2α; according to my estimations,

it is equal to 0.061. The sum of σ2α and σ2η gives the variance of the idiosyncratic

earnings shocks for the first-age period without the transitory component of the

variance. In this case, this value is equal to 0.09 and means that a worker that suffers

a one-standard-deviation shock has a 29% increase in their earnings with respect to

the mean.12

The first contender for this standard RIP is a model similar to the LSTAR(1) pre-

viously presented but with just one regime. This means that this model is a RIP with

constant persistence in the life cycle and a rich structure of heterogeneity in the

innovations. As Table 3 shows there are practically no differences between the esti-

12This increase can be explained in the following way: labor earnings in efficiency units are modeledas exwθ, where w is the wage rate, θ is the hours worked, and x ∼ N

(µ, σ2

)is the earnings shocks.

Therefore, the efficiency units ex is a log-normal random variable with mean eµ+σ2/2. If a workersuffers a shock with a value of one-standard-deviation from the mean µ, the increase in labor earningsis eσ−σ

2/2.

19

mates from these two models, and the variances of these estimates are lower in the

Bayesian setting than in the GMM setting.13 In any case, what this shows is that the

assumptions made on the innovation structure does not not affect the conclusions

reached by standard models regarding the variance and persistence of the shocks.

Table 3 indicates that the posterior distribution of the persistence parameter ρ

has a mode at 0.97 and a standard deviation of 0.003. Moreover, as can be seen in

Figure 4, there is a 99% probability that the posterior value of ρ is between 0.96 and

0.98. This means that the GMM value for ρ of 0.976 is highly likely in this model.

The parameters g−1, κ−1i and λ−1 are the components of the variance of the in-

novations. Figures 5 and 6 show the posterior distribution of λ−1 and g−1, while

Figure 7 refers to the boxplot of κ−1i . It is interesting to notice that more than 50%

of the individuals in the sample have κ−1i with values less than 1 (see Table 5). How-

ever, there is a group of individuals with a higher conditional variance than the rest,

as it can be seen in the density plot in Figure 7. Because, it was not possible to iden-

tify this group in the conventional RIP model, there might be an upward bias in the

conventional variance estimation of a RIP model. For instance, the mode of the ini-

tial variance of the shocks in the first age-period, which is the product of the mode

of g−1, κ−1i and λ−1, is equal to 0.06 and is close to the estimate made by the con-

ventional model. However, if one focuses on the mean of the distributions instead

of the mode, the first age-period variance is 0.26. Not exactly the same number as

before.

The difference is more striking if we think of its economic interpretation. A one-

standard-deviation shock represents 46% higher earnings with respect to the mean

earnings if we take into account the mean of the components of the variance, and

only a 23% increase if we restrict to the mode. Putting this finding in perspective is

important: a vast majority of individuals has income shock processes less risky than

what is usually claimed in standard models.

Even though both models deliver similar stories in terms of the dispersion of the

initial shocks, the fact that the heterogeneity of κi is ignored by the conventional

RIP estimation, recommends against the use of the standard specification.

13This result was also pointed out in Nakata and Tonetti (2015).

20

5.2. Heterogeneity in the persistence and the innovations

In the previous subsection, the standard model used in the empirical macro-literature

was compared against a RIP model with constant persistence and heterogeneity in

the innovations. Now, the assumption of a constant persistence is left and the stan-

dard RIP is compared to a LSTAR(1) model which, by definition, exhibits hetero-

geneity in the persistence and the innovations.

In a LSTAR(1) model two regimes are connected with a smooth transition func-

tion. Depending on how close the variable age is from a threshold, the weights as-

signed to each regime change and the nature of the idiosyncratic shocks changes in

every point of the life cycle.

Figure 8 depicts the posterior distribution of the threshold variable c, and Table 6

shows the summary statistics. The histogram for c reveals that there is a probability

higher than 50% that the threshold is below 31 years old, with the age 29 being the

mode of the distribution and the most likely value.

This result means that a worker in his twenties experiences different shocks than

an older worker. If we want to know the economic reasons for this, we need to in-

quire on the changes that happen during this time in the life of a worker. It might

be the case that questions related with job mobility, labor experience or changes in

the composition of the household are relevant.

The weight function that connects both regimes plays an important role in the

estimation results. The parameter γ controls the smoothness of this function and,

in this case, has a posterior mode of 0.52 and a standard deviation of 0.487 (see Table

6). An inspection of Figure 9 shows that there is a 50% posterior probability that γ

is between 0.44 and 0.89. The parameters γ and c complete the description of the

weight function, and they will matter for the analysis of the persistence parameters

ρ′s and the variance of the innovations.

It is quite interesting to notice what happens with the convex combination of

ρ1 and ρ2 at different ages (see Table 4). At age 25, the mode socks’ persistence is

0.959, and starts to increase to 0.97 at age 45, being constant afterwards. On av-

erage, shocks are slightly less persistent when workers are young but the variance

21

of ρ is higher at the beginning of the life cycle. There is a positive probability that

the persistence of a shock at age 25 is between 0.92 and 1. As it can be seen in

Figure 11, the wide posterior dispersion of ρ1 suggests that there might be forces

in action that a threshold like age is unable to detect. The joint posterior distribu-

tion of ρ1 and ρ2 in Figure 10 has an important mass of probability in the region

[0.955, 0.965] × [0.965, 0.975], i.e. there is a positive probability for the persistence

parameter ρ1 to be lower than the persistence ρ2. However, the actual ρ in the life

cycle is a convex combination of ρ1 and ρ2 and it is this weighted ρ that needs to be

taken into account.

The effect of the nonlinearities on the persistence parameters translates into a

reduction of the weighted ρ’s value and seems not to affect the difference between

ρ1 and ρ2. The situation is different for the variance of the innovations.

There are two parameters that show the effect of these nonlinearities on the vari-

ance of the idiosyncratic shocks. These parameters are φ and kit (γ, c, φ), where φ is

the ratio of the variance of the second regime with respect to the first regime, and

kit (γ, c, φ) is the component of the variance that captures the nonlinearity effects.

Figure 12 shows the behavior of kit (γ, c, φ) for the average φ, c and γ, and it is

clear how the variance of the innovations is reduced as agents get older. Table 7 and

Figure 13 indicate that there is a 50% posterior probability that φ is between 0.71

and 0.80, and the average of the ratio of these two variances is 75%. The stationary

variance of the regime for the young workers is 1.3 times the stationary variance of

regime for the old workers.14 While the actual variance in the life cycle of a worker

of 64 years is 75% of the corresponding variance for a 25-year-old worker.

The individual-specific component of the variance κ−1i shows a similar behavior

as in the previous subsection. It has a posterior mean of 2.37 and 75% of the individ-

uals in the sample have values of κ−1i below 1.75. However, there is nonnegligible

proportion of individuals with values higher than 1 (see Figure 14). The posterior

distribution of the rest of the components of the variance, λ and g, can be seen in

Figures in 15 and 16.

14The stationary variance of an AR(1) process with correlation coefficient ρ and variance for theerror term σ2 is equal to σ2

1−ρ2 .

22

The mode of the initial variance in the first-age period in this model is 0.06,

which means that a one-standard-deviation shock represents a 23% increase in la-

bor earnings with respect to the mean. Instead, if we take into account the mean of

the components of the variance, the initial variance is 0.28 and a one-standard de-

viation shock means a 47% rise in earnings. As it was pointed out before, ignoring

individual-specific heterogeneity could be misleading.

6. Conclusion

In this paper, I have shown an age profile in the idiosyncratic earnings shocks and

introduced for the first time a Smoothed Transition Autoregressive model for the

persistent component of the residual earnings. This nonlinear model has the main

advantage of disentangling the economic forces acting in the life cycle and a simple

structure capable of capturing the intrinsic complexity of an income process.

I have departed from the standard GMM framework used in the literature by fol-

lowing a Bayesian approach. From a methodological point of view, Bayesian meth-

ods are capable of dealing with nonlinearities and levels of heterogeneity in the in-

come process which are not easy to estimate with GMM techniques. They are bet-

ter suited to deal with a data set like the PSID, which has top-coded observations

and a biannual periodicity since 1997, where missing values need to be imputed.

Moreover, they simplify the estimation of income processes where the persistent

component is not directly observable and filtering techniques need to be applied.

My results show that workers younger than 29 experience income shocks with

higher variance and lower persistence than older workers. These variances have

significant differences in magnitude. On the one hand, shocks seem to be riskier;

however, they last less time when the worker is younger. The opposite is true when

the worker is older. The high dispersion in the posterior distribution of the persis-

tence when a worker is young suggests that using age as a threshold variable might

not give a complete picture of what is seen in the data.

A comparison of a RIP model with constant persistent and heterogeneity with

respect to a standard RIP shows the a considerable portion of individuals have in-

23

come shock processes less risky than what is actually thought. The standard model

is silent on this heterogeneity and introduces an upward bias on the variance esti-

mates. My Bayesian specification handles well this feature of the data and should

be considered when a macroeconomic model needs to be calibrated.

Possible venues for further research are the extension of this model to a HIP

setting and the introduction of new threshold variables, such as level of income,

whether the economy is in recession or not, occupational mobility, and the sec-

tors where the head of household is employed. After these extensions are studied,

one can apply a model selection criterion and with the best model simulate earn-

ing paths in the life cycle. These simulations should be contrasted with the ones

obtained with a standard model at different age bins and compared with the actual

earning paths seen in the data. This comparison will shed light on the goodness of

fit of different models and their economic implications.

Perhaps because of the difficulties of modelling income processes with nonlin-

earities, few papers consider an age profile in the earnings shocks. This paper fills

this gap and shows that these difficulties can be easily overcome and can substan-

tially improve our understanding of different economic phenomena.

References

Abowd, J. and D. Card, “On the covariance structure of earnings and hours changes,” Econo-

metrica, 1989, 57, 411–445.

Baker, Michael, “Growth-Rate Heterogeneity and the Covariance Structure of Life-Cycle

Earnings,” Journal of Labor Economics, April 1997, 15 (2), 338–375.

and Gary Solon, “Earnings dynamics and inequality among Canadian men, 1976-1992:

Evidence from longitudinal tax records,” Journal of Labor Economics, 2003, 21 (2), 289–

321.

Barsky, Robert B., Miles S. Kimball, F. Thomas Juster, and Matthew D. Shapiro, “Preference

Parameters and Behavioral Heterogeneity: An Experimental Approach in the Health and

Retirement Study,” Quarterly Journal of Economics, 1997, 112 (2), 537–579.

24

Bauwens, Luc, Michel Lubrano, and Jean-Francois Richard, Bayesian Inference in Dynamic

Econometric Models, Oxford University Press, 2000.

Ben-Porath, Yoram, “The Production of Human Capital and the Life Cycle of Earnings,”

Journal of Political Economy, 1967, 75 (4), 352–265.

Browning, Martin, Mette Ejrnaes, and Javier Alvarez, “Modelling income processes with lots

of heterogeneity,” The Review of Economic Studies, 2010, 77 (4), 1353–1381.

Canova, Fabio, Methods for Applied Macroeconomic Research, Vol. 13, Princeton University

Press, 2007.

Casanova, Maria, “Revisiting the hump-shaped wage profile,” Unpublished mimeo, UCLA,

2013.

Chamberlain, Gary and Keisuke Hirano, “Predictive Distributions Based on Longitudinal

Earnings Data,” Annals of Economics and Statistics / Annales d’Economie et de Statistique,

1999, 55-56, 211–242.

Chan, Kung Sik and Howell Tong, “On Estimating Thresholds in Autoregressive Models,”

Journal of Time Series Analysis, 1986, 7 (3), 179–190.

Chib, Siddhartha and Edward Greenberg, “Understanding the metropolis-hastings algo-

rithm,” The American Statistician, 1995, 49 (4), 327–335.

Cubas, German and Pedro Silos, “The Risk Premium in Labor Markets,” Technical Report,

mimeo 2012.

Durbin, James and Siem Jan Koopman, “A simple and efficient simulation smoother for state

space time series analysis,” Biometrika, 2002, 89 (3), 603–616.

Everitt, Brian S. and D.J Hand, Finite Mixture Distributions, Chapman and Hall, 1981.

Gelman, Andrew and Donald B. Rubin, “Inference from iterative simulation using multiple

sequences,” Statistical Science, 1992, 7 (4), 457–472.

Geman, Stuart and Donald Geman, “Stochastic relaxation, Gibbs distributions, and the

Bayesian restoration of images,” IEEE Trans. Pattern Analysis and Machine Intelligence,

1984, 6, 721–741.

Geweke, John, Contemporary Bayesian econometrics and statistics, Vol. 537, John Wiley &

Sons, 2005.

25

and Michael Keane, “An empirical analysis of earnings dynamics among men in the PSID:

1968-1989,” Journal of Econometrics, 2000, 96 (2), 293–356.

et al., Evaluating the accuracy of sampling-based approaches to the calculation of pos-

terior moments, Vol. 196, Federal Reserve Bank of Minneapolis, Research Department

Minneapolis, MN, USA, 1991.

Gottschalk, Peter, “Earnings Mobility: Permanent Change or Transitory Fluctuations?,” The

Review of Economics and Statistics, 1982, 64 (3), 450–456.

and Robert Moffitt, “Trends in the transitory variance of earnings in the United States,”

Economic Journal, 2002, 112 (C), 68–73.

Greenwood, Jeremy, Nezih Guner, and John A. Knowles, “More On Marriage, Fertility, And

The Distribution Of Income,” International Economic Review, 2003, 44 (3), 827–862.

Guvenen, Fatih, “Learning your Earning: Are Labor Income Shocks Really Very Persistent?,”

The American Economic Review, June 2007, 97 (3), 687–712.

, “An empirical investigation of labor income processes,” Review of Economic Dynamics,

January 2009, 12 (1), 58–79.

, Serdar Ozkan, and Jae Song, “The nature of countercyclical income risk,” Technical Re-

port, National Bureau of Economic Research 2012.

Haider, Steven J., “Earnings Instability and Earnings Inequality of Males in the United States:

1967-1991,” Journal of Labor Economics, October 2001, 19 (4), pp. 799–836.

Hastings, W Keith, “Monte Carlo sampling methods using Markov chains and their applica-

tions,” Biometrika, 1970, 57 (1), 97–109.

Hause, John C., “The Fine Structure of Earnings and the On-the-Job Training Hypothesis,”

Econometrica, 1980, 48 (4), 1013–1029.

Heathcote, Jonathan, Kjetil Storesletten, and Gianluca Violante, “The Macroeconomic Im-

plications of Rising Wage Inequality in the United States,” Journal of Political Economy,

2010, 118 (4), 681–722.

Hospido, Laura, “Modelling Heterogeneity and Dynamics in the Volatility of Individual

Wages,” Journal of Applied Econometrics, 2012, 27 (3), 386–414.

26

Hryshko, Dmytro, “Labor income profiles are not heterogeneous: Evidence from income

growth rates,” Quantitative Economics, 2012, 3 (2), 177–209.

Huggett, Mark, Gustavo Ventura, and Amir Yaron, “Sources of lifetime inequality,” The

American Economic Review, 2011, 101 (7), 2923–2954.

Jensen, Shane T. and Stephen Shore, “Semi-parametric Bayesian Modeling of Income

Volatility Heterogeneity,” Journal of the American Statistical Association, 2012, 106 (496),

1280–1290.

Kalman, Rudolph Emil, “A new approach to linear filtering and prediction problems,” Jour-

nal of Basic Engineering, 1960, 82 (1), 35–45.

Kambourov, Gueorgui and Iourii Manovskii, “Occupational Mobility and Wage Inequality,”

The Review of Economic Studies, April 2009, 76 (2), 731–759.

Kaplan, Greg, “Moving back home: Insurance against labor market risk,” Journal of Political

Economy, 2012, 120 (3), 446–512.

Karahan, Fatih and Serdar Ozkan, “On the persistence of income shocks over the life cycle:

Evidence, theory, and implications,” Review of Economic Dynamics, 2013, 16 (3), 452–476.

Keane, Michael P and Kenneth I Wolpin, “The career decisions of young men,” Journal of

Political Economy, 1997, 105 (3), 473–522.

Knowles, John A, “Why are married men working so much? An aggregate analysis of intra-

household bargaining and labour supply,” The Review of Economic Studies, 2013, 80 (3),

1055–1085.

Korenok, Oleg, “Bayesian Methods in Nonlinear Time Series,” Encyclopedia of Complexity

and System Science. Springer, 2009, p. Entry 217.

Lillard, Lee A. and Robert J. Willis, “Dynamic Aspects of Earning Mobility,” Econometrica,

September 1978, 46 (5), 985–1012.

and Yoram Weiss, “Components of Variation in Panel Earnings Data: American Scientists

1960-70,” Econometrica, 1979, 47 (2), pp. 437–454.

Lopes, Hedibert F and Esther Salazar, “Bayesian model uncertainty in smooth transition

autoregressions,” Journal of Time Series Analysis, 2006, 27 (1), 99–117.

27

Luukkonen, Ritva, Pentti Saikkonen, and Timo Terasvirta, “Testing linearity against smooth

transition autoregressive models,” Biometrika, 1988, 75 (3), 491–499.

MaCurdy, Thomas E, “The use of time series processes to model the error structure of earn-

ings in a longitudinal data analysis,” Journal of econometrics, 1982, 18 (1), 83–114.

McCall, John J., Income Mobility, Racial Discrimination, and Economic Growth, Lexington

Books, Lexington, MA, 1973.

McLachlan, Geoffrey and David Peel, Finite mixture models, John Wiley & Sons, 2004.

Meghir, Costas and Frank Windmeijer, “Moment Conditions for Dynamic Panel Data Mod-

els with Multiplicative Individual Effects in the Conditional Variance,” Annals of Eco-

nomics and Statistics / Annales d’Economie et de Statistique, 1999, 55-56, 317–330.

and Luigi Pistaferri, “Income Variance Dynamics and Heterogeneity,” Econometrica,

2004, 72 (1), 1–32.

Metropolis, Nicholas, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and

Edward Teller, “Equation of state calculations by fast computing machines,” The Journal

of Chemical Physics, 1953, 21 (6), 1087–1092.

Muller, Peter, A generic approach to posterior integration and Gibbs sampling, Purdue Uni-

versity, Department of Statistics, 1991.

Nakata, Taisuke and Christopher Tonetti, “Small sample properties of Bayesian estimators

of labor income processes,” Journal of Applied Economics, 2015, 18 (1), 121–148.

Norets, Andriy and Sam Schulhofer-Wohl, “Heterogeneity in income processes,” Technical

Report, mimeo 2010.

Robert, Christian and George Casella, Monte Carlo statistical methods, Springer Science &

Business Media, 2013.

Shore, Stephen H, “The intergenerational transmission of income volatility: is riskiness in-

herited?,” Journal of Business & Economic Statistics, 2011, 29 (3), 372–381.

Shorrocks, Anthony F., “Income Mobility and the Markov Assumption,” Economic Journal,

1976, 86 (343), 566–578.

Storesletten, Kjetil, Christopher I Telmer, and Amir Yaron, “Consumption and risk sharing

over the life cycle,” Journal of Monetary Economics, 2004, 51 (3), 609–633.

28

Terasvirta, Timo, “Specification, estimation, and evaluation of smooth transition autore-

gressive models,” Journal of the American Statistical Association, 1994, 89 (425), 208–218.

Titterington, D Michael, Adrian FM Smith, and Udi E Makov, Statistical analysis of finite

mixture distributions, Wiley, 1985.

Tong, Howell, “On a threshold model,” In Pattern recognition and signal processing. NATO

ASI Series E: Applied Sc., 1978, 29, 575–586.

and K.S. Lim, “Threshold Autoregression, Limit Cycles and Cyclical Data,” Journal of the

Royal Statistical Society. Series B (Methodological), 1980, 42 (3), 245–292.

Tong, Towell, Non-Linear Time Series: A Dynamical System Approach, Vol. 6 of Oxford Sta-

tistical Science Series, Oxford University Press, 1993.

Tsay, Ruey S, “Testing and modeling threshold autoregressive processes,” Journal of the

American Statistical Association, 1989, 84 (405), 231–240.

van Dijk, Dick, Timo Terasvirta, and Philip Hans Franses, “Smooth Transition Autoregressive

Models. A Survey of Recent Developments,” Econometric Reviews, 2002, 21 (1), 1–47.

Vandekerkhove, Pierre, “Consistent and asymptotically normal parameter estimates for

hidden Markov mixtures of Markov models,” Bernoulli, 2005, 11 (11), 103–129.

29

Appendix A: Posterior Distributions.In this appendix, I derive the posterior distributions of the parameters of interest

in each of the Gibbs sampler steps:

•(y∗/

xih(t),t

i,t, β,σ2tTt=1

,sitI,Tii=1,t=ti

, gjmj=1 , ρ1, ρ2, γ, c, φ, y

)′Our starting point is:


εih(t),t =[(

1−G(γ, c, τ it

))ρ1 +G

(γ, c, τ it

)ρ2]︸︷︷︸

mit(γ,c,ρ1,ρ2)

εih(t)−1,t−1 + ηit

Then,

εih(t),t = yih(t),t−xi′h(t),tβ−ω

ih(t),t and yih(t),t = xi′h(t),tβ+mi

t (γ, c, ρ1, ρ2) εih(t)−1,t−1+

ηit + ωih(t),t

Combining both equations, we have:

yih(t),t = xi′h(t),tβ + mit (γ, c, ρ1, ρ2)

(yih(t)−1,t−1 − x

i′h(t)−1,t−1β − ω

ih−1,t−1

)+ ηit +

ωih(t),t

yih(t),t =(xih(t),t −m

it (γ, c, ρ1, ρ2)x

ih(t)−1,t−1

)′β + mi

t (γ, c, ρ1, ρ2) yih(t)−1,t−1 +

ωi′h(t),t−

−mit (γ, c, ρ1, ρ2)ω

ih−1,t−1 + ηit

Using the above, we can find the posterior distribution:

p

(y∗,ih(t),t/

xih(t),t

i,t, β,σ2tTt=1

,sitI,Tii=1,t=ti

, gjmj=1 , ρ1, ρ2, γ, c, φ, y

)∝

∝ p(yih(t)+1,t+1/y

∗,ih(t),t, · · ·

)p(y∗,ih(t),t/y

ih(t)−1,t−1, · · ·

)

30

∝ exp

−12

σ2t+1 +mit+1 (γ, c, ρ1, ρ2)

2 σ2t + V ar(ηit+1

)︸︷︷︸Rt+1

−1

×

×

yih(t)+1,t+1 −(xih+1,t+1 −mi

t+1 (γ, c, ρ1, ρ2)xih(t),t

)′β−

−mit+1 (γ, c, ρ1, ρ2) y

∗,ih(t),t

2

+

+R−1t

y∗,ih(t),t −(xih(t),t −m

it (γ, c, ρ1, ρ2)x

ih(t)−1,t−1

)′β−

−mit (γ, c, ρ1, ρ2) y

ih(t)−1,t−1

2

∝ exp

−12

(y∗,ih(t),t

)2 [(mit+1 (γ, c, ρ1, ρ2)

)2R−1t+1 +R−1t

]−

−2y∗,ih(t),t

R−1t+1mit+1 (γ, c, ρ1, ρ2)×

×(yih(t)+1,t+1 −

(xih+1,t+1 −mi

t+1 (γ, c, ρ1, ρ2)xih(t),t

)′β

)+

+R−1t

(xih(t),t −m

it (γ, c, ρ1, ρ2)x

ih(t)−1,t−1

)′β+

+mit (γ, c, ρ1, ρ2) y

ih(t)−1,t−1

∝ exp

−1

2hy

(y∗,ih(t),t − µy

)2where

hy =(mit+1 (γ, c, ρ1, ρ2)

)2R−1t+1 +R−1t

and

µy = h−1y

R−1t+1mit+1 (γ, c, ρ1, ρ2)×

×(yih(t)+1,t+1 −

(xih+1,t+1 −mi

t+1 (γ, c, ρ1, ρ2)xih(t),t

)′β

)+

+R−1t

[(xih(t),t −m

it (γ, c, ρ1, ρ2)x

ih(t)−1,t−1

)′β −mi

t (γ, c, ρ1, ρ2) yih(t)−1,t−1

]

Therefore,

y∗,ih(t),t/ · · · ∼ N(µy, h

−1y

).

For the time-period when the PSID becomes bi-annual, I impute the values for

the missing years with y∗,ih(t),t/ · · · ∼ N(µy, h

−1y

)I(y∗,ih(t),t

)[y,y])

, where y and y

31

are the lower- and upper-bounds defined in the text.

•(

εih(t),t

I,Tii=1,t=ti

/xih(t),t

i,t, β,σ2tTt=1

, ρ1, ρ2, γ, c, λ, φ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1 , y, y∗)′

Following Durbin and Koopman (2002), we have for each individual i, condi-

tional on sitTiti

, a conditionally linear Gaussian state-space model given by:yih(t),t = yih(t),t − x

i′h(t),tβ = εih(t),t + ωih(t),t

εih(t),t+1 = mit+1 (γ, c, ρ1, ρ2) ε

ih(t),t + ηit+1

(A.1)

where ωih(t),t ∼ N(0, σ2t

), ηit/s

it ∼ N

(0, g−1

sit

)and εti−1 ≡ 0, for ti ≤ t ≤ Ti.

Let wi =(ωih(ti),ti , η

iti , . . . , ω

ih(Ti),Ti

, ηiTi

)′, yi =

(yih(ti),ti , . . . , y

ih(Ti),Ti

)′and εi =(

εih(ti),ti , . . . , εih(Ti),Ti

)′wherewi ∼ N (0,Ω) ,with Ω = diag

(σ2ti , g

−1siti, . . . , σ2Ti , g

−1siTi

).

Step 1: Take a random draw fromw+ ∼ N (0,Ω) and use it to generate yi+ and

εi+ from the recursion (A.1).

Step 2: Apply the Kalman filter (Kalman, 1960) and the disturbance smoother

to yi+ and yi :

Let

εih(t),t/t−1 = E(εih(t),t/y

ih(t)−1,t−1, s

it

)eit = yih(t),t − ε

ih(t),t/t

Rt/t−1 = V ar(εih(t),t/y

ih(t)−1,t−1, s

it

)=

(mit (γ, c, ρ1, ρ2)

)2Rt−1/t−1 + g−1

sit

Pt/t−1 = V ar(yih(t),t/y

ih(t)−1,t−1, s

it

)= Rt/t−1 + σ2t

As,

yih(t),t

εih(t),t

/yih(t)−1,t−1, sit

∼ N εih(t),t/t−1

εih(t),t/t−1

,

Pt/t−1 Rt/t−1

R′t/t−1 Rt/t−1

32

Then,

εih(t),t/t = εih(t),t/t−1 +Rt/t−1P−1t/t−1︸︷︷︸

Kt

eit

Rt/t = Rt/t−1 −Rt/t−1P−1t/t−1Rt/t−1

= (I −Kt)Rt/t−1

The disturbance smoother is given by:

wt = E(wt/y

i)

=

σ2tP−1t/t−1 −σ2tK ′t

0 g−1sit

eit

rt

where

rt−1 = P−1t/t−1eit +(mit (γ, c, ρ1, ρ2)−Kt

)rt

with rTi ≡ 0.

Step 3: Compute εi+ = E(εi+/yi+,

sitTit=ti

)and εi = E

(εi/yi,

sitTit=ti

)with

the forwards recursion:

εih(t)+1,t+1 = mit (γ, c, ρ1, ρ2) ε

ih(t),t + g−1

sitrt

Step 4: Keep εi = εi+ − εi+ + εi.

Finally, εiIi=1∼ p

(εih(t),t

I,Tii=1,t=ti

/ . . .

)

•(β/xih(t),t

i,t,σ2tTt=1

,εih(t),t

I,Tii=1,t=ti

, µ,H, y, y∗)′

p

(β/xih(t),t

i,t,σ2tTt=1

,εih(t),t

I, Tii=1,t=ti

, µ,H, y, y∗)∝

∝[I∏i=1

Ti∏t=ti

p(yih(t),t/x

ih(t),t, β, σ

2t , ε

ih(t),t

)]p (β/µ,H)

33

∝

I∏i=1

Ti∏t=ti

exp

[− 1

2σ2t

(yih(t),t − x

i′h(t),tβ − ε

ih(t),t

)2]exp

−1

2 (β − µ)′H (β − µ)

∝ exp

[−1

2

I∑i=1

Ti∑t=ti

(yih(t),t

−xi′h(t),t

β−εih(t),t

σt

)2]

exp−1

2 (β − µ)′H (β − µ)

∝ exp

−1

2

[β′

I∑i=1

Ti∑t=ti

xih(t),t

xi′h,t

σ2t

β − 2β′I∑i=1

Ti∑t=ti

xih(t),t

(yih(t),t

−εih(t),t

)σ2t

+ β′Hβ − 2β′Hµ

]

∝ exp

−1

2

[β′(

I∑i=1

Ti∑t=ti

xih(t),t

xi′h(t),t

σ2t

+H

)β − 2β′

(I∑i=1

Ti∑t=ti

xih(t),t

(yih(t),t

−εih(t),t

)σ2t

+Hµ

)]∝ exp

−1

2

[(β − µβ

)′Hβ

(β − µβ

)]whereHβ =

I∑i=1

Ti∑t=ti

xih(t),t

xi′h(t),t

σ2t

+H andµβ = H−1β

(I∑i=1

Ti∑t=ti

xih(t),t

(yih(t),t

−εih(t),t

)σ2t

+Hµ

).

Thus,

β/

(xih(t),t

i,t,σ2tTt=1

,εih(t),t

I, Tii=1, t=ti

, µ,H, y, y∗)∼ N

(µβ, H

−1β

).

•(

σ2tTt=1

/xih(t),t

i,t, β,εih(t),t

I,Tii=1,t=ti

, y, y∗)′

p(σ2t /

xih(t),t

i, β,εih(t),t

i, y, y∗

)∝

∝

[ ∏i∈Bt

p(yih(t),t/x

ih(t),t, β, σ

2t , κi, ε

ih(t),t

)]p(σ2t)

where Bt =i ∈ I s.t. yih(t),t is in the sample

, and ntI = #Bt

∝(σ2t)−ntI

2 exp

− 1

2σ2t

∑i∈Bt

(yih(t),t − x

i′h(t),tβ − ε

ih(t),t

)2(σ2t)−( νω2 −1) exp

− 1

2σ2ts2ω

∝(σ2t)−(ntI

2+ νω

2−1)

exp

− 1σ2t

∑i∈Bt

(yih(t),t − x

i′h(t),tβ − ε

ih(t),t

)2+ s2ω

2

∝(σ2t)−(ntI

2+ νω

2−1)

exp− 1σ2t

[ntIs

2ω,t+s

2ω

2

]

∴(σ2t)−1

/(xih(t),t

i, β, εih(t),t, y, y

∗)∼ Γ

ntI2 + νω

2 ,

∑i∈Bt

(yih(t),t − x

i′h(t),tβ − ε

ih(t),t

)2+ s2ω

2

34

•(ρ/εih(t),t

I,Tii=1,t=ti

, γ, c, λ, φ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)′p

(ρ/εih(t),t

I,Tii=1,t=ti

, γ, c, λ, φ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)∝

∝

I∏i=1

Ti∏t=ti+1

p

(εih(t),t/

εih(t),t

t−1ti

, ρ, γ, c, λ, φ, κi,sittt=ti

, gjmj=1

)× p (ρ)

∝

I∏i=1

Ti∏t=ti+1

p(εih(t),t/ε

ih(t)−1,t−1, · · ·

)× p (ρ)

∝

I∏i=1

Ti∏t=ti+1

exp

−1

2

[κigsit

kit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2]

×

× exp−1

2hρ(ρ− µρ

)′ (ρ− µρ

)

∝ exp

−12

I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2

+

+ hρ(ρ− µρ

)′ (ρ− µρ

)

∝ exp

−1

2

I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

ρ′εi (γ, c)′h(t)−1,t−1 εi (γ, c)h(t)−1,t−1 ρ−

−2ρ′εi (γ, c)′h(t)−1,t−1 εih(t),t

+ hρ

(ρ′ρ− 2ρ′µρ

)

∝ exp

−12

ρ′

(I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εi (γ, c)h(t)−1,t−1

)ρ−

−2ρ′

(I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εih(t),t

)+ hρ

(ρ′ρ− 2ρ′µρ

)

∝ exp

−12

ρ′

(I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εi (γ, c)h(t)−1,t−1 + hρI

)ρ−

−2ρ′

(I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εih(t),t + hρµρ

)

∝ exp

−1

2

[(ρ− µρ

)′H ρ

(ρ− µρ

)]where H ρ =

I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εi (γ, c)h(t)−1,t−1 + hρI, and

µρ = H−1ρ

[I∑i=1

Ti∑t=ti+1

κigsitkit(γ,c,φ)

εi (γ, c)′h(t)−1,t−1 εih(t),t + hρµρ

]

35

∴ ρ/ . . . ∼ N(µρ, H

−1ρ

)•(

(γ, c, φ) /εih(t),t

I,Tii=1,t=ti

, (ρ1, ρ2) , λ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)′Let A = i : individual i has a first-age period observation , and q = #A.

p

((γ, c, φ) /

εih(t),t

I,Tii=1,t=ti

, (ρ1, ρ2) , λ, κiIi=1 ,sitI,Tii=1,t=ti

, gjmj=1

)∝

∝

I∏i=1

Ti∏t=ti

p

(εih(t),t/

εih(t),t

t−1ti

, (ρ1, ρ2) , γ, c, λ, φ, κi, µρ1 , hρ1 ,sittit=ti

, gjmj=1

)×

×p (γ) p (c) p (φ)

∝

I∏i=1

Ti∏t=ti

p(εih(t),t/ε

ih(t)−1,t−1, · · ·

)p (γ) p (c) p (φ)

∝

I∏i=1

Ti∏t=ti

( κigsitkit(γ,c,φ)

) 12

exp

−1

2

[λI(t=tiî∈A)κigsit

kit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2]

×

×p (γ) p (c) p (φ)

∝

I∏i=1

Ti∏t=ti

( κigsitkit(γ,c,φ)

) 12

exp

−1

2

[λI(t=tiî∈A)κigsit

kit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2]

×

×(1 + γ2

)−1I (γ > 0) 1

φ

∝

I∏i=1

Ti∏t=ti

(kit (γ, c, φ)

)− 12

×

×exp

−1

2

[I∑i=1

Ti∑t=ti

λI(t=tiî∈A)κigsitkit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2](

1 + γ2)−1

I (γ > 0) 1φ

•(κiIi=1 /

εih(t),t

I,Tii=1,t=ti

, (ρ1, ρ2) , (γ, c) , λ, φ,sitI,Tii=1,t=ti

, gjmj=1

)′p

(κi/εih(t),t

Tit=ti

, (ρ1, ρ2) , (γ, c) , λ, φ,sitTit=ti

, gjmj=1

)∝

∝

Ti∏t=ti

p

(εih(t),t/

εih(t),t

t−1ti

, ((ρ1, ρ2) , γ, c, λ, φ, κi, µρ1 , hρ1 , sit, gj

mj=1

)p (κi)

∝

Ti∏t=ti

p(εih(t),t/ε

ih(t)−1,t−1, · · ·

)p (κi)

∝ (κi)Ti2 exp

−1

2

Ti∑t=ti

λI(t=tiî∈A)κigsitkit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2

×

× (κi)νκ2−1 exp

−1

2s2κκi

36

∝ (κi)Ti+νκ

2−1 exp

−κi 12

[Ti∑t=ti

λI(t=tiˆi∈A)gsit

kit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2

+ s2κ

]

∴ κi/ . . . ∼ Γ

(Ti+νκ

2 , 12

[Ti∑t=ti

λI(t=tiˆi∈A)gsit

kit(γ,c,φ)

(εih(t),t − ε

i (γ, c)h(t)−1,t−1 ρ)2

+ s2κ

])

•(λ/εi1,ti

i∈A , κii∈A ,

sitii∈A , gj

mj=1 , γ, c, φ

)′p(λ/εi1,ti

i∈A , κii∈A ,

sitii∈A , gj

mj=1 , γ, c, φ

)∝

∝∏i∈A

p(εi1,ti/λ, κi, s

iti , gj

mj=1 , γ, c, φ

)p (λ)

∝∏i∈A

(λκigsiti

ki−1t (γ, c, φ)) 1

2exp

−1

2λκigsitiki−1t (γ, c, φ)

(εi1,ti

)2×× (λ)

νλ2−1 exp

−1

2s2λλ

∝ (λ)νλ+q

2−1 exp

−1

2λ

[∑i∈A

κigsitiki−1t (γ, c, φ)

(εi1,ti

)2+ s2λ

]∴ λ/ . . . ∼ Γ

(νλ+q2 , 12

[∑i∈A

κigsitiki−1t (γ, c, φ)

(εi1,ti

)2+ s2λ

])• (µ/β,H)′

p (µ/β,H) ∝ p (β/µ,H) p (µ)

∝ exp−1

2 (β − µ)′H (β − µ)

exp

−1

2

(µ− µ

µ

)′Hµ

(µ− µ

µ

)∝ exp

−1

2

[µ′Hµ− 2µ′Hβ + µ′Hµµ− 2µ′Hµµµ

]∝ exp

−1

2

[µ′(H +Hµ

)µ− 2µ′

(Hβ +Hµµµ

)]∝ exp

−1

2

[(µ− µ)′H (µ− µ)

]where H =

(H +Hµ

)and µ = H

−1(Hβ +Hµµµ

)∴ µ/ (β,H) ∼ N

(µ,H

−1)

• (H/β, µ)′

p (H/β, µ) ∝ p (β/µ,H) p (H)

∝ |H|12 exp

−1

2 (β − µ)′H (β − µ)|H|

νH−(k+1)−1

2 exp−1

2 tr(S−1H H

)∝ |H|

1+νH−(k+1)−1

2 exp−1

2 tr[(

(β − µ) (β − µ)′ + S−1H)H]

∴ H/ (β, µ) ∼Wishart(

1 + νH ,[(β − µ) (β − µ)′ + S−1H

]−1)

37

•(p/sitI,Tii=1,t=ti

)′p(p/sitI,Tii=1,t=ti

)∝

I∏i=1

Ti∏t=ti

p(sit/p

)p (p)

∝

I∏i=1

Ti∏t=ti

p(sit/p

)p (p)

∝

I∏i=1

Ti∏t=ti

m∏j=1

pI(sit=j)j

m∏j=1

pαj−1j

∝m∏j=1

p

I∑i=1

Ti∑t=ti

I(sit = j

)+ αj − 1

j

∴ p/(sitI,Tii=1,t=ti

)∼ Dir

[(I∑i=1

Ti∑t=ti

I(sit = j

)+ αj

)mj=1

]

•(

sitI,Tii=1,t=ti

/εih(t),t

I,Tii=1,t=ti

, ρ1, ρ2, γ, c, λ, φ, κiIi=1 , gjmj=1 ,p

)′p

(sit = j/

εih(t),t

Tit=ti

, ρ1, ρ2, γ, c, λ, φ, κi, gjmj=1 ,p

)∝

∝ p(ηit/s

it = j, · · ·

)p(sit = j/p

)∝λ

12I(t=tiî∈A)

(κigj

kit (γ, c, φ)

) 12

exp

−1

2gjκi

(λ(ηiti)2)

I (t = tiî ∈ A) +

+ (1− I (t = tiî ∈ A))(ηit)

2

kit(γ,c,φ).

︸︷︷︸p(gj)

wi,tj

∴ p

(sit = j/

εih(t),t

Tit=ti

, ρ1, ρ2, γ, c, λ, φ, κiIi=1 , gjmj=1 ,p

)=

wi,tjm∑j=1

wi,tj

•(gjmj=1 /

εih(t),t

I,Tii=1,t=ti

, ρ1, ρ2, λ, γ, c, φ, κiIi=1 ,sitI,Tii=1,t=ti

)′Let Cj =

(i, t) : sit = j

and nj = #Cj .

p

(gj/εih(t),t

(i,t)∈Cj

, ρ1, ρ2, λ, γ, c, φ, κi(i,t)∈Cj ,sit(i,t)∈Cj

)∝

∝

∏(i,t)∈Cj

p(ηit/s

it = j, · · ·

)p (gj)

38

∝ gnj2j exp

−12gj

∑(i,t)∈Cj∩Ac

∪(i,t)∈Cj∩A,with t≥ti+1

κi(ηit)

2

kit(γ,c,φ).+ λκi

kiti(γ,c,φ)

(ηiti)2I (i ∈ A ∩ Cj)

× (gj)νg2−1 exp

−1

2gjs2

∝ exp

−12gj

s2 +∑

(i,t)∈Cj∩Ac∪(i,t)∈Cj∩A,with t≥ti+1

κi(ηit)

2

kit(γ,c,φ).+ λκi

kiti(γ,c,φ)

(ηiti)2I (i ∈ A ∩ Cj)

×

× (gj)nj+νg

2−1

∴ gj/

(εih(t),t

(i,t)∈Cj

, . . .

)∼ Γ

nj+νg

2 , s2

2 +

∑(i,t)∈Cj∩Ac

∪(i,t)∈Cj∩A,with t≥ti+1

κi(ηit)

2

kit(γ,c,φ).

2 +

+ λκikiti

(γ,c,φ)

(ηiti

)2I(i∈A∩Cj)2

39

Appendix B: Hyperparameters.

The values of the hyperparameters are set in such way that the prior distribu-

tions are flexible enough to cover the range of all possible and reasonable values

of the parameters under investigation. My choices are in line with previous esti-

mations made in the literature. A good reference in this matter is Heathcote et al.

(2010).

The nuisance term ωih(t),t includes the transitory component in the agent’s pro-

ductivity together with the measurement error present in the survey. As was de-

scribed in the main body of the paper, the variance ωih(t),t is distributed as a chi-

squared. My choice of hyperparameters is νω = 200 and s2ω = 15, implying that

P(0.064 ≤ σ2t ≤ 0.089

)= 0.9.Moreover, there is a 99% probability that σ2t is between

0.02 and 0.11.

For g−1j , I choose s2 = 0.2 and νg = 2 which means that the variance of ηit in the

autoregressive component of the income process is between 0.033 and 1.95 with a

90% probability. Formally, P(

0.033 ≤ g−1j ≤ 1.95)

= 0.9 ∀j.

The vector p is distributed as a Dirichlet distribution withα = 10×1M , making it

uniform over the M − 1 simplex, with p′js similar to each other and away from zero.

With respect to the persistence parameters, the mean of the random vector ρ is

µρ = (0.5, 0)′ and the precision is hρI = 0.5I. The parameter λ that multiplies the

variance of the first age-period shocks is distributed as s2λλ ∼ χ2 (νλ) with s2λ = 2

and νλ = 2, which gives a P(0.33 ≤ λ−1 ≤ 19.5

)= 0.9. As can be seen, this is not an

informative prior. However, it seems reasonable to allow for a λ that could reduce

the variance in the first age-period by a third or increase it by a factor of twenty.

I choose µ to have mean 0 and considerable dispersion, making the prior un-

informative. The values chosen are µµ

= 0 and Hµ = 0.01I. In the same line,

H ∼Wishart (νH , SH) with νH = 10 and SH = 1, 000I.

Finally, the agent-specific component κi of the variance of ηit is distributed as a

chi-squared random variable, where the degrees of freedom νκ and the multiplying

constant s2κ are chosen in such way that the individual variance can be multiplied

by values that could potentially augment it by a factor of 10 or diminish it by 10%

40

with a high probability, i.e., s2κκi ∼ χ2 (νκ) with s2κ = 1 and νκ = 2, which implies

P(0.17 ≤ κ−1i ≤ 9.75

)= 0.9.

41

Tables and Figures.

Table 1: Wages per Hour Worked

Minimum Wage: $2.64

25th Percentile: $15.29

Median: $22.47

Mean: $26.51

75th Percentile: $31.66

Maximum Wage: $597.33

Std. deviation: $22.15

Interquantile Range: $16.37

Number of individuals: 5,398

Number of observations: 61,163

Note: Summary Statistics for thePSID Sample 1968-2011. All valuesare expressed in 2007 dollars.

42

Table 2: Summary Statistics by years (2007 dollars)

Year Avge. Avge. Avge. Median Std. Dev. IQR. N

Age Education Wage Wage Wage Wage

1968 42.48 2.17 23.43 21.42 13.42 13.50 1335

1969 42.76 2.17 24.17 21.68 14.83 13.99 1423

1970 42.56 2.20 25.24 23.04 14.63 14.43 1406

1971 42.32 2.24 25.80 23.09 15.82 14.68 1399

1972 41.93 2.27 26.08 23.21 17.65 14.88 1415

1973 41.35 2.31 26.97 24.18 17.24 15.60 1445

1974 41.03 2.32 27.09 24.42 16.35 15.84 1472

1975 40.74 2.45 26.84 24.28 16.11 15.52 1484

1976 40.78 2.47 25.97 23.61 14.61 15.71 1479

1977 40.64 2.50 26.75 24.08 15.99 15.39 1523

1978 40.23 2.52 26.88 24.47 14.89 15.50 1551

1979 40.01 2.53 27.33 24.53 18.06 15.70 1586

1980 40.00 2.54 26.45 24.32 14.66 16.01 1612

1981 40.20 2.56 25.28 22.71 14.29 15.61 1625

1982 40.14 2.56 24.77 22.77 14.45 15.77 1643

1983 39.98 2.60 25.11 22.32 19.43 15.71 1644

1984 39.94 2.61 25.39 22.22 20.04 16.31 1653

1985 39.77 2.67 25.89 22.41 21.66 16.76 1684

1986 39.94 2.66 25.95 22.21 21.90 17.19 1702

1987 39.94 2.67 26.13 22.12 24.68 17.52 1723

1988 40.01 2.66 26.01 21.67 22.96 17.61 1730

1989 40.16 2.68 26.00 22.21 19.46 17.36 1721

1990 40.41 2.70 25.66 21.80 21.23 16.85 1709

1991 40.53 2.76 25.30 21.54 20.96 16.63 1698

1992 40.67 2.76 25.15 20.94 20.47 16.81 1698

1993 40.66 2.80 27.20 22.64 23.10 17.03 1628

1994 40.45 2.77 27.52 21.87 28.90 16.78 1704

1995 40.71 2.76 27.16 21.72 26.25 17.07 1726

1996 40.89 2.78 26.47 21.15 23.91 16.56 1723

1997 41.07 2.79 26.46 21.16 22.14 16.33 1731

1999 41.48 2.79 27.13 21.30 24.77 16.52 1784

2001 41.60 2.77 28.88 22.53 31.60 16.84 1858

2003 41.60 2.77 27.82 21.64 28.24 17.03 1924

2005 41.73 2.80 27.30 21.22 27.86 16.53 1943

2007 41.90 2.80 27.95 21.39 29.59 16.44 2007

2009 41.80 2.97 29.11 22.09 33.91 17.19 2047

2011 43.00 2.99 29.58 22.92 27.80 18.80 1728

Note: Average age, education and wage, median wage, wage’s standard devi-ation and interquartile range (IQR), and number of individuals for each yearin the sample. All values are expressed in 2007 dollars.

43

Table 3: Restricted Income Profile Process (RIP)

Parameters GMM Bayesian

ρ 0.976 (0.0120) 0.967 (0.0029)

σ2ω 0.095 (0.0153) 0.094 (0.0157)

σ2η 0.029 (0.0103) 0.025 (0.0009)

Note: Estimates for the autocorrelation coeffi-cient (ρ) and the variances of the transitory andpersistent shocks (σ2

ω and σ2η, respectively) un-

der different estimation strategies. Standard Er-rors are written in parenthesis. In the case ofGMM, the variances where computed by meansof a Block Bootstrap with 100 repetitions.

Table 4: Persistence Parameters

Parameters Age 25 Age 35 Age 45 Age 55 Age 64

1-Regime ρ 0.967 0.967 0.967 0.967 0.967

2-Regimes ρ 0.959 0.968 0.970 0.970 0.970

Note: Mode Estimates for a sample of ages in a model whichallows for different ρ’s in the life-cycle (2-Regimes) and a modelwhich doesn’t (1-Regime).

Table 5: Components of the Persistent Shocks (1-Regime Model)

Parameters Mode Mean Std. Dev. 25% 50% 75%

λ−1 4.213 4.219 0.2544 4.046 4.212 4.384

κ−1i 0.520 2.269 4.9093 0.481 0.693 1.728

g−1 0.025 0.027 0.0009 0.026 0.027 0.029

Note: Summary statistics for the components of the persistentshocks’ variance, namely the loading factor for the Age-1 variance(λ−1), the individual loading factor (κ−1

i ) and the common variancefor all individuals (g−1).

44

Table 6: Weight Function Parameters.


c 29.30 30.72 1.5575 29.39 30.68 33.48

γ 0.52 0.76 0.4873 0.44 0.59 0.89

Note: Summary statistics for the threshold variable c and thesmoothing parameter γ.

Table 7: Components of the Persistent Shocks (2-Regimes Model)


λ−1 3.519 3.531 0.2709 3.343 3.524 3.712

κ−1i 0.485 2.367 5.2797 0.479 0.692 1.750

g−1 0.032 0.033 0.0023 0.031 0.033 0.035

φ 0.752 0.750 0.0537 0.710 0.751 0.794

Note: Summary statistics for the components of the persistentshocks’ variance, namely the loading factor for the Age-1 variance(λ−1), the individual loading factor (κ−1

i ) and the common variancefor all individuals (g−1).

45

25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63

020

40

60

80

Figure 1: Wage’s Boxplots by Age.

46

0 10000 20000 30000 40000

28

30

32

34

36

Iterations

Figure 2: Trace plot for the threshold variable c.

47

0 10000 20000 30000 40000

0.0

0.5

1.0

1.5

2.0

2.5

3.0

Iterations

0 10000 20000 30000 40000

0.6

00.6

50.7

00.7

50.8

00.8

50.9

0

Iterations

Figure 3: Trace-plots for γ (left) and φ (right).

48

0.95 0.96 0.97 0.98 0.99 1.00

020

40

60

80

100

120

140

Figure 4: Posterior Density Estimation of the Correlation Coefficient ρ in the 1-Regime Model.

49

0 1 2 3 4 5 6

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Figure 5: Posterior Density Estimation of the Age-1-Variance Loading Factor (λ−1)in the 1-Regime Model.

50

0.010 0.015 0.020 0.025 0.030 0.035 0.040

050

100

150

200

250

300

Figure 6: Posterior Density Estimation of the Variance of the Persistent Shocks (g−1)in the 1-Regime Model.

51

0.5

1.0

1.5

2.0

2.5

3.0

3.5

0 20 40 60

0.0

00.0

50.1

00.1

50.2

00.2

50.3

0

Figure 7: Boxplot and Posterior Density Estimation of the individual-specific load-ing factor κ−1i (1-Regime Model).

52

25 30 35 40 45 50 55

0.0

00.0

50.1

00.1

5

Figure 8: Posterior Density Estimation of the Age Threshold.

53

0 1 2 3 4

0.0

0.5

1.0

1.5

Figure 9: Posterior density estimation for the Smoothing Parameter γ.

54

0

200

400

600

800

0.92 0.94 0.96 0.98 1.00

0.94

0.95

0.96

0.97

0.98

0.99

1.00

ρ1

ρ2

Persistence Early in the Life−Cycle

0.90

0.92

0.94

0.96

0.98

1.00Per

sist

ence

Lat

er in

the

Life

−Cyc

le

0.90

0.92

0.94

0.96

0.98

1.00

Z

0

200

400

600

Figure 10: Joint posterior distribution of the Correlation Coefficients (ρ1 and ρ2) forboth Regimes (Above: Contour Plot; Below: Density Plot)

55

0.90 0.92 0.94 0.96 0.98 1.00

010

20

30

Age 25

Age 35

Age 45

Figure 11: Posterior Density Estimation of the Correlation Coefficient ρ for differentages.

56

0.7

50.8

00

.85

0.9

00.9

5

Age

Weig

ht F

unction

25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63

0.6 0.8 1.0 1.2 1.4

05

10

15

20

Age 25

Age 35

Age 45

Figure 12: Variance’s Weight Function kit (γ, c, φ). Above: Posterior Mean by Age.Below: Density Estimation by Ages 25, 35 and 45.

57

0.0 0.2 0.4 0.6 0.8 1.0

01

23

45

Figure 13: Posterior Density Estimation of the Ratio (φ) of the Variance of the SecondRegime with respect to the First Regime.

58

0.5

1.0

1.5

2.0

2.5

3.0

3.5

0 20 40 60 80

0.0

00.0

50.1

00.1

50.2

00.2

50.3

0

Figure 14: Boxplot and Posterior Density Estimation of the individual-specific load-ing factor κ−1i (2-Regimes Model).

59

0 1 2 3 4 5

0.0

0.5

1.0

1.5

Figure 15: Posterior Distribution of the Age-1-Variance Loading Factor (λ−1) in theLSTAR(1).

60

0.00 0.01 0.02 0.03 0.04 0.05

020

40

60

Figure 16: Posterior Distribution of the First-Regime-Variance (g−1) in the LSTAR(1).

Life-Cycle Patterns of Earnings Shocks

Documents

Transcript of Life-Cycle Patterns of Earnings Shocks