Statistical models for predicting recidivism risk
Rolando De la Cruz
Department of Public Health and Department of Statistics
Pontificia Universidad Católica de Chile
and CEAMOS
First International Summit on Scientific Criminal Analysis, Santiago, April 21 to 27, 2014
R. De la Cruz rolando@{med;mat}.puc.cl Predicting recidivism risk
Presentation structure
Introduction
Survival analysis
Statistical models for survival analysis
Application
Split Population Models
Introduction
Recidivism, as observed by Maltz (1984), can be understood as
a sequence of failures: failure of the corrections system in
“correcting” the ex-inmate, failure of the ex-inmate to live
in society, and failure of society to fully reintegrate the
ex-inmate into a law-abiding environment.
Besides its important psychological, sociological and
criminological impacts, recidivism also has economic
effects: for instance, forgone labor market earnings, the
costs of keeping inmates in prisons and jails, and the
obsolescence of the inmate's human capital during
incarceration.
Introduction
Measures of Recidivism
The word recidivism, in a criminological context, can be broadly
defined as the return to criminal behavior after an individual
has been convicted of a prior offence, sentenced and corrected.
In modern criminology, recidivism is usually measured in one of
three ways: rearrest, reconviction or reincarceration
(Maltz, 1984).
Introduction
The process of recidivism is best approached with
survival models.
The use of survival models to study criminal recidivism
dates back to the late 1960s and early 1970s. The
pioneering works of Partanen (1969), Carr-Hill and
Carr-Hill (1972), and Stollmack and Harris (1974) are
representative of the early literature.
Survival analysis
Survival analysis is a branch of statistics that deals with the
analysis of time to events, such as death in biological organisms,
failure in mechanical systems and recidivism in crime analysis.
This topic is called reliability theory or reliability analysis in
engineering, duration analysis or duration modeling in
economics, and event history analysis in sociology
(Wikipedia).
Survival analysis
The object of primary interest is the survival function,
conventionally denoted S, which is defined as
S(t) = P(T > t)
where t is some time, T is a random variable denoting the time
to event.
Properties:
S(0) = 1
S(t)→ 0 when t→∞.
Survival analysis
How do we record and represent survival data with censoring?
Let $T_i$ denote the response (event time) for the ith subject.
Let $C_i$ denote the censoring time for the ith subject.
Let $\delta_i$ denote the event indicator
$$\delta_i = \begin{cases} 1 & \text{if the event was observed } (T_i \le C_i) \\ 0 & \text{if the response was censored } (T_i > C_i). \end{cases}$$
The observed response is $Y_i = \min(T_i, C_i)$.
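To make the notation concrete, here is a minimal Python sketch that turns event times $T_i$ and censoring times $C_i$ into the observed pairs $(Y_i, \delta_i)$. The function name `censor` and the numbers are hypothetical, chosen only for illustration.

```python
def censor(event_times, censor_times):
    """Return (y, delta) pairs with y = min(T, C) and
    delta = 1 if T <= C (event observed), else 0 (censored)."""
    observed = []
    for t, c in zip(event_times, censor_times):
        y = min(t, c)
        delta = 1 if t <= c else 0
        observed.append((y, delta))
    return observed


# Hypothetical weeks to rearrest, all subjects followed for 52 weeks.
T = [20, 60, 52, 5]
C = [52, 52, 52, 52]
print(censor(T, C))  # [(20, 1), (52, 0), (52, 1), (5, 1)]
```

A tie $T_i = C_i$ counts as an observed event, following the definition of $\delta_i$.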
Survival analysis
An observation is censored when it is incomplete because of some
random cause. The cause of censoring must be independent
of the event of interest if we are to use standard methods of
analysis.
Survival analysis
When only the random variable Yi = min(Ti, Ci) is observed
due to
loss to follow-up
drop-out
study termination
we call this right-censoring because the true, unobserved event
time lies to the right of our censoring time; i.e., all we know is
that the event has not happened by the end of follow-up.
Survival analysis
More Definitions and Notation:
There are several equivalent ways to characterize the
probability distribution of a survival random variable. Some of
these are familiar; others are special to survival analysis. We
will focus on the following terms:
The density function f(t)
The survivor function S(t)
The hazard function λ(t)
The cumulative hazard function Λ(t)
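For reference, the standard relations linking these four functions for a continuous survival time T are summarized below. The symbol ν (a constant rate) is introduced here only for the exponential example.

```latex
f(t) = -\frac{d S(t)}{dt}, \qquad
\lambda(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
  = \frac{f(t)}{S(t)},

\Lambda(t) = \int_0^t \lambda(u)\, du = -\log S(t), \qquad
S(t) = \exp\{-\Lambda(t)\}.

% Exponential example: constant hazard \lambda(t) = \nu
\Lambda(t) = \nu t, \qquad S(t) = e^{-\nu t}, \qquad f(t) = \nu e^{-\nu t}.
```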
Survival analysis
The hazard is interpretable as the expected number of
events per individual per unit of time.
The cumulative hazard function Λ(t) represents the
expected number of events that have occurred by time t.
Survival analysis
Estimating the survival or hazard function
We can estimate the survival (or hazard) function in two ways:
by specifying a parametric model for λ(t) based on a
particular density function f(t);
by developing an empirical (non-parametric) estimate of the
survival function.
If there is no censoring:
The empirical estimate of the survival function, Ŝ(t), is the
proportion of individuals with event times greater than t.
With censoring:
If there are censored observations, then Ŝ(t) is not a good
estimate of the true S(t), so other non-parametric methods
that account for censoring must be used (life-table methods,
the Kaplan-Meier estimator).
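A minimal pure-Python sketch of the Kaplan-Meier (product-limit) estimator follows. At each distinct event time it multiplies the running estimate by 1 - d/n, where d is the number of events and n the number still at risk. The function name and the data are hypothetical; packages such as R's `survival` provide production implementations.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of S(t).

    times  -- observed times y_i = min(T_i, C_i)
    events -- event indicators delta_i (1 = event, 0 = censored)
    Returns [(t, S_hat(t)), ...] at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations tied at time t; subjects censored at t
        # are still counted as at risk for events at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical data (weeks); an event indicator of 0 marks a censored time.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Without censoring, every factor is (n - d)/n, so the product telescopes to the empirical proportion of event times greater than t.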
Survival analysis
Some hazard shapes seen in applications:
increasing
decreasing
bathtub
constant
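As an illustration, the Weibull hazard λ(t) = (k/s)(t/s)^(k-1) produces three of these shapes: increasing for shape k > 1, decreasing for k < 1 and constant for k = 1 (the exponential case). A single Weibull cannot produce a bathtub shape. The sketch below builds one, as an assumption for illustration, by adding a decreasing and an increasing Weibull hazard. Function names and parameter values are hypothetical.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard lambda(t) = (k / s) * (t / s) ** (k - 1) for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_hazard(t):
    """Illustrative bathtub shape: the sum of a decreasing (k = 0.5) and an
    increasing (k = 3) Weibull hazard, i.e. the hazard of the minimum of two
    independent Weibull lifetimes."""
    return weibull_hazard(t, 0.5) + weibull_hazard(t, 3.0)


grid = [0.1, 0.5, 1.0, 2.0]
for shape, label in [(3.0, "increasing"), (0.5, "decreasing"), (1.0, "constant")]:
    print(label, [round(weibull_hazard(t, shape), 3) for t in grid])
print("bathtub", [round(bathtub_hazard(t), 3) for t in [0.1, 0.28, 1.0]])
```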
Regression models for survival data
Most interesting survival-analysis research examines the
relationship between survival (typically in the form of the
hazard function) and one or more explanatory variables
(or covariates, or regressors).
The most common are linear-like models for the log hazard.
For example, a parametric regression model based on the
exponential distribution:
$$\log_e \lambda_i(t) = \alpha + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$
or, equivalently,
$$\lambda_i(t) = \exp(\alpha + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})$$
where i indexes subjects and $x_{i1}, \ldots, x_{ik}$ are the values of
the covariates for the ith subject.
Regression models for survival data
This is therefore a linear model for the log hazard, or a
multiplicative model for the hazard itself, because
$$\lambda_i(t) = e^{\alpha} \times e^{\beta_1 x_{i1}} \times \cdots \times e^{\beta_k x_{ik}}$$
The model is parametric because, once the regression
parameters $\alpha, \beta_1, \ldots, \beta_k$ are specified, the hazard function
λ(t) is fully characterized by the model.
The regression constant α represents a kind of baseline
hazard, since $\log_e \lambda_i(t) = \alpha$, or equivalently $\lambda_i(t) = e^{\alpha}$,
when all of the x's are 0.
Other parametric hazard regression models are based on
other distributions commonly used in modeling survival
data, such as the Gompertz and Weibull distributions.
Regression models for survival data
Fully parametric hazard regression models have largely
been superseded by the Cox model (introduced by David
Cox in 1972), which leaves the baseline hazard function
$\alpha(t) = \log_e \lambda_0(t)$ unspecified:
$$\log_e \lambda_i(t) = \alpha(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$
or, equivalently,
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 x_{i1} + \cdots + \beta_k x_{ik})$$
The Cox model is termed semi-parametric because, while
the baseline hazard can take any form, the covariates enter
the model parametrically through the linear predictor
$$\eta_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$
Notice that there is no constant term (intercept) in the
linear predictor: the constant is absorbed into the baseline
hazard.
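Because the baseline hazard is left unspecified, the Cox model is estimated by maximizing the partial likelihood, log PL(β) = Σ over observed events of [β x_i - log Σ_{j in R(t_i)} exp(β x_j)], where R(t_i) is the risk set at event time t_i. The sketch below assumes a single covariate and no tied event times, and uses Newton-Raphson. The function name `cox_fit_1d` and the simulated data are hypothetical; real analyses would use a package such as R's `survival::coxph`.

```python
import math
import random


def cox_fit_1d(times, events, x, iters=50):
    """Newton-Raphson maximum partial-likelihood estimate for a Cox model
    with one covariate, assuming no tied event times.
    Returns (beta_hat, standard_error)."""
    beta, info = 0.0, 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for ti, di, xi in zip(times, events, x):
            if di == 0:
                continue  # censored subjects enter only through risk sets
            # Risk set R(t_i): everyone whose observed time is >= t_i.
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            s1 = sum(wj * xj for wj, xj in zip(w, risk))
            s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
            mean = s1 / s0
            score += xi - mean               # first derivative of log PL
            info += s2 / s0 - mean * mean    # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1.0 / math.sqrt(info)


# Simulated data (hypothetical): true beta = 0.7, constant baseline hazard 1,
# administrative censoring at t = 1.5.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(300)]
T = [rng.expovariate(math.exp(0.7 * xi)) for xi in x]
times = [min(t, 1.5) for t in T]
events = [1 if t <= 1.5 else 0 for t in T]
beta_hat, se_hat = cox_fit_1d(times, events, x)
print(f"beta_hat = {beta_hat:.3f}, SE = {se_hat:.3f}, e^beta_hat = {math.exp(beta_hat):.3f}")
```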
Regression models for survival data: Estimation
To estimate a hazard regression model we can use
maximum likelihood estimation (maximum partial likelihood
in the case of the Cox model).
Software available (among others):
R
SAS
STATA
SPSS
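For the exponential regression model with right censoring, each subject contributes δ_i log λ_i - Y_i λ_i, so with η_i = α + β x_i the log-likelihood is ℓ(α, β) = Σ_i [δ_i η_i - Y_i exp(η_i)]. The sketch below maximizes it by Newton-Raphson for one covariate, in pure Python. The function name `exp_reg_fit` and the simulated data are hypothetical.

```python
import math
import random


def exp_reg_fit(y, delta, x, iters=50):
    """Newton-Raphson MLE for the censored exponential regression
    log lambda_i = alpha + beta * x_i (one covariate)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for yi, di, xi in zip(y, delta, x):
            mu = yi * math.exp(a + b * xi)  # expected events given exposure y_i
            ga += di - mu                   # gradient with respect to alpha
            gb += xi * (di - mu)            # gradient with respect to beta
            haa += mu                       # observed information entries
            hab += xi * mu
            hbb += xi * xi * mu
        det = haa * hbb - hab * hab
        # Newton step: theta += I^{-1} g, with I the 2x2 information matrix.
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


# Simulated data (hypothetical): true alpha = -1, beta = 0.5, censoring at 3.
rng = random.Random(42)
x = [rng.randint(0, 1) for _ in range(500)]
T = [rng.expovariate(math.exp(-1.0 + 0.5 * xi)) for xi in x]
y = [min(t, 3.0) for t in T]
delta = [1 if t <= 3.0 else 0 for t in T]
alpha_hat, beta_hat = exp_reg_fit(y, delta, x)

# With one binary covariate the MLE has a closed form (events / exposure in
# each group), which gives an independent check of the Newton iterations.
d0 = sum(d for d, xi in zip(delta, x) if xi == 0)
y0 = sum(t for t, xi in zip(y, x) if xi == 0)
d1 = sum(d for d, xi in zip(delta, x) if xi == 1)
y1 = sum(t for t, xi in zip(y, x) if xi == 1)
alpha_closed = math.log(d0 / y0)
beta_closed = math.log(d1 / y1) - alpha_closed
print(alpha_hat, beta_hat, alpha_closed, beta_closed)
```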
Regressors to predict recidivism
According to Zamble and Quinsey (2001), there are two
groups of regressors: static and dynamic. The classification
criterion is basically the extent to which correctional
authorities or policy makers can change the regressor. These
authors observe that variables such as sex, age, past offences
and past substance abuse are inherently non-modifiable; they
are examples of static variables. Other regressors, predominantly
psychological in nature, such as emotions, thoughts and
perceptions, are classified as dynamic, because they could be
changed through public initiatives.
Application: Rossi et al.'s (1980) data set
This dataset is from a randomized field experiment originally
reported by Rossi, Berk, and Lenihan (1980). In this study, 432
inmates released from Maryland state prisons were randomly
assigned to either an intervention or a control condition. The
intervention consisted of financial assistance provided to the
released inmates for the duration of the study period. Those in
the control condition received no aid. The inmates were
followed for one year after their release. The event of interest
was rearrest.
Application: Rossi et al.'s (1980) data set
The variables in the data set are as follows:
week: The week of first arrest of each former prisoner; if
the prisoner was not rearrested, this variable is censored
and takes on the value 52.
arrest: The censoring indicator, coded 1 if the former
prisoner was arrested during the period of the study and 0
otherwise.
fin: A dummy variable coded 1 if the former prisoner
received financial aid after release from prison and 0
otherwise. The study was an experiment in which financial
aid was randomly provided to half the prisoners.
Application: Rossi et al.'s (1980) data set
(cont.)
age: The former prisoner's age in years at the time of
release.
race: A dummy variable coded 1 for blacks and 0 for
others.
wexp: Work experience, a dummy variable coded 1 if the
former prisoner had full-time work experience prior to
going to prison and 0 otherwise.
mar: Marital status, a dummy variable coded 1 if the former
prisoner was married at the time of release and 0 otherwise.
paro: A dummy variable coded 1 if the former prisoner was
released on parole and 0 otherwise.
prio: The number of prior incarcerations.
Application: Rossi et al.'s (1980) data set
(cont.)
educ: Level of education, coded as follows:
2: 6th grade or less
3: 7th to 9th grade
4: 10th to 11th grade
5: high-school graduate
6: some postsecondary or more
emp1 to emp52: 52 dummy variables, each coded 1 if the
former prisoner was employed during the corresponding
week after release and 0 otherwise.
The next figure shows the Kaplan-Meier estimate of time to first
arrest for the full recidivism data set.
Application: Rossi et al.'s (1980) data set
[Figure: Kaplan-Meier estimate of S(t) for time to first arrest; t in weeks (0 to 52), estimated S(t) axis from 0.70 to 1.00]
Survival analysis: Comparing Survival Functions
There are several tests to compare survival functions
between two or among several groups.
Most tests can be computed from contingency tables of
those at risk at each event time.
A variety of test statistics can be computed from the
expected and observed counts; probably the simplest and
most common is the Mantel-Haenszel or log-rank test.
The null hypothesis is that the survival functions of the
groups are the same.
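A minimal sketch of the two-sample log-rank test: at each distinct event time it compares the observed events in group 1 with their expectation d·n₁/n under the null hypothesis, sums the hypergeometric variances, and refers the squared standardized sum to a χ² distribution with 1 degree of freedom. The p-value uses P(χ²₁ > s) = erfc(√(s/2)). The function name and the toy data are hypothetical.

```python
import math


def logrank_test(times, events, group):
    """Two-sample log-rank (Mantel-Haenszel) test.
    group -- 0/1 labels. Returns (chi_square_statistic, p_value)."""
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, group) if ti >= t and g == 1)
        d = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        d1 = sum(1 for ti, di, g in zip(times, events, group)
                 if ti == t and di == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n           # observed minus expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(stat / 2.0))       # upper tail of chi-square, 1 df
    return stat, p


# Hypothetical toy data: group 0 fails at 1, 3, 5; group 1 at 2, 4, 6.
stat, p = logrank_test([1, 2, 3, 4, 5, 6], [1] * 6, [0, 1, 0, 1, 0, 1])
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

With this reference distribution, a statistic of 3.84 corresponds to p ≈ 0.05.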
Survival analysis: Comparing Survival Functions
Consider the next figure, which shows Kaplan-Meier estimates
separately for released prisoners who did and did not
receive financial aid.
At all times in the study, the estimated probability of not
(yet) reoffending is greater in the financial aid group than
in the no-aid group.
The log-rank test statistic is 3.84, which is associated
with a p-value of almost exactly .05, providing marginally
significant evidence for a difference between the two groups.
Survival analysis: Comparing Survival Functions
[Figure: Kaplan-Meier estimates of S(t) by group, “no aid” and “aid”; t in weeks (0 to 52), estimated S(t) axis from 0.65 to 1.00]
Application: Rossi et al.'s (1980) data set
Cox regression model for Rossi et al.'s (1980) data set
The Cox regression reported below uses the following
time-constant covariates:
fin: A dummy variable coded 1 if the former prisoner
received financial aid after release from prison and 0
otherwise.
age: The former prisoner's age in years at the time of
release.
race: A dummy variable coded 1 for blacks and 0 for
others.
wexp: Work experience, a dummy variable coded 1 if the
former prisoner had full-time work experience prior to
going to prison and 0 otherwise.
Application: Rossi et al.'s (1980) data set
Cox regression model for Rossi et al.'s (1980) data set
(cont.)
mar: Marital status, a dummy variable coded 1 if the former
prisoner was married at the time of release and 0 otherwise.
paro: A dummy variable coded 1 if the former prisoner was
released on parole and 0 otherwise.
prio: The number of prior incarcerations.
Application: Rossi et al.'s (1980) data set
The results of the Cox regression for time to first arrest are as
follows:
Covariate    b_i     exp(b_i)   SE(b_i)    z_i      p_i
fin        -0.379    0.684      0.191    -1.983    0.047
age        -0.057    0.944      0.022    -2.611    0.009
race        0.314    1.369      0.308     1.019    0.310
wexp       -0.150    0.861      0.212    -0.706    0.480
mar        -0.434    0.648      0.382    -1.136    0.260
paro       -0.085    0.919      0.196    -0.434    0.660
prio        0.091    1.096      0.029     3.195    0.001
Application: Rossi et al.'s (1980) data set
where:
$b_i$ is the maximum partial-likelihood estimate of $\beta_i$ in the
Cox model.
$e^{b_i}$, the exponentiated coefficient, gives the effect of $x_i$ in
the multiplicative form of the model.
$SE(b_i)$ is the standard error of $b_i$, that is, the square root of
the corresponding diagonal entry of the estimated
asymptotic coefficient covariance matrix.
$z_i = b_i/SE(b_i)$ is the Wald statistic for testing the null
hypothesis $H_0: \beta_i = 0$; under this null hypothesis, $z_i$
follows an asymptotic standard-normal distribution.
$p_i$ is the two-sided p-value for the null hypothesis
$H_0: \beta_i = 0$.
Thus, the coefficients for age and prio are
highly statistically significant, while that for fin is
marginally so.
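The Wald column can be reproduced from the reported estimate and standard error. A minimal sketch (the function name is hypothetical) computes z = b/SE and the two-sided normal p-value 2(1 - Φ(|z|)) = erfc(|z|/√2). Because the inputs in the table are rounded to three decimals, recomputed values can differ slightly from the table in the last digit, notably for prio.

```python
import math


def wald_test(b, se):
    """Wald z statistic and two-sided normal p-value for H0: beta = 0."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Rounded estimates for fin and prio, taken from the table above.
for name, b, se in [("fin", -0.379, 0.191), ("prio", 0.091, 0.029)]:
    z, p = wald_test(b, se)
    print(f"{name}: z = {z:.3f}, p = {p:.3f}")
```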
Application: Rossi et al.'s (1980) data set
The estimated coefficients $b_i$ of the Cox model give the linear,
additive effects of the covariates on the log-hazard scale.
Although the signs of the coefficients are interpretable
(e.g., other covariates held constant, getting financial aid
decreases the hazard of rearrest, while an additional prior
incarceration increases the hazard), the magnitudes of the
coefficients are not so easily interpreted.
Application: Rossi et al.'s (1980) data set
It is more straightforward to interpret the exponentiated
coefficients, which appear in the multiplicative form of the
model,
$$\hat\lambda_i(t) = \hat\lambda_0(t) \times e^{b_1 x_{i1}} \times \cdots \times e^{b_k x_{ik}}$$
Thus, increasing $x_j$ by 1, holding the other x's constant,
multiplies the estimated hazard by $e^{b_j}$.
For example, for the dummy regressor fin,
$e^{b_1} = e^{-0.379} \approx 0.684$, and so we estimate that providing
financial aid reduces the hazard of rearrest (other
covariates held constant) by a factor of 0.684, that is, by
100(1 - 0.684) = 31.6 percent.
Similarly, an additional prior incarceration increases the
estimated hazard of rearrest by a factor of
$e^{b_7} = e^{0.091} \approx 1.096$, or 100(1.096 - 1) = 9.6 percent.
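The same arithmetic as a small helper (the name is hypothetical): a change of Δx in a covariate multiplies the hazard by exp(b·Δx), i.e. changes it by 100(exp(b·Δx) - 1) percent. Starting from the rounded coefficients, the results agree with the figures quoted above to within rounding.

```python
import math


def hazard_ratio_effect(b, delta_x=1.0):
    """Multiplicative effect exp(b * delta_x) on the hazard and the
    corresponding percentage change 100 * (exp(b * delta_x) - 1)."""
    factor = math.exp(b * delta_x)
    return factor, 100.0 * (factor - 1.0)


print(hazard_ratio_effect(-0.379))     # fin: receiving financial aid
print(hazard_ratio_effect(0.091))      # prio: one more prior incarceration
print(hazard_ratio_effect(0.091, 2))   # two more: the factor is squared
```

Because the model is multiplicative, effects compound: two additional prior incarcerations multiply the hazard by $(e^{b_7})^2$, not by $1 + 2(e^{b_7} - 1)$.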
Research CEAMOS
Recall the distributions used in parametric modeling of
time-to-event data: exponential, Weibull, log-normal,
half-normal, etc.
We introduce a new class of distributions with positive
support, called epsilon-positive, which are generated from
distributions with positive support.
This new class includes the exponential, Weibull,
log-normal, etc. distributions as special cases, and provides
an alternative way to analyze time-to-event data.
We study in detail a particular member of this new class of
distributions: the epsilon-exponential distribution,
denoted EE(σ, ε).
Research CEAMOS
[Figure, two panels: f(x) and F(x) versus x (0 to 10) for ε = 0, 0.3, 0.5, 0.8]
The pdf f(t) and cdf F(t) of EE(σ = 1, ε). The case ε = 0
corresponds to the exponential distribution.
Research CEAMOS
[Figure, two panels: S(x) and λ(x) versus x (0 to 10) for ε = 0, 0.3, 0.5, 0.8]
The survival S(t) and hazard λ(t) functions of EE(σ = 1, ε).
The case ε = 0 corresponds to the exponential distribution.
Research CEAMOS
Application with data from the Gendarmerie of Chile.
Dataset of prisoners originally sentenced for robbery,
released between 2007/01/01 and 2007/12/31 and followed
up until 2012/04/30.
n = 6,577 released inmates
Research CEAMOS
Recidivism was defined as return to prison.
The recidivism rate was approximately 50%.
Research CEAMOS
[Figure: estimated S(t) versus time (in days, 0 to 2000), S(t) axis from 0 to 1]
Research CEAMOS
Model selection:
Log-likelihood values:
Exponential model: -28786.9
Epsilon-exponential model: -28590.79
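One way to compare the two fits is a likelihood-ratio statistic and an AIC difference. The sketch below assumes that EE(σ, ε) has exactly one more free parameter (ε) than the exponential model, which it contains as the case ε = 0. If ε = 0 lies on the boundary of the parameter space, the usual χ²₁ reference distribution is only approximate, so no p-value is computed here.

```python
# Log-likelihood values reported on the slide.
loglik_exp = -28786.9   # exponential model
loglik_ee = -28590.79   # epsilon-exponential model

# Assumption: EE(sigma, epsilon) has exactly one more free parameter than the
# exponential model (epsilon), and reduces to it when epsilon = 0.
extra_params = 1

# Likelihood-ratio statistic for the nested comparison.
lr_stat = 2.0 * (loglik_ee - loglik_exp)

# AIC = 2k - 2 log L, so AIC(EE) - AIC(exponential) = 2 * extra_params - lr_stat;
# a negative value favours the epsilon-exponential model.
delta_aic = 2.0 * extra_params - lr_stat

print(f"LR statistic = {lr_stat:.2f}, AIC difference = {delta_aic:.2f}")
# LR statistic = 392.22, AIC difference = -390.22
```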
Split population models
Recall:
S(0) = 1
S(∞) = 0, i.e., every inmate would eventually recidivate.
Split population models account for a specific type of
heterogeneity, namely the possibility that some cases will never
experience the event of interest while others will. One of the
assumptions of standard duration models is that every
observation in the data will eventually experience the event of
interest, which is sometimes an unreasonable assumption that
violates a particular theory or understanding of the process
under examination.
Split population models
We can express a split population model as follows.
Let F be a variable indicating whether a subject would or
would not eventually fail. Define F as:
$$F = \begin{cases} 1 & \text{for subjects who would eventually fail} \\ 0 & \text{for subjects who would never fail.} \end{cases}$$
We assume
$$P(F = 1) = p, \qquad P(F = 0) = 1 - p.$$
The parameter p is the eventual recidivism rate.
Split population models
The survival function for the split population model is
given by
$$S(t) = (1 - p) + p\,S_u(t)$$
where $S_u(\cdot)$ is the survivor function for the subjects who
would eventually fail.
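A small numerical illustration of this survival function, assuming (for illustration only) an exponential $S_u(t) = e^{-\text{rate}\cdot t}$ for the would-be failers. Unlike a standard survival function, S(t) levels off at 1 - p instead of falling to 0. The function name and the values p = 0.5 and rate = 0.002 per day are hypothetical.

```python
import math


def split_pop_survival(t, p, rate):
    """S(t) = (1 - p) + p * S_u(t), with an exponential S_u(t) = exp(-rate * t)
    assumed for the subjects who would eventually fail."""
    return (1.0 - p) + p * math.exp(-rate * t)


# Hypothetical values: eventual recidivism rate p = 0.5, failure rate
# 0.002 per day among the would-be recidivists.
for days in [0, 365, 730, 1825, 10000]:
    print(days, round(split_pop_survival(days, 0.5, 0.002), 3))
```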
Split population models
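The likelihood function for the split population model is
given by
$$L = \prod_{i=1}^{n} \left[p\,f_u(t_i)\right]^{\delta_i} \left[(1 - p) + p\,S_u(t_i)\right]^{1-\delta_i}$$
where $f_u(\cdot)$ is the density for the subjects who would eventually fail.
When a covariate vector X is available, the dependence of the
recidivism rate p on X can be modeled using a logistic function,
$p = 1/(1 + \exp(X'\beta))$.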
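A minimal sketch of this likelihood, assuming an exponential distribution for the would-be failers (so $f_u(t) = \text{rate}\cdot S_u(t)$) and the logistic form with the sign convention written above. For a dependency-free illustration, an intercept-only model is fitted to simulated data by a coarse grid search rather than a proper optimizer. All names, the simulated parameters and the grid are hypothetical.

```python
import math
import random


def logistic_p(x_row, beta):
    """Eventual-failure probability p = 1 / (1 + exp(x'beta))."""
    eta = sum(xj * bj for xj, bj in zip(x_row, beta))
    return 1.0 / (1.0 + math.exp(eta))


def split_pop_loglik(y, delta, p, rate):
    """Log-likelihood of a split population model with exponential S_u.
    y, delta -- observed times and event indicators
    p        -- list of eventual-failure probabilities p_i
    rate     -- exponential rate of the would-be failers (assumption)"""
    ll = 0.0
    for yi, di, pi in zip(y, delta, p):
        su = math.exp(-rate * yi)
        if di == 1:
            ll += math.log(pi * rate * su)        # log[p f_u(t_i)]
        else:
            ll += math.log((1.0 - pi) + pi * su)  # log[(1 - p) + p S_u(t_i)]
    return ll


# Simulated data: 40% eventual failers with rate 0.5, censoring at C = 10.
rng = random.Random(7)
C = 10.0
y, delta = [], []
for _ in range(1000):
    t = rng.expovariate(0.5) if rng.random() < 0.4 else math.inf
    y.append(min(t, C))
    delta.append(1 if t <= C else 0)

# Intercept-only model (x_i = 1): grid search over (b0, rate).
best = max(
    ((b0, r) for b0 in [k / 20 for k in range(-40, 41)]
     for r in [k / 10 for k in range(1, 16)]),
    key=lambda th: split_pop_loglik(
        y, delta, [logistic_p((1.0,), (th[0],))] * len(y), th[1]),
)
p_hat = logistic_p((1.0,), (best[0],))
print(f"p_hat = {p_hat:.3f}, rate_hat = {best[1]:.1f}")
```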
Split population models
Software:
SAS
STATA
Thank you
rolando@{med;mat}.puc.cl
Part of this short course is based on the lecture notes of John Fox.