
Statistical models for predicting recidivism risk

Rolando De la Cruz

Department of Public Health and Department of Statistics

Pontificia Universidad Católica de Chile

and CEAMOS

First International Summit on Scientific Criminal Analysis, Santiago, April 21 to 27, 2014

R. De la Cruz rolando@{med;mat}.puc.cl Predicting recidivism risk

Presentation structure

Introduction

Survival analysis

Statistical models for survival analysis

Application

Split Population Models


Introduction

Recidivism, as observed by Maltz (1984), can be understood as

a sequence of failures: failure of the corrections system in

"correcting" the ex inmate, failure of the ex inmate in being

able to live in a society and failure of the society in completely

reintegrating the ex inmate into a law abiding environment.

Besides the important psychological, sociological and

criminological impacts related to recidivism, there are also

economic effects, for instance, the forgone labor market

earnings, the costs of keeping inmates in prison and jails, the

obsolescence of the inmate's human capital because of

incarceration.


Introduction

Measures of Recidivism

The word recidivism, in a criminological context, can be broadly

defined as the return to criminal behavior after an individual

has been convicted of a prior offence, sentenced and corrected.

The modern tendency in criminology has shown that there are

three possible definitions for recidivism: rearrest, reconviction

and reincarceration (Maltz, 1984).


Introduction

The process of recidivism is best approached by the use of

survival models.

The use of survival models to study criminal recidivism

dates back to the late 1960s and early 1970s. The

pioneering works of Partanen (1969), Carr-Hill and

Carr-Hill (1972), and Stollmack and Harris (1974) are

representative of the early literature.


Survival analysis

Survival analysis is a branch of statistics which deals with

analysis of time to events, such as death in biological organisms,

failure in mechanical systems and recidivism in crime analysis.

This topic is called reliability theory or reliability analysis in

engineering, and duration analysis or duration modeling in

economics, or event history analysis in sociology

(source: Wikipedia).


Survival analysis

The object of primary interest is the survival function,

conventionally denoted S, which is defined as

S(t) = P(T > t)

where t is some time, T is a random variable denoting the time

to event.

Properties:

S(0) = 1

S(t)→ 0 when t→∞.


Survival analysis

How do we record and represent survival data with censoring?

Ti denotes the response for the ith subject.

Let Ci denote the censoring time for the ith subject.

Let δi denote the event indicator

\[ \delta_i = \begin{cases} 1 & \text{if the event was observed } (T_i \le C_i), \\ 0 & \text{if the response was censored } (T_i > C_i). \end{cases} \]

The observed response is Yi = min(Ti, Ci).
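
The notation above can be illustrated with a minimal Python sketch. The function name `observe` and the three-subject data are hypothetical, chosen only to show how Yi and δi are derived from Ti and Ci; in real data the latent Ti of a censored subject is of course not available.

```python
def observe(event_times, censor_times):
    """Return (y, delta) pairs: y = min(T, C), delta = 1 if T <= C else 0."""
    observed = []
    for t, c in zip(event_times, censor_times):
        y = min(t, c)                  # observed response Y_i
        delta = 1 if t <= c else 0     # event indicator delta_i
        observed.append((y, delta))
    return observed


# Hypothetical example: three subjects followed for at most 52 weeks.
T = [20.0, 75.0, 52.0]   # latent event times (illustration only)
C = [52.0, 52.0, 52.0]   # administrative censoring at week 52
data = observe(T, C)
print(data)  # [(20.0, 1), (52.0, 0), (52.0, 1)]
```

The second subject is right-censored: all that is recorded is that the event had not happened by week 52.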


Survival analysis

Censoring is when an observation is incomplete due to some

random cause. The cause of the censoring must be independent

of the event of interest if we are to use standard methods of

analysis.


Survival analysis

When only the random variable Yi = min(Ti, Ci) is observed

due to

loss to follow-up

drop-out

study termination

we call this right-censoring because the true unobserved event

is to the right of our censoring time; i.e., all we know is that the

event has not happened at the end of follow-up.


Survival analysis

More Definitions and Notation:

There are several equivalent ways to characterize the

probability distribution of a survival random variable. Some of

these are familiar; others are special to survival analysis. We

will focus on the following terms:

The density function f(t)

The survivor function S(t)

The hazard function λ(t)

The cumulative hazard function Λ(t)


Survival analysis

The hazard is interpretable as the expected number of

events per individual per unit of time.

The cumulative hazard function Λ(t) represents the

expected number of events that have occurred by time t.
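
The four functions listed earlier determine one another. As a short reminder, and assuming a continuous survival time T (an assumption not stated on the slide), the standard identities are:

```latex
\begin{align*}
\lambda(t) &= \frac{f(t)}{S(t)}, &
\Lambda(t) &= \int_0^t \lambda(u)\,du, \\
S(t) &= \exp\{-\Lambda(t)\}, &
f(t) &= \lambda(t)\,S(t).
\end{align*}
```

For instance, a constant hazard λ(t) = λ gives Λ(t) = λt and S(t) = exp(-λt), which is the exponential distribution.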


Survival analysis

Estimating the survival or hazard function

We can estimate the survival (or hazard) function in two ways:

by specifying a parametric model for λ(t) based on a

particular density function f(t)

by developing an empirical (non-parametric) estimate of the

survival function

If no censoring:

The empirical estimate of the survival function, Ŝ(t), is the

proportion of individuals with event times greater than t.

With censoring:

If there are censored observations, then Ŝ(t) is not a good

estimate of the true S(t), so other non-parametric methods

must be used to account for censoring (life-table methods,

Kaplan-Meier estimator)
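
A minimal sketch of the Kaplan-Meier (product-limit) estimator in plain Python may help fix ideas. The function name `kaplan_meier` and the toy data are illustrative assumptions; in practice one would use an established routine from the software listed later in these slides.

```python
def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct observed event time.

    times  : observed responses y_i = min(T_i, C_i)
    events : indicators delta_i (1 = event observed, 0 = censored)
    """
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for y in times if y >= t)
        # Events observed exactly at time t.
        failures = sum(1 for y, d in zip(times, events) if y == t and d == 1)
        surv *= 1.0 - failures / at_risk
        curve.append((t, surv))
    return curve


# Toy data: five subjects, two of them censored (at times 2 and 4).
times = [1, 2, 2, 3, 4]
events = [1, 0, 1, 1, 0]
km_curve = kaplan_meier(times, events)
```

On the toy data the estimate drops to about 0.8, 0.6 and 0.3 at times 1, 2 and 3. With no censoring the same function reduces to the empirical proportion of individuals with event times greater than t.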


Survival analysis

Some hazard shapes seen in applications:

increasing

decreasing

bathtub

constant


Regression models to survival data

Most interesting survival-analysis research examines the

relationship between survival (typically in the form of the

hazard function) and one or more explanatory variables

(or covariates or regressors).

Most common are linear-like models for the log hazard.

For example, a parametric regression model based on the

exponential distribution:

\[ \log_e \lambda_i(t) = \alpha + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \]

or, equivalently

\[ \lambda_i(t) = \exp(\alpha + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \]

where: i indexes subjects; x_{i1}, ..., x_{ik} are the values of

the covariates for the ith subject.



This is therefore a linear model for the log-hazard or a

multiplicative model for the hazard itself, because

\[ \lambda_i(t) = e^{\alpha} \times e^{\beta_1 x_{i1}} \times \cdots \times e^{\beta_k x_{ik}} \]

The model is parametric because, once the regression

parameters α, β_1, ..., β_k are specified, the hazard function

λ(t) is fully characterized by the model.

The regression constant α represents a kind of baseline

hazard, since log_e λ_i(t) = α, or equivalently, λ_i(t) = e^α,

when all of the x's are 0.

Other parametric hazard regression models are based on

other distributions commonly used in modeling survival

data, such as the Gompertz and Weibull distributions.



Fully parametric hazard regression models have largely

been superseded by the Cox model (introduced by David

Cox in 1972), which leaves the baseline hazard function

α(t) = log_e λ_0(t) unspecified:

\[ \log_e \lambda_i(t) = \alpha(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \]

or equivalently,

\[ \lambda_i(t) = \lambda_0(t) \exp(\beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \]

The Cox model is termed semi-parametric because while

the baseline hazard can take any form, the covariates enter

the model through the linear predictor

\[ \eta_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \]

Notice that there is no constant term (intercept) in the

linear predictor: the constant is absorbed in the baseline

hazard.

Regression models to survival data: Estimation

To estimate a hazard regression model we can use

maximum likelihood estimation.

Software available (among others)

R

SAS

Stata

SPSS
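
To make the maximum likelihood step concrete, here is a minimal sketch for the exponential regression model with right-censored data. It assumes the usual censored-data log-likelihood, in which each subject contributes δi log λi − λi yi with log λi = α + β1xi1 + ... + βkxik. The function names and toy data are hypothetical and do not come from any of the packages above.

```python
import math


def exp_loglik(alpha, beta, X, y, delta):
    """Censored-data log-likelihood of the exponential regression model."""
    total = 0.0
    for x_i, y_i, d_i in zip(X, y, delta):
        eta = alpha + sum(b * x for b, x in zip(beta, x_i))  # log-hazard
        # An observed event contributes log(hazard); every subject
        # contributes minus (hazard times observed time).
        total += d_i * eta - y_i * math.exp(eta)
    return total


def exp_mle_no_covariates(y, delta):
    """Closed-form MLE of a constant hazard: events / total time at risk."""
    return sum(delta) / sum(y)


# Toy data: four subjects, two observed events, no covariates.
y = [5.0, 8.0, 12.0, 20.0]
delta = [1, 0, 1, 0]
no_covariates = [[] for _ in y]

rate_hat = exp_mle_no_covariates(y, delta)            # 2 events / 45 time units
ll_hat = exp_loglik(math.log(rate_hat), [], no_covariates, y, delta)
```

Without covariates the maximizer has a closed form. With covariates the same log-likelihood would be maximized numerically, which is what the packages listed above do internally.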


Regressors to predict recidivism

According to Zamble and Quinsey (2001), there are two

groups of regressors: static and dynamic. The criterion of

classification is basically to what extent the correctional

authorities or policy makers can change the regressor. The cited

authors observe that variables such as sex, age, past offences, and

past substance abuse are inherently non-modifiable and are

examples of static variables. Other regressors, predominantly of

a psychological nature, such as emotions, thoughts, and

perceptions, are classified as dynamic, because

they could be changed through public initiatives.


Application: Rossi et al.'s (1980) data set

This dataset is from a randomized field experiment originally

reported by Rossi, Berk, and Lenihan (1980). In this study, 432

inmates released from Maryland state prisons were randomly

assigned to either an intervention or control condition. The

intervention consisted of financial assistance provided to the

released inmates for the duration of the study period. Those in

the control condition received no aid. The inmates were

followed for one year after their release. The event of interest

was re-arrest.



The variables in the data set are as follows:

week: The week of first arrest of each former prisoner; if

the prisoner was not rearrested, this variable is censored

and takes on the value 52.

arrest: The censoring indicator, coded 1 if the former

prisoner was arrested during the period of the study and 0

otherwise.

fin: A dummy variable coded 1 if the former prisoner

received financial aid after release from prison and 0

otherwise. The study was an experiment in which financial

aid was randomly provided to half the prisoners.



cont'd:

age: The former prisoner's age in years at the time of

release.

race: A dummy variable coded 1 for blacks and 0 for

others.

wexp: Work experience, a dummy variable coded 1 if the

former prisoner had full-time work experience prior to

going to prison and 0 otherwise.

mar: Marital status, a dummy variable coded 1 if the former

prisoner was married at the time of release and 0 otherwise.

paro: A dummy variable coded 1 if the former prisoner was

released on parole and 0 otherwise.

prio: The number of prior incarcerations.



cont'd:

educ: Level of education, coded as follows:

2: 6th grade or less

3: 7th to 9th grade

4: 10th to 11th grade

5: high-school graduate

6: some postsecondary or more

emp1-emp52: 52 dummy variables, each coded 1 if the

former prisoner was employed during the corresponding

week after release and 0 otherwise

The next figure shows the Kaplan-Meier estimate of time to first

arrest for the full recidivism data set.



[Figure: Kaplan-Meier curve; horizontal axis t (0 to 50), vertical axis estimated S(t) (0.70 to 1.00).]


Survival analysis: Comparing Survival Functions

There are several tests to compare survival functions

between two or among several groups.

Most tests can be computed from contingency tables for

those at risk at each event time.

A variety of test statistics can be computed using the

expected and observed counts; probably the simplest and

most common is the Mantel-Haenszel or log-rank test.

The null hypothesis is that the survival functions for the two

groups are the same.
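
The contingency-table construction described above can be sketched in a few lines of Python. This is an illustrative two-group implementation under the usual hypergeometric variance; the function name `logrank_statistic` and the toy data are assumptions, not the code used for the results in these slides.

```python
def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times  : observed responses (event or censoring times)
    events : 1 = event observed, 0 = censored
    groups : 0/1 group labels
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for y in times if y >= t)                    # at risk, both groups
        n1 = sum(1 for y, g in zip(times, groups) if y >= t and g == 1)
        d = sum(1 for y, e in zip(times, events) if y == t and e == 1)
        d1 = sum(1 for y, e, g in zip(times, events, groups)
                 if y == t and e == 1 and g == 1)
        obs_minus_exp += d1 - d * n1 / n                       # observed minus expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 0.0
    return obs_minus_exp ** 2 / variance


# Toy data: group 1 fails early, group 0 fails late, no censoring.
stat = logrank_statistic([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0])
```

Under the null hypothesis the statistic is compared with a chi-square distribution with one degree of freedom, whose 5% critical value is about 3.84.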



Consider the next figure, which shows Kaplan-Meier estimates

separately for released prisoners who received and did not

receive financial aid.

At all times in the study, the estimated probability of not

(yet) reoffending is greater in the financial aid group than

in the no-aid group.

The log-rank test statistic is 3.84, which is associated

with a p-value of almost exactly .05, providing marginally

significant evidence for a difference between the two groups.



[Figure: Kaplan-Meier curves for the "no aid" and "aid" groups; horizontal axis t (0 to 50), vertical axis estimated S(t) (0.65 to 1.00).]



Cox regression model for Rossi et al.'s (1980) data set

The Cox regression reported below uses the following

time-constant covariates:

fin: A dummy variable coded 1 if the former prisoner

received financial aid after release from prison and 0

otherwise.

age: The former prisoner's age in years at the time of

release.

race: A dummy variable coded 1 for blacks and 0 for

others.

wexp: Work experience, a dummy variable coded 1 if the

former prisoner had full-time work experience prior to

going to prison and 0 otherwise.



Cox regression model for Rossi et al.'s (1980) data set,

cont'd

mar: Marital status, a dummy variable coded 1 if the former

prisoner was married at the time of release and 0 otherwise.

paro: A dummy variable coded 1 if the former prisoner was

released on parole and 0 otherwise.

prio: The number of prior incarcerations.



The results of the Cox regression for time to first arrest are as

follows:

Covariate    b_i      e^{b_i}   SE(b_i)   z_i      p_i
fin         -0.379    0.684     0.191    -1.983    0.047
age         -0.057    0.944     0.022    -2.611    0.009
race         0.314    1.369     0.308     1.019    0.310
wexp        -0.150    0.861     0.212    -0.706    0.480
mar         -0.434    0.648     0.382    -1.136    0.260
paro        -0.085    0.919     0.196    -0.434    0.660
prio         0.091    1.096     0.029     3.195    0.001



where:

b_i is the maximum partial-likelihood estimate of β_i in the

Cox model.

e^{b_i}, the exponentiated coefficient, gives the effect of x_i in

the multiplicative form of the model.

SE(b_i) is the standard error of b_i, that is, the square root of

the corresponding diagonal entry of the estimated

asymptotic coefficient covariance matrix.

z_i = b_i/SE(b_i) is the Wald statistic for testing the null

hypothesis H0: β_i = 0; under this null hypothesis, z_i

follows an asymptotic standard-normal distribution.

p_i is the two-sided p-value for the null hypothesis

H0: β_i = 0. Thus, the coefficients for age and prio are

highly statistically significant, while that for fin is

marginally so.



The estimated coefficients b_i of the Cox model give the linear,

additive effects of the covariates on the log-hazard scale.

Although the signs of the coefficients are interpretable

(e.g., other covariates held constant, getting financial aid

decreases the hazard of rearrest, while an additional prior

incarceration increases the hazard), the magnitudes of the

coefficients are not so easily interpreted.



It is more straightforward to interpret the exponentiated

coefficients, which appear in the multiplicative form of the

model,

\[ \hat{\lambda}_i(t) = \hat{\lambda}_0(t) \times e^{b_1 x_{i1}} \times \cdots \times e^{b_k x_{ik}} \]

Thus, increasing x_i by 1, holding the other x's constant,

multiplies the estimated hazard by e^{b_i}.

For example, for the dummy regressor fin,

e^{b_1} = e^{-0.379} = 0.684, and so we estimate that providing

financial aid reduces the hazard of rearrest (other

covariates held constant) by a factor of 0.684, that is, by

100(1 - 0.684) = 31.6 percent.

Similarly, an additional prior incarceration increases the

estimated hazard of rearrest by a factor of

e^{b_7} = e^{0.091} = 1.096 or 100(1.096 - 1) = 9.6 percent.
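
The arithmetic above is easy to reproduce. The sketch below uses the coefficients as printed in the results table, rounded to three decimals, so the last digit of a hazard ratio can differ slightly from the tabulated value, which was presumably computed from unrounded estimates. The helper names are hypothetical.

```python
import math

# Coefficients as printed in the Cox regression table (rounded to 3 decimals).
coefs = {"fin": -0.379, "age": -0.057, "prio": 0.091}


def hazard_ratio(b, delta_x=1.0):
    """Multiplicative effect on the hazard of increasing x by delta_x."""
    return math.exp(b * delta_x)


def percent_change(b, delta_x=1.0):
    """Percent change in the hazard for an increase of delta_x in x."""
    return 100.0 * (hazard_ratio(b, delta_x) - 1.0)


for name, b in coefs.items():
    print(f"{name}: HR = {hazard_ratio(b):.3f}, change = {percent_change(b):+.1f}%")

# Effect of being ten years older at release, other covariates held constant.
ten_year_age_hr = hazard_ratio(coefs["age"], delta_x=10.0)
```

The `delta_x` argument shows how the same rule extends beyond one-unit changes: a difference of ten years in age multiplies the estimated hazard by exp(10 × b_age).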


Research CEAMOS

Recall: distributions in parametric modeling of time to

event data: exponential, Weibull, log-normal, half normal,

etc.

We introduce a new class of distributions with positive

support called epsilon-positive which are generated on the

basis of the distributions with positive support.

This new class has as special cases the exponential,

Weibull, log-normal, etc. distributions, and is an

alternative to analyze time to event data.

We study in detail a particular member of this new class of

distributions: the epsilon-exponential distribution,

denoted as EE(σ, ε)


Research CEAMOS

[Figure: two panels plotting f(x) and F(x) against x (0 to 10) for ε = 0, 0.3, 0.5, 0.8.]

The pdf f(t) and cdf F(t) of EE(σ = 1, ε). The exponential

distribution corresponds to ε = 0.


Research CEAMOS

[Figure: two panels plotting S(x) and λ(x) against x (0 to 10) for ε = 0, 0.3, 0.5, 0.8.]

The survival S(t) and hazard λ(t) functions of EE(σ = 1, ε).

The exponential distribution corresponds to ε = 0.


Research CEAMOS

Application with data from Gendarmerie of Chile.

Dataset of prisoners sentenced originally for robbery

Released: 2007/01/01 to 2007/12/31; follow-up period ends 2012/04/30

n = 6,577 released inmates


Research CEAMOS

Recidivism was defined as return to prison.

The recidivism rate was approximately 50%.


Research CEAMOS

[Figure: estimated survival function; horizontal axis time in days (0 to 2000), vertical axis S(t) (0.0 to 1.0).]


Research CEAMOS

Model selection:

Log-likelihood value:

Exponential model: -28786.9

Epsilon-exponential model: -28590.79


Split population models

Recall:

S(0) = 1

S(∞) = 0, that is, every inmate eventually recidivates.

Split population models account for a specific type of

heterogeneity, i.e., the possibility that some cases will never

experience the event of interest while some will. One of the

assumptions of standard duration models is that every

observation in the data will eventually experience the event of

interest, which is sometimes an unreasonable assumption in

violation of a particular theory or understanding of the process

under examination.



We can express a split population model as follows:

Let F be a variable indicating whether a subject would or

would not eventually fail. Define F as:

\[ F = \begin{cases} 1 & \text{for subjects who would eventually fail} \\ 0 & \text{for subjects who would never fail.} \end{cases} \]

We assume

P(F = 1) = p, P(F = 0) = 1− p.

The parameter p is the eventual recidivism rate.



The survival function for the split population model is

given by

S(t) = (1− p) + pSu(t)

where Su(·) is the survivor function for the subjects who

would eventually fail.



The likelihood function for the split population model is

given by

\[ L = \prod_{i=1}^{n} [p f_u(t_i)]^{\delta_i} [(1 - p) + p S_u(t_i)]^{1 - \delta_i} \]

When we have a covariate vector X, the

dependence of the recidivism rate p on X can be modeled

using a logistic function, p = 1/(1 + exp(X′β)).
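
As a rough illustration of the split population survival function and likelihood, the sketch below assumes an exponential distribution for the subjects who would eventually fail, so that Su(t) = exp(-rate × t) and fu(t) = rate × exp(-rate × t). That distributional choice, the function names, and the toy data are assumptions made here for illustration; the slides do not commit to a particular Su.

```python
import math


def split_survival(t, p, rate):
    """S(t) = (1 - p) + p * S_u(t), with S_u(t) = exp(-rate * t) (assumed)."""
    return (1.0 - p) + p * math.exp(-rate * t)


def split_loglik(p, rate, y, delta):
    """Log of L = prod [p f_u(y_i)]^delta_i [(1 - p) + p S_u(y_i)]^(1 - delta_i)."""
    total = 0.0
    for y_i, d_i in zip(y, delta):
        if d_i == 1:
            # Observed failure: log(p) + log f_u(y_i) for the exponential case.
            total += math.log(p) + math.log(rate) - rate * y_i
        else:
            # Censored: never fails, or would fail later than y_i.
            total += math.log(split_survival(y_i, p, rate))
    return total


# Toy data: observed times and event indicators.
y = [5.0, 8.0, 12.0, 20.0]
delta = [1, 0, 1, 0]
ll = split_loglik(p=0.6, rate=0.1, y=y, delta=delta)
```

Two checks follow from the formulas: the survival function starts at 1 and levels off at 1 − p rather than 0, and setting p = 1 recovers the ordinary exponential likelihood in which every subject eventually fails.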



Software:

SAS

STATA


Thank you

rolando@{med;mat}.puc.cl

Part of this short course is based on the lecture notes of John Fox.