1 Research Method Lecture 15-2 Censored regression ©


Censored regression

Sometimes the dependent variable (y-variable) is right-censored (censored from above) or left-censored (censored from below).

Example 1: In some household surveys, when the wealth of a family exceeds $500,000, it is recorded as $500,000 even though the actual wealth may be much higher than that amount. This is called top coding. Top coding is done to protect the identity of the survey participants. In this case, wealth is right-censored.


Top-coding example

[Figure: axes are Wealth and Educ of the head of the family; the threshold is marked at $500,000.]

When wealth exceeds the threshold, say $500,000, the data record it as $500,000.


Example 2: Duration data. Suppose that a survey is conducted to measure how long unemployed workers take to find a job. If the survey runs for 12 months, some participants may not have found a job by the time it ends. For those workers, the researcher only knows that the duration is greater than 12 months. Thus, the duration is right-censored.


When a variable is censored from above, you only know that the variable is at least equal to the threshold value.


The censored regression model

Here, I will explain the censored regression model for the case where the dependent variable is right-censored.

Let yi be the actual value of the dependent variable for the ith person.

However, when yi exceeds a certain threshold, ci, the data are recorded as ci. In such a case, the observation is said to be right-censored.


Let wi be the observed value of the dependent variable, which may be censored. Then the model is written as:

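$$y_i = \beta_0 + \beta_1 x_i + u_i, \qquad u_i \sim N(0, \sigma^2)$$

$$w_i = \begin{cases} y_i & \text{if } y_i < c_i \\ c_i & \text{if } y_i \ge c_i \end{cases}$$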

In the top-coding example, the threshold value is the same for everyone, at $500,000. But the threshold value can differ across individuals. Thus, the threshold values carry an i-subscript.


This can also be written as wi=min(yi,ci).
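As a rough illustration of the rule wi=min(yi,ci), the following Python sketch simulates a few observations and applies right-censoring. The parameter values, the common threshold, and the helper name `censor_from_above` are all hypothetical choices made for this example; they do not come from the lecture.

```python
import random

def censor_from_above(y, c):
    """Return (w, d): the recorded value w = min(y, c) and d = 1 if right-censored."""
    w = min(y, c)
    d = 1 if y >= c else 0  # censored exactly when y >= c, as in the model
    return w, d

random.seed(0)
beta0, beta1, sigma = 1.0, 0.5, 1.0  # hypothetical parameter values
threshold = 2.0                      # hypothetical threshold, the same c_i for everyone

data = []
for _ in range(5):
    x = random.uniform(0, 4)
    y = beta0 + beta1 * x + random.gauss(0, sigma)  # actual (latent) value
    w, d = censor_from_above(y, threshold)          # what the data set records
    data.append((x, y, w, d))

for x, y, w, d in data:
    print(f"x={x:.2f}  y={y:.2f}  w={w:.2f}  censored={d}")
```

In real data only x, w, and the censoring flag are available; y is shown here only because the data are simulated.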


It should be emphasized that, in the censored regression model, ci are known values. For example, in the top coding example, it is $500,000 for all the observations.


When the person is not right-censored, we have wi=yi, so ui=wi-(β0+β1xi). The likelihood contribution is therefore the height of the density function, which is given by:

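$$L_i = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(w_i-\beta_0-\beta_1 x_i)^2}{2\sigma^2}} = \frac{1}{\sigma}\cdot\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{w_i-\beta_0-\beta_1 x_i}{\sigma}\right)^2} = \frac{1}{\sigma}\,\phi\!\left(\frac{w_i-\beta_0-\beta_1 x_i}{\sigma}\right)$$

where φ is the standard normal density function.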


If the person is right-censored, we only know that yi≥ci. Thus, the likelihood contribution of this person is given by:

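$$L_i = P(y_i \ge c_i) = P(\beta_0+\beta_1 x_i+u_i \ge c_i) = P(u_i \ge c_i-\beta_0-\beta_1 x_i)$$

$$\phantom{L_i} = P\!\left(\frac{u_i}{\sigma} \ge \frac{c_i-\beta_0-\beta_1 x_i}{\sigma}\right) = 1-\Phi\!\left(\frac{c_i-\beta_0-\beta_1 x_i}{\sigma}\right)$$

where Φ is the standard normal cumulative distribution function.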


To summarize:

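$$L_i = \frac{1}{\sigma}\,\phi\!\left(\frac{w_i-\beta_0-\beta_1 x_i}{\sigma}\right) \quad \text{if the observation is not censored}$$

and

$$L_i = 1-\Phi\!\left(\frac{c_i-\beta_0-\beta_1 x_i}{\sigma}\right) \quad \text{if the observation is right-censored}$$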


Now, let Di be the dummy variable that takes the value 1 if the ith person is right-censored. Then the likelihood contribution for the ith person is conveniently written as:

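$$L_i = \left[\frac{1}{\sigma}\,\phi\!\left(\frac{w_i-\beta_0-\beta_1 x_i}{\sigma}\right)\right]^{1-D_i} \left[1-\Phi\!\left(\frac{c_i-\beta_0-\beta_1 x_i}{\sigma}\right)\right]^{D_i}$$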
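To make the likelihood contribution concrete, here is a minimal Python sketch that sums the log of each person's contribution, using the density for uncensored observations and the upper-tail probability for right-censored ones. The function name `log_likelihood` and the toy data are assumptions made for illustration. In practice the parameters are estimated by maximizing this sum with statistical software; this sketch only evaluates it at given parameter values.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi

def log_likelihood(beta0, beta1, sigma, x, w, c, d):
    """Sum of log L_i for right-censored data.

    x, w, c, d are equal-length lists; d[i] = 1 if person i is right-censored.
    """
    total = 0.0
    for xi, wi, ci, di in zip(x, w, c, d):
        if di == 1:
            # Censored: we only know y_i >= c_i, so use 1 - Phi(.)
            z = (ci - beta0 - beta1 * xi) / sigma
            total += math.log(1.0 - STD_NORMAL.cdf(z))
        else:
            # Uncensored: height of the density, (1/sigma) * phi(.)
            z = (wi - beta0 - beta1 * xi) / sigma
            total += math.log(STD_NORMAL.pdf(z) / sigma)
    return total

# Toy data (hypothetical): three people, the last one censored at c = 2.0
x = [0.5, 1.0, 3.0]
w = [1.2, 1.4, 2.0]
c = [2.0, 2.0, 2.0]
d = [0, 0, 1]
print(log_likelihood(1.0, 0.5, 1.0, x, w, c, d))
```

Note that 1 - Φ(z) can round to zero for very large z, in which case the logarithm fails; a production implementation would need a numerically safer tail computation.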


Note that, computationally, the Tobit model is the same as the censored regression model in which the actual dependent variable is left-censored and ci=0 for all observations.
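To see the connection, here is a sketch of the left-censored case, using the same notation as the right-censored model. Assume the recorded value is w_i = max(y_i, c_i) with c_i = 0, and let D_i = 1 if the ith person is left-censored (this redefinition of D_i is an assumption made for this sketch). A censored person contributes P(y_i ≤ 0), so:

$$L_i = \left[\frac{1}{\sigma}\,\phi\!\left(\frac{w_i-\beta_0-\beta_1 x_i}{\sigma}\right)\right]^{1-D_i} \left[\Phi\!\left(\frac{0-\beta_0-\beta_1 x_i}{\sigma}\right)\right]^{D_i}, \qquad \Phi\!\left(\frac{-\beta_0-\beta_1 x_i}{\sigma}\right) = 1-\Phi\!\left(\frac{\beta_0+\beta_1 x_i}{\sigma}\right)$$

This has the same form as the usual Tobit likelihood, which is why the two models are computed in the same way.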


The partial effect

In the censored regression model, our interest is in the effect of the x-variable on y, not on w. Since β1 is the partial effect of x on y, you can use β1 directly as the partial effect. No difficult computation is necessary. You can interpret the coefficients as you would OLS coefficients.

This is very different from the Tobit regression model. In the Tobit model, our interest is in the effect of x on w, not on y. Thus, we had a very complicated partial effect formula in the case of Tobit.


Exercise: Duration analysis of recidivism

Recidivism is the act of a person repeating an undesirable behavior.

The data set RECID.dta contains the duration (in months) until an inmate who was released from prison is imprisoned again.

1445 released inmates were followed for a certain period of time.


Among them, 893 had not been arrested again. Thus, the duration for those inmates is right-censored: we only know that the duration until they would come back to prison is at least as long as the recorded duration.

Now, we want to estimate the determinants of the duration of prisoner recidivism.

Although modern duration analysis is mostly conducted using “hazard function analysis” or “survival function analysis”, censored regression is also a valid model for a duration analysis.


Using RECID.dta, estimate the censored regression model of the duration of recidivism. The explanatory variables are workprg, priors, tserved, felon, alcohol, drugs, black, married, educ, and age. Use the log of duration as the dependent variable.


. cnreg ldurat workprg priors tserved felon alcohol drugs black married educ age, censored(cens)

Censored-normal regression                        Number of obs   =       1445
                                                  LR chi2(10)     =     166.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -1597.059                        Pseudo R2       =     0.0496

      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     workprg |  -.0625715   .1200369    -0.52   0.602    -.2980382    .1728951
      priors |  -.1372529   .0214587    -6.40   0.000    -.1793466   -.0951592
     tserved |  -.0193305   .0029779    -6.49   0.000    -.0251721    -.013489
       felon |   .4439947   .1450865     3.06   0.002     .1593903    .7285991
     alcohol |  -.6349092   .1442166    -4.40   0.000    -.9178072   -.3520113
       drugs |  -.2981602   .1327355    -2.25   0.025    -.5585367   -.0377837
       black |  -.5427179   .1174428    -4.62   0.000    -.7730958     -.31234
     married |   .3406837   .1398431     2.44   0.015      .066365    .6150024
        educ |   .0229196   .0253974     0.90   0.367    -.0269004    .0727395
         age |   .0039103   .0006062     6.45   0.000     .0027211    .0050994
       _cons |   4.099386    .347535    11.80   0.000     3.417655    4.781117
-------------+----------------------------------------------------------------
      /sigma |    1.81047   .0623022                      1.688257    1.932683

Observation summary:         0  left-censored observations
                           552     uncensored observations
                           893 right-censored observations

The censored() option takes the censoring indicator (here, cens). This indicator must be 1 if the observation is right-censored, -1 if left-censored, and 0 if uncensored.
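Because the dependent variable is the log of duration and the coefficients can be read as in OLS, each coefficient can be converted into an approximate percentage change in duration. The short Python sketch below does this for three coefficients copied from the cnreg output; the helper name `percent_change` is an assumption for this example, and the conversion 100·(exp(b) − 1) is the standard one for a log dependent variable rather than something stated in the lecture.

```python
import math

# Coefficients copied from the cnreg output (dependent variable: ldurat).
coef = {"priors": -0.1372529, "married": 0.3406837, "alcohol": -0.6349092}

def percent_change(b):
    """Percentage change in duration implied by a log-duration coefficient b."""
    return 100.0 * (math.exp(b) - 1.0)

for name, b in coef.items():
    print(f"{name}: {percent_change(b):+.1f}%")
```

For example, one additional prior conviction is associated with roughly a 13 percent shorter duration until re-imprisonment, holding the other variables fixed.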