Simulation Study for Extended AUC In Disease Risk Prediction in survival analysis
Gang Cui, Sr. Biostatistician, CSCC, Department of Biostatistics, UNC at Chapel Hill
Background
Risk Prediction Application in Public Health
Framingham Heart Study: Risk Assessment Tool for Estimating 10-year Risk of Developing CHD
ARIC (Atherosclerosis Risk in Communities Study): CHD, Stroke, and Diabetes Risk Calculator at CSCC
Useful in intervention trials, estimating the population burden of disease, and designing prevention strategies.
Need to be evaluated to incorporate new data (clinical, environmental, and genetic).
Introduction
Cox Proportional Hazard Model to account for both censored and uncensored data
Assumptions: The survival time of each member of a population is assumed to follow its own hazard function
$$\lambda_i(t) = \lambda(t \mid X_i) = \lambda_0(t)\,\exp(X_i'\beta)$$
where $\lambda_0(t)$ is a baseline hazard function, $X_i$ is the vector of explanatory variables for the ith individual, and $\beta$ is the vector of unknown regression coefficients. The vector $\beta$ is assumed to be the same for all individuals.
The survivor function can be expressed as
$$S(t \mid X_i) = [S_0(t)]^{\exp(X_i'\beta)}$$
where $S_0(t) = \exp\left(-\int_0^t \lambda_0(u)\,du\right)$ is the baseline survival function.
SAS procedure: PROC PHREG
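As a small illustration of the survivor function relation above, the following Python sketch evaluates S(t | X) = S0(t)^exp(X'β). It assumes a constant baseline hazard; the function names and all numeric values are hypothetical and are not taken from the slides.

```python
import math

def baseline_survival(t, lam0):
    """S0(t) = exp(-integral of lambda0 over [0, t]) for a constant hazard lam0 (assumed)."""
    return math.exp(-lam0 * t)

def survival(t, x, beta, lam0):
    """S(t | X) = S0(t) ** exp(X'beta)."""
    score = sum(xk * bk for xk, bk in zip(x, beta))
    return baseline_survival(t, lam0) ** math.exp(score)

# Hypothetical values: baseline hazard 0.05 per year, one covariate, beta = 0.7.
s_ref = survival(10.0, [0.0], [0.7], 0.05)   # X = 0, so this equals S0(10)
s_high = survival(10.0, [1.0], [0.7], 0.05)  # higher score, lower 10-year survival
```

A larger score X'β lowers the survival probability at every t, which is why the score can serve as a risk ranking.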
Introduction (Cont)
However, a statistically significant association is NOT enough.
A measure is needed to assess and quantify the improvement in risk prediction by new models.
Measures of interest in evaluating a prediction model:
AUC (Area Under the ROC Curve); ROC stands for Receiver Operating Characteristic
Extended AUC
Sensitivity/Specificity given $X_i'\beta$
Corr($X'\beta$, T)
Others
Introduction (Cont)
AUC(T) (Area Under the ROC curve at time T) is a measure of accuracy: the probability that a person with disease onset has a higher score than a person without such onset, $P(Z_i > Z_j \mid D_i(T)=1,\ D_j(T)=0)$, where D(T) is the indicator of disease by time T and $Z = X'\beta$. (Note: AUC is usually defined without regard to T. The present estimator is calculated for any follow-up time T and accounts for censoring in the estimation.)

| AUC(T) | Prediction accuracy |
|---|---|
| 1 | perfect |
| 0.9-1 | excellent |
| 0.8-0.9 | good |
| 0.7-0.8 | fair |
| 0.6-0.7 | poor |
| 0.5-0.6 | fail |
Introduction (Cont)
Our Parameters of Interest:
Extended AUC: the probability that the i-th person's score Z is greater than the j-th person's, given that the i-th person has event onset before a certain time T0 (the time of interest, say 10 years) and earlier than the j-th person,
$$P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0)$$
Correlation coefficient between score and event time, given that the event time is less than the time of interest (T0),
$$\mathrm{CORR}(Z, T \mid T < T_0) = \frac{\mathrm{COV}(Z, T \mid T < T_0)}{\sqrt{\mathrm{VAR}(Z \mid T < T_0)\,\mathrm{VAR}(T \mid T < T_0)}}$$
Question
I. How to estimate Extended AUC and CORR(Z,T|T<T0)?
Two estimators for Extended AUC: counting method and survival analysis method.
One estimator for CORR(Z,T|T<T0).
II. How good are the estimators?
Answered by comparing the estimates with the true values from simulated data.
Methods
Simulate data based on three model assumptions of independent variable(s) and event time distributions.
For each simulated data set, we calculate the estimates of Extended AUC and Corr(Z,T|T<T0) and the true values.

| Model assumption | Independent variable(s) | Event time distribution |
|---|---|---|
| Assumption 1 | X ~ N(0,1) | Exponential |
| Assumption 2 | X ~ N(0,1) | Weibull |
| Assumption 3 | X1 normal conditional on X2, with mean varying by X2; X2 binomial | Exponential |
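The simulation step for Assumption 1 could be sketched as follows. The slides do not state the regression coefficient, baseline hazard, or censoring mechanism, so `beta`, `lam0`, and the independent exponential censoring with rate `censor_rate` are all assumptions made for illustration only.

```python
import math
import random

def simulate_exponential(n, beta, lam0, censor_rate, seed=None):
    """Return lists (z, time, event) for X ~ N(0,1) and exponential event times.

    The event time has hazard lam0 * exp(beta * X). Independent exponential
    censoring with rate censor_rate is an assumption of this sketch.
    """
    rng = random.Random(seed)
    z, time, event = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        score = beta * x                      # Z = X'beta
        t_event = rng.expovariate(lam0 * math.exp(score))
        t_cens = rng.expovariate(censor_rate)
        z.append(score)
        time.append(min(t_event, t_cens))     # observed follow-up time
        event.append(1 if t_event <= t_cens else 0)
    return z, time, event

# Hypothetical settings: one data set of size 500.
z, time, event = simulate_exponential(n=500, beta=0.5, lam0=0.05, censor_rate=0.02, seed=1)
```

Repeating the call with different seeds would give the n_sim replicate data sets over which the estimates are averaged.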
Methods - Extended AUC
Estimating Extended AUC (θ): counting and survival analysis methods
I. Counting Method
$$\hat\theta = \hat P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0)$$
Denominator: the number of pairs for which one subject had an event at time Ti before T0 and the other had event time Tj greater than Ti.
Numerator: the number of pairs among the denominator for which the order of Zi and Zj is the reverse of the order of Ti and Tj (that is, Zi > Zj).
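The counting method described above can be sketched in a few lines of Python. How censored subjects enter the pair count is not spelled out on the slide; here a subject j whose observed time exceeds Ti is counted as having Tj > Ti whether or not j was censored, which is an assumption of this sketch. The function name and the example values are made up.

```python
def ext_auc_counting(z, time, event, t0):
    """Counting estimate of P(Zi > Zj | Ti < Tj, Ti < T0).

    A pair (i, j) is usable when subject i has an observed event before t0
    and subject j's observed time is later than Ti (assumption: a censored j
    with time > Ti still counts as Tj > Ti).
    """
    usable = 0
    concordant = 0
    n = len(z)
    for i in range(n):
        if event[i] != 1 or time[i] >= t0:
            continue                          # i must have an event before t0
        for j in range(n):
            if j != i and time[j] > time[i]:
                usable += 1                   # pair enters the denominator
                if z[i] > z[j]:
                    concordant += 1           # earlier event has the higher score
    return concordant / usable if usable else float("nan")

# Tiny hand-checkable example with made-up values.
z = [2.0, 1.0, 0.5, 0.8]
time = [1.0, 3.0, 4.0, 2.0]
event = [1, 1, 0, 1]
theta_hat = ext_auc_counting(z, time, event, t0=2.5)  # 4 concordant of 5 usable pairs
```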
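Methods - Extended AUC
II. Survival Analysis Method
$$P(Z_i > Z_j \mid T_i < T_j,\ T_i < T_0) = \frac{P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0)}{P(T_i < T_j,\ T_i < T_0)} = \frac{P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0)}{P(T_i < T_j \mid T_i < T_0)\,P(T_i < T_0)}$$
Denominator: it can be derived that
$$P(T_i < T_j \mid T_i < T_0)\,P(T_i < T_0) = \tfrac{1}{2}\left[1 - S^2(T_0)\right], \quad \text{where } S(T_0) = \int S(T_0 \mid Z_i)\,h(Z_i)\,dZ_i$$
Numerator: it can be derived that
$$P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0) = \iint_{Z_i > Z_j} \int_0^{T_0} S(T_i \mid Z_j)\,g(T_i \mid Z_i)\,h(Z_i)\,h(Z_j)\,dT_i\,dZ_i\,dZ_j$$
S(T|Z): survival function, conditional on Z. g(T|Z): conditional density function of event time. h(Z): density function of $Z = X'\beta$.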
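Methods - Extended AUC
II. Survival Analysis Method (cont)
From the Riemann-Stieltjes integral, we can rewrite the numerator as
$$P(Z_i > Z_j,\ T_i < T_j,\ T_i < T_0) = \iint_{Z_i > Z_j} \left\{ \int_0^{T_0} S(T_i \mid Z_j)\,dG(T_i \mid Z_i) \right\} h(Z_i)\,h(Z_j)\,dZ_i\,dZ_j$$
EXTAUC can be estimated as
$$\hat\theta = \frac{2}{1 - \hat S^2(T_0)} \cdot \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j \ne i}^{n} \left\{ \sum_{d=1}^{k} \hat S(T_d \mid Z_j)\left[\hat S(T_{d-1} \mid Z_i) - \hat S(T_d \mid Z_i)\right] \right\} I(Z_i > Z_j)$$
The h(Z)dZ integral is an expected value over Z and is estimated by averaging over the sample of Z.
$\hat S(T \mid Z)$ is the fitted survival function, $\hat S(T_0)$ is the average of $\hat S(T_0 \mid Z)$, and $I(Z_i > Z_j)$ is the indicator function with value 1 if Zi>Zj, 0 otherwise. n is the sample size, k is the total number of events before T0, and $T_d$ (d = 1, ..., k) indexes those event times.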
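To make the double sum in the survival analysis estimator concrete, here is a minimal Python sketch. It assumes the caller supplies a survival function `surv(t, z)`; in the study this would be the fitted Cox survival function (for example from PROC PHREG), whereas the illustration below simply plugs in a known exponential S(t | z). The function name, the treatment of the first increment as 1 - S(T_1 | Z_i), and all parameter values are assumptions of this sketch.

```python
import math
import random

def ext_auc_survival(z, event_times, surv, t0):
    """Plug-in estimate of extended AUC from a survival function surv(t, z).

    z           : scores Z = X'beta for all n subjects
    event_times : observed event times; only those before t0 are used
    surv        : callable giving the (fitted) S(t | z), supplied by the caller
    """
    n = len(z)
    grid = sorted(t for t in event_times if t < t0)       # T_1 < ... < T_k
    k = len(grid)
    # s[i][d] = S(T_d | Z_i); index d = 0 stands for time 0, where S = 1.
    s = [[1.0] + [surv(t, zi) for t in grid] for zi in z]
    s_t0 = sum(surv(t0, zi) for zi in z) / n              # average of S(T0 | Z)
    total = 0.0
    for i in range(n):
        drop_i = [s[i][d - 1] - s[i][d] for d in range(1, k + 1)]
        for j in range(n):
            if j != i and z[i] > z[j]:
                total += sum(s[j][d] * drop_i[d - 1] for d in range(1, k + 1))
    return 2.0 * total / (n * n * (1.0 - s_t0 ** 2))

# Illustration with made-up parameters: exponential event times, no censoring,
# and the true S(t | z) used in place of a fitted Cox model.
rng = random.Random(7)
lam0, beta, t0 = 0.1, 1.0, 10.0
z = [beta * rng.gauss(0.0, 1.0) for _ in range(120)]
times = [rng.expovariate(lam0 * math.exp(zi)) for zi in z]
theta_hat = ext_auc_survival(z, times, lambda t, zi: math.exp(-lam0 * math.exp(zi) * t), t0)
```

The inner sum is a right-endpoint Riemann-Stieltjes sum over the event-time grid, so a coarse grid (few events before T0) would be expected to give a slightly low value.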
Methods - Extended AUC
Calculating true EXTAUC by numerical integration
Assuming the survival function S(T|Z), event time density function g(T|Z), and xbeta density function h(Z) are known as specified in the simulation:
$$\theta = \frac{\displaystyle \iint_{Z_i > Z_j} \int_0^{T_0} S(T_i \mid Z_j)\,g(T_i \mid Z_i)\,h(Z_i)\,h(Z_j)\,dT_i\,dZ_i\,dZ_j}{\tfrac{1}{2}\left[1 - S^2(T_0)\right]}$$
Numerical Integration
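For the exponential case, the true-value integral above can be evaluated with a simple grid rule, sketched below in Python. The slides do not give the integration scheme or the parameter values, so the midpoint grid, the closed-form inner time integral for exponential event times, and the values of `beta`, `lam0`, and `t0` are assumptions for illustration. The sketch requires beta > 0 so that grid order matches score order.

```python
import math

def true_ext_auc_exponential(beta, lam0, t0, m=400, span=6.0):
    """True extended AUC for X ~ N(0,1), Z = beta * X (beta > 0), and
    T | Z exponential with rate lam0 * exp(Z).

    The Z integrals use a midpoint rule on m points over [-span, span];
    the grid settings are arbitrary choices of this sketch.
    """
    step = 2.0 * span / m
    xs = [-span + (a + 0.5) * step for a in range(m)]
    w = [math.exp(-0.5 * x * x) for x in xs]
    total_w = sum(w)
    w = [v / total_w for v in w]                       # h(Z) dZ as normalized weights
    rate = [lam0 * math.exp(beta * x) for x in xs]     # hazard at each grid score
    s_t0 = sum(wa * math.exp(-ra * t0) for wa, ra in zip(w, rate))   # S(T0)
    numer = 0.0
    for a in range(m):
        # Diagonal cell: the line Zi = Zj splits it in half.
        numer += 0.5 * w[a] * w[a] * 0.5 * (1.0 - math.exp(-2.0 * rate[a] * t0))
        for b in range(a):                             # xs increasing, so Z_a > Z_b
            r = rate[a] + rate[b]
            # Closed form of the dT integral of S(T | Z_b) g(T | Z_a) over [0, t0].
            numer += w[a] * w[b] * rate[a] / r * (1.0 - math.exp(-r * t0))
    return numer / (0.5 * (1.0 - s_t0 ** 2))

# Hypothetical parameter values.
theta_true = true_ext_auc_exponential(beta=1.0, lam0=0.1, t0=10.0)
```

As the score becomes uninformative (beta close to 0) the value approaches 0.5, and it grows as beta increases.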
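Method - Corr(Z,T|T<T0)
Estimating $\rho_{Z,T \mid T<T_0}$ – Survival Analysis Method
By definition
$$\rho_{Z,T \mid T<T_0} = \frac{\mathrm{COV}(Z, T \mid T<T_0)}{\sqrt{\mathrm{VAR}(Z \mid T<T_0)\,\mathrm{VAR}(T \mid T<T_0)}} = \frac{E(ZT \mid T<T_0) - E(Z \mid T<T_0)\,E(T \mid T<T_0)}{\sqrt{\{E(Z^2 \mid T<T_0) - E^2(Z \mid T<T_0)\}\{E(T^2 \mid T<T_0) - E^2(T \mid T<T_0)\}}}$$
The conditional joint density function can be derived as
$$f(Z, T \mid T<T_0) = \frac{g(T \mid Z)\,h(Z)}{G(T_0)}$$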
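Method - Corr(Z,T|T<T0)
Estimating $\rho_{Z,T \mid T<T_0}$ – Survival Analysis Method
Now we can estimate all the pieces needed to calculate $\rho_{Z,T \mid T<T_0}$.
$$\begin{aligned}
E(Z \mid T<T_0) &= \int \int_0^{T_0} Z\,f(Z, T \mid T<T_0)\,dT\,dZ = \int \int_0^{T_0} Z\,\frac{g(T \mid Z)\,h(Z)}{G(T_0)}\,dT\,dZ \\
&= \frac{1}{G(T_0)} \int \left[\int_0^{T_0} g(T \mid Z)\,dT\right] Z\,h(Z)\,dZ \\
&= \frac{1}{G(T_0)} \int \left[1 - S(T_0 \mid Z)\right] Z\,h(Z)\,dZ
\end{aligned}$$
where $G(T_0) = P(T<T_0) = \int \int_0^{T_0} g(T \mid Z)\,h(Z)\,dT\,dZ$ and $\hat G(T_0) = 1 - \hat S(T_0)$; therefore $E(Z \mid T<T_0)$ can be estimated as
$$\hat E(Z \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \left[1 - \hat S(T_0 \mid Z_j)\right] Z_j$$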
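Method - Corr(Z,T|T<T0)
Estimating $\rho_{Z,T \mid T<T_0}$ – Survival Analysis Method
Similarly
$$\hat E(Z^2 \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \left[1 - \hat S(T_0 \mid Z_j)\right] Z_j^2$$
For $E(T \mid T<T_0)$ and $E(T^2 \mid T<T_0)$:
$$E(T \mid T<T_0) = \int \int_0^{T_0} T\,f(Z, T \mid T<T_0)\,dT\,dZ = \frac{1}{G(T_0)} \int \left[\int_0^{T_0} T\,g(T \mid Z)\,dT\right] h(Z)\,dZ$$
$$\hat E(T \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i \left[\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)\right]$$
similarly
$$\hat E(T^2 \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i^2 \left[\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)\right]$$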
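Method - Corr(Z,T|T<T0)
Estimating $\rho_{Z,T \mid T<T_0}$ – Survival Analysis Method
similarly
$$\hat E(ZT \mid T<T_0) = \frac{1}{n\,[1 - \hat S(T_0)]} \sum_{j=1}^{n} \sum_{i=1}^{k} T_i\,Z_j \left[\hat S(T_{i-1} \mid Z_j) - \hat S(T_i \mid Z_j)\right]$$
therefore,
$$r_{Z,T \mid T<T_0} = \frac{\hat E(ZT \mid T<T_0) - \hat E(Z \mid T<T_0)\,\hat E(T \mid T<T_0)}{\sqrt{\{\hat E(Z^2 \mid T<T_0) - \hat E^2(Z \mid T<T_0)\}\{\hat E(T^2 \mid T<T_0) - \hat E^2(T \mid T<T_0)\}}}$$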
Method - Corr(Z,T|T<T0)
Calculating true $\rho_{Z,T \mid T<T_0}$ by numerical integration
Assuming the survival function S(T|Z), event time density function g(T|Z), and xbeta density function h(Z) are known as specified in the model assumptions:
$$\rho_{Z,T \mid T<T_0} = \frac{E(ZT \mid T<T_0) - E(Z \mid T<T_0)\,E(T \mid T<T_0)}{\sqrt{\{E(Z^2 \mid T<T_0) - E^2(Z \mid T<T_0)\}\{E(T^2 \mid T<T_0) - E^2(T \mid T<T_0)\}}}$$
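Method - Corr(Z,T|T<T0)
Calculating true $\rho_{Z,T \mid T<T_0}$ by numerical integration
$$E(T \mid T<T_0) = \int \int_0^{T_0} T\,f(Z, T \mid T<T_0)\,dT\,dZ = \frac{1}{G(T_0)} \int \left[\int_0^{T_0} T\,g(T \mid Z)\,dT\right] h(Z)\,dZ$$
$$E(T^2 \mid T<T_0) = \int \int_0^{T_0} T^2 f(Z, T \mid T<T_0)\,dT\,dZ = \frac{1}{G(T_0)} \int \left[\int_0^{T_0} T^2 g(T \mid Z)\,dT\right] h(Z)\,dZ$$
$$E(ZT \mid T<T_0) = \int \int_0^{T_0} Z\,T\,f(Z, T \mid T<T_0)\,dT\,dZ = \frac{1}{G(T_0)} \int \left[\int_0^{T_0} T\,g(T \mid Z)\,dT\right] Z\,h(Z)\,dZ$$
$$E(Z \mid T<T_0) = \frac{1}{G(T_0)} \int \left[1 - S(T_0 \mid Z)\right] Z\,h(Z)\,dZ$$
$$E(Z^2 \mid T<T_0) = \int \int_0^{T_0} Z^2 f(Z, T \mid T<T_0)\,dT\,dZ = \frac{1}{G(T_0)} \int \left[1 - S(T_0 \mid Z)\right] Z^2 h(Z)\,dZ$$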
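The five conditional moments above can be combined numerically. The Python sketch below does this for the exponential case, where the inner time integrals have closed forms. The integration grid and the parameter values are assumptions for illustration; the slides do not report the settings that produced the tabulated true correlations.

```python
import math

def true_conditional_corr(beta, lam0, t0, m=2000, span=6.0):
    """Corr(Z, T | T < T0) for X ~ N(0,1), Z = beta * X, and
    T | Z exponential with rate lam0 * exp(Z), by a midpoint rule over Z."""
    step = 2.0 * span / m
    xs = [-span + (a + 0.5) * step for a in range(m)]
    w = [math.exp(-0.5 * x * x) for x in xs]
    total_w = sum(w)
    g_t0 = e_z = e_z2 = e_t = e_t2 = e_zt = 0.0
    for x, v in zip(xs, w):
        p = v / total_w                                   # h(Z) dZ
        z = beta * x
        r = lam0 * math.exp(z)
        q = math.exp(-r * t0)                             # S(T0 | Z)
        cdf = 1.0 - q                                     # integral of g over [0, T0]
        m1 = (1.0 - q * (1.0 + r * t0)) / r               # integral of T * g
        m2 = (2.0 - q * (2.0 + 2.0 * r * t0 + (r * t0) ** 2)) / r ** 2  # integral of T^2 * g
        g_t0 += p * cdf
        e_z += p * cdf * z
        e_z2 += p * cdf * z * z
        e_t += p * m1
        e_t2 += p * m2
        e_zt += p * m1 * z
    e_z, e_z2, e_t, e_t2, e_zt = (v / g_t0 for v in (e_z, e_z2, e_t, e_t2, e_zt))
    cov = e_zt - e_z * e_t
    return cov / math.sqrt((e_z2 - e_z ** 2) * (e_t2 - e_t ** 2))

# Hypothetical parameter values.
rho_true = true_conditional_corr(beta=1.0, lam0=0.1, t0=10.0)
```

With beta > 0 a higher score means a higher hazard and earlier events, so the correlation comes out negative, in line with the sign of the tabulated values.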
Result
Extended AUC – Survival Analysis Method

| Model Assumption* | Spl_size | n_sim | Mean of Estimated EXTAUC | STD of Estimated EXTAUC | True EXTAUC | Bias (= Estimate - True) |
|---|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.604900 | 0.057083 | 0.607174 | -0.002274 |
| 1 | 500 | 800 | 0.606580 | 0.025622 | 0.607174 | -0.000594 |
| 2 | 100 | 1000 | 0.603647 | 0.047803 | 0.606972 | -0.003325 |
| 2 | 500 | 1000 | 0.608658 | 0.021580 | 0.606972 | 0.001685 |
| 3 | 100 | 850 | 0.641936 | 0.048242 | 0.621508 | 0.020428 |
| 3 | 500 | 850 | 0.641398 | 0.022373 | 0.621508 | 0.019890 |

*Model assumption: 1 - X~N(0,1), T being exponential; 2 - X~N(0,1), T being Weibull; 3 - X1 being Normal with mean varying with X2, X2 being Binomial, T being exponential
Result
Extended AUC – Counting Method

| Model Assumption* | Spl_size | n_sim | Mean of Estimated EXTAUC | True EXTAUC | Diff (= Estimate - True) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.613986 | 0.607174 | 0.006812 |
| 1 | 500 | 800 | 0.608614 | 0.607174 | 0.001439 |
| 2 | 100 | 1000 | 0.612707 | 0.606972 | 0.005735 |
| 2 | 500 | 1000 | 0.610470 | 0.606972 | 0.003498 |
| 3 | 100 | 850 | 0.649875 | 0.621508 | 0.028367 |
| 3 | 500 | 850 | 0.642890 | 0.621508 | 0.021382 |

*Model assumption: 1 - X~N(0,1), T being exponential; 2 - X~N(0,1), T being Weibull; 3 - X1 being Normal with mean varying with X2, X2 being Binomial, T being exponential
Result
Extended AUC – Comparison of two methods

| Model Assumption* | Spl_size | n_sim | Mean of Estimated EXTAUC (Survival) | Mean of Estimated EXTAUC (Counting) | Diff (= Survival - Counting) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.604900 | 0.613986 | -0.009086 |
| 1 | 500 | 800 | 0.606580 | 0.608614 | -0.002033 |
| 2 | 100 | 1000 | 0.603647 | 0.612707 | -0.009060 |
| 2 | 500 | 1000 | 0.608658 | 0.610470 | -0.001812 |
| 3 | 100 | 850 | 0.641936 | 0.649875 | -0.007939 |
| 3 | 500 | 850 | 0.641398 | 0.642890 | -0.001492 |

*Model assumption: 1 - X~N(0,1), T being exponential; 2 - X~N(0,1), T being Weibull; 3 - X1 being Normal with mean varying with X2, X2 being Binomial, T being exponential
Result
Extended AUC – Comparison of two methods

| Model Assumption* | Spl_size | n_sim | STD (Survival) | Bias (Survival) | MSE (Survival) | STD (Counting) | Bias (Counting) | MSE (Counting) |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 800 | 0.057083 | -0.002274 | 0.057088 | 0.065914 | 0.006812 | 0.065960 |
| 1 | 500 | 800 | 0.025622 | -0.000594 | 0.025622 | 0.028968 | 0.001439 | 0.028970 |
| 2 | 100 | 1000 | 0.047803 | -0.003325 | 0.047814 | 0.060650 | 0.005735 | 0.060683 |
| 2 | 500 | 1000 | 0.021580 | 0.001685 | 0.021583 | 0.027041 | 0.003498 | 0.027054 |
| 3 | 100 | 850 | 0.048242 | 0.020428 | 0.048659 | 0.056477 | 0.028367 | 0.057282 |
| 3 | 500 | 850 | 0.022373 | 0.019890 | 0.022768 | 0.026539 | 0.021382 | 0.026996 |

*Model assumption: 1 - X~N(0,1), T being exponential; 2 - X~N(0,1), T being Weibull; 3 - X1 being Normal with mean varying with X2, X2 being Binomial, T being exponential
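When reading the MSE columns, it may help to recompute them from STD and Bias. Mean squared error is conventionally variance plus squared bias, that is STD² + Bias². The tabulated values instead appear to match STD + Bias² to six decimals, so they sit on the scale of the STD column. The short Python check below uses the first row (assumption 1, sample size 100, survival analysis method); the variable names are illustrative.

```python
def mse(std, bias):
    """Mean squared error as variance plus squared bias."""
    return std ** 2 + bias ** 2

# Values copied from the first row of the table (survival analysis method).
std, bias, reported = 0.057083, -0.002274, 0.057088

standard_mse = mse(std, bias)        # variance + bias^2
std_plus_bias_sq = std + bias ** 2   # quantity the tabulated column appears to match
```

Either way the ordering of the two methods is unchanged, since the counting method has both the larger STD and the larger absolute bias in every row.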
Result
Corr(Z,T|T<T0)

| Model Assumption* | Spl_size | n_sim | Est. Corr. Mean | True Corr. | Bias (= Estimate - True) |
|---|---|---|---|---|---|
| 1 | 100 | 800 | -0.032053 | -0.028371 | -0.003682 |
| 1 | 500 | 800 | -0.028618 | -0.028371 | -0.000246 |
| 2 | 100 | 1000 | -0.038438 | -0.035328 | -0.003110 |
| 2 | 500 | 1000 | -0.035660 | -0.035328 | -0.000333 |
| 3 | 100 | 850 | -0.059517 | -0.052803 | -0.006715 |
| 3 | 500 | 850 | -0.053330 | -0.052803 | -0.000527 |

*Model assumption: 1 - X~N(0,1), T being exponential; 2 - X~N(0,1), T being Weibull; 3 - X1 being Normal with mean varying with X2, X2 being Binomial, T being exponential
Conclusions
I. The biases of these estimators are small relative to the estimates.
II. A large sample size provides better estimates than a small sample size.
III. For extended AUC, the differences between the estimates of the two methods are small relative to the estimates, while the counting method is slightly more biased than the survival analysis method.