Statistics 741, Chappell - Spring 2012

Rick Chappell, Ph.D.
Professor, Department of Biostatistics and Medical Informatics and Department of Statistics
University of Wisconsin – Madison

Defining Biochemical Recurrence in Prostate Cancer


Outline

I. Obviously Dependent Censoring: “Retiring to Arizona” or “Going back to the farm”

II. Dependent Administrative Censoring: induced by time trends

III. The ASTRO Definition of Biochemical Recurrence in Prostate Cancer

IV. Why Backdating is a Problem (and many comments on it)

V. Conclusions


I. Obviously Dependent Censoring

Consider a simple situation without censoring:

[Figure: four failure times (x) marked along a Time axis.]


The K-M curve is then the complement of the empirical CDF (the empirical survival function):

[Figure: K-M curve for the four failure times (x); vertical axis from 0 to 1, horizontal axis Time.]
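A minimal sketch of this fact, assuming four made-up failure times: with no censoring, the product-limit estimate coincides with the fraction of subjects still alive after each failure. The function names and the data are illustrative only, not taken from the lecture.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times  -- observed follow-up times
    events -- 1 if a failure was observed at that time, 0 if censored
    """
    curve = []
    surv = 1.0
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
    return curve


def empirical_survival(times):
    """Fraction of subjects with failure time strictly greater than t."""
    n = len(times)
    return [(t, sum(1 for u in times if u > t) / n) for t in sorted(set(times))]


failure_times = [2, 5, 7, 11]  # hypothetical data, no censoring
km = kaplan_meier(failure_times, [1, 1, 1, 1])
ecdf_complement = empirical_survival(failure_times)

for (t, s_km), (_, s_emp) in zip(km, ecdf_complement):
    print(t, round(s_km, 3), round(s_emp, 3))
```

Both columns step through 0.75, 0.5, 0.25 and 0, which is why any departure from this curve in the later slides is attributable to the censoring pattern.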


If the healthiest in terms of remaining life are selectively censored (negatively dependent censoring) then the K-M curve is biased downward:

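[Figure: K-M curve with failures (x) and two censored observations (O); vertical axis from 0 to 1, horizontal axis Time.]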


If the sickest are selectively censored (positively dependent censoring) then the K-M curve is biased upward:

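[Figure: K-M curve with failures (x) and two censored observations (o); vertical axis from 0 to 1, horizontal axis Time.]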
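The two directions of bias can be checked on a small deterministic example. This is a sketch under stated assumptions: eight hypothetical subjects with true failure times 1 through 8, so the true survival at time 4 is 0.5, and the selectively censored half is removed at time 0.5. The helper name `km_survival_at` is illustrative.

```python
def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t); events: 1 = failure, 0 = censored."""
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        deaths = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        at_risk = sum(1 for x in times if x >= u)
        surv *= 1.0 - deaths / at_risk
    return surv


true_times = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical failure times

# No censoring: the estimate at time 4 matches the truth.
uncensored = km_survival_at(true_times, [1] * 8, 4)

# Healthiest four (failure times 5-8) censored at 0.5: negatively dependent.
healthiest_censored = km_survival_at(
    [1, 2, 3, 4, 0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1, 0, 0, 0, 0], 4
)

# Sickest four (failure times 1-4) censored at 0.5: positively dependent.
sickest_censored = km_survival_at(
    [0.5, 0.5, 0.5, 0.5, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1], 4
)

print(uncensored, healthiest_censored, sickest_censored)
```

With the healthiest censored the estimate at time 4 drops to 0, and with the sickest censored it stays at 1, against a true value of 0.5.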


II. Dependent Administrative Censoring

Even when the sole source of censoring is administrative (the event hasn’t yet occurred at the time of analysis), censoring can depend on failure time.

Pointed out by Kaplan & Meier (1958), credited to Sartwell and Merrell (1952), Am. J. Pub. Health 42, “Influence of the dynamic character of chronic disease on the interpretation of morbidity rates”.

“For example, in a study of survival after an operation, a change in surgical technique five years before the data are analyzed will affect the survival times only of those with observation limit less than five years [p. 470].”

Consider an extreme example:

Cohort accrued in 1980: 200 patients, 50% fail in 1981

Cohort accrued in 1983: 2000 patients, 50% fail in 1986

The rest are cured. An analysis is performed in 1985.

[Figure: three K-M panels, each with vertical axis from 0 to 1 (tick at .5) and horizontal axis from 0 to 5 years. K-M estimate for 1980 cohort of 200 analyzed in 1985 + K-M estimate for 1983 cohort of 2000 analyzed in 1985 = K-M estimate for combined sample of 2200 analyzed in 1985.]


Thus, even though the long-term failure rate in both cohorts is 50%, the K-M curve remains near 100%.

This is not a sample-size issue: the confidence intervals for the previous example are narrow (and can be made arbitrarily narrow by choosing larger sample sizes).

Note that censoring is solely administrative.
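The arithmetic of the extreme example can be reproduced directly. This sketch takes the analysis date as 1985, as in the panel titles, and measures time in whole years from accrual. The function and variable names are illustrative, not the lecturer's.

```python
def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t); events: 1 = failure, 0 = censored."""
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        deaths = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        at_risk = sum(1 for x in times if x >= u)
        surv *= 1.0 - deaths / at_risk
    return surv


ANALYSIS_YEAR = 1985  # assumption: the date shown in the panel titles


def cohort(accrual_year, n, failure_year):
    """Observed data for a cohort in which half fail in failure_year, half are cured."""
    followup = ANALYSIS_YEAR - accrual_year
    fail_time = failure_year - accrual_year
    times, events = [], []
    for i in range(n):
        will_fail = i < n // 2
        if will_fail and fail_time <= followup:
            times.append(fail_time)   # failure seen before the analysis date
            events.append(1)
        else:
            times.append(followup)    # administratively censored
            events.append(0)
    return times, events


t80, e80 = cohort(1980, 200, 1981)
t83, e83 = cohort(1983, 2000, 1986)

s80 = km_survival_at(t80, e80, 5)                  # 1980 cohort alone
s83 = km_survival_at(t83, e83, 2)                  # 1983 cohort alone
s_all = km_survival_at(t80 + t83, e80 + e83, 5)    # combined sample

print(s80, s83, round(s_all, 4))
```

The 1980 cohort alone gives 0.5, the 1983 cohort alone gives 1.0 over its two years of followup, and the combined estimate is 1 - 100/2200, about 0.955, because all 2200 patients are still at risk when the only observed failures occur at one year.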


III. The ASTRO Definition of Biochemical Failure (BF) in Prostate Cancer

The American Society for Therapeutic Radiology and Oncology consensus statement on guidelines for PSA following radiation therapy (1997):

“Three consecutive rises in prostate-specific antigen (PSA) after reaching the PSA nadir constitute BF. The date of failure is the midpoint between the nadir and the first of the three consecutive rises in PSA.”


[Figure: A hypothetical PSA curve after radiation treatment. Vertical axis: PSA level; horizontal axis: Time, with PSA assay times marked. Labeled points: treatment, nadir, observed nadir, backdated BF, “at call” BF.]
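A rough sketch of how the quoted definition assigns a date, under simplifying assumptions that are not part of the consensus statement: a "rise" is any value above the previous assay, and the nadir is the lowest value observed before the first of the three rises. The function name, assay times and PSA values are all hypothetical.

```python
def astro_bf_dates(assay_times, psa):
    """Return (backdated_bf, at_call_bf), or None if no three consecutive rises occur.

    backdated_bf -- midpoint between the nadir and the first of the three rises
    at_call_bf   -- time of the third rise, when failure can first be declared
    """
    rises = 0
    for i in range(1, len(psa)):
        rises = rises + 1 if psa[i] > psa[i - 1] else 0
        if rises == 3:
            first_rise = i - 2
            # Assumption: nadir = lowest value observed before the first rise.
            nadir = min(range(first_rise), key=lambda j: psa[j])
            backdated = (assay_times[nadir] + assay_times[first_rise]) / 2
            return backdated, assay_times[i]
    return None


# Hypothetical assays: years since treatment and PSA level.
assay_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
psa = [8.0, 3.0, 1.0, 0.6, 0.9, 1.4, 2.2]

full_followup = astro_bf_dates(assay_times, psa)
short_followup = astro_bf_dates(assay_times[:6], psa[:6])  # followup ends at 2.5 years

print(full_followup, short_followup)
```

In this example the failure is dated at 1.75 years but cannot be declared until the assay at 3.0 years. A patient whose followup stops at 2.5 years is recorded as failure-free.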


IV. Why Backdating is a Problem

Problems with the definition were quickly noticed by Vicini et al., who attributed them to inadequate followup.

They examined a series of prostate cancer patients treated with radiation and followed for up to 12 years.

They artificially censored patients at a range of followups, recalculated backdated BF times, and plotted K-M curves.

Vicini, F.A., Kestin, L.L., and Martinez, A.A. The importance of adequate follow-up in defining treatment success after external beam irradiation for prostate cancer. IJROBP 1999; 45:553-561.


Their conclusion: need more followup – at least 5, preferably 10 years.

Vicini and others recommended that most or all patients be followed “at least beyond the time point at which actuarial results are examined”.

This is problematic considering the lengthy progress of the disease and the frail patient population.

Also, even in Vicini’s results, the actuarial curves start to be biased even before attempted followup (see plot).


Subsequent comments:

One proposed solution was to also back-date the censoring times. But if there were no rises in PSA, to when would the censoring be backdated?

How would this curve be comparable to others, which are usually “at call”?


Subsequent comments (cont.):

The backdated definition is said to have high sensitivity. But backdating can move the BF date from after the clinical failure (CF) to before it, if CF occurs between the nadir and the third rise.

Thus the “sensitivity” is to CF events which occur before the BF is determined (see slide 12).

Even so, Thames (2003) found “at call” definitions with superior sensitivity.



The fundamental statistical problem: When, for a failure to be observed at time t, followup to about t + 2 years is required, we know that followup and failure are dependent; usually, given failure at t, we know nothing of further followup.

As with the previous two examples, dependent censoring biases the K-M curve.
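A deterministic sketch of this dependence, under made-up assumptions: every one of 100 hypothetical patients truly has a backdated failure time of 3 years, followup ranges evenly from 1 to 10 years, and a failure dated at t is only observed if followup reaches t + 2. The names and numbers are illustrative only.

```python
def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t); events: 1 = failure, 0 = censored."""
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        deaths = sum(1 for x, e in zip(times, events) if x == u and e == 1)
        at_risk = sum(1 for x in times if x >= u)
        surv *= 1.0 - deaths / at_risk
    return surv


LAG = 2.0           # followup needed beyond t to confirm a failure dated at t
TRUE_BF_TIME = 3.0  # hypothetical: every patient truly fails at 3 years

# Ten patients at each followup length from 1 to 10 years.
followups = [float(c) for c in range(1, 11) for _ in range(10)]

times, events = [], []
for c in followups:
    if TRUE_BF_TIME + LAG <= c:
        times.append(TRUE_BF_TIME)  # failure confirmed, then backdated to 3 years
        events.append(1)
    else:
        times.append(c)             # failure not yet confirmable: counted as censored
        events.append(0)

km_backdated = km_survival_at(times, events, 10.0)
print(km_backdated)
```

The true failure-free proportion after 3 years is 0, yet the K-M estimate plateaus at 0.25: patients followed for 3 or 4 years sit in the risk set at year 3 as if failure-free, although they have already failed by the backdated definition.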


Other remarks:

Two purposes for predicting CF:

1) To determine in a clinical trial whether a patient relapsed; and

2) To plan therapy for a patient.

• For the first, the entire patient history is relevant. E.g., suppose a patient had 3 rises in PSA, then a decrease, followed by 10 years all clear. He shouldn’t be said to have relapsed.

• For the second, only current information can be used. A patient with 3 rises might well be given salvage therapy.


Other remarks (cont.):

For each purpose, timing is important (but presently ignored). E.g.,

Is it useful to detect a CF in two months?

Not very, because CF would have been detected anyway and treatment not delayed much.

Is it useful to detect a CF in ten years?

Perhaps not, because that CF might not be important to the patient. Its prevention may not be worth additional treatment.

When is it useful to predict a CF?



Taylor (see Wang and Taylor, 2001 for an application to AIDS) has jointly modeled PSA and CF in order to obtain the best prediction possible at any given time in a patient’s history.

For treatment purposes, a complex model yielding a probability of CF by (say) 3 years seems best, but results in a “black box” definition of biochemical failure.


V. Conclusions

Prognostic methods for prostate cancer can be improved upon and may depend upon their purpose.

Dependent censoring produces bias, avoidable only by complete followup.

Even a large sample size can’t correct the problem.



If this is so obvious, why have thousands of patients been enrolled on trials with backdated endpoints?
