Lecture 9: Hypothesis Testing One sample tests >2 sample.
Lecture 9: Hypothesis Testing
One sample tests>2 sample
Hypothesis Testing for One-Sample
• Standard set-up
• What is θ?
• Common approach
– Assume distribution is exponential
– Test that distribution is exponential with θ = θ0
$$H_0: \theta = \theta_0$$
$$H_A: \theta \neq \theta_0$$
Pretty Stringent
• Actually
• As long as the hazard is specified for the range of t, tests can be performed
$$H_0: h(t) = h_0(t)$$
$$H_A: h(t) \neq h_0(t)$$
General Form of Test
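$$Z(\tau) = O(\tau) - E(\tau) = \sum_{i=1}^{D} W(t_i)\,\frac{d_i}{Y(t_i)} - \int_0^{\tau} W(s)\,h_0(s)\,ds$$
$$V[Z(\tau)] = \int_0^{\tau} W^2(s)\,\frac{h_0(s)}{Y(s)}\,ds$$
$$\frac{Z(\tau)}{\sqrt{V[Z(\tau)]}} \sim N(0,1)$$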
Log-Rank
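• W(ti) = Y(ti)
$$Z(\tau) = \sum_{i=1}^{D} d_i - \sum_{j=1}^{n} H_0(T_j)$$
$$V[Z(\tau)] = \sum_{j=1}^{n} H_0(T_j)$$
$$\frac{\sum_{i=1}^{D} d_i - \sum_{j=1}^{n} H_0(T_j)}{\sqrt{\sum_{j=1}^{n} H_0(T_j)}} \sim N(0,1)$$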
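The log-rank form above can be sketched in a few lines of base R. This is only an illustration: the function name `one_sample_logrank`, the six data points, and the hypothesized rate `lambda0` are all made up for the example, and the null is assumed exponential so that H0(t) = lambda0 * t.

```r
# One-sample log-rank test (sketch). H0 is passed in as a cumulative hazard
# function; the exponential null and its rate below are illustrative only.
one_sample_logrank <- function(time, event, H0) {
  O <- sum(event)                      # observed number of events
  E <- sum(H0(time))                   # expected events: sum of H0(T_j)
  Z <- (O - E) / sqrt(E)               # V(Z) = E when W(t) = Y(t)
  list(observed = O, expected = E, Z = Z,
       p.value = 2 * pnorm(-abs(Z)))   # two-sided p-value from N(0, 1)
}

# Hypothetical data: 6 subjects, 4 events, 2 censored
time  <- c(2, 5, 7, 8, 12, 15)
event <- c(1, 1, 0, 1, 0, 1)
lambda0 <- 0.05                        # hypothesized rate (made up)

res <- one_sample_logrank(time, event, H0 = function(t) lambda0 * t)
print(res)
```

Passing the cumulative hazard as a function keeps the sketch usable for any fully specified null, not just the exponential one.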
Accounting for Left-Truncation
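• Choice of weights is still W(t) = Y(t)
$$O(\tau) = \sum_{i=1}^{D} d_i$$
$$E(\tau) = V[Z(\tau)] = \sum_{j=1}^{n} \left[ H_0(T_j) - H_0(L_j) \right]$$
$$\frac{\sum_{i=1}^{D} d_i - \sum_{j=1}^{n} \left[ H_0(T_j) - H_0(L_j) \right]}{\sqrt{\sum_{j=1}^{n} \left[ H_0(T_j) - H_0(L_j) \right]}} \sim N(0,1)$$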
Other Options
• Harrington and Fleming
– Allows user to have flexibility in weighting
– Can choose early or late departures to be more influential
– Special case: Gehan-Wilcoxon
– Harrington DP and Fleming TR (1982). A class of rank test procedures for censored survival data. Biometrika 69, 553-566.
• Gatsonis
• Interesting aside
– Log-rank first introduced for one-sample testing by Breslow (1975)
– Extended to left-truncation by Hyde (1977) and Woolson (1981).
Notes
• An estimator of the variance, V, can be the empirical estimate rather than the hypothesized value
• When the alternative, h(t) > h0(t) is true, this variance estimator is expected to be larger and the test less powerful
• If h(t) < h0(t) then this variance will be smaller and the test more powerful
Example: Rheumatoid Arthritis
• 10 white males with RA followed for up to 18 years
• Objective:
– Determine if men with RA are at greater risk of mortality
Entry Time Exit Time di
43 51 0
44 54 0
45 51 0
45 60 0
48 61 0
49 55 0
50 59 1
51 69 1
53 68 0
54 70 0
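As a rough sketch of how the left-truncated one-sample log-rank test could be applied to the table above: the slides do not give the reference hazard, so the constant hazard `h0 = 0.01` per year used here is purely a placeholder (in practice h0(t) would presumably come from population mortality rates). The one-sided p-value reflects the stated objective of detecting greater risk.

```r
# RA data from the table: entry time, exit time, event indicator d_i
entry <- c(43, 44, 45, 45, 48, 49, 50, 51, 53, 54)
exit  <- c(51, 54, 51, 60, 61, 55, 59, 69, 68, 70)
d     <- c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0)

# ASSUMPTION: constant reference hazard, chosen only for illustration
h0 <- 0.01
H0 <- function(t) h0 * t               # cumulative hazard under the null

O <- sum(d)                            # observed deaths
E <- sum(H0(exit) - H0(entry))         # expected deaths, adjusted for late entry
Z <- (O - E) / sqrt(E)                 # V(Z) = E for the log-rank weight
p_one_sided <- pnorm(Z, lower.tail = FALSE)

round(c(O = O, E = E, Z = Z, p = p_one_sided), 4)
```

Subtracting H0 at the entry time means each subject only contributes expected events for the interval during which he was actually under observation.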
Bone Marrow Transplant for Leukemia
• Patient undergoing bone marrow transplant (BMT) for acute leukemia
• Three types of leukemia– ALL– AML low risk– AML high risk
• What if we are interested in overall incidence rate (i.e. either relapse or death)
BMT Example
• Want to test whether or not survival in BMT patients follows an exponential distribution
– What does this mean we are asking?
• Can estimate λ from the data (recall the MLE for an exponential distribution)
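As a reminder of where that estimate comes from, here is the standard derivation for right-censored exponential data, writing $T_i$ for the observed time and $\delta_i$ for the event indicator of subject $i$ (symbols chosen here for illustration):

```latex
L(\lambda) = \prod_{i=1}^{n} \lambda^{\delta_i} e^{-\lambda T_i}
\quad\Longrightarrow\quad
\ell(\lambda) = \log\lambda \sum_{i=1}^{n}\delta_i \;-\; \lambda \sum_{i=1}^{n} T_i

\frac{d\ell}{d\lambda} = \frac{\sum_i \delta_i}{\lambda} - \sum_i T_i = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{\sum_{i=1}^{n}\delta_i}{\sum_{i=1}^{n} T_i}
```

That is, the number of events divided by the total time at risk.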
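R Code
### BMT example
library(survival)
data<-read.csv("H:\\BMTRY_722_Summer2013\\BMT_1_3.csv")
# Use TTR for relapsed or event-free patients; NA for now if death without relapse
failtime<-ifelse(data$Relapse==0 & data$Death==0 | data$Relapse==1, data$TTR, NA)
# Deaths occurring at or before the recorded TTR: use time to death instead
failtime<-ifelse(data$Death==1 & data$TTR>=data$TTD, data$TTD, failtime)
# Event = relapse or death, whichever comes first
event<-ifelse(data$Relapse==1 | data$Death==1, 1, 0)
st<-Surv(failtime, event)
fit<-survfit(st~1)
plot(fit, xlab="Time", ylab="S(t)", lwd=2)
#Calculating lambda hat
lambda.hat<-sum(event)/sum(failtime)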
“survdiff” Function
Description
Tests if there is a difference between two or more curves using the G-rho family of tests, or for a single curve against a known alternative
Usage
survdiff(formula, data, subset, na.action, rho=0)
Arguments
formula: a formula expression as for other survival models, of the form Surv(time, status)~predictors. For a one-sample test, the predictors must consist of a single offset(sp) term, where sp is a vector giving the survival probability for each subject
“survdiff” Function
Method
This function implements the G-rho family of Harrington and Fleming (1982), with weights on each death of S(t)^rho, where S is the Kaplan-Meier estimate of survival. With rho=0 this is the log-rank or Mantel-Haenszel test, and with rho=1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test.
If the right hand side of the formula consists only of an offset term, then a one sample test is done. To cause the missing values in the predictors to be treated as a separate group, rather than being omitted, use a factor function with its exclude argument.
R code
#Estimating lambda
> lambda.hat<-sum(event)/sum(failtime)
# Expected S(t) = exp(-lambda.hat*t)
> S.exp<-exp(-lambda.hat*failtime)
> one.sample.test1<-survdiff(st~offset(S.exp))
> one.sample.test1
Observed Expected        Z        p
      83       83        0        1
> one.sample.test2<-survdiff(st~offset(S.exp), rho=1)
> one.sample.test2
Observed Expected        Z        p
      83       83        0  0.00521
#Comparing hypothesized dist’n to empirical dist’n
> plot(fit, conf.int=FALSE, lwd=2)
> lines(sort(failtime), rev(sort(S.exp)), col=2, lwd=2, type="s")
R code
#Estimating lambda for failure times <800
> fail2<-failtime[which(failtime<800)]
> event2<-event[which(failtime<800)]
> lambda.hat2<-sum(event2)/sum(fail2)
# Expected S(t) = exp(-.004*t)
> S.exp2<-exp(-lambda.hat2*fail2)
> st2<-Surv(fail2, event2); fit2<-survfit(st2~1)
> one.sample.testa<-survdiff(st2~offset(S.exp2))
> one.sample.testa
Observed Expected        Z        p
      80       80        0        1
> one.sample.testb<-survdiff(st2~offset(S.exp2), rho=1)
> one.sample.testb
Observed Expected        Z        p
      80       80    0.000    0.477
R code
#Estimating lambda for failure times >=800
> fail3<-failtime[which(failtime>=800)]
> event3<-event[which(failtime>=800)]
> lambda.hat3<-sum(event3)/sum(fail3)
# Expected S(t) = exp(-.004*t)
> S.exp3<-exp(-lambda.hat3*fail3)
> st3<-Surv(fail3, event3); fit3<-survfit(st3~1)
> one.sample.testc<-survdiff(st3~offset(S.exp3))
> one.sample.testc
Observed  Expected         Z         p
       3         3 -2.56e-16         1
> one.sample.testd<-survdiff(st3~offset(S.exp3), rho=1)
> one.sample.testd
Observed Expected        Z        p
       3        3   -0.035   0.9730
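One point worth noting when reading the rho=0 output: because λ was estimated from the same data by maximum likelihood, the expected count under the fitted exponential equals the observed count by construction. A short derivation, using $\delta_j$ for the event indicator and $T_j$ for the observed time (notation assumed here):

```latex
E = \sum_{j=1}^{n} H_0(T_j)
  = \hat{\lambda} \sum_{j=1}^{n} T_j
  = \frac{\sum_{j}\delta_j}{\sum_{j} T_j}\,\sum_{j=1}^{n} T_j
  = \sum_{j=1}^{n}\delta_j
  = O
\quad\Longrightarrow\quad
Z = \frac{O - E}{\sqrt{E}} = 0
```

So Observed = Expected (83 and 83, 80 and 80, 3 and 3) and p = 1 in every unweighted run, whatever the true distribution; only the weighted (rho=1) versions can show a departure from the fitted exponential.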
Conclusions
• So what can we conclude about our original hypothesis?
Relevance
• Becoming more common
• Phase II cancer studies with TTE outcomes instead of response
• But
– Often more interested in median or 1 year survival
• Yet
– Very important for sample size considerations
– Most often assume study data will have exponential distribution for sample size
On to something more interesting… comparing ≥2 samples
Comparing two or more samples
• Anova type approach
$$H_0: h_1(t) = h_2(t) = \cdots = h_K(t), \text{ for all } t \le \tau$$
$$H_A: \text{at least one of the } h_j(t) \text{ is different for some } t \le \tau$$
– Where τ is the largest time for which all groups have at least one subject at risk
• Data can be right-censored (and left truncated) for the tests we will discuss
Notation
• Let t1 < t2 < … < tD be distinct death times in all samples being compared
• At time ti , let dij be the number of events in group j out of Yij individuals at risk (j = 1,2,…,K)
• Define
$$d_i = \sum_{j=1}^{K} d_{ij}$$
$$Y_i = \sum_{j=1}^{K} Y_{ij}$$
Rationale
• Weighted comparisons of the estimated hazard of the jth population under the null hypothesis and alternative hypothesis
• Based on Nelson-Aalen estimator
• If the null is true, the pooled estimate of h(t) should be an estimator for hj(t)
Applying the Test
• Let Wj(t) be a positive weight function s.t. Wj(ti) = 0 if Yij = 0
$$Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i)\left\{ \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right\}, \quad \text{for } j = 1, 2, \ldots, K$$
• If all Zj(τ)’s are close to zero, then little evidence to reject the null
Common Form for Weight Functions
• All commonly used tests choose weight functions s.t.
$$W_j(t_i) = Y_{ij}\,W(t_i)$$
• Note that the weight W(ti) is common across all j
• Can redefine Z:
$$Z_j(\tau) = \sum_{i=1}^{D} W(t_i)\left\{ d_{ij} - Y_{ij}\,\frac{d_i}{Y_i} \right\}$$
Test Statistic
• Variance and covariance of Zj(τ) (K&M p. 207)
• Z1(τ), Z2(τ), ..., ZK(τ) are linearly dependent because their sum is 0
• For test statistic, choose K – 1 components
• Chi-square test with K – 1 d.f., where Σ is the variance-covariance matrix of the chosen components
$$\chi^2 = \left( Z_1(\tau), Z_2(\tau), \ldots, Z_{K-1}(\tau) \right) \Sigma^{-1} \left( Z_1(\tau), Z_2(\tau), \ldots, Z_{K-1}(\tau) \right)'$$
Log-Rank Test for 2 Groups
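• For log-rank W(ti)=1
• Have 2 groups and want to test if survival is the same in the groups
$$\text{Group 0}: T_i \sim F_0 \qquad \text{Group 1}: T_i \sim F_1$$
• We want to develop a nonparametric test of
$$H_0: F_0 = F_1 \;\Leftrightarrow\; H_0: S_0 = S_1 \;\Leftrightarrow\; H_0: h_0 = h_1$$
$$H_A: F_0 \neq F_1 \;\Leftrightarrow\; H_A: S_0 \neq S_1 \;\Leftrightarrow\; H_A: h_0 \neq h_1$$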
Log-Rank Test for 2 Groups
• If F0 and F1 follow some parametric distribution and are in the same family, this is easy
• For example assume
$$S_j(t) = e^{-\lambda_j t}$$
• But need a test whose validity doesn’t depend on parametric assumptions
Constructing the Log-Rank Test
• Recall our notation– t1 < t2 < … < tD are D distinct ordered event times
– Yij = # people in the group j at risk at ti
– Yi = # people at risk across groups at ti
– dij = # of people in group j that fail at ti
– di = # of people across groups that fail at ti
Constructing the Log-Rank Test
• We can summarize the information at time ti in a 2x2 table
Fail Don’t Fail
Group 0
Group 1
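The cell entries did not survive in this transcript, but from the notation above the table at time $t_i$ can presumably be filled in as follows, together with the usual hypergeometric mean and variance of $d_{i1}$ under the null (conditioning on the margins). This is a reconstruction from the stated notation, not a copy of the original slide.

```latex
\begin{array}{l|cc|c}
               & \text{Fail} & \text{Don't Fail}  & \text{At risk} \\ \hline
\text{Group 0} & d_{i0}      & Y_{i0} - d_{i0}    & Y_{i0} \\
\text{Group 1} & d_{i1}      & Y_{i1} - d_{i1}    & Y_{i1} \\ \hline
\text{Total}   & d_i         & Y_i - d_i          & Y_i
\end{array}

E(d_{i1}) = Y_{i1}\,\frac{d_i}{Y_i},
\qquad
\operatorname{Var}(d_{i1}) = \frac{Y_{i0}\,Y_{i1}\,d_i\,(Y_i - d_i)}{Y_i^{2}\,(Y_i - 1)}

\chi^2 = \frac{\left[\sum_{i=1}^{D}\left(d_{i1} - E(d_{i1})\right)\right]^2}
              {\sum_{i=1}^{D}\operatorname{Var}(d_{i1})} \;\sim\; \chi^2_1
```

Summing observed minus expected over the D tables and dividing by the summed variance gives the two-group log-rank statistic.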
Toy Example
• Say we have the following data on two groups (+ marks a censored time):
Group 0: 3, 6+, 9, 9, 11+, 16
Group 1: 8, 9, 10+, 12+, 19, 23+
• We want to test the hypothesis
$$H_0: h_0(t) = h_1(t)$$
$$H_A: h_0(t) \neq h_1(t)$$
Same Test in R
> time<-c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens<-c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp<-c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp<-as.factor(grp)
>
> sdat<-Surv(time, cens)
> survdiff(sdat~grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
Same Test in R
> toy<-survdiff(sdat~grp)
> names(toy)
[1] "n"     "obs"   "exp"   "var"   "chisq" "call"
> toy$obs
[1] 4 3
> toy$exp
[1] 2.566667 4.433333
> toy$var
          [,1]      [,2]
[1,]  1.267778 -1.267778
[2,] -1.267778  1.267778
> toy$chisq
[1] 1.620508
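To see where these numbers come from, the same statistic can be rebuilt from the risk-set counts in base R. This is a minimal sketch of the hand calculation for the toy data, not the code survdiff uses internally; the variable names are chosen here for illustration.

```r
time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)    # 1 = event, 0 = censored
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

event_times <- sort(unique(time[cens == 1]))      # distinct event times t_i

# One row of risk-set counts per event time
tab <- t(sapply(event_times, function(ti) {
  c(Y  = sum(time >= ti),                         # at risk, both groups
    Y1 = sum(time >= ti & grp == 1),              # at risk, group 1
    d  = sum(time == ti & cens == 1),             # events, both groups
    d1 = sum(time == ti & cens == 1 & grp == 1))  # events, group 1
}))
tab <- as.data.frame(tab)

# Hypergeometric mean and variance of d1 at each event time
E1 <- with(tab, Y1 * d / Y)
V1 <- with(tab, ifelse(Y > 1,
                       Y1 * (Y - Y1) * d * (Y - d) / (Y^2 * (Y - 1)), 0))

O1    <- sum(tab$d1)
chisq <- (O1 - sum(E1))^2 / sum(V1)
p     <- pchisq(chisq, df = 1, lower.tail = FALSE)

round(c(O1 = O1, E1 = sum(E1), V = sum(V1), chisq = chisq, p = p), 4)
```

The totals agree with toy$obs[1], toy$exp[1], toy$var[1,1] and toy$chisq shown above.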
UMP Tests
More general: 2 samples
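• We can change the weight function
• For K = 2, can use Z-score or χ²
$$Z = \frac{\sum_{i=1}^{D} W(t_i)\left[ d_{i1} - Y_{i1}\,\frac{d_i}{Y_i} \right]}{\sqrt{\sum_{i=1}^{D} W(t_i)^2\,\frac{Y_{i1}}{Y_i}\left(1 - \frac{Y_{i1}}{Y_i}\right)\left(\frac{Y_i - d_i}{Y_i - 1}\right) d_i}} \sim N(0,1)$$
– The factor (Yi – di)/(Yi – 1) corrects for ties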
Choice for Weight Functions
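• W(ti) = 1
– Log-rank test
– Optimal power for detecting differences when hazards are proportional
$$Z_j(\tau) = \sum_{i=1}^{D} \left\{ d_{ij} - Y_{ij}\,\frac{d_i}{Y_i} \right\}$$
• W(ti) = Yi
– Gehan test
– Generalization of 2-sample Mann-Whitney-Wilcoxon test
$$Z_j(\tau) = \sum_{i=1}^{D} Y_i \left\{ d_{ij} - \frac{Y_{ij}}{Y_i}\,d_i \right\}$$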
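A small base-R sketch makes the effect of the two weights concrete. It applies the two-sample Z statistic to the toy data from earlier in the lecture with W = 1 (log-rank) and W = Yi (Gehan); the function name `weighted_logrank` and its arguments are invented for this illustration.

```r
# Two-sample weighted log-rank statistic (sketch); weight = "logrank" uses
# W(t_i) = 1, weight = "gehan" uses W(t_i) = Y_i.
weighted_logrank <- function(time, status, group,
                             weight = c("logrank", "gehan")) {
  weight <- match.arg(weight)
  g1 <- group == sort(unique(group))[1]           # members of the first group
  ti <- sort(unique(time[status == 1]))           # distinct event times
  Y  <- sapply(ti, function(t) sum(time >= t))
  Y1 <- sapply(ti, function(t) sum(time >= t & g1))
  d  <- sapply(ti, function(t) sum(time == t & status == 1))
  d1 <- sapply(ti, function(t) sum(time == t & status == 1 & g1))
  W  <- if (weight == "logrank") rep(1, length(ti)) else Y
  num <- sum(W * (d1 - Y1 * d / Y))               # weighted sum of O - E
  v   <- ifelse(Y > 1,
                (Y1 / Y) * (1 - Y1 / Y) * ((Y - d) / (Y - 1)) * d, 0)
  V   <- sum(W^2 * v)                             # variance with tie correction
  Z   <- num / sqrt(V)
  c(numerator = num, variance = V, Z = Z, p = 2 * pnorm(-abs(Z)))
}

time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)

lr <- weighted_logrank(time, cens, grp, "logrank")
gh <- weighted_logrank(time, cens, grp, "gehan")
round(rbind(logrank = lr, gehan = gh), 4)
```

Because the Gehan weight is the number at risk, early event times count far more than late ones, so the two rows can differ noticeably even on the same data.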
Choices for Weight Functions
• Fleming-Harrington
– General case
$$W_{p,q}(t_i) = \hat{S}(t_{i-1})^{p}\left[ 1 - \hat{S}(t_{i-1}) \right]^{q}$$
– Special cases
  • Log-rank: p = 0, q = 0
  • Mann-Whitney-Wilcoxon: p = 1, q = 0
  • q = 0, p > 0: gives greater weight to early departures
  • p = 0, q > 0: gives greater weight to late departures
– Allows specific choice of influence (for better or worse!)
Others?
• Many
• Not all available in all software (e.g. Gehan not in R)
• Worth trying a few in each situation to compare inferences
Caveat
• Note we are interested in the average difference (consider log-rank specifically)
• What if hazards cross?
• Could have significant difference prior to some t, and another significant difference after t: but what if direction differs?
Next time
• More on different weight functions
• Tests for trends