EP304: Advanced Statistical Methods in
Epidemiology Examination
Friday 3 June 2011: 10.00 am – 12.15 pm
Candidates are advised to spend the first FIFTEEN minutes of this exam reading the
question paper and planning their answers.
Candidates should answer ALL questions.
Use a SEPARATE answer book for each question and put a page number at the bottom of
each page used in the answer book.
A formulae sheet and statistical tables are provided for use at the end of the paper.
A hand held calculator may be used when answering questions on this paper. The calculator
may be pre-programmed before the examination. The make and type of machine must be
stated clearly on the front cover of the answer book.
Question 1
A demographic surveillance site was established in 2008 in a district in southern Tanzania,
with a total population of around 40000. A baseline census was completed in 2008, after
which all individuals have been followed up to the end of 2010. Children born in the area
since 2008 are included in the study population, as are children and adults who migrate into
the study area. For individuals who leave the study area, or have died since the baseline
census, their follow-up time stops on the date of migration or death.
There is only one health clinic in the study area that can diagnose and treat children with
severe pneumonia. During the whole of 2009, identifying information was collected on all
children in the study population who presented at the clinic with an episode of severe
pneumonia. This meant that each child with an episode of severe pneumonia could be linked
to the demographic surveillance data. For the purpose of the analysis here, a child’s follow-up
was censored when they reached 5 years old, and only person-time during 2009 is included.
Key information collected is summarised in Table 1.
Severe pneumonia is rare, but children can have repeat episodes and one child had 3 such
episodes during 2009. A total of 7860 children, aged under 5 years, were followed up for
part or all of 2009.
Table 1
Variable      Coding
idno          Unique identifying number for each child
sex           1 = Male  2 = Female
dob           Date of birth
residence     0 = peri-urban  1 = rural
entry_date    For the first record for each child, the entry_date is 1 January 2009, or
              in-migration date if the child was born or moved into the study area after
              1 January 2009
              If a child has a second record, then the entry_date for the second record is
              equal to the exit_date of the first record for the child (i.e. equal to the
              date of the first severe pneumonia event)
              If a child has a third record, then the entry_date for the third record is
              equal to the exit_date of the second record for the child
exit_date     For records ending without a severe pneumonia event, the exit date is:
              31 December 2009, or date of death or out-migration date or date of 5th
              birthday, if any of these events occurred before 31 December 2009
              For records ending with a severe pneumonia event, the exit date is the date
              of the clinic attendance with severe pneumonia
finexit       Last date the child was resident in the study population during 2009
spneumonia    Severe pneumonia; 0 = No  1 = Yes
totepi        Total episodes of severe pneumonia during 2009
              0 = none  1 = one  2 = two  3 = three
recordno      1 for the first record for each child; 2 for the second record for a child;
              3 for the third record for a child; 4 for the fourth record for a child
              Children with no episode of severe pneumonia have only one record
              Children with 1 episode of severe pneumonia have two records
              Children with 2 episodes of severe pneumonia have three records
              Children with 3 episodes of severe pneumonia have four records
a) The researchers first investigated how many children had each of 0, 1, 2 or 3 episodes of
severe pneumonia during 2009, by tabulating the variable totepi. The result is shown in
Table 2 below:
Table 2
. tab totepi if recordno==1
totepi | Freq. Percent Cum.
------------+-----------------------------------
0 | 7,594 96.62 96.62
1 | 250 3.18 99.80
2 | 15 0.19 99.99
3 | 1 0.01 100.00
------------+-----------------------------------
Total | 7,860 100.00
(i) What is the implication of the fact that children can have repeated episodes of
severe pneumonia, rather than at most one episode, for the statistical analysis?
Justify your answer, and explain the consequence for the results of the analysis if
this fact is ignored.
(10 marks)
(ii) Given the distribution of the total number of episodes per child in Table 2, how
important is it for the analysis to properly account for the fact that a child can have
repeated events? Justify your answer. (5 marks)
The following command was used in Stata as the first step to analysing the rate of severe
pneumonia in children aged <5 years old:
. stset exit_date, origin(dob) time0(entry_date) fail(spneumonia) exit(finexit) scale(365.25) id(idno)
id: idno
failure event: spneumonia != 0 & spneumonia < .
obs. time interval: (entry_date, exit_date]
exit on or before: time finexit
t for analysis: (time-origin)/365.25
origin: time dob
---------------------------------------------------------------------------
7860 subjects
283 failures in multiple failure-per-subject data
5422.591 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 5
(iii) What is the interpretation of the value of “earliest observed entry t” and “last
observed exit t”? (5 marks)
(iv) Calculate the overall rate of severe pneumonia among children <5 years old, using
the output from the stset command. (5 marks)
(v) From the listing below, describe the follow-up, and experience of severe
pneumonia, of child idno 5462792. (5 marks)
. list idno entry_date exit_date finexit spneumonia if idno==5462792
+--------------------------------------------------------+
| idno entry_date exit_date finexit spneumonia |
|--------------------------------------------------------|
| 5462792 01jan2009 08jan2009 28jul2009 1 |
| 5462792 08jan2009 28jul2009 28jul2009 0 |
+--------------------------------------------------------+
b) Next, the following analysis was done:
. strate residence, per(1000)
failure _d: spneumonia
analysis time _t: (exit_date-origin)/365.25
origin: time dob
exit on or before: time finexit
id: idno
Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
+----------------------------------------------------+
| residence D Y Rate Lower Upper |
|----------------------------------------------------|
| 0 183 2.2741 80.472 69.619 93.019 |
| 1 100 3.1485 31.761 26.108 38.638 |
+----------------------------------------------------+
. stmh residence, c(1,0)
RR estimate, and lower and upper 95% confidence limits
----------------------------------------------------------
RR chi2 P>chi2 [95% Conf. Interval]
----------------------------------------------------------
0.395 60.03 0.0000 0.309 0.504
----------------------------------------------------------
(i) Is there statistical evidence of an association between area of residence and the rate
of severe pneumonia? Justify your answer, including reference to the 95%
confidence intervals and p-values for the rates and rate ratios. (10 marks)
The next step in the analysis was as follows:
. stsplit timeband, at(0,0.5,1,2,3,5)
. strate timeband, per(1000)
+--------------------------------------------------------+
| timeband D Y Rate Lower Upper |
|--------------------------------------------------------|
| 0 36 0.5496 65.5004 47.2473 90.8052 |
| .5 62 0.5283 117.3673 91.5049 150.5393 |
| 1 115 1.0671 107.7671 89.7659 129.3783 |
| 2 46 1.0895 42.2204 31.6242 56.3670 |
| 3 24 2.1881 10.9685 7.3518 16.3643 |
+--------------------------------------------------------+
(ii) What is the meaning of the variable “timeband”? To what time period does the
timeband .5 correspond? (5 marks)
(iii) From the output above, is there statistical evidence of an association between the
variable timeband and the rate of severe pneumonia? Justify your answer. (5 marks)
c) Next, the following analysis was done:
. stmh residence, by(timeband) c(1,0)
RR estimate, and lower and upper 95% confidence limits
+--------------------------------+
| timeband RR Lower Upper |
|--------------------------------|
| 0 0.45 0.23 0.88 |
| .5 0.36 0.21 0.61 |
| 1 0.37 0.25 0.54 |
| 2 0.38 0.21 0.70 |
| 3 0.33 0.14 0.79 |
+--------------------------------+
Overall estimate controlling for timeband
----------------------------------------------------------
RR chi2 P>chi2 [95% Conf. Interval]
----------------------------------------------------------
0.375 67.43 0.0000 0.294 0.478
----------------------------------------------------------
Approx test for unequal RRs (effect modification): chi2(4) = 0.41
Pr>chi2 = 0.9815
Based on this stratified analysis, is there evidence that the variable timeband confounds or
modifies the association between area of residence and the rate of severe pneumonia in
children aged under 5 years old? Justify your answer. (10 marks)
d) The following Poisson regression models, and corresponding likelihood ratio tests, were
then done:
Model A
. xi: streg i.residence i.sex i.timeband, dist(exp)
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iresidenc~1 | .375984 .0467786 -7.86 0.000 .2946225 .4798138
_Isex_2 | 1.196 .1432106 1.49 0.135 .9458169 1.51236
_Itimeband_2 | 1.827776 .3830194 2.88 0.004 1.212131 2.756109
_Itimeband_3 | 1.658995 .3168413 2.65 0.008 1.140984 2.412187
_Itimeband_4 | .65924 .1467178 -1.87 0.061 .4261903 1.019726
_Itimeband_5 | .1634126 .0430692 -6.87 0.000 .097486 .2739233
------------------------------------------------------------------------------
. est store A
(i) By comparing the rate ratio for area of residence from Model A, with the rate ratio
for area of residence from the Mantel-Haenszel analysis stratified on timeband, does
child sex confound the association between area of residence and the rate of severe
pneumonia? Justify your answer. (5 marks)
Model B
. xi: streg i.residence i.sex timeband, dist(exp)
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iresidenc~1 | .3821039 .0475275 -7.73 0.000 .2994372 .4875926
_Isex_2 | 1.193832 .1429444 1.48 0.139 .9441129 1.509602
timeband | .5238446 .0303475 -11.16 0.000 .4676171 .586833
------------------------------------------------------------------------------
. est store B
. lrtest A B
Likelihood-ratio test LR chi2(3) = 56.68
(Assumption: B nested in A) Prob > chi2 = 0.0000
(ii) Interpret the likelihood ratio test comparing Model B to Model A. (5 marks)
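The arithmetic behind a likelihood ratio test of this kind can be sketched as follows. This is a minimal illustration in Python rather than Stata, assuming two hypothetical log likelihoods (`loglik_full` and `loglik_reduced` are invented so that the statistic equals the 56.68 shown above; they are not taken from the output). The helper `chi2_sf` is likewise an assumption of this sketch: it evaluates the upper tail of a chi-squared distribution for integer degrees of freedom using only the standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return (LRS, p-value), where LRS = 2 * (L1 - L0)."""
    lrs = 2.0 * (loglik_full - loglik_reduced)
    return lrs, chi2_sf(lrs, df)

# Hypothetical log likelihoods, chosen only so that LRS = 56.68 on 3 df
loglik_full, loglik_reduced = -1500.00, -1528.34
lrs, p_value = likelihood_ratio_test(loglik_full, loglik_reduced, df=3)
print(f"LRS = {lrs:.2f}, p = {p_value:.2e}")
```

The degrees of freedom equal the number of parameters dropped from the larger model, here 3 because four timeband indicator terms are replaced by a single linear term.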
e) A random-effects Poisson regression model was then fitted to the data, to properly
account for the clustering in the data.
. gen y=_t-_t0
Model C
. xi: xtpoisson _d i.residence i.timeband i.sex, e(y) re i(idno) irr
Random-effects Poisson regression Number of obs = 12909
Group variable: idno Number of groups = 7860
Random effects u_i ~ Gamma
Wald chi2(6) = 191.31
Log likelihood = -1531.503 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_d | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iresidenc~1 | .3769066 .0476705 -7.71 0.000 .2941544 .4829389
_Itimeband_2 | 1.814057 .3822294 2.83 0.005 1.200326 2.741591
_Itimeband_3 | 1.657297 .3200445 2.62 0.009 1.13507 2.419791
_Itimeband_4 | .6574241 .147549 -1.87 0.062 .4234539 1.020669
_Itimeband_5 | .1630945 .0432013 -6.85 0.000 .097044 .2741006
_Isex_2 | 1.189539 .1454658 1.42 0.156 .9360249 1.511715
y | (exposure)
-------------+----------------------------------------------------------------
/lnalpha | -.6312896 .718153 -2.038844 .7762645
-------------+----------------------------------------------------------------
alpha | .5319054 .3819895 .1301792 2.173338
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 2.69 Prob>=chibar2 = 0.051
(i) Comparing the output from Model A and Model C, does the conclusion about the
association between area of residence and the rate of severe pneumonia change
when the clustering in the data is accounted for? Justify your answer. (5 marks)
(ii) Based on Model C, is there statistical evidence of within-child correlation for the
rate of severe pneumonia? (5 marks)
f) To investigate whether maternal HIV infection was a risk factor for child severe
pneumonia, a case-control study was done. There were 266 children with one or more
episodes of severe pneumonia, with a total of 283 episodes during 2009. For each of the
283 episodes, the child was recruited as a case. 283 individually-matched control children
were selected from the DSS population, with one control child for each case. The control
child was matched on age in months, sex, and neighbourhood, and was selected within
one week of the date that the matched case presented at the health clinic – i.e.
“concurrent” sampling of controls was done.
(i) Why were control children selected with concurrent sampling? Are the exposure
odds ratios that will be calculated from this case-control study, estimates of the
disease risk ratio, odds ratio, or rate ratio? Justify your answer. (10 marks)
(ii) The association with maternal HIV was summarised using discordant pairs as
below:
                              Case mother     Case mother
                              HIV-negative    HIV-positive
Control mother HIV-negative       232              38
Control mother HIV-positive        11               2
Calculate the exposure odds ratio for the association between maternal HIV and
child severe pneumonia, comparing children with HIV-positive mothers to children
with HIV-negative mothers. (5 marks)
(iii) The following conditional logistic regression model was fitted. The variable casecon
is coded 0 for a control and 1 for a case, the variable sex is coded as before (1=male,
2=female), and caseid is the variable that uniquely identifies each case-control pair.
. xi: clogit casecon i.moth_hiv*i.sex, strata(caseid) or
Conditional (fixed-effects) logistic regression Number of obs = 566
LR chi2(2) = 17.20
Prob > chi2 = 0.0002
Log likelihood = -187.5612 Pseudo R2 = 0.0438
------------------------------------------------------------------------------
casecon | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Imoth_hiv_1 | 6.99995 5.291451 2.57 0.010 1.590922 30.79932
_Isex_2 | (omitted)
_ImotXsex_~2 | .3809551 .3241933 -1.13 0.257 .0718621 2.019517
Based on the model above, in which the variable moth_hiv is coded 0 for HIV-
negative mothers and 1 for HIV-positive mothers, calculate the odds ratio for the
association between maternal HIV and severe child pneumonia, separately for boys
and girls.
(5 marks)
Question 2
Health care workers in South Africa are at higher risk of tuberculosis than the general
population and additional measures are needed to protect them. Isoniazid preventive therapy
consists of screening for tuberculosis followed by a course of isoniazid (usually 6-9 months)
for those without active tuberculosis. At an individual level, isoniazid preventive therapy has
been shown to reduce the subsequent risk of tuberculosis. The South African government
wants to know if providing isoniazid preventive therapy to all healthcare workers in the
country would reduce the high incidence of tuberculosis among healthcare workers. Hence,
researchers planned a cluster randomised controlled trial of isoniazid preventive therapy in
healthcare workers in South Africa, with healthcare facilities as the clusters.
a) (i) Give two advantages and two disadvantages of conducting this trial at the cluster
level rather than the individual level. (15 marks)
(ii) Give one advantage and one disadvantage of conducting this trial in a few, large
facilities, rather than many, small facilities. (10 marks)
(iii) This study was planned as an unpaired design. Give one advantage and one
disadvantage of pair-matching as an alternative study design. (10 marks)
b) (i) Define each of intra-cluster correlation and between-cluster variation and explain
how they are related to each other. How would you interpret each of (1) an intra-
cluster correlation coefficient of 0 and (2) an intra-cluster correlation coefficient of
1? (10 marks)
An estimate of the coefficient of variation was not available for this situation. The researchers
think that the best guess of the population mean and standard deviation of health-facility
tuberculosis rates were 5/100py and 1.5/100py, respectively.
(ii) Define the coefficient of variation k, and calculate the coefficient of variation k in
this situation. (5 marks)
(iii) Explain why the coefficient of variation of the true cluster-level means is important
in sample size calculations. In particular, why does a larger coefficient
of variation imply that a larger number of clusters is required? (5 marks)
The researchers calculated that ten clusters per arm would be sufficient to detect a 50%
reduction in tuberculosis incidence from 5/100py in the control arm to 2.5/100py in the
intervention arm, assuming an unpaired design, k=0.35, 80% power and follow up of 250
healthcare workers for a one year period, after completion of isoniazid preventive therapy in
the intervention clusters and a similar time period for control clusters. (Note py = person-
years.)
The researchers conducted the trial and decided to analyse the trial using the cluster level
summaries.
c) (i) What would have been the consequences for the p-value and confidence interval if
this trial had been analysed using individual level data without taking into account
the cluster level? (5 marks)
(ii) Why do you think the researchers decided to analyse this trial using cluster level
summaries instead of a random effects model? (5 marks)
d) The twenty cluster-level rates (per 100 person years) were then analysed by study arm
using an unpaired t-test and the output is given below:
. ttest rates, by(study_arm)
Two-sample t test with equal variances
----------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
-------------+--------------------------------------------------------------------
control | 10 5.714149 .3204526 1.01336 4.989234 6.439063
intervention | 10 3.139567 .0985532 .3116527 2.916624 3.36251
-------------+--------------------------------------------------------------------
combined | 20 4.426858 .3373994 1.508896 3.720673 5.133043
-------------+--------------------------------------------------------------------
diff | 2.574581 .335265 1.870216 3.278947
----------------------------------------------------------------------------------
diff = mean(control) - mean(intervention) t = 7.6792
Ho: diff = 0 degrees of freedom = 18
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 1.0000 Pr(|T| > |t|) = 0.0000 Pr(T > t) = 0.0000
(i) State and interpret the rate difference, calculated from the cluster level summaries,
and its confidence interval and p-value. (5 marks)
(ii) Calculate and interpret the rate ratio and an approximate 95% confidence interval
calculated from the cluster level summaries, using the following formula for the
confidence interval of the log rate ratio (logRR):
logRR ± 1.96 * √var(logRR)
where the variance of the log rate ratio (logRR) is given below, in which $s_1$ is the
standard deviation in the intervention arm and $s_0$ is the standard deviation in the
control arm, $c_1$ is the number of clusters in the intervention arm and $c_0$ is the number
of clusters in the control arm, and $\bar{r}_1$ is the mean of the cluster-level rates in the
intervention arm and $\bar{r}_0$ is the mean of the cluster-level rates in the control arm. (10 marks)
$$\operatorname{var}(\log RR) = \frac{s_0^2}{c_0 \bar{r}_0^2} + \frac{s_1^2}{c_1 \bar{r}_1^2}$$
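The confidence interval formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `cluster_rate_ratio_ci` and the input values are hypothetical and deliberately not the trial results, so that the calculation asked for in the question is left to the candidate.

```python
import math

def cluster_rate_ratio_ci(r1, s1, c1, r0, s0, c0, z=1.96):
    """Rate ratio (intervention / control) and approximate 95% CI from cluster-level summaries.

    r1, r0: means of the cluster-level rates; s1, s0: standard deviations of the
    cluster-level rates; c1, c0: numbers of clusters (1 = intervention, 0 = control).
    """
    rr = r1 / r0
    var_log_rr = s0**2 / (c0 * r0**2) + s1**2 / (c1 * r1**2)
    half_width = z * math.sqrt(var_log_rr)
    # The interval is built on the log scale and then exponentiated
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)

# Hypothetical summaries (not the trial data): 8 clusters per arm
rr, lower, upper = cluster_rate_ratio_ci(r1=4.0, s1=1.0, c1=8, r0=8.0, s0=2.0, c0=8)
print(f"RR = {rr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Working on the log scale keeps the interval positive and makes it symmetric around the rate ratio in multiplicative terms.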
e) The researchers conducted a short, baseline interview with all participants in both
intervention and control clusters, to collect basic demographic information and to test for
HIV infection. The researchers thought that imbalance in any of these variables between
the two study arms may bias the comparison between the two arms.
(i) Why would imbalance be a greater concern in a cluster randomised trial than an
individually randomised trial with the same number of participants? (5 marks)
The researchers calculated ratio residuals at the cluster-level (i.e. the ratio of observed cases
to expected cases for each cluster), adjusting for age, gender and HIV status, and then carried
out an unpaired t-test on these ratio residuals (Stata output given below).
. ttest residuals, by(study_arm)
Two-sample t test with equal variances
----------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
-------------+--------------------------------------------------------------------
control | 10 1.367021 .1557167 .4924194 1.014766 1.719277
intervention | 10 .7440783 .1015229 .3210435 .5144176 .9737389
-------------+--------------------------------------------------------------------
combined | 20 1.05555 .1152823 .515558 .8142612 1.296838
-------------+--------------------------------------------------------------------
diff | .622943 .1858886 .2324055 1.01348
----------------------------------------------------------------------------------
diff = mean(control) - mean(intervention) t = 3.3512
Ho: diff = 0 degrees of freedom = 18
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.9982 Pr(|T| > |t|) = 0.0036 Pr(T > t) = 0.0018
(ii) Does a mean ratio residual of 0.744 in the intervention arm imply that the
intervention increased or decreased the incidence rate of TB? Explain your answer.
(5 marks)
(iii) Calculate and interpret the adjusted rate ratio and its 95% confidence interval
calculated from the cluster-level summaries. (10 marks)
END OF QUESTIONS
(A formulae sheet and statistical tables follow)
SUMMARY OF STATISTICAL FORMULAE
MSc/Postgraduate Diploma Epidemiology, June 2011 examinations

This summary sheet includes formulae from all EP modules and so will include formulae which some students are not familiar with; students are only expected to be able to apply formulae covered in modules they have studied. Please note however that more basic formulae are not included here and students are expected to know these.

1) Single Sample:
a) Proportion, $\pi$, $SE(p) = \sqrt{\pi(1-\pi)/n}$, estimated as
$SE(p) = \sqrt{p(1-p)/n}$, for confidence intervals
95% confidence interval for $\pi$: $p \pm 1.96 \times SE(p)$
$SE(p) = \sqrt{\pi_0(1-\pi_0)/n}$, for significance tests
Test hypothesis $\pi = \pi_0$: $z = (p - \pi_0)/SE(p)$
b) Mean, $\mu$, estimated as $\bar{x}$, with $SE(\bar{x}) = \sigma/\sqrt{n}$, estimated as $SE(\bar{x}) = s/\sqrt{n}$
i) Large Sample
95% confidence interval for $\mu$: $\bar{x} \pm 1.96 \times SE(\bar{x})$
Test hypothesis $\mu = \mu_0$: $z = (\bar{x} - \mu_0)/SE(\bar{x})$
ii) Small Sample
95% confidence interval for $\mu$: $\bar{x} \pm t_{\nu,0.05} \times SE(\bar{x})$
where $\nu = n - 1$ and $t_{\nu,0.05}$ is the 2-tailed 5% point of a t-
distribution with $\nu$ degrees of freedom (df)
Test hypothesis $\mu = \mu_0$: $t = (\bar{x} - \mu_0)/SE(\bar{x})$, df $= n - 1$
2) Two Independent Samples:
a) Difference in proportions, $p_1 - p_2$ (where $p_1 = r_1/n_1$ and $p_2 = r_2/n_2$)
95% confidence interval for $\pi_1 - \pi_2$: $(p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2)$
where $SE(p_1 - p_2)$ is estimated as: $\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$
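Test hypothesis $\pi_1 = \pi_2$: $z = (p_1 - p_2)/SE_{pooled}(p_1 - p_2)$
where $SE_{pooled}(p_1 - p_2)$ is estimated as: $\sqrt{p(1-p)(1/n_1 + 1/n_2)}$
and the common proportion, $p = (r_1 + r_2)/(n_1 + n_2)$
A slightly more conservative test uses a continuity correction, where
$z = \left(|p_1 - p_2| - \tfrac{1}{2}(1/n_1 + 1/n_2)\right)/SE_{pooled}(p_1 - p_2)$,
or analyse as a contingency table (see 6 below)
b) Difference in means, $\bar{x}_1 - \bar{x}_2$
i) Large Samples $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$, estimated as $\sqrt{s_1^2/n_1 + s_2^2/n_2}$
95% confidence interval for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$
Test hypothesis $\mu_1 = \mu_2$: $z = (\bar{x}_1 - \bar{x}_2)/SE(\bar{x}_1 - \bar{x}_2)$
ii) Small Samples (where $\sigma_1 = \sigma_2 = \sigma$)
$SE(\bar{x}_1 - \bar{x}_2) = \sigma\sqrt{1/n_1 + 1/n_2}$, estimated as $s\sqrt{1/n_1 + 1/n_2}$
where $s^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
95% confidence interval for $\mu_1 - \mu_2$: $(\bar{x}_1 - \bar{x}_2) \pm t_{\nu,0.05} \times SE(\bar{x}_1 - \bar{x}_2)$
where $\nu = n_1 + n_2 - 2$ and $t_{\nu,0.05}$ is the 2-tailed 5% point
of a t-distribution with $\nu$ degrees of freedom (df)
Test hypothesis $\mu_1 = \mu_2$: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}$, df $= n_1 + n_2 - 2$
3) Paired Samples
a. Difference in means, $\bar{x}_1 - \bar{x}_2$
Take differences in paired values; analyse differences using formulae for single sample mean [1(b)].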
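As a consistency check on the small-sample formulae in 2(b)(ii) above, the sketch below recomputes the standard error and t statistic from summary statistics alone. It is written in Python for illustration, with a hypothetical helper name (`pooled_t_from_summary`); the inputs are the per-arm means and standard deviations printed in the first `ttest` output of Question 2(d).

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t-test from summary statistics.

    Returns (difference in means, standard error, t statistic, degrees of freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Cluster-level rates per 100 person-years, as printed in the ttest output
diff, se, t_stat, df = pooled_t_from_summary(5.714149, 1.01336, 10, 3.139567, 0.3116527, 10)
print(f"diff = {diff:.6f}, SE = {se:.6f}, t = {t_stat:.4f}, df = {df}")
```

The values agree with the printed output (standard error .335265, t = 7.6792, 18 degrees of freedom) up to rounding.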
b. Difference in proportions, $p_1 - p_2$, $SE(p_1 - p_2) = \sqrt{r + s}/N$
95% confidence interval for $\pi_1 - \pi_2$: $(p_1 - p_2) \pm 1.96 \times SE(p_1 - p_2)$
Test hypothesis $\pi_1 = \pi_2$: $X^2_{paired} = (r - s)^2/(r + s)$, df = 1
where r and s are the number of discordant pairs, and N is the total number of pairs
4) r x c contingency table
Test hypothesis of no association: $X^2 = \sum (O - E)^2/E$, df = (r-1) × (c-1)
Where: O = observed number in a cell
E = expected number in a cell under the null hypothesis
r = number of rows, c = number of columns
5) 2 x c contingency table
Assign score to each column of table
Test hypothesis of no linear trend:
$X^2 = \dfrac{(\bar{x}_1 - \bar{x}_2)^2}{s^2 (1/n_1 + 1/n_2)}$, df = 1
Where $\bar{x}_1$ = mean score for subjects in row 1 of table
$\bar{x}_2$ = mean score for subjects in row 2 of table
$n_1$ = number of subjects in row 1 of table
$n_2$ = number of subjects in row 2 of table
s = standard deviation of scores combining subjects in rows 1 and 2
6) 2 x 2 contingency table
a     b     a+b
c     d     c+d
a+c   b+d   N
Test hypothesis of no association:
$X^2 = \dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$, df = 1
A slightly more conservative test uses a continuity correction,
$X^2 = \dfrac{N(|ad - bc| - N/2)^2}{(a+b)(c+d)(a+c)(b+d)}$, df = 1
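The two forms of the 2 x 2 test in section 6 can be compared with a short Python sketch. The cell counts are hypothetical and the function name `chi2_2x2` is an assumption of this example. One detail goes beyond the sheet: the corrected numerator is floored at zero, a common safeguard for when |ad − bc| is smaller than N/2.

```python
import math

def chi2_2x2(a, b, c, d, continuity=False):
    """Chi-squared statistic (df = 1) and p-value for a 2 x 2 table with cells a, b, c, d."""
    n = a + b + c + d
    delta = abs(a * d - b * c)
    if continuity:
        # Continuity correction; floored at zero (a safeguard not stated on the sheet)
        delta = max(delta - n / 2, 0.0)
    x2 = n * delta**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1 the upper-tail probability is erfc(sqrt(x2 / 2))
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p

# Hypothetical table: a=10, b=20, c=30, d=40
x2_plain, p_plain = chi2_2x2(10, 20, 30, 40)
x2_corr, p_corr = chi2_2x2(10, 20, 30, 40, continuity=True)
print(f"uncorrected X2 = {x2_plain:.4f} (p = {p_plain:.3f})")
print(f"corrected   X2 = {x2_corr:.4f} (p = {p_corr:.3f})")
```

The corrected statistic is never larger than the uncorrected one, which is why the sheet describes it as slightly more conservative.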
7) Mantel-Haenszel χ2 test for several 2 x 2 tables:
Test hypothesis of no association:
$X^2 = \dfrac{\left(\sum a_i - \sum E(a_i)\right)^2}{\sum V(a_i)}$, df = 1
where $E(a_i) = \dfrac{(a_i + b_i)(a_i + c_i)}{n_i}$ and $V(a_i) = \dfrac{(a_i + b_i)(c_i + d_i)(a_i + c_i)(b_i + d_i)}{n_i^2 (n_i - 1)}$
Mantel-Haenszel Odds Ratio = $\dfrac{\sum a_i d_i / n_i}{\sum b_i c_i / n_i}$ where the summation is over each of
the strata.
8) Linear regression:
Equation of fitted line y = a + b x
95% confidence interval for $\beta$ (the true slope): $b \pm t_{\nu,0.05} \times SE(b)$
where $\nu = n - 2$ and $t_{\nu,0.05}$ is the 2-tailed 5% point of a t-distribution with $\nu$ degrees
of freedom (df)
Test hypothesis of no linear association: $t = b/SE(b)$, df = $n - 2$
Alternatively, the same test in terms of the correlation coefficient r:
$t = r\sqrt{\dfrac{n - 2}{1 - r^2}}$, d.f. = $n - 2$.
9) Likelihood Ratio Test
The likelihood ratio statistic (LRS) for testing for an association is calculated as: $LRS = 2 \times (L_1 - L_0)$, where L1 is the log likelihood of the model with the exposure variable, and L0 is the log likelihood of the model without the exposure variable. The LRS is then referred to the χ2 distribution, with the degrees of freedom equal to the number of parameters that were excluded from the model.
10) Population attributable risk & population attributable risk fraction
r0 is risk (or rate) in unexposed group, r1 is risk (or rate) in exposed group; r is risk (or rate) in total study population, p is proportion of exposed in the population, p1 is the proportion of exposed among cases, RR is risk ratio (rate ratio, odds ratio)
PAR = r – r0, or PAR = p(r1 – r0)
PAF = PAR / r
So PAF = (r – r0) / r or PAF = p(RR–1) / [p(RR–1) + 1]
Also: PAF = [p1 (RR – 1)] / RR. For matched case control studies, this formula is used with RR the matched odds ratio. This formula is also used when adjusting for confounding, with RR the adjusted rate ratio (or odds ratio for exposure in a case control study) obtained by stratification or regression methods.
11) Risk Ratio and Odds Ratio: Error Factor (EF) for use in calculation of 95% confidence
intervals:
            Outcome
Exposure    Yes   No
Yes         a     b
No          c     d
95% confidence limits for the risk ratio, RR, in cohort or cross-sectional studies, are given by (RR/EF) to (RRxEF) where EF is the error factor:
EF = exp (1.96 x $\sqrt{1/a - 1/(a+b) + 1/c - 1/(c+d)}$)
95% confidence limits for the odds ratio, OR, for cross-sectional or unmatched case control studies, are given by (OR/EF) to (ORxEF) where EF is the error factor:
EF = exp (1.96 x $\sqrt{1/a + 1/b + 1/c + 1/d}$)
For 1:1 matched case control studies, the 95% confidence limits for the odds ratio are given by (OR/EF) to (ORxEF), where OR is the matched odds ratio and
EF = exp (1.96 x $\sqrt{1/r + 1/s}$), where r and s are the numbers of discordant pairs.
12) Rates and the Rate Ratio:
95% confidence limits for a rate are given by: (R/EF) to (RxEF) where EF is the error factor:
EF = exp (1.96 x √(1/e)), where e is the number of events observed.
95% confidence limits for the rate ratio RR are given by (RR/EF) to (RRxEF) where EF is the error factor:
EF = exp (1.96 x $\sqrt{1/e_1 + 1/e_2}$) where e1 and e2 are the number of events in the
exposed and unexposed groups.
13) Vaccine efficacy: When the two groups being compared are vaccinated and unvaccinated
individuals in a cohort study or randomized trial, vaccine efficacy is defined as: 100 x (1-RR), where RR is the ratio of the incidence rate in the
vaccinated group to the incidence rate in the unvaccinated group.
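The error-factor formulae for rates in section 12 can be illustrated with a short Python sketch. The function names `rate_ci` and `rate_ratio_ci` are hypothetical. The inputs are the event counts (D) and person-years (Y, printed in thousands) from the `strate residence` output in Question 1(b), so the result can be compared with the limits Stata printed; small differences in the last decimal place are expected from rounding.

```python
import math

def rate_ci(events, person_time, per=1000, z=1.96):
    """Rate per `per` units of person-time, with limits (R/EF, R*EF)."""
    rate = events / person_time * per
    ef = math.exp(z * math.sqrt(1 / events))
    return rate, rate / ef, rate * ef

def rate_ratio_ci(e1, t1, e2, t2, z=1.96):
    """Rate ratio (group 1 vs group 2), with limits (RR/EF, RR*EF)."""
    rr = (e1 / t1) / (e2 / t2)
    ef = math.exp(z * math.sqrt(1 / e1 + 1 / e2))
    return rr, rr / ef, rr * ef

# Peri-urban children: 183 events in 2.2741 thousand person-years
rate, lo, hi = rate_ci(183, 2274.1)
print(f"rate = {rate:.2f} per 1000 py, 95% CI {lo:.2f} to {hi:.2f}")

# Rural (100 events, 3148.5 py) compared with peri-urban
rr, rr_lo, rr_hi = rate_ratio_ci(100, 3148.5, 183, 2274.1)
print(f"rate ratio = {rr:.3f}, 95% CI {rr_lo:.3f} to {rr_hi:.3f}")
```

Only the event counts enter the error factor; the person-time affects the point estimate but not the width of the interval on the ratio scale.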
Table A1 Areas in tail of the standard normal distribution.
Tabulated area: Proportion of the area of the standard normal distribution that is above z
       Second decimal place of z
z      0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0    0.5000   0.4960   0.4920   0.4880   0.4840   0.4801   0.4761   0.4721   0.4681   0.4641
0.1    0.4602   0.4562   0.4522   0.4483   0.4443   0.4404   0.4364   0.4325   0.4286   0.4247
0.2    0.4207   0.4168   0.4129   0.4090   0.4052   0.4013   0.3974   0.3936   0.3897   0.3859
0.3    0.3821   0.3783   0.3745   0.3707   0.3669   0.3632   0.3594   0.3557   0.3520   0.3483
0.4    0.3446   0.3409   0.3372   0.3336   0.3300   0.3264   0.3228   0.3192   0.3156   0.3121
0.5    0.3085   0.3050   0.3015   0.2981   0.2946   0.2912   0.2877   0.2843   0.2810   0.2776
0.6    0.2743   0.2709   0.2676   0.2643   0.2611   0.2578   0.2546   0.2514   0.2483   0.2451
0.7    0.2420   0.2389   0.2358   0.2327   0.2296   0.2266   0.2236   0.2206   0.2177   0.2148
0.8    0.2119   0.2090   0.2061   0.2033   0.2005   0.1977   0.1949   0.1922   0.1894   0.1867
0.9    0.1841   0.1814   0.1788   0.1762   0.1736   0.1711   0.1685   0.1660   0.1635   0.1611
1.0    0.1587   0.1562   0.1539   0.1515   0.1492   0.1469   0.1446   0.1423   0.1401   0.1379
1.1    0.1357   0.1335   0.1314   0.1292   0.1271   0.1251   0.1230   0.1210   0.1190   0.1170
1.2    0.1151   0.1131   0.1112   0.1093   0.1075   0.1056   0.1038   0.1020   0.1003   0.0985
1.3    0.0968   0.0951   0.0934   0.0918   0.0901   0.0885   0.0869   0.0853   0.0838   0.0823
1.4    0.0808   0.0793   0.0778   0.0764   0.0749   0.0735   0.0721   0.0708   0.0694   0.0681
1.5    0.0668   0.0655   0.0643   0.0630   0.0618   0.0606   0.0594   0.0582   0.0571   0.0559
1.6    0.0548   0.0537   0.0526   0.0516   0.0505   0.0495   0.0485   0.0475   0.0465   0.0455
1.7    0.0446   0.0436   0.0427   0.0418   0.0409   0.0401   0.0392   0.0384   0.0375   0.0367
1.8    0.0359   0.0351   0.0344   0.0336   0.0329   0.0322   0.0314   0.0307   0.0301   0.0294
1.9    0.0287   0.0281   0.0274   0.0268   0.0262   0.0256   0.0250   0.0244   0.0239   0.0233
2.0    0.02275  0.02222  0.02169  0.02118  0.02068  0.02018  0.01970  0.01923  0.01876  0.01831
2.1    0.01786  0.01743  0.01700  0.01659  0.01618  0.01578  0.01539  0.01500  0.01463  0.01426
2.2    0.01390  0.01355  0.01321  0.01287  0.01255  0.01222  0.01191  0.01160  0.01130  0.01101
2.3    0.01072  0.01044  0.01017  0.00990  0.00964  0.00939  0.00914  0.00889  0.00866  0.00842
2.4    0.00820  0.00798  0.00776  0.00755  0.00734  0.00714  0.00695  0.00676  0.00657  0.00639
2.5    0.00621  0.00604  0.00587  0.00570  0.00554  0.00539  0.00523  0.00508  0.00494  0.00480
2.6    0.00466  0.00453  0.00440  0.00427  0.00415  0.00402  0.00391  0.00379  0.00368  0.00357
2.7    0.00347  0.00336  0.00326  0.00317  0.00307  0.00298  0.00289  0.00280  0.00272  0.00264
2.8    0.00256  0.00248  0.00240  0.00233  0.00226  0.00219  0.00212  0.00205  0.00199  0.00193
2.9    0.00187  0.00181  0.00175  0.00169  0.00164  0.00159  0.00154  0.00149  0.00144  0.00139
3.0    0.00135  0.00131  0.00126  0.00122  0.00118  0.00114  0.00111  0.00107  0.00104  0.00100
3.1    0.00097  0.00094  0.00090  0.00087  0.00084  0.00082  0.00079  0.00076  0.00074  0.00071
3.2    0.00069  0.00066  0.00064  0.00062  0.00060  0.00058  0.00056  0.00054  0.00052  0.00050
3.3    0.00048  0.00047  0.00045  0.00043  0.00042  0.00040  0.00039  0.00038  0.00036  0.00035
3.4    0.00034  0.00032  0.00031  0.00030  0.00029  0.00028  0.00027  0.00026  0.00025  0.00024
3.5    0.00023  0.00022  0.00022  0.00021  0.00020  0.00019  0.00019  0.00018  0.00017  0.00017
3.6    0.00016  0.00015  0.00015  0.00014  0.00014  0.00013  0.00013  0.00012  0.00012  0.00011
3.7    0.00011  0.00010  0.00010  0.00010  0.00009  0.00009  0.00008  0.00008  0.00008  0.00008
3.8    0.00007  0.00007  0.00007  0.00006  0.00006  0.00006  0.00006  0.00005  0.00005  0.00005
3.9    0.00005  0.00005  0.00004  0.00004  0.00004  0.00004  0.00004  0.00004  0.00003  0.00003
Table A2 Percentage points of the t distribution.
One-sided P value
0.25 0.1 0.05 0.025 0.01 0.005 0.0025 0.001 0.0005
Two-sided P value
d.f. 0.5 0.2 0.1 0.05 0.02 0.01 0.005 0.002 0.001
1 1.00 3.08 6.31 12.71 31.82 63.66 127.32 318.31 636.62
2 0.82 1.89 2.92 4.30 6.96 9.92 14.09 22.33 31.60
3 0.76 1.64 2.35 3.18 4.54 5.84 7.45 10.21 12.92
4 0.74 1.53 2.13 2.78 3.75 4.60 5.60 7.17 8.61
5 0.73 1.48 2.02 2.57 3.36 4.03 4.77 5.89 6.87
6 0.72 1.44 1.94 2.45 3.14 3.71 4.32 5.21 5.96
7 0.71 1.42 1.90 2.36 3.00 3.50 4.03 4.78 5.41
8 0.71 1.40 1.86 2.31 2.90 3.36 3.83 4.50 5.04
9 0.70 1.38 1.83 2.26 2.82 3.25 3.69 4.30 4.78
10 0.70 1.37 1.81 2.23 2.76 3.17 3.58 4.14 4.59
11 0.70 1.36 1.80 2.20 2.72 3.11 3.50 4.02 4.44
12 0.70 1.36 1.78 2.18 2.68 3.06 3.43 3.93 4.32
13 0.69 1.35 1.77 2.16 2.65 3.01 3.37 3.85 4.22
14 0.69 1.34 1.76 2.14 2.62 2.98 3.33 3.79 4.14
15 0.69 1.34 1.75 2.13 2.60 2.95 3.29 3.73 4.07
16 0.69 1.34 1.75 2.12 2.58 2.92 3.25 3.69 4.02
17 0.69 1.33 1.74 2.11 2.57 2.90 3.22 3.65 3.96
18 0.69 1.33 1.73 2.10 2.55 2.88 3.20 3.61 3.92
19 0.69 1.33 1.73 2.09 2.54 2.86 3.17 3.58 3.88
20 0.69 1.32 1.72 2.09 2.53 2.84 3.15 3.55 3.85
21 0.69 1.32 1.72 2.08 2.52 2.83 3.14 3.53 3.82
22 0.69 1.32 1.72 2.07 2.51 2.82 3.12 3.50 3.79
23 0.68 1.32 1.71 2.07 2.50 2.81 3.10 3.48 3.77
24 0.68 1.32 1.71 2.06 2.49 2.80 3.09 3.47 3.74
25 0.68 1.32 1.71 2.06 2.48 2.79 3.08 3.45 3.72
26 0.68 1.32 1.71 2.06 2.48 2.78 3.07 3.44 3.71
27 0.68 1.31 1.70 2.05 2.47 2.77 3.06 3.42 3.69
28 0.68 1.31 1.70 2.05 2.47 2.76 3.05 3.41 3.67
29 0.68 1.31 1.70 2.04 2.46 2.76 3.04 3.40 3.66
30 0.68 1.31 1.70 2.04 2.46 2.75 3.03 3.38 3.65
40 0.68 1.30 1.68 2.02 2.42 2.70 2.97 3.31 3.55
60 0.68 1.30 1.67 2.00 2.39 2.66 2.92 3.23 3.46
120 0.68 1.29 1.66 1.98 2.36 2.62 2.86 3.16 3.37
∞ 0.67 1.28 1.65 1.96 2.33 2.58 2.81 3.09 3.29
Table A3 Percentage points of the χ² distribution.
In the comparison of two proportions (2 × 2 χ² or Mantel–Haenszel χ² test) or in the assessment of a trend, the percentage points give a two-sided test. A one-sided test may be obtained by halving the P values. (Concepts of one- and two-sidedness do not apply to larger degrees of freedom, as these relate to tests of multiple comparisons.)
P value
d.f. 0.5 0.25 0.1 0.05 0.025 0.01 0.005 0.001
1 0.45 1.32 2.71 3.84 5.02 6.63 7.88 10.83
2 1.39 2.77 4.61 5.99 7.38 9.21 10.60 13.82
3 2.37 4.11 6.25 7.81 9.35 11.34 12.84 16.27
4 3.36 5.39 7.78 9.49 11.14 13.28 14.86 18.47
5 4.35 6.63 9.24 11.07 12.83 15.09 16.75 20.52
6 5.35 7.84 10.64 12.59 14.45 16.81 18.55 22.46
7 6.35 9.04 12.02 14.07 16.01 18.48 20.28 24.32
8 7.34 10.22 13.36 15.51 17.53 20.09 21.96 26.13
9 8.34 11.39 14.68 16.92 19.02 21.67 23.59 27.88
10 9.34 12.55 15.99 18.31 20.48 23.21 25.19 29.59
11 10.34 13.70 17.28 19.68 21.92 24.73 26.76 31.26
12 11.34 14.85 18.55 21.03 23.34 26.22 28.30 32.91
13 12.34 15.98 19.81 22.36 24.74 27.69 29.82 34.53
14 13.34 17.12 21.06 23.68 26.12 29.14 31.32 36.12
15 14.34 18.25 22.31 25.00 27.49 30.58 32.80 37.70
16 15.34 19.37 23.54 26.30 28.85 32.00 34.27 39.25
17 16.34 20.49 24.77 27.59 30.19 33.41 35.72 40.79
18 17.34 21.60 25.99 28.87 31.53 34.81 37.16 42.31
19 18.34 22.72 27.20 30.14 32.85 36.19 38.58 43.82
20 19.34 23.83 28.41 31.41 34.17 37.57 40.00 45.32
21 20.34 24.93 29.62 32.67 35.48 38.93 41.40 46.80
22 21.34 26.04 30.81 33.92 36.78 40.29 42.80 48.27
23 22.34 27.14 32.01 35.17 38.08 41.64 44.18 49.73
24 23.34 28.24 33.20 36.42 39.36 42.98 45.56 51.18
25 24.34 29.34 34.38 37.65 40.65 44.31 46.93 52.62
26 25.34 30.43 35.56 38.89 41.92 45.64 48.29 54.05
27 26.34 31.53 36.74 40.11 43.19 46.96 49.64 55.48
28 27.34 32.62 37.92 41.34 44.46 48.28 50.99 56.89
29 28.34 33.71 39.09 42.56 45.72 49.59 52.34 58.30
30 29.34 34.80 40.26 43.77 46.98 50.89 53.67 59.70
40 39.34 45.62 51.81 55.76 59.34 63.69 66.77 73.40
50 49.33 56.33 63.17 67.50 71.42 76.15 79.49 86.66
60 59.33 66.98 74.40 79.08 83.30 88.38 91.95 99.61
70 69.33 77.58 85.53 90.53 95.02 100.43 104.22 112.32
80 79.33 88.13 96.58 101.88 106.63 112.33 116.32 124.84
90 89.33 98.65 107.57 113.15 118.14 124.12 128.30 137.21
100 99.33 109.14 118.50 124.34 129.56 135.81 140.17 149.45
EP304: Advanced Statistical Methods in
Epidemiology
Examiner’s Report
Question 1
The question tested students’ understanding of survival analysis. Students were required to
show knowledge and understanding of: classical (stratified) methods of analysis, including
assessment of confounding and interactions; and Poisson regression, including assessment of
confounding and use of random effects models to account for multiple episodes. A nested
matched case-control study was also included, that required students to understand: the
exposure odds ratio as an estimate of the population odds, risk or rate ratio; and methods of
analysis, including conditional logistic regression.
(a) This was generally answered well. Part (i) required students to note that repeat events
in the same child may not be independent, i.e. there may be within-child correlation
for the event rate. Ignoring this could result in standard errors being too small, and
hence confidence intervals being too narrow and p-values being too small. Part (ii)
required noting that very few children have more than one event and so it may not be
very important to properly account for repeated events in the analysis. Part
(iii) required students to realise that the origin was set at the date of birth and so
follow-up time was based on age. Hence, the earliest observed entry of 0 showed that
at least one child was included since birth and the last observed exit of 5
showed that at least one child was included in the study until they were 5 years old.
Part (iv) was a calculation using the number of failures (283) and the total time at risk
(5422.591) to give a rate of 283 / 5422.591 = 0.052 per year. Full marks were also
given for 5.2 per 100 person-years or 52 per 1000 person-years. Part (v) needed
students to give a description that included the following points: follow-up started on
1 January 2009, and the child presented at the clinic with severe pneumonia on 8
January 2009. They were then followed up until 28 July 2009 and at the time the
follow-up stopped the child did not have severe pneumonia.
Students who did not get full marks generally lost them because their answers were not
thorough enough. For example, some stated that repeat events lead to
dependent data, but did not add that ignoring this would lead to smaller standard errors and
so narrower confidence intervals and smaller p-values in analysis.
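The rate in part (iv) can be checked with a few lines of Python. This is only a sketch using the two figures quoted above; the variable names are illustrative and do not come from the exam output.

```python
# Figures quoted in the report for part (a)(iv)
failures = 283            # episodes of severe pneumonia
person_years = 5422.591   # total time at risk, in years

rate = failures / person_years   # events per person-year

print(f"{rate:.3f} per person-year")
print(f"{rate * 100:.1f} per 100 person-years")
print(f"{rate * 1000:.0f} per 1000 person-years")
```

All three lines express the same rate; only the person-time multiplier changes.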
(b) Part (i) was answered well with students expected to observe strong statistical
evidence of an association and to note the following points for full marks: the rate of
severe pneumonia was much lower among children living in the rural area compared
with the peri-urban area, the confidence intervals for the two rates did not overlap, the
rate ratio was 0.40 with a 95% confidence interval from 0.31 to 0.50 and there was
very strong evidence that the rate ratio is different to 1 (p<0.001). Part (ii) was
answered less well – several students did not realise that the timeband variable
referred to age and almost no students noted that it referred to current age. Timeband
.5 corresponded to when a child’s age was between 6 months and one year old. Part
(iii) was answered well, with most students correctly citing the very different rates in
the 5 age groups with most of the confidence intervals not overlapping as evidence of
an association.
(c) This section was answered well by most students, with marks generally lost for
incomplete answers rather than lack of understanding. There was very little
confounding: the crude and adjusted rate ratios were very similar (0.375 compared
with 0.395). There was also no evidence of effect modification: the age-specific rate
ratios were very similar and the test for effect modification gave
p=0.98.
(d) Part (i) was answered very well with students correctly noting that child sex did not
confound the association between area of residence and the rate of severe pneumonia
– the rate ratio was 0.376 adjusted for child sex compared with 0.375 adjusted only
for child age, which is a very small difference. Part (ii) was not answered so well,
with students averaging half marks. Model A assumes a categorical variable for
timeband, while Model B assumes a linear association. The null hypothesis was that
the association is linear and so a p-value of <0.001 leads us to reject the null and
conclude that there is strong evidence that the association between child age and the
rate of severe pneumonia does not follow a linear trend (on the log(rate) scale), after
adjusting for area of residence and sex. Many students lost a mark for not noting that
the test was adjusted for area of residence and sex.
(e) This section was generally answered well. Part (i) required students to note that there
was very little change to the conclusion with the point estimate of the rate ratio being
very similar, as were the standard errors and the confidence intervals. In part (ii), the
p-value given for the likelihood ratio test of alpha was 0.051, giving many students
problems in terms of either stating “clearly no effect” or “evidence for an effect”,
rather than quantifying this as “weak evidence”. This also led some students to say
there was clearly an important effect of clustering, when in reality the rate ratios and
their standard errors did not change much (as very few children had multiple
episodes).
(f) Part (i) was answered poorly by many students. Pneumonia could be a recurrent
disease in this study, with children having repeat episodes and so it is appropriate to
calculate a rate ratio, rather than an odds or risk ratio. By sampling concurrently, the
rate ratio will be estimated. Part (ii) required students to calculate the exposure odds
ratio as 38/11 = 3.45. Part (iii) gave an odds ratio of 7.00 for boys and 6.999 × 0.3809
= 2.67 for girls. Parts (ii) and (iii) were generally answered well, but several students
seemed to have run out of time and so got zero marks for this section.
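The arithmetic for parts (ii) and (iii) can be sketched as follows. The interpretation of the inputs is an assumption, since the exam tables are not reproduced in this report: 38 and 11 are taken to be the two counts whose ratio gives the exposure odds ratio, 6.999 the odds ratio for boys, and 0.3809 the multiplicative interaction term for girls.

```python
# Part (f)(ii): exposure odds ratio as the ratio of the two quoted counts
# (assumed to be the counts the examiner divides; names are illustrative)
count_a = 38
count_b = 11
exposure_or = count_a / count_b

# Part (f)(iii): odds ratio for girls = odds ratio for boys x interaction term
# (assumed reading of the two quoted numbers)
or_boys = 6.999
interaction = 0.3809
or_girls = or_boys * interaction

print(f"Exposure odds ratio: {exposure_or:.2f}")
print(f"Odds ratio, boys:  {or_boys:.2f}")
print(f"Odds ratio, girls: {or_girls:.2f}")
```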
Question 2
The question tested students’ understanding of cluster randomised trials: including design
issues; intra-cluster correlation and between-cluster variation; sample size issues; and both
unadjusted and adjusted cluster-level analysis.
(a) This was generally answered well. However, students often gave standard responses
from the material, without thinking whether these would apply in the given situation.
Any reasonable answers were allowed for part (i), (ii) and (iii).
For part (i) these included:
Advantages – likely to be a mass effect and would want to capture this; may be
logistically easier if only have to do intervention in fewer healthcare facilities, rather
than all healthcare facilities if an individually randomised trial were to be conducted;
avoid the contamination that would occur if an individually randomised trial was
done, due to intervention and control people working together. Disadvantages –
cluster randomised trials need larger sample size; higher possibility for baseline
imbalance between study arms; analysis is more complicated.
For part (ii) these included:
Advantages – logistically easier to do the study in a few, large clusters.
Disadvantages – larger sample size likely to be required if using a few large clusters
than many small clusters.
For part (iii) these included:
Advantages – in a pair matched design, k is the between cluster variation within
matched pairs, which may be substantially smaller than the unmatched k, giving a
smaller sample size.
Disadvantages – the number of effective observations is halved, so if the matching is
not very effective, a larger sample size will be required; if one cluster drops out of a
study (e.g. a hospital is closed down), then both clusters in the pair are lost from the
study.
(b) Part (i) was answered by almost all students, but with high variability in the quality of
the answers. A good answer would consist of the following points:
Intra-cluster correlation is present if observations from two individuals in the same
cluster are more alike than observations from two individuals in different clusters.
Between-cluster variation is present if the true mean of the outcome variable differs
among clusters.
If individuals within a cluster are no more alike than they are to individuals in a
different cluster, then there will be no variation in the true mean value of the outcome
variable between clusters. Conversely, if individuals in the same cluster are more
alike than they are to individuals in a different cluster, then there will be between-
cluster variation for the true mean of the outcome variable.
An intra-cluster correlation coefficient of 0 means that individuals in the same cluster
are no more alike than they are to individuals in a different cluster. An intra-cluster
correlation coefficient of 1 means that all individuals in the same cluster have the
same value for the outcome variable, and this happens for all clusters.
Part (ii) required students to know the formula was CoV (or “k”) = SD/mean and then
to calculate this as 1.5/5 = 0.3. However, this was answered poorly or not at all by
many students. Many gave the coefficient of variation as mean/SD instead of
SD/mean.
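As a quick check of part (ii), the sketch below computes the coefficient of variation from the two quoted figures and also shows the inverted ratio that many students gave. The variable names are illustrative only.

```python
import math

# Figures quoted in the report for part (b)(ii)
sd_between_clusters = 1.5   # SD of the true cluster rates
mean_rate = 5.0             # mean rate across clusters

k = sd_between_clusters / mean_rate          # correct: SD / mean
inverted = mean_rate / sd_between_clusters   # common error: mean / SD

print(f"k = SD/mean = {k:.1f}")
print(f"mean/SD = {inverted:.2f} (not the coefficient of variation)")
```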
Part (iii) was answered well by most students – if the between-cluster variation for the
true TB incidence rate is large, then it will be harder to detect a difference between
study arms against this “background noise” and so a larger sample size will be
required to get the necessary power.
(c) This section was answered well by most students. For part (i), students correctly
noted that p-values will be too small and confidence intervals will be too narrow, and
so the evidence for an effect will be exaggerated. For part (ii), they noted that random
effects models tend not to give reliable results with fewer than around 15 clusters per arm
(compared to only 10 clusters per arm in this study).
(d) Part (i) was answered well. Full marks were given for noting all of the following
points: the rate difference is 2.57 i.e. the rate in the intervention arm is 2.57/100py
less than in the control arm, with a 95% CI of 1.87 to 3.28 i.e. we are 95% confident
that the true difference lies between 1.87/100py and 3.28/100py less in the
intervention arm and a p-value <0.001 suggesting very strong evidence that this
difference is not due to sampling variation.
However, part (ii) was answered poorly by most students. The necessary formulae
were given, yet many students struggled with the calculations required, even if they
correctly identified all the numbers to go into the formula. This may have been due to
exam stress. The correct calculation was as follows:
The rate ratio is 3.140/5.714 = 0.550 and so the rate of tuberculosis in the intervention
arm was 45.0% less than in the control arm. The variance of the log (rate ratio) was
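$$V = \frac{s_1^2}{c_1 r_1^2} + \frac{s_0^2}{c_0 r_0^2} = \frac{0.3117^2}{10 \times 3.140^2} + \frac{1.013^2}{10 \times 5.714^2} = 0.004128$$
$$EF = \exp(1.96\sqrt{V}) = \exp(1.96\sqrt{0.004128}) = 1.134$$
Lower limit: RR/EF = 0.550/1.134 = 0.48.
Upper limit: RR × EF = 0.550 × 1.134 = 0.62.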
Hence, we are 95% confident that the true rate ratio lies between 0.48 and 0.62.
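The calculation for d(ii) can be reproduced with the short sketch below. The function name and argument names are illustrative, not part of the exam output; r, s and c stand for the rate, the standard deviation and the number of clusters in each arm, with subscript 1 for the intervention arm and 0 for the control arm, matching the order of the numbers quoted above.

```python
import math

def rate_ratio_ci(r1, s1, c1, r0, s0, c0, z=1.96):
    """Rate ratio with a confidence interval based on an error factor.

    V is the variance of log(rate ratio); EF = exp(z * sqrt(V)).
    Returns (rate ratio, V, EF, lower limit, upper limit).
    """
    rr = r1 / r0
    v = s1**2 / (c1 * r1**2) + s0**2 / (c0 * r0**2)
    ef = math.exp(z * math.sqrt(v))
    return rr, v, ef, rr / ef, rr * ef

# Numbers quoted in the report for the unadjusted analysis
rr, v, ef, lower, upper = rate_ratio_ci(3.140, 0.3117, 10, 5.714, 1.013, 10)

print(f"RR = {rr:.3f}, V = {v:.6f}, EF = {ef:.3f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

Carrying full precision through the calculation gives the same limits, to two decimal places, as the rounded hand calculation.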
(e) This section was answered poorly by most students (average mark of 6.7 out of 20
marks). This is some of the most advanced material in the course and so perhaps this
is not surprising.
Part (i) required students to note that there are only ten clusters in each arm and so the
likelihood of imbalance arising by chance in at least one variable related to the
outcome (e.g. gender, age, HIV status, prior history of TB, etc.) is much higher than
in an individually randomised trial with 2,500 participants per study arm.
In part (ii), a mean ratio residual of 0.744 implies that the intervention has decreased
the number of cases, as the ratio residual is the ratio of observed cases to expected
cases, where the expected cases are calculated assuming no intervention effect, but
based on the characteristics of the study participants in the cluster. Hence a mean ratio
residual of 0.744 implies 26% fewer cases were observed in the intervention arm than
expected, given the characteristics of those in the intervention arm, and so it may be
thought that the difference is due to the intervention (or imbalance in other variables
that were not adjusted for).
Part (iii) required students to remember that they could use the same parts of the
output in the same formula as they did for d(ii). The correct calculation is as follows:
The adjusted rate ratio is 0.7441/1.367 = 0.544 and so the rate of tuberculosis in the
intervention arm was 45.6% less than in the control arm, adjusting for age, gender and
HIV status. The variance of the log(rate ratio) was
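$$V = \frac{s_1^2}{c_1 r_1^2} + \frac{s_0^2}{c_0 r_0^2} = \frac{0.3210^2}{10 \times 0.7441^2} + \frac{0.4924^2}{10 \times 1.367^2} = 0.03158$$
$$EF = \exp(1.96\sqrt{V}) = \exp(1.96\sqrt{0.03158}) = 1.417$$
Lower limit: RR/EF = 0.544/1.417 = 0.38.
Upper limit: RR × EF = 0.544 × 1.417 = 0.77.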
Hence, we are 95% confident that the true adjusted rate ratio lies between 0.38 and
0.77.
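Dividing and multiplying by the error factor is the same as building the interval on the log scale, exp(log RR ± 1.96√V), and exponentiating. The sketch below checks this for the adjusted analysis using the numbers quoted in the report; the variable names are illustrative, and subscript 1 is taken to be the intervention arm.

```python
import math

# Mean ratio residuals, their SDs, and clusters per arm, as quoted for part (e)(iii)
r1, s1, c1 = 0.7441, 0.3210, 10   # intervention arm
r0, s0, c0 = 1.367, 0.4924, 10    # control arm

adj_rr = r1 / r0
v = s1**2 / (c1 * r1**2) + s0**2 / (c0 * r0**2)

# Interval built on the log scale
half_width = 1.96 * math.sqrt(v)
log_lower = math.exp(math.log(adj_rr) - half_width)
log_upper = math.exp(math.log(adj_rr) + half_width)

# Interval built with the error factor
ef = math.exp(half_width)
ef_lower, ef_upper = adj_rr / ef, adj_rr * ef

print(f"Adjusted RR = {adj_rr:.3f}, V = {v:.5f}, EF = {ef:.3f}")
print(f"Log-scale CI:    {log_lower:.2f} to {log_upper:.2f}")
print(f"Error-factor CI: {ef_lower:.2f} to {ef_upper:.2f}")
```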