as06

Cox regression (AS06) EPM304 Advanced Statistical Methods in Epidemiology Course: PG Diploma/ MSc Epidemiology This document contains a copy of the study material located within the computer assisted learning (CAL) session. If you have any questions regarding this document or your course, please contact DLsupport via [email protected] . Important note: this document does not replace the CAL material found on your module CDROM. When studying this session, please ensure you work through the CDROM material first. This document can then be used for revision purposes to refer back to specific sessions. These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale or further copying. © London School of Hygiene & Tropical Medicine September 2013 v1.0

description

stats notes

Transcript of as06


  • Section 1: Cox regression

    Aims

    To give an introduction to modelling rates in follow-up studies using Cox regression.

    Objectives

    By the end of this session you will be able to:

    Explain the data structure for Poisson and Cox regression
    Describe the relationship between Poisson and Cox regression
    Explain how Cox regression estimates rate ratios
    Know that you can use different time scales in Cox regression models
    Explain the effect of different time scales on model estimates

    Section 2: Planning your study

    In SM11 and AS05 you were introduced to Poisson regression for modelling data from follow-up cohort studies while allowing for time-changing exposures. In a Poisson model like this, the time scale is divided into fairly broad bands such as 5- or 10-year intervals. In SM11 you were also given a brief introduction to the key concepts underlying Cox regression. The idea behind Cox regression is very similar to Poisson regression, but it is based on much narrower intervals of time. The aim of this session is to provide a simple introduction to the method. It builds on the material on Cox regression in SM11 but can largely be read on its own.

    To work through this session you should be familiar with survival analysis and regression models, specifically the Poisson model. If you need to review any materials before you continue, refer to the appropriate sessions below.

    Survival analysis: SM03
    Introduction to Poisson and Cox regression: SM11
    Stratifying on time: AS04
    Further Poisson regression: AS05

  • 2.1: Planning your study

    To illustrate the concepts and method of Cox regression we will use a simple example and data from two studies:

    1. The Whitehall study
    2. The Diet study

    Click on each of these studies to see the details below.

    Interaction: Hyperlink: The Whitehall study:

    Output (appears in new window): Whitehall study This cohort study was set up to examine risk factors for mortality in male government employees (civil servants) working around Whitehall, London. Employees were recruited between 1967 and 1970. Information on exposure to selected risk factors was obtained by a self-administered questionnaire and a screening examination during this period. All participants were followed at the National Health Service Central Registry to identify mortality and emigration. Information on death (date and cause) was provided for those who died. The results shown in this session are from a 10% random sample of the total dataset.

    Interaction: Hyperlink: The Diet study:

    Output (appears in new window): Diet study

    This study was a pilot study of the use of weighed diet records in epidemiological studies. The dataset relates subsequent incidence of coronary heart disease (CHD) to dietary energy intake. It consists of 337 men (bus drivers, bus conductors and bankers) who were aged 30 to 67 at the time of the survey. During the follow-up period, 80 deaths were recorded, 46 of them attributed to cardiovascular disease.

    Section 3: From Poisson to Cox

    To illustrate the progression from Poisson to Cox we will start with a simple example. The diagram opposite shows the follow up for 6 individuals. During the period of follow-up 4 individuals died. What is the mortality rate in this simple example, to 3 decimal places?

  • Interaction: Calculation: Rate =

    Output: Incorrect answer: Sorry, that's not right. The rate is given by Rate = D / Y, where D is the number of deaths and Y is the total person-years at risk. From the diagram opposite we can see that there are 4 deaths, and the total follow-up time is 132 person-years.

    Therefore, Rate = 4 / 132 = 0.030.

    Correct answer: Correct. Yes, Rate = D / Y, and: D = 4 (number of deaths), Y = 132 (total person-years at risk)

    So, Rate = 4 / 132 = 0.030.

    Interaction: Button: Hint:

    Output (appears in new window): Hint: Remember the rate is calculated as Rate = D / Y, where D = number of deaths (events),

    Y = total person-time at risk

  • 3.1: From Poisson to Cox

    To allow changing exposure we divide the follow-up time into bands, as shown below. To do this we use a Lexis expansion. This allows the rate to change with age.

    Complete the table below by calculating the rates in each age band to 3 decimal places. You can click on the highlighted cells to see the person-time sum in each band.

    Age band          D    Y     Rate
    30 to 39 years    0    38    (calc1)
    40 to 49 years    1    59    (calc2)
    50 to 59 years    2    29    (calc3)
    60 to 69 years    1    6     (calc4)

    Interaction: hotspot: 38 Output: Y = 2 + 8 + 7 + 10 + 2 + 9 = 38

    Interaction: hotspot: 59 Output: Y = 10 + 10 + 10 + 9 + 10 + 10 = 59

    Interaction: hotspot: 29 Output: Y = 7 + 2 + 4 + 6 + 10 = 29

    Interaction: Calculation: (calc1) Output: Incorrect answer: Sorry, that's not right. In this age band there were no deaths (D = 0) and 38 person-years at risk (Y = 38), so:

    Rate = 0 / 38 = 0.000 Correct answer: Correct In this age band there were no deaths and 38 person-years at risk:

    Rate = 0 / 38 = 0.000 Interaction: Calculation: (calc2) Output: Incorrect answer: No, in this age band there was 1 death (D = 1) and 59 person-years at risk (Y = 59), so:

    Rate = 1 / 59 = 0.017 Correct answer: Correct In this age band there was 1 death and 59 person-years at risk, so:

    Rate = 1 / 59 = 0.017 Interaction: Calculation: (calc3) Output: Incorrect answer: That's not right. In this age band there were 2 deaths (D = 2) and 29 person-years at risk (Y = 29), so:

    Rate = 2 / 29 = 0.069 Correct answer: Correct In this age band there were 2 deaths and 29 person-years at risk, so:

    Rate = 2 / 29 = 0.069 Interaction: Calculation: (calc4) Output:

  • Incorrect answer: Sorry, in this age band there was 1 death (D = 1) and 6 person-years at risk (Y = 6), so you should have got:

    Rate = 1 / 6 = 0.167 Correct answer: Correct Yes, in this age band there was 1 death and 6 person-years at risk, so:

    Rate = 1 / 6 = 0.167
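    The banding described above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and the subject entering at age 38 and leaving at age 52 is hypothetical (the six subjects in the diagram are not listed individually in this document). The totals D and Y are those given in the table.

```python
# Age bands used in the session: 30-39, 40-49, 50-59, 60-69 (upper edge exclusive).
BAND_EDGES = [30, 40, 50, 60, 70]


def split_follow_up(entry_age, exit_age, edges=BAND_EDGES):
    """Return the person-years one subject contributes to each age band."""
    contributions = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Overlap between the follow-up [entry_age, exit_age) and the band [lower, upper).
        overlap = min(exit_age, upper) - max(entry_age, lower)
        contributions.append(max(overlap, 0))
    return contributions


def band_rates(deaths, person_years):
    """Rate = D / Y in each band, rounded to 3 decimal places."""
    return [round(d / y, 3) for d, y in zip(deaths, person_years)]


# Hypothetical subject (not taken from the diagram): enters at 38, exits at 52.
example = split_follow_up(38, 52)

# Totals from the table in this session.
D = [0, 1, 2, 1]
Y = [38, 59, 29, 6]
rates = band_rates(D, Y)
overall_rate = round(sum(D) / sum(Y), 3)

print(example, rates, overall_rate)
```

    Summing such contributions over all subjects gives Y in each band, and the band totals add back up to the 132 person-years used for the overall rate.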

  • 3.2: From Poisson to Cox

    Age band          D    Y     Rate
    30 to 39 years    0    38    0.000
    40 to 49 years    1    59    0.017
    50 to 59 years    2    29    0.069
    60 to 69 years    1    6     0.167

    You can click the 'swap' button below to see these rates displayed graphically.

    Interaction: Button: Swap:

    Output (changes graphic on RHS):

    Interaction: Button: Show:

    Output (appears in new window): Notice how the plot is displayed as steps, to illustrate the constant rates within each interval. Strictly speaking, this is how we should plot rates that are assumed constant over a period of time. However, it is common to join estimates with a sloping line as we did in SM11 for estimates from the Poisson model. Because of the nature of this session, the stepped plots are more suitable for illustration here.

  • 3.3: From Poisson to Cox

    What is the assumption we make about these age specific rates?

    Interaction: Button: More:

    Output: In calculating the age-specific rates for these 10-year age bands we have made the assumption that the mortality rate is constant within each of these bands. Do you think this is a valid assumption? What would you do if the rate changes rapidly with time? Go on to the next card to find out.

    3.4: From Poisson to Cox

    The assumption is valid if the rate does not change much within each age band. If the rate changes rapidly, we can only assume a constant rate within very small intervals; that is, by reducing the width of the intervals we make the assumption of a constant rate more reasonable.

    Click below to split the interval 50 to 59 years into two equal intervals.

    Interaction: Button: Split:

  • Output (changes figure below):

    3.5: From Poisson to Cox

    Examine the diagram below. How many events, and how many person-years at risk, are there in the two smaller intervals between 50 and 59 years?

    50 to 54 years: D = (calc1)  Y = (calc2)
    55 to 59 years: D = (calc3)  Y = (calc4)

    Now calculate rates in these two intervals, to 3 decimal places.

    50 to 54 years: Rate = (calc5)
    55 to 59 years: Rate = (calc6)

    Interaction: Calculation: D = (calc1) Output: Incorrect answer: No, there was 1 death in the interval 50 to 54 years, so D = 1. Correct answer: Correct

  • Yes, there was 1 death in this interval. Interaction: Calculation: Y =(calc2) Output: Incorrect answer: No, Y is the total person-time at risk in this interval, which is given by 5 + 2 + 4 + 5 + 5 = 21.

    Remember that the interval 50 to 54 years includes the whole of year 54, so on the plot opposite it covers the interval between Age = 50 and Age = 55. Correct answer: Correct

    Yes, the total person-time at risk in this interval is 5 + 2 + 4 + 5 + 5 = 21. Interaction: Calculation: D = (calc3) Output: Incorrect answer: No, there was 1 death in the interval 55 to 59 years, so D = 1. Correct answer: Correct

    Yes, there was 1 death in this interval. Interaction: Calculation: Y = (calc4) Output: Incorrect answer: No, from the diagram you can see that the total person-time at risk, Y, in the interval 55 to 59 years, is 2 + 1 + 5 = 8. Remember that the interval 55 to 59 years includes the whole of year 59, so on the plot below it covers the interval between Age = 55 and Age = 60.

    Correct answer: Correct

    The total person-time at risk in this interval is 2 + 1 + 5 = 8. Interaction: Calculation: Rate =(calc5)

  • Output: Incorrect answer: Sorry, that's incorrect. The rate in this interval is given by: Rate = D / Y = 1 / 21

    = 0.048 Correct answer: That's right, the rate in this interval

    = D / Y = 1 / 21 = 0.048. Interaction: Calculation: Rate =(calc6) Output: Incorrect answer: Sorry, that's incorrect. The rate in this interval is given by: Rate = D / Y = 1 / 8

    = 0.125 Correct answer: Correct

    Yes, the rate in this interval is given by D / Y = 1 / 8 = 0.125.

  • 3.6: From Poisson to Cox

    The plot opposite now shows the new age-specific rates, with age band 50 to 59 years split into two intervals. Click 'swap' to see the plot of rates for the previous, larger interval of 50 to 59 years. Notice that the rate for the larger interval (0.069) lies between the rates for the two smaller intervals (0.048 and 0.125).

  • Interaction: Button:

    Swap: Output (changes table above):
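    The rate for the 10-year band has to fall between the two 5-year rates because it is a person-time-weighted average of them. A minimal sketch of that arithmetic, using the D and Y values from this example (the variable names are our own):

```python
# Sub-interval counts from the session: 50 to 54 years and 55 to 59 years.
deaths = [1, 1]
person_years = [21, 8]

sub_rates = [d / y for d, y in zip(deaths, person_years)]
pooled_rate = sum(deaths) / sum(person_years)

# The pooled rate is the person-time-weighted average of the sub-interval rates.
weights = [y / sum(person_years) for y in person_years]
weighted_average = sum(w * r for w, r in zip(weights, sub_rates))

print([round(r, 3) for r in sub_rates], round(pooled_rate, 3))
```

    The 50 to 54 interval carries most of the person-time (21 of 29 years), which is why the pooled rate sits closer to the lower of the two rates.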

    3.7: From Poisson to Cox

    We could make the intervals smaller and smaller. Suppose the interval for 50 to 54 years became progressively smaller.

  • It is not so obvious with this simple example, but imagine a large cohort with many individuals: as the intervals are made smaller, the assumption of a constant rate within each interval is more likely to hold.

    3.8: From Poisson to Cox

    When do you think very small intervals are appropriate? Choose one of the answers below.

    Interaction: Hotspot: When there is at least one event per day
    Output (appears in new window): Very small intervals are appropriate when the rate changes rapidly with time, but we do not have to be so exact as to say there must be one event per day!

    Interaction: Hotspot: When the rate of events is changing rapidly and the assumption of a constant rate within larger intervals is invalid
    Output (appears in new window): Yes, that's correct, when the rate changes rapidly we cannot assume it is constant within larger intervals, so narrow intervals are more appropriate.

  • Interaction: Hotspot: If there is a non-time-dependent exposure that has a significant effect on the rate
    Output (appears in new window): No, this is not related to the effect of other non-time-dependent exposures. Once we have controlled for the effect of a rapidly changing rate over time we can assess the effect of other exposures.

    Figure: Age-specific rates for very small agebands

    3.9: From Poisson to Cox

    Eventually, the intervals could be so narrow that each interval contained no more than a single event. We call these small intervals of time timeclicks. Within each timeclick we can now assume the rate is constant, although, over time, rates can vary continuously. This is what happens in Cox regression.

    In this situation, the numerator of a rate is at most 1, because we allow at most one event in a timeclick. The denominator in the calculation of a rate is no longer the person-time at risk but the number of individuals at risk at that time, which is known as the risk set.

    Calculation of a rate in a timeclick in which an event occurs:

  • Rate = 1 / (number of people at risk when the event occurs) = 1 / (size of the risk set)

    Note that there can be timeclicks in which no event occurs, in which case the rate is zero.

    3.10: From Poisson to Cox

    If we look at the diagram below, the risk set is all subjects whose lines cross the vertical line at which an event occurs. What is the risk set when subject 2 dies, at 52 years? (calc1) What is the rate when subject 2 dies? (calc2)

    We don't normally calculate the rate within every timeclick in this way - this is simply an illustration of what happens. We will consider risk sets in more detail later in the session. Interaction: Calculation: (calc1) Output(appears in new window): Incorrect answer: Remember that the risk set is the number of subjects in the follow-up at the time the event occurs. When subject 2 dies there are 5 individuals (including subject 2) so the risk set is 5. Correct answer: Correct

    Yes, the risk set is the number of subjects in the follow-up at the time the event occurs. When subject 2 dies there are 5 individuals (including subject 2) so the risk set is 5. Interaction: Calculation: (calc2) Output(appears in new window): Incorrect answer: No, remember that there is only one event, so the numerator is 1. Therefore the rate is given by: Rate = 1 / risk set

    = 1 / 5 = 0.2 Correct answer:

  • Correct Rate = 1 / risk set

    = 1 / 5 = 0.2

    Interaction: Button: Note:

    Output (appears in new window): You can relate this to the survival analysis session. We learnt that the probability of survival was calculated at the time of an event and depended on the number of individuals at risk at that time. Review this topic from SM03.

    3.11: From Poisson to Cox

    In considering a Poisson or Cox model, the appropriate regression analysis for a dataset depends on the data structure in relation to the assumptions we can make about how the rate varies (or not) with time. The 3 models we have considered are shown below. On the next page we will consider the parameters included and estimated in a Cox model.

  • Interaction: Tabs: Poisson:

    Output: Simplest Poisson model
    Data: Not stratified
    Assumption: Rate is constant over total time

    Interaction: Tabs: Poisson/Lexis:

    Output: Poisson model after Lexis expansion
    Data: Data stratified by large intervals of time
    Assumption: Rates are constant within each time interval

    Interaction: Tabs: Cox:

    Output: Cox model
    Data: Data stratified into small intervals of time, called timeclicks.
    Assumption: Rate varies over time, but is constant within each timeclick, since there is at most one event.

    Section 4: The Cox regression model

    In a Poisson model with Lexis expansion, where the rate varies across large intervals of time, a model that accounts for the changing rates can be written:

    Rate = Constant x Timeband x Exposure

    where

  • Constant is the rate in the baseline group, that is, the lowest timeband and the lowest exposure level,
    Timeband is the rate ratio for each timeband compared to the baseline,
    Exposure is the rate ratio for the higher levels of exposure compared to the baseline.

    4.1: The Cox regression model

    In AS05 we analysed data from the Whitehall study. We specifically examined the effect of grade of employment (low grade compared to high grade) on rates of coronary heart disease (CHD), adjusted for changing age during the follow-up period. The corresponding Poisson model is written:

    Rate = Constant x Ageband x Grade

    The model parameters, and a plot of the estimated rates, are shown below. Interaction: Tabs: Parameters :

    Output: Parameter estimates from a Poisson model

    Parameter     Rate ratio
    Grade1        1.413
    Ageband50     6.317
    Ageband55     8.682
    Ageband60     18.471
    Ageband65     27.240
    Ageband70     33.680
    Ageband75     45.169
    Ageband80     88.943

    The constant rate = 0.0003
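    As a sketch of how fitted rates follow from these parameters, the multiplication Rate = Constant x Ageband x Grade can be written out as below. The dictionary keys and the function name are our own labels; the baseline age band is given a rate ratio of 1 by definition, and "high" grade is the baseline exposure level.

```python
CONSTANT = 0.0003  # rate in the baseline group (lowest age band, high grade)

GRADE_RR = {"high": 1.0, "low": 1.413}  # Grade1 = low grade vs high grade

AGEBAND_RR = {
    "baseline": 1.0,
    "Ageband50": 6.317,
    "Ageband55": 8.682,
    "Ageband60": 18.471,
    "Ageband65": 27.240,
    "Ageband70": 33.680,
    "Ageband75": 45.169,
    "Ageband80": 88.943,
}


def fitted_rate(ageband, grade):
    """Rate = Constant x Ageband x Grade."""
    return CONSTANT * AGEBAND_RR[ageband] * GRADE_RR[grade]


for ageband in AGEBAND_RR:
    low = fitted_rate(ageband, "low")
    high = fitted_rate(ageband, "high")
    # The low/high ratio is the same in every age band.
    print(ageband, round(high, 3), round(low, 3), round(low / high, 3))
```

    Whatever the age band, the ratio of the low-grade rate to the high-grade rate is 1.413; only the level of the rates changes with age.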

    Interaction: Tabs: Plot :

    Output:

    You can click on the rates (white boxes) in the plot above to see how the rates are obtained from the model parameters.

    (Hotspots from table above)

    Interaction: Hotspot: 0.038

    Output: (appears in new window): This is the rate in the ageband 80+ years (Ageband80) for low-grade workers (Grade1). Rate = Constant x Ageband80 x Grade1 = 0.0003 x 88.943 x 1.413

    = 0.038

  • Interaction: Hotspot: 0.014

    Output: (appears in new window): This is the rate in the ageband 75 to 79 years (Ageband75) for high-grade workers. Rate = Constant x Ageband75 = 0.0003 x 45.169

    = 0.014

    Interaction: Hotspot: 0.008

    Output: (appears in new window): This is the rate in the ageband 60 to 64 years (Ageband60) for low-grade workers (Grade1). Rate = Constant x Ageband60 x Grade1 = 0.0003 x 18.471 x 1.413

    = 0.008

    4.2: The Cox regression model

    Consider a general Poisson model with one exposure adjusted for time.

    Rate = Constant x Timeband x Exposure

    Interaction: Button: More:

    Output (changes text above): Rate = Constant x Timeband x Exposure

    (appears on LHS):

    Imagine we combined the constant parameter and the timeband parameters and called this the changing baseline rate. As we go from one time band to the next, this is the rate in the unexposed (baseline) group; the Exposure parameter is the rate ratio comparing the exposed group with it. Click the terms in the model below to show these parameters in the plot.

    Rate = Baseline rate x Exposure

    Interaction: Hyperlink: Baseline rate: Output (changes table on RHS):

  • Interaction: Hyperlink: Exposure: Output (changes table on RHS):

  • 4.3: The Cox regression model

    We can make the time intervals smaller and smaller until each contains at most one event. As mentioned earlier, we call such a small interval of time a timeclick.

    The baseline rate in the model now changes for every timeclick.

    Interaction: Button: Show:

    Output (appears on LHS):

    Rate = Baseline rate x Exposure

    where Baseline rate = Constant x Timeclick, and Timeclick is the rate ratio for each very small time interval.

    Rates within each timeclick

  • 4.4: The Cox regression model

    This model with the changing baseline rate is a Cox regression model.

    Rate = Baseline rate x Exposure
    (the baseline rate changes with time)

    An important characteristic of this model is that the baseline rate varies with time and so does the rate in each of the other categories of the exposure variable(s). The effect of exposure is estimated across this changing rate. What is the assumption we make with this model?

    Interaction: thought bubble: (LHS output)

    The effect of exposure is estimated across this changing rate. In order to do this, the model assumes that the ratio of the rate for the exposure compared to the baseline rate is constant over time.

  • For this reason, the Cox model is also known as the proportional hazards model. (This is similar to the proportional odds or proportional rates assumption for logistic and Poisson regression). Note: Hazard is simply another word for risk or rate in this sense. It is commonly used in the analysis of cohort studies where Cox regression is used.

    4.5: The Cox regression model

    When we perform a Cox regression, we do not estimate the changing baseline rate. We only obtain parameter estimates for the effect of exposure, which do not depend on the baseline rate. The advantage of using a Cox model is that we can model the relationship between rates and time as finely as possible, therefore accounting for changing rates. We do not have to assume that rates are constant over larger intervals of time.

    The plots opposite illustrate a Poisson model and a Cox model.

    Interaction: Tabs: Poisson : Output:

    Rate = Constant x Timeband x Exposure

    Within each time band we observe D (the number of events) and Y (the person-time at risk).

    Then rates (Rate = D / Y) are calculated within each time band.

  • Interaction: Tabs: Cox : Output:

    Rate = Baseline x Exposure Rates are calculated for risk sets. Exposures within risk sets are compared. Closely models the relationship between rates and time.
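    To make "exposures within risk sets are compared" concrete, the sketch below estimates a rate ratio for a binary exposure by maximising the Cox partial likelihood with Newton-Raphson steps. The partial likelihood is the standard estimation method for Cox regression, but it is not derived in this session, and the ten subjects are invented for illustration (no tied event times). In practice you would use a statistical package rather than code like this.

```python
import math

# Hypothetical data, invented for illustration: (follow-up time, died, exposed).
subjects = [
    (2, 1, 0), (3, 1, 1), (4, 0, 0), (5, 1, 0), (6, 1, 0),
    (7, 0, 1), (8, 1, 1), (9, 0, 1), (10, 1, 0), (12, 0, 1),
]


def risk_set_summaries(data):
    """For each death: (case exposed?, exposed at risk, unexposed at risk)."""
    summaries = []
    for time, died, exposed in data:
        if died:
            # Risk set: everyone still under follow-up at this event time.
            at_risk = [x for t, _, x in data if t >= time]
            n_exposed = sum(at_risk)
            summaries.append((exposed, n_exposed, len(at_risk) - n_exposed))
    return summaries


def log_partial_likelihood(beta, summaries):
    return sum(case * beta - math.log(n0 + n1 * math.exp(beta))
               for case, n1, n0 in summaries)


def score_and_information(beta, summaries):
    score = information = 0.0
    for case, n1, n0 in summaries:
        # Probability that the subject who died is exposed, given the risk set.
        p = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))
        score += case - p
        information += p * (1 - p)
    return score, information


summaries = risk_set_summaries(subjects)
beta = 0.0  # log rate ratio, starting from "no effect"
for _ in range(25):  # Newton-Raphson steps
    score, information = score_and_information(beta, summaries)
    beta += score / information

rate_ratio = math.exp(beta)
final_score, _ = score_and_information(beta, summaries)
print(round(rate_ratio, 2))
```

    Notice that the baseline rate never appears: each risk set contributes only a comparison of the exposure of the subject who died with the exposures of everyone at risk at that moment.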

    Section 5: An epidemiological example

    We will now illustrate Cox regression using data from a pilot study of the use of diet records in epidemiological studies. Using this dataset we will examine the relationship between diet and coronary heart disease (CHD).

    The dataset consisted of 337 men (bus conductors, bus drivers and bankers).
    At the time of the survey the men were aged 30 to 67 years.
    During the follow-up period there were 46 deaths caused by CHD.
    The exposure of interest was energy intake. This was categorised as a binary variable, Energy: either < 27.5 kcal per day, or ≥ 27.5 kcal per day.

    Interaction: Button: More: Output (appears on RHS):

    Assuming that all the subjects have the same risk of CHD death the Cox model for the effect of exposure to high energy intake is shown below.

    Rate = changing Baseline rate x Energy

    This model is based on the follow-up time scale, so each timeclick is a small interval of follow-up time.

  • 5.1: An epidemiological example

    Rate = changing Baseline rate x Energy

    What estimates do we obtain in this model? Select one of the options shown below.

    Interaction: Hotspot:A constant rate at time = 0 in the low energy group

    Output: (appears in new window): No, in a Cox model we only obtain an estimate for the effect of exposure. We do not know the constant rate for the baseline exposure level at time = 0.

    Interaction: Hotspot: A rate ratio for each time click Output: (appears in new window):

    A Cox model adjusts for the effect of time, but we do not obtain an estimate for the effect of each timeclick compared to time = 0. Try again

    Interaction: Hotspot: A rate ratio for the effect of high energy

    Output: (appears in new window): Yes, a Cox model gives a parameter estimate for the effect of exposure to high energy intake.

    5.2: An epidemiological example

    The table below shows the results of a Cox regression for the effect on CHD death rates of high energy intake compared to low energy intake. The model controls for the effect of time since entry to the study.

    Note: The interpretation of these results is similar to that of other regression models such as a Poisson or logistic model.

    What is the effect of high-energy intake on the rate of CHD mortality? Are you surprised by this result?

    Interaction: Button: thought bubble:

    Output (appears in new window): The data from this study show that a high daily energy intake reduces the rate of CHD deaths by 48% (RR = 0.52). In other words, the rate of CHD deaths for the high-energy group is 52% of the rate for the low-energy intake group.

    You may find this result surprising, that the more you eat, the less you are at risk from CHD. However, energy intake is closely related to energy expenditure, so a higher energy intake is a marker for physical activity. As a result, the men who have a higher energy intake are likely to be more physically active, and so at lower risk of CHD.

    Results from a Cox model

                Rate ratio    Standard error    95% confidence limits
    Energy1     0.5234        0.1581            0.2865 to 0.9462

    Wald test, z = 2.143, P = 0.03
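    The Wald statistic and confidence limits can be approximately reconstructed from the rate ratio and its standard error. The sketch below assumes that the reported standard error is on the rate-ratio scale and converts it to the log scale with the usual delta-method approximation, SE(log RR) = SE(RR) / RR; that conversion is our assumption, not something stated in the session.

```python
import math

rate_ratio = 0.5234
se_rate_ratio = 0.1581  # standard error on the rate-ratio scale, as reported

# Assumed delta-method conversion to the log scale.
log_rr = math.log(rate_ratio)
se_log_rr = se_rate_ratio / rate_ratio

# Wald test of the null hypothesis log(RR) = 0, i.e. RR = 1.
z = log_rr / se_log_rr
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal

# 95% confidence limits are calculated on the log scale, then exponentiated.
lower = math.exp(log_rr - 1.96 * se_log_rr)
upper = math.exp(log_rr + 1.96 * se_log_rr)

print(round(abs(z), 3), round(p_value, 2), round(lower, 2), round(upper, 2))
```

    The statistic is negative because the rate ratio is below 1; the table reports its magnitude. Under these assumptions the calculation reproduces the reported z and P, and agrees with the reported confidence limits to two decimal places.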

    Many statistical packages will refer to the rate ratio as the hazard ratio. This is simply another name for the ratio of two rates.

    5.3: An epidemiological example

    What is the null hypothesis for the Wald test? Choose from the options below.

    Interaction: Hotspot: The rate ratio for the effect of Energy does not change over time.
    Output (appears in new window): No, this is the proportional hazards assumption: the assumption of the Cox model that the rate ratio for the effect of high energy intake does not change over time. However, the null hypothesis of the Wald test is different. Please try again.

    Interaction: Hotspot: The rate of CHD deaths in the high energy group is the same as the rate in the low energy group.
    Output (appears in new window): Yes, the null hypothesis of the Wald test is that the rate of CHD deaths in the high-energy group is the same as the rate of CHD deaths in the low-energy group, i.e., rate ratio = 1.

    Interaction: Hotspot: The baseline rate is constant across time.
    Output (appears in new window): No, in the Cox model the baseline rate does change over time. Please try again.

    5.4: An epidemiological example

    What can you conclude from the Wald test?

    Interaction: Button: thought bubble (1):

    Output (appears in new window): The Wald test shows some evidence against the null hypothesis that RR = 1, since P = 0.03. We can therefore conclude that there is a difference in the rate of CHD deaths for those with a high-energy intake compared to those with a low-energy intake.

    How would you interpret the confidence interval for the rate ratio?

    Interaction: Button: thought bubble (2) :

    Output (appears in new window): RR = 0.52 (95% CI: 0.29 to 0.95)

    This study shows that high energy intake reduces the rate of CHD deaths by 48%. We can be 95% confident that the true reduction in the rate of CHD deaths in the population is between 5% and 71%.

    Section 6: Alternative time scales and risk sets

    In the diet example we have just examined, time was defined as time since entry to the study, i.e., follow-up time. An alternative would be to define time as age.

    Why might we want to do this?

    Interaction: Button: thought bubble:

    Output (appears in new window):

  • We may want to use age as the time scale because, as we have seen in previous analysis using Poisson regression, the rate of death can change with age.

    Age is generally a strong determinant of mortality risk. With a Cox model the choice of time scale determines the composition of a risk set at the time of an event.

    Remember that a risk set is all the subjects observed at the point in time when an event occurs.

    6.1: Alternative time scales and risk sets

    The table below shows data for 10 subjects from a cohort study. Subjects 5 and 7 died on the highlighted dates. All other subjects survived until the end of follow-up.

    Click below to see these data plotted using calendar time as the time scale.

    Interaction: Button: Plot:

    Output (changes table on RHS):

  • A cohort of 10 subjects

    6.2: Alternative time scales and risk sets

    Click below to show the risk set for subject 5.

    Interaction: Show

    Output: For subject 5, the risk set using calendar time is 9. That is the number of subjects whose observation line crosses the vertical line when subject 5 dies.

    What is the risk set when subject 7 dies?

    Interaction: calculation:
    Incorrect response: No, the vertical line at the death of subject 7 cuts through 7 follow-up lines, so the risk set for subject 7 on the calendar timescale is 7.
    Correct response: Correct. Yes, the risk set for subject 7 on the calendar timescale is 7.

               Entry to study       End of study
    Subject    Date        Age      Date          Age
    1          13/06/65    29.3     31/12/89      53.8
    2          23/10/72    25.2     31/12/89      42.4
    3          03/03/59    22.1     31/12/89      52.8
    4          10/10/67    32.2     31/12/89      54.4
    5          02/01/60    33.1     04/07/79 *    52.6
    6          09/01/75    42.1     31/12/89      57.1
    7          05/08/53    35.2     03/10/68 *    50.4
    8          10/10/62    27.0     31/12/89      47.2
    9          02/03/62    44.8     31/12/89      72.7
    10         01/11/70    51.5     31/12/89      70.6

    * date of death
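    The risk sets for this cohort can also be counted directly from the table. The sketch below is an illustration only: it reads the two-digit years as 19xx, measures follow-up in days between the entry and exit dates, and counts a subject as at risk if the event time falls between their entry and exit (inclusive) on the chosen time scale. The function names are our own.

```python
from datetime import date

# (subject, entry date, entry age, exit date, exit age); two-digit years read as 19xx.
cohort = [
    (1, date(1965, 6, 13), 29.3, date(1989, 12, 31), 53.8),
    (2, date(1972, 10, 23), 25.2, date(1989, 12, 31), 42.4),
    (3, date(1959, 3, 3), 22.1, date(1989, 12, 31), 52.8),
    (4, date(1967, 10, 10), 32.2, date(1989, 12, 31), 54.4),
    (5, date(1960, 1, 2), 33.1, date(1979, 7, 4), 52.6),
    (6, date(1975, 1, 9), 42.1, date(1989, 12, 31), 57.1),
    (7, date(1953, 8, 5), 35.2, date(1968, 10, 3), 50.4),
    (8, date(1962, 10, 10), 27.0, date(1989, 12, 31), 47.2),
    (9, date(1962, 3, 2), 44.8, date(1989, 12, 31), 72.7),
    (10, date(1970, 11, 1), 51.5, date(1989, 12, 31), 70.6),
]
DEATHS = (5, 7)  # subjects who died; everyone else survived to the end of follow-up


def intervals(scale):
    """Map each subject to (start, stop) on the chosen time scale."""
    result = {}
    for subject, entry_date, entry_age, exit_date, exit_age in cohort:
        if scale == "calendar":
            result[subject] = (entry_date, exit_date)
        elif scale == "age":
            result[subject] = (entry_age, exit_age)
        else:  # follow-up: days since entry to the study
            result[subject] = (0, (exit_date - entry_date).days)
    return result


def risk_set_size(scale, case):
    """Number of subjects under observation when `case` dies."""
    spans = intervals(scale)
    event_time = spans[case][1]  # the case's exit is the time of death
    return sum(1 for start, stop in spans.values() if start <= event_time <= stop)


risk_sets = {scale: {case: risk_set_size(scale, case) for case in DEATHS}
             for scale in ("calendar", "age", "follow-up")}
print(risk_sets)
```

    The same ten subjects give different risk sets on each time scale, which is why the choice of time scale matters in a Cox model.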

  • 6.3: Alternative time scales and risk sets

    Let's now look at these data ordered by age. Click below to change the timescale.

    Interaction: Show Output:

    Now, count the risk sets at the age when each death occurs.

    Risk set for Subject 5: (Calc 1)

    Risk set for Subject 7: (Calc 2)

    Interaction: calculations: Calc 1
    Incorrect response: No, the vertical line at the death of subject 5 cuts through 7 follow-up lines, so the risk set for subject 5 on the age time scale is 7.
    Correct response: Correct. Yes, the risk set for subject 5 on the age time scale is 7.

    Interaction: calculations: Calc 2
    Incorrect response: No, the vertical line at the death of subject 7 cuts through 7 follow-up lines, so the risk set for subject 7 on the age time scale is 7.
    Correct response: Correct. Yes, the risk set for subject 7 on the age time scale is 7.

    6.4: Alternative time scales and risk sets

    Now let's order the data by follow-up time. Click below to change the timescale.

    Interaction: Show Output: (RHS)

  • (LHS) Now, count the risk sets at the follow-up time when each death occurs.

    Risk set for Subject 5: (Calc 1)
    Incorrect response: No, the vertical line at the death of subject 5 cuts through 6 follow-up lines, so the risk set for subject 5 on the follow-up time scale is 6.
    Correct response: Correct. Yes, the risk set for subject 5 on the follow-up time scale is 6.

    Risk set for Subject 7: (Calc 2)
    Incorrect response: No, the vertical line at the death of subject 7 cuts through 9 follow-up lines, so the risk set for subject 7 on the follow-up time scale is 9.
    Correct response: Correct. Yes, the risk set for subject 7 on the follow-up time scale is 9.

    6.5: Alternative time scales and risk sets

    The table below shows the risk sets for the two deaths on the different time scales.

    We can see that choosing a different scale gives very different risk sets. How might this affect the results of a Cox regression and how do you know which time scale should be used?

    Interaction: thought bubble

    Output: The results of a Cox model using different time scales may give very different rates when an event occurs, because of the different risk sets. Because of this, different time scales may show a very different effect of exposure. The time scale should be chosen with respect to the exposure of interest. Further advice on this may be found on p291 of Essential Medical Statistics (2nd Ed) by Kirkwood & Sterne.

    Number in risk set

    Timescale    Subject 5    Subject 7
    calendar     9            7
    age          7            7
    follow-up    6            9

    Section 7: Poisson or Cox?

    The results of a Poisson regression on the diet data with larger agebands are shown below. What is the main assumption that distinguishes this model from the Cox model?

    Interaction: thought bubble (1):

    (new window) The model assumes the rates of CHD deaths are constant within the specified age intervals. End new window.
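To make the constant-rate assumption concrete, a minimal sketch in Python follows. Within each age band the rate is a single number, deaths divided by person-years. The band labels, death counts and person-years are hypothetical and are not the diet data; the function name `rate_per_1000` is likewise only illustrative.

```python
# Hypothetical deaths and person-years per age band (not the diet data).
bands = {
    "30-49": {"deaths": 4, "person_years": 2000.0},
    "50-59": {"deaths": 9, "person_years": 1500.0},
    "60-69": {"deaths": 12, "person_years": 1000.0},
}


def rate_per_1000(deaths, person_years):
    """Rate per 1000 person-years, assumed constant within the band."""
    return 1000.0 * deaths / person_years


rates = {label: rate_per_1000(**counts) for label, counts in bands.items()}

# Rate ratios relative to the first band. In a Poisson model with age band
# as the only explanatory variable, the fitted age-band rate ratios equal
# these crude ratios.
baseline = rates["30-49"]
rate_ratios = {label: r / baseline for label, r in rates.items()}

for label in bands:
    print(f"{label}: rate {rates[label]:.1f} per 1000 py, "
          f"ratio {rate_ratios[label]:.1f}")
```

Any change in the rate inside a band is invisible to this calculation, which is why wide bands can only model the relationship with time crudely.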

  • How does the estimate for the effect of high-energy intake from this model compare to the effect from the Cox model? Interaction: thought bubble (2):

    (new window)

    This model shows a reduction in the rate of CHD deaths of 46% (RR = 0.54), which is very similar to the estimate from the Cox model (RR = 0.52).

                 Rate ratio    Standard error
    Energy1      0.5359        0.1622
    Ageband50    1.4003        0.7387
    Ageband55    1.3097        0.6912
    Ageband60    2.3085        1.1281
    Ageband65    2.3665        1.2795

    Age intervals: 30 to 50 years, 50 to 54 years, 55 to 59 years, 60 to 64 years, 65 to 69 years.

    7.1: Poisson or Cox?

    The essential difference between Poisson and Cox regression is that Cox regression uses very narrow time bands, which allows for rates that constantly change with time. Because of this, Cox regression has been mostly used to examine survival following some well-defined event where rates are expected to vary rapidly. For large-scale observational cohort studies, Poisson regression, with relatively larger time bands, is preferable.

    Poisson                                         Cox
    Controls for time using large time intervals    Controls for time using very small intervals
    Rate constant within large time intervals       Allows for rates that change with time
    Constant baseline rate estimated                Changing baseline rate not estimated
    Crudely models relationship                     Finely models relationship
    Can calculate rates for all explanatory         Can calculate rate ratios for all explanatory
    variables, including those whose value          variables, but not the rates themselves. The
    changes with time.                              exception is the variable that defines the
                                                    time scale, for which we cannot estimate a
                                                    rate ratio. For example, if age is your time
                                                    scale, you cannot examine the effect of age
                                                    group on rates.

    Section 8: Summary

    This is the end of AS06. When you are happy with the material covered here please move on to session AS07. The main points of this session will appear below as you click on the relevant title.

    Intervals of time and constant rates

    We may need to adjust for rates that change with time. In a Poisson model we have relatively large intervals of time; the assumption is that rates are constant within each interval. A Cox model is based on very small intervals of time called timeclicks, which contain at most 1 event. This produces a constant rate within each very small interval, but across time rates can vary. So, for rapidly changing rates a Cox model is the appropriate choice.

    The Cox model

    A simple Cox model to describe the effect of one exposure can be written:

    Rate = Baseline rate x Exposure

    Click on each term in the model above for an explanation of its significance.

    Both parameters are used to describe the model, but it is only the effect of exposure that is estimated.

    Interaction: hyperlink: baseline rate
    Output: The Baseline rate is the rate in the unexposed group that changes with each small interval of time, the changing baseline rate.

    Interaction: hyperlink: exposure
    Output: Exposure is the effect of the exposure, i.e., the rate ratio for the rate in the exposed, compared to the unexposed.

    The proportional hazards assumption

    An important assumption in Cox regression is that the effect of exposure is proportional over time. In other words, the ratio of the rate for the exposure group compared to the changing baseline rate must be constant over time. This is known as the proportional hazards assumption, 'hazards' meaning rates.
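The model Rate = Baseline rate x Exposure and the proportional hazards assumption can be illustrated with a short sketch in Python. The baseline rates, the rate ratio of 0.5 and the function names are hypothetical. The second part uses the standard Cox partial-likelihood reasoning, which is added here as an illustration rather than quoted from the session: given that exactly one event occurs in a small interval, the chance that it falls on a particular member of the risk set depends only on the rate ratio, because the baseline rate cancels. This is one way to see why only the effect of exposure is estimated.

```python
def rate(baseline_rate, rate_ratio, exposed):
    """Rate = Baseline rate x Exposure (the effect applies only if exposed)."""
    return baseline_rate * (rate_ratio if exposed else 1.0)


def prob_event_is_subject(baseline_rate, rate_ratio, risk_set, index):
    """Probability that the single event in an interval falls on risk_set[index].

    risk_set is a list of booleans (True = exposed) for those at risk.
    """
    rates = [rate(baseline_rate, rate_ratio, e) for e in risk_set]
    return rates[index] / sum(rates)


# Hypothetical baseline rates per person-year in three successive intervals.
baselines = [0.002, 0.010, 0.050]
rr = 0.5  # hypothetical rate ratio, exposed compared to unexposed

# Proportional hazards: the exposed/unexposed ratio is the same in every
# interval even though the baseline rate changes.
ratios = [rate(b, rr, True) / rate(b, rr, False) for b in baselines]
print(ratios)

# The baseline cancels: the probability does not depend on the baseline rate.
risk_set = [True, False, False, True]  # two exposed, two unexposed at risk
probs = [prob_event_is_subject(b, rr, risk_set, 0) for b in baselines]
print(probs)
```

With two exposed and two unexposed subjects at risk, the probability that the event falls on the first exposed subject is 0.5 / (0.5 + 1 + 1 + 0.5), about 1/6, whatever the baseline rate in that interval.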

    Cox or Poisson?

    Both Poisson and Cox regression account for rates that change with time in the analysis of follow-up cohorts. However, where the rate is changing rapidly and cannot be assumed to be constant within a larger interval of time, a Cox model is more appropriate. Cox regression finely models the relationship between rates and time. Poisson regression crudely models the relationship between rates and time.
