7/24/2019 Longitudinal Data Analysis - Note Set 13
1/12
Longitudinal Data Analysis - Note Set 13
Topics in the Remainder of the Course
Longitudinal analysis when the model for the outcome variable is
nonlinear
Binary outcomes (logistic regression)
Counts (Poisson regression)
Generalized estimating equations for marginal models
Nonlinear mixed effects analysis for subject-specific models
Inverse Probability of Missingness weighting
Hierarchical models for multilevel data
346
Topics for Today
Logistic Regression
Poisson Regression
The Generalized Linear Model
347
7/24/2019 Longitudinal Data Analysis - Note Set 13
2/12
Longitudinal Data Analysis - Note Set 13
Review: Logistic and Poisson Regression
We begin with a review of logistic and Poisson regression for a singleresponse variable. This review will explain why we need a differentapproach when the regression model for the response is nonlinear, that is,when the expected response does not change linearly with changes in thevalue of the covariates.
Logistic Regression:
So far, we have considered linear regression models for a continuous
response, Y , of the following form
Y = 0 + 1X1 + 2X2 + . . . + kXk +
In the univariate case, the response variable, Y, was assumed to have anormal distribution with mean
E(Y) = 0 + 1X1 + 2Xz+ . . . + kXk
and with variance, 2.
348
The population intercept, 0, represents the mean value of the responsewhen all of the covariates have value equal to zero.
Regression coefficients for covariates, say 1, represent the expected
change in the mean response for a single-unit change in X1, given nochange in the values of the other covariates.
In many studies, however, we are interested in a response variable that is
dichotomous rather than continuous.
This situation calls for a regression model for a binary (or dichotomous)
response.
349
7/24/2019 Longitudinal Data Analysis - Note Set 13
3/12
Longitudinal Data Analysis - Note Set 13
Let Y be a binary response, where
Y = 1 represents a `success';
Y = 0 represent a `failure'.
Then the mean of the binary response variable, denoted , is the
proportion of successes or the probability that the response equals 1.
That is,
= E(Y) = Pr(Y = 1) = Pr(`success')
With a binary response, we are usually interested in estimating the
Probability, , and modeling how it depends on the values of covariates.
The most common approach to this problem is logistic regression.
350
A naive strategy for modeling a binary response is to perform an ordinarylinear regression analysis using the regression model
= E(Y ) = 0 + 1X1 + 2Xz+ . . . + kXk
This is sometimes done but is usually not employed in the analysis ofsubject level data since is a probability and is restricted to valuesbetween 0 and 1.
Also, the usual assumption of homogeneity of variance would be violatedsince the variance of a binary response depends on the mean, i.e.
var(Y ) = (1 -)
Instead, we can consider a logistic regression model where
ln [/(1 -)] = 0 + 1X1 + 2Xz+ . . . + kXk
351
7/24/2019 Longitudinal Data Analysis - Note Set 13
4/12
Longitudinal Data Analysis - Note Set 13
Relation between log(/(1 )) and
352
This model accommodates the constraint that is restricted to values
between 0 and 1.
Recall that /(1 - ) is defined as the odds of success.
Therefore, modeling with a logistic function can be considered to beanalogous to a linear regression model where the mean of the continuousresponse has been replaced by the logarithm of the odds of success.
Note that the relationship between and the covariates is non-linear. Wecan use ML estimation to obtain estimates of the logistic regressionparameters, under the assumption that the binary responses are Bernoullirandom variables.
353
7/24/2019 Longitudinal Data Analysis - Note Set 13
5/12
Longitudinal Data Analysis - Note Set 13
Given the logistic regression model
ln [(1 -)] = 0 + 1X1 + 2Xz+ . . . + kXk
the population intercept, 0, has interpretation as the log odds of successwhen all of the covariates take on the value zero.
The population slope, say 1, has interpretation in terms of the change inlog odds of success for a single-unit change in X1 given that all of the othercovariates remain constant.
When one of the covariates is dichotomous, say X1, then 1 has a specialinterpretation:
exp (1) is the odds ratio or ratio of odds of success for the two possiblelevels of X1 (given that all of the other covariates remain constant).
354
Keep in mind that as:
increases
the odds of success increases
and the log odds of success increases
Similarly, as:
decreases
the odds of success decreases
and thus the log odds of success decreases
355
7/24/2019 Longitudinal Data Analysis - Note Set 13
6/12
Longitudinal Data Analysis - Note Set 13
Longitudinal Assessment of Neurocognitive
Function After CABGNEJM 2001; 344:395-402
The investigators studied predictors of cognitive decline five years after
surgery in 261 patients undergoing surgery. Decline in postoperative
function was defined as a drop of 1 SD or more in the scores on tests of
any one of four domains of cognitive function.
Factors predicting long-term cognitive decline were determined by
multivariable logistic regression.
The incidence of cognitive decline was 53 percent at discharge, 36 at six
weeks, and 42 percent at five years.
356
Newman, M. F. et al. N Engl J Med 2001;344:395-402
Univariable and Multivariable Predictors of Cognitive Decline and Change
in the Compos ite Cogniti ve Index at Five Years
357
7/24/2019 Longitudinal Data Analysis - Note Set 13
7/12
Longitudinal Data Analysis - Note Set 13
Poisson Regression
In Poisson regression, the response variable is a count (e.g. number of
cases of a disease in a given period of time) and the Poisson distribution
provides the basis of likelihood-based inference.
Often the counts may be expressed as rates. That is, the count or
absolute number of events is often not satisfactory because any
comparison will depend on the sizes of the groups (or the
`time at risk') that generated the observations.
Like a proportion or probability, a rate provides a basis for direct
comparison.
In either case, Poisson regression relates the expected counts or rates to a
set of covariates.
358
The Poisson regression model has two components:
1. The distributional assumption: The response variable is a count and isassumed to have a Poisson distribution.
That is, the probability a specific number of events, Y, occurs is
Pr(Y events) = e- Y/(Y!)
Note that is the expected count or number of events and the expected
rate is given by /t, where t is a relevant baseline measure (e.g. t might
be the number of persons or the number of person-years of observation).
359
7/24/2019 Longitudinal Data Analysis - Note Set 13
8/12
Longitudinal Data Analysis - Note Set 13
2. The regression model:
ln( /t) = 0 + 1X1 + 2Xz+ . . . + kXk
Since ln( /t) = ln() - ln(t), the Poisson regression model can also bewritten as
ln() = ln(t) + 0 + 1X1 + 2Xz+ . . . + kXk
where the `coefficient' associated with ln(t) is fixed as 1.
This adjustment term is known as an `offset'.
360
Therefore, modeling (or /t) with a log function can be considered to be
equivalent to a linear regression model where the mean of the continuous
response has been replaced by the logarithm of the expected count (or
rate).
Once again, the relationship between (or /t) and the covariates is
non-linear.
We can use ML estimation to obtain estimates of the Poisson regression
parameters, under the assumption that the responses are Poisson random
variables.
361
log linear
7/24/2019 Longitudinal Data Analysis - Note Set 13
9/12
Longitudinal Data Analysis - Note Set 13
Given the Poisson regression model
ln(/t) = 0 + 1X1 +2Xz+ . . . + kXk
the population intercept, 0, has interpretation as the log expected rate
when all the covariates take on the value zero.
The population slope, say 1, has interpretation in terms of the change inlog expected rate for a single-unit change in X1 given that all of the othercovariates remain constant.
When one of the covariates is dichotomous, say X1, then 1 has a special
interpretation:
exp (1) is the rate ratio for the two possible levels of X1 (given that all ofthe other covariates remain constant).
362
Example: Prospective study of coronary heart disease (CHD).
The study observed 3154 men aged 40-50 for an average of 8 years and
recorded incidence of cases of CHD.
The risk factors considered include:
Smoking exposure: 0, 10, 20, 30 cigs per day;
Systolic BP: 0 (< 140), 1 ( 140);
Behavior Type: 0 (type B), 1 (type A).
A simple Poisson regression model is:
ln(/t) = ln(rate of CHD) = 0 + 1Smoke
or ln() = ln(t) + 0 + 1Smoke
363
7/24/2019 Longitudinal Data Analysis - Note Set 13
10/12
Longitudinal Data Analysis - Note Set 13
Person Blood
Years Smoking Pressure Behavior CHD
5268.2 0 0 0 202542.0 10 0 0 16
1140.7 20 0 0 13
614.6 30 0 0 3
4451.1 0 0 1 41
2243.5 10 0 1 24
1153.6 20 0 1 27
925.0 30 0 1 17
1366.8 0 1 0 8
497.0 10 1 0 9
238.1 20 1 0 3
146.3 30 1 0 7
1251.9 0 1 1 29
640.0 10 1 1 21374.5 20 1 1 7
338.2 30 1 1 12
364
For these data, the ML estimate of 1 is 0.0318. That is, the rate of CHDincreases by a factor of exp(0.0318) = 1.032 for every cigarette smoked.
Alternatively, the rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be (1.032)20 = 1.88 times higher than the rate of CHD in
non-smokers.
We can include the additional risk factors in the following model:
ln ( /t) = 0 + 1 Smoke + 2 Type + 3BP
Effect Estimate Std. Error
Intercept -5.420 0.130
Smoke 0.027 0.006
Type 0.753 0.136
BP 0.753 0.129
365
7/24/2019 Longitudinal Data Analysis - Note Set 13
11/12
Longitudinal Data Analysis - Note Set 13
Now, the adjusted rate of CHD (controlling for blood pressure and behaviortype) increases by a factor of exp(0.027) = 1.028 for every cigarette smoked.
Thus, the adjusted rate of CHD in smokers of one pack per day (20 cigs) is
estimated to be (1.027)20 = 1.704 times higher than the rate of CHD in non-
smokers.
Finally, note that when a Poisson regression model is applied to data
consisting of very small rates (say, /t
7/24/2019 Longitudinal Data Analysis - Note Set 13
12/12
Longitudinal Data Analysis - Note Set 13
Barger, L. K. et al. N Engl J Med 2005;352:125-134
Weekly Hours That Interns Worked as a Percentage of Reported Weeks
368
Results
A total of 320 motor vehicle crashes were reported, including 133 that were
consequential; 131 of the 320 crashes occurred on the commute from
work.
Every extended shift (> 24 hrs) scheduled per month increased the
monthly rate of any motor vehicle crash by 9.1 percent (95 percent
confidence interval, 3.4 to 14.7 percent) and increased the monthly rate of
a crash on the commute from work by 16.2 percent (95 percent confidence
interval, 7.8 to 24.7 percent).
369
Top Related