Statistical Analysis of Longitudinal Data
Ziad Taib
Biostatistics, AZ
April 2011
Outline of lecture 1
1. An introduction
2. Two examples
3. Principles of Inference
4. Modelling continuous longitudinal data
Part 1: An introduction
Why longitudinal data?
Longitudinal data are very useful for their own sake. They also let us understand what mixed models are about in a context that is relatively simple yet rich enough.
A good reference is the book "Designing Experiments and Analyzing Data" by Maxwell & Delaney (2004).
Longitudinal Data
Repeated measures are obtained when a response is measured repeatedly on a set of units.
• Units:
  • Subjects, patients, participants, ...
  • Individuals, plants, ...
  • Clusters: nests, families, towns, ...
  • ...
• Special case: longitudinal data
Note: it is possible to handle several levels.
A motivating example
Consider a randomized clinical trial with two treatment groups and repeated measurements at baseline and 3 and 6 months later. As it turned out, some of the data were missing. Moreover, patients did not always comply with the time requirements. Our first reaction is to try to compensate for the missing values by some kind of imputation, or to use list-wise deletion.
Both "methods" have their shortcomings, so wouldn't it be nice to be able to use something else? There is in fact an alternative: mixed models.
With mixed models,
1. we can use all our data, with the attitude that "what is missing is missing";
2. we can even account for the dependencies resulting from measurements made on the same individuals at different times;
3. we don't need to be consistent about time.
[Figure: treatment groups A and B measured at baseline, 3 months and 6 months.]
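To make the contrast with list-wise deletion concrete, here is a minimal sketch that counts how many measurements each approach would use. The patient identifiers and values are invented for illustration; the long-format rows are the kind of input a mixed model would typically be fitted to.

```python
# Hypothetical toy data: one entry per patient, None marks a missing visit.
wide = {
    "p1": {"baseline": 10.0, "m3": 9.0, "m6": 8.5},
    "p2": {"baseline": 12.0, "m3": None, "m6": 10.0},
    "p3": {"baseline": 11.0, "m3": 10.5, "m6": None},
}

# List-wise deletion keeps only patients with every visit observed.
complete_cases = {pid: visits for pid, visits in wide.items()
                  if all(y is not None for y in visits.values())}

# Long format keeps one row per observed measurement ("what is missing is missing").
long_rows = [(pid, visit, y)
             for pid, visits in wide.items()
             for visit, y in visits.items()
             if y is not None]

n_used_listwise = sum(len(visits) for visits in complete_cases.values())
n_used_long = len(long_rows)
print(n_used_listwise, n_used_long)
```

In this toy set, list-wise deletion discards two of the three patients even though each of them contributes two valid measurements.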
Mixed effects models
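Ordinary fixed effects linear models usually assume:
1) independent observations with the same variance,
2) normally distributed errors,
3) constant parameters.
If we modify assumptions 1) and 3), the problem becomes more complicated, and in general we need a large number of parameters just to describe the covariance structure of the observations. Mixed effects models deal with this type of problem.
In general, these models allow us to tackle problems such as clustered data, repeated measures and hierarchical data.
$$Y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I), \qquad \beta \text{ is constant.}$$
$$\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$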
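As a point of comparison for what follows, here is a minimal sketch of the ordinary linear model with design-matrix rows (1, x_i), fitted by least squares on simulated data. The true parameter values, the sample size and the seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])     # design matrix with rows (1, x_i)
beta_true = np.array([2.0, 0.5])         # hypothetical (beta_0, beta_1)
sigma = 1.0                              # hypothetical error standard deviation
Y = X @ beta_true + rng.normal(0.0, sigma, size=n)   # independent, equal-variance errors

# Least-squares estimate of beta and the usual unbiased estimate of sigma^2.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
print(beta_hat, sigma2_hat)
```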
Various forms of models and relation between them
LM (linear model). Assumptions:
1. independence,
2. normality,
3. constant parameters.
GLM (generalised linear model): assumption 2) is replaced by an exponential family distribution.
LMM (linear mixed model): assumptions 1) and 3) are modified.
GLMM (generalised linear mixed model): assumption 2) is replaced by an exponential family distribution, and assumptions 1) and 3) are modified.
Repeated measures and longitudinal data: assumptions 1) and 3) are modified.
Non-linear models.
Approaches to inference:
• Maximum likelihood
• Classical statistics (observations are random, parameters are unknown constants)
• Bayesian statistics
Part 2: Two examples
Rat data Prostate data
Example 1: Rat Data (Verbeke et al.)
Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?
Simplified (univariate) response
• Randomized experiment in which 50 male Wistar rats are randomized to:
  • Control (15 rats)
  • Low dose of Decapeptyl (18 rats)
  • High dose of Decapeptyl (17 rats)
Decapeptyl prevents the production of testosterone.
Treatment starts at the age of 45 days. Measurements are taken every 10 days, from day 50 on.
The responses are distances (in pixels) between two well-defined points on X-ray pictures of the skull of each rat. Here, we consider only one response, reflecting the height of the skull.
[Figure: timeline in days, marked at 45, 50, 60, 70 and 80.]
Individual profiles:
1. Connected profiles are better than scatter plots.
2. Growth is expected, but is it linear?
3. Of interest: change over time (i.e. the relationship between response and age).
Complication: Many dropouts due to anaesthesia imply less power but no bias.
Without dropouts the problem would be easier because of balance.
Remarks:
• Much variability between rats
• Much less variability within rats
• Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for a known reason
• Measurements taken at fixed time points
Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?
Example 2: The BLSA Prostate Data (Pearson et al., Statistics in Medicine, 1994)
• Prostate disease is one of the most common and most costly medical problems in the world.
• It is important to look for biomarkers which can detect the disease at an early stage.
• Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells. It is believed that the PSA level is related to the volume of prostate tissue.
• Problem: patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level.
• The overlap in the PSA distributions for cancer and BPH cases seriously complicates the detection of prostate cancer.
Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
A retrospective case-control study based on frozen serum samples:
• 16 control patients
• 20 BPH cases
• 14 local cancer cases
• 4 metastatic cancer cases
Individual profiles:
Remarks:
• Much variability between subjects
• Little variability within subjects
• Highly unbalanced data
Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
Part 3: Principles of Inference
Fisher's likelihood: inference for observable $y$ and fixed parameter $\theta$
Data generation: given a stochastic model $f_\theta(y)$, generate data $y$ from $f_\theta(y)$.
Parameter estimation: given the data $y$, make inference about $\theta$ by using the likelihood $L(\theta \mid y)$.
Connection between the two processes:
$$L(\theta \mid y) \equiv f_\theta(y)$$
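The idea that the likelihood is the model density read as a function of the parameter for fixed data can be sketched as follows. The example assumes i.i.d. normal observations with known standard deviation and invented data values; the grid search merely illustrates maximising the likelihood and is not an efficient method.

```python
import math

def log_likelihood(theta, y, sigma=1.0):
    """Log of the likelihood of theta for i.i.d. N(theta, sigma^2) data y."""
    n = len(y)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - sum((yi - theta) ** 2 for yi in y) / (2 * sigma ** 2))

y = [4.1, 5.3, 4.8, 5.9, 5.0]                  # hypothetical observations
grid = [i / 100 for i in range(300, 701)]      # candidate theta values 3.00, 3.01, ..., 7.00
theta_hat = max(grid, key=lambda theta: log_likelihood(theta, y))
print(theta_hat, sum(y) / len(y))
```

For this model the maximiser coincides with the sample mean, which the grid search recovers to grid precision.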
(Classical) Likelihood Principle
Birnbaum (1962): All the evidence or information about the parameters in the data is in the likelihood.
Conditionality principle & Sufficiency principle ⇒ Likelihood principle
Bayesian inference for observable $y$ and unobservable $\nu$
Data generation: generate data according to
1. $\nu$ from the prior $f(\nu)$;
2. for $\nu$ fixed, generate $y$ from $f(y \mid \nu)$.
Combine into $f(\nu) f(y \mid \nu)$.
Parameter estimation: given the data $y$, make inference about $\nu$ by using the posterior $f(\nu \mid y)$.
The connection between the two processes:
$$f(\nu) f(y \mid \nu) = f(y) f(\nu \mid y)$$
Compare with $L(\theta \mid y)$.
$$f(\nu \mid y) = \frac{f(y, \nu)}{f(y)} = \frac{f(y \mid \nu) f(\nu)}{f(y)}$$
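A minimal sketch of the prior-to-posterior step on a discrete grid, assuming a binomial observation model and a uniform prior. The support points, the prior and the data are all hypothetical and chosen only so that each factor in Bayes' formula can be inspected.

```python
from math import comb

nu_values = [0.1, 0.3, 0.5, 0.7, 0.9]      # hypothetical support for the unobservable nu
prior = [0.2] * 5                          # f(nu): uniform prior on the grid
y, m = 7, 10                               # hypothetical data: y successes in m trials

def f_y_given_nu(y, nu):
    """Binomial density f(y | nu)."""
    return comb(m, y) * nu ** y * (1 - nu) ** (m - y)

joint = [p * f_y_given_nu(y, nu) for p, nu in zip(prior, nu_values)]   # f(nu) f(y | nu)
f_y = sum(joint)                                                       # marginal f(y)
posterior = [j / f_y for j in joint]                                   # f(nu | y)
print(posterior)
```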
Extended likelihood inference (Lee and Nelder) for observable $y$, fixed parameter $\theta$ and unobservable $\nu$
Parameter estimation: $L(\theta \mid y) \equiv f_\theta(y)$
Extended Likelihood Principle
Bjørnstad (1996): All information in the data about the unobservables and the parameters is in the "likelihood".
Conditionality principle & Sufficiency principle ⇒ Likelihood principle
Prediction: predict the number of seizures during the next week
Bayesian Predictive Inference
Given $\nu$, the observations $y$ are assumed to be independent. How do we predict the next value, $Y$, of the observable? In a Bayesian setting we may determine the posterior $f(\nu \mid y)$ and define the predictive density of $Y$ given $y$ as:
$$f_Y(x \mid y) = \int f_Y(x \mid \nu) f(\nu \mid y) \, d\nu$$
Note: Jeffreys' priors
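As a rough illustration of a predictive density, the sketch below averages a Poisson model for weekly counts over a discrete posterior for the rate. The Poisson assumption, the grid of rates, the uniform prior and the counts are all hypothetical; with a discrete grid the integral over the unobservable becomes a sum.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    return exp(-rate) * rate ** k / factorial(k)

rates = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical support for the rate nu
prior = [1 / len(rates)] * len(rates)       # uniform prior on the grid
y = [3, 2, 4]                               # hypothetical observed weekly counts

def likelihood(rate):
    """f(y | nu) for observations that are independent given nu."""
    out = 1.0
    for yi in y:
        out *= poisson_pmf(yi, rate)
    return out

weights = [p * likelihood(r) for p, r in zip(prior, rates)]
posterior = [w / sum(weights) for w in weights]          # f(nu | y)

def predictive(x):
    """Predictive probability of x: sum over nu of f(x | nu) f(nu | y)."""
    return sum(poisson_pmf(x, r) * post for r, post in zip(rates, posterior))

pred = [predictive(x) for x in range(60)]
print(pred[:6])
```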
Bayesian inference (Pearson, 1920)
Nelder and Lee (1996)
Part 4: A Model for Longitudinal Data
Introduction
In practice: often unbalanced data due to (i) unequal number of measurements per subject (ii) measurements not taken at fixed time points.
Therefore, ordinary multivariate regression techniques are often not applicable.
Often, subject-specific longitudinal profiles can be well approximated by linear regression functions. This leads to a 2-stage model formulation:
Stage 1: A linear (e.g. regression) model for each subject separately
Stage 2: Explain variability in the subject-specific (regression) coefficients using known covariates
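A 2-stage Model Formulation: Stage 1
Response $Y_{ij}$ for the $i$th subject, measured at time $t_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, n_i$.
Response vector $Y_i$ for the $i$th subject (possibly after some convenient transformation):
$$Y_i = (Y_{i1}, Y_{i2}, \dots, Y_{in_i})'$$
$$Y_i = Z_i \beta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \Sigma_i), \quad \text{often } \Sigma_i = \sigma^2 I_{n_i}$$
$Z_i$ is an $(n_i \times q)$ matrix of known covariates and $\beta_i$ is a $q$-dimensional vector of subject-specific regression parameters, so that $Z_i \beta_i$ has length $n_i$.
Note that the above model describes the observed variability within subjects.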
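Stage 1 can be sketched as one least-squares fit per subject, here with a subject-specific intercept and slope (q = 2). The simulated profiles, the numbers of measurements and the helper name `fit_subject` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_subject(t, y):
    """Stage 1: least-squares estimate of the subject's (intercept, slope)."""
    Z = np.column_stack([np.ones_like(t), t])     # n_i x q design matrix, q = 2
    beta_i, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta_i

# Hypothetical unbalanced data: each subject has its own number of time points.
subjects = {}
for i in range(5):
    n_i = rng.integers(3, 8)
    t = np.sort(rng.uniform(0.0, 2.0, size=n_i))
    y = 70.0 + 5.0 * t + rng.normal(0.0, 0.5, size=n_i)
    subjects[i] = (t, y)

stage1 = {i: fit_subject(t, y) for i, (t, y) in subjects.items()}
print(stage1)
```

Because every subject is fitted separately, unequal numbers of measurements and unequal time points cause no difficulty at this stage.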
Stage 2
Between-subject variability can now be studied by relating the parameters $\beta_i$ to known covariates:
$$\beta_i = K_i \beta + b_i$$
$K_i$ is a $(q \times p)$ matrix of known covariates and $\beta$ is a $p$-dimensional vector of unknown regression parameters.
Finally,
$$b_i \sim N(0, D)$$
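Stage 2 can be sketched as a regression of the subject-specific coefficients on covariates collected in a (q x p) matrix per subject. The stage-1 estimates, the two groups and the structure of `K` (common intercept, group-specific slopes) are hypothetical, and the unweighted least squares used here ignores the uncertainty in the stage-1 estimates.

```python
import numpy as np

# Hypothetical stage-1 estimates (intercept, slope) and a binary group label per subject.
beta_i_hat = {1: [68.0, 7.0], 2: [69.0, 7.4], 3: [68.5, 6.0], 4: [68.9, 6.2]}
group = {1: 0, 2: 0, 3: 1, 4: 1}

def K(g):
    """q x p covariate matrix (q = 2, p = 3): common intercept, group-specific slopes."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0 - g, float(g)]])

# Stack all subjects and solve for beta in beta_i = K_i beta + b_i.
K_all = np.vstack([K(group[i]) for i in beta_i_hat])
b_all = np.concatenate([beta_i_hat[i] for i in beta_i_hat])
beta_hat, *_ = np.linalg.lstsq(K_all, b_all, rcond=None)
print(beta_hat)
```

With this design the estimates reduce to the mean intercept and the two group-wise mean slopes.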
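The General Linear Mixed-effects Model
The two stages of the 2-stage approach can now be combined into one model:
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad X_i = Z_i K_i$$
where $X_i \beta$ describes the average evolution and $Z_i b_i$ is the subject-specific part.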
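This is convenient with the multivariate normal distribution, but very difficult with other distributions.
The general mixed effects model can be summarized by:
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i)$$
with $b_1, \dots, b_N, \varepsilon_1, \dots, \varepsilon_N$ independent.
Terminology:
• Fixed effects: $\beta$
• Random effects: $b_i$
• Variance components: elements in $D$ and $\Sigma_i$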
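A minimal simulation sketch of one subject's response vector: fixed part plus random-effects part plus residual error, with independent errors of equal variance. All numerical values and the helper name `simulate_subject` are hypothetical.

```python
import numpy as np

def simulate_subject(X, Z, beta, D, sigma, rng):
    """Draw X beta + Z b + eps with b ~ N(0, D) and eps ~ N(0, sigma^2 I)."""
    b_i = rng.multivariate_normal(np.zeros(D.shape[0]), D)   # random effects
    eps = rng.normal(0.0, sigma, size=X.shape[0])            # residual error
    return X @ beta + Z @ b_i + eps

rng = np.random.default_rng(2)
t = np.array([0.0, 0.5, 1.0, 1.5])
Z = np.column_stack([np.ones_like(t), t])    # random intercept and random slope
X = Z                                        # same covariates for the fixed effects here
beta = np.array([68.0, 7.0])                 # hypothetical fixed effects
D = np.array([[3.0, 0.2], [0.2, 0.5]])       # hypothetical covariance of the random effects
y_i = simulate_subject(X, Z, beta, D, sigma=1.0, rng=rng)
print(y_i)
```

Setting `D` to zero and `sigma` to zero returns the average evolution exactly, which shows that the fixed effects enter only through the mean.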
Remarks
1. It is occasionally unclear whether we should treat an effect as fixed or random. For example, in clinical trials with treatment and clinic as "factors", should we consider clinics as random?
2. Considering the general form of a mixed effects model
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i$$
notice that the fixed effects are involved only in the mean values (just like in ordinary linear models), while the random effects modify the covariance matrix of the observations.
Example: The Rat Data
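Transformation of the time scale to linearize the profiles:
$$\mathrm{Age}_{ij} \longrightarrow t_{ij} = \ln\left[1 + \frac{\mathrm{Age}_{ij} - 45}{10}\right]$$
Note that $t = 0$ corresponds to the start of the treatment (moment of randomization).
• Stage 1 model:
$$Y_{ij} = \beta_{1i} + \beta_{2i} t_{ij} + \varepsilon_{ij}, \qquad j = 1, \dots, n_i$$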
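The time transformation t = ln(1 + (Age - 45) / 10) is easy to check numerically. The sketch below evaluates it at the measurement ages 50, 60, 70 and 80 days; the function name is an assumption.

```python
import math

def transformed_time(age_days):
    """t = ln(1 + (Age - 45) / 10), so t = 0 at the start of treatment (day 45)."""
    return math.log(1 + (age_days - 45) / 10)

ages = [50, 60, 70, 80]
t = [transformed_time(a) for a in ages]
print([round(v, 3) for v in t])
```

The transformed times increase with age but with shrinking gaps, which is what allows a profile that flattens with age to look approximately linear in t.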
Stage 1: subject-specific intercept and slope
$$\beta_i = \begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix}$$
Stage 2 model:
In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
The hierarchical versus the marginal Model
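The general mixed model is given by
$$Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i)$$
It can be written as
$$Y_i \mid b_i \sim N(X_i \beta + Z_i b_i, \Sigma_i), \qquad b_i \sim N(0, D)$$
It is therefore also called a hierarchical model.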
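The hierarchical model specifies $f(y_i \mid b_i)$ and $f(b_i)$; the marginal density is
$$f(y_i) = \int f(y_i \mid b_i) f(b_i) \, db_i$$
Marginally we have that $Y_i$ is distributed as
$$Y_i \sim N(X_i \beta, \; Z_i D Z_i' + \Sigma_i)$$
Hence the marginal mean is $X_i \beta$ and the marginal covariance matrix is $V_i = Z_i D Z_i' + \Sigma_i$.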
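The marginal covariance implied by a random intercept and slope, Z D Z' plus the residual covariance, can be checked by simulation. The sketch assumes a residual covariance of the form sigma^2 I and uses hypothetical values for D, sigma^2 and the time points; the fixed part is omitted because it only shifts the mean.

```python
import numpy as np

t = np.array([0.0, 0.5, 1.0])
Z = np.column_stack([np.ones_like(t), t])      # random intercept and slope
D = np.array([[3.0, 0.2], [0.2, 0.5]])         # hypothetical covariance of the random effects
sigma2 = 1.0                                   # hypothetical residual variance

V = Z @ D @ Z.T + sigma2 * np.eye(len(t))      # implied marginal covariance

# Monte Carlo check: simulate Z b + eps many times and compare the sample covariance.
rng = np.random.default_rng(3)
n_sim = 100_000
b = rng.multivariate_normal(np.zeros(2), D, size=n_sim)
eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_sim, len(t)))
V_mc = np.cov(b @ Z.T + eps, rowvar=False)
print(np.round(V, 2))
print(np.round(V_mc, 2))
```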
Example: The Rat Data
Linear model where each rat has its own intercept and its own slope.
The random effects can be negative or positive, reflecting individual deviation from the average.
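Notice that the model assumes that the variance function is quadratic over time.
Comments:
• Linear average evolution in each group
• Equal average intercepts
• Different average slopes
Moreover, taking two time points $t_1$ and $t_2$, writing $d_{11}$, $d_{12}$, $d_{22}$ for the elements of $D$ and assuming $\Sigma_i = \sigma^2 I_{n_i}$, the covariance between two measurements on rat $i$ is
$$\begin{aligned}
\mathrm{Cov}(Y_i(t_1), Y_i(t_2)) &= \mathrm{Cov}(b_{1i} + b_{2i} t_1 + \varepsilon_i(t_1), \; b_{1i} + b_{2i} t_2 + \varepsilon_i(t_2)) \\
&= \mathrm{Cov}(b_{1i}, b_{1i}) + t_2 \mathrm{Cov}(b_{1i}, b_{2i}) + t_1 \mathrm{Cov}(b_{2i}, b_{1i}) + t_1 t_2 \mathrm{Cov}(b_{2i}, b_{2i}) + \mathrm{Cov}(\varepsilon_i(t_1), \varepsilon_i(t_2)) \\
&= d_{11} + d_{12} t_2 + d_{12} t_1 + d_{22} t_1 t_2 + \sigma^2 \delta_{\{t_1, t_2\}} \\
&= d_{22} t_1 t_2 + d_{12} (t_1 + t_2) + d_{11} + \sigma^2 \delta_{\{t_1, t_2\}}
\end{aligned}$$
where $\delta_{\{t_1, t_2\}} = 1$ if $t_1 = t_2$ and $0$ otherwise. Setting $t_1 = t_2 = t$ gives the variance $d_{22} t^2 + 2 d_{12} t + d_{11} + \sigma^2$, which is quadratic in $t$.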
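For a random intercept and slope with covariance elements d11, d12 and d22 and residual variance sigma^2, the covariance between measurements at times t1 and t2 is d22 t1 t2 + d12 (t1 + t2) + d11, plus sigma^2 when the two times coincide. The sketch below checks that expression against the quadratic form (1, t1) D (1, t2)'. Function names and numerical values are hypothetical.

```python
def cov_closed_form(t1, t2, d11, d12, d22, sigma2):
    """d22 t1 t2 + d12 (t1 + t2) + d11, plus sigma2 when t1 equals t2."""
    delta = 1.0 if t1 == t2 else 0.0
    return d22 * t1 * t2 + d12 * (t1 + t2) + d11 + sigma2 * delta

def cov_matrix_form(t1, t2, d11, d12, d22, sigma2):
    """(1, t1) D (1, t2)' plus sigma2 when t1 equals t2."""
    D = [[d11, d12], [d12, d22]]
    u, v = (1.0, t1), (1.0, t2)
    quad = sum(u[r] * D[r][c] * v[c] for r in range(2) for c in range(2))
    return quad + (sigma2 if t1 == t2 else 0.0)

params = dict(d11=3.0, d12=0.2, d22=0.5, sigma2=1.0)   # hypothetical variance components
variance_at = [cov_closed_form(t, t, **params) for t in (0.0, 1.0, 2.0)]
print(variance_at)
```

The second differences of `variance_at` are constant and equal to 2 * d22, which is the quadratic variance function mentioned in the text.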
The prostate data
A model for the prostate cancer data, Stage 1:
$$Y_{ij} = \ln(\mathrm{PSA}_{ij} + 1) = \beta_{1i} + \beta_{2i} t_{ij} + \beta_{3i} t_{ij}^2 + \varepsilon_{ij}, \qquad j = 1, \dots, n_i$$
The prostate data
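A model for the prostate cancer data, Stage 2 (age could not be matched):
$$\begin{aligned}
\beta_{1i} &= \beta_1 \mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i} \\
\beta_{2i} &= \beta_6 \mathrm{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i} \\
\beta_{3i} &= \beta_{11} \mathrm{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i}
\end{aligned}$$
$C_i$, $B_i$, $L_i$, $M_i$ are indicators of the classes: control, BPH, local or metastatic cancer. $\mathrm{Age}_i$ is the subject's age at diagnosis. The parameters in the first row are the average intercepts for the different classes.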
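With three subject-specific coefficients and fifteen fixed effects, the stage-2 covariate matrix for one subject is 3 x 15 and repeats the row (Age, C, B, L, M) block-diagonally. The sketch below builds it; the helper name, the group labels and the placeholder parameter vector are hypothetical.

```python
import numpy as np

GROUPS = ("control", "BPH", "local", "metastatic")

def stage2_design(age, group):
    """3 x 15 covariate matrix: the row (Age, C, B, L, M) repeated block-diagonally."""
    indicators = [1.0 if group == g else 0.0 for g in GROUPS]
    row = np.array([age] + indicators).reshape(1, -1)
    return np.kron(np.eye(3), row)

K_i = stage2_design(age=60.0, group="BPH")
beta = np.arange(1.0, 16.0)        # placeholder values 1, ..., 15, not estimates
mean_coefficients = K_i @ beta     # average (intercept, linear, quadratic) coefficients for this subject
print(K_i.shape, mean_coefficients)
```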
The prostate data
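This gives the following model:
$$\begin{aligned}
Y_{ij} ={} & (\beta_1 \mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i}) \\
& + (\beta_6 \mathrm{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i}) \, t_{ij} \\
& + (\beta_{11} \mathrm{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i}) \, t_{ij}^2 + \varepsilon_{ij}
\end{aligned}$$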
Stochastic components in general linear mixed model
[Figure: response plotted against time, showing the average evolution and the individual profiles of Subject 1 and Subject 2.]
References
Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.
Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.
Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.
Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models For Repeated Measurement Data. London: Chapman and Hall.
Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.
Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data. (2nd edition). Oxford: Oxford University Press.
Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.
Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.
Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.
Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.
Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.
Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.
Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.
Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-Plus. Springer Series in Statistics and Computing. New York: Springer-Verlag.
Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.
Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.
Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models In Practice: A SAS Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.
Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.
Any Questions?