
Statistical Analysis of Longitudinal Data

Ziad Taib

Biostatistics, AZ

April 2011


Outline of lecture 1

1. An introduction

2. Two examples

3. Principles of Inference

4. Modelling continuous longitudinal data


Part 1: An introduction


Why longitudinal data?

Longitudinal data are very useful for their own sake. They also give us the possibility of understanding what mixed models are about in a relatively simple yet rich enough context.

A good reference is the book "Designing Experiments and Analyzing Data" by Maxwell & Delaney (2004).


Longitudinal Data

Repeated measures are obtained when a response is measured repeatedly on a set of units.

• Units:

  • Subjects, patients, participants, . . .

  • Individuals, plants, . . .

  • Clusters: nests, families, towns, . . .

  • . . .

• Special case: Longitudinal data

Note: it is possible to handle several levels.


A motivating example

Consider a randomized clinical trial with two treatment groups (A and B) and repeated measurements at baseline and 3 and 6 months later. As it turned out, some of the data were missing. Moreover, patients did not always comply with the time requirements. Our first reaction is to try to compensate for the missing values by some kind of imputation, or to use list-wise deletion.

Both "methods" have their shortcomings, so wouldn't it be nice to be able to use something else? There is in fact an alternative: mixed models.

With mixed models,

1. we can use all our data, with the attitude that "what is missing is missing";

2. we can even account for the dependencies resulting from measurements made on the same individuals at different times;

3. we don't need to be consistent about time.

[Figure: treatment groups A and B measured at baseline, 3 months and 6 months]


Mixed effects models

Ordinary fixed effects linear models usually assume:

1) independent observations with the same variance,

2) normally distributed errors,

3) constant parameters.

If we modify assumptions 1) and 3), the problem becomes more complicated, and in general we need a large number of parameters just to describe the covariance structure of the observations. Mixed effects models deal with this type of problem.

In general, this type of model allows us to tackle problems such as clustered data, repeated measures and hierarchical data.

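The ordinary linear model:

$$ Y = X\beta + \varepsilon, \qquad \varepsilon \text{ is } N(0, \sigma^2 I), \qquad \beta \text{ constant}, $$

$$ \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix} $$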


Various forms of models and relation between them

LM: Assumptions:

1. independence,

2. normality,

3. constant parameters

GLM: Assumption 2) is replaced by an exponential family distribution

LMM: Assumptions 1) and 3) are modified

GLMM: Assumption 2) is replaced by an exponential family distribution, and assumptions 1) and 3) are modified

Repeated measures: Assumptions 1) and 3) are modified

Longitudinal data

Maximum likelihood

Classical statistics (Observations are random, parameters are unknown constants)

Bayesian statistics

LM - Linear model

GLM - Generalised linear model

LMM - Linear mixed model

GLMM - Generalised linear mixed model

Non-linear models


Part 2: Two examples

• Rat data

• Prostate data


Example 1: Rat Data (Verbeke et al.)

Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?


Simplified (univariate) response


• Randomized experiment in which 50 male Wistar rats are randomized to:

  • Control (15 rats)

  • Low dose of Decapeptyl (18 rats)

  • High dose of Decapeptyl (17 rats)

  Decapeptyl prevents the production of testosterone.

• Treatment starts at the age of 45 days.

• Measurements are taken every 10 days, from day 50 on.

• The responses are distances (pixels) between two well-defined points on X-ray pictures of the skull of each rat. Here, we consider only one response, reflecting the height of the skull.

[Figure: timeline of age in days: 45, 50, 60, 70, 80]


Individual profiles:

1. Connected profiles are better than scatter plots.

2. Growth is expected, but is it linear?

3. Of interest: change over time (i.e. the relationship between response and age).


Complication: Many dropouts due to anaesthesia imply less power but no bias.

Without dropouts the problem would be easier because of balance.


Remarks:

• Much variability between rats

• Much less variability within rats

• Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for known reason

• Measurements taken at fixed time points

Research question: How does craniofacial growth in the Wistar rat depend on testosterone production?


Example 2: The BLSA Prostate Data


Source: Pearson et al., Statistics in Medicine, 1994.

• Prostate disease is one of the most common and most costly medical problems in the world.

• It is important to look for biomarkers which can detect the disease at an early stage.

• Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells. It is believed that the PSA level is related to the volume of prostate tissue.

• Problem: Patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level.

• Overlap in the PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer.


Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?

A retrospective case-control study based on frozen serum samples:

• 16 control patients

• 20 BPH cases

• 14 local cancer cases

• 4 metastatic cancer cases


Individual profiles:


Remarks:

• Much variability between subjects

• Little variability within subjects

• Highly unbalanced data

Research question: Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?


Part 3: Principles of Inference


Fisher's likelihood: inference for observable $y$ and fixed parameter $\theta$

Data generation: Given a stochastic model $f_\theta(y)$, generate data $y$ from $f_\theta(y)$.

Parameter estimation: Given the data $y$, make inference about $\theta$ by using the likelihood $L(\theta \mid y)$.

Connection between the two processes:

$$ L(\theta \mid y) \equiv f_\theta(y) $$
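To illustrate how the same function $f_\theta(y)$ is read as a likelihood once $y$ has been observed, here is a minimal Python sketch. The binomial model, the data (7 successes in 10 trials) and the function names are assumptions chosen for illustration; they are not part of the lecture.

```python
import math

def binomial_likelihood(theta, y, n):
    """L(theta | y) = f_theta(y) for y successes in n independent trials."""
    return math.comb(n, y) * theta**y * (1.0 - theta)**(n - y)

def grid_mle(y, n, steps=1000):
    """Maximise the likelihood over an evenly spaced grid of theta values in [0, 1]."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda theta: binomial_likelihood(theta, y, n))

# Hypothetical data: 7 successes out of 10 trials.
theta_hat = grid_mle(y=7, n=10)
print(theta_hat)
```

The grid maximiser agrees with the analytical maximum likelihood estimate $y/n$ for this model, up to the grid resolution.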


(Classical) Likelihood Principle

Birnbaum (1962): All the evidence or information about the parameters in the data is in the likelihood.

Conditionality principle & Sufficiency principle ⇒ Likelihood principle


Bayesian inference for observable $y$ and unobservable $\nu$

Data generation: Generate data according to

1. $\nu$, from the prior $f(\nu)$

2. For $\nu$ fixed, generate $y$ from $f(y \mid \nu)$

Combine into $f(\nu) f(y \mid \nu)$.

Parameter estimation: Given the data $y$, make inference about $\nu$ by using the posterior $f(\nu \mid y)$.

The connection between the two processes:

$$ f(\nu) f(y \mid \nu) = f(y, \nu) = f(y) f(\nu \mid y) $$

$$ f(\nu \mid y) = \frac{f(y, \nu)}{f(y)} = \frac{f(\nu) f(y \mid \nu)}{f(y)} $$

Compare with $L(\theta \mid y)$.


Extended likelihood inference (Lee and Nelder): for observable $y$, fixed parameter $\theta$ and unobservable $\nu$


Parameter estimation: $L(\theta \mid y) \equiv f_\theta(y)$


Extended Likelihood Principle

Björnstad (1996): All information in the data about the unobservables and the parameters is in the "likelihood".

Conditionality principle & Sufficiency principle ⇒ Likelihood principle


Prediction: predict the number of seizures during the next week


Bayesian Predictive Inference

Given $\nu$, the observations $y$ are assumed to be independent. How do we predict the next value, $Y$, of the observable? In a Bayesian setting we may determine the posterior $f(\nu \mid y)$ and define the predictive density of $Y$ given $y$ as:

$$ f_Y(x \mid y) = \int f(x \mid \nu) \, f(\nu \mid y) \, d\nu $$
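The predictive density averages the model for the next observation over the posterior. A minimal numerical sketch in Python follows; the Bernoulli model, the flat prior, the toy data and all function names are assumptions made for illustration, not the seizure example of the lecture.

```python
import numpy as np

def grid_posterior(y, m=2000):
    """Posterior weights for nu on a midpoint grid, under a flat prior (an assumption)."""
    y = np.asarray(y)
    grid = (np.arange(m) + 0.5) / m               # candidate values of nu in (0, 1)
    prior = np.ones(m)                            # flat prior f(nu), illustrative choice
    s, n = y.sum(), y.size
    likelihood = grid**s * (1.0 - grid)**(n - s)  # f(y | nu) for independent 0/1 data
    unnorm = prior * likelihood
    return grid, unnorm / unnorm.sum()            # weights normalised to sum to 1

def predictive_prob_of_one(y):
    """P(Y = 1 | y): sum over the grid of f(Y = 1 | nu) * f(nu | y)."""
    grid, post = grid_posterior(y)
    return float(np.sum(grid * post))

# Hypothetical data: four independent 0/1 observations.
p_next = predictive_prob_of_one([1, 1, 0, 1])
print(round(p_next, 4))
```

With a flat prior the exact answer is $(s + 1)/(n + 2)$, here $4/6$, so the grid approximation can be checked directly.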

Note: Jeffreys' priors


Bayesian inference (Pearson, 1920)


Nelder and Lee (1996)

?


Part 4: A Model for Longitudinal Data


Introduction

In practice: often unbalanced data due to (i) unequal number of measurements per subject (ii) measurements not taken at fixed time points.

Therefore, ordinary multivariate regression techniques are often not applicable.

Often, subject-specific longitudinal profiles can be well approximated by linear regression functions. This leads to a 2-stage model formulation:

Stage 1: A linear (e.g. regression) model for each subject separately

Stage 2: Explain variability in the subject-specific (regression) coefficients using known covariates
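A naive version of the two stages can be sketched in Python as follows. The toy data (noise-free straight lines, two groups, unbalanced measurement times) and the function name `stage1_fit` are assumptions for illustration only; this is a sketch of the two-stage idea, not the lecture's analysis.

```python
import numpy as np

def stage1_fit(t, y):
    """Stage 1: ordinary least squares of y on (1, t) for one subject."""
    t = np.asarray(t, dtype=float)
    Z = np.column_stack([np.ones(t.size), t])
    coef, *_ = np.linalg.lstsq(Z, np.asarray(y, dtype=float), rcond=None)
    return coef  # [intercept, slope]

# Hypothetical subjects: (group indicator, measurement times, responses).
# Numbers of measurements and time points differ between subjects.
subjects = [
    (0, [0.0, 1.0, 2.0], [5.0, 6.0, 7.0]),
    (0, [0.0, 0.5, 1.5, 3.0], [5.0, 5.5, 6.5, 8.0]),
    (1, [0.0, 2.0], [5.0, 9.0]),
    (1, [0.5, 1.0, 2.5], [6.0, 7.0, 10.0]),
]

# Stage 1: one regression per subject.
coefs = np.array([stage1_fit(t, y) for _, t, y in subjects])

# Stage 2: explain the subject-specific slopes by the known group covariate.
groups = np.array([g for g, _, _ in subjects], dtype=float)
K = np.column_stack([np.ones(groups.size), groups])
slope_effects, *_ = np.linalg.lstsq(K, coefs[:, 1], rcond=None)
print(coefs)
print(slope_effects)  # [slope in group 0, extra slope in group 1]
```

Each subject contributes whatever measurements it has, which is why unbalanced data pose no problem at stage 1.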


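A 2-stage Model Formulation: Stage 1

Response $Y_{ij}$ for the $i$th subject, measured at time $t_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, n_i$.

Response vector $Y_i$ for the $i$th subject:

$$ Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})' $$

Stage 1 model (possibly after some convenient transformation):

$$ Y_i = Z_i \beta_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \Sigma_i), \quad \text{often } \Sigma_i = \sigma^2 I_{n_i} $$

$Z_i$ is an $(n_i \times q)$ matrix of known covariates and $\beta_i$ is a $q$-dimensional vector of subject-specific regression coefficients.

Note that the above model describes the observed variability within subjects.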


Stage 2

Between-subject variability can now be studied by relating the parameters $\beta_i$ to known covariates:

$$ \beta_i = K_i \beta + b_i $$

$K_i$ is a $(q \times p)$ matrix of known covariates and $\beta$ is a $p$-dimensional vector of unknown regression parameters.

Finally,

$$ b_i \sim N(0, D) $$


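The General Linear Mixed-effects Model

The two stages of the 2-stage approach can now be combined into one model:

$$ Y_i = Z_i K_i \beta + Z_i b_i + \varepsilon_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad X_i = Z_i K_i $$

Here $X_i \beta$ is the average evolution and $Z_i b_i$ is the subject-specific part.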


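The general mixed effects model can be summarized by:

$$ Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i), $$

with $b_1, \ldots, b_N, \varepsilon_1, \ldots, \varepsilon_N$ independent.

This is convenient with the multivariate normal distribution, but very difficult with other distributions.

Terminology:

• Fixed effects: $\beta$

• Random effects: $b_i$

• Variance components: elements in $D$ and $\Sigma_i$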


Remarks

1. It is occasionally unclear whether we should treat an effect as fixed or random. For example, in clinical trials with treatment and clinic as "factors", should we consider clinics as random?

2. Considering the general form of a mixed effects model

$$ Y_i = X_i \beta + Z_i b_i + \varepsilon_i, $$

notice that the fixed effects are involved only in the mean values (just like in ordinary linear models), while the random effects modify the covariance matrix of the observations.


Example: The Rat Data


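Transformation of the time scale to linearize the profiles:

$$ \text{Age}_{ij} \longrightarrow t_{ij} = \ln\left[1 + \frac{\text{Age}_{ij} - 45}{10}\right] $$

Note that $t = 0$ corresponds to the start of the treatment (moment of randomization).

• Stage 1 model:

$$ Y_{ij} = \beta_{1i} + \beta_{2i} t_{ij} + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i $$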
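The time transformation is easy to check numerically. A minimal Python sketch follows; the function name and the default arguments (start of treatment at 45 days, measurements every 10 days) are taken from the description of the rat data, while the printed ages are just the scheduled measurement days.

```python
import math

def transformed_time(age_days, start_age=45.0, spacing=10.0):
    """t = ln(1 + (Age - 45) / 10); equals 0 at the start of treatment."""
    return math.log(1.0 + (age_days - start_age) / spacing)

# Transformed time at the start of treatment and at the first scheduled measurements.
for age in (45, 50, 60, 70, 80):
    print(age, round(transformed_time(age), 3))
```

The transformation compresses later ages, which is what straightens growth profiles that flatten out over time.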


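Stage 1 in matrix notation:

$$ Y_i = Z_i \beta_i + \varepsilon_i, \qquad Z_i = \begin{pmatrix} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}, \qquad \beta_i = \begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix} $$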


Stage 2 model:

In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
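The stage 2 equations themselves did not survive on this slide. One form that is consistent with the comments made elsewhere in the lecture (equal average intercepts, different average slopes per treatment group) is sketched below. The indicator symbols $L_i$, $H_i$, $C_i$ (low dose, high dose, control) and the numbering of the $\beta$ parameters are assumptions made for illustration.

```latex
\begin{aligned}
\beta_{1i} &= \beta_0 + b_{1i}, \\
\beta_{2i} &= \beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i}.
\end{aligned}
```

Under this form, $\beta_0$ is the common average intercept, and $\beta_1$, $\beta_2$, $\beta_3$ are the average slopes in the low dose, high dose and control groups.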


The hierarchical versus the marginal Model

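The general mixed model is given by

$$ Y_i = X_i \beta + Z_i b_i + \varepsilon_i, \qquad b_i \sim N(0, D), \qquad \varepsilon_i \sim N(0, \Sigma_i). $$

It can be written as

$$ Y_i \mid b_i \sim N(X_i \beta + Z_i b_i, \; \Sigma_i), \qquad b_i \sim N(0, D), $$

with densities $f(y_i \mid b_i)$ and $f(b_i)$. It is therefore also called a hierarchical model.

Marginally, with density $f(y_i)$, we have that $Y_i$ is distributed as

$$ Y_i \sim N(X_i \beta, \; Z_i D Z_i' + \Sigma_i). $$

Hence the marginal mean is $X_i \beta$ and the marginal covariance matrix is $V_i = Z_i D Z_i' + \Sigma_i$.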
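The marginal mean and covariance follow from the hierarchical formulation by the laws of total expectation and total variance. A short derivation, using only the symbols of the general linear mixed model ($X_i$, $Z_i$, $\beta$, $b_i$, $D$, $\Sigma_i$), might read:

```latex
\begin{aligned}
E(Y_i) &= E\big[E(Y_i \mid b_i)\big] = E(X_i \beta + Z_i b_i) = X_i \beta, \\
\operatorname{Var}(Y_i) &= E\big[\operatorname{Var}(Y_i \mid b_i)\big]
  + \operatorname{Var}\big[E(Y_i \mid b_i)\big]
  = \Sigma_i + Z_i D Z_i'.
\end{aligned}
```

Normality of the marginal distribution follows because $Y_i$ is a linear combination of the independent normal vectors $b_i$ and $\varepsilon_i$.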


Example: The Rat Data

Linear model where each rat has its own intercept and its own slope.

The random effects can be negative or positive, reflecting individual deviation from the average.


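Comments:

• Linear average evolution in each group

• Equal average intercepts

• Different average slopes

Moreover, taking

$$ D = \begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix}, \qquad \Sigma_i = \sigma^2 I_{n_i}, $$

the covariance between two measurements of rat $i$ at distinct times $t_1 \neq t_2$ is

$$ \begin{aligned}
\text{Cov}(Y_i(t_1), Y_i(t_2)) &= (1, t_1) \, D \, (1, t_2)' \\
&= (d_{11} + d_{12} t_1, \; d_{12} + d_{22} t_1) \, (1, t_2)' \\
&= d_{11} + d_{12} t_1 + d_{12} t_2 + d_{22} t_1 t_2 \\
&= d_{22} t_1 t_2 + d_{12} (t_1 + t_2) + d_{11}.
\end{aligned} $$

For a single measurement at time $t$, the measurement error adds $\sigma^2$:

$$ \text{Var}(Y_i(t)) = d_{22} t^2 + 2 d_{12} t + d_{11} + \sigma^2. $$

Notice that the model therefore assumes that the variance function is quadratic over time.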
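The closed-form covariance $d_{22} t_1 t_2 + d_{12}(t_1 + t_2) + d_{11}$ can be checked against the matrix expression $Z_i D Z_i' + \sigma^2 I$ for a random intercept and slope model. The sketch below is in Python; the numerical values of $D$, $\sigma^2$ and the time points are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def marginal_cov(times, D, sigma2):
    """V = Z D Z' + sigma2 * I for a random intercept and random slope model."""
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones(t.size), t])
    return Z @ D @ Z.T + sigma2 * np.eye(t.size)

def closed_form_cov(t1, t2, D):
    """Covariance between measurements at two distinct times t1 and t2."""
    return D[1, 1] * t1 * t2 + D[0, 1] * (t1 + t2) + D[0, 0]

def closed_form_var(t, D, sigma2):
    """Variance of a single measurement at time t: quadratic in t."""
    return D[1, 1] * t**2 + 2.0 * D[0, 1] * t + D[0, 0] + sigma2

# Illustrative values only (not estimates from the rat data).
D = np.array([[2.0, 0.3], [0.3, 0.5]])
sigma2 = 1.0
times = [0.0, 0.5, 1.0]

V = marginal_cov(times, D, sigma2)
print(V)
```

The diagonal of `V` traces out the quadratic variance function, while the off-diagonal entries match the closed-form covariance.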


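The prostate data

A model for the prostate cancer data, Stage 1:

$$ Y_{ij} = \ln(\text{PSA}_{ij} + 1) = \beta_{1i} + \beta_{2i} t_{ij} + \beta_{3i} t_{ij}^2 + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i $$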


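The prostate data

A model for the prostate cancer data, Stage 2 (age could not be matched):

$$ \begin{aligned}
\beta_{1i} &= \beta_1 \text{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i + b_{1i} \\
\beta_{2i} &= \beta_6 \text{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i + b_{2i} \\
\beta_{3i} &= \beta_{11} \text{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i + b_{3i}
\end{aligned} $$

$C_i$, $B_i$, $L_i$, $M_i$ are indicators of the classes: control, BPH, local or metastatic cancer. $\text{Age}_i$ is the subject's age at diagnosis. The parameters in the first row are the average intercepts for the different classes.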


The prostate data

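Substituting the stage 2 expressions for the subject-specific coefficients into the stage 1 model gives the following model:

$$ \begin{aligned}
Y_{ij} = {} & \beta_1 \text{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i \\
& + (\beta_6 \text{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i) \, t_{ij} \\
& + (\beta_{11} \text{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i) \, t_{ij}^2 \\
& + b_{1i} + b_{2i} t_{ij} + b_{3i} t_{ij}^2 + \varepsilon_{ij}
\end{aligned} $$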


Stochastic components in general linear mixed model

[Figure: response versus time, showing the average evolution and the individual profiles of Subject 1 and Subject 2]


References

Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.

Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.

Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.

Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models For Repeated Measurement Data. London: Chapman and Hall.

Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.

Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edition). Oxford: Oxford University Press.

Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.

Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.

Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.

Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.

Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.

Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.

Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.

Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.

Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-Plus. Springer Series in Statistics and Computing. New York: Springer-Verlag.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.

Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.

Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models In Practice: A SAS Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.

Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.


Any Questions?
