University of Copenhagen, Department of Biostatistics
Faculty of Health Sciences

Correlated data: Introduction

Julie Lyng Forman & Lene Theil Skovgaard
November 25, 2013

Introduction

- The idea of the course
- Comparing two types of measurement
- Logarithmic transformation
- Linear regression
- The general linear model

Home page: http://staff.pubhealth.ku.dk/~lts/CorrelatedMeasurements
E-mail: ltsk@sund.ku.dk


Aim of the course

To make the participants able to:
- understand and interpret advanced statistical analyses
- judge the assumptions behind the use of various methods of analysis
- perform their own analyses using SAS
- understand output from a statistical program package in general, i.e. other than SAS
- present results from a statistical analysis, numerically and graphically

To create a better platform for communication between 'users' of statistics and statisticians, to benefit subsequent collaboration.


We expect students to . . .

Be interested

Be motivated
- ideally from your own (future) research project

Have basic knowledge of statistical concepts such as:
- mean, average
- variance, standard deviation, standard error
- distribution
- correlation, regression, ANOVA
- t-test, χ2-test, F-test


Topics for the course

Quantitative data (normal distribution):
- Analysis of variance
- Variance component models
- General linear models / regression analysis
- Linear mixed models

Non-normal outcome (binary data or count data):
- Logistic or Poisson regression
- Generalized linear mixed models

Not covered:
- Multivariate data (several outcomes at once)
- Censored data (survival analysis)


Recommended reading

- The lecture notes (can be downloaded from the course webpages).
- Brief notes about SAS programming (can be downloaded from the course webpages).
- B.T. West, K.B. Welch and A.T. Galecki: Linear mixed models: a practical guide using statistical software, Chapman & Hall/CRC, 2007.

We teach SAS programming.
- . . . but the book also covers SPSS, R, Stata, and HLM.


Teaching activities

Lectures:
- Mornings (9.15–12.00)
- Copies of overheads must be downloaded in advance
- Coffee break around 10–10.30

Computer labs:
- In the afternoon (13.00–15.45) following each lecture
- Coffee, tea, and cake will be served
- Exercises will be handed out
- Solutions can be downloaded after classes


Course diploma

To pass the course 80% attendance is required.
- It is your responsibility to sign the list each morning and each afternoon.
- Note: 5 × 2 = 10 lists, 80% equals 8 half days.

There is no compulsory homework . . .
- but to benefit from the course you need to work with the material at home
- We expect you to do so!


What are repeated measurements?

Repeated measurements refer to data where the same outcome has been measured in different situations (or at different spots) on the same individuals.

- Special case: longitudinal means repeatedly over time.

Repeated measurements are termed clustered data when the same outcome is measured on groups of individuals from the same families/workplaces/school classes/villages/etc.


Paired data

The simplest example of clustered or repeated measurements.
- Two replicates or two subjects per cluster

Examples of paired data:
- Same person with treatment and placebo (cross-over studies)
- Baseline-follow up studies
- Twin studies
- Comparison of two measurement methods
- Reliability of a measurement method

A quantitative outcome is analysed with the paired t-test, BUT often the test is not in focus, rather estimation/quantification.


Statistical analysis

The usual assumption is that observations are independent.

If you have clustered or repeated measurements the assumption of independence is violated.

- Your analyses must account for the repetitions/clustering.
- In this course we will teach you how to do it.

Warning: Ignoring the repetitions/clustering and doing a standard analysis most often leads to:

- P-values that are too small or too large.
- Confidence intervals that are too wide or too narrow.


Example: MF vs SV

Two measurement methods, expected to give the same result:

MF: Transmitral volumetric flow, determined by Doppler echocardiography

SV: Left ventricular stroke volume, determined by cross-sectional echocardiography

subject    MF     SV
1          47     43
2          66     70
3          68     72
4          69     81
5          70     60
...        ...    ...
18         105    98
19         112    108
20         120    131
21         132    131


Comparison of measurement methods

Usually a comparison of a new experimental method with an established method (the reference)

- How well do the two measurements agree?
- Is the new method biased compared to the reference?

The data is paired
- The subjects act as their own controls
- Hence we look at differences within subjects

Set up a statistical model to:
- Describe the typical size of the differences
- Test if the bias (i.e. the mean difference) is zero


Description of the data

Graphical description

- Scatterplot
- Sample paths
- Bland-Altman plot
- Histogram

Numerical description

Variable    Mean     Std.Dev
-----------------------------
MF          86.05    20.32
SV          85.81    21.19
DIF          0.24     6.96
AVERAGE     85.93    20.46
-----------------------------


Statistical model for paired data

$x_i$: MF-measurement for the i'th subject
$y_i$: SV-measurement for the i'th subject

Look at the differences:

$$d_i = x_i - y_i, \quad i = 1, \ldots, 21$$

The model assumes that the differences (*) are:
- independent
- normally distributed, $d_i \sim N(\delta, \sigma_d^2)$

(*) No assumptions are made about the distribution of the individual flow measurements.


The normal distribution

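[Figure: density curves of two normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, with $\mu_1 \pm \sigma_1$ and $\mu_2 \pm \sigma_2$ marked on the x-axis; y-axis: Density]

$N(\mu, \sigma^2)$

The mean is often denoted µ or α.

The standard deviation is often denoted σ or ω.

The variance is σ².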


Paired t-test in SAS

Can be performed in two different ways:

1. as a paired two-sample test

PROC TTEST;
  PAIRED mf*sv;
RUN;

The TTEST Procedure

Statistics
                    Lower CL              Upper CL    Lower CL               Upper CL
Difference    N     Mean        Mean      Mean        Std Dev     Std Dev    Std Dev
mf - sv       21    -2.932      0.2381    3.4078      5.3275      6.9635     10.056

Difference    Std Err    Minimum    Maximum
mf - sv       1.5196     -13        10

T-Tests
Difference    DF    t Value    Pr > |t|
mf - sv       20    0.16       0.8771


One-sample tests in SAS, for differences

2. as a one-sample test on the differences:

PROC UNIVARIATE NORMAL;
  VAR dif;
RUN;

The UNIVARIATE Procedure
Variable: dif

Tests for Location: Mu0=0

Test           -Statistic-       -----p Value------
Student's t    t    0.156687     Pr > |t|     0.8771
Sign           M    2.5          Pr >= |M|    0.3593
Signed Rank    S    8            Pr >= |S|    0.7603

Moments

N                21            Sum Weights         21
Mean             0.23809524    Sum Observations    5
Std Deviation    6.96351034    Variance            48.4904762
...              ...           ...                 ...


About the paired t-test

Test of the null hypothesis H0 : δ = 0 (no bias)

The t-statistic is given by:

$$t = \frac{\bar d - 0}{\mathrm{SEM}} = \frac{0.24 - 0}{6.96/\sqrt{21}} = 0.158 \sim t(20)$$

which gives P = 0.88, i.e. no significant bias.

Does this mean that the measurement methods are equally good?


Estimation of bias

The estimated mean difference is given by

$$\bar d = 0.24\ \mathrm{cm}^3$$

The estimate is our best guess, but repeating the experiment would give us a somewhat different result.

The estimate has a distribution, with an uncertainty called the standard error of the estimate.

- The standard error of the mean is given by

$$\mathrm{SEM} = \frac{s_d}{\sqrt{n}} = \frac{6.96}{\sqrt{21}} = 1.52\ \mathrm{cm}^3$$
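As a quick check of the arithmetic, the standard error and the paired t-statistic can be recomputed from the summary statistics on the slides. This is only a sketch in Python (the course itself uses SAS); the variable names are made up for the illustration, and the inputs are the rounded values 0.24 and 6.96.

```python
import math

# Summary statistics for the differences d = MF - SV (rounded values from the slides).
n = 21
mean_diff = 0.24   # cm^3
sd_diff = 6.96     # cm^3

# Standard error of the mean and the t-statistic for H0: delta = 0.
sem = sd_diff / math.sqrt(n)
t_stat = (mean_diff - 0) / sem

print(f"SEM = {sem:.2f} cm^3")
print(f"t = {t_stat:.3f} on {n - 1} degrees of freedom")
```

SAS reports t = 0.1567 because it works with the unrounded mean and standard deviation.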


General confidence intervals

Confidence intervals tell us what the parameter is likely to be
- An interval that 'catches' the true mean with 95% probability is called a 95% confidence interval
- 95% is called the coverage

The usual construction is:
- Average $\pm\ t_{97.5\%}(n - 1) \cdot \mathrm{SEM}$
- Often a good approximation, even if data are not normally distributed (due to the central limit theorem)

The t-quantile $t_{97.5\%}$ may be looked up in a table or computed by a program (e.g. R, see http://mirrors.dotsrc.org/cran/).


Confidence limits for the bias

For the differences mf-sv, we get the confidence interval:

$$\bar d \pm t_{97.5\%}(20) \cdot \mathrm{SEM} = 0.24 \pm 2.086 \cdot 6.96/\sqrt{21} = (-2.93,\ 3.41)$$

If there is a bias, it is likely (i.e. with 95% certainty) within the limits (−2.93 cm³, 3.41 cm³).

Conclusion: We cannot rule out a bias of approx. 3 cm³ in either direction.
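The confidence limits can be reproduced with a few lines of Python. This is a sketch under the assumption that the t-quantile 2.086 is taken from the slide rather than computed; the variable names are illustrative only.

```python
import math

n = 21
mean_diff = 0.24    # mean of the differences mf - sv (cm^3)
sd_diff = 6.96      # standard deviation of the differences (cm^3)
t_quantile = 2.086  # 97.5% quantile of t(20), as quoted on the slide

sem = sd_diff / math.sqrt(n)
lower = mean_diff - t_quantile * sem
upper = mean_diff + t_quantile * sem

print(f"95% CI for the bias: ({lower:.2f}, {upper:.2f}) cm^3")
```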


P-values and confidence intervals

Tests and confidence intervals are equivalent in a certain sense
- They agree on 'reasonable' values for the mean
- The confidence interval contains the values $\delta_0$ for which $H_0: \delta = \delta_0$ would be accepted

But the P-value is less informative than the confidence interval
- If the study is large a tiny bias may be significant
- If the study is small a large bias may be insignificant
- Better use the confidence interval to judge the clinical implications of the bias!


Note the difference

Standard error (of the mean), SE(M)
- tells us something about the uncertainty of the estimate of the mean
- $\mathrm{SEM} = \mathrm{SD}/\sqrt{n}$ is the standard deviation in the distribution of the estimate
- is used for comparisons, relations etc.

Standard deviation, SD
- tells us something about the variation in our sample,
- and presumably in the population
- is used when describing the data


Normal regions

The normal region is an interval containing 95% of the 'typical' observations, i.e. the midrange of the population:

2.5%-quantile to 97.5%-quantile

If the distribution is normal $N(\mu, \sigma^2)$, then
- 2.5%-quantile to 97.5%-quantile is $\mu \pm 1.96\sigma$

An estimated normal region is given by:

Average± 2× SD

But this does not account for parameter uncertainty!
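The factor 1.96 is the 97.5% quantile of the standard normal distribution, which Python's standard library can compute. The sketch below applies it to the MF summary statistics from the earlier slide purely as an illustration; treating MF as normally distributed is an assumption made here, not a claim from the slides.

```python
from statistics import NormalDist

# 97.5% quantile of the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)

# Illustration only: mean and SD of the MF measurements from the slides.
mean_mf, sd_mf = 86.05, 20.32

exact_region = (mean_mf - z * sd_mf, mean_mf + z * sd_mf)
rough_region = (mean_mf - 2 * sd_mf, mean_mf + 2 * sd_mf)

print(f"z = {z:.3f}")
print(f"mean +/- 1.96 SD: ({exact_region[0]:.1f}, {exact_region[1]:.1f})")
print(f"mean +/- 2 SD:    ({rough_region[0]:.1f}, {rough_region[1]:.1f})")
```

Both versions plug in the estimated mean and SD as if they were known, so neither accounts for parameter uncertainty.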


Prediction intervals

A prediction interval has to 'catch' future observations with high probability, say 95%.

$\bar x \pm 2s$ is a good prediction interval if the sample is large. But if the sample is small the coverage will be too low.

95% coverage is attained by the prediction interval:

$$\left(\bar x - s\sqrt{1 + 1/n}\cdot t_{97.5\%},\ \ \bar x + s\sqrt{1 + 1/n}\cdot t_{97.5\%}\right)$$

I.e. the probability that a randomly chosen subject from the population has a value in this interval is 95% if the data is normal.


Limits of agreement

Limits-of-agreement is the prediction interval for the difference between two measuring methods

- important for deciding whether or not two measurement methods may replace each other.

Limits-of-agreement for mf-sv are given by:

$$0.24 \pm 2.086 \cdot \sqrt{1 + 1/21} \cdot 6.96 = (-14.62,\ 15.10)$$

While "$\bar x \pm 2s$" is too narrow / has too low coverage:

$$\bar d \pm 2 \cdot s_d = 0.24 \pm 2 \cdot 6.96 = (-13.68,\ 14.16)$$
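The two intervals can be recomputed side by side. This is a sketch using the rounded inputs 0.24, 6.96 and 2.086 from the slides; with these inputs the formula with the square root evaluates to roughly (−14.62, 15.10). The variable names are illustrative.

```python
import math

n = 21
mean_diff = 0.24    # mean of mf - sv
sd_diff = 6.96      # SD of mf - sv
t_quantile = 2.086  # 97.5% quantile of t(20), as quoted on the slides

# Prediction interval (limits of agreement): mean +/- t * sqrt(1 + 1/n) * SD.
half_width = t_quantile * math.sqrt(1 + 1 / n) * sd_diff
limits_of_agreement = (mean_diff - half_width, mean_diff + half_width)

# The rougher "mean +/- 2 SD" interval for comparison.
rough = (mean_diff - 2 * sd_diff, mean_diff + 2 * sd_diff)

print(f"limits of agreement: ({limits_of_agreement[0]:.2f}, {limits_of_agreement[1]:.2f})")
print(f"mean +/- 2 SD:       ({rough[0]:.2f}, {rough[1]:.2f})")
```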


Derivation of the prediction interval

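Assume that $d_{new}$ is a new observation, then

$$d_{new} - \bar d \sim N\left(0,\ \sigma_d^2 \cdot \left(1 + \tfrac{1}{n}\right)\right)$$

$$\frac{d_{new} - \bar d}{s_d \cdot \sqrt{1 + 1/n}} \sim t(n - 1)$$

implying that with 95% probability:

$$t_{2.5\%} < \frac{d_{new} - \bar d}{s_d \cdot \sqrt{1 + 1/n}} < t_{97.5\%}$$

$$\bar d + s_d \sqrt{1 + 1/n} \cdot t_{2.5\%} < d_{new} < \bar d + s_d \sqrt{1 + 1/n} \cdot t_{97.5\%}$$

$$\bar d - s_d \sqrt{1 + 1/n} \cdot t_{97.5\%} < d_{new} < \bar d + s_d \sqrt{1 + 1/n} \cdot t_{97.5\%}$$

since $t_{2.5\%} = -t_{97.5\%}$ by symmetry of the t-distribution.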
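The coverage claim can be checked by simulation. The sketch below is an illustration, not part of the course material: it draws many normal samples of size n = 21 plus one new observation each, and counts how often the new observation falls inside the t-based prediction interval and inside the rougher "mean ± 2 SD" interval. The seed, the number of simulations and the standard normal population are arbitrary choices.

```python
import math
import random
import statistics

random.seed(1)

n = 21
t_quantile = 2.086   # 97.5% quantile of t(20), as quoted on the slides
n_sim = 20000

hits_t = 0
hits_2s = 0
for _ in range(n_sim):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    new_obs = random.gauss(0.0, 1.0)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    distance = abs(new_obs - mean)
    # t-based prediction interval: mean +/- t * sd * sqrt(1 + 1/n)
    if distance < t_quantile * sd * math.sqrt(1 + 1 / n):
        hits_t += 1
    # rough interval: mean +/- 2 * sd
    if distance < 2 * sd:
        hits_2s += 1

coverage_t = hits_t / n_sim
coverage_2s = hits_2s / n_sim
print(f"coverage of t-based prediction interval: {coverage_t:.3f}")
print(f"coverage of mean +/- 2 SD:               {coverage_2s:.3f}")
```

The first coverage should land close to 95%, the second somewhat below it.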


Assumptions for the paired comparison

The differences:
- are independent, i.e. the subjects are unrelated
- are normally distributed: judged graphically or numerically
  - by inspection of histograms or QQ-plots
  - by formal tests (e.g. PROC UNIVARIATE NORMAL in SAS)
- have identical variances: judged using the 'Bland-Altman plot' of differences vs. averages

Sometimes it is necessary to transform the data in order to fulfill the assumptions.


Checking normality: the QQ-plot

Observed quantiles against theoretical normal quantiles

If the data is normal, the points will be close to the line


Model assumption: Normality?

Assumption: the differences follow a normal distribution.

We can check the assumption by e.g. looking at the histogram or the QQ-plot.

But with large samples the assumption is not always necessary:
- The validity of the t-test and the confidence intervals only relies on the distribution of the average $\bar d$ . . .
- and averages tend to be normal due to the CLT.

However: Normal regions (e.g. limits of agreement) require a normal distribution.


The central limit theorem (CLT)

Averages of rolls of dice are more normal than a single roll

[Figure: four histograms showing the distribution of the average of one dice roll, 2 dice rolls, 10 dice rolls and 50 dice rolls; x-axis: Average]
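The dice illustration is easy to reproduce by simulation. The following Python sketch (seed and number of repetitions chosen arbitrarily) simulates averages of 1, 2, 10 and 50 rolls and reports their mean and standard deviation; the helper name `average_of_rolls` is made up for the example. The spread should shrink roughly like $1/\sqrt{k}$ as the number of rolls k grows, while the mean stays near 3.5.

```python
import random
import statistics

random.seed(2013)

def average_of_rolls(k):
    """Average of k rolls of a fair six-sided die."""
    return statistics.fmean(random.randint(1, 6) for _ in range(k))

n_rep = 5000
results = {}
for k in (1, 2, 10, 50):
    averages = [average_of_rolls(k) for _ in range(n_rep)]
    results[k] = (statistics.fmean(averages), statistics.stdev(averages))
    print(f"{k:2d} rolls: mean = {results[k][0]:.2f}, SD = {results[k][1]:.2f}")
```

Plotting a histogram of `averages` for each k would give pictures like those on the slide.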


Classical two-sample (unpaired) comparison

If the two treatments were applied to separate groups of subjects, we have independent samples.

Traditional model assumptions:

$$x_{11}, \ldots, x_{1n_1} \sim N(\mu_1, \sigma^2)$$

$$x_{21}, \ldots, x_{2n_2} \sim N(\mu_2, \sigma^2)$$

- All observations are independent
- Observations follow a normal distribution within each group
- Both groups have the same variance, $\sigma^2$
- The mean values, $\mu_1$ and $\mu_2$, may differ


Paired or unpaired comparison?

Note the consequences for the difference between MF and SV:

Estimated mean difference
- 0.24, CI: (-2.93, 3.41) according to the paired t-test
- 0.24, CI: (-12.71, 13.19) according to the unpaired t-test

i.e. same estimate but a much wider confidence interval
- The latter is wrong!

You have to respect your design.
- Do not forget to take advantage of a subject serving as its own control (higher power with fewer individuals)
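Both intervals can be recomputed from the summary statistics given earlier. This is a sketch: the quantile 2.021 for t(40) is a standard table value that does not appear on the slides and is therefore an assumption here, and the unpaired interval uses the usual pooled-variance formula with equal group sizes.

```python
import math

n = 21
mean_mf, sd_mf = 86.05, 20.32
mean_sv, sd_sv = 85.81, 21.19
sd_diff = 6.96   # SD of the within-subject differences

t20 = 2.086      # 97.5% quantile of t(20), from the slides
t40 = 2.021      # 97.5% quantile of t(40), table value (assumption, not on the slides)

diff = mean_mf - mean_sv

# Paired analysis: uses the SD of the differences.
se_paired = sd_diff / math.sqrt(n)
ci_paired = (diff - t20 * se_paired, diff + t20 * se_paired)

# Unpaired analysis: pooled SD of the two groups, wrongly ignoring the pairing.
pooled_sd = math.sqrt((sd_mf**2 + sd_sv**2) / 2)
se_unpaired = pooled_sd * math.sqrt(2 / n)
ci_unpaired = (diff - t40 * se_unpaired, diff + t40 * se_unpaired)

print(f"paired:   ({ci_paired[0]:.2f}, {ci_paired[1]:.2f})")
print(f"unpaired: ({ci_unpaired[0]:.2f}, {ci_unpaired[1]:.2f})")
```

The unpaired standard error is about four times larger because it includes the between-subject variation that the paired analysis removes.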


Comparing measurement methods

When comparing two measurement methods:
- We have to determine the proper scale before carrying out the statistical analysis

Is the precision of the measurements approximately the same over the entire range?

- In that case look at differences on an absolute scale
- Use the differences between the raw measurements

Or does the measurement error increase with the size of the quantity being measured?

- In that case look at differences on a relative scale
- Make a logarithmic transformation


Another comparison: REFE vs TEST

Two methods for determining concentration of glucose:

- REFE: Colour test, may be 'polluted' by uric acid
- TEST: Enzymatic test, more specific for glucose

Ref: R.G. Miller et al. (eds): Biostatistics Casebook. John Wiley & Sons, 1980.

nr.        REFE     TEST
1          155      150
2          160      155
3          180      169
...        ...      ...
44         94       88
45         111      102
46         210      188

average    144.1    134.2
SD         91.0     83.2


The usual analysis - the naive approach

Do we see a systematic difference?
Test 'δ = 0' assuming $d_i = \mathrm{REFE}_i - \mathrm{TEST}_i \sim N(\delta, \sigma_d^2)$

$$\bar d = 9.89,\ s_d = 9.70 \ \Rightarrow\ t = \frac{\bar d}{\mathrm{SEM}} = \frac{\bar d}{s_d/\sqrt{n}} = 6.92 \sim t(45)$$

hence P < 0.0001, i.e. strong indication of bias.

Limits of agreement tell us that the typical differences are

$$9.89 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 9.70 = (-9.85,\ 29.64)$$

Is this a valid analysis?!?


Plots of the raw data

Scatter plot and Bland Altman plot:

The variance of the differences increases with the level; so the model assumptions of the usual analysis are violated!


Plots of the log-transformed data

Precision seems to be relative, hence we do a log-transformation

- The plots look better except for an outlier


Close up

Following a logarithmic transformation (and omission of the outlier) the Bland Altman plot looks OK


Notes on the log-transformation

- It is the original measurements that have to be transformed with the logarithm, not the differences!
- Never make a logarithmic transformation on data that might be negative!
- It does not matter which logarithm you choose (i.e. which base of the logarithm) since they are all proportional
- The procedure with construction of limits of agreement is now repeated for the transformed observations
- The result can be transformed back to the original scale with the anti-logarithm (exp for the natural logarithm)
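That the base of the logarithm does not matter can be seen on the six REFE/TEST pairs listed in the data excerpt. The sketch below (Python, illustrative variable names, only the six listed pairs rather than all 46) computes the mean log-difference with the natural and the base-10 logarithm and back-transforms each with its own anti-logarithm; both give the same ratio.

```python
import math

# The six (REFE, TEST) pairs shown in the data excerpt on the slides.
pairs = [(155, 150), (160, 155), (180, 169), (94, 88), (111, 102), (210, 188)]

# Transform the original measurements first, then take differences.
d_ln = [math.log(refe) - math.log(test) for refe, test in pairs]
d_log10 = [math.log10(refe) - math.log10(test) for refe, test in pairs]

mean_ln = sum(d_ln) / len(d_ln)
mean_log10 = sum(d_log10) / len(d_log10)

# Back-transform with the matching anti-logarithm.
ratio_from_ln = math.exp(mean_ln)
ratio_from_log10 = 10 ** mean_log10

print(f"typical REFE/TEST ratio via ln:    {ratio_from_ln:.4f}")
print(f"typical REFE/TEST ratio via log10: {ratio_from_log10:.4f}")
```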


The correct analysis

Do we see a systematic difference?
Test 'δ = 0' assuming $d_i = \log(\mathrm{REFE}_i) - \log(\mathrm{TEST}_i) \sim N(\delta, \sigma_d^2)$

$$\bar d = 0.066,\ s_d = 0.042 \ \Rightarrow\ t = \frac{\bar d}{\mathrm{SEM}} = \frac{\bar d}{s_d/\sqrt{n}} = 10.66 \sim t(45)$$

P < 0.0001, i.e. strong indication of bias.

Limits of agreement tell us that the typical differences are

$$0.066 \pm t_{97.5\%}(45) \cdot \sqrt{1 + 1/46} \cdot 0.042 = (-0.020,\ 0.152)$$

. . . on log-scale!


Back transformation

Limits of agreement on log-scale are (−0.020, 0.152), meaning that for 95% of the subjects we will have:

$$-0.020 < \log(\mathrm{REFE}) - \log(\mathrm{TEST}) < 0.152$$

i.e.

$$-0.020 < \log\left(\frac{\mathrm{REFE}}{\mathrm{TEST}}\right) < 0.152$$

Back transforming (using the exponential function):

$$0.980 = \exp(-0.020) < \frac{\mathrm{REFE}}{\mathrm{TEST}} < \exp(0.152) = 1.164$$

or reversed:

$$0.859 = \frac{1}{1.164} < \frac{\mathrm{TEST}}{\mathrm{REFE}} < \frac{1}{0.980} = 1.02$$

So TEST will typically lie 14% below to 2% above REFE.
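The back transformation can be retraced numerically. This is a sketch from the rounded summary values 0.066 and 0.042; the quantile 2.014 for t(45) is a table value assumed here, since the slides only write $t_{97.5\%}(45)$. Small differences in the last digit compared with the slide come from rounding.

```python
import math

n = 46
mean_log_diff = 0.066   # mean of log(REFE) - log(TEST)
sd_log_diff = 0.042     # SD of log(REFE) - log(TEST)
t_quantile = 2.014      # 97.5% quantile of t(45), table value (assumption)

# Limits of agreement on the log scale.
half_width = t_quantile * math.sqrt(1 + 1 / n) * sd_log_diff
lower_log = mean_log_diff - half_width
upper_log = mean_log_diff + half_width

# Back-transform to limits for the ratio REFE/TEST ...
refe_over_test = (math.exp(lower_log), math.exp(upper_log))
# ... and invert (which swaps the endpoints) to get TEST/REFE.
test_over_refe = (1 / refe_over_test[1], 1 / refe_over_test[0])

print(f"log scale:  ({lower_log:.3f}, {upper_log:.3f})")
print(f"REFE/TEST:  ({refe_over_test[0]:.3f}, {refe_over_test[1]:.3f})")
print(f"TEST/REFE:  ({test_over_refe[0]:.3f}, {test_over_refe[1]:.3f})")
```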


Limits of agreement on the original scale


Non-normal data

If the normal distribution is not a good description:
- Tests and confidence intervals are valid if the sample is sufficiently large (due to the central limit theorem).
- To judge the reliability for a given sample:
  - Use resampling techniques
  - Or check with a statistician
- Normal regions and limits of agreement become untrustworthy!


Example: Fertility and aging

Cross-sectional study: 527 women aged 22–42.

Objective: How does fertility decline with age?

Outcomes: Physiological markers of fertility
- Menstrual cycle length
- Reproductive hormones (FSH, AMH, . . . )
- Ovarian volume
- Antral follicle count (AFC)


Simple linear regression for AFC

$\mathrm{AFC} = \alpha + \beta \cdot \mathrm{age} + \varepsilon$: is this a good model?


Log-linear regression

A more plausible model is exponential decay, implying a linear model on logarithmic scale: $\log(\mathrm{AFC}) = \alpha + \beta \cdot \mathrm{age} + \varepsilon$


Regression with SAS

PROC GLM DATA=menopause;
  MODEL logafc = age / SOLUTION CLPARM;
RUN;

The GLM Procedure

R-Square    Coeff Var    Root MSE    logafc Mean
0.053070    21.53554     0.622772    2.891832

Source    DF    Type III SS    Mean Square    F Value    Pr > F
AGE       1     11.41154527    11.41154527    29.42      <.0001

Parameter    Estimate        Std.Error     t Value    Pr > |t|    95% Confidence Limits
Intercept    4.066684811     0.21828311    18.63      <.0001      3.637869196     4.495500427
AGE          -0.035958049    0.00662907    -5.42      <.0001      -0.048980815    -0.022935284

Note: We could have used PROC REG instead.


Regression equation and estimates

The estimates for the linear regression on logarithmic scale are:

Intercept α = 4.07 (95% CI 3.64 to 4.50)
- The "expected value for age = 0"!

Regression coefficient β = −0.036 (95% CI −0.049 to −0.023)
- The expected change in log(AFC) with one year of aging (a decrease, since β < 0).


Rate of decline

We see exponential decay on the natural scale.

The expected AFC for age x (median or geometric mean) is

AFC(x) = exp(α+ βx)

- With one year of aging, x → x + 1
- AFC(x + 1) = exp(α + β(x + 1)) = exp(β) · AFC(x)
- The annual rate of change is the factor exp(β), corresponding to a decline of {1 − exp(β)} · 100%.
- Estimated by exp(β) = 0.9646, i.e. a decline of 3.5%.
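
The back-transformation can be checked by hand. The following is a minimal sketch in Python (not part of the course's SAS material), using the unrounded slope and confidence limits from the PROC GLM output for the simple regression. The function name percent_decline is an illustrative choice.

```python
import math

# Estimate and 95% confidence limits for the age slope, copied from the
# PROC GLM output for the simple regression of log(AFC) on age.
beta, lower, upper = -0.035958049, -0.048980815, -0.022935284


def percent_decline(b):
    """Annual decline in percent implied by a slope b on the log scale."""
    return (1.0 - math.exp(b)) * 100.0


decline = percent_decline(beta)
# A more negative slope means a larger decline, so the lower confidence
# limit of the slope gives the upper limit of the decline (and vice versa).
decline_ci = (percent_decline(upper), percent_decline(lower))

print(f"annual factor exp(beta) = {math.exp(beta):.4f}")
print(f"decline = {decline:.1f}% "
      f"(95% CI {decline_ci[0]:.1f}% to {decline_ci[1]:.1f}%)")
```

Note that the confidence limits swap places when a negative slope is turned into a positive decline.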


Multiple regression

The regression could be biased by possible confounders:
- Use of oral contraceptives (yes, no)
- Smoking (current, former, or never)
- Prenatal smoking exposure (yes, no)
- BMI (underweight, normal weight, overweight, obese)

Adjust for these in a multiple regression (general linear model):

Y_i = α + β X_i + β_1 X_{i,1} + ... + β_k X_{i,k} + ε_i

with k additional covariates.
- Some of these are dummy variables coding for relevant groups
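
Dummy coding can be illustrated with a small sketch in Python (not part of the course's SAS material). The five smoking values are hypothetical and only serve as illustration. "smoker" is taken as the reference level because the SAS output in these slides sets the estimate of the last level in sorted order to zero.

```python
# Hypothetical smoking categories for five women (illustrative values only).
smoking = ["never", "previous", "smoker", "never", "smoker"]

# The reference level gets no column of its own. "smoker" is chosen here to
# mirror the SAS output, where the last level in sorted order has estimate 0.
reference = "smoker"
levels = [lvl for lvl in sorted(set(smoking)) if lvl != reference]


def dummy_code(values, levels):
    """Return one 0/1 indicator column per non-reference level."""
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}


dummies = dummy_code(smoking, levels)
for lvl, col in dummies.items():
    print(f"I(smoking = {lvl}): {col}")
```

A covariate with three levels thus enters the model as two indicator columns. The coefficient of each column is the difference from the reference level.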


SAS-program

PROC GLM DATA=menopause;
  CLASS oc smoking prenatsmoke bmigrp;
  MODEL logafc = oc smoking prenatsmoke bmigrp age
        / SOLUTION CLPARM;
  OUTPUT OUT=diagnostics p=fitted r=residual student=stres;
RUN;

The GLM Procedure

                         Sum of
Source             DF    Squares        Mean Square    F Value    Pr > F
Model               8     22.7349809    2.8418726         7.58    <.0001
Error             497    186.2490086    0.3747465
Corrected Total   505    208.9839894

R-Square    Coeff Var    Root MSE    logafc Mean
0.108788    21.15842     0.612165    2.893247


SAS-output

Source         DF    Type III SS    Mean Square    F Value    Pr > F
OC              1     8.38447592     8.38447592      22.37    <.0001
SMOKING         2     0.04472481     0.02236240       0.06    0.9421
PRENATSMOKE     1     1.74079772     1.74079772       4.65    0.0316
BMIGRP          3     0.68550681     0.22850227       0.61    0.6089
AGE             1    15.39698818    15.39698818      41.09    <.0001

                                           Standard
Parameter                Estimate          Error         t Value    Pr > |t|
Intercept                 4.007665017 B    0.29614093      13.53    <.0001
OC no                     0.313390980 B    0.06625480       4.73    <.0001
OC yes                    0.000000000 B     .                .       .
SMOKING never            -0.023610470 B    0.07174410      -0.33    0.7422
SMOKING previous         -0.023529734 B    0.08113255      -0.29    0.7719
SMOKING smoker            0.000000000 B     .                .       .
PRENATSMOKE no-smoke      0.130881971 B    0.06072597       2.16    0.0316
PRENATSMOKE smoke         0.000000000 B     .                .       .
BMIGRP normal             0.153602199 B    0.18013313       0.85    0.3942
BMIGRP over25             0.084779529 B    0.19228883       0.44    0.6595
BMIGRP over30             0.050248838 B    0.21702144       0.23    0.8170
BMIGRP under18.5          0.000000000 B     .                .       .
AGE                      -0.047386837      0.00739279      -6.41    <.0001

Adjusted β = −0.047, i.e. a rate of decline of 4.6% per year.


SAS-output

Parameter                95% Confidence Limits
Intercept                 3.425822534    4.589507500
OC no                     0.183216961    0.443565000
OC yes                     .              .
SMOKING never            -0.164569584    0.117348643
SMOKING previous         -0.182934802    0.135875335
SMOKING smoker             .              .
PRENATSMOKE no-smoke      0.011570705    0.250193237
PRENATSMOKE smoke          .              .
BMIGRP normal            -0.200314124    0.507518523
BMIGRP over25            -0.293019675    0.462578732
BMIGRP over30            -0.376143740    0.476641415
BMIGRP under18.5           .              .
AGE                      -0.061911820   -0.032861855

The adjusted β has 95% confidence interval (−0.062, −0.033), corresponding to an annual decline between 3.2% and 6.0%.


Interpretation of regression coefficients

Simple regression: Y = α + β · age + ε

- β is the expected change in log(AFC) when age increases by one year.

Multiple regression: Y = α + β · age + β_1 · X_1 + ... + β_k · X_k + ε

- β is the expected change in log(AFC) when age increases by one year and all other covariates are held fixed.

Similarly for the other covariates:
- e.g. exp(0.154) ≈ 1.166, or 16.6% higher AFC for normal BMI compared to BMI < 18.5, with all other covariates held fixed.


Hypothesis tests

Does AFC decline with age?

T-test for H0: β = 0
- β = −0.0474, s.e.(β) = 0.0074, t = β/s.e.(β) = −6.41.
- P < 0.0001 in the t-distribution with 497 degrees of freedom.

Equivalent to the F-test:
- Mean Square(Age)/Mean Square(Error) = 41.09
- P < 0.0001 in the F-distribution with (1, 497) degrees of freedom

Note: For a categorical covariate with more than two levels, only the F-test is generally applicable.
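
For a covariate with one degree of freedom, the F statistic is the square of the t statistic, which explains the equivalence. A minimal sketch in Python (not part of the course's SAS material) with the numbers copied from the multiple regression output:

```python
# Values copied from the PROC GLM output of the multiple regression.
beta_age = -0.047386837   # estimate for AGE
se_age = 0.00739279       # standard error for AGE
ms_age = 15.39698818      # Type III mean square for AGE (1 df)
ms_error = 0.3747465      # mean square error (497 df)

t_value = beta_age / se_age
f_value = ms_age / ms_error

# With 1 numerator degree of freedom the two test statistics agree: t^2 = F.
print(f"t = {t_value:.2f}, t^2 = {t_value ** 2:.2f}, F = {f_value:.2f}")
```

The small remaining difference between t² and F comes from the rounding of the printed SAS values.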


Tests of type I and type III

Mind the difference!

Type I: Test the effect of each covariate after adjustment for all other covariates above it on the list.

- Sequential tests, to be read bottom-up.

Type III: Test the effect of each covariate after adjustment for all other covariates on the list.

- Non-sequential tests: read whichever one you are interested in.


Predictions (fitted values)

log(AFC) = α + β · age + β_1 · I(no prenatal smoking)
           + β_2 · I(never smoker) + β_3 · I(previous smoker)
           + β_4 · I(normal BMI) + ... + β_6 · I(BMI > 30)
           + β_7 · I(no use of oral contraceptives)

Expected log(AFC) of a 30-year-old woman who never smoked, had no prenatal smoking exposure, is of normal weight, and does not use oral contraceptives:

log(AFC) = 4.008 − 0.047 · 30 − 0.024 + 0.131 + 0.154 + 0.313 = 3.172

I.e. we expect an AFC of exp(3.172) ≈ 24.
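
The prediction above can be reproduced as follows. This is a minimal sketch in Python (not part of the course's SAS material), using the rounded estimates from the slide. The dictionary keys are descriptive labels chosen for illustration, not SAS variable names.

```python
import math

# Rounded estimates as used on the slide (multiple regression output).
intercept = 4.008
beta_age = -0.047
effects = {
    "never smoker": -0.024,
    "no prenatal smoking exposure": 0.131,
    "normal BMI": 0.154,
    "no oral contraceptives": 0.313,
}

age = 30
# Linear predictor on the log scale: intercept + slope * age + group effects.
log_afc = intercept + beta_age * age + sum(effects.values())
# Back-transform to the AFC scale (median or geometric mean).
afc = math.exp(log_afc)

print(f"predicted log(AFC) = {log_afc:.3f}, predicted AFC = {afc:.1f}")
```

Reference levels (current smoker, prenatal smoking exposure, BMI under 18.5, oral contraceptive use) contribute 0, so only the non-reference effects appear in the sum.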


Model assumptions

The general linear model assumes that:

1. The observations are independent
2. The linear model for the mean is correct
3. Error terms (ε_i's) are normally distributed with zero mean and equal variances

Use the residuals for model diagnostics:

R_i = Y_i − Ŷ_i

- "Observed value − predicted value"
- Standardized values are preferred for diagnostics (because of varying estimation uncertainty in the predicted values)


Residual plot

The residuals should be fairly symmetric around zero, with no systematic patterns.


Residuals against covariates

Similar plot, looking for a non-linear relation with a covariate.


Checking normality: the QQ-plot




Example: Maternal age at menopause

Does the decline in fertility depend on hereditary factors?

Three groups according to maternal age at menopause:
- Early, ≤ 45 years of age
- Normal, 46 to 54 years of age
- Late, > 55 years of age

We have a log-linear model for each group.
- Is the rate of decline the same in all three groups?


Analysis of covariance

Another name for a general linear model with one quantitative covariate and one categorical covariate

- We have one regression line for each group

Are the lines parallel?
- If not, we have an interaction between the two covariates

Are the lines identical?
- If not, we have differences among the groups


Example: Maternal age at menopause

In the late group, fertility seems to increase with age???


Estimating regression lines

Model: log(AFC)_ij = α_j + β_j · age_ij, j = 1, 2, 3
- One set of regression parameters per group
- Re-set the intercept at age = 22 for interpretability

data menopause;
  set menopause;
  age22 = age - 22;
run;

proc glm data=menopause;
  class menogrp;
  model logafc = menogrp age22*menogrp
        / noint solution clparm;
run;


ANCOVA-output

The GLM Procedure

Dependent Variable: logafc

R-Square    Coeff Var    Root MSE    logafc Mean
0.082607    21.27821     0.615330    2.891832

                                        Standard
Parameter               Estimate        Error         t Value    Pr > |t|    95% Confidence Limits
MENOGRP early            3.328468294    0.20237671      16.45    <.0001       2.930893639    3.726042949
MENOGRP late             2.744304704    0.24071674      11.40    <.0001       2.271409991    3.217199417
MENOGRP normal           3.334785604    0.08572241      38.90    <.0001       3.166381562    3.503189646
AGE22*MENOGRP early     -0.052377545    0.01703328      -3.08    0.0022      -0.085839902   -0.018915188
AGE22*MENOGRP late       0.022007035    0.01998117       1.10    0.2712      -0.017246526    0.061260596
AGE22*MENOGRP normal    -0.042078492    0.00764074      -5.51    <.0001      -0.057088940   -0.027068044

The increasing rate in the late maternal menopause group is not statistically significant (P = 0.27).


Rates of decline

When the slopes are back-transformed, they become estimated rates of decline, with 95% confidence intervals:

Maternal menopause      Rate of change in AFC per year (95% CI)
Early (≤ 45 years)      −5.1% (−8.2% to −1.9%)
Normal (46-54 years)    −4.1% (−5.5% to −2.7%)
Late (> 55 years)       +2.2% (−1.7% to +6.3%)

Increasing rate in the late-group might as well be a chance finding.
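
The table entries follow from the ANCOVA slopes by the transformation {exp(β) − 1} · 100%. A minimal sketch in Python (not part of the course's SAS material), with slopes and confidence limits copied from the ANCOVA output. The function name percent_change is an illustrative choice.

```python
import math

# Slope estimates and 95% confidence limits on the log scale, copied from
# the ANCOVA output: (estimate, lower limit, upper limit) per group.
slopes = {
    "early":  (-0.052377545, -0.085839902, -0.018915188),
    "normal": (-0.042078492, -0.057088940, -0.027068044),
    "late":   (0.022007035, -0.017246526, 0.061260596),
}


def percent_change(b):
    """Yearly change in percent implied by a slope b on the log scale."""
    return (math.exp(b) - 1.0) * 100.0


# exp() is increasing, so the limits keep their order after transformation.
rates = {grp: tuple(round(percent_change(b), 1) for b in vals)
         for grp, vals in slopes.items()}

for grp, (est, low, high) in rates.items():
    print(f"{grp:>6}: {est:+.1f}% ({low:+.1f}% to {high:+.1f}%)")
```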


Re-parametrisation

Same model, other parameters:

log(AFC)_i = α + β · age + δ_1 · I(group=1) + δ_2 · I(group=2)
             + γ_1 · I(group=1) · age + γ_2 · I(group=2) · age

- Group 3 is the reference, with regression parameters α and β.
- The δ's and γ's are differences in regression parameters relative to the reference group.
- Allows for testing differences among the groups.

title1 'ANCOVA';
proc glm data=menopause;
  class menogrp;
  model logafc = menogrp age22 age22*menogrp / solution;
run;


ANCOVA-output

The GLM Procedure

R-Square    Coeff Var    Root MSE    logafc Mean
0.082607    21.27821     0.615330    2.891832

Source           DF    Type III SS    Mean Square    F Value    Pr > F
MENOGRP           2    2.05075422     1.02537711        2.71    0.0676
AGE22             1    2.65777726     2.65777726        7.02    0.0083
AGE22*MENOGRP     2    3.77690717     1.88845358        4.99    0.0072

                                          Standard
Parameter               Estimate          Error         t Value    Pr > |t|
Intercept                3.334785604 B    0.08572241      38.90    <.0001
MENOGRP early           -0.006317310 B    0.21978322      -0.03    0.9771
MENOGRP late            -0.590480900 B    0.25552472      -2.31    0.0212
MENOGRP normal           0.000000000 B     .                .       .
AGE22                   -0.042078492 B    0.00764074      -5.51    <.0001
AGE22*MENOGRP early     -0.010299053 B    0.01866852      -0.55    0.5814
AGE22*MENOGRP late       0.064085527 B    0.02139224       3.00    0.0029
AGE22*MENOGRP normal     0.000000000 B     .                .       .

Regression coefficients (slopes) differ significantly between groups (P = 0.007); intercepts do not (P = 0.07).
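
The two parametrisations describe the same three lines. A minimal sketch in Python (not part of the course's SAS material) checks that the differences reported here equal the group-specific estimates from the earlier ANCOVA output minus those of the reference group (normal). The variable names are illustrative.

```python
# (intercept at age 22, slope) per group, from the separate-lines output.
per_group = {
    "early":  (3.328468294, -0.052377545),
    "late":   (2.744304704, 0.022007035),
    "normal": (3.334785604, -0.042078492),
}
# (delta, gamma) relative to the reference group, from the
# re-parametrised output.
reported_diff = {
    "early": (-0.006317310, -0.010299053),
    "late":  (-0.590480900, 0.064085527),
}

ref_intercept, ref_slope = per_group["normal"]

# Largest absolute discrepancy between computed and reported differences.
max_error = 0.0
for grp, (delta, gamma) in reported_diff.items():
    intercept, slope = per_group[grp]
    max_error = max(max_error,
                    abs((intercept - ref_intercept) - delta),
                    abs((slope - ref_slope) - gamma))

print(f"largest discrepancy: {max_error:.1e}")
```

The benefit of the re-parametrised form is that each difference comes with its own standard error and test.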


Missing data problem?

We have missing data ...
- among younger women whose mothers aren't yet menopausal
- i.e. missing not at random
- data from some of the potentially most fertile tend to be missing

This may cause bias
- Particularly in the late group.


Assuming identical intercepts

Leave out the main effect of menogrp.

title1 'ANCOVA with same intercept at age 22';
proc glm data=menopause;
  class menogrp;
  model logafc = age22 age22*menogrp / solution clparm;
run;

Output:

Source           DF    Type I SS      Mean Square    F Value    Pr > F
AGE22             1    11.41154527    11.41154527      29.94    <.0001
AGE22*MENOGRP     2     4.30076782     2.15038391       5.64    0.0038

Rates of decline still differ significantly between groups (P = 0.004).


A prettier picture


Estimated rates of decline

... when assuming identical intercepts (at age 22).

Estimated rates of decline with 95% confidence intervals:

Maternal menopause      Rate of decline in AFC per year (95% CI)
Early (≤ 45 years)      4.7% (3.1% to 6.3%)
Normal (46-54 years)    3.7% (2.3% to 4.9%)
Late (> 55 years)       2.0% (0.4% to 3.6%)


Summary statistics

Numerical description of quantitative variables:

Location, center
- average (mean value), x̄ = (x_1 + ... + x_n)/n
- median (middle observation, 50% above and 50% below)

Variation
- variance, s² = Σ(x_i − x̄)²/(n − 1) (quadratic units)
- standard deviation, s = √variance (units as outcome)
- quantiles, e.g. interquartile range (25% to 75% quantile)
- standard error, SE = s/√n (uncertainty of mean estimate)
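
The formulas above can be tried out on a small example. A minimal sketch in Python (not part of the course's SAS material), using the standard library. The eight data values are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical measurements, chosen only to illustrate the formulas.
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)

mean = statistics.mean(x)
median = statistics.median(x)
variance = statistics.variance(x)   # sample variance, divides by n - 1
sd = math.sqrt(variance)            # same units as the outcome
se = sd / math.sqrt(n)              # uncertainty of the mean estimate
# Quartiles: the 25%, 50% and 75% quantiles. Several quantile definitions
# exist, so other software may give slightly different values.
q1, q2, q3 = statistics.quantiles(x, n=4)

print(f"mean = {mean}, median = {median}")
print(f"variance = {variance:.3f}, sd = {sd:.3f}, se = {se:.3f}")
print(f"interquartile range: {q1} to {q3}")
```

The standard deviation describes the spread of the observations, while the standard error describes the uncertainty of the mean and shrinks as n grows.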


The summary statistics for 'MF vs SV' are made using the code:

Note: the data is read in from the file 'mf_sv.txt' (text file with two columns and 21 observations)

DATA mydata;
  INFILE 'mf_sv.txt' FIRSTOBS=2;   /* start reading at line 2 */
  INPUT mf sv;
  dif = mf - sv;                   /* difference between the two methods */
  average = (mf + sv)/2;           /* average of the two methods */
RUN;

PROC MEANS DATA=mydata MEAN STD;
RUN;


The pictures for 'MF vs SV' are made using the code:

proc gplot;
  plot mf*sv / haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=none c=BLACK l=1 w=2;
run;

proc gplot;
  plot flow*method=subject / nolegend haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2);
  axis2 value=(H=2) minor=NONE label=(A=90 R=0 H=2);
  symbol1 v=circle i=join l=1 w=2 r=21;
run;


proc gplot;
  plot dif*average / vref=0 lv=1 vref=0.24 15.5 -15.0 lv=2
       haxis=axis1 vaxis=axis2 frame;
  axis1 value=(H=2) minor=NONE label=(H=2 'average');
  axis2 order=(-16 to 16 by 4) value=(H=2) minor=NONE
        label=(A=90 R=0 H=2 'difference MF-SV');
  symbol1 v=circle i=none l=1 w=2;
  title h=3 'Bland Altman plot';
run;

title;
proc gchart;
  vbar dif;
run;
