After Work Statistics...Practical Hints 1. before doing elaborates statistical analyses: use...

UNIVERSITÄTSMEDIZIN BERLIN

After Work Statistics

Ulrike Grittner, Annette Aigner

Institute of Biometry and Clinical Epidemiology

[email protected]



Institute of Biometry and Clinical Epidemiology

We are…

• … open and helpful!

• … active in methodological statistical research and in medical research

• … active in teaching in many ways

Our Service Unit Biometry

• Free biometric consulting for all medical research projects (online registration)

• “Statistik-Ambulanz” (walk-in service): consultation without prior registration every Tuesday from 9am to 12pm

• Training in biometric topics and statistical software

• Responsibility for project biometry within collaborations

For further information visit us online:

https://biometrie.charite.de/

Contact: Univ.-Prof. Dr. Geraldine Rauch (Head of Institute),

Institut für Biometrie und Klinische Epidemiologie (iBikE)

Site Mitte (Charité Campus Mitte)

Reinhardstraße 58, 10117 Berlin

Site Mitte (Charité Campus Klinik)

Rahel-Hirsch-Weg 5, 10117 Berlin


Slot Topic

1 So many tests! The agony of choice.

2 So many questions! Multiple testing.

3 So many patients? Sample size calculation.

4 What is this odds ratio? Logistic regression.

5 Missing information? Dealing with missing data.

6 The right time? Survival analysis.

7 The variety of influences - Mixed models.

8 Who fits together? Patient matching.



The Diversity of Influences: Mixed Models


Overview

• What do you need mixed models for?

• Basic idea of mixed models

• Intraclass correlation coefficient (ICC)

• ICC, Random Intercept & Random Slope Model


Mixed models

= multilevel models

= hierarchical models

= random effects models

= nested models


Problem

Most statistical models assume independent data.

[Figure, two panels. Independent data: association of body height and body weight, cross-sectional (Bundesgesundheitssurvey 1998, RKI, n=7124). Dependent data: diastolic blood pressure before and after therapy, repeated measures (Holzgreve et al., British Medical Journal 299, 881-886, 1989; 1 study arm: n=169).]


Problem

Often we have a mixture of independent and dependent data (clusters).

Examples

- Individual measures in different clusters (grouped data)

- Repeated measures in individuals


Grouped Data

Example: Mathematics achievement of pupils in different schools

Assumptions:

• The math achievement of pupils from the same school is more similar than the math achievement of pupils from different schools

• Dependent data within schools, independent data across schools

[Figure: two-level structure with schools 1, 2, …, N at the upper level and pupils p. 1, p. 2, …, p. n_j nested within each school j]


Repeated measures

Example: longitudinal study, repeated measures within individuals

Assumptions:

• Measures of the same individual are more similar than measures of different individuals

• Dependent data within individuals, independent data across individuals

[Figure: two-level structure with individuals 1, 2, …, N at the upper level and time points t 1, t 2, …, t n_j nested within each individual j]


Example

• Study of the math performance of 9th graders in 160 schools (7185 pupils)

Study “High School and Beyond”, 1982 (Raudenbush & Bryk 2002)

data(MathAchieve), package nlme in R

• Research question: Is the socio-economic status (SES) of the pupils associated with their math performance? (SES: score based on the education and income of the parents)

ID   School  Minority  Sex     SES     MathAch
478  1499    No        Female  -0.678   5.608
479  1499    No        Female  -0.158  18.352
480  1499    No        Female  -0.468   5.949
481  1499    Yes       Female  -0.148  -1.462
482  1499    Yes       Female  -0.928   4.087
483  1499    No        Female  -0.218  10.258
484  1499    No        Male     0.662  21.791
485  1499    No        Male    -0.228   1.365
486  1499    No        Female  -0.368   1.730
487  1499    Yes       Female   0.342   5.093
…
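In the spirit of "know your data", a minimal sketch of loading and describing these data in R. It assumes the `MathAchieve` data frame from the `nlme` package (a recommended package shipped with R); the object names `dat`, `school_sizes` and `school_means` are illustrative.

```r
# Load the High School and Beyond data (assumed source: nlme::MathAchieve)
library(nlme)
dat <- as.data.frame(MathAchieve)

# Size of the data set: pupils (rows) and schools (clusters)
n_pupils  <- nrow(dat)
n_schools <- length(unique(dat$School))
cat("pupils:", n_pupils, " schools:", n_schools, "\n")

# Descriptive statistics of outcome and predictor
summary(dat[, c("SES", "MathAch")])

# Cluster sizes and school-level mean math achievement
school_sizes <- table(dat$School)
school_means <- tapply(dat$MathAch, dat$School, mean)
summary(as.vector(school_sizes))
summary(school_means)

# Spread of school means: a first hint of between-school variance
# (plot only in interactive sessions)
if (interactive()) {
  hist(school_means, main = "Mean math achievement per school",
       xlab = "Mean MathAch")
}
```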


Simple Linear Regression

Data of 8 schools (342 pupils)

$$Y_i = \alpha + \beta \cdot X_i + e_i, \qquad \text{MathPerf}_i = \alpha + \beta \cdot \text{SES}_i + e_i$$

Problem: the similarity in math performance of pupils in the same school is ignored.

Key-Message 1:

Simple linear regression models are not appropriate for grouped/clustered data.
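A small simulation can illustrate why. The data below are simulated, not taken from the study: a predictor measured at cluster level and an ICC of 0.5 by construction are assumptions chosen to make the effect visible. The sketch compares the standard error of the slope from `lm()`, which treats all 600 observations as independent, with that from a random intercept model fitted by `lme4::lmer()`.

```r
library(lme4)

set.seed(2024)
n_clusters <- 30                          # number of clusters (e.g. schools)
n_per      <- 20                          # observations per cluster
cluster    <- rep(seq_len(n_clusters), each = n_per)

x_cluster <- rnorm(n_clusters)                  # predictor measured at cluster level
u0        <- rnorm(n_clusters, sd = 3)          # random cluster intercepts (variance 9)
e         <- rnorm(n_clusters * n_per, sd = 3)  # residuals (variance 9)

sim <- data.frame(
  y       = 10 + 1 * x_cluster[cluster] + u0[cluster] + e,
  x       = x_cluster[cluster],
  cluster = factor(cluster)
)

# Simple linear regression: ignores that observations in a cluster are similar
fit_lm <- lm(y ~ x, data = sim)
# Random intercept model: accounts for the clustering
fit_mm <- lmer(y ~ x + (1 | cluster), data = sim)

se_lm <- coef(summary(fit_lm))["x", "Std. Error"]
se_mm <- coef(summary(fit_mm))["x", "Std. Error"]
round(c(lm = se_lm, lmer = se_mm), 3)
```

In this setting the `lm()` standard error is too small, because 600 clustered observations carry much less independent information than 600 independent ones.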


Mixed = Fixed and Random Effects

• Fixed Effects:

- Allow statements about general associations (regression coefficients)

- Interpretation as in “normal” regression models

• Random Effects:

- Account for the dependency between measures within a cluster

- Account for heterogeneity between clusters (which are independent of each other)

- Variances are estimated for each level, plus the residual variance

- Random intercept / random slope


Fixed vs Random Effects

Fixed Effects:

- Effects we are interested in (research question)

Example: SES

- Not randomly chosen

- Would be chosen again for another study

- Differences between its values are useful information

Random Effects:

- Not directly of interest

Example: Schools

- Randomly chosen

- Different schools would be chosen for another study

- Differences between individual schools are often not of interest


Two Main Models

Random Intercept Model

• Individual intercepts (means) for each cluster

• The association between the independent and the dependent variable is constant across clusters (example: association between SES and math performance)

Random Intercept and Random Slope Model

• Individual intercepts (means) for each cluster

• The strength of the association between the independent and the dependent variable varies across clusters: individual slopes for each cluster

http://mfviz.com/hierarchical-models/


Random Intercept Model (fixed slope)

$$y_{ij} = \alpha_0 + u_{0j} + \beta_1 \cdot x_{ij} + \varepsilon_{ij}$$

Fixed Effects:

$\alpha_0$: intercept

$\beta_1$: fixed effect of $x$ on $y$

Random Effects:

$\varepsilon_{ij}$: residual of observation $i$ in cluster $j$, $\varepsilon_{ij} \sim N(0, \sigma^2)$

$u_{0j}$: random intercept (cluster-level residual) of cluster $j$, $u_{0j} \sim N(0, \tau_{00})$

Mixed effects:

$(\alpha_0 + u_{0j})$: intercept of cluster $j$

• Individual intercepts for each cluster

• The association of $x$ and $y$ (the slope) is the same in all clusters
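To make the model concrete, a sketch that simulates data from exactly this random intercept model and fits it with `lmer()`. The parameter values ($\alpha_0 = 10$, $\beta_1 = 2.7$, $\tau_{00} = 8.5$, $\sigma^2 = 33.4$) are borrowed from the school example for illustration only; the number and size of clusters are assumptions.

```r
library(lme4)

set.seed(1)
J <- 50; n_j <- 30                     # 50 clusters with 30 observations each
alpha0 <- 10;  beta1  <- 2.7           # fixed effects
tau00  <- 8.5; sigma2 <- 33.4          # variance components

cluster <- rep(seq_len(J), each = n_j)
u0j <- rnorm(J, mean = 0, sd = sqrt(tau00))          # u_0j ~ N(0, tau00)
x   <- rnorm(J * n_j)                                # individual-level predictor
eps <- rnorm(J * n_j, mean = 0, sd = sqrt(sigma2))   # eps_ij ~ N(0, sigma^2)
y   <- alpha0 + u0j[cluster] + beta1 * x + eps       # y_ij = alpha0 + u0j + beta1 x_ij + eps_ij

sim <- data.frame(y, x, cluster = factor(cluster))
fit <- lmer(y ~ x + (1 | cluster), data = sim)

fixef(fit)     # estimates of alpha0 and beta1
VarCorr(fit)   # printed as standard deviations of u_0j and eps_ij
```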


Random Intercept Model (fixed slope)

$$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$$

Random effects:

$u_{0j} \sim N(0, 8.5)$, $\varepsilon_{ij} \sim N(0, 33.4)$

library(lme4)
fit <- lmer(MathAch ~ SES + (1 | School), data = dat_8schools)  # dat_8schools: the 8-school subset (342 pupils)

Key-Message 2:

Random intercept models account for differences in the mean outcome between clusters.
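The fitted object can be queried with the standard lme4 extractors. A sketch, assuming the full `nlme::MathAchieve` data rather than the 8-school subset used on the slide, so the numbers will differ from those shown; `fit_all` is an illustrative name.

```r
library(lme4)
dat <- as.data.frame(nlme::MathAchieve)

# Random intercept model: one fixed SES slope, one intercept per school
fit_all <- lmer(MathAch ~ SES + (1 | School), data = dat)

fixef(fit_all)                 # alpha0 and beta1 (fixed effects)
VarCorr(fit_all)               # SDs of u_0j (School) and eps_ij (Residual)
head(ranef(fit_all)$School)    # predicted u_0j for each school
head(coef(fit_all)$School)     # school intercepts alpha0 + u_0j, common slope
```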


Random Intercept Model (fixed slope)

$$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$$

$u_{0j} \sim N(0, 8.5)$

$\varepsilon_{ij} \sim N(0, 33.4)$

Interpretation

• 1 point higher SES score → on average 2.7 points higher math performance

• Mean math performance across all schools (at SES = 0): 10.1 points

• Schools differ with regard to mean math performance (variance 8.5 points², i.e. SD ≈ 2.9 points)

                                   Estimate (95% CI)
Fixed Effects
  Intercept                        10.1 (7.8; 12.3)
  SES                              2.7 (1.8; 3.7)
Random Effects
  Variance between schools (τ00)   8.5 (2.3; 26.2)
  Residual variance (σ²)           33.4 (28.7; 38.9)
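Confidence intervals like those in the table can be obtained with `confint()`. One detail worth knowing: for the random-effects parameters of an lmer fit, `confint()` reports intervals for standard deviations (rows `.sig01` and `.sigma`), not variances; squaring the limits gives intervals on the variance scale. A sketch on the full `nlme::MathAchieve` data (so the numbers will differ from the 8-school table); the names `ci` and `ci_var` are illustrative.

```r
library(lme4)
dat <- as.data.frame(nlme::MathAchieve)
fit_all <- lmer(MathAch ~ SES + (1 | School), data = dat)

# Profile-likelihood 95% CIs:
# .sig01 = SD of u_0j, .sigma = residual SD, then the fixed effects
ci <- confint(fit_all, method = "profile")
ci

# Convert the SD-scale rows to the variance scale (tau00 and sigma^2)
ci_var <- ci[c(".sig01", ".sigma"), ]^2
rownames(ci_var) <- c("tau00", "sigma2")
ci_var
```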


Intraclass correlation coefficient (ICC)

$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2}$$

$\tau_{00}$: variance between clusters

$\sigma^2$: variance within clusters (= residual variance)

ICC = proportion of the total variance that is due to differences between clusters

• ICC = 0 … no variance between clusters, all cluster means are equal

• ICC = 1 … all variance is due to differences between clusters, measures within a cluster are equal
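The definition translates directly into a small helper function. A minimal sketch; the function name `icc` is illustrative, and the three calls check the two boundary cases described above plus an even split.

```r
# ICC = tau00 / (tau00 + sigma2): share of the total variance lying between clusters
icc <- function(tau00, sigma2) {
  stopifnot(tau00 >= 0, sigma2 >= 0, tau00 + sigma2 > 0)
  tau00 / (tau00 + sigma2)
}

icc(0, 10)   # no between-cluster variance: 0
icc(10, 0)   # no within-cluster variance: 1
icc(5, 5)    # half of the variance lies between clusters: 0.5
```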


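Random Intercept Model (fixed slope)

$$\text{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2} = \frac{8.5}{8.5 + 33.4} \approx 0.20$$

About 20% of the variance in math performance (after adjusting for SES) is due to differences between schools.

$$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$$

$u_{0j} \sim N(0, 8.5)$, $\varepsilon_{ij} \sim N(0, 33.4)$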

Key-Message 3:

The ICC measures the proportion of the total variance in the outcome that is due to differences between clusters.
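The ICC can also be computed from a fitted model by extracting the variance components. A sketch, assuming the full `nlme::MathAchieve` data (so the value will differ from the 8-school example); because SES is in the model, this is the ICC after adjusting for SES. The last lines reproduce the arithmetic from the slide.

```r
library(lme4)
dat <- as.data.frame(nlme::MathAchieve)
fit_all <- lmer(MathAch ~ SES + (1 | School), data = dat)

vc <- as.data.frame(VarCorr(fit_all))     # columns grp, var1, var2, vcov, sdcor
tau00  <- vc$vcov[vc$grp == "School"]     # between-school variance
sigma2 <- vc$vcov[vc$grp == "Residual"]   # within-school (residual) variance
icc_hat <- tau00 / (tau00 + sigma2)
round(icc_hat, 2)

# Check of the 8-school numbers from the slide
icc_slide <- 8.5 / (8.5 + 33.4)
round(icc_slide, 2)   # 0.2
```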


Random Intercept and Random Slope

$$y_{ij} = \alpha_0 + u_{0j} + (\beta_1 + u_{1j}) \cdot x_{ij} + \varepsilon_{ij}$$

Fixed Effects:

$\alpha_0$: intercept

$\beta_1$: fixed effect of $x$ on $y$

Random Effects:

$\varepsilon_{ij}$: residual of observation $i$ in cluster $j$, $\varepsilon_{ij} \sim N(0, \sigma^2)$

$u_{0j}$: random effect of the intercept of cluster $j$, $u_{0j} \sim N(0, \tau_{00})$

$u_{1j}$: random effect of the slope of cluster $j$, $u_{1j} \sim N(0, \tau_{11})$

Mixed effects:

$(\alpha_0 + u_{0j})$: intercept of cluster $j$

$(\beta_1 + u_{1j})$: slope of cluster $j$

• Individual intercepts AND slopes for each cluster
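As for the random intercept model, simulating from the equation shows what each term does. A sketch under assumed values (40 clusters, 10 time points each, $u_{0j}$ and $u_{1j}$ drawn independently, parameter values chosen only for illustration); the `lmer()` formula with separate `(1 | cluster)` and `(0 + x | cluster)` terms matches that independence assumption.

```r
library(lme4)

set.seed(42)
J <- 40; n_j <- 10
alpha0 <- 250; beta1 <- 10       # fixed intercept and fixed slope
tau00  <- 600; tau11 <- 36       # variances of u_0j and u_1j
sigma2 <- 650                    # residual variance

cluster <- rep(seq_len(J), each = n_j)
x   <- rep(0:(n_j - 1), times = J)        # e.g. time points 0..9 within each cluster
u0j <- rnorm(J, sd = sqrt(tau00))         # random intercepts
u1j <- rnorm(J, sd = sqrt(tau11))         # random slopes, independent of u0j here
y   <- alpha0 + u0j[cluster] + (beta1 + u1j[cluster]) * x +
       rnorm(J * n_j, sd = sqrt(sigma2))

sim <- data.frame(y, x, cluster = factor(cluster))

# Uncorrelated random intercept and random slope per cluster
fit <- lmer(y ~ x + (1 | cluster) + (0 + x | cluster), data = sim)
fixef(fit)                 # alpha0 and beta1
VarCorr(fit)               # SDs of u_0j, u_1j and eps_ij
head(coef(fit)$cluster)    # cluster-specific intercepts and slopes
```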


Random Intercept and Random Slope

Sleepstudy (changes in reaction time after sleep restriction; data in R package lme4)

Design: longitudinal, 18 individuals, 10 days

Day 0: normal sleep duration

From day 1 to the end of the study: only 3 hours of sleep per night

Outcome: mean reaction time per day

Research question: How does restricted night sleep influence people's reaction time?
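Again, looking at the data first: a short sketch describing the `sleepstudy` data that ships with lme4 (columns `Reaction` in ms, `Days`, `Subject`). The spaghetti plot uses base R's `interaction.plot()` and is drawn only in interactive sessions; `n_subjects` is an illustrative name.

```r
library(lme4)   # provides the sleepstudy data

str(sleepstudy)                           # Reaction (ms), Days, Subject
n_subjects <- nlevels(sleepstudy$Subject)
n_subjects
range(sleepstudy$Days)

# Mean reaction time per day across subjects
aggregate(Reaction ~ Days, data = sleepstudy, FUN = mean)

# One line per subject ("spaghetti plot")
if (interactive()) {
  interaction.plot(sleepstudy$Days, sleepstudy$Subject, sleepstudy$Reaction,
                   legend = FALSE, xlab = "Days", ylab = "Reaction time (ms)")
}
```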


Random Intercept and Random Slope

Sleepstudy (reaction time after sleep restriction)

[Figure panels: Random Intercept; Random Intercept and Random Slope]
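Whether the random slope is needed can be checked by comparing the two models in the panels with a likelihood ratio test. A sketch using lme4's `anova()` method, which refits both models with ML before comparing; the p-value for a variance component tested against 0 is conservative, because 0 lies on the boundary of the parameter space.

```r
library(lme4)

# Random intercept only: parallel lines per subject
m_ri   <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
# Random intercept and (uncorrelated) random slope: own line per subject
m_rirs <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
               data = sleepstudy)

# Likelihood ratio test (models refitted with ML)
cmp <- anova(m_ri, m_rirs)
cmp
```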


Random Intercept and Random Slope

library(lme4)
lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), data = sleepstudy)  # uncorrelated random intercept and random slope

                          Estimate          95% CI
Fixed Effects
  (Intercept)             251.4             (237.6; 265.2)
  Days                    10.5              (7.3; 13.6)
Random Effects            Variance (SD)     95% CI of the SD
  Intercept (Subject)     627.6 (25.1)      (15.3; 37.8)
  Slope Days (Subject)    35.9 (6.0)        (4.0; 8.8)
  Residual                653.6 (25.6)      (22.9; 28.8)

Interpretation:

• Reaction time increases on average by 10.5 ms per day

• Individuals differ in their mean reaction time (variance 627.6 ms², i.e. SD ≈ 25 ms)

• Individuals differ in the increase of reaction time over the study period (slope variance 35.9 (ms/day)², i.e. SD ≈ 6 ms/day)

source: sleepstudy data (R package lme4)

Key-Message 4:

Random intercept and random slope models account for differences in both the mean outcome and the slope between clusters.
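The values in the table can be reproduced with the lme4 extractors. A sketch; note that `as.data.frame(VarCorr())` returns both variances (`vcov`) and standard deviations (`sdcor`), and that the profile intervals from `confint()` for `.sig01`, `.sig02` and `.sigma` are on the SD scale, which is why the table reports CIs of the SDs.

```r
library(lme4)

fit_sleep <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                  data = sleepstudy)

fixef(fit_sleep)                    # mean intercept and mean increase per day

# Variance components: 'vcov' = variance, 'sdcor' = standard deviation
vc <- as.data.frame(VarCorr(fit_sleep))
vc[, c("grp", "var1", "vcov", "sdcor")]

# Profile CIs: .sig01, .sig02 and .sigma are on the SD scale
confint(fit_sleep, method = "profile")
```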


More Mixed Models (mm)

Other scales of the dependent variable:

– dichotomous (binary logistic mm)

– ordinal (ordinal mm)

– multinomial (multinomial logistic mm)

Other models:

– survival (frailty models, joint models)

– models that account for spatial correlation

– GAMMs: Generalized Additive Mixed Models (using smooth functions)

Applications:

– cross-over designs

– meta-analyses

– multi-centre studies

– multi-rater settings

– more than 2 levels
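For a dichotomous outcome, lme4 provides `glmer()`. A minimal sketch of a binary logistic random intercept model on simulated data (all values are assumptions for illustration); the fixed effects are on the log-odds scale, and exponentiating the slope gives an odds ratio conditional on the cluster.

```r
library(lme4)

set.seed(7)
J <- 40; n_j <- 25
cluster <- rep(seq_len(J), each = n_j)
x   <- rnorm(J * n_j)
u0j <- rnorm(J, sd = 1)                        # random intercepts on the logit scale
eta <- -0.5 + 0.8 * x + u0j[cluster]           # linear predictor
y   <- rbinom(J * n_j, size = 1, prob = plogis(eta))
sim <- data.frame(y, x, cluster = factor(cluster))

fit_bin <- glmer(y ~ x + (1 | cluster), family = binomial, data = sim)
fixef(fit_bin)               # log-odds scale
exp(fixef(fit_bin)[["x"]])   # conditional odds ratio per unit of x
```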


Practical Hints

1. Before doing elaborate statistical analyses, use descriptive statistics and graphs: KNOW YOUR DATA!!!

2. Be economical with random effects (use only a few)

3. Try to explain as much of the variance of the outcome as possible with the fixed effects

4. Use 1. and the interpretation of the models to check whether the regression coefficients are reasonable!
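Hints 2 and 4 can be supported by two quick checks, sketched here on the sleepstudy model: `isSingular()` flags fits where a variance is estimated at (or near) zero, often a sign of too many random effects, and per-subject regressions from lme4's `lmList()` give a "no pooling" reference against which the mixed-model slopes can be judged. The object names are illustrative.

```r
library(lme4)

fit_sleep <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                  data = sleepstudy)

# Hint 2: a singular fit suggests the random-effects structure is too rich
isSingular(fit_sleep)

# Hint 4: compare with separate regressions per subject (no pooling)
per_subject <- coef(lmList(Reaction ~ Days | Subject, data = sleepstudy))
mixed       <- coef(fit_sleep)$Subject
comparison  <- data.frame(
  ols_slope   = per_subject[, "Days"],
  mixed_slope = mixed[rownames(per_subject), "Days"]   # align rows by subject
)
head(comparison)
summary(comparison)   # mixed-model slopes are typically pulled towards the mean slope
```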
