Posted on 10-Jul-2020
UNIVERSITÄTSMEDIZIN BERLIN
After Work Statistics
Ulrike Grittner
Annette Aigner
Institute of Biometry and
Clinical Epidemiology
ulrike.grittner@charite.de
Institute of Biometry and Clinical Epidemiology
We are …
• … open and helpful!
• … active in methodological statistical research and in medical research
• … active in teaching in many ways
Our Service Unit Biometry
• Free biometrical consulting for all medical research projects (online registration)
• “Statistik-Ambulanz” (walk-in service): consultation without prior registration every Tuesday from 9am to 12pm
• Training in biometrical topics and statistical software
• Responsibility for project biometry within collaborations
For further information visit us online:
https://biometrie.charite.de/
Contact: Univ.-Prof. Dr. Geraldine Rauch (Head of Institute),
Institut für Biometrie und Klinische Epidemiologie (iBikE)
Site Mitte (Charité Campus Mitte)
Reinhardstraße 58, 10117 Berlin
Site Mitte (Charité Campus Klinik)
Rahel-Hirsch-Weg 5, 10117 Berlin
Slot Topic
1 So many tests! The agony of choice.
2 So many questions! Multiple testing.
3 So many patients? Sample size calculation.
4 What is this odds ratio? Logistic regression.
5 Missing information? Dealing with missing data.
6 The right time? Survival analysis.
7 The variety of influences - Mixed models.
8 Who fits together? Patient matching.
The Diversity of Influences: Mixed Models
Overview
• What do you need mixed models for?
• Basic idea of mixed models
• Intraclass correlation coefficient (ICC)
• ICC, Random Intercept & Random Slope Model
Mixed models
= multilevel models
= hierarchical models
= random effects models
= nested models
…
Problem
Most statistical models assume independent data.
Independent data: association of body height and body weight, cross-sectional (Bundesgesundheitssurvey 1998, RKI, n=7124)
Dependent data: diastolic blood pressure before and after therapy, repeated measures (Holzgreve et al., British Medical Journal 299, 881-886, 1989, 1 study arm: n=169)
Problem
Often we have a mixture of independent and dependent data (clustered data).
Examples
- Individual measures in different clusters
(grouped data)
- Repeated measures in individuals
Grouped Data
Example: Mathematics achievement of pupils in different schools
Assumption:
• Math achievement of pupils from the same school is more similar than math achievement of pupils from different schools
• Dependent data within schools, independent data across schools
[Diagram: school 1, school 2, …, school N; each school j contains pupils p. 1, p. 2, …, p. nj]
Repeated measures
Example: longitudinal study, repeated measures within individuals
Assumptions:
• Measures from the same individual are more similar than measures from different individuals
• Dependent data within individuals, independent data across individuals
[Diagram: Ind. 1, Ind. 2, …, Ind. N; each individual j is measured at time points t 1, t 2, …, t nj]
Example
• Study of math performance of 9th graders across 160 schools (7185 pupils)
Study “High School and Beyond” 1982 (Raudenbush & Bryk 2002)
data(MathAchieve), package nlme in R
• Research question: Is the socio-economic status (SES) of the pupils associated with their math performance? (SES: score based on the education and income of the parents)
ID    School  Minority  Sex     SES     MathAch
478   1499    No        Female  -0.678   5.608
479   1499    No        Female  -0.158  18.352
480   1499    No        Female  -0.468   5.949
481   1499    Yes       Female  -0.148  -1.462
482   1499    Yes       Female  -0.928   4.087
483   1499    No        Female  -0.218  10.258
484   1499    No        Male     0.662  21.791
485   1499    No        Male    -0.228   1.365
486   1499    No        Female  -0.368   1.730
487   1499    Yes       Female   0.342   5.093
…
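A minimal sketch of the excerpt above as an R data frame in long format. There is one row per pupil, and School identifies the cluster. This is the layout that mixed-model functions such as lmer() expect. The values are copied from the ten rows shown, and the object name math_excerpt is hypothetical. With only ten pupils from a single school, the group means only illustrate a first descriptive look at the data.

```r
# Ten rows of the MathAchieve excerpt, typed in by hand (long format:
# one row per pupil, School = cluster identifier)
math_excerpt <- data.frame(
  ID       = 478:487,
  School   = factor(rep("1499", 10)),
  Minority = factor(c("No", "No", "No", "Yes", "Yes", "No", "No", "No", "No", "Yes")),
  Sex      = factor(c(rep("Female", 6), "Male", "Male", "Female", "Female")),
  SES      = c(-0.678, -0.158, -0.468, -0.148, -0.928, -0.218, 0.662, -0.228, -0.368, 0.342),
  MathAch  = c(5.608, 18.352, 5.949, -1.462, 4.087, 10.258, 21.791, 1.365, 1.730, 5.093)
)

str(math_excerpt)                                     # check variable types
tapply(math_excerpt$MathAch, math_excerpt$Minority, mean)  # mean score by minority status
```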
Simple Linear Regression
Data of 8 schools (342 pupils)
$y_i = \alpha + \beta \cdot x_i + e_i$, here: $\text{MathPerf}_i = \alpha + \beta \cdot \text{SES}_i + e_i$ ($\alpha$: mean math performance at SES = 0)
Problem: The model ignores that pupils from the same school have more similar math performance.
Key message 1: Simple linear regression models are not appropriate for grouped/clustered data.
Mixed = Fixed and Random Effects
• Fixed Effects:
- Allow statements on general associations (regression coefficients)
- Interpretation as in “normal” regression models
• Random Effects:
- Account for dependency between measures within a cluster
- Account for heterogeneity between clusters (clusters are independent of each other)
- Estimate a variance for each level plus the residual variance
- Random intercept / random slope
Fixed vs Random Effects
Fixed Effects:
- Effects we are interested in
(research question)
Example: SES
- Not randomly chosen
- Would again be chosen
for another study
- Differences between its values are useful information
Random Effects:
- Not directly of interest
Example: Schools
- Randomly chosen
- Different schools would be
chosen for another study
- Differences between
schools are often not of
interest
Two Main Models
Random Intercept Model
• Individual intercepts (means) for each cluster
• Association of independent and dependent variable is constant across clusters (example: association between SES and math performance)
Random Intercept and Random Slope Model
• Individual intercepts (means) for each cluster
• Strength of association of independent and dependent variable varies across clusters: individual slopes for each cluster
Figure source: http://mfviz.com/hierarchical-models/
Random Intercept Model (fixed slope)
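$y_{ij} = \alpha_0 + u_{0j} + \beta_1 \cdot x_{ij} + \varepsilon_{ij}$
Fixed Effects:
$\alpha_0$: intercept
$\beta_1$: fixed effect of $x$ on $y$
Random Effects:
$\varepsilon_{ij}$: residual of observation $i$ in cluster $j$, $\varepsilon_{ij} \sim N(0, \sigma^2)$
$u_{0j}$: residual of cluster $j$ (random intercept), $u_{0j} \sim N(0, \tau_{00})$
Mixed effects:
$(\alpha_0 + u_{0j})$: intercept of cluster $j$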
• Individual intercepts for each cluster
• Association of x and y is fixed across all clusters
Random Intercept Model (fixed slope)
$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$
Random effects:
$u_{0j} \sim N(0, 8.5)$, $\varepsilon_{ij} \sim N(0, 33.4)$
library(lme4)  # provides lmer()
fit <- lmer(MathAch ~ SES + (1 | School), data = dat_8schools)  # dat_8schools: 8-school subset of MathAchieve
Key message 2: Random intercept models account for differences in the mean outcome between clusters.
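A small sketch of what the random intercept model implies for predictions, using the fitted equation from this slide. The school effects u0j = 2 and u0j = -2 are hypothetical. Every school gets its own intercept, but all schools share the SES slope of 2.7. As a result, the gap between two schools is the same at every SES value.

```r
# Fitted random intercept model (8 schools):
# MathScore_ij = (10.1 + u0j) + 2.7 * SES_ij + e_ij
predict_math <- function(ses, u0j = 0) {
  # u0j = 0 gives the prediction for an "average" school
  (10.1 + u0j) + 2.7 * ses
}

predict_math(ses = 0)              # average school, SES = 0
predict_math(ses = 0.5, u0j = 2)   # hypothetical school 2 points above average
predict_math(ses = 0.5, u0j = -2)  # hypothetical school 2 points below average
```

The last two predictions differ by 4 points (2 - (-2)), whatever value of SES is plugged in: parallel lines, one per school.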
Random Intercept Model (fixed slope)
$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$
$u_{0j} \sim N(0, 8.5)$
$\varepsilon_{ij} \sim N(0, 33.4)$
Interpretation
• A 1-point higher SES score is associated with a 2.7-point higher math performance
• Average intercept across schools (expected math performance at SES = 0): 10.1 points
• Schools differ in their mean math performance (between-school variance $\tau_{00}$ = 8.5 points², SD ≈ 2.9 points)
                                          Estimate (95% CI)
Fixed effects
  Intercept                               10.1 (7.8; 12.3)
  SES                                     2.7 (1.8; 3.7)
Random effects
  Variance between schools (τ00)          8.5 (2.3; 26.2)
  Residual variance (σ²)                  33.4 (28.7; 38.9)
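The model assumes that u0j is normal with variance τ00. Under that assumption, about 95% of school intercepts lie within α0 ± 1.96·√τ00. The sketch below does that arithmetic with the point estimates from the table. It ignores the uncertainty in τ00, whose confidence interval is wide with only 8 schools. This range describes where the schools' intercepts lie; it is not a confidence interval for α0.

```r
tau00  <- 8.5    # estimated variance between schools
alpha0 <- 10.1   # estimated average intercept

sd_school <- sqrt(tau00)   # SD of school intercepts, about 2.9 points
range_95  <- alpha0 + c(-1, 1) * qnorm(0.975) * sd_school

round(range_95, 1)         # range covering ~95% of school intercepts
```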
Intraclass correlation coefficient (ICC)
$\mathrm{ICC} = \dfrac{\tau_{00}}{\tau_{00} + \sigma^2}$
$\tau_{00}$: variance between clusters
$\sigma^2$: variance within clusters (= residual variance)
ICC = proportion of total variance that is due to differences
between clusters
• ICC = 0 … no variance between clusters, all clusters are equal
• ICC = 1 … all variance is explained by clusters, measures within clusters
are equal
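The formula translates directly into a small R helper. The name icc is a hypothetical choice for this sketch, not a function from a package. The calls check the two boundary cases listed above and the math example.

```r
# Intraclass correlation coefficient from the two variance components
icc <- function(tau00, sigma2) {
  stopifnot(tau00 >= 0, sigma2 >= 0, tau00 + sigma2 > 0)
  tau00 / (tau00 + sigma2)
}

icc(tau00 = 0,   sigma2 = 33.4)  # no between-cluster variance -> 0
icc(tau00 = 8.5, sigma2 = 0)     # no within-cluster variance  -> 1
icc(tau00 = 8.5, sigma2 = 33.4)  # math example, about 0.20
```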
Random Intercept Model (fixed slope)
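$\mathrm{ICC} = \tau_{00}/(\tau_{00}+\sigma^2) = 8.5/(8.5+33.4) = 0.20$
About 20% of the variance in math performance not explained by SES is due to differences between schools.
$\text{MathScore}_{ij} = (10.1 + u_{0j}) + 2.7 \cdot \text{SES}_{ij} + \varepsilon_{ij}$
$u_{0j} \sim N(0, 8.5)$, $\varepsilon_{ij} \sim N(0, 33.4)$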
Key message 3: The ICC measures the proportion of the total variance in the outcome that is due to differences between clusters.
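One way to see why the ICC is called a correlation: in a random intercept model, two pupils from the same school share u0j. The correlation of their scores is therefore τ00 / (τ00 + σ²). The simulation below is a sketch under assumed conditions: 20,000 hypothetical schools with two pupils each, SES fixed at 0, and variances taken from the math example.

```r
set.seed(2020)
n_schools <- 20000   # many hypothetical schools
tau00  <- 8.5        # between-school variance
sigma2 <- 33.4       # residual (within-school) variance

u0 <- rnorm(n_schools, mean = 0, sd = sqrt(tau00))  # school effects u0j

# Two pupils per school with SES = 0, so only u0j is shared within a school
y1 <- 10.1 + u0 + rnorm(n_schools, mean = 0, sd = sqrt(sigma2))
y2 <- 10.1 + u0 + rnorm(n_schools, mean = 0, sd = sqrt(sigma2))

cor(y1, y2)          # close to ICC = 8.5 / (8.5 + 33.4), about 0.20
```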
Random Intercept and Random Slope
$y_{ij} = \alpha_0 + u_{0j} + (\beta_1 + u_{1j}) \cdot x_{ij} + \varepsilon_{ij}$
Fixed Effects:
$\alpha_0$: intercept
$\beta_1$: fixed effect of $x$ on $y$
Random Effects:
$\varepsilon_{ij}$: residual of observation $i$ in cluster $j$, $\varepsilon_{ij} \sim N(0, \sigma^2)$
$u_{0j}$: random intercept of cluster $j$, $u_{0j} \sim N(0, \tau_{00})$
$u_{1j}$: random slope of cluster $j$, $u_{1j} \sim N(0, \tau_{11})$
Mixed effects:
$(\alpha_0 + u_{0j})$: intercept of cluster $j$
$(\beta_1 + u_{1j})$: slope of cluster $j$
Individual intercepts AND slopes for each cluster
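Adding u1j has one consequence worth working out. The derivation below assumes that u0j, u1j and εij are mutually independent, so there is no intercept-slope covariance. Let τ11 denote the variance of u1j. Under these assumptions the outcome variance depends on x, and so does the share of variance due to clusters.

```latex
\begin{aligned}
\operatorname{Var}(y_{ij} \mid x_{ij})
  &= \operatorname{Var}(u_{0j}) + x_{ij}^{2}\,\operatorname{Var}(u_{1j}) + \operatorname{Var}(\varepsilon_{ij}) \\
  &= \tau_{00} + x_{ij}^{2}\,\tau_{11} + \sigma^{2} \\
\text{share due to clusters at } x_{ij}
  &= \frac{\tau_{00} + x_{ij}^{2}\,\tau_{11}}{\tau_{00} + x_{ij}^{2}\,\tau_{11} + \sigma^{2}}
\end{aligned}
```

If u0j and u1j are allowed to correlate with covariance τ01, the term 2·x_ij·τ01 is added to both the variance and the between-cluster part.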
Random Intercept and Random Slope
Sleepstudy (changes in reaction time after sleep restriction, data in R package lme4)
Design: longitudinal, 18 individuals, 10 days
Day 0: normal sleep duration
From day 1 to the end of the study: only 3 hours of sleep per night
Outcome: mean reaction time per day
Research question: How does restricted night sleep influence people's reaction time?
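The repeated-measures layout in long format can be sketched as a design skeleton without any reaction times. There is one row per individual and day, giving 18 × 10 = 180 rows. The sleepstudy data in lme4 use this layout with the columns Reaction, Days and Subject. The object name design is hypothetical.

```r
# One row per subject and day (days 0 to 9), Subject = cluster identifier
design <- expand.grid(Days = 0:9, Subject = 1:18)

nrow(design)            # 18 subjects x 10 days = 180 rows
table(design$Subject)   # each subject contributes 10 repeated measures
```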
Random Intercept and Random Slope
Sleepstudy (reaction time after sleep restriction)
[Figure panels: Random Intercept | Random Intercept and Random Slope]
Random Intercept and Random Slope
library(lme4)  # provides lmer() and the sleepstudy data
lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject), data = sleepstudy)
                              Estimate (95% CI)
Fixed effects
  (Intercept)                 251.4 (237.6; 265.2)
  Days                        10.5 (7.3; 13.6)
Random effects                Variance; SD (95% CI of SD)
  Intercept (subjects)        627.6; SD 25.1 (15.3; 37.8)
  Slope (Days)                35.9; SD 6.0 (4.0; 8.8)
  Residual                    653.6; SD 25.6 (22.9; 28.8)
Interpretation:
• Reaction time increases on average by 10.5 ms per day of sleep restriction
• Individuals differ in their mean reaction time at day 0 (intercept variance 627.6 ms², SD ≈ 25.1 ms)
• Individuals differ in how fast their reaction time increases over the study period (slope variance 35.9 (ms/day)², SD ≈ 6.0 ms/day)
Source: sleepstudy data (R package lme4)
Key message 4: Random intercept and slope models account for differences in both the mean and the slope of the outcome between clusters.
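This sketch shows what the variance estimates mean on the original scale. Assuming normally distributed random effects, as the model does, about 95% of individual intercepts and slopes fall within the fixed effect ± 1.96 SD. Only the point estimates from the sleepstudy table are used, so estimation uncertainty is ignored.

```r
# Point estimates from the sleepstudy fit (uncorrelated random intercept and slope)
fixed_intercept <- 251.4  # ms at day 0
fixed_slope     <- 10.5   # ms per day
var_intercept   <- 627.6  # ms^2
var_slope       <- 35.9   # (ms/day)^2

z <- qnorm(0.975)
intercept_range <- fixed_intercept + c(-1, 1) * z * sqrt(var_intercept)
slope_range     <- fixed_slope     + c(-1, 1) * z * sqrt(var_slope)

round(intercept_range, 1)  # individual reaction times at day 0
round(slope_range, 1)      # individual increases per day
```

The slope range reaches slightly below 0. Under this model, some individuals' reaction time may barely change, while others slow down by more than 20 ms per day.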
… More Mixed Models (mm)
Other scales of the dependent variable:
– dichotomous (binary logistic mm)
– ordinal (ordinal mm)
– multinomial (multinomial logistic mm)
Other models:
Survival (frailty models, joint models)
Models that account for spatial correlation
GAMMs: Generalized Additive Mixed Models (using smooth functions)
Applications:
cross-over designs
meta-analyses
multi-centre studies
multi-rater settings
more than 2 levels
Practical Hints
1. Before doing elaborate statistical analyses: use descriptive statistics and graphs => KNOW YOUR DATA!!!
2. Be economical with random effects (use only a few)
3. Try to explain as much of the outcome variance as possible with the fixed effects
4. Use 1. and the interpretation of the models to check whether the regression coefficients are plausible!