Multi-level Analysis Recognizing the Problem

Multi-level Analysis Recognizing the Problem Maureen Smith, MD PhD Depts. of Population Health Sciences and Family Medicine University of Wisconsin-Madison


Page 1: Multi-level Analysis Recognizing the Problem

Multi-level Analysis: Recognizing the Problem

Maureen Smith, MD PhD
Depts. of Population Health Sciences and Family Medicine
University of Wisconsin-Madison

Page 2: Multi-level Analysis Recognizing the Problem

Target Audience

Those of you likely to sit down in front of a computer and try to do this!

Make sure you have pencil and paper.

Page 3: Multi-level Analysis Recognizing the Problem

Goals Today

• Understand
– How data and multi-level modeling relate
– That the underlying concepts are ubiquitous
– What the typical output means
– How much trouble you can get into

• We will not
– Spend a lot of time on statistical tests
– Spend a lot of time on software

Page 4: Multi-level Analysis Recognizing the Problem

A day in the life of a researcher

• We have data
– ID (observation #)
– X (variable 1)
– Y (variable 2)

• We want to use the value of X to explain the value of Y

ID   X    Y
1    60   3
2    75   6
3    81   10
4    70   7
5    65   5

Page 5: Multi-level Analysis Recognizing the Problem

Welcome to the fantasy world of linear regression

• A simple model

$y_i = \text{intercept} + \text{slope}(x_i) + \text{error}_i$

i indicates observations (1…N)

• Assumptions
– Linearity
– Independence
– Normality
– Constant variance
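As a rough sketch of what "a simple model" means in practice, the snippet below fits the intercept and slope by ordinary least squares to the five (X, Y) pairs from the earlier data slide. The function name `fit_line` is made up for illustration; only the Python standard library is used, and nothing here checks the four assumptions.

```python
# Ordinary least squares for the five (X, Y) observations on the data slide.
xs = [60, 75, 81, 70, 65]
ys = [3, 6, 10, 7, 5]

def fit_line(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of X and Y divided by variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.3f}")
```

The fitted line is only trustworthy if the observations really are independent, which is the point of the next slides.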

Page 6: Multi-level Analysis Recognizing the Problem

Reality check

• How often are observations truly independent from one another?
– Dot indicates geographic location of teenager
– Orange or green indicates hair color

• Do these teenagers look independent?

Page 7: Multi-level Analysis Recognizing the Problem

Clustering

(Artificial or natural)

Page 8: Multi-level Analysis Recognizing the Problem

1) Clustering introduced in sampling

• Multistage sampling
– Circles represent city blocks
– Blocks randomly sampled
– All persons in block surveyed to determine attitudes

• Persons in one block are more like their neighbors than persons who live in another block

• Nesting or clustering of data
– Persons within blocks

[Figure: persons clustered within Blocks 1–4; not all blocks are selected]

Page 9: Multi-level Analysis Recognizing the Problem

Effect of sample design on errors

• Errors in linear regression
– Assume independence
– Each person => info
– Each person worth “1”

• If clustering occurs
– Obs not independent
– Each person => less info
– Each person worth < “1”

[Figure: persons clustered within Blocks 1–4]

Page 10: Multi-level Analysis Recognizing the Problem

Simple linear regression won’t work!

• Violates assumption of independence

• If you don’t account for it
– Standard errors are too small
– Makes coefficients look more significant than they are
– “You think there is more information in the data than actually exists”

Page 11: Multi-level Analysis Recognizing the Problem

How much information is lost? “Design Effect”

• If designing a study using multistage sampling, need to increase sample size to account for loss of information

• Design effect
– Each observation is “worth less”
– Need to estimate your “effective” sample size
– Used for sample size calculations in multi-stage sampling

$N_{\text{effective}} = N / \text{Design effect}$
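The arithmetic can be sketched in a few lines, assuming the formula is read as N divided by the design effect. The function names and the example values (100 persons, design effect of 1.5) are hypothetical and only illustrate the two directions of the calculation: shrinking an actual sample to its effective size, and inflating a target sample size when planning a study.

```python
import math

def effective_sample_size(n, design_effect):
    """Number of independent observations the clustered sample is 'worth'."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect

def required_sample_size(n_needed_if_independent, design_effect):
    """Persons to recruit so the effective size still meets the target."""
    return math.ceil(n_needed_if_independent * design_effect)

# Hypothetical example: 100 surveyed persons, design effect of 1.5.
n_eff = effective_sample_size(100, 1.5)
n_req = required_sample_size(100, 1.5)
print(n_eff, n_req)
```

A design effect of 1 means no loss of information; the larger it is, the less each clustered observation is worth.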

Page 12: Multi-level Analysis Recognizing the Problem

Questions – Pair up!

• Multi-stage sample design
– City blocks N=3
– Persons N=26

• Design effect = 2

1. What is the effective sample size?

2. What sample size would you use in your power calculations?

[Figure: persons clustered within Blocks 1–4]

Page 13: Multi-level Analysis Recognizing the Problem

2) Clustering introduced naturally

• Analyze costs of care for hospitalized patients

• Patients in one hospital are more alike than patients in another hospital

• Nesting or clustering of data
– Patients within hospitals

[Figure: patients clustered within Hospitals 1–4]

Page 14: Multi-level Analysis Recognizing the Problem

Effect of natural clusters on errors

• Same effect on errors
– Obs not independent
– Each person => less info
– Each person worth < “1”

• Simple linear regression won’t work!

[Figure: patients clustered within Hospitals 1–4]

Page 15: Multi-level Analysis Recognizing the Problem

Accounting for Clustering

Do we care?

Page 16: Multi-level Analysis Recognizing the Problem

What do we do?

• First question - do we care?
– Is clustering a nuisance?
OR
– Is clustering an interesting phenomenon?

• Leads to different analytic strategies

Page 17: Multi-level Analysis Recognizing the Problem

If clustering is a nuisance

• Example - Multi-stage sampling
– Don’t care how people vary within city blocks versus between city blocks
– Artificially imposed by the sampling design
– Not interested in measuring it
– Just want to correct for it

• Use analytic strategies that correct for clustering

Page 18: Multi-level Analysis Recognizing the Problem

How to correct errors for clustering

• Robust estimates of variance
– Stata “, robust cluster (____)”
– SAS empirical estimates of variance

• Programs that account for complex survey design (weights, strata, clusters)
– Stata “svy” commands
– SAS “survey___” commands

• Other strategies

Page 19: Multi-level Analysis Recognizing the Problem

If clustering is interesting

• Example - examine costs for hospitalized patients

• Split out the variation in costs
– How much variation due to differences in patients?
– How much variation due to differences in hospitals?

• Examine factors that explain variation in costs
– Characteristics of patients
– Characteristics of hospitals

• Analytic strategy = Multi-level modeling!

Page 20: Multi-level Analysis Recognizing the Problem

Questions

1. Identify 3 patient characteristics that might explain variation in costs

2. Identify 3 hospital characteristics that might explain variation in costs

3. Do you think more of the variation in costs is explained by the patient or the hospital?

Page 21: Multi-level Analysis Recognizing the Problem

Representing Clustering in a Model

(Multi-level models)

(Hierarchical linear models)

(Random effects models)

Page 22: Multi-level Analysis Recognizing the Problem

The concept of “levels”

• Our example – 2 levels
– Micro = patients (N=26)
  • Micro-level = “units”
– Macro = hospitals (N=3)
  • Macro-level = “groups”

• At each level
– Patient characteristics
– Hospital characteristics

[Figure: patients clustered within Hospitals 1–4]

Page 23: Multi-level Analysis Recognizing the Problem

Data Structure - Patient

Patient-level data (= “unit-level data”)

Patient ID   Hospital ID   Age (X)   Cost (Y)
1            1             60        3
2            1             75        6
3            2             81        10
4            2             70        7
5            2             65        5

• Y represents a patient characteristic
– Cost (thousands of dollars)

• X represents a patient characteristic
– Age
– Note – understand possible mechanism at each step
– “Older patients are sicker and tend to cost more”

Page 24: Multi-level Analysis Recognizing the Problem

Simple Linear Regression

$y_i = a + b x_i + e_i$

• i indexes patients (i=1 to N)

• Relates x to y

• Both variables are patient characteristics

• Remember the assumptions

Page 25: Multi-level Analysis Recognizing the Problem

Questions

$\text{cost}_i = a + b(\text{age}_i) + e_i$

1. Is there a problem with this model when applied to these data?

2. If so, what?

Patient ID   Hospital ID   Age (X)   Cost (Y)
1            1             60        3
2            1             75        6
3            2             81        10
4            2             70        7
5            2             65        5

Page 26: Multi-level Analysis Recognizing the Problem

The Problem

• Does not account for the clustering of patients within hospitals
– Data have a structure that is not represented
– $e_i$: assumption of independence is not met

Patient ID   Hospital ID   Age (X)   Cost (Y)
1            1             60        3
2            1             75        6
3            2             81        10
4            2             70        7
5            2             65        5

Page 27: Multi-level Analysis Recognizing the Problem

Do we care?

• If clustering is a nuisance => Stata robust option

• If clustering is interesting => Multilevel model

Page 28: Multi-level Analysis Recognizing the Problem

Data Structure - Hospital

Hospital-level data (= “group-level data”)

Hospital ID   Beds (W)
1             10
2             65

• W represents a hospital characteristic
– # of beds in the hospital

• Possible mechanism = “Bigger hospitals do things more expensively”
– More technology
– More high-cost specialists

Page 29: Multi-level Analysis Recognizing the Problem

Combined Data Structure

Patient-level data

Hospital ID   Patient ID   Age (X)   Cost (Y)
1             1            60        3
1             2            75        6
2             3            81        10
2             4            70        7
2             5            65        5

+

Hospital-level data

Hospital ID   Beds (W)
1             10
2             65

= ?

Page 30: Multi-level Analysis Recognizing the Problem

Combined Data Structure

Patient- and hospital-level data

Patient ID   Hospital ID   Age (X)   Cost (Y)   Beds (W)
1            1             60        3          10
2            1             75        6          10
3            2             81        10         65
4            2             70        7          65
5            2             65        5          65

• Age (X) and Cost (Y)
– Variation between patients

• Beds (W)
– Only variation between hospitals
– No variation within hospitals
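One way to picture how the combined file is built: look up each patient's hospital and copy that hospital's W onto the patient row. The sketch below does this with plain Python dictionaries; the field names (`patient_id`, `hospital_id`, `age`, `cost`, `beds`) are invented for illustration and are not from any particular package.

```python
# Patient-level (unit-level) records: one row per patient.
patients = [
    {"patient_id": 1, "hospital_id": 1, "age": 60, "cost": 3},
    {"patient_id": 2, "hospital_id": 1, "age": 75, "cost": 6},
    {"patient_id": 3, "hospital_id": 2, "age": 81, "cost": 10},
    {"patient_id": 4, "hospital_id": 2, "age": 70, "cost": 7},
    {"patient_id": 5, "hospital_id": 2, "age": 65, "cost": 5},
]

# Hospital-level (group-level) records: one row per hospital.
hospitals = [
    {"hospital_id": 1, "beds": 10},
    {"hospital_id": 2, "beds": 65},
]

# Index hospitals by ID, then attach the group-level value to each patient.
beds_by_hospital = {h["hospital_id"]: h["beds"] for h in hospitals}
combined = [{**p, "beds": beds_by_hospital[p["hospital_id"]]} for p in patients]

for row in combined:
    print(row)
```

Every patient in the same hospital gets the same value of beds, which is exactly why W varies between hospitals but never within them.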

Page 31: Multi-level Analysis Recognizing the Problem

WARNING – Equations coming up!

Remember - In multi-level modeling …

SUBSCRIPTS ARE YOUR FRIENDS!

Page 32: Multi-level Analysis Recognizing the Problem

Simple Linear Regression (one approach to modeling this data structure)

$y_{ij} = a + b x_{ij} + d w_j + e_{ij}$

• j indexes hospitals (j = 1 to N)
• i indexes patients within hospitals (i = 1 to $n_j$)

$\text{cost}_{ij} = a + b(\text{age}_{ij}) + d(\text{beds}_j) + e_{ij}$

• Frequently used

Page 33: Multi-level Analysis Recognizing the Problem

Questions

1. Is there a problem with this model when applied to these data?

2. If so, what?

Patient ID   Hospital ID   Age (X)   Cost (Y)   Beds (W)
1            1             60        3          10
2            1             75        6          10
3            2             81        10         65
4            2             70        7          65
5            2             65        5          65

$\text{cost}_{ij} = a + b(\text{age}_{ij}) + d(\text{beds}_j) + e_{ij}$

Page 34: Multi-level Analysis Recognizing the Problem

The Problem, Part 2

• You must assume that all of the data structure is represented by the explanatory variables

• Unlikely this will account for the clustering of patients within hospitals
– Assumes that all clustering within hospitals is explained by the number of beds in the hospital (W)
– If “beds” does not explain all clustering, then assumption of independence is not met for $e_{ij}$

Page 35: Multi-level Analysis Recognizing the Problem

How do we represent the clustering?

• Let the regression coefficients vary from group to group

$y_{ij} = a_j + b_j x_{ij} + d w_j + e_{ij}$

• Groups j can have higher or lower values of $a_j$ and $b_j$

• Why not create $d_j$?

Page 36: Multi-level Analysis Recognizing the Problem

Starting simple – random intercept

• Model the clustering between groups
– Let the intercept only ($a_j$) vary from group to group
– Take out all group-level variables (W)

$y_{ij} = a_j + b x_{ij} + e_{ij}$

• Groups j - higher or lower values of $a_j$ only

• Assumes some groups tend to have, on average, higher or lower values of Y

Page 37: Multi-level Analysis Recognizing the Problem

Question

$y_{ij} = a_j + b x_{ij} + e_{ij}$

1. Why take the group-level variable (W) out of this model?

2. Must W be taken out of the model?

Page 38: Multi-level Analysis Recognizing the Problem

How do we want to model variation between groups?

• W – a “partial” way to model variation between groups
– If included, it will pick up part of the variation between groups
– “Part of the variation in costs between hospitals will be explained by the number of beds in the hospital”

• Goal of a random intercept model
– Model the actual structure of the data
– Let groups vary, on average, in Y
– “Let the hospitals vary, on average, in cost”

Page 39: Multi-level Analysis Recognizing the Problem

How do we actually do it?

$y_{ij} = a_j + b x_{ij} + e_{ij}$

• Split $a_j$ into ($a_0 + u_j$)

$y_{ij} = a_0 + u_j + b x_{ij} + e_{ij}$

• $a_0$ = average intercept (constant)
• $u_j$ = deviation from the average intercept for group j
  = conditional on X, individuals in group j have Y values that are $u_j$ higher than the overall average

• “Conditional on patient age, patients in Hospital j have costs that are $u_j$ higher than the average costs for all patients”

Page 40: Multi-level Analysis Recognizing the Problem

Representation as Equations: Single Equation vs. Multiple Equation Representation

(1) $y_{ij} = a_0 + b x_{ij} + u_j + e_{ij}$

OR

(2) $y_{ij} = a_j + b x_{ij} + e_{ij}$

$a_j = a_0 + u_j$

• $u_j$ = deviation from the overall average for group j

Page 41: Multi-level Analysis Recognizing the Problem

What do we do with $u_j$? Part 1 – Fixed effects

• Are groups j regarded as unique?
– Do you want to draw conclusions about each group?

TREAT AS “FIXED EFFECTS”

• Create J – 1 indicator variables (0/1), where J is the number of groups

• Leads to J – 1 regression parameters

Page 42: Multi-level Analysis Recognizing the Problem

Questions

1. For our data, what does this equation look like if uj is modeled as a fixed effect?

2. Are all indicator variables in a model also fixed effects?

Patient ID   Hospital ID   Age (X)   Cost (Y)
1            1             60        3
2            1             75        6
3            2             81        10
4            2             70        7
5            2             65        5

$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + u_j + e_{ij}$

Page 43: Multi-level Analysis Recognizing the Problem

Modeling $u_j$ as a fixed effect ($u_j$ = “differences between hospitals”)

$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + c(\text{hosp2}_{ij}) + e_{ij}$

• hosp2 = 0/1
– 1 = patient i in hospital 2, 0 = patient i in hospital 1

• Do we need index j? No – why?

$\text{cost}_i = a_0 + b(\text{age}_i) + c(\text{hosp2}_i) + e_i$

• What assumptions does this model make?
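To make the indicator-variable idea concrete, here is a small sketch that builds one 0/1 column per hospital except a chosen reference hospital. The helper name `indicator_columns` and the `hosp<ID>` column naming are assumptions made for this illustration.

```python
def indicator_columns(group_ids, reference):
    """Build one 0/1 column per group, skipping the reference group."""
    levels = sorted(set(group_ids))
    return {
        f"hosp{g}": [1 if x == g else 0 for x in group_ids]
        for g in levels
        if g != reference
    }

# Hospital ID for patients 1-5 in the example data.
hospital_ids = [1, 1, 2, 2, 2]

# Hospital 1 is the reference, so two hospitals give one indicator (hosp2).
dummies = indicator_columns(hospital_ids, reference=1)
print(dummies)
```

With 160 hospitals the same call would produce 159 columns, which is one practical reason fixed effects become unwieldy when there are many groups.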

Page 44: Multi-level Analysis Recognizing the Problem

What do we do with $u_j$? Part 2 – Random effects

• Three issues
– Are groups regarded as a sample from a population?
– Do you want to test the effect of group-level variables (remember W = # beds)?
– Do you have small group sizes (2-50 or 100)?

TREAT AS “RANDOM EFFECTS”

• Model $u_j$ explicitly
• Additional assumption that $u_j$ is i.i.d.
– Groups (hospitals) considered exchangeable
• Can include group-level explanatory variables (W)

Page 45: Multi-level Analysis Recognizing the Problem

Questions

1. For our data, what does this equation look like if uj is modeled as a random effect?

2. How would we include our hospital-level explanatory variable?

$y_{ij} = a_0 + b(x_{ij}) + u_j + e_{ij}$

Patient ID   Hospital ID   Age (X)   Cost (Y)   Beds (W)
1            1             60        3          10
2            1             75        6          10
3            2             81        10         65
4            2             70        7          65
5            2             65        5          65

Page 46: Multi-level Analysis Recognizing the Problem

Modeling $u_j$ as a random effect ($u_j$ = “differences between hospitals”)

$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + u_j + e_{ij}$

• $u_j$ = deviation from the average cost for hospital j
  = estimated using HLM, SAS, Stata (get a number!)

$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + d(\text{beds}_j) + u_j + e_{ij}$

• Uses the number of beds in the hospital to explain some of the variation in $u_j$

• Question - what happens to $u_j$ if the number of beds explains all of the differences between hospitals?

Page 47: Multi-level Analysis Recognizing the Problem

Question

$\text{cost}_{ij} = a_0 + b(\text{age}_{ij}) + c(\text{hosp2}_{ij}) + d(\text{beds}_j) + u_j + e_{ij}$

• Is this equation an example of random or fixed effects?

• What are the challenges in modeling this equation?

Page 48: Multi-level Analysis Recognizing the Problem

It is GARBAGE!!!!!

• Totally meaningless
– Models $u_j$ as a fixed effect as well as a random effect with a hospital-level covariate

• Probably won’t run (if it does, don’t believe anything you see)

• Can’t model $u_j$ (variation in the average cost between hospitals) as both a random effect and a fixed effect at the same time!

Page 49: Multi-level Analysis Recognizing the Problem

What we have done so far

• We discussed
– Clustering (artificial and natural)
– Accounting for clustering
  • Nuisance = robust estimates of variance
  • Interesting = multilevel models
– Representing clustering in simple model
  • Fixed effects
  • Random effects with group-level explanatory variables

Page 50: Multi-level Analysis Recognizing the Problem

What we will do next

• Sitting down at the computer
– Modeling random effects using SAS proc mixed
– Random coefficients other than the intercept (briefly)

• Repeated measures

• Non-continuous outcomes

• Computer programs

Page 51: Multi-level Analysis Recognizing the Problem

SAS Proc Mixed

Sitting in front of the computer

(Major reality check!)

Page 52: Multi-level Analysis Recognizing the Problem

Random Effects Models for Continuous Outcomes

• SAS proc mixed
– Powerful and dangerous
– Poor documentation (use Singer 1998)
– Defaults may not be appropriate
– Single level representation of equations

• Stata xtreg
– Good documentation
– Defaults usually ok
– Single level representation of equations

Page 53: Multi-level Analysis Recognizing the Problem

Example of a Hospital Epidemic (revised from Singer 1998)

• 7,185 patients admitted to 160 hospitals

• 14-67 patients per hospital
– Hospital with N=67 in Washington DC

• Patient-level outcome is the severity of new disease (MATHACH)

• Question: what does an equation look like if hospital is modeled as a random effect?
– Use single and multiple equation representation

Patient ID   Hospital   MATHACH (Y)
1            1          3
2            1          6
3            2          10
4            2          7
5            2          5

Page 54: Multi-level Analysis Recognizing the Problem

Modeling hospital as a random effect ($u_j$ = “differences between hospitals”)

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$

and

$\text{MATHACH}_{ij} = a_j + e_{ij}$

$a_j = a_0 + u_j$

• $u_j$ = deviation from the average MATHACH severity score for hospital j

• Write code to model using SAS proc mixed (single equation)

Page 55: Multi-level Analysis Recognizing the Problem

Code for SAS proc mixed (described using Singer’s terminology)

Proc mixed noclprint covtest;
Class hospital;
Model MATHACH = /solution;
Random intercept /subject=hospital;
run;

• The Model statement indicates fixed effects (in this case, only one fixed effect – the intercept $a_0$ – which is implied)

• The Random statement indicates random effects (intercept and error term) and the specification of the level 2 units (hospitals). The error term $e_{ij}$ is implied.

Page 56: Multi-level Analysis Recognizing the Problem

STOP!

• Something seems strange!
• How can an intercept be fixed and random?

• Proc mixed terminology derived from multiple equation representation

$\text{MATHACH}_{ij} = a_j + e_{ij}$

$a_j = a_0 + u_j$

• Fixed and random in proc mixed refer to the modeling of $a_j$, not $u_j$

• BUT decision to model hospitals as fixed or random applies to $u_j$, not $a_j$ (earlier discussion)

Page 57: Multi-level Analysis Recognizing the Problem

Important Lesson(This is a source of major confusion)

• Use of the terminology (fixed vs. random) differs widely

• Even when the term in question (e.g., uj) is agreed upon, definitions not only differ but are often incompatible (Gelman 2004 Ann Stat)

• Solution
– Don’t just throw terminology around
– Draw the dataset, write down the equation(s) and circle the terms that you want to model
– Make sure you understand exactly where those terms are on the computer output

Page 58: Multi-level Analysis Recognizing the Problem

Code for SAS proc mixed

Proc mixed noclprint covtest;
Class hospital;
Model MATHACH = /solution;
Random intercept /subject=hospital;
run;

• Model statement: fixed effects = $a_0$ is implied (representing the fixed part of the intercept).

• Random statement: adds random effect for hospital = $u_j$ (representing the random part of the intercept). Random effect for patient = $e_{ij}$ is implied (represents error term).

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$

Page 59: Multi-level Analysis Recognizing the Problem

SAS Output

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$

Page 60: Multi-level Analysis Recognizing the Problem

Yikes!

More info needed

Page 61: Multi-level Analysis Recognizing the Problem

Question: What IS a random effect?

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$

• How do we describe it?

• What do we want to know?

• How do you model those subscripts?

Page 62: Multi-level Analysis Recognizing the Problem

A random effect is a random variable

• Random variables are described by distributions

• Start with the easy one (basic error term)
– $e_{ij} \sim N(0, \sigma^2)$
– The estimated parameter is $\sigma^2$

• Same approach to new term
– $u_j \sim N(0, \tau)$
– The estimated parameter is $\tau$

• Assume that $e_{ij}$ and $u_j$ are independent
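A short worked derivation shows what these assumptions buy you. It writes $\tau$ for the variance of $u_j$ and $\sigma^2$ for the variance of $e_{ij}$ (these symbol names are an assumption, following common multilevel notation), and uses only the independence of $u_j$ and $e_{ij}$ stated above.

```latex
\begin{align*}
y_{ij} &= a_0 + u_j + e_{ij},
  \qquad u_j \sim N(0,\tau), \quad e_{ij} \sim N(0,\sigma^2) \\[4pt]
\operatorname{Var}(y_{ij})
  &= \operatorname{Var}(u_j) + \operatorname{Var}(e_{ij})
   = \tau + \sigma^2 \\[4pt]
\operatorname{Cov}(y_{ij},\, y_{i'j})
  &= \operatorname{Var}(u_j) = \tau
  && \text{two different patients, same hospital} \\[4pt]
\operatorname{Cov}(y_{ij},\, y_{i'j'})
  &= 0
  && \text{patients in different hospitals} \\[4pt]
\operatorname{Corr}(y_{ij},\, y_{i'j})
  &= \frac{\tau}{\tau + \sigma^2}
\end{align*}
```

So the shared $u_j$ is precisely what makes two patients in the same hospital correlated, and the last line is the quantity usually reported as the intraclass correlation.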

Page 63: Multi-level Analysis Recognizing the Problem

Interpreting this equation

• One estimated fixed effect
– $a_0$ describes average MATHACH score for the hospitals

• Two random effects (variances are estimated)
– $\tau$ describes variability in hospital means (variability in average MATHACH between hospitals)
– $\sigma^2$ describes variability in MATHACH within hospitals

• What is this equation called?

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, $u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Page 64: Multi-level Analysis Recognizing the Problem

Code for SAS proc mixed

Proc mixed noclprint covtest;
Class hospital;
Model MATHACH = /solution;
Random intercept /subject=hospital;
run;

• covtest: request hypothesis tests for variance and covariance components ($\tau$ and $\sigma^2$)

• Class statement: identify categorical variables

• /solution prints estimates and hypothesis tests for fixed effects ($a_0$ in this case)

Page 65: Multi-level Analysis Recognizing the Problem

SAS Output

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, $u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Page 66: Multi-level Analysis Recognizing the Problem

SAS Output

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, $u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Page 67: Multi-level Analysis Recognizing the Problem

SAS Output

$\text{MATHACH}_{ij} = a_0 + u_j + e_{ij}$, $u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Page 68: Multi-level Analysis Recognizing the Problem

ICC - A cool thing

• Use the output to calculate the ICC (intraclass correlation coefficient)

• Figure out the portion of the total variance that occurs between (as opposed to within) hospitals
– Bigger means more clustering

• ICC = $\tau / (\tau + \sigma^2)$ = 8.6 / (8.6 + 39.1) = 0.18
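The calculation on this slide can be checked with a few lines of code. The function name `icc` is made up for this sketch; the two inputs are the between-hospital and within-hospital variance estimates quoted above (8.6 and 39.1).

```python
def icc(between_var, within_var):
    """Share of total variance that lies between groups."""
    total = between_var + within_var
    if total <= 0:
        raise ValueError("total variance must be positive")
    return between_var / total

# Variance estimates read off the SAS output on the slide.
between_hospitals = 8.6
within_hospitals = 39.1

print(round(icc(between_hospitals, within_hospitals), 2))  # prints 0.18
```

An ICC of 0 would mean no clustering at all; the closer it gets to 1, the more patients within a hospital resemble each other.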

Page 69: Multi-level Analysis Recognizing the Problem

Adding predictor variables

• Patient-level covariate is SES

• Hospital-level covariate (MEANSES) is aggregate of patient SES

• MEANSES and SES are centered at the grand mean (mean of 0)

Patient ID   Hospital   SES (X)   MATHACH (Y)   MEANSES (W)
1            1          60        3             -2
2            1          -75       6             -2
3            2          81        10            9
4            2          -70       7             9
5            2          65        5             9

Page 70: Multi-level Analysis Recognizing the Problem

Question

• What does the equation look like if we add a hospital-level characteristic (MEANSES)?

• What do we expect might happen to the estimate of uj?

Page 71: Multi-level Analysis Recognizing the Problem

Solution

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Use the average SES in hospitals to explain some of the variation in MATHACH between hospitals

We would expect that the variance of $u_j$ (represented by $\tau$) might decrease

Page 72: Multi-level Analysis Recognizing the Problem

Code for SAS proc mixed

Proc mixed noclprint covtest;
Class hospital;
Model MATHACH = MEANSES /solution ddfm=bw;
Random intercept /subject=hospital;
run;

• Model statement: added fixed effect for MEANSES (recall that the fixed effect for $a_0$ is implied)

• ddfm=bw: use the between/within method for computing denominator degrees of freedom (read about it)

• Random statement: keep random effect for hospital = $u_j$ (representing the random part of the intercept). Random effect for patient = $e_{ij}$ is implied (represents error term).

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

Page 73: Multi-level Analysis Recognizing the Problem

SAS Output

Estimate d for MEANSES

Estimate of $\tau$ decreases from 8.6 to 2.6

Page 74: Multi-level Analysis Recognizing the Problem

Interpreting the Numbers

$a_0$ = 12.65 = ?
$d$ = 5.86 = ?
$u_j$ = ?
$e_{ij}$ = ?

Remember $\tau$ = 2.64
Remember $\sigma^2$ = 39.16

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Question: What is the real-world meaning of each term? What does each term say about MATHACH and hospitals?

Page 75: Multi-level Analysis Recognizing the Problem

Interpreting the Output: Conditional Fixed Effects

• $a_0$ = Intercept estimates average MATHACH among hospitals when all other predictors are 0
– MEANSES is centered at the grand mean (has a mean of 0)
– Intercept is average MATHACH in a hospital of average MEANSES

• $d$ = estimated coefficient on MEANSES
– 1 unit increase in average SES in a hospital is associated with a 5.86 unit increase in MATHACH

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

Page 76: Multi-level Analysis Recognizing the Problem

Interpreting the Output: Conditional Random Effects

• $\tau$ describes variability in average MATHACH between hospitals after accounting for MEANSES
– Decreased from 8.6 to 2.6
– MEANSES explains a large portion of between-hospital variation in MATHACH

• $\sigma^2$ = residual variability in MATHACH among patients within hospitals
– Does not change significantly

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Page 77: Multi-level Analysis Recognizing the Problem

More Cool Things

$\text{MATHACH}_{ij} = a_0 + d(\text{MEANSES}_j) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

Parameter    Var. Comp.   Add MEANSES
$a_0$        12.65        12.65
$d$          not est.     5.86
$\tau$       8.61         2.64
$\sigma^2$   39.15        39.16

Write down everything on this slide

Page 78: Multi-level Analysis Recognizing the Problem

Questions

• What percent of the explainable variation in mean hospital MATHACH scores is explained by MEANSES?

• What is the (residual) ICC among hospitals after accounting for MEANSES?

Page 79: Multi-level Analysis Recognizing the Problem

Solutions

• 69% of the explainable variation in average MATHACH between hospitals is explained by MEANSES

(8.61 – 2.64) / 8.61 = 0.69

• Residual ICC = 2.64 / (2.64 + 39.16) = 0.06
– Correlation among patients within hospitals that have the same average SES
– Can you drop the random effect for hospitals? Has a sufficient amount of $u_j$ been picked up by MEANSES?
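Both solutions are simple ratios of the variance estimates, so they are easy to reproduce. The variable names below are invented for this sketch; the numbers are the between-hospital variance before (8.61) and after (2.64) adding MEANSES, and the within-hospital variance after adding it (39.16).

```python
# Between-hospital variance from the model without MEANSES.
between_var_before = 8.61
# Between- and within-hospital variances from the model with MEANSES.
between_var_after = 2.64
within_var_after = 39.16

# Proportion of between-hospital variance explained by MEANSES.
proportion_explained = (between_var_before - between_var_after) / between_var_before

# Residual ICC: share of the remaining variance that is still between hospitals.
residual_icc = between_var_after / (between_var_after + within_var_after)

print(round(proportion_explained, 2))  # prints 0.69
print(round(residual_icc, 2))          # prints 0.06
```

Whether a residual ICC of 0.06 is small enough to drop the random effect is a judgment call that the numbers alone do not settle.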

Page 80: Multi-level Analysis Recognizing the Problem

Random Coefficients

Briefly!

Page 81: Multi-level Analysis Recognizing the Problem

How do we represent the clustering?

• Let the regression coefficients vary from group to group

$y_{ij} = a_j + b_j x_{ij} + d w_j + e_{ij}$

• Groups j can have higher or lower values of $a_j$ and $b_j$

Page 82: Multi-level Analysis Recognizing the Problem

Representation as Equations: Multiple Equation Representation

$y_{ij} = a_j + b_j x_{ij} + e_{ij}$

$a_j = a_0 + u_j$

$b_j = b_0 + v_j$

$e_{ij} \sim N(0, \sigma^2)$, $u_j \sim N(0, \tau_u)$, $v_j \sim N(0, \tau_v)$, $\operatorname{cov}(u_j, v_j) = \tau_{uv}$
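Substituting the two group-level equations into the first one gives the single-equation form of the same model. This is a sketch of the algebra only, using the symbols from the equations above.

```latex
\begin{align*}
y_{ij} &= a_j + b_j x_{ij} + e_{ij} \\
       &= (a_0 + u_j) + (b_0 + v_j)\, x_{ij} + e_{ij} \\
       &= \underbrace{a_0 + b_0 x_{ij}}_{\text{fixed part}}
        \;+\; \underbrace{u_j + v_j x_{ij} + e_{ij}}_{\text{random part}}
\end{align*}
```

The term $v_j x_{ij}$ is what is new relative to the random intercept model: the random part now depends on $x_{ij}$, which is one reason centering of X matters so much for interpretation.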

Page 83: Multi-level Analysis Recognizing the Problem

Doing Random Coefficients

• Can definitely do it

• Extremely complex to model

• Difficult to interpret (must be very careful about centering)

• Maureen’s opinion
– Usually like hitting a gnat with a sledgehammer
– Almost impossible to explain in the real world

Page 84: Multi-level Analysis Recognizing the Problem

Repeated Measures

One case where you might want to think about random coefficients

Page 85: Multi-level Analysis Recognizing the Problem

What do the data look like?

• Repeated measures of SBP over time are the level-1 units

• The time point of each measurement is a level-1 characteristic

• Each set of measurements clustered within level-2 units (patients)

• Age represents a level-2 characteristic

Obs #   SBP (Y)   Time (X)   Patient   Age (W)
1       140       1          1         40
2       90        2          1         40
3       160       1          2         60
4       120       3          2         60
5       130       4          2         60

Page 86: Multi-level Analysis Recognizing the Problem

Question

• What is the equation to model SBP, accounting for clustering within patients?
– As a function of time?
– Adding in patient age?

Obs #   SBP (Y)   Time (X)   Patient   Age (W)
1       140       1          1         40
2       90        2          1         40
3       160       1          2         60
4       120       3          2         60
5       130       4          2         60

Page 87: Multi-level Analysis Recognizing the Problem

Solutions

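$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$

$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + d(\text{AGE}_j) + u_j + e_{ij}$

$u_j \sim N(0, \tau)$, $e_{ij} \sim N(0, \sigma^2)$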

Page 88: Multi-level Analysis Recognizing the Problem

Code for SAS proc mixed

Proc mixed noclprint covtest;
Class patient;
Model SBP = TIME /solution ddfm=bw;
Random intercept /subject=patient type=un;
run;

• type=un specifies that you want the structure of the variance-covariance matrix of the random effects to be unstructured (previously it was compound symmetry).

$\text{SBP}_{ij} = a_0 + b(\text{TIME}_{ij}) + u_j + e_{ij}$

Page 89: Multi-level Analysis Recognizing the Problem

Non-continuous outcomes

Slightly more complex extensions

Page 90: Multi-level Analysis Recognizing the Problem

Basics are similar

• Generalized linear mixed models

• Move to Stata
– SAS has multiple problems
– Xt commands are useful
– GLLAMM is great and flexible

• Skrondal & Rabe-Hesketh have good introductory paper

Page 91: Multi-level Analysis Recognizing the Problem

Choosing a Computer Program

• Decision points
– Distribution of dependent variable
– Ease of use and compatibility
– Single vs. multiple equation representation

• Major choices
– SAS vs. Stata
– Specialized programs (HLM, MLWin)

Page 92: Multi-level Analysis Recognizing the Problem

Thank you!

(All those identifying symptoms of MATHACH, please proceed to the closest Washington DC hospital)