
Advanced Methods and Models in Behavioral Research – 2009/2010

AMMBR course design

CONTENT METHOD

Y is 0/1

conjoint analysis

logistic regression

multi-level methods

our own survey

DATA COLLECTION

Data collection

• surveys not ready to roll – some technical issues unsolved

• please wait until you get the green light to send your invitations

• If you have not done so yet – send your list of respondents (it’s mandatory)

MULTI-LEVEL ANALYSIS

Multi-level models or ...

dealing with clustered data. One solution: the variance component model

• Bayesian hierarchical models

• mixed models (in SPSS)

• hierarchical linear models

• random effects models

• random coefficient models

• subject specific models

• variance component models

• variance heterogeneity models

Clustered data / multi-level models

• Pupils within schools (within regions within countries)

• Firms within regions (or sectors)

• Vignettes within persons

[copy to blackboard]

Two issues with clustered data

• Your estimates will (in all likelihood) be too precise: you find effects that do not exist in the population

[further explanation on blackboard]

• You will want to distinguish between effects within clusters and effects between clusters

[see next two slides]
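The slide leaves the explanation of the first issue to the blackboard, so the following is only one common way to make "too precise" concrete, not necessarily the argument given in class. For equal-sized clusters, a widely used rule of thumb (the Kish design effect) says that the variance of an estimated mean is inflated by a factor 1 + (m - 1)·ρ, where m is the cluster size and ρ the intraclass correlation. The function names and all numbers below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_rows, cluster_size, icc):
    """Number of independent observations the clustered rows are roughly worth."""
    return n_rows / design_effect(cluster_size, icc)


# Hypothetical numbers: 100 persons rating 10 vignettes each,
# with an intraclass correlation of 0.5.
n_rows = 100 * 10
deff = design_effect(cluster_size=10, icc=0.5)
n_eff = effective_sample_size(n_rows, cluster_size=10, icc=0.5)

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f} of {n_rows} rows")
```

Treating all 1000 rows as independent would, under these assumptions, overstate the amount of information by a factor of about five, which is why the standard errors come out too small.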

On individual vs aggregate data

For instance:

• X = introvert, Y = school results

• X = age of McDonald’s employee, Y = like the manager

Had we only known that the data are clustered!

So the effect of X within clusters can be different from the effect between clusters!

Using the school example: lines represent schools. And within schools the effect of being introvert is positive!
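A small numerical sketch may help here. The data below are invented purely for illustration (three hypothetical schools A, B and C, with x = introversion and y = school result); they are not the data behind the slide's figure. Within every school y rises with x, yet schools with a higher average x have a lower average y, so the pooled and between-school slopes are negative while the within-school slope is positive.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x (single predictor)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical rows: (school, introversion x, school result y).
data = [
    ("A", 1, 8), ("A", 2, 9), ("A", 3, 10),
    ("B", 4, 5), ("B", 5, 6), ("B", 6, 7),
    ("C", 7, 2), ("C", 8, 3), ("C", 9, 4),
]

# Pooled slope: ignore the clustering altogether.
pooled_slope = ols_slope([x for _, x, _ in data], [y for _, _, y in data])

# School means of x and y.
schools = sorted({s for s, _, _ in data})
x_mean = {s: mean(x for t, x, _ in data if t == s) for s in schools}
y_mean = {s: mean(y for t, _, y in data if t == s) for s in schools}

# Between-school slope: regress school means on school means.
between_slope = ols_slope([x_mean[s] for s in schools],
                          [y_mean[s] for s in schools])

# Within-school slope: regress deviations from the school mean on each other.
within_slope = ols_slope([x - x_mean[s] for s, x, _ in data],
                         [y - y_mean[s] for s, _, y in data])

print("pooled: ", pooled_slope)
print("between:", between_slope)
print("within: ", within_slope)
```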

MAIN MESSAGES

Be able to recognize clustered data and deal with it appropriately (how you do that will follow)

Distinguish two kinds of effects: those at the "micro-level" vs those at the aggregate level

(and do not test a micro-hypothesis with aggregate data)

A toy example – two schools, two pupils

Two schools, each with two pupils. We first calculate the means.

[Figure: exam scores of the four pupils (3, 2, -1, -4), grouped by school (School 1, School 2), with the overall mean (0) marked]

Overall mean = (3 + 2 + (-1) + (-4))/4 = 0

(taken from Rasbash)

Now the variance

[Figure: the four exam scores (3, 2, -1, -4) by school, with the overall mean (0) marked]

The total variance is the sum of the squared departures of the observations from the overall mean, divided by the sample size (4):

(9 + 4 + 1 + 16)/4 = 7.5

The variance of the school means around the overall mean

[Figure: the four exam scores (3, 2, -1, -4) by school, with the school means (2.5 and -2.5) and the overall mean (0) marked]

The variance of the school means around the overall mean =

$(2.5^2 + (-2.5)^2)/2 = 6.25$ (total variance was 7.5)

The variance of the pupils’ scores around their school’s mean

[Figure: the four exam scores (3, 2, -1, -4) by school, with the school means (2.5 and -2.5) marked]

The variance of the pupils’ scores around their school’s mean =

$((3-2.5)^2 + (2-2.5)^2 + (-1-(-2.5))^2 + (-4-(-2.5))^2)/4 = 1.25$

So you can partition the variance into an individual-level part and a school-level part

How much of the variability in pupil attainment is attributable to factors at the school level and how much to factors at the pupil level?

In terms of our toy example we can now say:

6.25/7.5 = 83% of the total variation in pupils’ attainment is attributable to school-level factors

1.25/7.5 = 17% of the total variation in pupils’ attainment is attributable to pupil-level factors

And this is important: we want to know how to explain (in this example) school attainment, and apparently the differences are at the school level more than at the pupil level.
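The toy calculation can be checked in a few lines. This is a minimal sketch using only the four exam scores from the example; the keys "school_1" and "school_2" are labels chosen here, and which pair of scores belongs to which school is an assumption that does not affect the result.

```python
from statistics import mean

# Exam scores from the toy example, grouped by school.
# Assumption: which pair is "school_1" is arbitrary; the partition is the same either way.
scores = {"school_1": [3, 2], "school_2": [-1, -4]}

all_scores = [y for ys in scores.values() for y in ys]
n = len(all_scores)
overall_mean = mean(all_scores)
school_mean = {s: mean(ys) for s, ys in scores.items()}

# Total variance: squared departures from the overall mean, divided by n.
total_var = sum((y - overall_mean) ** 2 for y in all_scores) / n

# Between-school variance: every pupil is replaced by their school's mean.
between_var = sum(len(ys) * (school_mean[s] - overall_mean) ** 2
                  for s, ys in scores.items()) / n

# Within-school variance: squared departures from the pupil's own school mean.
within_var = sum((y - school_mean[s]) ** 2
                 for s, ys in scores.items() for y in ys) / n

share_school = between_var / total_var
share_pupil = within_var / total_var

print(total_var, between_var, within_var)
print(f"school level: {share_school:.1%}, pupil level: {share_pupil:.1%}")
```

The between and within parts add up to the total (6.25 + 1.25 = 7.5), and the school-level share is 6.25/7.5, which rounds to 83%.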

Standard multiple regression won't do

Y    D1   D2   D3   D4   D5   id   …
+4   -1   -1    0    1    0    1
-3    1    1    1    0   -1    1
+2    0    0    1    0   -1    2
 0    1    0   -1    1    0    2
+1    …    …    …    …    …    3
+2    …    …    …    …    …    3
-3    …    …    …    …    …    4
+4    …    …    …    …    …    4
…     …    …    …    …    …    …

So:

You can use all the data and just run a multiple regression, but then you disregard the clustering, which gives incorrect (too narrow) confidence intervals.

You can aggregate within clusters, and then run a multiple regression on the aggregate data. Two problems: no individual-level testing is possible, and you are left with fewer data points.

So what can we do?

Multi-level models

The usual multiple regression model assumes

... with the subscript "i" defined at the case-level.

... and the epsilons independently distributed with covariance matrix I.

With clustered data, you know these assumptions are not met.
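The formulas on this slide were not preserved in the transcript. A standard way to write what the text describes is sketched below; the notation (k predictors, the symbols β and ε, and the scale factor σ²) is an assumption and may differ from the slide's own.

```latex
Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i ,
\qquad
\operatorname{Cov}(\varepsilon) = \sigma^2 I
```

The identity matrix says that the errors of any two cases are uncorrelated. With several rows per person (or per school) that is exactly what fails: rows from the same cluster share unobserved characteristics, so their errors are correlated.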

Solution 1: add dummy-variables per cluster

• So just multiple regression, but with as many dummy variables as you have clusters (minus 1)

... where, in this example, there are j+1 clusters.

IF the clustering is (largely) due to differences in the intercept between persons, this might work.

BUT if there are only a handful of cases per person, this necessitates a huge number of extra variables
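To make the bookkeeping concrete, here is a minimal Python sketch (not the Stata route used in the course) of building one 0/1 dummy per cluster while leaving out one reference cluster. The function name cluster_dummies and the id values are hypothetical; the ids mimic the id column of the data layout shown earlier.

```python
def cluster_dummies(cluster_ids):
    """Return one 0/1 column per cluster, omitting the first cluster as reference."""
    levels = sorted(set(cluster_ids))
    return {
        f"D_{level}": [1 if c == level else 0 for c in cluster_ids]
        for level in levels[1:]  # skip the reference cluster
    }


# Hypothetical person ids, two rows (vignettes) per person.
ids = [1, 1, 2, 2, 3, 3, 4, 4]
dummies = cluster_dummies(ids)

for name, column in dummies.items():
    print(name, column)
```

With 4 persons this yields 3 dummies. With hundreds of respondents and only a handful of rows each, the number of extra columns grows with the number of respondents, which is the drawback the slide points out.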

Solution 2: split your micro-level X-vars

Say you have:

then create:

and add both as predictors (instead of x1)

Make sure that you understand what is happening here, and why it is of use.
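The formulas on this slide are missing from the transcript. Judging from the later steps ("group mean" variables and "difference from group mean" variables), the split presumably replaces x1 by its cluster mean and the deviation from that mean. A minimal Python sketch under that assumption, with hypothetical restaurant labels and ages:

```python
from statistics import mean


def split_within_between(cluster_ids, x):
    """Split x into its cluster mean and the deviation from that cluster mean."""
    groups = {}
    for c, value in zip(cluster_ids, x):
        groups.setdefault(c, []).append(value)
    group_mean = {c: mean(values) for c, values in groups.items()}

    x_mean = [group_mean[c] for c in cluster_ids]        # same value for everyone in a cluster
    x_dev = [value - m for value, m in zip(x, x_mean)]   # averages to zero within each cluster
    return x_mean, x_dev


# Hypothetical data: employee age, clustered within restaurants.
restaurant = ["r1", "r1", "r1", "r2", "r2"]
age = [20, 30, 40, 50, 60]

age_mean, age_dev = split_within_between(restaurant, age)
print(age_mean)
print(age_dev)
```

Entering both columns as predictors lets the coefficient of the mean pick up the effect between clusters and the coefficient of the deviation pick up the effect within clusters, so the two are no longer forced to be equal.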

Solution 3: the variance component model

In the variance component model, we split the randomness into a "personal part" and a "rest part"
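The model equation itself is not in the transcript. A common way to write a variance component (random intercept) model is sketched below; the notation is an assumption: here j indexes the person (the cluster), i the observation within that person, u_j is the "personal part" and ε_ij the "rest part".

```latex
Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \dots + \beta_k x_{kij} + u_j + \varepsilon_{ij},
\qquad
u_j \sim N(0, \sigma_u^2), \quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)

\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
```

The ratio ρ (the intraclass correlation) is the model-based counterpart of the toy calculation earlier, where 6.25 of the total variance of 7.5 sat at the school level.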

Now: how do you do this in Stata?

<See Stata demo> [note to CS: use age and schooling as examples to split at restaurant level]

relevant commands:

xtset and xtreg

bys <varA>: egen <meanvarB> = mean(<varB>)

gen dvarB = <varB> - <meanvarB>

convenience commands:

tab <var>, gen()

drop, order, des, edit, sum

Up next

• How do we run the "Solution 1" and "Solution 2" analysis?

• We have now seen the random intercept, but how about random slopes?

When you have multi-level data (2 levels)

1. If applicable: consider whether using separate dummies per group might help (use only when this does not create a lot of dummies)

2. Run an empty mixed model (i.e., just the constant included) in Stata. Look at the level on which most of the variance resides.

3. If applicable: split micro-variables into "group mean" variables and "difference from group mean" variables.

4. Re-run your mixed model with these variables included (as you would a multiple regression analysis)

To Do

• Send in your list of respondents if you have not done so yet.

• Check the material online (as of tomorrow morning)

• Check in Stata: how do I create:

– dummies per cluster

– the mean of a variable within a cluster

– the deviation from the mean within a cluster

• Next time: bring your laptop. We’ll have a full session of practicing only