Introduction to Bayesian SEM
8/11/2019 Introduction to Bayesian SEM
1/127
Bayesian Structural Equation Modelling Using Mplus
Overview: Major Steps in the Bayesian Approach to Data Analysis
Research Question
Estimation
Model fit
Hypotheses evaluation - Model selection
The Data to be Collected: Variables and Sample Size
How to enter the data into Mplus
Missing data
The Statistical Model
How to specify a statistical model in Mplus
How to specify an imputation model in Mplus
The Prior Distribution
Default Uninformative and how to specify in Mplus
Informative and how to specify in Mplus
The Posterior Distribution
Estimates and credible intervals
How to check convergence
Model fit
Hypothesis evaluation and Model selection
How to interpret Mplus output
Lecture 1: Bayesian Estimation
Data, Research Question, and Statistical Model
Research Question
?
The Data N=65

ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 2 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 9 16
27 5 3 7
28 5 5 8
29 11 6 14
30 6 6 10
31 7 5 11
32 8 8 10
33 9 5 8
34 2 2 1
35 4 4 8
... ... ... ...
title:
Mediation Model for the Stork Data;
data:
file = stork.txt;
variable:
names = ID stork urban babies;
usev = stork urban babies;
The Statistical Model - 1
model:
urban on stork (a);
babies on urban stork (b c);
[urban] (d);
[babies] (e);
urban (f);
babies (g);
[Path diagram: Stork → Urban (path a), Urban → Babies (path b), Stork → Babies (path c); residual variances f and g]
The Statistical Model - 2
Urban = d + a Stork + error with error ~ N(0,f)
The Statistical Model - 3
[Figure: the regression at three levels of urbanization: City, Village, Rural]
Babies = e + c Stork + b Urban + error with error ~ N(0,g)
Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation
Using One Variable
The Prior Distribution - 1 - Introduction - Non Informative Prior Distribution
A simple example based on expert elicitation: how many babies are born per
1,000 inhabitants per year in the Netherlands?
ID Babies
... ...
20 13
21 11
21 9
22 11
23 9
24 16
25 16
26 7
27 8
... ...
Data
The mean is 9 and the standard error of the mean
is .5, so the data tell us that
between 8 and 10 babies are born.
Note that I computed a confidence interval for the
mean using 9 +/- 2 x .5; the value 2 approximates
1.96, the more precise value for the computation
of a confidence interval.
No prior information was used, that is, an
uninformative prior distribution was used.
model:
[babies] (a);
babies (b);
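As a numerical sketch (in Python, using the values stated on the slide), the interval computation is:

```python
# The slide's interval for the mean: estimate +/- 2 x standard error,
# where 2 approximates the more precise 1.96 for a 95% interval.
mean, se = 9.0, 0.5
lower, upper = mean - 2 * se, mean + 2 * se
print(lower, upper)  # 8.0 10.0
```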
The Prior Distribution - 2 - Introduction - Informative Prior Distribution
Expert Elicitation:
I assume that in each region containing 1,000 persons, the age distribution is
uniform between 0-100 years of age.
This means that each year 200 persons are between 20 and 40 years of age
(the fertile years), which renders 80 couples and 40 bachelors.
On average I expect each couple to have 2 children, that is, 160 children over
the course of 20 years. This means 8 children per year per region containing
1,000 persons.
In my line of argument I'm most uncertain about the uniform age distribution.
I know the number of elderly is increasing, so maybe there are only 160 persons
between 20 and 40 years of age: 64 couples, 128 children, about 6 children
per year. On the other hand there may still be fewer elderly than young, so maybe
240 persons: 96 couples, 192 children, about 10 per year.
In summary, I expect 8, but my credible interval is between 6 and 10 which means
my personal standard error is 1 (8 +/- 2 x 1 gives my credible interval).
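The same reasoning, written out as a small Python sketch: the elicited ~95% interval spans about four standard deviations, and Mplus normal priors are parameterized by the variance.

```python
# Translate an elicited expectation and ~95% credible interval into a
# normal prior: interval width / 4 gives the prior standard deviation.
expected = 8.0
lower, upper = 6.0, 10.0
prior_sd = (upper - lower) / 4      # personal standard error of 1
prior_variance = prior_sd ** 2      # Mplus: a ~ N(8,1) uses the variance
```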
The Prior Distribution - 3 - Introduction
The Normal Prior Distribution: used for means and regression coefficients.
MODEL PRIORS:
a ~ N(8,1);
[Figure: three normal prior densities for a, increasingly diffuse]

MODEL PRIORS:
a ~ N(8,9);

MODEL PRIORS:
a ~ N(8,100000);
The Prior Distribution - 4 - Introduction
The Inverse Gamma Prior Distribution: used for variances.

MODEL PRIORS:
b ~ IG(.001,.001);

Uninformative and proper.

MODEL PRIORS:
b ~ IG(-1,0);

The default in Mplus is
uninformative and improper.
The Posterior Distribution - 1 - Introduction
Combining Data Knowledge and Prior Knowledge
a - Mean Number of Babies

[Figure: prior, data, and posterior densities for a; the posterior lies between the prior and the data]
The Posterior Distribution - 2 - Introduction
The posterior distribution combines the information with respect to the mean
number of babies in the data with the information in the prior distribution. This
combination is executed by Mplus.
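For intuition (not what Mplus literally does: Mplus samples), the combination can be sketched with the conjugate normal update, assuming a known data variance: the posterior mean is a precision-weighted average of the prior mean and the data mean.

```python
# A minimal conjugate sketch (normal likelihood, known variance):
# precisions (inverse variances) add, and the posterior mean is a
# precision-weighted average of prior mean and data mean.
prior_mean, prior_var = 8.0, 1.0    # the elicited N(8,1) prior
data_mean, data_se = 9.0, 0.5       # mean and standard error from the data
prior_prec = 1.0 / prior_var
data_prec = 1.0 / data_se ** 2
post_var = 1.0 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
print(round(post_mean, 2))  # 8.8
```

The posterior mean lands between the prior's 8 and the data's 9, pulled toward the data because the data are more precise, in line with the estimates reported later on this slide set.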
Using sampling, the information in the posterior distribution with respect to the
mean number of babies is made accessible:
The Posterior Distribution - 3 - Introduction

Sampled values for a (Data + Prior): 9.1, 7.9, 8.3, 9.9, 7.1, ...

Estimate: mean or median
SD
Credible Interval: central or highest

analysis:
estimator = bayes;
process = 2;
fbiter = 100000;
point = median;

output:
tech1 tech8
standardized(stdyx)
cinterval(hpd);

plot:
type = plot1 plot2 plot3;
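These summaries are simple functions of the sampled values; a minimal Python sketch, with synthetic draws standing in for the fbiter = 100000 real ones:

```python
import statistics

# Summarise MCMC draws of a with a point estimate (mean or median) and
# a central 95% credible interval (2.5% and 97.5% quantiles).
# The draws below are synthetic, evenly spread stand-ins.
draws = sorted(7.0 + 0.05 * i for i in range(81))   # 7.0 .. 11.0
est_mean = statistics.mean(draws)
est_median = statistics.median(draws)
lower = draws[int(0.025 * len(draws))]              # ~2.5% quantile
upper = draws[int(0.975 * len(draws)) - 1]          # ~97.5% quantile
```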
The Posterior Distribution - 4 - Introduction

An Informative Prior Distribution for the Mean Number of Babies

model:
[babies] (a);
babies (b);
MODEL PRIORS:
a ~ N(8,1);
b ~ IG(.001,.001);

Estimate S.D. Lower 2.5% Upper 2.5%
Means
BABIES 8.904 0.405 8.098 9.688

A Non Informative Prior Distribution for the Mean Number of Babies

model:
[babies] (a);
babies (b);
MODEL PRIORS:
a ~ N(0,100000);
b ~ IG(.001,.001);

Estimate S.D. Lower 2.5% Upper 2.5%
Means
BABIES 9.078 0.443 8.203 9.945
Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation
Using The Stork Data (three variables) and
Uninformative Priors
The Prior Distribution -5 - Uninformative Prior Distributions for the Stork Data
User Specified:

MODEL PRIORS:
a ~ N(0,100000);
b ~ N(0,100000);
c ~ N(0,100000);
d ~ N(0,100000);
e ~ N(0,100000);
f ~ IG(.001,.001);
g ~ IG(.001,.001);

model:
urban on stork (a);
babies on urban stork (b c);
[urban] (d);
[babies] (e);
urban (f);
babies (g);

Mplus Default:

MODEL PRIORS:
a ~ N(0,Infinity);
b ~ N(0,Infinity);
c ~ N(0,Infinity);
d ~ N(0,Infinity);
e ~ N(0,Infinity);
f ~ IG(-1,0);
g ~ IG(-1,0);
The Posterior Distribution - 5 - Bayesian Estimation Using Markov Chain Monte Carlo Methods
model constraint:
new(indirect);
indirect = a*b;
a    b     c     d     e     f     g     indirect
initial values
...  ...   ...   ...   ...   ...   ...   ...
.35  1.14  -.11  2.89  4.00  3.46  7.15  .42
.29  1.69  -.32  1.75  5.10  3.01  7.30  .49
...  ...   ...   ...   ...   ...   ...   ...   (fbiter rows in total)
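In each iteration the indirect effect is formed from that iteration's a and b, matching model constraint: indirect = a*b;. A Python sketch with illustrative draws:

```python
# Per-iteration computation of the indirect effect: a product of two
# (roughly normal) posteriors is generally not normal itself, which is
# why the credible interval of a*b can be asymmetric.
a_draws = [0.35, 0.29, 0.41, 0.38]     # illustrative draws for a
b_draws = [1.14, 1.69, 1.02, 1.25]     # illustrative draws for b
indirect_draws = [a * b for a, b in zip(a_draws, b_draws)]
```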
analysis:
estimator = bayes;
process = 2;
fbiter = 100000;
point = median;
output:
tech1 tech8 standardized(stdyx) cinterval(hpd);
plot:
type = plot1 plot2 plot3;
The Posterior Distribution - 6 - Output Computed Using the MCMC Sample
[Histogram of the posterior distribution of Babies on Stork]
The Posterior Distribution - 7 - Histograms, Estimates and Credible Intervals
[Histogram of the posterior distribution of the Indirect effect]
Note that the credible interval
is not symmetric!
The Posterior Distribution - 8 - Histograms, Estimates and Credible Intervals
MODEL RESULTS
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5%
URBAN ON
STORK 0.375 0.072 0.000 0.236 0.517
BABIES ON
URBAN 1.143 0.185 0.000 0.781 1.509
STORK -0.111 0.124 0.181 -0.356 0.131
Intercepts
URBAN 2.894 0.460 0.000 1.978 3.787
BABIES 4.007 0.847 0.000 2.320 5.646
Residual Variances
URBAN 3.465 0.653 0.000 2.360 4.828
BABIES 7.159 1.359 0.000 4.937 10.094
New/Additional Parameters
INDIRECT 0.422 0.108 0.000 0.225 0.644
The Posterior Distribution - 9 - Estimates and Credible Intervals
Lecture 1: Bayesian Estimation
INTERMEZZO
P-values
The Posterior Distribution - 10 - The one-tailed p-value
[Figure: posterior of a with its 90% CI and 95% CI relative to 0]
If 90% CI touches 0 the one-tailed p-value is .05.
If 95% CI touches 0 the one-tailed p-value is .025.
For approximately normal posterior distributions, multiplication by 2
renders a two-tailed p-value.
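The one-tailed value itself is just the share of posterior draws on the other side of zero; a Python sketch with illustrative draws:

```python
# One-tailed posterior p-value: the proportion of draws at or below 0.
# Doubling it approximates a two-tailed value for roughly normal posteriors.
draws = [0.42, 0.31, -0.05, 0.55, 0.12, -0.01, 0.38, 0.27]  # illustrative
p_one = sum(d <= 0 for d in draws) / len(draws)
p_two = 2 * p_one
```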
Urban On Stork (a)
The Posterior Distribution - 11 - p-values credible intervals and model selection
p-values:
For example .05
Surely God loves the .06 as much as the .05
Publication bias
Multiple hypotheses testing and capitalization on chance

Credible Intervals and Confidence Intervals:
What is the value of the parameter of interest?
Is the parameter positive, negative, or is zero also in the ballpark?
With multiple parameters there is still capitalization on chance

Model Selection:
Compare a few carefully chosen models
Very powerful in combination with credible intervals and
standardized estimates
Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation
Using The Stork Data (three variables) and
Informative Priors
The Prior Distribution - 6- Informative Based on Historical Data
The Current Data:

ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 2 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 9 16
27 5 3 7
28 5 5 8
29 11 6 14
30 6 6 10
31 7 5 11
32 8 8 10
33 9 5 8
34 2 2 1
35 4 4 8
... ... ... ...

The Historical Data:

ID Stork Urban
1 5 6
2 11 8
... ... ...
80 0 1
model:
urban on stork (a);
[urban] ;
urban;
MODEL RESULTS    Estimate  S.D.
URBAN ON STORK   0.400     0.050
a ~ N(.400,.0025)
The Prior Distribution - 8- Informative Based on Historical Data
MODEL PRIORS:
a ~ N(.400,.0025);
b ~ N(0,100000);
c ~ N(0,100000);
d ~ N(0,100000);
e ~ N(0,100000);
f ~ IG(.001,.001);
g ~ IG(.001,.001);
model:
urban on stork (a);
babies on urban stork (b c);
[urban] (d);
[babies] (e);
urban (f);
babies (g);
User Specified
Suppose the data are collected by another research group
in the Netherlands in 2010.
The Posterior Distribution - 12- Comparing Results from Uninformative and Informative Priors
MODEL RESULTS Estimate S.D. Lower 2.5% Upper 2.5%
URBAN ON STORK 0.375 0.072 0.236 0.517
INDIRECT 0.422 0.108 0.225 0.644
MODEL RESULTS Estimate S.D. Lower 2.5% Upper 2.5%
URBAN ON STORK 0.391 0.041 0.314 0.473
INDIRECT 0.444 0.086 0.283 0.621
MODEL PRIORS:
a ~ N(.400,.0025);
MODEL PRIORS:
a ~ N(0,100000);
The result of using subjective priors is a gain in information. But, do you trust this?
Would you be willing to use and defend this approach?
The Prior Distribution - 9 - Extra Tools for the Specification of Informative Priors
MODEL PRIORS:
b ~ N (0, 1);
c ~ N (0, 1);
COVARIANCE (b, c) = 0.5;
output:
tech1 tech3 tech8
standardized(stdyx) cinterval(hpd);
Summary
Research Question
Statistical Model
Prior Distribution - Informative Prior Distributions
Posterior Distribution
- Asymmetric Credible Intervals
- Small Sample Inferences, no Asymptotic Approximations
- No Heywood Cases, Like, for Example, Negative Variances
- Sampling will often Work where Maximum Likelihood Fails
References Bayesian Structural Equation Modelling
A relatively accessible introduction to Bayesian structural equation modeling can be found in:
Kaplan, D. and Depaoli, S. (2012). Bayesian Structural Equation Modeling. In R.H. Hoyle (Ed.),
Handbook of Structural Equation Modeling, pp. 650-673. New York: The Guilford Press.
A classic about the elicitation of prior knowledge is:
O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J.,
Oakley, J.E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities.
Chichester: Wiley.
A classic introduction to Bayesian data analysis is:
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian Data Analysis.
Boca Raton, FL: Chapman & Hall/CRC.
The documentation provided by Mplus is:
Muthen, B. (2010). Bayesian analysis in Mplus: A brief introduction.
Asparouhov, T. and Muthen, B. (2010). Bayesian analysis in Mplus: Technical Implementation.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Introduction
Missing Data - 1 - Introduction
ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 999 6 14
30 6 6 10
31 7 5 999
32 8 8 10
33 999 999 8
34 2 2 1
35 4 4 8
... ... ... ...
variable:
names = ID stork urban babies;
usev = stork urban babies;
missing = all (999);
Missing Data - 2 - Introduction
By default Mplus with analysis: estimator = bayes; will use the statistical model
that is specified to impute the missing data.

First I will explain what is meant by imputation of the missing data.

Secondly I will explain why it is usually NOT a good idea to use the statistical model
that is specified to impute the missing data.
One exception occurs if the amount of missing values is very small. A good question is:
what is a small amount of missing values?
Another exception occurs if missings occur in variables that are ONLY a dependent
variable and if the missingness is MAR given the predictors of the dependent variable.

Thirdly I will introduce:
Multiple imputation using a general imputation model
Analysis of each imputed data set using a statistical model that is consistent with the
imputation model
Summarizing the results obtained from the analysis of each imputed data set
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Multiple Imputation
Multiple Imputation Using the Statistical Model - 1
Multiple Imputation Using the Statistical Model - 2
a    b     c     d     e     f     g     22-S 26-U 26-B 29-U 31-B 33-S 33-U
0    0     0     0     0     1     1     0    0    0    0    0    0    0
...  ...   ...   ...   ...   ...   ...   ...  ...  ...  ...  ...  ...  ...
.35  1.14  -.11  2.89  4.00  3.46  7.15  5    5    12   7    9    2    3
.29  1.69  -.32  1.75  5.10  3.01  7.30  7    3    11   5    10   3    4
...  ...   ...   ...   ...   ...   ...   ...  ...  ...  ...  ...  ...  ...  (fbiter rows in total)
MODEL RESULTS
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5%
BABIES ON
URBAN 1.143 0.185 0.000 0.781 1.509
STORK -0.111 0.124 0.181 -0.356 0.131
New/Additional Parameters
INDIRECT 0.422 0.108 0.000 0.225 0.644
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Data that are not Missing at Random
Multiple Imputation Using the Statistical Model- 3 - Data that are NOT Missing at Random
ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 7 999 14
30 6 6 10
31 7 5 999
32 8 8 10
33 999 999 8
34 2 2 1
35 4 4 8
... ... ... ...
Multiple Imputation Using the Statistical Model- 4 - Data that are NOT Missing at Random
Urban Babies
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Urban Babies
1 1
2 2
3 3
4 4
5
6
7
8
Urban Babies
1 1
2 2
3 3
4 4
2.5 5
2.5 6
2.5 7
2.5 8
Urban Babies
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Urban Babies
Model:
Babies on Urban;
Urban;
Model:
Babies with Urban;
Urban Babies
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Data that are Missing at Random
Multiple Imputation Using the Statistical Model- 5 - Data that are Missing at Random
Urban Babies
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Urban Babies
1 1
2 2
3 3
4 4
5
6
7
8
Urban Babies
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Urban Babies
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
Urban Babies
Model:
Babies on Urban;
Urban;
Model:
Babies with Urban;
Urban Babies
Multiple Imputation Using a General Imputation Model - 1 - Data that are Missing at Random
ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 7 999 14
30 6 6 10
31 7 5 999
32 8 8 10
33 999 999 8
34 2 2 1
35 4 4 8
... ... ... ...
Stork
Urban
Babies
model:
stork with urban;
stork with babies;
urban with babies;
[stork];
[urban];
[babies];
Multiple Imputation Using a General Imputation Model - 2 - How to do it in Mplus
title: this is an example of multiple imputation
for a set of variables with missing values using
a general statistical model;
data: FILE = storkMI.txt;
variable:
names = ID stork urban babies;
auxiliary = ID;
usevariables = stork urban babies;
missing = all (999);
analysis: estimator = bayes;
fbiter = 10000;
process = 2;
data imputation:
impute = stork urban babies;
ndatasets = 10;
thin = 1000;
save = storkimp*.dat;
model: stork with urban babies;
urban with babies;
[stork];
[urban];
[babies];
output: tech8;
plot: type = plot1 plot2 plot3;
Multiple Imputation Using a General Imputation Model - 3 - Multiple Imputations
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 999 6 14
30 6 6 10
... ... ... ...
ID Stork Urban Babies
... ... ... ...
3 7 13 20
1 4 11 21
4 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 8 12 26
5 3 7 27
5 5 8 28
9 6 14 29
6 6 10 30
... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
7 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 9 14 26
5 3 7 27
5 5 8 28
8 6 14 29
6 6 10 30
... ... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
5 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 8 13 26
5 3 7 27
5 5 8 28
11 6 14 29
6 6 10 30
... ... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
6 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 5 10 26
5 3 7 27
5 5 8 28
11 6 14 29
6 6 10 30
... ... ... ...
Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID
m = 1, ..., M
Multiple Imputation Using a General Imputation Model- 4 - Data that are Missing at Random
It can never be ensured that data are missing at random.

Use enough variables in the imputation model to feel confident that
MAR is a reasonable assumption. There may be variables in the imputation
model that do not appear in the statistical model.

Can we in our example think of variables that could be very good
predictors of missing data and that are not part of the statistical model?

Never use too many variables in the imputation model. A rule of thumb is
1 variable for every 20 cases in the data file. But this is only a rule of thumb!

Creating a good imputation model is partly ART, partly SKILL, and rather
BAYESIAN because it requires careful prior thinking, that is, thinking
without using empirical data.
Multiple Imputation Using a General Imputation Model - 5 - How to do it in Mplus
title:
Mediation Model for the Stork Data;
data:
file = storkimplist.dat;
type = imputation;
variable:
names = stork urban babies ID;
usev = stork urban babies;
missing = all (999);
model:
urban on stork (a);
babies on urban stork (b c);
[urban] (d);
[babies] (e);
urban (f);
babies (g);
model constraint:
new(indirect);
indirect = a*b;
analysis:
estimator = ml;
output:
standardized(stdyx);
Note the difference between the imputation model
and the statistical model!!
It is also quite common that the statistical model
contains only a subset of the variables used in the
imputation model.
Multiple Imputation Using a General Imputation Model - 6 - Analyse Each Imputed Data Set
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 999 6 14
30 6 6 10
... ... ... ...
ID Stork Urban Babies
... ... ... ...
3 7 13 20
1 4 11 21
4 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 8 12 26
5 3 7 27
5 5 8 28
9 6 14 29
6 6 10 30
... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
7 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 9 14 26
5 3 7 27
5 5 8 28
8 6 14 29
6 6 10 30
... ... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
5 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 8 13 26
5 3 7 27
5 5 8 28
11 6 14 29
6 6 10 30
... ... ... ...
... ... ... ...
3 7 13 20
1 4 11 21
6 4 9 22
3 6 11 23
4 6 9 24
8 7 16 25
11 5 10 26
5 3 7 27
5 5 8 28
11 6 14 29
6 6 10 30
... ... ... ...
Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID
m = 1, ..., M
Intercepts (BABIES), per imputed data set:

Estimate SD      Estimate SD      Estimate SD      Estimate SD
10.109   1.303   9.843    1.221   10.567   1.432   9.992    1.271

Pooled: Estimate 10.002, SD 1.672, Rate of Missing Information .22
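The pooling follows Rubin's rules: average the estimates, and add the between-imputation variance to the average within-imputation variance. A Python sketch using the four pairs shown above (the slide itself pools M = 10 sets, so this will not reproduce the reported 10.002 / 1.672 exactly):

```python
# Rubin's rules for pooling across M imputed data sets.
estimates = [10.109, 9.843, 10.567, 9.992]   # per-imputation estimates
ses = [1.303, 1.221, 1.432, 1.271]           # per-imputation standard errors
M = len(estimates)
qbar = sum(estimates) / M                                  # pooled estimate
w = sum(s ** 2 for s in ses) / M                           # within variance
b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)      # between variance
total_var = w + (1 + 1 / M) * b
pooled_se = total_var ** 0.5
```

Note that the pooled variance is always at least the average within-imputation variance: missing data can only add uncertainty.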
Multiple Imputation Using a General Imputation Model - 7 - Relative Efficiency
Relative efficiency = 1 / (1 + rate/M)
For the example on the previous transparency:
Relative efficiency = 1 / (1 + .22/10) = .98
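In Python, the slide's computation:

```python
# Relative efficiency of M imputations, given the rate of missing
# information: with rate .22, M = 10 is already about 98% efficient.
rate, M = 0.22, 10
rel_eff = 1 / (1 + rate / M)
print(round(rel_eff, 2))  # 0.98
```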
Multiple Imputation Using a General Imputation Model - 8 - Summarize the Multiple Analyses
STDYX Standardization
                  Estimate  S.E.   Est./S.E.  Two-Tailed  Rate of
                                              P-Value     Missing
URBAN ON
STORK             0.536     0.095  5.633      0.000       0.123
BABIES ON
URBAN             0.693     0.110  6.307      0.000       0.234
STORK             -0.123    0.124  -0.986     0.324       0.152
Intercepts
URBAN             1.335     0.299  4.463      0.000       0.059
BABIES            1.286     0.343  3.755      0.000       0.109
Residual Variances
URBAN             0.712     0.101  7.026      0.000       0.120
BABIES            0.593     0.105  5.626      0.000       0.183
R-SQUARE
URBAN             0.288     0.101  2.842      0.004       0.120
BABIES            0.407     0.105  3.867      0.000       0.183
New/Additional Parameters
INDIRECT          0.395     0.114  3.462      0.001       0.184
Lecture 2: Bayesian Estimation in the Presence of Missing Data
A Closer Look at the Imputation Model
Multiple Imputation Using a General Imputation Model - 11 - Consistency
Stork Babies
Stork
Urban
Babies
Multiple Imputation Using a General Imputation Model - 12 - Non Consistency
Stork
Urban
Babies
Stork
Urban
Babies
Stork
*
Stork
Multiple Imputation Using a General Imputation Model - 13- Non Consistency
Stork
Urban
Babies
Stork Urban
Babies
Stork
*
Urban
Summary
Imputation model and statistical model
Does the imputation model render data that are missing at random?
Are the imputation model and the statistical model congenial?
The combination of multiple imputation with estimator = ML is possible
in Mplus. The combination with estimator = Bayes is not possible.
References Missing Data
A non-technical introduction to missing data analysis and multiple imputation can be found in:
Schafer, J.L. and Graham, J.W. (2002). Missing data: Our view of the state of the
art. Psychological Methods, 7, 147-177.
Classic books about missing data analysis and multiple imputation are:
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall.
A contemporary book is:
Van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton: Chapman & Hall/CRC.
An important paper with respect to consistency is:
Meng, X-L. (1994). Multiple-imputation inferences with uncongenial sources of input.
Statistical Science, 9, 538-573.
The documentation provided by Mplus is:
Asparouhov, T. and Muthen, B. (2010). Multiple imputation with Mplus.
MplusAutomation is developed by Michael Hallquist. It can be found at www.statmodel.com under the tab
How-To, choose Using Mplus via R.
Lecture 3: Model Fit
Model Fit 1 The Covariance Matrix
ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 2 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 9 16
27 5 3 7
28 5 5 8
29 11 6 14
30 6 6 10
31 7 5 11
32 8 8 10
33 9 5 8
34 2 2 1
35 4 4 8
... ... ... ...
S    U    B
S 10.7
U  4.0  4.8
B  3.4  5.1  12.2
The observed covariance matrix displays the
relation between each pair of variables in the
data matrix.
The model implied covariance matrix is a
reconstruction of the observed covariance
matrix using the statistical model at hand.
Model Fit 2 What is model fit? Why is it important?
9 model parameters
[Path diagrams: three competing models for Stork, Urban, and Babies]

Covariance Matrices:

Observed = Model Implied (9 model parameters):
S    U    B
S 10.7
U  4.0  4.8
B  3.4  5.1  12.2

Model Implied (7 model parameters):
S    U    B
S 10.7
U  4.0  4.8
B  3.4  5.1  12.2

Model Implied (6 model parameters):
S    U    B
S 10.7
U  0    4.8
B  0    0    2.2
Model Fit 3
The chi-square test is computed for each statistical model. It is a function of:
- The observed covariance matrix
- The model implied covariance matrix
- The difference between the number of parameters of the current and the
saturated statistical model.

It is a measure of the size of the difference between the observed and implied
covariance matrices.

The larger the size of the difference, that is, the larger the chi-square value, the
less a statistical model is able to reconstruct the observed covariance matrix.

The hypothesis that is tested using the chi-square test states that
the observed covariance matrix can adequately be reconstructed by
the current statistical model.
Model Fit 4
Using the observed data and the statistical model at hand:

Parameters are sampled: M-V, M-V, ..., M-V

These are used to replicate data and impute observed missings:
Xobs-Xrep, Xobs-Xrep, ..., Xobs-Xrep

These are used to compute the chi-square test using the parameters and the
observed-imputed and replicated data:
CHIobs-CHIrep, CHIobs-CHIrep, ..., CHIobs-CHIrep

The proportion of pairs in which CHIrep is larger than CHIobs is the posterior predictive p-value.
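The final step can be sketched in a few lines of Python; the chi-square values below are illustrative stand-ins, not from the slide's run:

```python
# Posterior predictive p-value: the proportion of MCMC iterations in
# which the replicated chi-square exceeds the observed chi-square.
chi_obs = [51.2, 49.8, 55.1, 60.3, 48.5]   # observed-data chi-squares
chi_rep = [12.4, 15.1, 11.8, 14.9, 13.3]   # replicated-data chi-squares
ppp = sum(rep > obs for obs, rep in zip(chi_obs, chi_rep)) / len(chi_obs)
```

Here every replicated value is smaller than its observed counterpart, so the posterior predictive p-value is 0, signalling misfit, as on the slide two pages ahead.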
Model Fit 5
Model Fit 6
Stork Urban Babies
MODEL FIT INFORMATION
Number of Free Parameters 6
Bayesian Posterior Predictive Checking using Chi-Square
95% Confidence Interval for the Difference Between
the Observed and the Replicated Chi-Square Values
48.046 71.430
Posterior Predictive P-Value 0.000
Posterior predictive p-values around .50 indicate a model that
for all practical purposes is well fitting. Note that this approach
provides a rough model check and not a classical evaluation of
a hypothesis using a p-value.
References Model Fit
This model fit test was proposed by:
Scheines, R., Hoijtink, H., and Boomsma, A. (1999). Bayesian Estimation and Testing
of Structural Equation Models. Psychometrika, 64, 37-52.
Who based it on the work by:
Gelman, A., Meng, X-L, and Stern, H. (1996). Posterior predictive assessment of model
fitness via realized discrepancies. Statistica Sinica, 6, 733-807.
The documentation provided by Mplus is:
Asparouhov, T. and Muthen, B. (2010). Bayesian analysis in Mplus: Technical Implementation.
Model Selection 1 Introduction
What is a model?
Stork
Urban
Babies
Stork
Urban
Babies
Stork
Urban
Babies
Model Selection 2 Introduction
What is a model?
[Path diagram: a factor model with a general factor IQ and arithmetic (A) and language (L) indicators]
Model Selection 3 Introduction
What is a model?
Stork Babies
Stork Babies
Babies = a + b stork + error
MODEL PRIORS:
a ~ N(4,1)
b ~ N(1,1)
MODEL PRIORS:
a ~ N(4,1)
b ~ N(4,1)
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
What is the Goal of Model Selection?
Model Selection 4 Introduction
What is the goal of model selection?
To select the best model from the models that are under consideration.
What is the best model?
There are multiple answers to this question. Later in this lecture we will introduce
two options:
The model that has the smallest distance to the true model (DIC)
The model that maximizes the probability of the data (Bayes factor and BIC)
But all answers involve an evaluation of the misfit and complexity of each model.
Model Selection 5 Introduction
What if the models are all wrong?
What if the true model is not in the set of models under consideration?
All models are wrong but some are useful
Should the null-hypothesis be among the models under consideration?
Should the alternative hypothesis be among the models under consideration?
It can serve as a fail-safe for the models under consideration. A model with
restrictions is only a good model if it is better than the corresponding model
without restrictions.
Model Selection 6 Introduction
Why is model selection consistent with the empirical cycle?
Observation (exploratory research!!)
Induction: from observations to a theory
Deduction: deriving testable consequences from
the theory, that is, models or hypotheses
Testing: confrontation of models or hypotheses
with empirical data
Model Selection 7 Introduction
Why is Bayesian inference consistent with the empirical cycle?
Why is Bayesian inference consistent with the empirical cycle?
Observation (exploratory research!!)
Induction: from observations to a theory
Deduction: deriving testable consequences from
the theory, that is, models or hypotheses
Testing: confrontation of models or hypotheses
with empirical data
Prior knowledge and
prior thinking
Plausible models, probably
not the true model
Select the best model =
the current state of knowledge
Remember: the earth is flat; the earth is round; the earth is shaped somewhat like an American football. This too is sequential theory updating, using new data as they become available.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Information Criteria
Model Selection 1 Information Criteria
IC = misfit + complexity
The smaller the value of IC the better the model at hand. Because:
We like well-fitting models
We like parsimonious, that is, specific and not overly complex models, because we can derive good predictions from them
misfit is determined by the posterior distribution
of the model parameters
complexity is a function of the number of parameters in model
and the amount of information in the prior distribution
To illustrate the main features, a number of examples will be given
Model Selection 2 Information Criteria
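The decomposition IC = misfit + complexity can be made concrete with a small sketch. The helper functions below are illustrative Python, not Mplus code; the DIC form used (misfit at the posterior mean plus twice pD) is one common formulation and matches the numbers on the stork slides that follow.

```python
import math

# IC = misfit + complexity; for both criteria, smaller is better.

def bic(misfit, n_params, n_obs):
    # complexity = number of parameters times the log of the sample size
    return misfit + n_params * math.log(n_obs)

def dic(misfit, p_d):
    # misfit evaluated at the posterior mean of the parameters;
    # pD is the estimated (effective) number of parameters
    return misfit + 2 * p_d

# Example with the slope-free stork model from the correlation = 0 slide:
# dic(268.45, 3.11) reproduces the reported DIC of 274.67.
```

Because the no-slope model has a smaller complexity, it can win even when its misfit is slightly larger, which is exactly what the correlation = 0 slide shows.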
[Scatter plot of y against x: what is the y-value at three marked x-positions?]
Model Selection 3 Information Criteria
[Scatter plot of y against x, with a candidate model for predicting y at three marked x-positions]
What is the fit of this model?
What is the complexity of this model?
Model Selection 4 Information Criteria
[Scatter plot of y against x, with another candidate model for predicting y at three marked x-positions]
What is the fit of this model?
What is the complexity of this model?
Model Selection 5 Information Criteria
Stork cannot Predict Babies
Stork → Babies, population correlation = 0, N = 100
Competing models: Stork → Babies (slope free) versus Stork, Babies (no slope)
Slope free: DIC = 274.67 (misfit = 268.45, par = 3.11); BIC = 282.30 (misfit = 268.38, par = 3.00)
No slope: DIC = 272.23 (misfit = 268.65, par = 1.89); BIC = 277.61 (misfit = 268.39, par = 2.00)
Model Selection 6 Information Criteria
Stork can Predict Babies
Stork → Babies, population correlation = .6, N = 100
Competing models: Stork → Babies (slope free) versus Stork, Babies (no slope)
Slope free: DIC = 229.54 (misfit = 223.32, par = 3.11); BIC = 237.07 (misfit = 223.25, par = 3.00)
No slope: DIC = 273.48 (misfit = 269.70, par = 1.89); BIC = 278.86 (misfit = 269.65, par = 2.00)
Model Selection 7 Information Criteria
DIC and BIC cannot Evaluate Models that Differ in the Prior
TITLE: Illustrate misfit and complexity;
MONTECARLO:            ! specification of the simulation study
NAMES ARE y x;
NOBSERVATIONS = 10000;
NREPS = 1;
SEED = 123;
MODEL POPULATION:      ! specification of the simulation model
y ON x * .6;
[y * 0];
y * .64;
[x * 0];
x * 1;
analysis:
estimator = bayes;
MODEL PRIORS:
a ~ N(.6,.01);
MODEL: y ON x (a);
OUTPUT: TECH9;
The MONTECARLO and MODEL POPULATION commands simulate a data matrix; the remaining commands analyse the simulated data matrix.
Why is b in this setup the correlation?
y = a + b x + error, with error ~ N(0, s2)
var(y) = b**2 var(x) + s2 = .6**2 * 1 + .64 = 1.0
Because var(x) = var(y) = 1, the unstandardized slope b equals the standardized slope, that is, the correlation.
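A quick numerical check of this claim (a plain Python simulation, not part of the Mplus setup; the sample statistics will vary slightly around their population values):

```python
import random

# Simulate y = 0.6*x + e with e ~ N(0, .64) and x ~ N(0, 1), so that
# var(x) = var(y) = 1 and the slope should equal the correlation.
random.seed(123)
n = 100_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]  # sd(error) = sqrt(.64) = .8

mx = sum(x) / n
my = sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
corr = sxy / (sxx * syy) ** 0.5
# slope and corr are both close to .6, and the sample var(y) is close to 1
```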
Model Selection 8 Information Criteria
DIC and BIC cannot Evaluate Models that Differ in the Prior
Stork → Babies, population correlation = .6; three priors for the slope b:
(1) b ~ N(.6,.01)   (2) b ~ N(0,1000000)   (3) b ~ N(0,.01)

N = 10000
(1): DIC = 24060.54 (par = 2.98); BIC = 24082.21 (par = 3.00)
(2): DIC = 24060.33 (par = 2.99); BIC = 24081.98 (par = 3.00)
(3): DIC = 24060.35 (par = 3.00); BIC = 24081.98 (par = 3.00)

N = 500
(1): DIC = 1198.10 (par = 2.88); BIC = 1210.95 (par = 3.00)
(2): DIC = 1194.66 (par = 2.91); BIC = 1207.48 (par = 3.00)
(3): DIC = 1194.90 (par = 3.03); BIC = 1207.47 (par = 3.00)
Model Selection 9 Information Criteria
Summary:
Complexity and (mis)fit.
Complexity is not adequate for models that differ in the prior, but the Bayes factor can deal with this situation. One example will be given during the last day of this course.
DIC or BIC? That depends on whether missing values are present, and on the error rates obtained using DIC and BIC.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Error Rates
Model Selection 1 Error Rates
Stork → Babies, with regression coefficient b
M1: b = 0   DIC = 273   BIC = 278
M2: b ≠ 0   DIC = 229   BIC = 237
deltaDIC = 44, deltaBIC = 41
The conclusion is that M2 is a better model than M1.
But how certain are we about this?
What are the probabilities of making an incorrect decision?
Populations: M1: b = 0 and M2: b ≠ 0
Model Selection 2 Error Rates - Frequency Evaluations
[Diagram: many data matrices sampled from the populations under M1 (b = 0) and M2 (b ≠ 0); for each matrix the deltaDIC or deltaBIC is computed]
Model Selection 3 Error Rates Frequency Evaluations
[Histograms of deltaDIC over 1000 replications (recall the observed deltaDIC = 44 in favour of M2): under correlation = 0, N = 100, 18% of the deltaDIC values are > 0; under correlation = .3, N = 100, 5% are < 0]
Model Selection 4 Error Rates Frequency Evaluations
[Histograms of deltaBIC over 1000 replications (recall the observed deltaBIC = 41 in favour of M2): under correlation = 0, N = 100, 3% of the deltaBIC values are > 0; under correlation = .3, N = 100, 19% are < 0]
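The frequency evaluation above can be imitated outside Mplus. The sketch below uses ordinary maximum-likelihood regression fits as a stand-in for the Bayesian estimation used in the course, so the percentages only roughly match the slides; all names and seeds are illustrative.

```python
import math
import random

def bic_normal(y, fitted, n_params):
    # ML misfit (-2 log-likelihood) of a normal model plus the BIC penalty
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    misfit = n * (math.log(2 * math.pi * rss / n) + 1)
    return misfit + n_params * math.log(n)

def delta_bic(rho, n=100):
    # One replication: BIC(M1: b = 0) - BIC(M2: b free); > 0 favours M2
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + random.gauss(0, math.sqrt(1 - rho * rho)) for xi in x]
    ybar = sum(y) / n
    bic_m1 = bic_normal(y, [ybar] * n, 2)              # mean and variance
    xbar = sum(x) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    bic_m2 = bic_normal(y, [a + b * xi for xi in x], 3)  # plus the slope
    return bic_m1 - bic_m2

random.seed(123)
reps = 1000
# Error rate when M1 is true: deltaBIC wrongly favours M2
err_m1 = sum(delta_bic(0.0) > 0 for _ in range(reps)) / reps
# Error rate when M2 is true (correlation .3): deltaBIC wrongly favours M1
err_m2 = sum(delta_bic(0.3) < 0 for _ in range(reps)) / reps
```

As on the slides, the error rate is small when M1 is true (the BIC penalty protects the restricted model) and considerably larger when M2 is true with a modest correlation.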
Model Selection 5 Error Rates
A Simple Alternative for Frequency Evaluations
TITLE: Error Rates;
MONTECARLO:
NAMES ARE y x;
NOBSERVATIONS = 100;
NREPS = 1000;
SEED = 123;
RESULTS = PopH0AnH1.txt;
MODEL POPULATION:
y ON x * .3;   !! y ON x * 0;
[y * 0];
y * .91;       !! y * 1;
[x * 0];
x * 1;
analysis:
estimator = bayes;
fbiter = 10000;
MODEL: y ON x; !! y ON x @ 0;
OUTPUT: TECH9;
[Histograms of deltaDIC and deltaBIC over 1000 replications, for correlation = 0 and correlation = .3, N = 100]
Example values from single replications:
correlation = .3, N = 100: deltaDIC = 285.38 - 277.08 = 8.30; deltaBIC = 290.66 - 284.97 = 5.69
correlation = 0, N = 100: deltaDIC = 285.48 - 286.51 = -1.03; deltaBIC = 290.75 - 294.40 = -3.65
Model Selection 5 Error Rates
Summary:
How to determine the populations from which to simulate data? Keep power analysis in the back of your mind; it is closely related.
Mplus does not give the error rates. However, in combination with SPSS, error rates can be computed. In Exercise 7 from the lab meeting you have the opportunity to compute error rates in the context of multiple regression. Mplus gives a very rough alternative for error rates.
The error rates discussed here are unconditional: what is the probability of erroneous decisions if data matrices come from M1 or M2?
Very interesting and very Bayesian are conditional error rates: what is the probability that M1 and M2 are true if deltaBIC is equal to 2.45 for the observed data? However, these probabilities are beyond the scope of this workshop.
References Model Selection
An introduction to model selection can be found in
Burnham, K.P. and Anderson, D.R. (2002). Model Selection and Multi-Model Inference. New York: Springer.
The DIC was introduced by
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, Series B, 64, 583-639.
The BIC is elaborated on in
Kass, R.E. and Raftery, A.E. (1995). Bayes Factors. Journal of the American Statistical Association, 90, 773-795.
A comparison and overview can be found in
Hamaker, E.L., van Hattum, P., Kuiper, R., and Hoijtink, H. (2010). Model selection based on information criteria in multilevel modelling. In J. Hox and K. Roberts (Eds.), Handbook of Advanced Multilevel Modelling. London: Taylor and Francis.
Lecture 5: An Application of Model Selection
An Application of Model Selection 1
Introduction of the Twin data
and
Analysis of the first model
An Application of Model Selection 2
title: The Twin Data File;
data: file = twins.txt;
variable:
names = ID sex zygosity mothed fathed income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
usev = mothed fathed eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
missing = all(999);
model: fac by eng1 eng2 math1 math2 socsci1 socsci2 natsci1
natsci2 vocab1 vocab2;
fac on mothed fathed;
analysis:
estimator = bayes;
processors = 2;
fbiter = 10000;
point = median;
output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);
plot: type = plot1 plot2 plot3;
An Application of Model Selection 3
Model: 1 Factor and Education
[Path diagram: one factor F, measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on M-ED and F-ED]
An Application of Model Selection 4
*** WARNING
Data set contains cases with missing on x-variables. These cases were not included in the analysis.
Number of cases with missing on x-variables: 26
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
For model comparison all analyses must be based on the same number of persons. Therefore you have to deal with the missing data if Mplus excludes persons from the analysis, as it does in this example.
If there are relatively few missing values, as here, a quick solution is a single imputation using a sensible imputation model.
If there are many missing values you have to resort to multiple imputation and DIC4. However, that is beyond the context of this course, and it is also an area of statistical science that is under development.
An Application of Model Selection 5
title: Single Imputation of the Twin Data File;
data: FILE = twins.txt;
variable:
names = ID sex zygosity mothed fathed income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
usev = mothed fathed income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
auxiliary = ID sex zygosity;
missing = all(999);
data imputation:
impute = mothed fathed income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
ndatasets = 1;
thin = 1000;
save = twinimp*.dat;
analysis: estimator = bayes;
fbiter = 10000;
processors = 2;
An Application of Model Selection 6
model: mothed with fathed income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
fathed with income eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
income with eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
eng1 with eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
eng2 with math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
math1 with math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
math2 with socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
socsci1 with socsci2 natsci1 natsci2 vocab1 vocab2;
socsci2 with natsci1 natsci2 vocab1 vocab2;
natsci1 with natsci2 vocab1 vocab2;
natsci2 with vocab1 vocab2;
vocab1 with vocab2;
output: tech8;
An Application of Model Selection 7
Analyse the first model using the single imputed data set
An Application of Model Selection 8
title: The Twin Data File;
data: file = twinimp1.dat;
variable:
names = mothed fathed income eng1 eng2 math1 math2
socsci1 socsci2 natsci1 natsci2 vocab1 vocab2 ID sex zygosity;
usev = mothed fathed eng1 eng2
math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
missing = all(999);
model: fac by eng1 eng2 math1 math2 socsci1 socsci2 natsci1
natsci2 vocab1 vocab2;
fac on mothed fathed;
analysis:
estimator = bayes;
processors = 2;
fbiter = 10000;
point = median;
output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);
plot: type = plot1 plot2 plot3;
An Application of Model Selection 9
In themselves these numbers have no meaning; they can only be compared to the same numbers computed for one or more competing models.
Model: 1 Factor and Education
Information Criterion
Deviance (DIC) 46237.298
Estimated Number of Parameters (pD) 31.861
Bayesian (BIC) 46388.873
An Application of Model Selection 10
Model: 2 Factor and Education
[Path diagram: factor F1 measured by M1 E1 S1 N1 V1, factor F2 by M2 E2 S2 N2 V2, both regressed on M-ED and F-ED]
An Application of Model Selection 11
Model: 1 Factor and Income
[Path diagram: one factor F, measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on Income]
An Application of Model Selection 12
Model: 2 Factor and Income
[Path diagram: factor F1 measured by M1 E1 S1 N1 V1, factor F2 by M2 E2 S2 N2 V2, both regressed on Income]
An Application of Model Selection 13
Model                     DIC         pD        BIC
1 Factor and Education    46237.298   31.861    46388.873
2 Factor and Education    46008.581   34.841    46174.343
1 Factor and Income       46263.315   30.940    46410.004
2 Factor and Income       46031.495   32.818    46187.846
(DIC = Deviance, pD = Estimated Number of Parameters, BIC = Bayesian Information Criterion)
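Choosing among the four candidates then amounts to taking the minimum of each criterion. A small Python sketch with the values reported above (assuming the reading of the Mplus output given here):

```python
# DIC and BIC for the four candidate models, as read from the Mplus
# output reproduced above; for both criteria, smaller is better.
results = {
    "1 Factor and Education": {"DIC": 46237.298, "BIC": 46388.873},
    "2 Factor and Education": {"DIC": 46008.581, "BIC": 46174.343},
    "1 Factor and Income": {"DIC": 46263.315, "BIC": 46410.004},
    "2 Factor and Income": {"DIC": 46031.495, "BIC": 46187.846},
}
best_by_dic = min(results, key=lambda m: results[m]["DIC"])
best_by_bic = min(results, key=lambda m: results[m]["BIC"])
# Both criteria point to the 2-factor model with education
```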
An Application of Model Selection 14
Are the differences in BIC and DIC convincing?
Should we determine the error rates?
Should we determine the conditional error rates?
An Application of Model Selection 15
Estimate S.D. P-Value Lower 2.5% Upper 2.5%
FAC1 BY
ENG1 0.765 0.016 0.000 0.732 0.796
MATH1 0.691 0.020 0.000 0.651 0.728
SOCSCI1 0.862 0.011 0.000 0.840 0.883
NATSCI1 0.770 0.016 0.000 0.738 0.801
VOCAB1 0.850 0.012 0.000 0.827 0.873
FAC2 BY
ENG2 0.748 0.017 0.000 0.713 0.780
MATH2 0.739 0.018 0.000 0.703 0.772
SOCSCI2 0.868 0.011 0.000 0.847 0.888
NATSCI2 0.762 0.016 0.000 0.729 0.793
VOCAB2 0.862 0.011 0.000 0.839 0.883
FAC1 ON
MOTHED 0.098 0.042 0.010 0.016 0.180
FATHED 0.236 0.041 0.000 0.154 0.316
FAC2 ON
MOTHED 0.088 0.042 0.018 0.006 0.170
FATHED 0.256 0.041 0.000 0.177 0.336
FAC2 WITH
FAC1 0.870 0.013 0.000 0.843 0.895
An Application of Model Selection 16
Posterior One-Tailed 95% C.I.
Estimate S.D. P-Value Lower 2.5% Upper 2.5%
Intercepts
ENG1 3.423 0.140 0.000 3.149 3.698
ENG2 3.733 0.145 0.000 3.445 4.015
MATH1 2.731 0.121 0.000 2.484 2.958
MATH2 2.785 0.126 0.000 2.537 3.033
SOCSCI1 3.450 0.148 0.000 3.151 3.732
SOCSCI2 3.502 0.149 0.000 3.201 3.786
Residual Variances
ENG1 0.415 0.025 0.000 0.367 0.464
ENG2 0.441 0.026 0.000 0.392 0.492
MATH1 0.523 0.027 0.000 0.470 0.577
MATH2 0.455 0.026 0.000 0.405 0.507
SOCSCI1 0.256 0.019 0.000 0.220 0.295
SOCSCI2 0.246 0.018 0.000 0.211 0.283
FAC1 0.907 0.020 0.000 0.868 0.944
FAC2 0.900 0.020 0.000 0.858 0.937
An Application of Model Selection 17
R-SQUARE
Posterior One-Tailed 95% C.I.
Variable Estimate S.D. P-Value Lower 2.5% Upper 2.5%
ENG1 0.585 0.025 0.000 0.536 0.633
ENG2 0.559 0.026 0.000 0.508 0.608
MATH1 0.477 0.027 0.000 0.423 0.530
MATH2 0.545 0.026 0.000 0.493 0.595
SOCSCI1 0.744 0.019 0.000 0.705 0.780
SOCSCI2 0.754 0.018 0.000 0.717 0.789
NATSCI1 0.592 0.025 0.000 0.544 0.640
NATSCI2 0.580 0.025 0.000 0.531 0.629
VOCAB1 0.723 0.020 0.000 0.682 0.761
VOCAB2 0.742 0.019 0.000 0.703 0.778
Posterior One-Tailed 95% C.I.
Variable Estimate S.D. P-Value Lower 2.5% Upper 2.5%
FAC1 0.093 0.020 0.000 0.056 0.132
FAC2 0.100 0.020 0.000 0.063 0.142
An Application of Model Selection 18
And now the empirical cycle has to be restarted!
References An Application of Model Selection
Loehlin, J.C. and Nichols, R.C. (1976). Genes, Environment and Personality.
Austin TX: University of Texas Press.
Lecture 6: Model Selection in the Presence of Missing Data
Model Selection and Missing Data 1
ID Stork Urban Babies
... ... ... ...
20 3 7 13
21 1 4 11
22 999 4 9
23 3 6 11
24 4 6 9
25 8 7 16
26 11 999 999
27 5 3 7
28 5 5 8
29 999 6 14
30 6 6 10
31 7 5 999
32 8 8 10
33 999 999 8
34 2 2 1
35 4 4 8
... ... ... ...
Model Selection and Missing Data 2
Situation 1: The data are MAR when the statistical model is equal to the imputation model
In Mplus, both the misfit and the complexity of the DIC are computed using only the observed data, with parameter values sampled and estimated using the statistical model to impute the missing values.
This is a valid procedure that can be used without hesitation.
DIC = misfit + complexity = misfit + estimated number of parameters
Model Selection and Missing Data 3
BIC = misfit + complexity = misfit + P x log(N), with P the number of parameters
In Mplus the misfit of the BIC is computed using only the observed data, with parameter values sampled and estimated using the statistical model to impute the missing values.
The complexity is the number of parameters in the statistical model multiplied by the log of the number of persons. As yet it is unknown how N should be determined in the presence of missing data. Mplus uses the sample size, but this is an ad hoc and unmotivated choice.
Currently it is not advised to use the BIC in the presence of missing data.
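How much the choice of N matters can be illustrated with hypothetical numbers: the misfit, parameter count, and sample sizes below are made up for illustration, and only the penalty term changes between the two choices.

```python
import math

# BIC = misfit + P * log(N). With, say, 26 of 1300 persons having missing
# values, should N be the full sample (what Mplus uses) or the complete
# cases? All numbers here are hypothetical.
misfit, n_params = 46200.0, 32
bic_total = misfit + n_params * math.log(1300)     # N = full sample
bic_complete = misfit + n_params * math.log(1274)  # N = complete cases
diff = bic_total - bic_complete
# The two choices of N shift the BIC, so close model comparisons can shift
```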
Model Selection and Missing Data 4
Situation 2: The statistical model is consistent with the imputation model, and, given the imputation model, the missing values are MAR
Using a three step procedure Mplus can be used to compute the DIC accounting
for the fact that some of the data are missing:
1. Multiply impute the data using the imputation model.
2. For each imputed data matrix compute the DIC using Mplus
3. Average the DICs obtained for the M imputed data matrices
The result is DIC4 as discussed by Celeux et al. (2006). This is not the definitive answer to the computation of the DIC in the presence of missing data, but at least there is some support for this approach in the scientific literature. One is well advised to use the MONTECARLO approach from Mplus to evaluate, in each new situation, how well DIC4 performs. It is beyond the scope of this course to show how this can be done.
Note that using MplusAutomation this can be implemented relatively easily (as opposed to doing it manually). However, this is also beyond the scope of this course.
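The three-step DIC4 computation reduces to a simple average once the per-data-set DICs are available. A sketch with hypothetical DIC values standing in for the Mplus output of step 2:

```python
# Hypothetical DIC values, one per imputed data set, standing in for the
# DICs Mplus would report in step 2 of the procedure above
dics = [46210.4, 46198.7, 46205.1, 46221.9, 46190.3]

# Step 3: average over the M imputed data sets to obtain DIC4
dic4 = sum(dics) / len(dics)
```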
References Model Selection and Missing Data
A paper about the computation of DIC in the presence of missing data
Celeux, G., Forbes, F., Robert, C.P., and Titterington, D.M. (2006). Deviance Information
Criteria for Missing Data Models. Bayesian Analysis, 1, 651-674.
A paper about the difference between the imputation and analysis model in the
context of missing data
Kuiper, R.M. and Hoijtink, H. (2011). How to Handle Missing Data for Predictor Selection in Regression Models Using the AIC. Statistica Neerlandica, 65, 489-506.
MplusAutomation is developed by Michael Hallquist. If you google for CRAN MPLUSAUTOMATION you will find the website from which the R package and documentation can be downloaded.