Tilburg University
Power analysis methods for tests in latent class and latent Markov models
Gudicha, Dereje
Document version: Publisher's PDF, also known as Version of record
Publication date: 2015
Citation for published version (APA): Gudicha, D. (2015). Power analysis methods for tests in latent class and latent Markov models. Ridderkerk: Ridderprint.
POWER ANALYSIS METHODS FOR TESTS IN LATENT CLASS AND LATENT
MARKOV MODELS
© 2015 Dereje W. Gudicha. All Rights Reserved.
Neither this thesis nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording,
or by any information storage and retrieval system, without written permission of the
author.
The research presented in this thesis was supported by a grant from The Netherlands
Organization for Scientific Research (NWO, grant number 406-11-039).
Printing was financially supported by Tilburg University.
ISBN: 978-94-6299-150-7
Printed by: Ridderprint BV, Ridderkerk, The Netherlands
Cover design: StudioLIN
POWER ANALYSIS METHODS FOR TESTS
IN LATENT CLASS AND LATENT MARKOV
MODELS
DOCTORAL THESIS
to obtain the degree of doctor at
Tilburg University, on the authority of the rector magnificus,
prof. dr. E.H.L. Aarts, to be defended in public
before a committee appointed by the doctorate board,
in the aula of the University
on Wednesday 7 October 2015 at 10:15
by
Dereje Waktola Gudicha
born on 18 March 1982 in Yaya Gulele, Ethiopia
Promotor: prof. dr. J.K. Vermunt
Copromotors: dr. V.D. Schmittmann
dr. F.B. Tekle
Other members of the doctoral committee: prof. dr. J. de Vries
prof. dr. M.J. de Rooij
prof. dr. C.V. Dolan
dr. D.L. Oberski
dr. M. Moerbeek
Contents
List of Tables
List of Figures
1 Introduction
1.1 Introduction
1.2 The most important parameters and hypotheses
1.2.1 Measurement parameters
1.2.2 Structural parameters
1.2.3 Transition parameters
1.2.4 Number of classes
1.3 More on power analysis
1.4 Outline of the dissertation
2 Power and Sample Size Computation for Wald Tests in Latent Class Models
2.1 Introduction
2.2 The LC model
2.3 Wald based power analysis for LC models
2.3.1 The Wald statistic and its asymptotic properties
2.3.2 Power and sample size computation
2.3.3 Design factors affecting the power or the required sample size
2.4 Numerical study
2.4.1 Numerical study set up
2.4.2 Results
2.4.3 Performance of the power computation procedure
2.5 Discussion and conclusions
2.6 Appendix
2.6.1 Elements of the information matrix in an LC model for binary responses
2.6.2 An example of the Latent GOLD setup for Wald based power computation
3 Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates
3.1 Introduction
3.2 The LC model with covariates
3.3 Power and sample size computations
3.3.1 Calculating the non-centrality parameter
3.3.2 Power computation
3.3.3 Sample size computation
3.4 Numerical study
3.4.1 Study set up
3.4.2 Results
3.5 Discussion and conclusions
4 Power Computation for Likelihood-Ratio Tests for the Transition Parameters in Latent Markov Models
4.1 Introduction
4.2 The LM model
4.2.1 Hypotheses specified on transition parameters
4.2.2 Parameter estimation
4.3 The likelihood-ratio test
4.4 Power computation
4.4.1 The standard case
4.4.2 The non-standard case
4.5 Design factors
4.6 Numerical study
4.6.1 Numerical study set up
4.6.2 Results
4.7 Discussion and conclusions
4.8 Appendix
4.8.1 Latent GOLD syntax for power computation
5 Power Analysis for the Likelihood-Ratio Test in Latent Markov Models: Short-cutting the Bootstrap p-value Based Method
5.1 Introduction
5.2 The LM model
5.3 Power analysis for the BLR test
5.3.1 Power computation
5.3.2 Sample size computation
5.4 Numerical study
5.4.1 Numerical study set up
5.4.2 Results
5.5 Discussion and conclusions
6 Summary and discussions
6.1 Summary
6.2 Direction for future research and study limitations
6.3 Conclusion
References
Acknowledgments
List of Tables
1.1 Important parameters of latent class and latent Markov models
2.1 Entropy based R-square values for different combinations of latent class-specific design factors
2.2 Estimated power for different class separation levels and different sample sizes
2.3 Required sample size for different configurations of latent class-specific design factors and different power levels
2.4 Theoretical and simulated power of the Wald test
3.1 The computed entropy R-square for different design cells
3.2 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 2-class model; the case of equal class proportions
3.3 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 3-class model; the case of equal class proportions
3.4 The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership; the case of unequal class proportions and six indicator variables
3.5 Sample size requirements for the Wald test when testing the covariate effect on class memberships for different power levels, class-indicator associations, number of indicator variables, number of classes, class proportions, and effect sizes
3.6 Theoretical versus empirical (H1-simulated) power values for the Wald and likelihood-ratio tests to reject the null hypothesis that the covariate has no effect on class membership, given the design conditions of interest
4.1 Typical hypotheses formulated on the transition parameters of the latent Markov model
4.2 The power of the likelihood-ratio test to reject the null hypothesis that $\pi_{r|s} = \pi_{s|r}$ in the 2-state latent Markov model
4.3 The power of the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on the transition probabilities
4.4 The power of the likelihood-ratio test to reject the null hypothesis $\pi_{2|1} = 0$
4.5 Evaluating the quality of the large data set method for likelihood-ratio power computation
5.1 Values of conditional response probabilities
5.2 Power of the BLR test for $H_0\colon S = 2$ versus $H_1\colon S = 3$
5.3 The power of the BLR test for testing $H_0\colon S = 3$ versus $H_1\colon S = 4$
5.4 Power of the BLR test according to the short-cut and the PBP method for several 3-state LM population models
List of Figures
4.1 Distribution of the likelihood-ratio statistic under the null and alternative hypotheses and the statistical power
5.1 Power by sample size for a 3-state LM population model with varying levels of the measurement parameters, equal initial state proportions, 6 response variables, and 3 time points
5.2 Power by sample size for a 3-state LM population model with varying levels of the transition parameters, equal initial state proportions, 6 response variables, and 3 time points
CHAPTER 1
Introduction
1.1 Introduction
Statistical models for studying the presence of subgroups within an overall population have
a history that goes back to 1894 when Karl Pearson proposed using a mixture of normal
distributions to demonstrate that a crab population consisted of two subspecies (Pearson,
1894). Much later, Lazarsfeld (1950) and Wiggins (1973) showed the utility of this
approach for social and behavioral sciences by proposing mixture models for categorical
responses which are currently known as latent class and latent Markov models. The
latent Markov model, also referred to as hidden Markov model (Rabiner, 1989) or latent
transition model (Collins & Wugalter, 1992), represents the longitudinal data variant of
the latent class model, where the focus is typically on describing the transitions between
classes at successive measurement occasions (Bartolucci, Farcomeni, & Pennoni, 2010;
Van de Pol & De Leeuw, 1986). It was not until the mid-1990s that these models
began attracting the attention of both statisticians and applied researchers.
In recent years, interest in latent class and latent Markov models has increased
greatly, partly because of the various useful extensions of the basic models and
the general availability of statistical packages implementing these models. Important
extensions include models for variables of different scale types (e.g., nominal, continuous,
ordinal, counts) (Vermunt & Magidson, 2002), models containing time-constant and/or
time-varying covariates (Bartolucci & Farcomeni, 2009; Dayton & Macready, 1988;
Vermunt, Langeheine, & Böckenholt, 1999), models that relax the local-independence
assumption (Hagenaars, 1988), and models with constraints on the parameters of interest
(Bartolucci, 2006; Mooijaart & Van der Heijden, 1992). These extensions, together with
the widespread availability of software packages for mixture modeling, such as Mplus
(L. Muthén & Muthén, 1998-2007), Latent GOLD (Vermunt & Magidson, 2013a), and
routines written for SAS (Lanza & Collins, 2008), R (e.g., FlexMix (Leisch, 2004), poLCA
(Linzer & Lewis, 2011), and depmixS4 (Visser & Speekenbrink, 2010)), and Stata (Rabe-Hesketh,
Skrondal, & Pickles, 2004), make it possible to apply these models successfully
in both cross-sectional and longitudinal studies.
Despite the increasingly widespread use of latent class and latent Markov models, little is
known about the study design requirements for applying these techniques successfully.
More specifically, statisticians have difficulty answering questions concerning the required
sample size, number and quality of response variables, and/or number of measurement
occasions to achieve sufficient statistical power of the tests used when applying these
methods. Since methods for assessing the statistical power of the performed tests are
currently lacking for mixture models in general and for latent class and latent Markov
models in particular, these techniques are often applied in a suboptimal manner.
The aim of this dissertation is to fill this important gap in the literature by developing
power analysis methods for the most important tests applied when using mixture models
for categorical response variables. Methods are described for a) determining the data
requirements to achieve a certain (acceptable) power level – for example, for determining
the necessary sample size or number of measurement occasions to achieve a power of
.8 or larger – and b) performing power calculations to evaluate whether a specific study
design yields an appropriate power level for the statistical tests of interest. An additional
objective is to learn more about the design factors affecting the power of statistical tests
in latent class and latent Markov analysis, which will make it easier to design studies with
sufficient power, and thus to use available resources more efficiently.
1.2 The most important parameters and hypotheses
Various types of parameters for which one would like to perform statistical tests can be
distinguished in latent class and latent Markov models. Table 1.1 presents a classification
of the important types of parameters and the typical aim of hypotheses about these
parameters. The parameters of main interest are the measurement parameters, the
structural parameters in models with covariates, the transition parameters in latent Markov
models, and the number of latent classes or latent states. Researchers using latent class
or latent Markov models will usually perform statistical tests for some of these parameters.
The typical aims of these tests are shown in the last column of Table 1.1. Below we discuss
these tests in more detail.
Table 1.1: Important parameters of latent class and latent Markov models
Parameters of interest    Typical researcher's aims when testing hypotheses concerning these parameters
Measurement parameters    Determining the structure of classes
Structural parameters     Assessing covariate-class associations
Transition parameters     Describing transitions between states
Number of classes         Determining the number of classes
1.2.1 Measurement parameters
The measurement parameters of latent class and latent Markov models define the class-
specific conditional response probabilities and describe the association between the latent
classes and the (observed) indicator variables. They aid in the interpretation of the
classes. Hypotheses about these parameters often concern the structure of classes; that
is, the differences in response probabilities between or within latent classes. Examples
include testing the null hypothesis that response probabilities are equal across latent
classes, equal across indicator variables, or equal to specific values. The first hypothesis
is used to assess the statistical dependence between two qualitative variables, here the
latent variable $X$ with categories $x = 1, 2, \ldots, C$ and the indicator variable $Y_j$
(for $j = 1, 2, \ldots, P$). The second hypothesis is useful, for example, when testing whether
indicator variables have equal error rates (Goodman, 1974; McCutcheon, 2002). Other
substantive applications include hypotheses concerning the equality of sensitivities and
specificities across tests, or the equality of sensitivities and specificities to certain values
(I. Yang & Becker, 1997).
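To make the role of the measurement parameters concrete, the following minimal Python sketch computes the probability of a binary response pattern under a latent class model with local independence, $P(\mathbf{y}) = \sum_{x=1}^{C} P(X = x) \prod_{j=1}^{P} P(Y_j = y_j \mid X = x)$. The function name `pattern_probability` and the two-class parameter values are hypothetical illustrations, not taken from the thesis.

```python
from itertools import product


def pattern_probability(y, class_sizes, response_probs):
    """Marginal probability of a binary response pattern y under a latent class model.

    class_sizes[x]       -- P(X = x), the latent class proportions
    response_probs[x][j] -- P(Y_j = 1 | X = x), the measurement parameters
    Local independence: the indicators are independent given the class.
    """
    total = 0.0
    for pi_x, probs in zip(class_sizes, response_probs):
        conditional = 1.0
        for y_j, p_jx in zip(y, probs):
            conditional *= p_jx if y_j == 1 else 1.0 - p_jx
        total += pi_x * conditional
    return total


# Hypothetical 2-class model with 3 binary indicators.
class_sizes = [0.5, 0.5]
response_probs = [[0.8, 0.8, 0.8],   # class 1: high probability of responding "1"
                  [0.2, 0.2, 0.2]]   # class 2: low probability of responding "1"

patterns = list(product([0, 1], repeat=3))
total = sum(pattern_probability(y, class_sizes, response_probs) for y in patterns)
print(total)  # the probabilities of all 2**3 patterns sum to one (up to rounding)
print(pattern_probability((1, 1, 1), class_sizes, response_probs))
```

If an indicator's response probabilities were equal across the two classes, its factor would be the same in every term of the sum, so that indicator would carry no information about class membership; this is the situation the first type of null hypothesis above describes.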
As in standard logistic regression analysis (Agresti, 2007), null hypothesis significance
testing can be performed using Wald, likelihood-ratio, or score tests. Under certain
regularity conditions, these three test statistics are asymptotically equivalent, each
following a central chi-square distribution under the null hypothesis and a non-central
chi-square under the alternative hypothesis (Buse, 1982). For tests of hypotheses about
the measurement parameters, this thesis focuses on the Wald test. We discuss how to use
its asymptotic distributions under the null and the alternative hypotheses to compute the
power or the required sample size.
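Once the degrees of freedom $df$, the significance level $\alpha$, and the non-centrality parameter $\lambda$ are known, the asymptotic power is $P\{\chi^2_{df}(\lambda) > \chi^2_{1-\alpha,\,df}\}$. Assuming that $\lambda$ grows linearly with the sample size, $\lambda = N\lambda_1$ with $\lambda_1$ the non-centrality per observation, the same computation yields the smallest $N$ reaching a target power. The sketch below uses only the Python standard library (in practice `scipy.stats.ncx2` provides the same distribution). How $\lambda_1$ is obtained for a specific latent class model, for example from the information matrix, is not shown, and the value of $\lambda_1$ in the usage lines is hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(z) - z - math.lgamma(a)))


def ncx2_cdf(x, df, nc):
    """Non-central chi-square CDF as a Poisson(nc / 2) mixture of central chi-squares."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    weight = math.exp(-half)            # Poisson probability of j = 0
    used, total, j = 0.0, 0.0, 0
    while used < 1.0 - 1e-12 and j < 10000:
        total += weight * chi2_cdf(x, df + 2 * j)
        used += weight
        j += 1
        weight *= half / j
    return total


def chi2_critical(alpha, df):
    """Upper-alpha critical value of the central chi-square distribution (bisection)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def power_from_ncp(nc, df, alpha=0.05):
    """Asymptotic power of a Wald (or likelihood-ratio) test with non-centrality nc."""
    return 1.0 - ncx2_cdf(chi2_critical(alpha, df), df, nc)


def required_sample_size(ncp_per_obs, df, target=0.80, alpha=0.05):
    """Smallest N with power >= target, assuming non-centrality = N * ncp_per_obs."""
    if ncp_per_obs <= 0.0:
        raise ValueError("ncp_per_obs must be positive")
    crit = chi2_critical(alpha, df)

    def power_at(n):
        return 1.0 - ncx2_cdf(crit, df, n * ncp_per_obs)

    hi = 1
    while power_at(hi) < target:        # double N until the target power is reached
        hi *= 2
    lo = hi // 2                        # power_at(lo) < target (or lo == 0)
    while hi - lo > 1:                  # binary search for the smallest such N
        mid = (lo + hi) // 2
        if power_at(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(round(power_from_ncp(7.85, df=1), 3))   # about 0.80 for a 1-df test at alpha = .05
print(required_sample_size(0.01, df=1))       # 785 if lambda_1 = 0.01 (hypothetical)
```

With $\lambda = 0$ the function returns $\alpha$, as it should. Raising $\alpha$, $N$, or the effect size (through $\lambda_1$) increases power, whereas adding degrees of freedom for a fixed $\lambda$ lowers it.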
1.2.2 Structural parameters
In latent class models, the structural parameters refer to parameters describing how
the encountered latent classes are related to covariates, also referred to as explanatory
variables, predictors, external variables, independent variables, or concomitant variables
(Dayton & Macready, 1988). In latent Markov models, covariates may affect the latent
states at the different measurement occasions. Typically, these covariate effects are
modelled using multinomial logistic regression equations. Null hypothesis significance
testing for the corresponding logit parameters is usually done using either likelihood-ratio
or Wald tests. Examples of applications include assessing the significance of the effect of
maternal education on membership in latent classes of positive health behaviors (Collins &
Lanza, 2010), the effect of education on membership in latent classes of political orientation
(Hagenaars & McCutcheon, 2002), and the effect of age on membership in latent classes
of criminal delinquency (Van der Heijden, Dessens, & Böckenholt, 1996). See also
Reboussin, Reboussin, Liang, and Anthony (1998) and Vermunt et al. (1999), who
presented applications of latent Markov models with covariates.
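With one covariate $z$, the multinomial logistic regression for class membership can be written as $P(X = x \mid z) = \exp(\gamma_{0x} + \gamma_{1x} z) / \sum_{x'=1}^{C} \exp(\gamma_{0x'} + \gamma_{1x'} z)$, with the parameters of one reference class fixed to zero. The minimal sketch below evaluates these class-membership probabilities. The function name, the reference-class coding, and the parameter values are illustrative assumptions. The null hypothesis that the covariate has no effect corresponds to $\gamma_{1x} = 0$ for all classes, which under reference coding imposes $C - 1$ restrictions, so the Wald or likelihood-ratio test has $C - 1$ degrees of freedom.

```python
import math


def class_probabilities(z, intercepts, slopes):
    """P(X = x | z) from a multinomial logit; class 1 is the reference category.

    intercepts[x], slopes[x] are the logit parameters of class x (both 0 for the
    reference class; effect coding would be an equally valid choice).
    """
    logits = [g0 + g1 * z for g0, g1 in zip(intercepts, slopes)]
    m = max(logits)                          # subtract the maximum for numerical stability
    expo = [math.exp(v - m) for v in logits]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical 3-class model with one centered covariate z.
intercepts = [0.0, 0.2, -0.4]
slopes = [0.0, 0.5, 1.0]
for z in (-1.0, 0.0, 1.0):
    print(z, [round(p, 3) for p in class_probabilities(z, intercepts, slopes)])
```

Under the null hypothesis (all slopes equal to zero), the printed class proportions would be the same for every value of $z$.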
1.2.3 Transition parameters
In latent Markov models without covariates, not only the measurement parameters but
also the transition parameters describing the change in latent state membership over time
are important. Substantive applications involving hypotheses about the transition parameters
include studies of changes in patients' health status over time (Bartolucci et al., 2010), the
development of substance use behavior among young people (Jackson & Schulenberg, 2013),
changes in women's dietary patterns over time (Sotres-Alvarez, Herring, & Siega-Riz, 2013),
and smokers' movement through a series of stages in their efforts to quit smoking (Martin,
Velicer, & Fava, 1996). One may also be interested in testing whether these transitions differ
between groups in the population; for example, whether substance use transitions differ
between males and females or whether health status transitions differ between a treatment
and a control group.
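In the latent Markov model, the probability of an observed response sequence sums over all possible paths through the latent states. The forward recursion computes this sum efficiently and is the standard way of evaluating hidden Markov model likelihoods (Rabiner, 1989). The sketch below assumes binary indicators, local independence within a time point, and time-homogeneous transition and measurement parameters. All names and values are hypothetical.

```python
from itertools import product


def lm_sequence_probability(y_seq, initial, transition, response_probs):
    """Probability of a sequence of binary response vectors under a latent Markov model.

    initial[s]           -- P(S_1 = s), the initial state proportions
    transition[r][s]     -- P(S_t = s | S_{t-1} = r), the same at every occasion here
    response_probs[s][j] -- P(Y_tj = 1 | S_t = s), the same at every occasion here
    """
    n_states = len(initial)

    def emission(y, s):
        # local independence of the indicators within one time point
        prob = 1.0
        for y_j, p in zip(y, response_probs[s]):
            prob *= p if y_j == 1 else 1.0 - p
        return prob

    # forward variables: alpha[s] = P(y_1, ..., y_t, S_t = s)
    alpha = [initial[s] * emission(y_seq[0], s) for s in range(n_states)]
    for y in y_seq[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n_states)) * emission(y, s)
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical 2-state model with 2 binary indicators and 3 time points.
initial = [0.6, 0.4]
transition = [[0.9, 0.1],    # from state 1: stay with .9, move to state 2 with .1
              [0.2, 0.8]]    # from state 2: move to state 1 with .2, stay with .8
response_probs = [[0.85, 0.80], [0.15, 0.20]]

patterns = list(product([0, 1], repeat=2))
total = sum(lm_sequence_probability(seq, initial, transition, response_probs)
            for seq in product(patterns, repeat=3))
print(total)  # probabilities of all 4**3 sequences sum to one (up to rounding)
```

A boundary null hypothesis such as "no transitions from state 1 to state 2" would correspond to fixing `transition[0][1]` at 0 (and `transition[0][0]` at 1).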
Hypotheses about the transition parameters are generally tested using likelihood-ratio
tests, for which the asymptotic distributions under the null and alternative hypotheses
can be derived. However, when the null hypothesis involves setting one or more transition
probabilities to zero, i.e., to the boundary of the parameter space, the asymptotic results
for the likelihood-ratio test no longer hold (Bartolucci, 2006). This implies that in this
non-standard situation, asymptotic distributions cannot be used for null hypothesis
significance testing or power computation. Instead, simulation methods need to be used.
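In the non-standard case, both the critical value and the power can be approximated by simulation. One generates many data sets under $H_0$ and under $H_1$, computes the likelihood-ratio statistic for each, takes the empirical $(1-\alpha)$ quantile of the $H_0$ statistics as the critical value, and reports the proportion of $H_1$ statistics exceeding it. The sketch below shows only this last step. It is one simple way to set the computation up, not necessarily the exact procedure used later in the thesis. The chi-square draws in the usage lines stand in for likelihood-ratio statistics from fitted latent Markov models, which the sketch does not fit.

```python
import random


def simulated_power(null_stats, alt_stats, alpha=0.05):
    """Monte Carlo power: the critical value is the empirical (1 - alpha) quantile of the
    statistics simulated under H0; power is the share of H1 statistics above it."""
    ordered = sorted(null_stats)
    n_exceed = int(alpha * len(ordered) + 1e-9)    # H0 values allowed above the cut-off
    critical = ordered[len(ordered) - n_exceed - 1]
    return sum(s > critical for s in alt_stats) / len(alt_stats)


# Stand-in draws: chi-square(1) under H0 and non-central chi-square(1, 7.84) under H1.
# In a real application each value would be a likelihood-ratio statistic from fitting
# the H0 and H1 latent Markov models to one simulated data set.
random.seed(2015)
null_stats = [random.gauss(0.0, 1.0) ** 2 for _ in range(20000)]
alt_stats = [(random.gauss(0.0, 1.0) + 2.8) ** 2 for _ in range(20000)]
print(round(simulated_power(null_stats, alt_stats), 3))
```

Because the critical value is read from simulated $H_0$ statistics, no chi-square approximation is needed, which is the point of simulation in the boundary case. The price is Monte Carlo error, which shrinks as more data sets are simulated.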
1.2.4 Number of classes
Thus far, we assumed that the number of latent classes or states is known. However, in
most applications this number is unknown, in which case the most important statistical
tests concern the number of classes. In principle, hypotheses about the number of classes
can be tested using likelihood-ratio tests. However, the usual asymptotic chi-square
distribution of the likelihood-ratio statistic does not hold when testing a model with $C$
classes against a model with $C+1$ classes. As an alternative, bootstrap likelihood-ratio
tests have been suggested (McLachlan, 1987). Another option is to guide the
selection of the number of classes by making use of information criteria (IC) such as the
Akaike IC (Akaike, 1974), the Bayesian IC (Schwarz, 1978), and adjusted forms
of these ICs (e.g., the penalized Akaike IC (Bozdogan, 1994), the consistent Akaike IC
(Bozdogan, 1987), and the sample size adjusted Bayesian IC (Sclove, 1987)). Because
ICs do not follow the logic of null hypothesis significance testing, here we focus on power
computation for the bootstrap likelihood-ratio test.
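For reference, the information criteria mentioned above are computed from the maximized log-likelihood $\log L$, the number of free parameters $p$, and the sample size $N$: $\mathrm{AIC} = -2\log L + 2p$, $\mathrm{BIC} = -2\log L + p\log N$, $\mathrm{CAIC} = -2\log L + p(\log N + 1)$, and the sample size adjusted BIC is $-2\log L + p\log\{(N+2)/24\}$. The sketch below uses these standard definitions and omits the penalized AIC. The log-likelihood values are invented for illustration, and the parameter count assumes an unrestricted LC model for binary items.

```python
import math


def information_criteria(log_lik, n_params, n_obs):
    """Common information criteria; smaller values indicate the preferred model."""
    deviance = -2.0 * log_lik
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
        "SABIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def lc_n_params(n_classes, n_binary_items):
    """Free parameters of an unrestricted LC model for binary items:
    (C - 1) class proportions plus C * P conditional response probabilities."""
    return (n_classes - 1) + n_classes * n_binary_items


# Hypothetical log-likelihoods of 2- and 3-class models for N = 500 cases and 6 items.
ic2 = information_criteria(-3000.0, lc_n_params(2, 6), 500)
ic3 = information_criteria(-2985.0, lc_n_params(3, 6), 500)
for name in ic2:
    preferred = 2 if ic2[name] < ic3[name] else 3
    print(name, round(ic2[name], 1), round(ic3[name], 1), "->", preferred, "classes")
```

With these made-up numbers, AIC and the sample size adjusted BIC favor three classes, while BIC and CAIC favor two. This illustrates that the criteria can disagree and that none of them controls an error rate the way a significance test does.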
1.3 More on power analysis
For the development of power analysis methods for tests used in latent class and latent
Markov modeling, we can use input from two fields. The first is the field of mixture
modeling itself in which numerous simulation studies have been published on factors
affecting the correct estimation of the number of classes (Bacci, Pandolfi, & Pennoni,
2014; Bartolucci & Farcomeni, 2009; Collins & Wugalter, 1992; Dias & Goncalves, 2004;
Tofighi & Enders, 2008; Fonseca & Cardoso, 2007; Lukociene, Varriale, & Vermunt, 2010;
McLachlan & Peel, 2000; Nylund, Asparouhov, & Muthén, 2007; C. Yang, 2006). The
aim of most of these simulation studies was to determine which statistic – information
criteria (e.g., Akaike IC, Bayesian IC, etc.) or likelihood-ratio tests – is best able to select
the model with the correct number of classes under a variety of conditions. From these
studies we also know that the ability to find the correct number of classes is not only
affected by the type of statistic that is used, but also by the sample size, the differences
between the classes (or effect sizes), the number of observed response variables, the scale
types of these variables, the number of measurement occasions, the number of classes, and
the class sizes. Some of these factors, such as sample size and effect size, are relevant
for the power of any statistical test, whereas others are specific for mixture modeling.
However, based on these results it is still not clear how to compute the probability of
finding the correct number of classes as a function of particular design factors, given the
values of the others, which is what is needed to set up a study with a certain power level.
The second relevant field is the field of power calculation methods for other types
of analyses, such as log-linear analysis (O’Brien, 1986; Shieh, 2000), logistic regression
analysis (Demidenko, 2007; Whittemore, 1981), and structural equation modeling
(R. MacCallum, Lee, & Browne, 2010; Satorra & Saris, 1985). It should be noted that
latent class and latent Markov models with categorical indicator variables are similar to
log-linear models and, when covariates are included, to logistic regression models, with the
“only” difference that the class membership is not directly observable. Because of these
similarities, for tests concerning the measurement, transition, and structural parameters,
we adapt the power analysis methods developed for log-linear and (multinomial) logistic
regression models. For tests concerning the number of classes, the likelihood-ratio power
analysis methods which are also used in structural equation modeling will be adapted (see
for example, R. MacCallum et al. (2010) and Satorra and Saris (1985)).
Specific aspects that should be addressed when implementing the existing power
analysis methods for mixture models are the following: a) In latent class and latent Markov
models, class membership is not directly observable. Factors affecting uncertainty about
the individuals’ class memberships are expected to affect the statistical power of the
tests, and therefore the power analysis method should take this into account. b) In some
applications of latent class and latent Markov models, the null hypotheses of interest are
specified by setting probabilities to zero, e.g. when testing the absence of transitions to
a particular state (Bartolucci, 2006). In such non-standard situations, testing and power
analysis methods which are based on the theoretical distributions of the test statistic
concerned do not apply. Whereas in other situations one may rely on the theoretical
distribution of the test statistic, calculating power using these theoretical distributions
requires us to specify the non-centrality parameter, which is generally not known. c) The
gold standard for testing the null hypothesis of a $C$-class model against the alternative
of a $(C+1)$-class model is the bootstrap likelihood-ratio test (McLachlan, 1987). Null
hypothesis significance testing using the bootstrap method involves using the parameter
estimates of the $C$-class model to generate simulated (also called bootstrap) data sets.
Both the $C$- and $(C+1)$-class models are then fitted to these bootstrap data sets, and the
likelihood-ratio value is obtained as twice the difference in log-likelihood between the
two models. This yields the empirical distribution of the likelihood-ratio statistic under the
null hypothesis, from which one can read the p-value (Nylund et al., 2007). For power
computation, not only the distribution under the null hypothesis but also the distribution
under the alternative hypothesis is required, which can also be constructed by simulation.
In practice, this means that the full bootstrap procedure must be repeated for multiple samples
drawn from the $(C+1)$-class model as the population model. Given that the bootstrap
procedure itself is already computationally demanding, computing the power or required
sample size by performing the full bootstrap for multiple samples from the model under
the alternative hypothesis is generally not feasible.
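The bootstrap p-value and the cost of the PBP approach can be summarized in a few lines. In the sketch below, the bootstrap p-value is taken as the proportion of bootstrap likelihood-ratio values at least as large as the observed one (some implementations use $(1 + \#\{LR_b \ge LR_{obs}\})/(B + 1)$ instead). The fit counts are simple arithmetic under stated assumptions. Each PBP sample requires fitting the $C$- and $(C+1)$-class models to the sample and to each of its $B$ bootstrap data sets. Simulating the null and alternative distributions once requires two fits per simulated data set. Each fit of a mixture model usually involves multiple random starting values, which multiplies these counts further. The numbers of samples and bootstrap replicates below are hypothetical.

```python
def bootstrap_p_value(lr_observed, lr_bootstrap):
    """Proportion of bootstrap likelihood-ratio values at least as large as the observed one."""
    return sum(lr >= lr_observed for lr in lr_bootstrap) / len(lr_bootstrap)


def pbp_power(p_values, alpha=0.05):
    """PBP power: proportion of H1 samples whose bootstrap p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def model_fits_pbp(n_samples, n_bootstrap):
    """For every H1 sample, fit the C- and (C+1)-class models to the sample itself
    and to each of its n_bootstrap bootstrap data sets."""
    return n_samples * 2 * (1 + n_bootstrap)


def model_fits_simulated(n_null, n_alt):
    """Two fits per data set when the H0 and H1 distributions are simulated once (assumed set-up)."""
    return 2 * (n_null + n_alt)


# Hypothetical design: 500 samples from the (C+1)-class model, 99 bootstrap replicates each.
print(model_fits_pbp(500, 99))          # 100000 model fits
print(model_fits_simulated(500, 500))   # 2000 model fits
```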
1.4 Outline of the dissertation
This dissertation consists of four journal articles dealing with power and sample size
computation for tests concerning parameters of latent class and latent Markov models.
The chapters can be read independently; as a consequence, there is some overlap between
them and occasionally slight inconsistencies in notation.
In Chapter 2, we study power analysis for tests concerning the measurement parameters
of latent class models. This chapter provides sample size and power computation methods
for the Wald test. Furthermore, we study design factors affecting the power of, and the
required sample size for, the Wald test. As for any statistical test, it can be expected that power is
affected by the level of significance, the sample size, and the effect size (Cohen, 1988).
Other relevant factors are the number of classes, the class proportions, and the number of
indicator variables. In this chapter we also examine how to achieve a design with a certain
power level by manipulating these factors. A numerical study is presented in which we
assess the performance of the proposed method and illustrate the power and sample size
computation method, considering different scenarios for the study design.
In Chapter 3, we extend the Wald-based power analysis method for measurement
parameters from Chapter 2 to the structural parameters in latent class
models with covariates. When testing hypotheses about the structural parameters,
the likelihood-ratio test is sometimes used instead of the Wald test. In this chapter,
we therefore also present power analysis methods for the likelihood-ratio test and
compare the statistical power of the likelihood-ratio and Wald tests for hypotheses
concerning the logit parameters in latent class models with covariates. The study design
and population characteristics affecting the power of these two tests are addressed as well.
In Chapter 4, we study power analysis methods for testing hypotheses about the
transition parameters in latent Markov models. Two types of situations are considered.
The first concerns the standard situation where the test statistic follows a known
theoretical distribution (i.e., chi-square distribution for the likelihood-ratio statistic),
implying that also power computation can be based on this theoretical distribution. The
second situation concerns power computation for non-standard tests, which arise
when probabilities are fixed to zero. For the standard case, we propose the exemplary
data set and large simulated data set methods for obtaining the non-centrality parameter.
For the non-standard case, we discuss power computation by Monte Carlo simulation.
Factors affecting the power of the tests are identified, and the power analysis methods
are illustrated with numerical experiments.
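The exemplary data set idea is borrowed from power analysis for log-linear models (e.g., O'Brien, 1986). A data set whose cell frequencies equal the expected frequencies under $H_1$ for sample size $N$ is analyzed as if it were observed data, and the likelihood-ratio statistic obtained by fitting the $H_0$ model to it serves as the non-centrality parameter. The sketch below illustrates the idea on the simplest possible case, the likelihood-ratio test of independence in a 2×2 table, and not on a latent Markov model, where fitting the $H_0$ model to the exemplary data would require EM estimation. The function name and the table probabilities are hypothetical.

```python
import math


def exemplary_ncp_2x2(joint_probs, n):
    """Non-centrality of the LR test of independence in a 2x2 table, computed with the
    exemplary data set method: the 'data' are the expected counts under H1."""
    counts = [[n * p for p in row] for row in joint_probs]
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    g2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            fitted = row_totals[i] * col_totals[j] / n   # H0 (independence) fit
            if observed > 0:
                g2 += 2.0 * observed * math.log(observed / fitted)
    return g2


# Hypothetical H1 population with a moderate association, evaluated at N = 100 and N = 200.
h1_probs = [[0.3, 0.2], [0.2, 0.3]]
print(round(exemplary_ncp_2x2(h1_probs, 100), 3))
print(round(exemplary_ncp_2x2(h1_probs, 200), 3))
```

Because the exemplary counts scale with $N$, the resulting non-centrality parameter doubles when $N$ doubles. With one degree of freedom for this test, the power then follows from the non-central chi-square distribution.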
In Chapter 5, we present power analysis methods for the bootstrap likelihood-ratio
test, with a special emphasis on the number of states in latent Markov models. In
principle, power can be computed as the proportion of the bootstrap p-values (PBP) for
which the null hypothesis is rejected. Such a method is computationally very demanding,
as it requires performing the full bootstrap for multiple samples of the model under the
alternative hypothesis. We propose solving this computational time problem using a short-cut
method, in which the distributions of the test statistic under the null and alternative
hypotheses are constructed by simulation. A numerical study is conducted to (a) illustrate
the proposed power analysis methods and (b) compare the power estimate of the short-cut
method with that of the PBP method.
In Chapter 6, we provide a concluding discussion, describe directions for future
research, and discuss limitations with respect to the specific contribution of this
dissertation.
CHAPTER 2
Power and Sample Size Computation for Wald Tests in Latent Class Models
Abstract
Latent class (LC) analysis is used by social, behavioral, and medical science researchers
among others as a tool for clustering (or unsupervised classification) with categorical
response variables, for analyzing the agreement between multiple raters, for evaluating
the sensitivity and specificity of diagnostic tests in the absence of a gold standard, and for
modeling heterogeneity in developmental trajectories. Despite the increased popularity
of LC analysis, little is known about statistical power and required sample size in LC
modeling. This chapter shows how to perform power and sample size computations
in LC models using Wald tests for the parameters describing the association between
the categorical latent variable and the response variables. Moreover, the design factors
affecting the statistical power of these Wald tests are studied. More specifically, we show
how design factors which are specific for LC analysis, such as the number of classes,
the class sizes, and the number of response variables, affect the information matrix.
The proposed power computation approach is illustrated using different scenarios of the
relevant design factors. A simulation study conducted to assess the performance of the
proposed power analysis procedure shows good performance in all situations that may be
encountered in practice.
This chapter has been accepted for publication as: Gudicha, D. W., Tekle, F. B., & Vermunt, J. K. (in press). Power and Sample Size Computation for Wald Tests in Latent Class Models. Journal of Classification.
2.1 Introduction
Latent class (LC) analysis was initially introduced in the 1950s by Lazarsfeld (1950) as a
tool for identifying subgroups of individuals giving similar responses to sets of dichotomous
attitude questions. It took another two decades before LC analysis started attracting the
attention of other statisticians. Since then, various important extensions of the original
LC model have been proposed, such as models for polytomous responses, models with
covariates, models with multiple latent variables, and models with parameter constraints
(Dayton & Macready, 1976, 1988; Formann, 1982, 1992; Goodman, 1974; Magidson &
Vermunt, 2004; McCutcheon, 1987; Vermunt, 1996). More recently, statistical software
for LC analysis has become generally available – e.g., Latent GOLD (Vermunt & Magidson,
2013b), Mplus (L. Muthen & Muthen, 1998-2007), LEM (Vermunt, 1997), the SAS
routine PROC LCA (Lanza, Collins, Lemmon, & Schafer, 2007), and the R package
poLCA (Linzer & Lewis, 2011) – which has contributed to the increased popularity of this
model among applied researchers. Applications of LC analysis include building typologies
of respondents based on social survey data (McCutcheon, 1987), identifying subgroups
based on health risk behaviors (Collins & Lanza, 2010), identifying phenotypes of stalking
victimization (Hirtenlehner, Starzer, & Weber, 2012), and finding symptom subtypes
of clinically diagnosed disorders (Keel et al., 2004). Applications which are specific for
medical research include the estimation of the sensitivity and specificity of diagnostic tests
in the absence of a gold standard (Rindskopf & Rindskopf, 1986; I. Yang & Becker, 1997)
and the analysis of the agreement between raters (Uebersax & Grove, 1990).
Despite the increased popularity of LC analysis in a broad range of research areas,
no specific attention has been paid to power analysis for LC models. However, as in the
application of other statistical methods, users of LC models wish to confirm the validity
of their research hypotheses. This requires that a study has sufficient statistical power;
that is, a sufficiently high probability of rejecting the null hypothesis when the research
hypothesis is true. Moreover, reviewers of journal publications and research grant proposals
often request sample size and power computations (Nakagawa & Foster, 2004). However, the
literature on LC analysis lacks both methods for sample size and power computation and a
thorough study of the design factors affecting the power of statistical tests used in LC analysis.
In this chapter, we present a method for assessing the power of tests related to
the class-specific response probabilities, which are the parameters of main interest in
confirmatory LC analysis. Relevant tests include tests for whether response probabilities
are equal across latent classes, whether response probabilities are equal to specific
values, whether response probabilities are equal across response variables (indicators),
and whether sensitivities or specificities are equal across indicators (Goodman, 1974; Holt
& Macready, 1989; Vermunt, 2010b). Since the class-specific response probabilities are
typically parameterized using logit equations (Formann, 1992; Vermunt, 1997), as in
logistic regression analysis, hypotheses about these LC model parameters can be tested
using Wald tests (Agresti, 2007). The proposed power analysis method is therefore
referred to as a Wald based power analysis.
For logistic regression models, Demidenko (2007, 2008) and Whittemore (1981)
described the large-sample approximation for the power of the Wald test. In this chapter,
we show how to use this procedure in the context of LC analysis. An important difference
compared to standard logistic regression analysis is that in a LC analysis the predictor
in the logistic models for the responses, the latent class variable, is unobserved. This
implies that the uncertainty about the individuals’ class memberships should be taken into
account in the power and sample size computation. As will be shown, factors affecting this
uncertainty include the number of classes, the class sizes (or proportions), the strength
of the association between classes and indicator variables, and the number of indicator
variables (Collins & Lanza, 2010; Vermunt, 2010a).
The remainder of this chapter is organized as follows. First, we present the LC model
for dichotomous responses and discuss the relevant hypotheses for the parameters of the
LC model. Second, we discuss power computation for Wald tests in LC analysis and,
moreover, show how the LC specific design factors affect the power via the information
matrix. Third, we present a numerical study in which we assess the performance of the
proposed method and illustrate power/sample size computation for different scenarios of
the relevant design factors. Finally, we provide a brief discussion of the main results of
our study.
2.2 The LC model
The LC model is a probabilistic clustering or unsupervised classification model for
dichotomous or categorical response variables (Goodman, 1974; Hagenaars, 1988;
Magidson & Vermunt, 2004; McCutcheon, 1987; Vermunt, 2010b). Taking the
dichotomous case as an example, let $y_{ij}$ be the value of response pattern $i$ for the binary
variable $Y_j$, for $j = 1, 2, \ldots, P$, where $y_{ij} = 1$ represents a positive response and 0 a
negative response. We denote the full response vector by $\mathbf{y}_i$. For example, for $P = 3$, $\mathbf{y}_i$
takes on one of the following eight triplets of 0s and 1s:
{(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.
The three response variables could, for example, represent the answers to the following
questions: “Do you support gay marriage?”, “Do you support a raise of minimum wages?”,
and “Do you support the initiative for health care reform?” In a sample of $n$ persons,
a particular person could answer these questions with ‘no’, ‘yes’, and ‘yes’, respectively, in
which case the response pattern for this subject becomes (0, 1, 1). In such an application,
the aim of the analysis would be to determine whether one can identify two latent classes
with different response tendencies (say republicans and democrats), and subsequently to
classify subjects into one of these classes based on their observed responses, or to compare
the probability of positive responses to a given response variable between the republican
and the democrat classes.
In general, for $P$ dichotomous response variables, we have $2^P$ tuples of 0s and 1s.
We denote the number of individuals with response pattern $\mathbf{y}_i$ by $n_i$, where the total
sample size is $n = \sum_{i=1}^{2^P} n_i$. The LC model assumes that the response probabilities depend
on a discrete latent variable, which we denote by $X$, with categories $t = 1, 2, \ldots, C$.
The probability of having response pattern yi is modeled as a mixture of C class-specific
probability functions (Dayton & Macready, 1976; Goodman, 1974; McCutcheon, 1987;
McLachlan & Peel, 2000; Vermunt, 2010b). That is,
$$p(\mathbf{y}_i, \Psi) = \sum_{t=1}^{C} p(X = t)\, p(\mathbf{Y} = \mathbf{y}_i \mid X = t), \tag{2.1}$$
where $p(X = t)$, which we also denote by $\pi_t$, represents the relative size of class $t$,
and $p(\mathbf{Y} = \mathbf{y}_i \mid X = t)$ is the corresponding class-specific joint response probability.
The class-specific probabilities for binary variable $Y_j$ are usually modeled using a logistic
parameterization; that is, $\theta_{jt} = p(Y_j = 1 \mid X = t) = \exp(\beta_{jt}) / (1 + \exp(\beta_{jt}))$, where
$\beta_{jt}$ is the log-odds of giving a positive response on item $j$ in class $t$. Moreover, assuming
that the response variables are independent within classes, which is referred to as the local
independence assumption, the LC model represented by equation (2.1) can be rewritten as follows:
$$p(\mathbf{y}_i, \Psi) = \sum_{t=1}^{C} \pi_t \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}, \tag{2.2}$$
where $0 < \pi_t < 1$ and $\sum_{t=1}^{C} \pi_t = 1$. The vector of parameters $\Psi$
consists of the sub-vector $\pi$, the class sizes, and the sub-vector $\beta$, the class-specific
logits for the indicator variables (also referred to as the measurement parameters). For
example, for $C = 2$ and $P = 3$, the parameter vector is
$\Psi' = (\pi', \beta') = (\pi_1, \beta_{11}, \beta_{21}, \beta_{31}, \beta_{12}, \beta_{22}, \beta_{32})$. In the application presented
above, these parameters would correspond to the proportion of ‘republicans’, the log-odds that
a republican responds ‘yes’ instead of ‘no’ to questions $Y_1$, $Y_2$, and $Y_3$, and the log-odds
that a democrat responds ‘yes’ instead of ‘no’ to questions $Y_1$, $Y_2$, and $Y_3$.
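Equation (2.2) can be evaluated directly for small $P$. The following is a minimal sketch in pure Python; the function name `pattern_probability` and the parameter values (two equal-sized classes with logits $\pm\log 4$, so that $\theta_{jt}$ is 0.8 or 0.2) are hypothetical and chosen only for illustration. For storage convenience the logits are indexed as `beta[t][j]` (class first), the reverse of the subscript order $\beta_{jt}$ used in the text.

```python
import itertools
import math


def pattern_probability(y, pi, beta):
    """Equation (2.2): p(y_i, Psi) as a mixture of C local-independence products.

    y    -- tuple of P binary responses (1 = positive, 0 = negative)
    pi   -- list of C class sizes summing to 1
    beta -- nested list, beta[t][j] = logit of a positive response on item j in class t
    """
    total = 0.0
    for pi_t, beta_t in zip(pi, beta):
        prod = pi_t
        for y_j, b in zip(y, beta_t):
            theta = 1.0 / (1.0 + math.exp(-b))  # theta_jt = exp(b) / (1 + exp(b))
            prod *= theta if y_j == 1 else 1.0 - theta
        total += prod
    return total


# Hypothetical population: C = 2 equal-sized classes, P = 3 items.
pi = [0.5, 0.5]
beta = [[math.log(4.0)] * 3,   # class 1: theta = 0.8 on every item
        [-math.log(4.0)] * 3]  # class 2: theta = 0.2 on every item

patterns = list(itertools.product([0, 1], repeat=3))  # all 2^P response patterns
probs = {y: pattern_probability(y, pi, beta) for y in patterns}
for y, p in probs.items():
    print(y, round(p, 4))
```

Because the $2^P$ patterns are exhaustive, the eight probabilities sum to one. The example has $m = 1 + 6 = 7$ free parameters, matching the length of $\Psi'$ above.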
In general, for an LC model having $C$ classes and $P$ binary indicator variables, we have
$m = C - 1 + C \cdot P$ free model parameters. These parameters are usually estimated
by maximum likelihood (ML) (Dayton & Macready, 1976; Goodman, 1974; McLachlan
& Peel, 2000; Vermunt, 2010b), which involves seeking the values of $\Psi$, say $\hat{\Psi}$, which
maximize the log-likelihood function:
$$l(\Psi) = \sum_{i=1}^{2^P} n_i \log p(\mathbf{y}_i, \Psi). \tag{2.3}$$
Maximizing the log-likelihood function in equation (2.3) produces a unique estimate
for $\Psi$, provided that the LC model in equation (2.1) is identifiable. As indicated by
Goodman (1974), a necessary condition for an LC model to be identified is that the
number of independent response patterns is at least as large as the number of free model
parameters; that is, $2^P - 1 \geq m = C - 1 + C \cdot P$. A sufficient condition for local
identification is that the Jacobian is of full rank (McHugh, 1956). Because analytically
evaluating the rank of the Jacobian is very difficult, Forcina (2008) proposed checking
identification of LC models by evaluating the rank of the Jacobian for a large number
of random parameter values. For the scenarios considered in this chapter, we applied
Forcina’s method, which showed that the models were identified.
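The counting condition is easy to check mechanically. The sketch below uses hypothetical function names and implements only this necessary condition. It does not implement the Jacobian-rank check of Forcina (2008), which is what actually establishes local identification.

```python
def n_free_parameters(C, P):
    # m = (C - 1) class-size parameters + C * P class-specific logits
    return C - 1 + C * P


def satisfies_counting_condition(C, P):
    # Necessary (not sufficient) condition for identification: 2^P - 1 >= m
    return 2 ** P - 1 >= n_free_parameters(C, P)


for C, P in [(2, 3), (3, 3), (3, 6), (4, 10)]:
    print(C, P, n_free_parameters(C, P), 2 ** P - 1, satisfies_counting_condition(C, P))
```

For example, a 3-class model with only three binary items fails the condition ($m = 11 > 7$), whereas all configurations used in the numerical study below ($P = 6$ or $10$, $C \leq 4$) pass it.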
Typically, researchers using LC models do not only wish to obtain point estimates
for the Ψ parameters, but are also interested in tests concerning these parameters.
For simplicity, we focus on a single type of test, which in most applications is the
test of main interest: the test of whether there is a significant association between the
latent classes and a particular indicator variable.
Inference regarding this association involves testing the null hypothesis that the response
logit does not differ across latent classes for the indicator variable concerned. This null
hypothesis can be formulated as $H_0: \beta_{j1} = \beta_{j2} = \cdots = \beta_{jC}$, for $j = 1, 2, \ldots, P$. An
equivalent formulation of this hypothesis is
$$H_0: \beta_{j1} - \beta_{j2} = 0, \quad \beta_{j1} - \beta_{j3} = 0, \quad \ldots, \quad \beta_{j1} - \beta_{jC} = 0,$$
or, using matrix notation, $H_0: H\beta_j = 0$, where $H$ is a $(C-1) \times C$ design
matrix with linear contrasts and $\beta_j$ is a $C \times 1$ column vector with the parameters for $Y_j$;
i.e., $\beta_j' = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jC})$. Under the null hypothesis of no association, any
difference between the estimates $\hat{\beta}_{j1}$ and $\hat{\beta}_{jt}$ occurs by chance alone, implying that the
indicator does not contribute to the definition of classes in a statistically significant way.
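The contrast matrix $H$ for the no-association hypothesis can be built programmatically. This minimal sketch, with hypothetical helper names, constructs the $(C-1) \times C$ matrix whose rows encode $\beta_{j1} - \beta_{jt}$ for $t = 2, \ldots, C$, and applies it to a vector of logits. Other hypotheses of the form $H\beta_j = 0$ only require different rows.

```python
def no_association_contrasts(C):
    """(C-1) x C matrix H whose rows encode beta_j1 - beta_jt = 0 for t = 2, ..., C."""
    H = []
    for t in range(1, C):
        row = [0] * C
        row[0] = 1    # coefficient of beta_j1
        row[t] = -1   # coefficient of beta_jt
        H.append(row)
    return H


def apply_contrasts(H, beta_j):
    # H beta_j: one entry per contrast, all equal to zero when H0 holds
    return [sum(h * b for h, b in zip(row, beta_j)) for row in H]


H = no_association_contrasts(3)
print(H)
print(apply_contrasts(H, [1.386, 0.0, -1.386]))
```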
As already indicated in the introduction, various other types of hypotheses
concerning the class-specific logit parameters may be of interest. Examples include tests
of whether $\beta_{jt}$ is equal to a particular value (e.g., $\beta_{11} = 1$), whether the $\beta_{jt}$ parameters
are equal across two or more items (e.g., $\beta_{1t} - \beta_{2t} = 0$), and whether a parameter's value is
the opposite of the value for another class (e.g., $\beta_{11} + \beta_{12} = 0$) (Goodman, 1974). In
medical research, we may be interested in comparing the sensitivity and specificity of
diagnostic tests (see, for example, I. Yang & Becker, 1997), yielding hypotheses such
as $\beta_{11} - \beta_{21} = 0$ and $\beta_{12} - \beta_{22} = 0$, respectively. Note that all these hypotheses can be
expressed in the general form $H\beta = h$, where $h$ is a vector of known constants; $h = 0$ for all
examples except $\beta_{11} = 1$.
2.3 Wald based power analysis for LC models
2.3.1 The Wald statistic and its asymptotic properties
One of the properties of the ML estimator is that, under certain regularity conditions
(McHugh, 1956; White, 1982), the estimator $\hat{\Psi}_n$ converges in probability to $\Psi$ as the
sample size $n$ tends to infinity; that is, $\hat{\Psi}_n \xrightarrow{p} \Psi$. The
other important property of the ML estimator is that it has a limiting normal distribution.
More specifically, for large sample size $n$,
$$\sqrt{n}(\hat{\Psi}_n - \Psi) \xrightarrow{d} N(0, V), \tag{2.4}$$
where $\xrightarrow{d}$ denotes convergence in distribution, $V = I^{-1}(\Psi)$ is the asymptotic covariance
matrix of $\sqrt{n}\hat{\Psi}_n$, and $I(\Psi)$ is the $m \times m$ information matrix (McHugh, 1956; Redner, 1981;
Rencher, 2000; Wald, 1943; Wolfe, 1970). The latter has the following block structure:
$$I(\Psi) = \begin{pmatrix} I_1 = \{(\pi_t, \pi_s)\} & I_2 = \{(\pi_t, \beta_{jl})\} \\ I_3 = \{(\beta_{jq}, \pi_s)\} & I_4 = \{(\beta_{jq}, \beta_{kl})\} \end{pmatrix},$$
for $t, s = 1, 2, \ldots, C-1$, $l, q = 1, 2, \ldots, C$, and $j, k = 1, 2, \ldots, P$. The sub-matrices
$I_1$, $I_2$, $I_3$, and $I_4$ are of dimensions $(C-1) \times (C-1)$, $(C-1) \times CP$, $CP \times (C-1)$, and
$CP \times CP$, respectively. The terms between braces indicate the parameters involved
in the sub-matrix concerned.
Using the algebraic properties of block matrices, it follows that
$$V = I^{-1}(\Psi) = \begin{pmatrix} A^{-1} & -I_1^{-1} I_2 B^{-1} \\ -I_4^{-1} I_3 A^{-1} & B^{-1} \end{pmatrix}, \tag{2.5}$$
where $A = I_1 - I_2 I_4^{-1} I_3$ and $B = I_4 - I_3 I_1^{-1} I_2$. A necessary condition for $A$ to be
invertible, which is a requirement to obtain the covariance matrix of $\hat{\Psi}_n$, is that both $I_1$
and $I_4$ are non-singular matrices (Rencher, 2000). In the appendix, we provide details on
the expressions for $I_1$, $I_2$, $I_3$, and $I_4$.
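The block formula (2.5) can be checked numerically. The sketch below is pure Python with hypothetical helper names. It assembles the inverse from the four blocks exactly as in (2.5) and compares the result with a direct inversion. The $3 \times 3$ symmetric positive-definite matrix is made up and only stands in for an information matrix, with a $1 \times 1$ block $I_1$ and a $2 \times 2$ block $I_4$. In practice a numerical linear-algebra library would be used instead of the hand-written Gauss-Jordan routine.

```python
def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_neg(A):
    return [[-a for a in row] for row in A]


def mat_inv(A):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def block_inverse(I1, I2, I3, I4):
    """Inverse of [[I1, I2], [I3, I4]] assembled as in equation (2.5)."""
    I1_inv, I4_inv = mat_inv(I1), mat_inv(I4)
    A = mat_sub(I1, mat_mul(mat_mul(I2, I4_inv), I3))
    B = mat_sub(I4, mat_mul(mat_mul(I3, I1_inv), I2))
    A_inv, B_inv = mat_inv(A), mat_inv(B)
    top_right = mat_neg(mat_mul(mat_mul(I1_inv, I2), B_inv))
    bottom_left = mat_neg(mat_mul(mat_mul(I4_inv, I3), A_inv))
    top = [ra + rb for ra, rb in zip(A_inv, top_right)]
    bottom = [ra + rb for ra, rb in zip(bottom_left, B_inv)]
    return top + bottom


# Hypothetical 3 x 3 "information matrix" split into its four blocks
full = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
V = block_inverse([[4.0]], [[1.0, 0.5]], [[1.0], [0.5]], [[3.0, 0.2], [0.2, 2.0]])
print([[round(v, 4) for v in row] for row in V])
```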
The consistency and multivariate normality discussed above apply to the estimators
of the component parameters as well. That is, using the property that sub-vectors of a
multivariate normal random vector are also normally distributed, the limiting distributions
of $\hat{\pi}$ and $\hat{\beta}$ become
$$\sqrt{n}(\hat{\pi}_n - \pi) \xrightarrow{d} N(0, A^{-1}) \tag{2.6}$$
$$\sqrt{n}(\hat{\beta}_n - \beta) \xrightarrow{d} N(0, B^{-1}). \tag{2.7}$$
Likewise, for the sub-vector $\hat{\beta}_j$ of $\hat{\beta}$, $\sqrt{n}(\hat{\beta}_j - \beta_j)$ is asymptotically normal with mean
zero and covariance matrix $V_j$, the $C \times C$ sub-matrix of $B^{-1}$ corresponding to $\beta_j$. In the
remaining part of this chapter, we focus on this $\beta_j$.
Using the Continuous Mapping Theorem (Mann & Wald, 1943), for a design matrix
$H$ that defines the contrasts in the null hypothesis, one can show that
$\sqrt{n}(H\hat{\beta}_j - H\beta_j) \xrightarrow{d} N(0, H V_j H')$. The quadratic form for testing the
hypothesis $H_0: H\beta_j = 0$ yields the well-known Wald statistic:
$$W = n \left( (H\hat{\beta}_j)' (H V_j H')^{-1} (H\hat{\beta}_j) \right). \tag{2.8}$$
Under the null hypothesis, that is, if $H_0: H\beta_j = 0$ holds, the Wald statistic $W$ has
an asymptotic (central) chi-square distribution with $C - 1$ degrees of freedom (Rencher,
2000; Wald, 1943). That is,
$$W = n \left( (H\hat{\beta}_j)' (H V_j H')^{-1} (H\hat{\beta}_j) \right) \xrightarrow{d} \chi^2_{(C-1)}. \tag{2.9}$$
Under the alternative hypothesis, $W$ approximately follows a non-central chi-square
distribution with $C - 1$ degrees of freedom and non-centrality parameter $\lambda$. That is,
$$W = n \left( (H\hat{\beta}_j)' (H V_j H')^{-1} (H\hat{\beta}_j) \right) \xrightarrow{d} \chi^2_{(C-1, \lambda)}, \tag{2.10}$$
where $\lambda = n (H\beta_j)' (H V_j H')^{-1} (H\beta_j)$.
2.3.2 Power and sample size computation
With the establishment of the distribution of the test statistic under the null and
alternative hypotheses and the availability of a closed form expression for the non-centrality
parameter λ, it becomes possible to compute the power of the test for a given sample size
or the sample size for a given power. As in any power analysis, we first have to define the
population model. In our case, this involves defining the number of classes and the number
of response variables, and, moreover, specifying the values for the class sizes π and the
class-specific logits β. For the assumed population model, we can compute the inverse
information matrix V which appears in the formula of the non-centrality parameter.
Once the population parameters are set and V is computed, power computation for a
given sample size and required sample size computation for a given power proceeds along
the steps described below.
Steps for power computation
Power computation proceeds as follows:
1. Compute the non-centrality parameter $\lambda$ for the specified sample size $n$ (use the
expression for $\lambda$ given with equation (2.10)).
2. For a given type I error rate $\alpha$, read the $100(1 - \alpha)$ percentile value from the
(central) chi-square distribution. That is, find $\chi^2_{(1-\alpha)}(C - 1)$ such that, under the
null hypothesis, $p(W > \chi^2_{(1-\alpha)}(C - 1)) = \alpha$. This value is referred to as the critical
value of the test.
3. Compute the power as the probability that a random variable $W$ from the non-central
chi-square distribution (with non-centrality parameter $\lambda$ given in step 1) will
assume a value greater than the critical value obtained in step 2.
Steps for sample size computation
Sample size computation proceeds as follows:
1. For a given value of $\alpha$, read the $100(1 - \alpha)$ percentile value from the (central)
chi-square distribution (see the second step for power computation).
2. For a given power and the critical value obtained in step 1, find the non-centrality
parameter $\lambda$ such that, under the alternative hypothesis, $p(W > \chi^2_{(1-\alpha)}(C - 1))$
equals the specified power.
3. From the expression for $\lambda$, solve for the sample size (rounding up to the next integer) as
$$n = \lambda \left( (H\beta_j)' (H V_j H')^{-1} (H\beta_j) \right)^{-1}.$$
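Both procedures only need central and non-central chi-square tail probabilities. The sketch below is in pure Python to keep it self-contained. It computes the central chi-square distribution from the power series of the regularized incomplete gamma function and the non-central one as a Poisson mixture of central chi-squares. All function names are hypothetical. The quantity `lam_per_obs` stands for $(H\beta_j)'(HV_jH')^{-1}(H\beta_j)$, that is, $\lambda / n$. In practice a statistics library (for example `scipy.stats.ncx2`) would be preferable to these hand-written routines.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15, max_terms=10000):
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


def chi2_cdf(x, df):
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_critical_value(alpha, df, tol=1e-10):
    """Upper-alpha critical value of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x, df, lam):
    """P(W > x) for a non-central chi-square: Poisson(lam/2) mixture of central ones."""
    if lam <= 0.0:
        return 1.0 - chi2_cdf(x, df)
    half = lam / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 50)
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += weight * (1.0 - chi2_cdf(x, df + 2 * j))
    return total


def wald_power(lam, df, alpha=0.05):
    # Power steps 2-3: critical value under H0, then tail area under H1
    return ncx2_sf(chi2_critical_value(alpha, df), df, lam)


def required_sample_size(target_power, df, lam_per_obs, alpha=0.05):
    """Sample size steps 1-3: find lambda by bisection, then n = lambda / lam_per_obs."""
    crit = chi2_critical_value(alpha, df)
    lo, hi = 0.0, 1.0
    while ncx2_sf(crit, df, hi) < target_power:
        hi *= 2.0
    while hi - lo > 1e-8:
        mid = 0.5 * (lo + hi)
        if ncx2_sf(crit, df, mid) < target_power:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, math.ceil(lam / lam_per_obs)


# Hypothetical inputs: C = 3 classes (df = 2), lambda per observation 0.1
print(round(wald_power(0.1 * 100, df=2), 3))
print(required_sample_size(0.8, df=2, lam_per_obs=0.1))
```

Bisection works here because the power is increasing in $\lambda$. For one degree of freedom, the routine recovers the familiar $\lambda \approx 7.85$ for a power of .80 at $\alpha = .05$.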
Software implementation
The above procedure for power computation can be applied using existing software for
LC analysis that allows defining starting values or fixed values for the logit parameters
and that provides the (inverse) information matrix as output, for example, LEM
(Vermunt, 1997), Mplus (L. Muthen & Muthen, 1998-2007), or Latent GOLD (Vermunt
& Magidson, 2013b). More specifically, with such an LC analysis software package, one can
obtain the inverse information matrix $V$. This will typically require the following two
steps:
A. Create a data set containing all possible data patterns and with the expected
frequencies according to the LC model of interest as weights. This can be achieved
by running the LC software with the population parameters specified as fixed values
and with the estimated frequencies as requested output. The created output is, in
fact, a data set which is exactly in agreement with the population model. Such a
data set is sometimes referred to as an ‘exemplary’ data set (O’Brien, 1986).
B. Analyze the (exemplary) data set created in step A with the LC model of interest and
request the variance-covariance matrix of the parameters (the inverse information
matrix) as output. Note that when analyzing a data set which is exactly in
agreement with the model, the observed information matrix is identical to the
expected information matrix. The same applies to the approximate observed
information matrix based on the outer-product of the gradient contributions of
the data patterns.
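For small $P$, step A can be mimicked outside LC software: enumerate all $2^P$ patterns and attach the expected frequency $n\,p(\mathbf{y}_i)$ as a weight. The sketch below uses a hypothetical function name and hypothetical parameter values. It only produces the weighted data set. Step B, re-estimating the model and requesting the inverse information matrix, still requires LC software.

```python
import itertools
import math


def exemplary_data(n, pi, theta):
    """All 2^P patterns with expected frequency n * p(y_i) as weight (step A).

    theta[t][j] = p(Y_j = 1 | X = t); pi = class sizes.
    """
    P = len(theta[0])
    rows = []
    for y in itertools.product([0, 1], repeat=P):
        p_y = sum(pi_t * math.prod(th if y_j else 1.0 - th for th, y_j in zip(theta_t, y))
                  for pi_t, theta_t in zip(pi, theta))
        rows.append((y, n * p_y))
    return rows


# Hypothetical population model: two equal classes, three items, n = 500
rows = exemplary_data(500, [0.5, 0.5], [[0.8] * 3, [0.2] * 3])
for y, weight in rows:
    print(y, round(weight, 2))
```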
The above two steps provide us with the inverse information matrix $V$. The actual
power or sample size computations using the steps described above can subsequently
be performed using software that supports matrix computations and that has
functions for obtaining the critical value from the chi-squared distribution and the
tail probabilities (or, for sample size computation, the required non-centrality value) from
the non-central chi-squared distribution. For this purpose, one can use R.
The procedure described above is fully automated in version 5.0 of the Latent GOLD
program (Vermunt & Magidson, 2013b). Users define the population model and specify
either the sample size or the required power. The program computes the power or the
required sample size for the Wald tests it reports by default, as well as for other Wald
tests defined by the user. In the appendix, we give an example of the Latent GOLD syntax
for power computation.
2.3.3 Design factors affecting the power or the required sample size
Now let us look in more detail at the factors affecting the power of the Wald test in LC
models. For given degrees of freedom, the power is determined by the type I error rate (the
significance level $\alpha$) and the value of the non-centrality parameter $\lambda$: the larger $\alpha$ and
the larger $\lambda$, the larger the power. Increasing $\alpha$ makes the null hypothesis easier to reject.
As can be observed from the expression for $\lambda$ given with equation (2.10), $\lambda$ is a function
of the sample size $n$, the precision of the estimator ($V_j$), and the effect size $H\beta_j$. Note that
in our case the effect size is the difference between the class-specific $\beta$ parameters or,
equivalently, the strength of the association between the classes and the response variable concerned.
Specific to LC models is that the precision of the estimator is affected by the fact
that class membership is unobserved; that is, that we are uncertain about a person’s class
membership. Recall from equation (2.5) that the block of $V$ concerning the $\beta$ parameters
is obtained as the inverse of $B = I_4 - I_3 I_1^{-1} I_2$. This means that $B$ becomes larger when $I_4$
and $I_1$ become larger and when $I_2$ and $I_3$ become smaller. To show how the uncertainty
about the class membership affects $B$, let us have a closer look at $I_4$, which is the most
important term in $B$. Its elements are obtained as follows:
$$I_4(\beta_{jq}, \beta_{kl}) = \sum_{i=1}^{2^P} p(X = q \mid \mathbf{y}_i)\, p(X = l \mid \mathbf{y}_i)\, (y_{ij} - \theta_{jq})(y_{ik} - \theta_{kl})\, p(\mathbf{y}_i), \tag{2.11}$$
where $\theta_{jq} = \exp(\beta_{jq}) / (1 + \exp(\beta_{jq}))$ (see the appendix for further details on its
derivation). As can be seen, specific to an LC analysis, the elements of the information
matrix are not only a function of the model parameters, but also of the posterior class
membership probabilities $p(X = q \mid \mathbf{y}_i)$. For example, the contribution of response pattern
$i$ to the information on parameter $\beta_{jq}$ equals $p(X = q \mid \mathbf{y}_i)^2 (y_{ij} - \theta_{jq})^2 p(\mathbf{y}_i)$. In other
words, response pattern $i$ contributes with “weight” $p(X = q \mid \mathbf{y}_i)^2$ to the information on
a parameter of class $q$. Summed over all $C$ classes, these weights equal
$\sum_{t=1}^{C} p(X = t \mid \mathbf{y}_i)^2$. This shows that the information is maximal when $p(X = q \mid \mathbf{y}_i)$
equals 1 for one class and 0 for the other classes, in which case the total weight equals
1. This occurs when the classes are perfectly separated or when class membership is
observed rather than latent.
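The role of the posterior probabilities can be made concrete by computing them directly. The following sketch is pure Python; its function names and parameter values are hypothetical. It evaluates $p(X = t \mid \mathbf{y}_i)$, the summed weight $\sum_t p(X = t \mid \mathbf{y}_i)^2$ (also averaged over patterns), and an element of $I_4$ as in equation (2.11). When the classes do not differ at all, the posteriors equal the class sizes, so the summed weight drops to $\sum_t \pi_t^2$.

```python
import itertools
import math


def posteriors(y, pi, theta):
    """Return ([p(X = t | y) for each t], p(y)); theta[t][j] = p(Y_j = 1 | X = t)."""
    joint = [pi_t * math.prod(th if y_j else 1.0 - th for th, y_j in zip(theta_t, y))
             for pi_t, theta_t in zip(pi, theta)]
    p_y = sum(joint)
    return [v / p_y for v in joint], p_y


def summed_weight(y, pi, theta):
    # sum_t p(X = t | y)^2: 1 when membership is certain, smaller when it is uncertain
    post, _ = posteriors(y, pi, theta)
    return sum(p * p for p in post)


def average_weight(pi, theta):
    # Expected summed weight over all 2^P response patterns
    P = len(theta[0])
    total = 0.0
    for y in itertools.product([0, 1], repeat=P):
        post, p_y = posteriors(y, pi, theta)
        total += p_y * sum(p * p for p in post)
    return total


def info_I4(j, q, k, l, pi, theta):
    """Element I4(beta_jq, beta_kl) of equation (2.11); item and class indices are 0-based."""
    P = len(theta[0])
    total = 0.0
    for y in itertools.product([0, 1], repeat=P):
        post, p_y = posteriors(y, pi, theta)
        total += post[q] * post[l] * (y[j] - theta[q][j]) * (y[k] - theta[l][k]) * p_y
    return total


pi = [0.5, 0.5]
separated = [[0.999] * 6, [0.001] * 6]   # nearly perfect class separation
overlapping = [[0.8] * 6, [0.2] * 6]     # medium separation
print(round(average_weight(pi, separated), 4), round(average_weight(pi, overlapping), 4))
print(round(info_I4(0, 0, 0, 0, pi, overlapping), 4))
```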
The entries of $I_1$ also become larger when the posterior class membership probabilities
get closer to either 0 or 1. The matrices $I_2$ and $I_3$ capture the overlap in information
between the class sizes and the $\beta$ parameters. The elements of these matrices are 0 when
separation is perfect and become larger with lower class separation.
The implication of the above is that the power can be increased by increasing the
separation between the classes; that is, by influencing the factors affecting the posterior
class membership probabilities. The posterior class membership probabilities depend on
the number of classes, the class sizes, the class-specific conditional response probabilities,
and the number of response variables (Collins & Lanza, 2010; Vermunt, 2010a). More
specifically, class separation is better with fewer latent classes, a more uniform (or balanced)
class distribution, response variables which are more strongly related to the classes, and
a larger number of response variables.
Note that the conditional response probabilities have a dual role. The more the
conditional response probabilities $\theta_{jq}$ or the logit parameters $\beta_{jq}$ differ across latent
classes, the larger the effect size and thus the higher the power of the test for
the parameters of indicator variable $Y_j$. However, a larger difference between classes in
the response on $Y_j$ also increases the class separation, and thus the power of all tests,
including those for the other response variables.
2.4 Numerical study
In this section, we present a numerical study that illustrates the Wald based power
analysis for different configurations of design factors. As was shown in section 2.3.3,
in addition to the usual factors (i.e., sample size, level of significance, and effect size),
power computation in LC models involves the specification of design factors such as the
number of classes, the number of observed response variables, the class sizes, and the
class-specific probabilities (or logits) for the response variables, which we refer to as LC-
specific design factors.
As already indicated in section 2.3.3, LC-specific design configurations yielding better
separated classes, or posterior class membership probabilities which are closer to either
0 or 1, yield more precise estimators and, as a result, larger power of the Wald tests.
Therefore, in order to be able to compare different design configurations, it is important
to have a measure of class separation. For this purpose, we use the entropy-based R-square.
The entropy of the posterior class membership probabilities for data pattern $i$,
denoted by $E_i$, equals $-\sum_{t=1}^{C} p(X = t \mid \mathbf{y}_i) \log p(X = t \mid \mathbf{y}_i)$. Note that $E_i$ gets closer to
0 when the posteriors are closer to 0 or 1. The average entropy across data patterns,
denoted by $E$, equals $\sum_{i=1}^{2^P} E_i\, p(\mathbf{y}_i)$. The entropy-based R-square can now be obtained
as $R^2_{\text{entropy}} = 1 - E/E(0)$, where $E(0)$ is the maximum entropy given the
class sizes; that is, $E(0) = -\sum_{t=1}^{C} p(X = t) \log p(X = t)$. The entropy-based R-square
takes on values between 0 and 1, where larger values of $R^2_{\text{entropy}}$ indicate larger separation
between classes. Values lower than .5, between .5 and .75, and larger than .75 correspond to LC
models with small, medium, and large class separation, respectively. Closer inspection
of the expression $R^2_{\text{entropy}} = 1 - E/E(0)$ shows that the largest possible entropy-based
R-square, 1, is obtained when $E$ equals 0. This occurs when $p(X = t \mid \mathbf{y}_i)$ is either 0 or 1
for each response pattern $\mathbf{y}_i$; that is, when class separation is perfect.
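The entropy-based R-square follows directly from the posterior probabilities of all $2^P$ patterns. Below is a minimal sketch in pure Python; the function name `entropy_r2` and the example values are hypothetical, and items are assumed binary as in this chapter.

```python
import itertools
import math


def entropy_r2(pi, theta):
    """Entropy-based R-square, 1 - E / E(0), for an LC model with binary items.

    pi = class sizes; theta[t][j] = p(Y_j = 1 | X = t).
    """
    P = len(theta[0])
    E = 0.0
    for y in itertools.product([0, 1], repeat=P):
        joint = [pi_t * math.prod(th if y_j else 1.0 - th for th, y_j in zip(theta_t, y))
                 for pi_t, theta_t in zip(pi, theta)]
        p_y = sum(joint)
        E_i = -sum((v / p_y) * math.log(v / p_y) for v in joint if v > 0)  # entropy of pattern i
        E += E_i * p_y                                                     # weight by p(y_i)
    E0 = -sum(p * math.log(p) for p in pi if p > 0)  # maximum entropy given the class sizes
    return 1.0 - E / E0


# Hypothetical condition: two equal classes, six items, theta = 0.8 versus 0.2
r2 = entropy_r2([0.5, 0.5], [[0.8] * 6, [0.2] * 6])
print(round(r2, 3))
```

For this configuration the sketch should give a value close to the .818 reported for $C = 2$, $P = 6$, $\theta_{j1} = 0.8$, and equal class sizes in Table 2.1.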
2.4.1 Numerical study set up
The LC-specific design factors that were varied are the number of classes, the number
of indicator variables, the class-specific conditional probabilities, and the class sizes. The
number of classes varied from 2 to 4 (i.e., C = 2, 3, 4). The number of indicator variables
was set to $P = 6$ and $P = 10$. In line with Vermunt (2010a), the class-specific conditional
probabilities $\theta_{jt}$ were 0.7, 0.8, and 0.9 (or, depending on the class, 1-0.7, 1-0.8, and 1-0.9),
corresponding to a weak, medium, and strong association between classes and indicator
variables. The $\theta_{jt}$ were high for class 1 (say, 0.8) and low for class $C$ (say, 1-0.8). With
$C = 3$, class 2 had high $\theta_{jt}$ values for the first half of the items and low values for the
other items; with $C = 4$, class 2 had low $\theta_{jt}$ values for the first half of the items and
high values for the other items, and class 3 had high $\theta_{jt}$ values for the first half of the
items and low values for the other items. The class sizes were equal or unequal, where for
the unequal conditions we used class sizes of (0.75, 0.25), (0.5, 0.3, 0.2), and (0.4, 0.3,
0.2, 0.1) for 2-, 3-, and 4-class LC models, respectively. For a 3-class LC model, the more
unequal class sizes (0.6, 0.3, 0.1) were also considered.
In addition to the four LC-specific design factors, we varied the sample size, power,
and effect size (Cohen, 1988). For power computation, the sample size was set to 75,
100, 200, 300, 500, 700, 1000, and 1500, whereas for sample size determination, the
power was set to .8, .9, and .95. The effect size is already specified via the response
probabilities $\theta_{jt}$, where it should be noted that the logit coefficients for which the
Wald tests are performed equal $\beta_{jt} = \log[\theta_{jt}/(1 - \theta_{jt})]$. The other factor considered is
the level of significance $\alpha$, which, in line with common research practice where the
type I error rate is fixed in advance, was set to 0.05.
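The configuration of conditional probabilities described above can be generated programmatically. The sketch below is a hypothetical helper in pure Python that reflects one reading of the design. It takes "the first half of the items" as the first $\lfloor P/2 \rfloor$ items and the low value as 1 minus the strength. It returns $\theta$ indexed as `theta[t][j]` (class first), converts it to the logits $\beta_{jt}$ on which the Wald tests are performed, and lists the class-size conditions stated in the text.

```python
import math

# Class-size conditions described in the text
UNEQUAL_SIZES = {2: [0.75, 0.25], 3: [0.5, 0.3, 0.2], 4: [0.4, 0.3, 0.2, 0.1]}
MORE_UNEQUAL_3 = [0.6, 0.3, 0.1]


def theta_design(C, P, strength):
    """theta[t][j] = p(Y_j = 1 | X = t) for the configuration described above."""
    high, low = strength, 1.0 - strength
    half = P // 2
    first_high = [high] * half + [low] * (P - half)  # high on the first half, low on the rest
    first_low = [low] * half + [high] * (P - half)   # low on the first half, high on the rest
    if C == 2:
        return [[high] * P, [low] * P]
    if C == 3:
        return [[high] * P, first_high, [low] * P]
    if C == 4:
        return [[high] * P, first_low, first_high, [low] * P]
    raise ValueError("the numerical study uses C = 2, 3, or 4")


def to_logits(theta):
    # beta_jt = log(theta_jt / (1 - theta_jt)), kept in the same [t][j] layout
    return [[math.log(th / (1.0 - th)) for th in row] for row in theta]


for row in theta_design(3, 6, 0.8):
    print([round(v, 2) for v in row])
```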
2.4.2 Results
Table 2.1 presents the entropy-based R-square for several combinations of the LC-specific
design factors. It shows how the value of this R-square measure is affected by the number
of classes, the class sizes, the number of indicators, and the strength of the class-indicator
associations, given specific values of the other design factors. As can be seen, the smaller
the number of classes, the larger the number of indicator variables, or the stronger the
class-indicator associations, the larger the value of the entropy-based R-square. Moreover,
the more equal the class sizes, the larger the entropy-based R-square. It can also be seen that
the entropy-based R-square may become very low when all conditions are less favorable.

Table 2.1: Entropy-based R-square values for different combinations of latent class-specific design factors

                                                  Class sizes
Design factor                                 Equal   Unequal   More unequal
Number of classes (P = 6, $\theta_{j1}$ = 0.8)
  C = 2                                       .818    .811
  C = 3                                       .627    .624
  C = 4                                       .594    .589
Number of indicators (C = 3, $\theta_{j1}$ = 0.8)
  P = 6                                       .627    .624
  P = 10                                      .790    .788
Class-indicator associations (C = 3, P = 6)
  $\theta_{j1}$ = 0.7                         .332    .330      .314
  $\theta_{j1}$ = 0.8                         .627    .624      .607
  $\theta_{j1}$ = 0.9                         .880    .879      .871
Note: The ‘unequal’ and ‘more unequal’ class size conditions refer to the level of deviation from a uniform class distribution. For example, for C = 3, we used (0.5, 0.3, 0.2) and (0.6, 0.3, 0.1) to represent a smaller and a larger deviation from a uniform class distribution, respectively.
Table 2.2: Estimated power for different class separation levels and different sample sizes

              Entropy-based R-square
Sample size   .314   .330   .607   .624   .790
75            .069   .115   .221   .515   .941
100           .075   .139   .283   .645   .984
200           .102   .239   .520   .922   1.000
300           .130   .342   .706   .987   1.000
500           .190   .533   .908   1.000  1.000
700           .252   .687   .976   1.000  1.000
1000          .345   .842   .997   1.000  1.000
1500          .492   .957   1.000  1.000  1.000
Note: $H_0: \beta_{j1} = \beta_{j2} = \cdots = \beta_{jC}$ with $j = 1$ and $C = 3$.
To investigate the effect of class separation on the power of the Wald test for the
significance of a class-indicator association, the power is computed for five of the design
configurations presented in Table 2.1 under different sample sizes. The results
are presented in Table 2.2. From this table, we can see that the power of a Wald test
for a class-indicator association strongly depends on the class separation. When classes
are well separated, a sample size of 100 can be large enough to achieve a power of .8 or
more. With a class separation of .330, .607, and .624, sample sizes of about 900, 370, and
140, respectively, are required to achieve such a power. With very badly separated classes,
as in the worst condition, even a sample size of 1500 is not large enough to achieve a
power of .8.
Table 2.3: Required sample size for different configurations of latent class-specific design factors and different power levels

         Number of classes        Number of indicators
Power    C = 2   C = 3   C = 4    P = 6   P = 10
.8       33      82      83       82      49
.9       45      108     108      108     64
.95      55      131     130      131     78

         Class-indicator associations    Class sizes
Power    Low     Medium   High           Equal   Unequal   More unequal
.8       419     82       34             82      141       371
.9       550     108      45             108     185       487
.95      671     131      55             131     226       594
Note: The baseline model is the model with C = 3, P = 6, equal class sizes, and medium association between classes and indicators. One design factor is varied at a time to obtain the other conditions reported in the table.
Table 2.3 reports the required sample size for a specified power for various combinations
of LC-specific design factors. We use the condition with C = 3, P = 6, equal class sizes,
and medium class-indicator associations as the baseline. This condition requires sample
sizes of 82, 108, and 131, respectively, to achieve the three reported power levels. The
other conditions are obtained by varying one design factor at a time.
The results in Table 2.3 show that, as expected, the required sample size depends
on the number of classes, the number of indicators, the strength of the class-indicator
associations, and the class sizes. More specifically, keeping the other LC-specific design
factors constant, the larger the number of classes and the smaller the number of indicators,
the larger the required sample size to achieve the specified power level. The strength
of the class-indicator associations turns out to be one of the key factors affecting the
power; for example, to obtain a power of .80, we need at least 419 observations when
these associations are weak, but only 34 observations when these are strong. Moreover,
many more observations are required when the class sizes are unequal than when they are
equal; for example, to achieve a power of .95, we need approximately 130, 225, and 600
observations for the (0.334, 0.333, 0.333), (0.5, 0.3, 0.2), and (0.6, 0.3, 0.1) conditions,
respectively.
In summary, these results show that the strength of the class-indicator associations
and the class distribution have a much stronger impact on the power than the number
of classes and the number of indicator variables. The importance of the strength of the
class-indicator association can be explained by the fact that it affects both the class
separation and the effect size. For example, for P = 6, C = 3, and equal class sizes,
when the θjt value changes from .9 to .7, the class separation drops from .880 to .332
and the difference between classes in their conditional response probabilities drops from
.8 to .4. Thus, a θjt value of .9 yields not only a much larger R-square value but also a
much larger effect size than a θjt value of .7. The class sizes are important because the
power of a test regarding differences between groups depends strongly on the size of the
smallest group.
2.4.3 Performance of the power computation procedure
An important question is whether the theoretical power computed using the formulae
presented in this paper agrees with the actual power obtained when applying the Wald test
to empirical data. To answer this question, we conducted a simulation study in which the theoretical
power is compared with the actual power in data sets generated from the assumed
population model. Note that the actual power equals the proportion of simulated data
sets in which the null hypothesis is rejected.
The population model is a 3-class LC model with six indicators and equal class sizes.
We varied the strength of the class-indicator associations (same three levels as above)
and the sample size (75, 100, 200, 300, 500, 700, and 1000). The actual power was
computed using 500 samples from the population under the alternative hypothesis. For
each of these samples, the LC model is estimated and it is checked whether the Wald
value for the test of interest exceeds the critical value.
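The simulated ("actual") power is a rejection rate, so it carries Monte Carlo error. The sketch below, a hypothetical helper rather than the original study code, computes the rejection proportion and its binomial standard error. The demonstration draws Wald values from a non-central chi-square distribution with 2 degrees of freedom (the df of H0: βj1 = βj2 = βj3 with C = 3) and an assumed non-centrality of 10; the critical value −2 log(.05) is the exact .95 quantile of a central chi-square with 2 df.

```python
import math
import random


def simulated_power(wald_values, critical_value):
    """Share of replications whose Wald value exceeds the critical value,
    plus the binomial Monte Carlo standard error of that share."""
    R = len(wald_values)
    rejections = sum(1 for w in wald_values if w > critical_value)
    p_hat = rejections / R
    se = math.sqrt(p_hat * (1.0 - p_hat) / R)
    return p_hat, se


if __name__ == "__main__":
    rng = random.Random(2015)
    lam = 10.0                         # hypothetical non-centrality parameter
    crit = -2.0 * math.log(0.05)       # chi-square(2) critical value, about 5.991
    # Non-central chi-square(2, lam): squares of N(sqrt(lam), 1) and N(0, 1)
    draws = [(rng.gauss(0, 1) + math.sqrt(lam)) ** 2 + rng.gauss(0, 1) ** 2
             for _ in range(500)]
    p_hat, se = simulated_power(draws, crit)
    print(f"simulated power {p_hat:.3f} (Monte Carlo s.e. {se:.3f})")
```

With 500 replications and a power near .84, this standard error is about .016, a rough yardstick for reading the theoretical versus simulated differences in Table 2.4.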
Table 2.4: Theoretical and simulated power of the Wald test

Class-indicator                  Sample size
associations    Method         75     100    200    300    500    700    1000
Weak            Theoretical   .200   .254   .470   .649   .869   .958   .994
                Simulated     .145   .234   .444   .628   .838   .920   .960
Medium          Theoretical   .762   .877   .995  1.000  1.000  1.000  1.000
                Simulated     .714   .848   .944   .992  1.000  1.000  1.000
Strong          Theoretical   .989   .999  1.000  1.000  1.000  1.000  1.000
                Simulated     .986  1.000  1.000  1.000  1.000  1.000  1.000

Note: The power presented here is for the null hypothesis H0: βj1 = βj2 = ... = βjC, for j = 1. Moreover, C = 3, P = 6, and class sizes are equal.
Table 2.4 presents the theoretical and actual power of the Wald test under the
investigated simulation conditions. As can be seen, both measures show the same overall
trend, namely that the power increases with increasing sample size and increasing effect
sizes (and class separation). However, the actual power of the Wald test is nearly always
slightly lower than its theoretical value, with larger differences for the smaller sample
sizes and the weaker class separation conditions. An explanation for these differences is
that the estimated asymptotic variance-covariance matrix used in the simulated power
computations overestimates the variability of the βj parameters. Nevertheless, the
substantive conclusions are the same for the simulated and theoretical power levels
reported in Table 2.4. With the small effect size and the corresponding weak class
separation condition, a sample size of 500 is needed to achieve a power of .8; with
the medium class separation, a sample size of 100 suffices; and with the strong class
separation, fewer than 75 observations are needed.
2.5 Discussion and conclusions
In LC analysis, the association between class membership and the response variables is
usually modeled using a logistic parametrization. This chapter dealt with power analysis
for Wald tests for these logit coefficients, for example, for the hypothesis of no association
between class membership and the response provided on one of the indicators. We showed
that, in addition to the usual design factors – that is, effect size, sample size, and level
of significance – the power of Wald tests in LC models depends on factors affecting the
amount of uncertainty about the subjects’ class memberships. More specifically, factors
affecting the class separation also affect the power. The most important of these LC-
specific design factors are the number of classes, the class sizes, the strength of the
class-indicator associations, and the number of indicator variables.
A numerical study was conducted to illustrate the proposed power and sample size
computation procedures. More precisely, it was shown how class separation – quantified
using the entropy-based R-square – is affected by the number of classes, the class sizes,
the strength of the class-indicator associations, and the number of indicator variables,
and, moreover, how class separation affects the power. It turned out that under the
most favorable conditions a sample size of 100 suffices to achieve a power of .8 or .9.
For the situation where the entropy-based R-square is small, a considerably larger sample
size is required. It was shown that under the least favorable conditions, even a sample
size of 1500 did not suffice to achieve an acceptable power level. This demonstrates the
importance of performing a power analysis prior to conducting a study that will make use
of LC analysis.
If power turns out to be too low given the planned sample size, instead of increasing
the sample size, one may try to increase the class separation, for example, by using a larger
number of indicators or, if possible, also by using indicators of a better quality. Note that
improving the quality of indicators has a dual effect on the power of the Wald test for
class-indicator associations: It increases both the effect size and the class separation. This
dual effect could be seen in our numerical study where we saw a dramatic reduction of
the required sample size when the θjt value increased from .7 to .9. In practice, improving
the quality of the indicators will not be easy, even in the type of confirmatory LC analyses
we were dealing with.
A simulation study was conducted to evaluate whether the theoretical power
corresponds with the actual power of the Wald test. It turns out that the estimated
power obtained with the formulae provided in this chapter is slightly larger than the
actual power, where we see a larger overestimation for smaller sample sizes and lower
power levels. This implies that, to be on the safe side, a sample size slightly larger than
the computed one may be used to achieve the specified power.
In this paper, we restricted ourselves to power computations for Wald tests. However,
likelihood-ratio tests are often used in LC models as well, either for testing the same kinds
of hypotheses as discussed here or for comparing models with different numbers of latent
classes. Future research will focus on power computation for likelihood-ratio tests in LC
models.
Another limitation of the current work is that we restricted ourselves to simple LC
models. In future work, we will investigate whether the methods discussed in this paper
can be extended to more complex LC models, such as LC models with covariates, latent
Markov models, mixture growth models, and mixture regression models.
Most of the simulation studies on LC and mixture modelling show that larger sample
sizes may be needed than those found with the power computation method described in
the current paper (see for example, Nylund et al. (2007), Tofighi and Enders (2008), and
C. Yang (2006)). Those studies are, however, about deciding on the number of classes,
whereas here we focus on the class-indicator association for a single response variable
assuming that the number of classes is known. Note also that these studies typically do
not look at significance testing, but at the performance of measures like the Bayesian
information criterion (BIC), which may have less power because of its penalty for model
complexity. Further research is needed on the power of statistical tests for deciding about
the number of classes, for example, of the bootstrapped likelihood-ratio test.
2.6 Appendix
2.6.1 Elements of the information matrix in an LC model for binary responses
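The elements of the information matrix I(Ψ), with Ψ′ = (π′, β′), equal minus the
expected value of the second-order partial derivatives of the log-likelihood function defined
in equation (2.3) with respect to the free parameters, divided by the sample size.
In an LC model, these have the following form:

$$I(\psi_l, \psi_q) = -\frac{1}{n} E\left(\frac{\partial^2 l(\Psi)}{\partial \psi_l \, \partial \psi_q}\right) = \sum_{i=1}^{2^P} \frac{\partial \log p(\mathbf{y}_i, \Psi)}{\partial \psi_l} \, \frac{\partial \log p(\mathbf{y}_i, \Psi)}{\partial \psi_q} \, p(\mathbf{y}_i, \Psi).$$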
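This shows that the computation of the information matrix requires the first-order partial
derivatives $\partial \log p(\mathbf{y}_i, \Psi) / \partial \psi_l$. For a class proportion πt and a class-specific response
logit βjt, these take the following form:

$$\frac{\partial \log p(\mathbf{y}_i, \Psi)}{\partial \pi_t} = \frac{p(X = t \mid \mathbf{y}_i)}{\pi_t} - \frac{p(X = C \mid \mathbf{y}_i)}{\pi_C}, \qquad \frac{\partial \log p(\mathbf{y}_i, \Psi)}{\partial \beta_{jt}} = p(X = t \mid \mathbf{y}_i)\,(y_{ij} - \theta_{jt}).$$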
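This yields the following forms for the entries of the sub-matrices I1, I2, I3, and I4:

$$I_1(\pi_t, \pi_s) = \sum_{i=1}^{2^P} \left(\frac{p(X = t \mid \mathbf{y}_i)}{\pi_t} - \frac{p(X = C \mid \mathbf{y}_i)}{\pi_C}\right)\left(\frac{p(X = s \mid \mathbf{y}_i)}{\pi_s} - \frac{p(X = C \mid \mathbf{y}_i)}{\pi_C}\right) p(\mathbf{y}_i, \Psi),$$

$$I_2(\pi_t, \beta_{jl}) = \sum_{i=1}^{2^P} \left(\frac{p(X = t \mid \mathbf{y}_i)}{\pi_t} - \frac{p(X = C \mid \mathbf{y}_i)}{\pi_C}\right) p(X = l \mid \mathbf{y}_i)\,(y_{ij} - \theta_{jl})\, p(\mathbf{y}_i, \Psi),$$

$$I_3(\beta_{jq}, \pi_s) = \sum_{i=1}^{2^P} p(X = q \mid \mathbf{y}_i)\,(y_{ij} - \theta_{jq}) \left(\frac{p(X = s \mid \mathbf{y}_i)}{\pi_s} - \frac{p(X = C \mid \mathbf{y}_i)}{\pi_C}\right) p(\mathbf{y}_i, \Psi),$$

$$I_4(\beta_{jq}, \beta_{kl}) = \sum_{i=1}^{2^P} p(X = q \mid \mathbf{y}_i)\,(y_{ij} - \theta_{jq})\, p(X = l \mid \mathbf{y}_i)\,(y_{ik} - \theta_{kl})\, p(\mathbf{y}_i, \Psi).$$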
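Note that $p(\mathbf{y}_i, \Psi) = \sum_{t=1}^{C} \pi_t \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}$ is the probability of response
pattern yi. Moreover, $p(X = t \mid \mathbf{y}_i) = \pi_t \, p(\mathbf{Y} = \mathbf{y}_i \mid X = t) / p(\mathbf{y}_i, \Psi)$ is the posterior class membership
probability, where $p(\mathbf{Y} = \mathbf{y}_i \mid X = t) = \prod_{j=1}^{P} \theta_{jt}^{y_{ij}} (1 - \theta_{jt})^{1 - y_{ij}}$ is the joint class-specific
probability.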
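The formulas above can be evaluated directly when P is small. The Python sketch below builds the per-observation information matrix of a two-class model by summing outer products of the score vector over all 2^P patterns, derives the per-observation non-centrality λ1 for the single-df hypothesis βj1 = βj2, and converts it into power and a required sample size using the normal representation of a 1-df non-central chi-square. The parameter ordering, the function names, and the θ values in the demonstration are illustrative assumptions; this is not the Latent GOLD implementation.

```python
import itertools
import math
from statistics import NormalDist


def lc_information(pi1, theta):
    """Per-observation expected information of a 2-class LC model, binary items.

    Order: (pi_1, beta_11, beta_12, ..., beta_P1, beta_P2); theta[j][t] is
    P(Y_j = 1 | X = t), beta_jt its logit, and pi_2 = 1 - pi_1.
    """
    pi = (pi1, 1.0 - pi1)
    P = len(theta)
    k = 1 + 2 * P
    info = [[0.0] * k for _ in range(k)]
    for y in itertools.product((0, 1), repeat=P):
        joint = [pi[t] * math.prod(theta[j][t] if y[j] else 1.0 - theta[j][t]
                                   for j in range(P)) for t in range(2)]
        p_y = sum(joint)
        post = [v / p_y for v in joint]
        score = [post[0] / pi[0] - post[1] / pi[1]]          # d log p / d pi_1
        for j in range(P):
            for t in range(2):
                score.append(post[t] * (y[j] - theta[j][t]))  # d log p / d beta_jt
        for a in range(k):
            for b in range(k):
                info[a][b] += score[a] * score[b] * p_y
    return info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wald_ncp_per_obs(pi1, theta, j):
    """lambda_1 for H0: beta_j1 = beta_j2 (contrast +1, -1 on item j)."""
    info = lc_information(pi1, theta)
    c = [0.0] * len(info)
    c[1 + 2 * j], c[2 + 2 * j] = 1.0, -1.0
    logit = lambda p: math.log(p / (1.0 - p))
    diff = logit(theta[j][0]) - logit(theta[j][1])
    var_diff = sum(ci * xi for ci, xi in zip(c, solve(info, c)))  # c' I^{-1} c
    return diff ** 2 / var_diff


def power_df1(ncp, alpha=0.05):
    """P(W > chi2_(1-alpha)(1)) for W ~ non-central chi-square(1, ncp)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)     # square root of the critical value
    r = math.sqrt(ncp)
    return nd.cdf(r - z) + nd.cdf(-r - z)


def required_n(ncp1, power=0.8, alpha=0.05):
    """Smallest n with power_df1(n * ncp1) >= power, via bisection on lambda."""
    lo, hi = 0.0, 1.0
    while power_df1(hi, alpha) < power:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if power_df1(mid, alpha) < power else (lo, mid)
    return math.ceil(hi / ncp1)


if __name__ == "__main__":
    theta = [[0.8, 0.2]] * 6              # hypothetical: P = 6, .80 versus .20
    lam1 = wald_ncp_per_obs(0.5, theta, j=0)
    print(f"lambda_1 = {lam1:.4f}, power at n = 100: {power_df1(100 * lam1):.3f}")
    print("n for power .80:", required_n(lam1, 0.8))
```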
2.6.2 An example of the Latent GOLD setup for Wald-based power computation
The Latent GOLD 5.0 (Vermunt & Magidson, 2013b) Syntax system implements the
power computation procedure described in this paper. In order to perform such a Wald
power computation, one should first create a small “example” data set; that is, a data
set with the structure of the data one is interested in. With six binary response variables
(y1 through y6), this file could be of the form:
y1 y2 y3 y4 y5 y6
0 0 0 0 0 0
which is basically a data set with a single observation with a response of 0 on all six
variables.
For this small data set, one defines the model of interest and requests the power or the
required sample size using the output options. This is done as follows using the Latent
GOLD “options”, “variables”, and “equations” sections:
options
output parameters standarderrors
WaldPower=<number> WaldTest='fileName';
variables
dependent y1 2, y2 2, y3 2, y4 2, y5 2, y6 2;
latent x nominal 2;
equations
x <- 1;
y1 - y6 <- 1 | x;
{0.0000000000
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361}
In the “variables” section, we define the variables which are in the model and also
their number of categories. These are the six response variables and the latent variable
“x”. The “equations” section specifies the logit equations defining the model of interest,
as well as the values of the population parameters. Note that the value 1.386294361 for
a logit coefficient corresponds to a conditional response probability of .80.
The “output” line in the “options” section lists the output requested. With
WaldPower=<number>, one requests a power or sample size computation. When the
number is between 0 and 1, the program reports the required sample size for that
power; when it is larger than 1, the program reports the power obtained with that
sample size. The optional statement WaldTest='fileName' can be used to define
user-specified Wald tests in addition to the tests provided by default. The linear
contrasts for the user-defined hypotheses of interest are defined in a text file.
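The logit values in the “equations” section follow from the intended conditional response probabilities. A small sketch of the conversion, assuming (as stated above) that the coefficient is the log-odds log(θ/(1 − θ)) of the response probability; the two probabilities are those of the example setup.

```python
import math


def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(b):
    """Probability implied by a logit b."""
    return 1.0 / (1.0 + math.exp(-b))


# Example item: P(y = 1 | class 1) = .80, P(y = 1 | class 2) = .20
print(f"{logit(0.80):.9f} {logit(0.20):.9f}")   # 1.386294361 -1.386294361
print(round(inv_logit(1.386294361), 6))          # 0.8
```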
CHAPTER 3
Statistical Power of Likelihood-Ratio and Wald Tests in Latent Class Models with Covariates
Abstract
This chapter discusses power and sample size computation for the likelihood-ratio and
Wald tests used to test the significance of covariate effects in latent class models with
covariates. For both tests, asymptotic distributions can be used; that is, the test statistic
can be assumed to follow a central chi-square distribution under the null hypothesis and a
non-central chi-square distribution under the alternative hypothesis. Power or sample size
computation using these asymptotic distributions requires specification of the non-centrality
parameter, which in practice is rarely known. We show how to calculate this non-centrality
parameter using a large simulated data set, that is, a data set generated according to the
model under the alternative hypothesis. Simulations are conducted to evaluate the adequacy
of the proposed power analysis methods, determine study design requirements for achieving
a certain power level, and compare the power of the likelihood-ratio and the Wald test. The
proposed power analysis methods turn out to perform very well over a broad range of
conditions. Moreover, an important factor affecting the power is the class separation,
implying that when class separation is low, rather large sample sizes are needed to achieve
a reasonable power level.

This chapter is in preparation for submission to a journal.
3.1 Introduction
In recent years, latent class (LC) analysis has become part of the standard statistical
toolbox of researchers in the social, behavioral, and health sciences. A considerable
number of articles have been published in which LC models are used (a) to identify
subgroups of subjects with similar behaviors, attitudes, or preferences, and (b) to
investigate whether the respondents’ class memberships can be explained by explanatory
variables such as age, gender, educational status, and treatment. This latter type of use
is often referred to as LC analysis with covariates or concomitant variables. Example
applications include the assessment of the effect of maternal education on latent classes
differing in health behavior (Collins & Lanza, 2010), of education and age on latent classes
with different political orientations (Hagenaars & McCutcheon, 2002), of age on latent
classes of crime delinquencies (Van der Heijden et al., 1996), and of paternal occupation
on latent classes with different gender-role attitudes (Yamaguchi, 2000). Methodological
aspects of LC analysis with covariates were addressed by, among others, Bandeen-Roche,
Miglioretti, Zeger, and Rathouz (1997), Dayton and Macready (1988), Formann
(1992), and Vermunt (1996).
As in standard logistic regression analysis, hypotheses about the effects of covariates
on the individuals’ latent class memberships are tested using either likelihood-ratio (LR)
or Wald tests (Agresti, 2007). Researchers planning to perform such tests often ask
questions such as: “What sample size do I need to detect a covariate effect of a certain
size?”, “If I want to test the effect of a covariate, should I worry about the number
and/or quality of the indicators used in the LC model?”, and “Should I use an LR or a Wald
test?” These questions can be answered by assessing the statistical power of the planned
tests; that is, by investigating the probability of correctly rejecting a null hypothesis when
the alternative is true. The aim of the current paper is to present power analysis methods
for the LR and the Wald test in LC models with covariates, as well as to assess the
data requirements for achieving an acceptable power level (say of .8 or larger). We also
compare the power of the LR and the Wald test for a range of design and population
characteristics.
Recently, power and sample size determination in LC and related models have received
increased attention in the literature. Gudicha et al. (in press) studied the power of the
Wald test for hypotheses on the association between the latent classes and the observed
indicator variable(s), and showed that power is strongly dependent on class separation.
Tein, Coxe, and Cham (2013) and Dziak, Lanza, and Tan (2014) studied statistical power
of tests used for determining the number of latent classes in latent profile and LC analysis,
respectively. To the best of our knowledge, no previous study has yet investigated power
analysis for LC analysis with covariates, nor compared the power of the LR and the Wald
test in LC analysis in general.
Hypotheses concerning covariate effects on latent classes may be tested using either
LR or Wald tests, but it is unknown which of these two types of tests is superior in this
context. While the LR test is generally considered to be superior (see, for example, Agresti
(2007) and Williamson, Lin, Lyles, and Hightower (2007)), the computational cost of the
LR test will typically be larger because it requires fitting both the null hypothesis and
the alternative hypothesis model, while the Wald test requires fitting only the alternative
hypothesis model. Note that when using LR tests, a null hypothesis model should be
estimated for each of the covariates, which can become rather time consuming given the
iterative nature of the parameter estimation in LC models and the need to use multiple sets
of starting values to prevent local maxima. A question of interest though is whether the
superiority of the LR test is substantial enough to outweigh the computational advantages
of the Wald test in the context of LC modeling with covariates.
For standard logistic regression analysis, various studies are available on power and
sample size determination for LR and Wald tests (Demidenko, 2007; Faul, Erdfelder,
Buchner, & Lang, 2009; Hsieh, Bloch, & Larsen, 1998; Schoenfeld & Borenstein, 2005;
Whittemore, 1981; Williamson et al., 2007). Here we not only build upon these studies,
but also investigate design aspects requiring special consideration when applying these
tests in the context of LC analysis. A logistic regression predicting latent classes differs
from a standard logistic regression in that the outcome variable, the individual’s class
membership, is unobserved, but determined indirectly using the responses on a set of the
indicator variables. This implies that factors affecting the uncertainty about the class
memberships, such as the number of indicators, the quality of indicators, and the number
of latent classes, will also affect the power and/or the required sample size (Gudicha et
al., in press).
In the next section, we introduce the LC model with covariates and discuss the LR
and Wald statistics for testing hypotheses about the logit parameters of interest. We then
present power computation methods for the LR and the Wald tests and provide a numerical
study illustrating the proposed power analysis methods. The chapter ends with a discussion
and conclusions.
3.2 The LC model with covariates
Let X be the latent class variable, C the number of latent classes, and x = 1, 2, 3, ..., C
the class labels. We denote the vector of P indicator variables by Y = (Y1, Y2, Y3, ..., YP ),
and the response of subject i (for i = 1, 2, 3, ..., n) to indicator variable Yj by yij
and to all the P indicator variables by yi. Denoting the value of subject i for covariate
Zk (for k = 1, 2, 3, ..., K) by zik, we define the LC model with covariates as follows:

$$p(\mathbf{y}_i \mid \mathbf{z}_i) = \sum_{x=1}^{C} p(X = x \mid \mathbf{Z} = \mathbf{z}_i) \prod_{j=1}^{P} p(Y_j = y_{ij} \mid X = x) \qquad (3.1)$$
where zi is the vector containing the scores of subject i on the K covariates. The term
p(X = x|Z = zi) represents the probability of belonging to class x given the covariate
values zi, and p(Yj = yij |X = x) is the conditional probability of choosing response yij
given membership of class x.
The LC model defined in equation (3.1) is based on the following assumptions. Firstly,
we assume that the latent classes are mutually exclusive and exhaustive; that is, each
individual is a member of one and only one of the C latent classes. The second assumption
is the local independence assumption, which specifies that the responses to the indicator
variables are independent given the class membership. For simplicity, we also assume that,
given the class membership, the covariates have no effect on the indicator variables.
The term p(X = x|Z = zi) in equation (3.1) is typically modeled by a multinomial
logistic regression equation (Magidson & Vermunt, 2004). Using the first class as the
reference category, we obtain:
$$p(X = x \mid \mathbf{Z} = \mathbf{z}_i) = \frac{\exp\left(\gamma_{0x} + \sum_{k=1}^{K} \gamma_{kx} z_{ik}\right)}{1 + \sum_{s=2}^{C} \exp\left(\gamma_{0s} + \sum_{k=1}^{K} \gamma_{ks} z_{ik}\right)},$$

where γ0x represents an intercept parameter and γkx a covariate effect. For each covariate,
we have C − 1 effect parameters. Assuming that the responses Yj are binary, the logistic
model for p(Yj = 1|X = x) may take the following form:

$$p(Y_j = 1 \mid X = x) = \frac{\exp(\beta_{jx})}{1 + \exp(\beta_{jx})}.$$
The γ parameters are sometimes referred to as the structural parameters, and the β
parameters as the measurement parameters. We denote the full set of model parameters
by Φ, which with binary responses is a column vector containing (K + 1)(C − 1) + C · P
non-redundant parameters.
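Equation (3.1), the multinomial logit for the class memberships, and the parameter count can be sketched in a few lines of Python. This is a minimal illustration for binary indicators, assuming class 1 as the reference category with its γ parameters fixed at zero; the parameter values in the demonstration (C = 3, K = 1, P = 4) are hypothetical.

```python
import itertools
import math


def class_probs(gamma0, gamma, z):
    """Multinomial logit p(X = x | z), class 1 as reference.

    gamma0[s] and gamma[s][k] belong to class s + 2 (s = 0, ..., C - 2);
    z[k] is the value of covariate k.
    """
    etas = [0.0] + [g0 + sum(gk * zk for gk, zk in zip(g, z))
                    for g0, g in zip(gamma0, gamma)]
    m = max(etas)                                   # for numerical stability
    w = [math.exp(e - m) for e in etas]
    total = sum(w)
    return [v / total for v in w]


def pattern_prob(y, z, gamma0, gamma, beta):
    """Equation (3.1) with binary items; beta[x][j] = logit P(Y_j = 1 | X = x)."""
    prob = 0.0
    for x, p_x in enumerate(class_probs(gamma0, gamma, z)):
        lik = p_x
        for j, y_j in enumerate(y):
            theta = 1.0 / (1.0 + math.exp(-beta[x][j]))
            lik *= theta if y_j == 1 else 1.0 - theta
        prob += lik
    return prob


def n_free_parameters(K, C, P):
    """(K + 1)(C - 1) structural plus C * P measurement parameters."""
    return (K + 1) * (C - 1) + C * P


if __name__ == "__main__":
    # Hypothetical values: C = 3, K = 1, P = 4
    g0, g = [0.2, -0.4], [[0.5], [1.0]]
    beta = [[1.4] * 4, [-1.4] * 4, [1.4, 1.4, -1.4, -1.4]]
    total = sum(pattern_prob(y, [0.3], g0, g, beta)
                for y in itertools.product((0, 1), repeat=4))
    print(round(total, 12), n_free_parameters(K=1, C=3, P=4))   # 1.0 16
```

Summing p(y|z) over all 2^P patterns for a fixed z returning 1 is a quick sanity check on such an implementation.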
The parameters of the LC model with covariates are typically estimated by means of
maximum likelihood (ML) estimation, in which the log-likelihood function

$$l(\Phi) = \sum_{i=1}^{n} \log p(\mathbf{y}_i \mid \mathbf{z}_i) \qquad (3.2)$$

is maximized using, for instance, the expectation-maximization (EM) algorithm. Inference
concerning the Φ parameters is based on the ML estimates $\hat{\Phi}$, which can be used for
hypothesis testing or confidence interval estimation. In the current work, we focus on
testing hypotheses about the γ parameters, the most common of which is testing the
statistical significance of the effect of covariate k on the latent class memberships. The
corresponding null hypothesis can be formulated as
$$H_0: \boldsymbol{\gamma}_k = \mathbf{0},$$

which specifies that the γkx values in γ′k = (γk1, γk2, γk3, ..., γk(C−1)) are simultaneously
zero.² Using either the LR or the Wald test, this null hypothesis is tested against the
alternative hypothesis

$$H_1: \boldsymbol{\gamma}_k \neq \mathbf{0}.$$
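Following Agresti (2007) and Buse (1982), we define the LR and the Wald statistic for
this test as follows:

$$LR = 2l(\hat{\Phi}_1) - 2l(\hat{\Phi}_0), \qquad W = \hat{\boldsymbol{\gamma}}_k' \, \widehat{\mathrm{var}}(\hat{\boldsymbol{\gamma}}_k)^{-1} \, \hat{\boldsymbol{\gamma}}_k, \qquad (3.3)$$

where l(.) is the log-likelihood function defined in equation (3.2), $\hat{\Phi}_1$ and $\hat{\Phi}_0$ are the ML
estimates of Φ under the unconstrained alternative and the constrained null model, respectively,
$\hat{\boldsymbol{\gamma}}_k$ contains the ML estimates of the logit coefficients of covariate Zk, and $\widehat{\mathrm{var}}(\hat{\boldsymbol{\gamma}}_k)$ is the
(C − 1) × (C − 1) estimated covariance matrix of $\hat{\boldsymbol{\gamma}}_k$.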
Large-sample theory indicates that, under certain regularity conditions, if the null
hypothesis holds, both the LR and W statistics asymptotically follow a central chi-square
distribution with C − 1 degrees of freedom (see, for example, Agresti (2007), Buse (1982),
and Wald (1943)). From this theoretical distribution, the p-value can be obtained. Whether
the null hypothesis is rejected or retained is decided by comparing the obtained p-value
with the nominal type I error rate α. The decision rule is to reject the null hypothesis if
the p-value is smaller than α or, equivalently, if the value of the test statistic computed
using equation (3.3) exceeds the critical value of the central chi-square distribution with
C − 1 degrees of freedom at a type I error rate of α.
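As an illustration of equation (3.3) and the decision rule, the sketch below computes both statistics and their p-values. To stay within the Python standard library, it uses the closed-form central chi-square tails for 1 and 2 degrees of freedom only (that is, C = 2 or C = 3); all estimates in the demonstration are hypothetical numbers, not results from this chapter.

```python
import math


def chi2_sf(x, df):
    """Upper tail of a central chi-square; closed forms for df = 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers df = 1 or 2 (C = 2 or 3)")


def lr_statistic(loglik_h1, loglik_h0):
    """LR = 2 l(Phi_1) - 2 l(Phi_0), equation (3.3)."""
    return 2.0 * loglik_h1 - 2.0 * loglik_h0


def wald_statistic_2x2(g, V):
    """W = g' V^{-1} g for C = 3, i.e. a 2-vector g and a 2 x 2 covariance V."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    inv = [[V[1][1] / det, -V[0][1] / det],
           [-V[1][0] / det, V[0][0] / det]]
    return sum(g[a] * inv[a][b] * g[b] for a in range(2) for b in range(2))


def decide(statistic, df, alpha=0.05):
    """Return (p-value, reject H0?) using the central chi-square reference."""
    p_value = chi2_sf(statistic, df)
    return p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical estimates for one covariate in a 3-class model (df = C - 1 = 2)
    gamma_hat = [0.5, 0.8]
    cov_hat = [[0.04, 0.01], [0.01, 0.05]]
    w = wald_statistic_2x2(gamma_hat, cov_hat)
    print(f"W = {w:.2f}", decide(w, df=2))
    print(decide(lr_statistic(-1520.4, -1525.1), df=2))
```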
² For parameter identification, the logit parameter associated with the reference category is set to
zero, resulting in C − 1 non-redundant γ effect parameters. Note also that γ′ denotes the transpose
of a column vector γ.
3.3 Power and sample size computations
For power or sample size computation, we need not only the distribution of the test statistic
under the null hypothesis, but also its distribution under the alternative hypothesis. Under
certain regularity conditions, if the alternative hypothesis holds, both the LR and the Wald
statistics follow a non-central chi-square distribution with C − 1 degrees of freedom and
non-centrality parameter λ:

$$\lambda^{LR}_n = n\left(2E[l(\hat{\Phi}_1)] - 2E[l(\hat{\Phi}_0)]\right), \qquad \lambda^{W}_n = n\left(\boldsymbol{\gamma}_k' \, \mathrm{var}(\hat{\boldsymbol{\gamma}}_k)^{-1} \boldsymbol{\gamma}_k\right). \qquad (3.4)$$

Here, $E[l(\hat{\Phi}_1)]$ and $E[l(\hat{\Phi}_0)]$ denote the expected value of the log-likelihood for a
single observation under the alternative and null model, respectively, assuming that the
alternative model holds. In the definition of $\lambda^{W}_n$, $\mathrm{var}(\hat{\boldsymbol{\gamma}}_k)$ is the covariance matrix of the
parameters based on the expected information matrix for a single observation. For the
Wald test, this large-sample asymptotic approximation requires multivariate normality of
the ML estimates of the logit parameters, as well as that $\mathrm{var}(\hat{\boldsymbol{\gamma}}_k)$ is consistently estimated
by $\widehat{\mathrm{var}}(\hat{\boldsymbol{\gamma}}_k)$ (Redner, 1981; Satorra & Saris, 1985; Wald, 1943).
The power of a test is defined as the probability that the null hypothesis is rejected
when the alternative hypothesis is true. Using the theoretical distributions of the LR and
Wald statistics under the alternative hypothesis, we calculate this probability as

$$\text{power}_{LR} = p\left(LR > \chi^2_{(1-\alpha)}(C - 1)\right), \qquad \text{power}_{W} = p\left(W > \chi^2_{(1-\alpha)}(C - 1)\right), \qquad (3.5)$$

where $\chi^2_{(1-\alpha)}(C - 1)$ is the (1 − α) quantile of the central chi-square distribution
with C − 1 degrees of freedom, and LR and W are random variates of the corresponding
non-central chi-square distribution; that is, $LR, W \sim \chi^2(C - 1, \lambda)$, where λ is as defined
in equation (3.4).
Computing the asymptotic power (also called the theoretical power) using equation
(3.5) requires us to specify the non-centrality parameter. However, in practice, this
non-centrality parameter is rarely known. Below, we show how to obtain the non-centrality
parameter using a large simulated data set, that is, a data set generated from the model
under the alternative hypothesis.
3.3.1 Calculating the non-centrality parameter
O’Brien (1986) and Self, Mauritsen, and Ohara (1992) showed how to obtain the
non-centrality parameter for the LR statistic in log-linear and generalized linear models
using a so-called “exemplary” data set representing the population under the alternative
model. In LC analysis with covariates, such an exemplary data set would contain one
record for each possible combination of indicator variable responses and covariate values,
with a weight equal to the likelihood of occurrence of the pattern concerned. Creating
such an exemplary data set becomes impractical with more than a few indicator variables,
with indicator variables with larger numbers of categories, and/or when one or more
continuous covariates are involved. As an alternative, we propose using a large simulated
data set from the population under the alternative hypothesis. Though such a simulated
data set will typically not include all possible response patterns, if it is large enough, it
will serve as a good approximation of the population under H1.
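Step 1 of the procedure that follows requires drawing data from the H1 model. The sketch below generates such records for a C-class model with a single covariate. The standard normal covariate distribution, the function name, and the parameter values are assumptions made for illustration; estimating the H0 and H1 models on the generated data would still require LC software such as Latent GOLD.

```python
import math
import random


def simulate_h1_data(N, gamma0, gamma1, theta, seed=1):
    """Draw N records (y, z) from a C-class LC model with one covariate.

    gamma0[s], gamma1[s]: intercept and slope of class s + 2 (class 1 = reference)
    theta[x][j]: P(Y_j = 1 | X = x)
    Assumption for illustration: the covariate z is standard normal.
    """
    rng = random.Random(seed)
    C = len(theta)
    data = []
    for _ in range(N):
        z = rng.gauss(0.0, 1.0)
        weights = [1.0] + [math.exp(a + b * z) for a, b in zip(gamma0, gamma1)]
        x = rng.choices(range(C), weights=weights)[0]     # latent class draw
        y = tuple(1 if rng.random() < p else 0 for p in theta[x])
        data.append((y, z))
    return data


if __name__ == "__main__":
    # Hypothetical H1 population. The text suggests N of about 1,000,000;
    # a smaller N keeps this demonstration quick.
    theta = [[0.8] * 6, [0.2] * 6, [0.8] * 3 + [0.2] * 3]
    data = simulate_h1_data(10_000, gamma0=[0.0, 0.0], gamma1=[0.5, 1.0], theta=theta)
    print(len(data), data[0])
```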
By analyzing the large simulated data set using the H0 and H1 models, we obtain
the values of the log-likelihood function under the null and alternative hypotheses. The
large data set can also be used to get the covariance matrix of the parameters based
on the expected information matrix. These can be used to calculate the non-centrality
parameters for the LR and Wald statistics as shown in equation (3.4). More specifically,
the non-centrality parameter is calculated, using this large simulated data set, via the
following simple steps:
1. Create a large data set by generating, say, N = 1,000,000 observations from the model
defined by the alternative hypothesis.
2. Using this large simulated data set, compute the maximum value of the log-likelihood
for both the constrained null model and the unconstrained alternative model. These
log-likelihood values are denoted by $l(\hat{\Phi}_0)$ and $l(\hat{\Phi}_1)$, respectively. For the Wald test,
use the large simulated data set to approximate the expected information matrix under
the alternative model. This yields $\widehat{\mathrm{var}}(\hat{\boldsymbol{\gamma}}_k)$, the approximate covariance matrix of $\hat{\boldsymbol{\gamma}}_k$.
3. The non-centrality parameter corresponding to a sample of size 1 is then computed
as follows:

$$\lambda^{LR}_1 = \frac{2l(\hat{\Phi}_1) - 2l(\hat{\Phi}_0)}{N} \quad \text{and} \quad \lambda^{W}_1 = \frac{\hat{\boldsymbol{\gamma}}_k' \, \widehat{\mathrm{var}}(\hat{\boldsymbol{\gamma}}_k)^{-1} \hat{\boldsymbol{\gamma}}_k}{N}$$

for the LR and Wald test, respectively. As can be seen, this involves computing the LR
and the Wald statistics using the information from step 2, and subsequently rescaling the
resulting values to a sample size of 1.
4. Using the proportionality relation between sample size and non-centrality parameter
shown in equation (3.4), the non-centrality parameter associated with a sample of size n
is then computed as $\lambda^{LR}_n = n\lambda^{LR}_1$ and $\lambda^{W}_n = n\lambda^{W}_1$ (Brown, Lovato, & Russell, 1999;
McDonald & Marsh, 1990; Satorra & Saris, 1985).
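Steps 3 and 4 are simple rescalings. The sketch below applies them to hypothetical maximized log-likelihood values for a data set with N = 1,000,000; the same two (hypothetically named) functions apply unchanged to a Wald value computed from the large data set.

```python
def ncp_per_observation(statistic_large_n, N):
    """Step 3: rescale an LR or Wald value from the large data set to n = 1."""
    return statistic_large_n / N


def ncp_for_sample_size(ncp1, n):
    """Step 4: lambda_n = n * lambda_1."""
    return n * ncp1


if __name__ == "__main__":
    N = 1_000_000
    # Hypothetical maximized log-likelihoods of the H1 and H0 models on the large data set
    loglik_h1, loglik_h0 = -3_412_345.6, -3_418_901.2
    lr_large = 2.0 * loglik_h1 - 2.0 * loglik_h0       # about 13111.2
    lam1 = ncp_per_observation(lr_large, N)
    for n in (100, 300, 500):
        print(n, round(ncp_for_sample_size(lam1, n), 4))
```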
3.3.2 Power computation
The power computation itself proceeds as follows:
1. Given the assumed population values under the alternative hypothesis, compute the
non-centrality parameter λ using the large simulated data set as discussed in Section
3.3.1. Make sure that the non-centrality parameter is rescaled to the sample size
under consideration, as shown in step 4 of Section 3.3.1.
2. For a given type I error rate α, read the (1 − α) quantile from the (central) chi-square
distribution with C − 1 degrees of freedom. That is, find $\chi^2_{(1-\alpha)}(C - 1)$ such that,
under the null hypothesis, $p(LR > \chi^2_{(1-\alpha)}(C - 1)) = \alpha$ and $p(W > \chi^2_{(1-\alpha)}(C - 1)) = \alpha$
for the LR and Wald test statistics, respectively. This quantile, also called the critical
value, can be read from the (central) chi-square distribution table available in most
statistics textbooks. For example, for α = .05 and C = 2, we have $\chi^2_{(.95)}(1) = 3.84$
(Agresti, 2007).
3. Using the non-centrality parameter value obtained in step 1 and the critical value
obtained in step 2, evaluate equation (3.5) to obtain the power of the LR or Wald
test of interest. This involves reading the probability concerned from a non-central
chi-square distribution with C − 1 degrees of freedom and non-centrality parameter λ.
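The three steps can be carried out without statistical tables. The Python sketch below is a standard-library implementation under stated assumptions: the central chi-square CDF uses the power series of the regularized incomplete gamma function, the critical value is found by bisection, and the non-central tail probability is computed as a Poisson mixture of central chi-square distributions, which is accurate for moderate λ. The per-observation λ1 in the demonstration is hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > total * 1e-16 and n < 100_000:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def chi2_cdf(x, df):
    """Central chi-square CDF."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_ppf(q, df):
    """Central chi-square quantile by bisection (the critical value in step 2)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if chi2_cdf(mid, df) < q else (lo, mid)
    return (lo + hi) / 2.0


def ncx2_sf(x, df, ncp):
    """P(W > x) for W ~ non-central chi-square(df, ncp), written as a
    Poisson(ncp / 2) mixture of central chi-squares with df + 2j df.
    Intended for moderate ncp (exp(-ncp / 2) must not underflow)."""
    half = ncp / 2.0
    weight = math.exp(-half)
    cdf, j = 0.0, 0
    while True:
        cdf += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
        if (j > half and weight < 1e-17) or j > 10_000:
            break
    return max(0.0, 1.0 - cdf)


def power(ncp, df, alpha=0.05):
    """Equation (3.5): probability that the statistic exceeds the critical value."""
    return ncx2_sf(chi2_ppf(1.0 - alpha, df), df, ncp)


if __name__ == "__main__":
    lam1 = 0.0131112                     # hypothetical per-observation lambda
    for n in (200, 500, 1000):
        print(n, round(power(n * lam1, df=2), 3))
```

A library routine such as a non-central chi-square survival function would serve equally well; the hand-written version only keeps the sketch dependency-free.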
3.3.3 Sample size computation
The expression for sample size computation can be derived from the relation in equation
(3.4):
$$n_{LR} = \lambda \left\{2E[l(\Phi_1)] - 2E[l(\Phi_0)]\right\}^{-1}, \qquad n_{W} = \lambda \left[\gamma_k' \, var(\gamma_k)^{-1} \gamma_k\right]^{-1}, \qquad (3.6)$$
where $n_{LR}$ and $n_{W}$ are the LR and Wald sample sizes, respectively.
Using equation (3.6), the sample size required to achieve a specified level of power is
computed as follows:
1. For a given value of α, read the (1 − α) quantile value from the central chi-square
distribution table.
2. For a given power and the critical value obtained in step 1, find the non-centrality
parameter λ such that, under the alternative hypothesis, the power equals
$p(LR > \chi^2_{1-\alpha}(C-1))$ for the LR statistic and $p(W > \chi^2_{1-\alpha}(C-1))$
for the Wald statistic.
3. Given the parameter values of the model under the alternative hypothesis and the
λ value obtained in step 2, use equation (3.6) to compute the required sample size.
48 CHAPTER 3. POWER IN LC MODELS WITH COVARIATES
Note that for sample size computation a large simulated data set is used as well to
approximate $E[l(\Phi_0)]$, $E[l(\Phi_1)]$, and $var(\gamma_k)$.
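Steps 1 to 3 can be sketched in the same spirit: bisection on λ until the power from the non-central chi-square distribution reaches the target, followed by equation (3.6) written as $n = \lambda / \lambda_1$, where $\lambda_1$ is the per-observation non-centrality parameter from section 3.3.1. This is a minimal sketch, not the thesis software; the chi-square helpers are included so the block runs on its own, and `lambda_1 = 0.01` is a hypothetical value.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15, max_terms=100_000):
    """Regularized lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, math.exp(a * math.log(x) - x - math.lgamma(a)) * total)


def chi2_cdf(x, df):
    return reg_lower_gamma(df / 2.0, x / 2.0)


def ncx2_cdf(x, df, lam):
    """Non-central chi-square CDF as a Poisson(lam / 2) mixture of central CDFs."""
    if lam <= 0.0:
        return chi2_cdf(x, df)
    half = lam / 2.0
    total = 0.0
    for j in range(int(half + 10.0 * math.sqrt(half) + 50) + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * chi2_cdf(x, df + 2 * j)
    return total


def bisect_increasing(f, target, lo=0.0, hi=1.0, steps=100):
    """Smallest x (to bisection accuracy) with f(x) >= target, for increasing f."""
    while f(hi) < target:
        hi *= 2.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def required_ncp(target_power, df, alpha=0.05):
    # Step 1: critical value of the central chi-square distribution
    critical = bisect_increasing(lambda x: chi2_cdf(x, df), 1.0 - alpha)
    # Step 2: lambda at which the non-central chi-square power reaches the target
    return bisect_increasing(lambda lam: 1.0 - ncx2_cdf(critical, df, lam), target_power)


def required_sample_size(lambda_1, target_power, df, alpha=0.05):
    """Step 3, equation (3.6): n = lambda / lambda_1, rounded up to whole cases."""
    return math.ceil(required_ncp(target_power, df, alpha) / lambda_1)


lambda_1 = 0.01  # hypothetical per-observation NCP from a large simulated data set
for target in (0.8, 0.9, 0.95):
    print(target, required_sample_size(lambda_1, target, df=1))
```

Because n is proportional to λ for a fixed $\lambda_1$, the ratio between the sample sizes required for two power levels depends only on α and the degrees of freedom. With α = .05 the sketch gives ratios of about 1.34 (power .9 versus .8) and 1.66 (.95 versus .8) for one degree of freedom, and about 1.31 and 1.60 for two, which is the pattern visible across the power columns of Table 3.5 for C = 2 and C = 3.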
3.4 Numerical study
The purpose of this numerical study is to 1) compare the power of the Wald test with the
power of the LR test, 2) investigate the effect of factors influencing the uncertainty about
the individuals’ class memberships (mainly the measurement parameters) on the power of
the Wald and LR tests concerning the structural parameters, 3) evaluate the quality of
the power estimation using the non-centrality parameter value obtained from the large
simulated data set, and 4) give an overview of the sample sizes required to achieve a power
level of .8 or higher, .9 or higher, or .95 or higher in several typical study designs. In the
current numerical study, we consider models with one covariate only, but the proposed
methods are also applicable with multiple covariates. We assume asymptotic distributions
for both tests, and estimate the non-centrality parameter of the non-central chi-square
distribution using the large data set method described earlier. All analyses were done using
the syntax module of the Latent GOLD 5.0 program (Vermunt & Magidson, 2013a).
3.4.1 Study set up
The power of a test concerning the structural parameters is expected to depend on three
key factors: the population structure and the parameter values for the other parts of
the model, the effect sizes for the structural parameters to be tested, and the sample
size. Important elements of the first factor include the number of classes, the number
of indicator variables, the class-specific conditional response probabilities, and the class
proportions (Gudicha et al., in press). In this numerical study, we varied the number of
classes (C = 2 or 3) and the number of indicator variables (P = 6 or 10). Moreover, the
class-specific conditional response probabilities were set to 0.7, 0.8, or 0.9 (or, depending
on the class, to 1-0.7, 1-0.8, and 1-0.9), corresponding to the conditions with weak,
medium, and strong class-indicator associations. The conditional response probabilities
3.4. NUMERICAL STUDY 49
were assumed to be high for class 1, say 0.8, and low for class C, say 1-0.8, for all
indicators. In class 2 of the three-class model, the conditional response probabilities are
high for the first half and low for the second half of the indicators.
The effect size was varied for the structural parameters to be tested, that is, for the
logit coefficients that specify the effect of a continuous covariate Z on the latent class
memberships (see equation (3.2) above). Using the first class as the reference category,
the logit coefficients were set to 0.15, 0.25, and 0.5, representing the three conditions of
small, medium, and large effect sizes. Two conditions were used for the intercept terms:
in the zero intercept condition, the intercepts were set to zero for both C = 2 and C = 3,
while in the non-zero intercept condition the intercepts equaled -1.10 for C = 2, and -1.10
and -2.20 for C = 3. Note that the zero intercept condition yields equal class proportions
(i.e., .5 each for C = 2 and .33 each for C = 3), whereas the non-zero intercept condition
yields unequal class proportions (i.e., .75 and .25 for C = 2, and .69, .23, and .08 for
C = 3).
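The class proportions quoted above follow from the multinomial logit for class membership with the first class as reference category. A minimal sketch, assuming the proportions are evaluated at Z = 0 (for example with a centered covariate), which is our reading of the text; the function name `class_proportions` is hypothetical.

```python
import math


def class_proportions(intercepts, slopes=None, z=0.0):
    """p(X = c | z) from a multinomial logit with class 1 as reference (all parameters 0).

    intercepts and slopes hold the logit parameters for classes 2, ..., C.
    """
    slopes = slopes if slopes is not None else [0.0] * len(intercepts)
    logits = [0.0] + [a + b * z for a, b in zip(intercepts, slopes)]
    top = max(logits)  # subtracting the maximum keeps exp() from overflowing
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


print([round(p, 2) for p in class_proportions([-1.10])])         # [0.75, 0.25]
print([round(p, 2) for p in class_proportions([-1.10, -2.20])])  # [0.69, 0.23, 0.08]
```

Passing non-zero `slopes` and a value of `z` shows how a covariate effect such as 0.25 shifts the class proportions away from these baseline values.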
In addition to the above mentioned population characteristics, we varied the sample
size (n = 200, 500, or 1000) for the power computations. Likewise, for the sample size
computations, we varied the power values (power = .8, .9, or .95). The type I error was
fixed to .05 in all conditions.
Gudicha et al. (in press) showed that a study design with low separation between
classes leads to low statistical power of tests concerning the measurement parameters
in an LC model. Therefore, Table 3.1 shows the entropy R-square, which measures the
separation between classes, for the design conditions of interest.
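The entropy R-square is not defined in this section. A commonly used definition, and the one assumed in the sketch below, is $R^2 = 1 - \sum_i \sum_c (-p_{ic} \ln p_{ic}) / (n \ln C)$, where $p_{ic}$ is the posterior probability that case i belongs to class c; it equals 1 for perfectly separated classes and 0 when every posterior is uniform. The thesis may compute an equivalent population version, and the function name is hypothetical.

```python
import math


def entropy_r_squared(posteriors):
    """1 - sum_i sum_c (-p_ic ln p_ic) / (n ln C) for an n x C list of posteriors."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    entropy = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # 0 * ln(0) is taken as 0
                entropy -= p * math.log(p)
    return 1.0 - entropy / (n * math.log(n_classes))


print(entropy_r_squared([[0.95, 0.05], [0.10, 0.90], [0.99, 0.01]]))
```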
3.4.2 Results
Tables 3.2, 3.3, and 3.4 present the power of the Wald and LR tests for different sample
sizes, class-indicator associations, numbers of indicator variables, class proportions, and
effect sizes. Several important points can be noted from these tables. Firstly, the power
of the Wald and LR tests increases with sample size and effect size, which is also the
case for standard statistical models (e.g., logistic regression for an observed outcome
variable). Secondly, specific to LC models, the power of these tests is larger with stronger
class-indicator associations, a larger number of indicator variables, and more balanced
class proportions. These LC-specific factors also affect the class separation, as can be
seen from Table 3.1. Comparing the power values in Tables 3.2 and 3.3, we also observe
that the statistical power of the tests depends on the number of classes as well. Thirdly,
the power of the LR test is generally larger than that of the Wald test, though in most
cases the differences are rather small.

Table 3.1: The computed entropy R-square for different design cells
(columns: class-indicator associations within each class-proportion condition)
                 equal class proportions    unequal class proportions
                 weak    medium  strong     weak    medium  strong
C = 2, P = 6     .574    .855    .981       .534    .838    .978
C = 2, P = 10    .732    .935    .997       .704    .944    .998
C = 3, P = 6     .354    .650    .900       .314    .618    .878
C = 3, P = 10    .502    .805    .969       .462    .782    .963
Note. C = the number of classes; P = the number of indicator variables. The entropy R-square values reported in this table pertain to the model with small effect sizes for the covariate effects; these values increase slightly when the effect sizes are larger.
The results in Tables 3.2, 3.3, and 3.4 suggest that, for a given effect size, a desired
power level of say .8 or higher can be achieved by using a larger sample, more indicator
variables, or, if possible, indicator variables that have a stronger association with the
respective latent classes. Given a set of often unchangeable population characteristics
(e.g., the class proportions, the class conditional response probabilities, and the effect
sizes of the covariate effects on latent class memberships), the most common research
practice to increase the power of a test is to increase the sample size. Table 3.5 presents
the required sample size for the Wald test to achieve a power of .8, .9, and .95 under
the investigated conditions. As can be seen from Table 3.5, for the situation where the
class proportions are equal, the number of response variables is equal to 6, the number
of classes is equal to 2, and the class-indicator associations are strong, a power of 0.80 or
higher is achieved 1) for a small effect size, using a sample of size 1434, 2) for a medium
effect size, using a sample of size 527, and 3) for a large effect size, using a sample of size
143. When the class-indicator associations are weaker, the class proportions are unequal,
or the requested power is .9 or .95, the required sample sizes become even larger. We also observe
from the same table that in 3-class LC models with 6 indicator variables and strong class-
indicator associations, a power of .80 or higher is achieved by using sample sizes of 2120,
777, and 210 for small, medium and large effect sizes, respectively.
To assess the accuracy of the proposed power analysis method, we also calculated the
empirical power by Monte Carlo simulation. Using the critical value from the theoretical
central chi-square distribution, we computed the empirical power as the proportion of
5000 samples, generated from the population under the alternative hypothesis, in which
the null hypothesis was rejected. In Table 3.6, we refer to this empirical power as 'LR
empirical' and 'Wald empirical', indicating the power values computed from the empirical
distribution of the LR and Wald statistics under the alternative hypothesis. We report results for the study
conditions with a small effect size and equal class proportions, but similar results were
obtained for the other conditions. Comparison of the theoretical with the corresponding
empirical power values shows that these are very close in most cases, meaning that the
approximation of the non-centrality parameter using the large simulated data set works
well. Overall, the differences between the theoretical and empirical power values are small,
with a few exceptions, which are situations in which the power is very low anyhow. The
exceptions occur when the class-indicator associations are weak in 2-class LC models with
6 indicator variables and in 3-class LC models with 6 as well as 10 indicator variables,
which in Table 3.1 correspond to the design conditions with entropy R-square values of
.574, .354, and .502, respectively.
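The empirical power computation can be illustrated with a stand-in statistic. In the study itself each replication generates an LC data set under the alternative hypothesis and refits the models; here, purely as an assumption for illustration, a one-degree-of-freedom statistic is drawn directly as $(Z + \sqrt{\lambda})^2$ with standard normal Z, which follows a non-central chi-square distribution with non-centrality parameter λ. The Monte Carlo rejection rate over 5000 draws can then be compared with the exact power. The value of λ and the function names are hypothetical.

```python
import math
import random


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def exact_power_df1(lam, critical):
    """P(W > critical) for W = (Z + sqrt(lam))^2, a non-central chi-square with 1 df."""
    root_c, root_l = math.sqrt(critical), math.sqrt(lam)
    return normal_cdf(root_l - root_c) + normal_cdf(-root_l - root_c)


def empirical_power(draw_statistic, critical, replications=5000, seed=2015):
    """Proportion of simulated samples in which the test rejects the null hypothesis."""
    rng = random.Random(seed)
    rejections = sum(draw_statistic(rng) > critical for _ in range(replications))
    return rejections / replications


CRITICAL = 3.841458820694124  # .95 quantile of the central chi-square with 1 df
LAM = 2.0                     # hypothetical non-centrality parameter


def draw_wald(rng):
    # Stand-in for: simulate a data set under H1, fit both models, compute the statistic
    return (rng.gauss(0.0, 1.0) + math.sqrt(LAM)) ** 2


print(exact_power_df1(LAM, CRITICAL), empirical_power(draw_wald, CRITICAL))
```

With 5000 replications the Monte Carlo standard error of a power near .3 is roughly .006, so differences between theoretical and empirical values much larger than that, as in the weak-association conditions of Table 3.6, point to a genuine departure from the asymptotic approximation rather than simulation noise.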
Table 3.2: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 2-class model; the case of equal class proportions
(columns within each sample size: class-indicator associations)
                     n = 200                n = 500                n = 1000
effect size  test    weak   medium strong   weak   medium strong   weak   medium strong
Six indicator variables
small        Wald    .125   .164   .181     .242   .338   .379     .429   .587   .645
             LR      .126   .166   .180     .245   .343   .377     .434   .594   .645
medium       Wald    .269   .363   .408     .546   .721   .779     .835   .945   .971
             LR      .260   .369   .411     .548   .729   .784     .836   .953   .973
large        Wald    .702   .868   .913     .976   .998   1        1      1      1
             LR      .743   .885   .923     .985   .998   1        1      1      1
Ten indicator variables
small        Wald    .147   .177   .184     .297   .369   .385     .523   .633   .655
             LR      .151   .176   .181     .307   .367   .380     .539   .63    .647
medium       Wald    .319   .397   .412     .653   .766   .786     .914   .967   .974
             LR      .315   .402   .422     .647   .773   .796     .91    .969   .976
large        Wald    .812   .903   .917     .994   .999   .999     1      1      1
             LR      .837   .918   .930     .996   .999   .999     1      1      1
Note. The power values reported in this table are obtained by assuming theoretical chi-squared distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.
Table 3.3: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership in the 3-class model; the case of equal class proportions
(columns within each sample size: class-indicator associations)
                     n = 200                n = 500                n = 1000
effect size  test    weak   medium strong   weak   medium strong   weak   medium strong
Six indicator variables
small        Wald    .081   .106   .125     .131   .200   .252     .222   .365   .464
             LR      .080   .108   .126     .130   .206   .255     .221   .377   .471
medium       Wald    .135   .214   .272     .281   .478   .599     .517   .789   .894
             LR      .140   .215   .272     .295   .48    .600     .540   .792   .894
large        Wald    .365   .642   .779     .752   .967   .994     .968   1      1
             LR      .436   .686   .810     .837   .978   .996     .989   1      1
Ten indicator variables
small        Wald    .089   .118   .130     .155   .233   .265     .272   .430   .49
             LR      .092   .119   .133     .163   .236   .274     .289   .436   .504
medium       Wald    .163   .252   .287     .353   .559   .628     .632   .864   .913
             LR      .178   .263   .290     .391   .583   .632     .686   .882   .915
large        Wald    .471   .738   .807     .871   .989   .996     .994   1      1
             LR      .571   .772   .823     .938   .993   .997     .999   1      1
Note. The power values reported in this table are obtained by assuming theoretical chi-squared distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.
Table 3.4: The power of the Wald and the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on class membership; the case of unequal class proportions and six indicator variables
(columns within each sample size: class-indicator associations)
                     n = 200                n = 500                n = 1000
effect size  test    weak   medium strong   weak   medium strong   weak   medium strong
2-class model
small        Wald    .102   .133   .148     .183   .263   .299     .319   .465   .525
             LR      .103   .136   .153     .185   .268   .312     .322   .475   .547
medium       Wald    .195   .283   .322     .411   .590   .658     .688   .872   .918
             LR      .197   .282   .331     .414   .590   .674     .693   .871   .926
large        Wald    .549   .761   .826     .909   .988   .996     .995   1      1
             LR      .590   .783   .844     .933   .991   .997     .998   1      1
3-class model
small        Wald    .077   .100   .120     .120   .185   .238     .198   .334   .439
             LR      .076   .101   .121     .119   .188   .242     .197   .34    .447
medium       Wald    .125   .197   .257     .253   .439   .570     .467   .746   .873
             LR      .127   .208   .267     .257   .465   .593     .474   .775   .889
large        Wald    .337   .600   .751     .712   .951   .990     .945   .999   1
             LR      .387   .641   .785     .782   .966   .994     .977   1      1
Note. The power values reported in this table are obtained by assuming theoretical chi-squared distributions for both the Wald and the likelihood-ratio test statistics, for which the non-centrality parameter of the non-central chi-square is approximated using a large simulated data set.
Table 3.5: Sample size requirements for the Wald test when testing the covariate effect on class memberships for different power levels, class-indicator associations, numbers of indicator variables, numbers of classes, class proportions, and effect sizes
(columns within each power level: class-indicator associations)
                     power = .8             power = .9             power = .95
effect size          weak   medium strong   weak   medium strong   weak   medium strong
2-class model with equal class proportions and six indicator variables
small                2473   1652   1434     3312   2210   1925     4097   2734   2380
medium               911    606    527      1210   811    705      1509   1003   872
large                253    165    143      338    221    191      418    273    236
2-class model with equal class proportions and ten indicator variables
small                1929   1485   1412     2582   1988   1891     3193   2458   2338
medium               709    544    518      949    729    693      1173   901    857
large                194    148    140      260    198    188      321    245    232
2-class model with unequal class proportions and six indicator variables
small                3544   2241   1916     4745   3000   2566     5868   3710   3173
medium               1306   811    700      1749   1098   937      2163   1357   1159
large                362    221    187      484    295    250      599    365    310
3-class model with equal class proportions and six indicator variables
small                4922   2785   2120     6464   3657   2786     7888   4463   3400
medium               1869   1025   777      2454   1347   1020     2995   1644   1245
large                558    283    210      733    372    276      895    454    337
Table 3.6: Theoretical versus empirical (H1-simulated) power values for the Wald and likelihood-ratio tests to reject the null hypothesis that the covariate has no effect on class membership, given the design conditions of interest
(columns within each sample size: class-indicator associations)
                     n = 200                n = 1000
                     weak   medium strong   weak   medium strong
2-class model with six indicator variables
Wald theoretical     .125   .164   .181     .429   .587   .645
Wald empirical       .131   .156   .176     .429   .584   .648
LR theoretical       .126   .166   .180     .434   .594   .645
LR empirical         .138   .177   .182     .432   .58    .648
2-class model with ten indicator variables
Wald theoretical     .147   .177   .184     .523   .633   .655
Wald empirical       .138   .175   .196     .513   .632   .652
LR theoretical       .151   .176   .181     .539   .63    .647
LR empirical         .150   .179   .189     .537   .638   .665
3-class model with six indicator variables
Wald theoretical     .081   .106   .125     .222   .365   .464
Wald empirical       .187   .134   .123     .223   .368   .454
LR theoretical       .08    .108   .126     .221   .377   .471
LR empirical         .238   .146   .134     .267   .374   .456
3-class model with ten indicator variables
Wald theoretical     .089   .118   .130     .272   .430   .490
Wald empirical       .169   .118   .118     .283   .426   .508
LR theoretical       .092   .119   .133     .289   .436   .504
LR empirical         .161   .133   .134     .286   .443   .493
Note. The power values reported in this table are for the study design conditions with small effect size and equal class proportions.
3.5. DISCUSSION AND CONCLUSIONS 57
3.5 Discussion and conclusions
Hypotheses concerning the covariate effects on latent class membership are tested using
an LR test or a Wald test. In the current study, we presented and evaluated a power
analysis procedure for the LR and the Wald test in latent class analysis with covariates.
We discussed how the non-centrality parameter involved in the asymptotic distributions
of the test statistics can be calculated using a large simulated data set, and how the value
of the obtained non-centrality parameter can subsequently be used in the computation of
the asymptotic power or the sample size. The proposed method requires us to specify the
population values under the alternative hypothesis, as is typical in power computation.
A numerical study was conducted to study how data and population characteristics
affect the power of the LR test and the Wald test, to compare the power of the two
tests, and to evaluate the adequacy of the proposed power analysis method. The results
of this numerical study showed that, as in any other statistical model, the power of both
tests depends on sample size and effect size. In addition to these standard factors, the
power of the investigated tests depends on factors specific to latent class models, such as
the number of indicator variables, the number of classes, the class proportions, and the
strength of the class-indicator associations. These latent class specific factors affect the
separation between the classes, which we assessed using the entropy R-square value.
We saw that the sample size required to achieve a certain level of power depends
strongly on the latent class specific factors. The stronger the class-indicator variable
associations, the more indicator variables, the more balanced the class proportions, and
the smaller the number of latent classes, the smaller the sample size
needed to detect a certain effect size with a power of say .8 or higher. We can describe
the same finding in terms of the entropy R-square, that is, the larger the entropy R-
square, the smaller the sample size needed to detect a certain effect size with a power of
say .8 or higher. A more detailed finding is that for a given effect size, the improvement in
power obtained through adding indicator variables is more pronounced when class-indicator
associations are weak or medium than when they are strong.
In line with previous studies (see, for example, Williamson et al., 2007), the
power for the LR test is larger than for the Wald test, though the difference between
the two tests is rather small. An advantage of the Wald test is, however, that it is
computationally cheaper. Given the population values under the alternative hypothesis
and the corresponding non-centrality parameter, the sample size for the Wald test can be
computed using equation (3.6) directly. When using the LR test, the log-likelihood values
under both the null hypothesis and the alternative hypothesis must be computed, which
can be somewhat cumbersome when a model contains multiple covariates.
The adequacy of the proposed power analysis method was evaluated by comparing
the asymptotic power values with the empirical ones. The results indicated that the
performance of the proposed method is generally good. In the study design condition for
which the entropy R-square is low – this occurs when few indicator variables with weak
associations with the latent classes are used – and the sample size is small, the empirical
power seemed to be larger than the asymptotic power. However, these were situations in
which the power turned out to be very low anyhow.
We presented the large data set power analysis method for a simple LC model with
cross-sectional data, but the same method may be applied with LC models for longitudinal
and multilevel data. Moreover, although the simulations in the current paper were
performed with a single covariate, it is expected that increasing the number of covariates
to two or more would improve the entropy R-square and therefore also the power. The
method may also be generalized to the so-called three-step approach for the analysis
of covariate effects on LC memberships (Bakk, Tekle, & Vermunt, 2013; Gudicha &
Vermunt, 2013; Vermunt, 2010a).
This research has several practical implications. Firstly, it provides an overview of
the design requirements for achieving a certain acceptable level of power in LC analysis
with a covariate affecting class memberships. Secondly, it presents a tool for determining
the required sample size given the specific research design that a researcher has in mind
instead of relying on a rule of thumb. Based on the literature and on the results of our
study, we can conclude that simple rules of thumb, such as the rule that a sample size of
500 suffices when the number of indicator variables is 6, cannot be formulated for LC analysis.
CHAPTER 4
Power Computation for Likelihood-Ratio Tests for the
Transition Parameters in Latent Markov Models
Abstract
Latent Markov (LM) models are increasingly used in a wide range of research areas
including psychological, sociological, educational, and medical sciences. Methods to
perform power computations are lacking, however. This chapter presents methods to
perform power analysis in LM models. Two types of hypotheses about the transition
parameters in LM models are considered. The first concerns the situation where the
likelihood-ratio test statistic follows a chi-square distribution, implying that also the power
This chapter has been accepted for publication as: Gudicha, D. W., Schmittmann, V. D., & Vermunt, J. K. (2015). Power Computation for Likelihood-Ratio Tests for the Transition Parameters in Latent Markov Models. Structural Equation Modeling: A Multidisciplinary Journal, DOI: 10.1080/10705511.2015.1014040.
62 CHAPTER 4. POWER FOR TESTS FOR TRANSITION PARAMETERS
computation can be based on this theoretical distribution. In the second case, power needs
to be computed based on empirical distributions constructed via Monte Carlo methods.
Numerical studies are conducted to illustrate the proposed power computation methods
and to investigate design factors affecting the power of this test.
4.1. INTRODUCTION 63
4.1 Introduction
Models involving latent classes are receiving increasing interest from applied researchers,
not only for the analysis of cross-sectional data but also in longitudinal studies, in which
respondents are assumed to switch between classes during the period of observation. The
occurrence of these transitions between latent classes (also called latent states) can be
studied by using latent Markov (LM) models, which are also referred to as hidden Markov
models or latent transition models (Collins & Wugalter, 1992; Poulsen, 1990; Rabiner,
1989; Van de Pol & De Leeuw, 1986; Visser, Raijmakers, & Molenaar, 2002).
This growing interest in LM models is fueled by both the progress that has been
achieved in extending the basic model (e.g., Wiggins (1973)) and the development of
various statistical packages for analyzing data using the LM models. Extensions to the
basic model include the use of time-constant and/or time-varying covariates (Chung,
Park, & Lanza, 2005; Reboussin et al., 1998; Vermunt et al., 1999), multiple response
variables (Bartolucci, 2006; Langeheine & Van de Pol, 1993; Wall & Li, 2009), and
grouping variable(s) (Collins & Lanza, 2010). These extensions, together with the growing
number of statistical packages (e.g., Latent GOLD (Vermunt & Magidson, 2013a), Mplus
(L. Muthen & Muthen, 1998-2007), the R package depmixS4 (Visser & Speekenbrink,
2010), and the SAS procedure PROC LTA (Lanza & Collins, 2008)) make it possible to
successfully apply LM models to many practical problems in longitudinal studies.
Despite these developments, methods to perform power computation in LM models
have received no attention in the methodological literature, as far as we know. In
many applications of LM models, hypotheses are typically tested using the likelihood-
ratio (LR) test without addressing power issues. Computing the power of tests (i.e., the
probability that the test rejects the null hypothesis when it is false) is, however, extremely
important for various reasons. When planning a study, power computation can help to
make an informed decision on the sample size or the number of measurement occasions
required to achieve a pre-specified power level for the tests of interest. When testing a
particular hypothesis, power calculation assesses the ability of a test to detect a statistically
meaningful effect when indeed there is such an effect in the population. This is of interest
when we wish to determine the usefulness of a test.
To perform a power calculation in LM models, we not only need to take into account
the sample size, effect size, and the level of significance, but also several other design
factors. For instance, in the latent class model, which can be conceived of as a special
case of the LM model, Gudicha et al. (in press) showed that a test can be underpowered
when associations between latent classes and response variables are weak, that is, if the
latent classes are poorly separated. See also Tein et al. (2013) who discussed statistical
power to detect the number of clusters in latent profile analysis. In LM models, also
the number of measurement occasions and the transition probabilities are expected to
affect the power. The objective of this chapter is twofold: to provide power computation
methods for hypotheses regarding the parameters of LM models and to identify design
factors that affect the power.
In general, two kinds of statistical tests are of interest when using LM models. The
first kind pertains to hypotheses about the number of latent states (e.g., the test of a
model with three latent states against a model with two latent states). The second kind
concerns hypotheses for the parameters of the LM model, for example, for the transition
probabilities. In this chapter we focus on the latter type of test. More specifically,
we assume that the number of states is known, and focus on equality and fixed value
hypotheses for the model parameters. These include hypotheses stating that transition
probabilities are constant across time points, that certain transition probabilities are equal
to zero, or that transition probabilities are equal across two groups.
As we explain in detail below, for certain hypotheses on model parameters the standard
asymptotic results for the LR statistic hold, implying that power computation can be based on
asymptotic distributions. For other hypotheses or, more specifically, for hypotheses stating
that probabilities are equal to zero, these asymptotic results do not hold (Bartolucci,
2006). For this non-standard situation in which asymptotic distributions cannot be used
for power computation, we propose constructing the empirical distribution of the LR
4.2. THE LM MODEL 65
statistic via Monte Carlo (MC) methods. Hereafter, we refer to the former and latter
situations as power computation under the standard and non-standard case, respectively.
The remainder of the chapter is organized as follows. We first introduce the LM model
and present examples of hypotheses that can be specified on the transition parameters of
this model. We then briefly explain the LR test and its asymptotic properties. Next, power
computation is presented for both the standard and the non-standard case. In addition,
we describe the design factors that affect the power of the LR test. We also present a
numerical study to illustrate the proposed power computation methods, and to examine
design configurations with acceptable power levels. In the final section, we provide a
discussion of the different power computation methods, as well as recommendations for
applied researchers and suggestions for future methodological studies.
4.2 The LM model
The LM model is a probabilistic model in which the observed response patterns at a
given time point are related to the latent states producing these response patterns,
analogous to the latent class model. In addition, the probability of being in a
particular state at the current time point depends on the latent state at the previous time
point. The model has two sub-parts. The first sub-part, the measurement model, relates
the latent variables to the observed response variables. The second sub-part, the Markov
model, describes the probabilities of switching between latent states over time. The latter
model uses a Markov chain to account for the dependence between the latent states
at successive measurement occasions.
The LM model relies on two assumptions. The first is the local independence
assumption, which implies that the observed response patterns produced at time t depend
only on the current state. The second is the first-order Markov assumption, which implies
that the state occupied at time point t depends only on the state occupied at time point
t − 1 (Bartolucci, 2006; Vermunt et al., 1999). These two assumptions are specified on
the measurement and the Markov model, respectively. Below, we first introduce some
notation and then present the LM model.
Let $y_{itj}$ be the response of subject i to the jth response variable measured at occasion
t, for $i = 1, \ldots, n$, $j = 1, \ldots, P$, and $t = 1, \ldots, T$. We denote the vector of
responses for subject i at occasion t by $y_{it}$, and the vector of responses at all occasions
by $y_i$. Let us denote a discrete latent state at time point t by $X_t$ and its possible value
by $x_t$, where $x_t = 1, \ldots, C$. Then the probability of observing the response pattern $y_i$
can be defined as
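$$p(y_i, \Phi) = \sum_{x_1=1}^{C} \sum_{x_2=1}^{C} \cdots \sum_{x_T=1}^{C} \overbrace{p(X_1 = x_1)}^{\text{initial state probabilities}} \prod_{t=2}^{T} \overbrace{p(X_t = x_t \mid X_{t-1} = x_{t-1})}^{\text{transition probabilities}} \prod_{t=1}^{T} \prod_{j=1}^{P} \underbrace{p(y_{itj} \mid X_t = x_t)}_{\text{conditional response probabilities}} \qquad (4.1)$$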
where Φ is the vector of model parameters.
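Evaluating equation (4.1) literally means summing over all $C^T$ sequences of latent states. The same quantity can be obtained with the forward recursion that is standard for hidden Markov models (Rabiner, 1989), whose cost grows only linearly in T. The sketch below assumes dichotomous response variables and time-homogeneous transition probabilities, checks the recursion against the literal sum, and uses hypothetical parameter values and function names.

```python
import itertools


def item_probability(y_t, p_state):
    """prod_j p(y_itj | X_t = x_t) for binary items; p_state[j] = p(y_itj = 1 | state)."""
    prob = 1.0
    for y, p in zip(y_t, p_state):
        prob *= p if y == 1 else 1.0 - p
    return prob


def lm_likelihood_forward(y_i, initial, transition, cond):
    """p(y_i) via the forward recursion; transition[s][r] = pi_{r|s}."""
    n_states = len(initial)
    # alpha[r] = p(y_i1, ..., y_it, X_t = r)
    alpha = [initial[r] * item_probability(y_i[0], cond[r]) for r in range(n_states)]
    for t in range(1, len(y_i)):
        alpha = [
            sum(alpha[s] * transition[s][r] for s in range(n_states))
            * item_probability(y_i[t], cond[r])
            for r in range(n_states)
        ]
    return sum(alpha)


def lm_likelihood_bruteforce(y_i, initial, transition, cond):
    """Literal evaluation of equation (4.1): sum over all C^T state sequences."""
    total = 0.0
    for path in itertools.product(range(len(initial)), repeat=len(y_i)):
        prob = initial[path[0]]
        for t in range(1, len(y_i)):
            prob *= transition[path[t - 1]][path[t]]
        for t, y_t in enumerate(y_i):
            prob *= item_probability(y_t, cond[path[t]])
        total += prob
    return total


# Hypothetical two-state model with three binary items and T = 3 occasions
initial = [0.6, 0.4]
transition = [[0.9, 0.1], [0.2, 0.8]]  # rows: state at t - 1, columns: state at t
cond = [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]]
y_i = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(lm_likelihood_forward(y_i, initial, transition, cond))
print(lm_likelihood_bruteforce(y_i, initial, transition, cond))
```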
As can be seen from equation (4.1), the LM model has three fundamental sets of
parameters: the initial state probabilities, $p(X_1 = x_1)$, the transition probabilities,
$p(X_t = x_t \mid X_{t-1} = x_{t-1})$, and the conditional response probabilities, $p(y_{itj} \mid X_t = x_t)$.
The initial state probabilities show the state proportions (or sizes) at the first measurement
occasion. The transition probabilities, conveniently collected in the so-called transition
matrix A(t) as shown below, provide the probabilities of switching between the states from
one measurement occasion to the next. The conditional response probabilities provide
information on the association between states and the response variables.
If we set the number of states to 3, for example, the transition between latent states
at time point (t− 1) and t can be expressed using a matrix of transition probabilities as
4.2. THE LM MODEL 67
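$$
A(t) =
\begin{array}{cc}
 & \text{state at } t \\
\text{state at } t-1 &
\begin{pmatrix}
\pi_{1|1} & \pi_{2|1} & \pi_{3|1} \\
\pi_{1|2} & \pi_{2|2} & \pi_{3|2} \\
\pi_{1|3} & \pi_{2|3} & \pi_{3|3}
\end{pmatrix}
\end{array}, \qquad (4.2)
$$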
where the principal diagonal elements of matrix A(t) represent the probability of staying
in the same state between consecutive measurement occasions, and the off-diagonal
elements are the probabilities for switching from a particular state at time t − 1 to
another particular state at time t. For instance, π1|1 = p(Xt = 1|Xt−1 = 1)
represents the probability of remaining in state 1 at the current measurement occasion,
and π3|2 = p(Xt = 3|Xt−1 = 2) represents the probability of switching from state 2 at
the previous measurement occasion to state 3 at the current measurement occasion.
In certain applications, the effect of one or more covariates Z on the transition probabilities may also be of interest, for instance, the effect of a dichotomous grouping variable (e.g., Z = 0 for the control group and Z = 1 for the treatment group). Such effects can be modeled by inserting the covariate Z into equation (4.1) as follows:
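$$
p(\mathbf{y}_i, \Phi \mid z_i) = \sum_{x_1=1}^{C}\sum_{x_2=1}^{C}\cdots\sum_{x_T=1}^{C} p(X_1 = x_1) \prod_{t=2}^{T} p(X_t = x_t \mid X_{t-1} = x_{t-1}, z_i) \prod_{t=1}^{T}\prod_{j=1}^{P} p(y_{itj} \mid X_t = x_t). \qquad (4.3)
$$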
When covariates are included, the transition probabilities are generally re-parameterized
by specifying a multinomial logistic regression:
$$
p(X_t = r \mid X_{t-1} = s, z_i) = \pi_{r|s} = \frac{\exp(\beta_{sr} + \gamma_{sr} z_i)}{\sum_{l=1}^{C} \exp(\beta_{sl} + \gamma_{sl} z_i)}. \qquad (4.4)
$$
That is, in this case, we estimate the logit coefficients β and γ as parameters, rather than
the probabilities π directly.
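Equation (4.4) maps the logit coefficients to transition probabilities row by row. A minimal sketch follows, assuming `numpy` and arrays indexed [origin s, destination r]. As a hypothetical identification choice that is not stated in the text, the staying logits $\beta_{ss}$ and $\gamma_{ss}$ are fixed to zero, and the covariate is effect coded (−1 for control, +1 for treatment); both are assumptions made only for this example.

```python
import numpy as np

def transition_matrix(beta, gamma, z):
    """pi_{r|s} from the multinomial logit in equation (4.4).
    beta, gamma: (C, C) arrays indexed [s, r]; z: covariate value.
    Returns A with A[s, r] = pi_{r|s}; each row sums to one."""
    eta = beta + gamma * z                          # beta_sr + gamma_sr * z_i
    eta = eta - eta.max(axis=1, keepdims=True)      # numerical stabilisation; ratios unchanged
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)   # denominator: sum over l = 1..C

# Hypothetical 2-state values; staying logits fixed to 0 as reference categories.
beta = np.array([[0.0, -1.5], [-1.5, 0.0]])
gamma = np.array([[0.0, 0.5], [0.5, 0.0]])
A_treat = transition_matrix(beta, gamma, z=1)    # effect coding: treatment z = +1
A_ctrl = transition_matrix(beta, gamma, z=-1)    # effect coding: control z = -1
odds_ratio = (A_treat[0, 1] / A_treat[0, 0]) / (A_ctrl[0, 1] / A_ctrl[0, 0])
print(A_treat)
print(A_ctrl)
print(odds_ratio)
```

With these values the odds of switching away from state 1 (relative to staying) are $e^{2 \times 0.5} = e$ times higher in the treatment group, because the two effect-coded groups differ by 2 on the covariate.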
68 CHAPTER 4. POWER FOR TESTS FOR TRANSITION PARAMETERS
4.2.1 Hypotheses specified on transition parameters
We can distinguish several types of hypotheses on the transition parameters of LM models.
Table 4.1 contains a classification of the most common hypotheses. A first distinction
concerns whether the hypothesis implies an equality constraint (say, π1|2 = π2|1), or a
fixed value constraint (say, π1|2 = 0.3); this is shown in the first column of Table 4.1.
Fixed value constraints can be further divided into boundary constraints, where the parameter is fixed at a value on the boundary of the parameter space (i.e., zero or one for probabilities), and non-boundary constraints, where the parameter is fixed at a value
inside the parameter space. This distinction is important, because fixed value boundary
constraints require non-standard hypothesis testing and power calculation methods, as
we address in detail below. Which testing methods can be used is shown in the last
column of Table 4.1. Further distinctions concern whether the constraints are imposed
on a specific parameter or on the whole set of transition parameters, and whether the
constraints are imposed on the transition parameters of the basic LM model (i.e., on the
probabilities) or on the transition parameters of the LM model with covariates (i.e., on
the logit coefficients).
We will now describe the hypotheses in Table 4.1 in more detail. $H_0^1$ states that the probability of switching from state $s$ to state $r$ is equal to the probability of switching from state $r$ to state $s$; $H_0^2$ states that, given state $s$, the probabilities of a transition to state $r$ and to state $k$ are equal; $H_0^3$ indicates that the probabilities in two cells of the transition matrix are equal (e.g., $\pi_{1|2} = \pi_{4|3}$); $H_0^4$ assumes that the transition matrix is symmetric (i.e., $A$ equals its transpose); $H_0^5$ implies that the transition matrix is time homogeneous; $H_0^6$ sets the transition matrix of one group (e.g., the treatment group) equal to that of another group (e.g., the control group); $H_0^7$ fixes the probability of switching from state $s$ to state $r$ to $v$, where $v \in (0, 1)$ can be any user-defined value; $H_0^8$ states that the covariate has no effect on the probability of switching to state $r$; $H_0^9$ sets the probability of switching from state $s$ to state $r$ to 0; and $H_0^{10}$ assumes that the transition matrix is diagonal, meaning that there are no changes in state over time.
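Table 4.1: Typical hypotheses formulated on the transition parameters of the latent Markov model

| Hypothesis type | Constraint on selected transition parameters | Constraint on the whole transition matrix | Testing method |
|---|---|---|---|
| Equality | $H_0^1$: $\pi_{r\mid s} = \pi_{s\mid r}$ for some $r, s$ | $H_0^4$: $A = A'$ | standard |
| Equality | $H_0^2$: $\pi_{r\mid s} = \pi_{k\mid s}$ for some $s$ | $H_0^5$: $A(t) = A$ | standard |
| Equality | $H_0^3$: $\pi_{r\mid s} = \pi_{k\mid l}$ for some $r, s, k, l$ | $H_0^6$: $A_1 = A_2$ | standard |
| Fixed value, non-boundary | $H_0^7$: $\pi_{r\mid s} = v$, $v \in (0, 1)$ | | standard |
| Fixed value, non-boundary | $H_0^8$: $\gamma_r = 0$ | | standard |
| Fixed value, on boundary | $H_0^9$: $\pi_{r\mid s} = 0$ for some $r, s$ | $H_0^{10}$: $A = \mathrm{diag}\{\pi_{s\mid s}\}$ for $s = 1, 2, \dots, C$ | non-standard |

Note. $A$ = a square matrix whose entries equal the transition probabilities $\pi_{r\mid s}$; $A'$ = transpose of matrix $A$; $A(t)$ = probability matrix for transitions between states at time points $t-1$ and $t$.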
It should be noted that in certain applications, hypotheses about the initial state and
the conditional response probabilities may be of interest as well (Visser et al., 2002).
As with the transition probabilities, one may define equality or fixed-value restrictions on the conditional response probabilities, as discussed for latent class models by Goodman (1974) and Mooijaart and Van der Heijden (1992). This means that the distinction
between different types of hypotheses in Table 4.1 (i.e., equality constraints, boundary
fixed value constraints, and non-boundary fixed value constraints) can analogously be
applied to hypotheses about the initial state and conditional response probabilities.
4.2.2 Parameter estimation
Hypothesis testing using the LR requires estimating both the restricted model defined by the hypothesis of interest and the unrestricted model by means of maximum likelihood. Assuming that the responses of the individuals are independently and identically distributed, the log-likelihood for the model defined in equation (4.1) (and likewise for equation (4.3)) can be specified as

$$
l(\Phi) = \sum_{i=1}^{n} \log p(\mathbf{y}_i, \Phi). \qquad (4.5)
$$
As in other latent class and mixture models, maximum likelihood estimates of the
parameters of LM models can be obtained using the expectation maximization (EM)
algorithm. This is an iterative method that alternates between an E step, in which the expected value of the complete-data log-likelihood (the log-likelihood that would be obtained if the latent states were observed) is computed conditional on the observed data and the current parameter estimates, and an M step, in which the parameters are updated by maximizing this expected complete-data log-likelihood (McLachlan & Krishnan, 2007).
Estimating the parameters using the above mentioned procedure is, however, not
always straightforward. Firstly, the log-likelihood function in equation (4.5) may contain
local maxima to which the optimization algorithm may converge (Visser et al., 2002).
Inference based on such a local maximum may result in erroneous conclusions about the
parameters and the fit of the LM model of interest. To reduce the risk of ending up in a local maximum, one should therefore re-estimate the model with multiple sets of start values and retain the solution with the highest log-likelihood.
Secondly, in LM models, the estimates of the initial state, transition, and measurement model probabilities are mutually dependent. Because of this interdependence, misspecification in one part of the model (e.g., the transition part) affects the parameter estimates for the other parts (e.g., the measurement model).
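The EM steps and the advice on multiple start values can be illustrated with a compact sketch for the basic LM model. It assumes binary items, a time-homogeneous transition matrix, local independence, and at least two occasions. The E step uses the scaled forward-backward (Baum-Welch) recursions and the M step uses the closed-form updates for this model. This is not the Latent GOLD implementation used in this chapter; `em_lm` and the simulated population are hypothetical.

```python
import numpy as np

def em_lm(Y, C, n_iter=100, seed=0):
    """EM (forward-backward) for a basic LM model with binary items.
    Y: (n, T, P) 0/1 array with T >= 2. Assumes a time-homogeneous transition
    matrix A[s, r] = pi_{r|s} and local independence within each occasion."""
    rng = np.random.default_rng(seed)
    n, T, P = Y.shape
    init = np.full(C, 1.0 / C)
    A = np.full((C, C), 1.0 / C)
    rho = rng.uniform(0.2, 0.8, size=(C, P))     # random start values for p(y = 1 | x)
    trace = []
    for _ in range(n_iter):
        # E step: scaled forward-backward recursions, vectorised over subjects.
        B = np.prod(np.where(Y[:, :, None, :] == 1, rho, 1.0 - rho), axis=3)  # (n, T, C)
        alpha, scale = np.empty((n, T, C)), np.empty((n, T))
        a = init * B[:, 0]
        for t in range(T):
            if t > 0:
                a = (alpha[:, t - 1] @ A) * B[:, t]
            scale[:, t] = a.sum(axis=1)
            alpha[:, t] = a / scale[:, t, None]
        beta = np.ones((n, T, C))
        for t in range(T - 2, -1, -1):
            beta[:, t] = (B[:, t + 1] * beta[:, t + 1]) @ A.T / scale[:, t + 1, None]
        trace.append(np.log(scale).sum())          # l(Phi) of (4.5) at the current values
        post = alpha * beta                        # p(X_t = x | y_i)
        xi = np.einsum("nts,sr,ntr->sr", alpha[:, :-1], A,
                       B[:, 1:] * beta[:, 1:] / scale[:, 1:, None])  # expected transition counts
        # M step: closed-form maximisers of the expected complete-data log-likelihood.
        init = post[:, 0].mean(axis=0)
        A = xi / xi.sum(axis=1, keepdims=True)
        w = post.reshape(n * T, C)
        rho = (w.T @ Y.reshape(n * T, P)) / w.sum(axis=0)[:, None]
    return init, A, rho, np.array(trace)

# Simulated data from a hypothetical 2-state population (n = 500, T = 3, P = 6).
rng = np.random.default_rng(1)
n, T, P = 500, 3, 6
true_A = np.array([[0.8, 0.2], [0.2, 0.8]])
true_rho = np.array([[0.8] * P, [0.2] * P])
X = np.empty((n, T), dtype=int)
X[:, 0] = rng.integers(0, 2, size=n)
for t in range(1, T):
    X[:, t] = (rng.random(n) < true_A[X[:, t - 1], 1]).astype(int)
Y = (rng.random((n, T, P)) < true_rho[X]).astype(int)

# Multiple start values: keep the run with the highest final log-likelihood.
fits = [em_lm(Y, C=2, seed=s) for s in range(5)]
init_hat, A_hat, rho_hat, trace = max(fits, key=lambda fit: fit[3][-1])
print(A_hat.round(3), trace[-1])
```

Because no EM iteration can decrease the log-likelihood, each `trace` should be non-decreasing; different seeds may still stop at different (local) maxima, which is why the runs are compared by their final log-likelihood. State labels may be permuted across runs.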
4.3 The likelihood-ratio test
Once the parameter estimates and the corresponding log-likelihood values are obtained
for the null (restricted) and the alternative (unrestricted) model, hypotheses such as those
presented in Table 4.1 can be tested using the LR. We define the LR statistic to compare
the null and alternative models as
$$
LR = -2\left(l(\Phi_0) - l(\Phi_1)\right),
$$
where $l(\cdot)$ is the log-likelihood function as shown in equation (4.5), and $\Phi_1$ and $\Phi_0$ are the maximum likelihood estimates of the parameters of the model under the alternative and null hypotheses, respectively.
Alternatively, the LR can be obtained by taking the difference between the goodness-of-fit
test statistics of the null and the alternative model, that is, LR = LR0 − LR1, where
4.4. POWER COMPUTATION 71
$LR_0$ and $LR_1$ are the likelihood-ratio goodness-of-fit statistics comparing the null and the alternative model, respectively, with the saturated model.
Under certain regularity conditions, and when the null hypothesis holds, the LR asymptotically follows a (central) chi-square distribution with df degrees of freedom (Giudici, Ryden, &
Vandekerkhove, 2000). The number of degrees of freedom of the test is determined
by subtracting the number of parameters under the null from the number of parameters
under the alternative hypothesis. The general principle of this test is to reject the null
hypothesis if the observed value of the LR exceeds the (1− α) quantile value, also called
the critical value, of the central chi-square distribution with df degrees of freedom. Such
a testing procedure can be classified under what we referred to above as the standard
case. However, there are hypotheses for which the LR statistic does not follow a chi-square distribution, for example, hypotheses of the type $H_0^9$: $\pi_{r|s} = 0$ and $H_0^{10}$: $A = \mathrm{diag}\{\pi_{s|s}\}$ (see Table 4.1).
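For the standard case, the test reduces to a few lines once both models have been estimated. A minimal sketch using `scipy.stats.chi2`; the log-likelihood values are hypothetical, and the parameter counts assume a basic 2-state model with six binary items (1 initial state, 2 transition, and 12 response parameters), where $H_0^1$ removes one free transition parameter.

```python
from scipy.stats import chi2

def lr_test(loglik0, loglik1, npar0, npar1, alpha=0.05):
    """Standard-case LR test: LR = -2(l(Phi_0) - l(Phi_1)), df = npar1 - npar0,
    reject H0 when LR exceeds the (1 - alpha) quantile of the central chi-square."""
    lr = -2.0 * (loglik0 - loglik1)
    df = npar1 - npar0
    crit = chi2.ppf(1.0 - alpha, df)   # Q_{1-alpha}
    p_value = chi2.sf(lr, df)
    return lr, df, crit, p_value, lr > crit

# Hypothetical log-likelihoods: 14 free parameters under H0, 15 under H1.
lr, df, crit, p_value, reject = lr_test(-1523.4, -1519.1, npar0=14, npar1=15)
print(f"LR = {lr:.2f}, df = {df}, Q = {crit:.3f}, p = {p_value:.4f}, reject = {reject}")
```

This function is only appropriate for the standard-case hypotheses in Table 4.1; for $H_0^9$ and $H_0^{10}$ the chi-square reference distribution does not apply.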
4.4 Power computation
To compute the power, we should know or estimate the distribution of the test statistic
under both the null and alternative hypothesis. The distribution under the null hypothesis,
which is indicated with H0 in Figure 4.1, is required to compute the critical value, Q1−α,
corresponding to the pre-defined Type I error rate α. The distribution under the alternative
hypothesis, indicated with H1 in Figure 4.1, is required to compute the power, that is,
the probability that the test statistic exceeds this critical value given that the alternative
hypothesis is true. In Figure 4.1, this probability corresponds to the shaded area; that is,
the area below the H1 curve to the right of the vertical dashed line at the critical value.
The next sub-sections describe various procedures for computing this probability under
the standard and non-standard testing cases.
4.4.1 The standard case
As already mentioned above, in the standard case, the LR statistic follows a central
chi-square distribution when the null hypothesis holds. When instead the alternative
Figure 4.1: Distribution of the likelihood-ratio statistic under the null and alternative hypotheses and the statistical power. [Figure: density curves labeled H0 and H1, the critical value Q1−α, and the shaded area under H1 labeled power.]
hypothesis holds, which is what we assume in power computation, the distribution of the
LR becomes a non-central chi-square. One approach to power computation involves
computing the non-centrality parameter λ, which quantifies the extent to which the
distribution of the LR under the alternative hypothesis deviates from its distribution under
the null hypothesis.
First let us describe a general power computation approach which does not require the
computation of the non-centrality parameter. Instead, the empirical distribution of the
LR under the alternative hypothesis is constructed using a Monte Carlo (MC) procedure.
This procedure, which we refer to as MC-based power computation, works as follows.
Step 1. A sample of a specified size, n, is repeatedly simulated (say M times) from the
population under the alternative hypothesis, and for each of these samples, the LR value
is computed by estimating both the null model and the alternative model. We denote the
LR value obtained for the $m$th sample by $LR_m$.
Step 2. The power associated with a sample of size n is estimated as the proportion of the simulated data sets in which the null hypothesis is rejected given the critical value
Q1−α, which can be obtained from the central chi-square distribution with df degrees of
freedom. More formally,
$$
\text{power}_{\mathrm{MC}_1} = \frac{\sum_{m=1}^{M} I(LR_m > Q_{1-\alpha})}{M}, \qquad (4.6)
$$
where I(LRm > Q1−α) is an indicator function taking the value 1 when the LR value of
the mth sample exceeds the critical value, and is 0 otherwise.
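The bookkeeping of Steps 1 and 2 and equation (4.6) can be sketched as follows. Fitting the two LM models to each simulated sample is replaced here by draws from a non-central chi-square distribution, purely as a stand-in so that the code runs on its own; in an actual power study each `LR_m` comes from estimating the null and the alternative model on the $m$th sample. `power_mc1` is a hypothetical name.

```python
import numpy as np
from scipy.stats import chi2, ncx2

def power_mc1(lr_values, df, alpha=0.05):
    """Equation (4.6): proportion of the M replications generated under H1
    whose LR exceeds the central chi-square critical value Q_{1-alpha}."""
    crit = chi2.ppf(1.0 - alpha, df)
    return float(np.mean(np.asarray(lr_values) > crit))

# Stand-in for Step 1: in practice each value is the LR from fitting the null
# and alternative LM models to one simulated sample of size n.
rng = np.random.default_rng(2015)
M = 20000
lr_sim = ncx2.rvs(df=1, nc=6.0, size=M, random_state=rng)

est = power_mc1(lr_sim, df=1)
exact = ncx2.sf(chi2.ppf(0.95, 1), 1, 6.0)   # what the stand-in distribution implies
mc_se = np.sqrt(est * (1.0 - est) / M)        # Monte Carlo standard error of the estimate
print(est, exact, mc_se)
```

The Monte Carlo standard error $\sqrt{\hat p(1-\hat p)/M}$ indicates how many replications are needed for a given precision of the power estimate.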
The second, more elegant and more standard way of power computation involves
obtaining an estimate of the non-centrality parameter and subsequently computing the
power for a given n using the non-central chi-square distribution concerned. We discuss
two methods to obtain the non-centrality parameter, which both require analyzing a
single constructed data set. The first method, which we refer to as the exemplary data
method, uses a data file which is exactly in agreement with the population model under
the alternative hypothesis (O’Brien, 1986; Self et al., 1992). Power computation is
implemented in four steps as follows.
Step 1. An ’exemplary’ data set is created, which contains all possible response patterns
with weights equal to the model expected proportions under the alternative hypothesis.
Step 2. Using this data set, the log-likelihood is computed for both the constrained null
and the alternative model.
Step 3. The non-centrality parameter is approximated as
$$
\lambda_1 = -2\left(l(\Phi_0) - l(\Phi_1)\right), \qquad (4.7)
$$
where l(Φ0) and l(Φ1) are the log-likelihood values under the null and the alternative
hypothesis, respectively, and λ1 represents the noncentrality corresponding to a sample
size of 1. Note that λ1 can also be computed as the difference between the goodness-of-fit
statistics for the null and alternative models. Since the latter equals 0 (the alternative model
fits perfectly), λ1 equals the value of the likelihood-ratio goodness-of-fit statistic obtained
when estimating the null model.
Step 4. The non-centrality parameter obtained in Step 3 is rescaled to the sample size
n of interest. This is achieved using the proportionality between the sample size and the
non-centrality parameter: λn = n · λ1, where λn denotes the non-centrality parameter
for sample size n (Satorra & Saris, 1985; R. C. MacCallum, Browne, & Cai, 2006). The
power can now be computed as
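$$
\text{power} = p\left(LR > Q_{1-\alpha}(df)\right) = \bar{F}_{\chi^2}(Q_{1-\alpha}; df, \lambda_n), \qquad (4.8)
$$
where $\bar{F}_{\chi^2}(\cdot\,; df, \lambda_n)$ is the upper-tail (survival) function, that is, one minus the cumulative distribution function, of the non-central chi-square distribution with $df$ degrees of freedom and non-centrality parameter $\lambda_n$, and $Q_{1-\alpha} = \chi^2_{(1-\alpha)}(df)$ is the $(1-\alpha)$ quantile value of the central chi-square distribution.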
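Step 4 and equation (4.8) amount to an upper-tail probability of the non-central chi-square distribution. A minimal sketch with `scipy.stats.ncx2`, assuming a hypothetical $\lambda_1 = 0.02$ and a 1-df test. For df = 1 the result can be cross-checked against a closed form, because a non-central $\chi^2_1$ variable is the square of a normal variable with mean $\sqrt{\lambda_n}$ and unit variance.

```python
import numpy as np
from scipy.stats import chi2, ncx2, norm

def power_from_lambda1(lambda1, n, df, alpha=0.05):
    """Step 4 and equation (4.8): rescale lambda_1 to lambda_n = n * lambda_1 and
    return the upper-tail probability beyond Q_{1-alpha} = chi2_{1-alpha}(df)."""
    crit = chi2.ppf(1.0 - alpha, df)
    return ncx2.sf(crit, df, n * lambda1)

lambda1 = 0.02            # hypothetical per-observation non-centrality (Step 3)
for n in (300, 500, 900):
    print(n, round(float(power_from_lambda1(lambda1, n, df=1)), 3))

# Closed-form check for df = 1: power = Phi(sqrt(lam) - z) + Phi(-sqrt(lam) - z).
lam, z = 300 * lambda1, norm.ppf(0.975)
check = norm.cdf(np.sqrt(lam) - z) + norm.cdf(-np.sqrt(lam) - z)
```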
The number of response patterns in the exemplary data set, which depends on the
number of measurement occasions, the number of response variables, and the number
of response categories, can quickly become very large. For instance, even in a relatively small problem with four time points (T = 4) and six dichotomous response variables (P = 6), the number of possible response patterns is already larger than 16 million ($2^{6 \times 4} = 16{,}777{,}216$).
This shows that the exemplary data method may quickly become impractical. We propose
resolving this problem by using a large simulated data set from the population under the
alternative hypothesis instead of an exemplary data set. We refer to this alternative to
the exemplary data method as the ’large simulated data’ method. The steps that need
to be taken for power computation are the following:
Step 1. Generate a large data set, say of size N = 100000, according to the model under
the alternative hypothesis.
Step 2. Estimate the models under both the null and the alternative hypotheses based
on the data obtained in Step 1. This yields the log-likelihood values for both models.
Step 3. Compute the non-centrality parameter as
$$
\lambda_1 = \frac{-2\left(l(\Phi_0) - l(\Phi_1)\right)}{N}, \qquad (4.9)
$$
where λ1 is again the non-centrality parameter for a sample size of 1. Note that, because the simulated data contain sampling fluctuation, the likelihood-ratio goodness-of-fit statistic is now not equal to 0 under the alternative model, which means that we have to estimate both models.
Step 4. As in the exemplary data method, get λn = n · λ1 and obtain the power using
equation (4.8).
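The large simulated data method differs from the exemplary data method only in how $\lambda_1$ is obtained. A sketch of Steps 3 and 4 follows, with hypothetical log-likelihood values standing in for the two models fitted to a simulated data set of size $N = 100000$; the helper names are hypothetical as well.

```python
from scipy.stats import chi2, ncx2

def lambda1_large_data(loglik0, loglik1, N):
    """Equation (4.9): the LR computed on the large simulated data set, divided by N."""
    return -2.0 * (loglik0 - loglik1) / N

def power_at(n, lambda1, df=1, alpha=0.05):
    """Step 4: lambda_n = n * lambda_1, then equation (4.8)."""
    return ncx2.sf(chi2.ppf(1.0 - alpha, df), df, n * lambda1)

# Hypothetical log-likelihoods of the null and alternative model fitted to N cases.
N = 100_000
lam1 = lambda1_large_data(-412345.6, -411845.6, N)   # LR_N = 1000, so lambda_1 = 0.01
for n in (300, 500, 900):
    print(n, round(float(power_at(n, lam1)), 3))
```

Because the mean of a non-central chi-square variable is $df + \lambda$, $LR_N/N$ overstates $\lambda_1$ by roughly $df/N$, which is negligible for $N = 100000$.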
4.4.2 The non-standard case
In the non-standard case, the regularity conditions under which the LR follows an asymptotic $\chi^2$ distribution are not satisfied. This happens, for instance, if parameters are fixed on the boundary of the parameter space, as in hypotheses $H_0^9$ and $H_0^{10}$. In this non-standard case, the cut-off value $Q_{1-\alpha}$ is generally not equal to $\chi^2_{(1-\alpha)}(df)$. Thus, the critical value obtained from the central chi-square distribution cannot be used in the subsequent power computation under the alternative hypothesis. Neither can we use the non-central chi-square distribution to approximate the distribution of the LR under the alternative hypothesis, so the theoretical distributions mentioned above cannot be used here for power computation. However, given the advances in computing facilities, one may instead compute power by means of MC simulations. Two MC simulations are needed: one to obtain the value of $Q_{1-\alpha}$, and another to compute the power given this $Q_{1-\alpha}$.
More specifically, in order to compute the value of Q1−α, the empirical distribution of
the LR under the null hypothesis should be constructed first. That is, generate M data sets
according to the model under the null hypothesis, and compute the LR statistic for each of
these samples. For sufficiently large $M$, the distribution of these LR values approximates the population distribution of the LR statistic under the null hypothesis. Next, for the specified α-level, the $(1-\alpha)$ quantile value is obtained as the value $LR_{(1-\alpha)}$ that splits the sorted LR values into two sets: the $100(1-\alpha)$ percent smallest LR values and the $100\alpha$ percent largest LR values. That is,
$$
Q_{1-\alpha} = \left\{ LR_{(1-\alpha)} : p\left(LR > LR_{(1-\alpha)} \mid H_0\right) = \alpha \right\}. \qquad (4.10)
$$
Similarly, the distribution of the LR under the alternative hypothesis is constructed
using M samples, but now generated according to the model under the alternative
hypothesis. Using this distribution the power is computed as the proportion of these
LR values that exceed the Q1−α value obtained from equation (4.10). That is,
$$
\text{power}_{\mathrm{MC}_{01}} = p(LR > Q_{1-\alpha} \mid H_1) = \frac{\sum_{m=1}^{M} I(LR_m > Q_{1-\alpha})}{M}, \qquad (4.11)
$$
where $I(\cdot)$ is again an indicator function indicating whether the LR value (computed based on the $H_1$ sample) exceeds the $Q_{1-\alpha}$ value. Note that the subscript of $\text{power}_{\mathrm{MC}_{01}}$ in equation (4.11) indicates that MC methods are applied under both the null and the alternative hypothesis, while that of $\text{power}_{\mathrm{MC}_1}$ in equation (4.6) indicates that MC simulation is applied under the alternative hypothesis only.
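The two Monte Carlo simulations of the non-standard case can be sketched as follows. The LR values would normally come from fitting the LM models to samples generated under $H_0$ and $H_1$. As a self-contained stand-in only, the null draws come from a 50:50 mixture of a point mass at zero and $\chi^2_1$ (the textbook limit for a single parameter tested on the boundary under regularity conditions), and the alternative draws come from an arbitrary non-central $\chi^2_1$. The function names are hypothetical.

```python
import numpy as np

def critical_value_mc(lr_null, alpha=0.05):
    """Equation (4.10): empirical (1 - alpha) quantile of LR values simulated under H0."""
    return float(np.quantile(lr_null, 1.0 - alpha))

def power_mc01(lr_alt, q):
    """Equation (4.11): share of LR values simulated under H1 that exceed q."""
    return float(np.mean(np.asarray(lr_alt) > q))

rng = np.random.default_rng(7)
M = 50000
# Stand-in LR values (illustration only): mixture under H0, shifted draws under H1.
lr_null = np.where(rng.random(M) < 0.5, 0.0, rng.chisquare(1, M))
lr_alt = rng.noncentral_chisquare(1, 4.0, M)

q = critical_value_mc(lr_null)
print(q, power_mc01(lr_alt, q), power_mc01(lr_alt, 3.841))   # simulated vs naive cut-off
```

In this stand-in the simulated critical value is close to $\chi^2_{0.90}(1) \approx 2.71$ rather than $\chi^2_{0.95}(1) \approx 3.84$, so using the central chi-square cut-off would understate the power. For LM models the null distribution has to be simulated as described above, since no such closed-form limit is assumed.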
4.5 Design factors
As in other standard statistical models, the power of a test in LM models depends on the
significance level α, the effect size (difference between the parameter values under the
null and alternative hypotheses), and the sample size. This can be explained using Figure
4.1. If the value of α becomes larger, Q1−α shifts to the left, and consequently the region
under the curve H1 to the right of Q1−α gets larger. This implies that the larger α, the
larger the power of a test. For a fixed α-level, if the effect size gets larger, the value of
the non-centrality parameter gets larger, meaning that the curve indicated by H1 shifts to
the right, and consequently the power becomes larger. From the non-centrality parameter
and sample size relationship, λn = n · λ1, if the sample size increases, the non-centrality
parameter increases, meaning that the overlap between the probability distributions under
H0 and H1 decreases and thus the power increases.
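Because $\lambda_n = n \cdot \lambda_1$ and the power increases with the non-centrality parameter, the smallest sample size reaching a target power can be found with a simple search. A minimal sketch, assuming a standard-case test and a hypothetical $\lambda_1$; it also shows the effect of α discussed above. The helper names are hypothetical.

```python
from scipy.stats import chi2, ncx2

def lr_power(n, lambda1, df=1, alpha=0.05):
    """Equation (4.8) with lambda_n = n * lambda_1."""
    return ncx2.sf(chi2.ppf(1.0 - alpha, df), df, n * lambda1)

def min_sample_size(lambda1, target=0.8, df=1, alpha=0.05, n_max=10**7):
    """Smallest n with power >= target, by bisection (power increases with n)."""
    if lr_power(n_max, lambda1, df, alpha) < target:
        raise ValueError("target power not reached for n <= n_max")
    lo, hi = 1, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if lr_power(mid, lambda1, df, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

for alpha in (0.01, 0.05, 0.10):
    print(alpha, min_sample_size(0.02, alpha=alpha))   # hypothetical lambda_1 = 0.02
```

A larger α or a larger effect (larger $\lambda_1$) lowers the required sample size, in line with the reasoning based on Figure 4.1.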
In addition to these standard factors, the power of a test in LM models is expected
4.6. NUMERICAL STUDY 77
to depend on aspects of the measurement part and the transition part of these models.
In LM analysis, state membership is not directly observable, but is determined based on
responses provided to a set of observed response variables. Uncertainty about state membership depends on, among other things, the number of states, the state proportions, the number of response variables, and the strength of the association between the latent states and the response variables. See, for example, Collins and Lanza (2010) and Gudicha et al. (in press). The stronger the state-response associations, the better the separation between states. The better the states are separated, the less uncertain we are about a respondent's state membership given his or her responses to the observed variables. Also,
each additional measurement occasion provides additional information regarding the way
in which respondents change their state membership.
4.6 Numerical study
The purpose of this numerical study is to 1) illustrate the power computation methods
under the standard and non-standard case, 2) investigate how the study design factors
and the population model characteristics mentioned above may affect the power, and 3)
identify which design configurations yield an acceptable power level (power ≥ 0.8). We
focus on three of the hypotheses shown in Table 4.1. The first two hypotheses, $H_0^1$: $\pi_{r|s} = \pi_{s|r}$ for some $r, s$ in the basic LM model, and $H_0^8$: $\gamma_r = 0$ in the LM model with covariates, are examples of the standard case. The third hypothesis, $H_0^9$: $\pi_{r|s} = 0$ for some $s$ and $r$, which concerns testing for a zero entry in the transition probability matrix, is an example of the non-standard case. The Latent GOLD syntax (Vermunt & Magidson, 2013a) used to perform this numerical study is shown in the appendix.
4.6.1 Numerical study set up
In this numerical study, we restricted ourselves to LM models for dichotomous response
variables (say, with the categories negative and positive). The α value was always set to .05, given that its effect on the power of statistical tests is obvious and that this value is usually fixed in advance.
We varied the sample size (n = 300, 500, or 900), the number of measurement
occasions (T = 2 or 4), the number of latent states (C = 2 or 3), the number of response
variables (P = 6 or 10), the initial state proportions (uniform or non-uniform), the strength of the association between latent states and response variables (weak, medium, or strong), and the stability of the state membership (unstable, moderately stable, and stable). The non-uniform initial state proportions were set to (0.7, 0.3) for C = 2 and to (0.6, 0.3, 0.1) for C = 3. The settings for the association between states and responses were specified using response probabilities equal to 0.7, 0.8, and 0.9 (or .3, .2, and .1), respectively. For example, in the weak association condition, the probability of a positive response was set to 0.7 for all variables in latent state one, to 0.3 in latent state two, and, in state three, to 0.7 for the first half of the items and to 0.3 for the remaining items. The basic settings for the latent transitions were obtained by setting the main diagonal elements of the transition matrix to $\pi_{r|r}$ = 0.7, 0.8, or 0.9, and the other elements to $(1 - \pi_{r|r})/(C - 1)$, which corresponds to unstable, moderately stable, and highly stable state memberships, respectively.
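The population settings described above can be written out explicitly. A sketch with hypothetical helper names follows. For an odd number of items the "first half" would need a convention (here the floor of $P/2$, an assumption), but the study only uses P = 6 or 10.

```python
import numpy as np

def transition_design(C, stay):
    """Transition matrix with pi_{r|r} = stay on the diagonal and
    (1 - stay) / (C - 1) elsewhere (rows: state at t - 1)."""
    A = np.full((C, C), (1.0 - stay) / (C - 1))
    np.fill_diagonal(A, stay)
    return A

def response_design(C, P, p):
    """p(positive response | state) for p = 0.7, 0.8, 0.9 (weak, medium, strong):
    state 1 -> p for all items, state 2 -> 1 - p, state 3 -> p for the first
    half of the items and 1 - p for the rest."""
    if C not in (2, 3):
        raise ValueError("the study only uses C = 2 or C = 3")
    rho = np.empty((C, P))
    rho[0] = p
    rho[1] = 1.0 - p
    if C == 3:
        rho[2, : P // 2] = p
        rho[2, P // 2 :] = 1.0 - p
    return rho

A = transition_design(C=3, stay=0.8)     # moderately stable, 3 states
rho = response_design(C=3, P=6, p=0.7)   # weak association, 6 items
print(A)
print(rho)
```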
1) For hypothesis $H_0^1$: $\pi_{r|s} = \pi_{s|r}$, we arrived at the respective transition matrices under the alternative model (i.e., with differences in the off-diagonal elements) by setting the transition odds ratio comparing the transition from $s$ to $r$ with the transition from $r$ to $s$, defined as $\frac{\pi_{r|s}/\pi_{s|s}}{\pi_{s|r}/\pi_{r|r}}$, equal to 1.3498, 1.8222, and 3.3201. These odds ratios, which we hereafter refer to as small, medium, and large effect sizes, correspond to differences in the transition probabilities ranging from 0.01 to 0.25.
2) For hypothesis $H_0^8$: $\gamma_r = 0$, we added the effect of a dichotomous covariate on the transitions. The covariate effect was specified by setting its (effect-coded) coefficient in the logistic regression model for the transitions to 0.25, 0.5, and 1, or equivalently, by setting the transition odds ratio to 1.648, 2.7182, and 7.389, which correspond to a small, medium, and large effect size, respectively.
3) For hypothesis $H_0^9$: $\pi_{1|2} = 0$, we restricted ourselves to the situation with T = 2,
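P = 6, $\pi_{1|1}$ = 0.7 or 0.9, C = 2, and equal initial state probabilities. This setting gives a transition matrix of the type
$$
\begin{pmatrix} 0.7 & 0.3 \\ \delta & 1-\delta \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} 0.9 & 0.1 \\ \delta & 1-\delta \end{pmatrix},
$$
where δ is the value of $\pi_{r|s}$ under the alternative hypothesis. The value of δ, which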
defines the effect size for the hypothesis that πr|s = 0, was set to 0.05, 0.1, or 0.2. The
association between states and response variables was set to weak, medium, and strong
as defined earlier.
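The effect sizes in items 1) and 2) appear to be exponentiated round logit values. The check below reflects our reading and is not a statement from the study: the odds ratios for $H_0^1$ match $e^{0.3}$, $e^{0.6}$, and $e^{1.2}$ to about four digits, and, assuming the two groups are effect coded as −1 and +1 (so that the between-group log odds ratio is $2\gamma$), the odds ratios for $H_0^8$ match $e^{2\gamma}$ for $\gamma$ = 0.25, 0.5, and 1.

```python
import numpy as np

# Item 1: reported transition odds ratios for H_0^1 and the log-odds they appear to encode.
or_h1 = np.array([1.3498, 1.8222, 3.3201])
print(np.log(or_h1))                     # compare with 0.3, 0.6, 1.2

# Item 2: effect-coded coefficients and reported odds ratios for H_0^8.
# Assumption: groups coded -1 and +1, so the between-group log odds ratio is 2 * gamma.
gamma = np.array([0.25, 0.5, 1.0])
or_h8 = np.array([1.648, 2.7182, 7.389])
print(np.exp(2 * gamma), or_h8)
```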
4.6.2 Results
Table 4.2 presents the power to reject the null hypothesis πr|s = πs|r. As expected, the
power of the LR test depends on the association between the latent states and the response
variables, the number of measurement occasions, the sample size, the initial state proportions,
the number of response variables, the transition probabilities, and the effect sizes. More
specifically, the stronger the association between latent states and response variables, the
larger the power. For a given number of response variables and measurement occasions,
say P = 6 and T = 2, reasonable power levels are achieved by increasing the sample
size. Or, for a given sample size, say n = 300, reasonable power levels are achieved by
increasing the number of response variables or measurement occasions. The power gain
achieved by increasing the number of measurement occasions from 2 to 4 is larger than the
power gain achieved by increasing the sample size from 300 to 900. Also, with the current
design and population model characteristics, sampling from a population with equal initial
state probabilities increases the power.
Another interesting observation is that, keeping the other design factors constant, the
more unstable the state membership (or the larger the off-diagonal transition probabilities), the larger
the power to demonstrate differences between transition probabilities. One can also see
from Table 4.2 that for the situation where the initial state proportions/probabilities are
equal, the number of response variables is equal to 6, and the sample size is equal to 300,
a power of 0.80 or higher is achieved 1) for low transition probabilities, when the effect
size is large and the association between states and response variables is strong, 2) for
moderate transition probabilities, when the effect size is large and the association between
states and response variables is medium or strong, and 3) for high transition probabilities,
when the effect size is large. For the situation when we have weak associations between
states and response variables, highly stable transition probabilities, and a low or moderate
effect size, such a power level is achieved only at the expense of increasing the sample
size, or the number of measurement occasions.
For the 3-state LM model, we do not show the results of the power calculation, as
they provide information similar to that for the 2-state LM model shown in Table 4.2. We should
however note that the power to demonstrate differences in the transition probabilities for
the 3-state LM model is in general lower than its corresponding power value for the 2-state
LM model, implying that the power depends on the number of states as well.
The power of the LR test to reject the null hypothesis that the covariate has no
effect on the transition probabilities is shown in Table 4.3. With respect to the design
factors, the general trend found is similar to the results from Table 4.2. That is, power
increases with sample size and effect size. Also, for a fixed sample size and effect size,
one can achieve a desired level of power by improving the measurement part of the LM
model, for example, by using response variables which have a strong association with the
latent states, or by increasing the number of response variables. Increasing the number
of measurement occasions could also greatly help in obtaining a desired power level. One
can also see from this table that power for the effect of the covariate on the transition
probabilities becomes larger when the state membership is more unstable, say πr|r = 0.7,
than when we have highly stable states, say πr|r = 0.9.
Table 4.4 shows the power to reject the null hypothesis that the probability of switching from one state at time point $t-1$ to another state at time point $t$ equals zero. Compared with the other two hypotheses, the role of the transition probabilities in determining power is small, whereas the role of the state-response association is large. For example, in a sample of 100 observations, if the state-response association is weak, the power to detect a transition probability of .05 when the null hypothesis states that it equals 0 is lower than 20%. In contrast, when the state-response association is strong, the power exceeds 90%. When the state-response associations are strong (that is, when the measurement model is strong), the states are well separated. In such a condition,
the possibility of observing the expected pattern from state 1 while in state 2 becomes
extremely small. Therefore, when the true underlying model generates some transitions
that are impossible under the restricted model, the likelihood of this restricted model will
decrease dramatically, because each impossible transition, say, a transition from state 1
to state 2, needs to be accommodated by assigning the observed pattern from state 1
to state 2, or vice versa. The decrease in the likelihood of the restricted model will be
accompanied by biased parameter estimates: the estimates of the conditional response
probabilities in the two states will be biased to be closer to each other, to increase the
likelihood of observing a state 1 pattern given state 2 or a typical state 2 pattern given
state 1.
The results presented in Tables 4.2 and 4.3 concern the standard case, and were thus
obtained using the large data method which assumes that the distributions under H0 and
H1 are known. That is, we assume a central chi-square under the null and non-central
chi-square under the alternative hypothesis, for which the value of the non-centrality
parameter is approximated based on a large data set. The results showing the quality of
the latter approximation as well as the asymptotic approximation of the chi-square itself
are presented in Table 4.5. As can be seen from this table, both the large data set and
the chi-square approximations are very good.
Table 4.2: The power of the likelihood-ratio test to reject the null hypothesis that $\pi_{r|s} = \pi_{s|r}$ in the 2-state latent Markov model

| Design | n | Effect size | 0.9 weak | 0.9 medium | 0.9 strong | 0.8 weak | 0.8 medium | 0.8 strong | 0.7 weak | 0.7 medium | 0.7 strong |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Equal initial state, P = 6, T = 2 | 300 | small | .075 | .094 | .117 | .096 | .142 | .176 | .113 | .199 | .224 |
| | | medium | .166 | .279 | .306 | .221 | .444 | .532 | .280 | .581 | .622 |
| | | large | .397 | .714 | .873 | .682 | .954 | .981 | .795 | .984 | .996 |
| | 500 | small | .093 | .125 | .164 | .128 | .206 | .262 | .156 | .299 | .341 |
| | | medium | .162 | .425 | .466 | .335 | .651 | .750 | .427 | .798 | .835 |
| | | large | .592 | .903 | .980 | .881 | .997 | .999 | .949 | 1.00 | 1.00 |
| | 900 | small | .127 | .187 | .256 | .192 | .331 | .426 | .243 | .485 | .547 |
| | | medium | .254 | .661 | .711 | .539 | .883 | .942 | .663 | .976 | .976 |
| | | large | .837 | .992 | 1.00 | .988 | 1.00 | 1.00 | .998 | 1.00 | 1.00 |
| Equal initial state, P = 10, T = 2 | 300 | small | .075 | .108 | .127 | .131 | .151 | .181 | .153 | .203 | .238 |
| | | medium | .195 | .285 | .335 | .388 | .505 | .544 | .455 | .622 | .645 |
| | | large | .559 | .836 | .865 | .874 | .972 | .988 | .945 | .995 | .997 |
| Equal initial state, P = 6, T = 4 | 300 | small | .193 | .232 | .267 | .264 | .403 | .447 | .247 | .486 | .556 |
| | | medium | .520 | .686 | .779 | .757 | .927 | .944 | .764 | .970 | .984 |
| | | large | .971 | .999 | 1.00 | .999 | 1.00 | 1.00 | .999 | 1.00 | 1.00 |
| Unequal initial state, P = 6, T = 2 | 300 | small | .058 | .095 | .113 | .091 | .136 | .149 | .087 | .148 | .163 |
| | | medium | .108 | .203 | .302 | .180 | .358 | .430 | .219 | .450 | .586 |
| | | large | .269 | .606 | .787 | .575 | .898 | .952 | .683 | .954 | .984 |

Note. Column headings give the value of $\pi_{r|r}$ followed by the strength of the state-response association. The power values reported in this table are computed using the large data set method. P and T denote the number of response variables and measurement occasions, respectively.
Table 4.3: The power of the likelihood-ratio test to reject the null hypothesis that the covariate has no effect on the transition probabilities

| Design | n | Effect size | 0.9 weak | 0.9 medium | 0.9 strong | 0.8 weak | 0.8 medium | 0.8 strong | 0.7 weak | 0.7 medium | 0.7 strong |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Equal initial state, P = 6, T = 2 | 300 | small | .085 | .127 | .187 | .113 | .239 | .316 | .146 | .307 | .395 |
| | | medium | .180 | .428 | .586 | .352 | .719 | .874 | .482 | .860 | .949 |
| | | large | .587 | .961 | .996 | .889 | .999 | 1.00 | .966 | 1.00 | 1.00 |
| | 500 | small | .109 | .185 | .289 | .159 | .376 | .496 | .217 | .480 | .606 |
| | | medium | .276 | .649 | .817 | .547 | .916 | .983 | .712 | .979 | .997 |
| | | large | .818 | .998 | 1.00 | .987 | 1.00 | 1.00 | .999 | 1.00 | 1.00 |
| | 900 | small | .162 | .304 | .485 | .257 | .616 | .762 | .363 | .746 | .863 |
| | | medium | .465 | .894 | .975 | .813 | .995 | 1.00 | .932 | 1.00 | 1.00 |
| | | large | .975 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Equal initial state, P = 6, T = 4 | 300 | small | .224 | .421 | .523 | .328 | .584 | .751 | .396 | .748 | .860 |
| | | medium | .623 | .942 | .983 | .893 | .997 | 1.00 | .951 | 1.00 | 1.00 |
| | | large | .998 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| Equal initial state, P = 10, T = 2 | 300 | small | .139 | .178 | .193 | .180 | .277 | .327 | .223 | .388 | .418 |
| | | medium | .287 | .538 | .631 | .562 | .850 | .885 | .685 | .928 | .949 |
| | | large | .853 | .992 | .997 | .990 | 1.00 | 1.00 | .999 | 1.00 | 1.00 |
| Unequal initial state, T = 2, P = 6 | 300 | small | .078 | .135 | .187 | .124 | .198 | .311 | .163 | .286 | .403 |
| | | medium | .171 | .446 | .598 | .380 | .719 | .865 | .539 | .870 | .947 |
| | | large | .605 | .968 | .998 | .912 | 1.00 | 1.00 | .983 | 1.00 | 1.00 |

Note. Column headings give the value of $\pi_{r|r}$ followed by the strength of the state-response association. The power values reported in this table are computed using the large data set method.
Table 4.4: The power of the likelihood-ratio test to reject the null hypothesis π_{2|1} = 0

                       n = 100                n = 200                n = 300
                       state-response         state-response         state-response
                       association            association            association
π_{2|2}        π_{2|1} weak   medium strong   weak   medium strong   weak   medium strong
π_{2|2} = 0.7  .05     .182   .640   .906     .263   .810   .987     .344   .917   .999
               .1      .381   .877   .991     .544   .978   1.00     .710   .997   1.00
               .2      .533   .968   1.00     .795   .998   1.00     .999   1.00   1.00
π_{2|2} = 0.9  .05     .181   .597   .894     .262   .806   .987     .330   .910   .999
               .1      .362   .872   .992     .554   .979   1.00     .685   .996   1.00
               .2      .535   .957   .999     .758   .999   1.00     .887   1.00   1.00

Note. The power values reported in this table are for the simulation conditions involving
6 response variables, 2 measurement occasions, and equal initial state proportions in a
2-state latent Markov model; the power is computed with the method for the non-standard case.
Table 4.5: Evaluating the quality of the large data set method for likelihood-ratio power
computation

                       π_{r|r} = 0.9          π_{r|r} = 0.8          π_{r|r} = 0.7
                       state-response         state-response         state-response
effect  theoretical    association            association            association
size    vs. empirical  weak   medium strong   weak   medium strong   weak   medium strong
small   theoretical    .093   .125   .164     .128   .206   .262     .156   .299   .341
        empirical      .077   .134   .179     .126   .217   .256     .151   .293   .341
medium  theoretical    .162   .425   .466     .335   .651   .750     .427   .798   .835
        empirical      .175   .386   .474     .368   .658   .750     .438   .796   .846
large   theoretical    .592   .903   .980     .881   .997   .999     .949   1.00   1.00
        empirical      .5446  .907   .975     .875   .997   .999     .939   1.00   1.00

Note. The theoretical power values are computed by assuming a central chi-square distribution
under the null hypothesis and a non-central chi-square distribution under the alternative
hypothesis, for which the non-centrality parameter is approximated by using a large data set.
For the empirical case, by contrast, the power is computed by simulating the distribution of
the test statistic under the alternative hypothesis. The power values reported in this table
are for the design conditions involving 6 response variables, two measurement occasions, and
equal initial state proportions in a 2-state latent Markov model.
4.7 Discussion and conclusions
This chapter addressed power computation methods for testing hypotheses about
transition parameters of LM models, which are the transition probabilities themselves
in the basic LM model and the logistic regression coefficients in the LM model with
covariate(s). We showed how the hypotheses of main interest can be specified by imposing
equality constraints across parameters or by fixing parameter(s) to user-defined value(s).
We distinguished power computation for the standard case and power computation for
the non-standard case, where the latter arises when probabilities are fixed to zero.
For the standard case, in which the likelihood-ratio statistic follows an asymptotic chi-
square distribution, two power computation approaches were discussed. The first consists
of approximating the distribution under the alternative hypothesis for a given sample size
n using MC simulation (referred to as MC1). The second approach involves estimating
the non-centrality parameter using either an exemplary data set or a large simulated data
set and subsequently obtaining the power for any sample size n from the non-central
chi-square distribution. The advantage of the second approach is that it is computationally
cheaper. However, when there is doubt whether the distribution of the LR test statistic under
the alternative is non-central chi-square, the MC1 simulation approach is the preferred
option. The MC1 simulation approach can also be applied when the distribution under
the null is known but the distribution under the alternative is unknown. We will come
back to this issue when discussing topics for future research.
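The second approach can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the Latent GOLD implementation: it assumes that the non-centrality parameter scales linearly with sample size, so that an LR value `lr_large` obtained on a large data set of size `n_large` gives λ = n · lr_large / n_large, and it evaluates the chi-square tails with series expansions (the non-central chi-square written as a Poisson mixture of central chi-squares) instead of a statistics library. All function names are illustrative.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x: float, df: float) -> float:
    """CDF of the central chi-square distribution with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_critical(df: float, alpha: float = 0.05) -> float:
    """(1 - alpha) quantile of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ncx2_sf(x: float, df: float, ncp: float) -> float:
    """P(X > x) for a non-central chi-square: Poisson(ncp/2)-weighted central tails."""
    if ncp <= 0.0:
        return 1.0 - chi2_cdf(x, df)
    half = ncp / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 50)  # truncation point of the mixture
    total = 0.0
    for j in range(j_max + 1):
        weight = math.exp(-half + j * math.log(half) - math.lgamma(j + 1))
        total += weight * (1.0 - chi2_cdf(x, df + 2 * j))
    return total


def power_large_data(lr_large: float, n_large: int, n: int, df: int,
                     alpha: float = 0.05) -> float:
    """LR-test power at sample size n, from the LR value on a large data set."""
    ncp = lr_large * n / n_large  # assumption: non-centrality grows linearly in n
    return ncx2_sf(chi2_critical(df, alpha), df, ncp)


# Hypothetical input: LR = 2616.3 on 100,000 simulated cases, one restriction.
print(round(power_large_data(2616.3, 100_000, 300, df=1), 3))
```

With these hypothetical inputs λ is about 7.85 at n = 300 and the power is close to .80; calling the function for other values of n gives a whole power curve from a single large-data fit, which is what makes this approach cheap.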
The non-standard case occurs when the likelihood-ratio statistic does not follow a standard
chi-square distribution. The most obvious example of this is when a parameter is fixed
to the boundary of the parameter space, which equals zero or one for probabilities. In
such situations, power computation by MC simulation is applicable (referred to as MC01).
We use the MC01 method to compute both the critical value under the null hypothesis
and the power under the alternative hypothesis given this critical value. Note that this
procedure is similar to the MC1 simulation approach discussed for the standard case, with
the only difference that the theoretical distribution under the null hypothesis is replaced
by its empirical counterpart.
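The two MC01 steps reduce to an order statistic and a proportion once LR values have been simulated. In the Python sketch below, the LM model fits are replaced by stand-in draws (an assumption made only so that the code runs on its own): under H0 the LR is drawn as max(0, Z)², a 50:50 mixture of a point mass at zero and a chi-square with one degree of freedom, and under H1 as max(0, Z + 2)². In a real application each draw would be the LR of the H0 and H1 models fitted to one simulated data set. The function names are hypothetical.

```python
import random


def empirical_critical_value(null_lr, alpha=0.05):
    """Step 1: the (1 - alpha) empirical quantile of LR values simulated under H0."""
    ordered = sorted(null_lr)
    index = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[index]


def empirical_power(alt_lr, critical_value):
    """Step 2: share of LR values simulated under H1 that exceed the critical value."""
    return sum(lr > critical_value for lr in alt_lr) / len(alt_lr)


def stand_in_lr(rng, shift=0.0):
    """Placeholder for fitting the H0 and H1 models to one simulated data set."""
    return max(0.0, rng.gauss(shift, 1.0)) ** 2


rng = random.Random(2015)
replications = 5000  # as in the numerical study
null_lr = [stand_in_lr(rng) for _ in range(replications)]
alt_lr = [stand_in_lr(rng, shift=2.0) for _ in range(replications)]

cv = empirical_critical_value(null_lr)
print(f"CV = {cv:.3f}, power = {empirical_power(alt_lr, cv):.3f}")
```

For these stand-in distributions the exact values are a CV of about 2.71 and a power of about .64, so the simulated output should land near them; with LM models, the same two functions apply unchanged to the LR values from the H0 and H1 fits.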
In our numerical study, we saw that the power to detect large effects can be low even with
a moderately large sample of, say, 500 observations. Based on the results of the numerical
study, we therefore strongly recommend that researchers who apply LR tests in LM models
perform a power analysis prior to data collection. Our findings indicate several
important issues that should be taken into account. Firstly, in addition to the usual design
factors (i.e., effect size, sample size, and significance level α), a power analysis for LM
models should also involve various other design factors, namely, the number of time points,
the number of response variables, the strength of association between latent states and
response variables, the number of states, the initial state probabilities, and the transition
probabilities. Secondly, for a given effect size, a desired level of power can be achieved by
increasing the number of measurement occasions, by increasing the number of response
variables, or by using response variables that have strong associations with the latent
states. Moreover, situations in which the transition probabilities are small need special
care, since power may be low in such situations. Thirdly, when the association between
states and response variables is weak or the effect size is small, a reasonable power level
can be achieved at the expense of gathering more data, that is, by increasing the sample
size or the number of measurement occasions. In the scenarios we studied, increasing
the number of measurement occasions was more efficient than increasing the sample size.
This is probably connected to the fact that we looked at hypotheses for the transition
probabilities; that is, with more measurement occasions one has more information on
the transition probabilities. When testing hypotheses on the initial state or the response
probabilities, increasing the sample size is probably more effective.
In the MC-based power computation, the accuracy of the estimated power depends
strongly on the number of replications used. This is especially the case in the MC01
method used in the non-standard case, in which not only the power but also the
critical value under H0 was estimated by MC simulation. In our study, we used 5000
MC replications, which seemed to be large enough for the hypotheses we investigated.
However, the required number of MC replications may depend on the type of hypothesis
and the model complexity; hence, further research might explore the required number of
MC replications for LR tests in LM models.
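The dependence of accuracy on the number of replications can be made concrete with the binomial standard error of a simulated proportion, sqrt(p(1 - p)/R), which applies to a power p estimated from R replications. In MC01 it understates the total error, because the critical value is itself estimated. A small Python sketch with hypothetical function names:

```python
import math


def mc_power_se(power: float, replications: int) -> float:
    """Binomial Monte Carlo standard error of a power estimated from R replications."""
    return math.sqrt(power * (1.0 - power) / replications)


def replications_for_se(power: float, target_se: float) -> int:
    """Smallest R whose Monte Carlo standard error does not exceed target_se."""
    # The small tolerance keeps floating-point noise from rounding R up by one.
    return math.ceil(power * (1.0 - power) / target_se ** 2 - 1e-9)


for reps in (1000, 5000):
    print(reps, round(mc_power_se(0.8, reps), 4))
print(replications_for_se(0.5, 0.01))  # worst case p = .5
```

At a power of .80, 5000 replications give a standard error of about .006 (versus about .013 with 1000), and guaranteeing a standard error of .01 at any power level requires 2500 replications.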
While for the non-standard case we proposed approximating the distribution of the
LR statistic under H0 by simulation, its asymptotic distribution has been shown to be
chi-bar square (Bartolucci, 2006). This means one may also obtain the critical value
under H0 from the chi-bar square distribution, which for multiple parameter hypotheses
also requires performing some kind of MC simulation. However, power computation using
an asymptotic approach requires the distribution under the alternative as well. This is
a problem that has not yet been resolved. Another possible area for future research is
investigating whether this distribution is, for instance, a certain type of non-central chi-bar
square distribution (Shapiro, 1988).
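For the simplest boundary hypothesis, a single probability fixed to zero, the chi-bar-square reduces to a 50:50 mixture of a chi-square with zero degrees of freedom (a point mass at zero) and a chi-square with one degree of freedom, a standard result for one parameter on the boundary. With several parameters the mixing weights depend on the covariance structure of the estimates and typically have to be simulated, as noted above. Taking the weights as given inputs (an assumption of this sketch), the critical value follows by bisection on the mixture tail. Python, with hypothetical function names:

```python
import math


def chi2_sf_int(c: float, k: int) -> float:
    """P(chi-square with k df > c) for integer k >= 0, via closed-form recurrences."""
    if k == 0:
        return 0.0 if c > 0.0 else 1.0  # chi-square_0 is a point mass at zero
    if c <= 0.0:
        return 1.0
    if k % 2 == 1:
        sf, start = math.erfc(math.sqrt(c / 2.0)), 1
    else:
        sf, start = math.exp(-c / 2.0), 2
    for m in range(start + 2, k + 1, 2):
        # Q(m/2, c/2) = Q(m/2 - 1, c/2) + (c/2)^(m/2 - 1) exp(-c/2) / Gamma(m/2)
        sf += math.exp((m / 2.0 - 1.0) * math.log(c / 2.0) - c / 2.0
                       - math.lgamma(m / 2.0))
    return sf


def chibar_sf(c: float, weights) -> float:
    """Tail of a chi-bar-square: sum over k of weights[k] * P(chi-square_k > c)."""
    return sum(w * chi2_sf_int(c, k) for k, w in enumerate(weights))


def chibar_critical(weights, alpha: float = 0.05) -> float:
    """Critical value c with chibar_sf(c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chibar_sf(hi, weights) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chibar_sf(mid, weights) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# One transition probability tested against zero: weights 1/2 on chi2_0 and chi2_1.
print(round(chibar_critical([0.5, 0.5]), 4))
```

With weights (1/2, 1/2) and α = .05 this gives about 2.71, the 90th percentile of the one-degree-of-freedom chi-square, instead of the 3.84 that would be used if the boundary were ignored.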
Other possible areas for further research concern the application of the proposed power
computation methods with other types of hypotheses relevant for LM modeling. It seems
that both the standard and non-standard case methods can be directly transferred to
hypotheses about other parameters of the LM model, namely, the initial probabilities and
the conditional response probabilities. The MC method proposed for the non-standard
case may also be applicable to hypothesis tests concerning the number of latent states,
that is, when comparing models with C and C + 1 states.
For the power computation, we estimate the (incorrect) model under H0 for data sets
generated under H1. That is, the measurement parameters are not fixed, but estimated
under this incorrect model. Using such a procedure, when state separation is strong,
estimating the parameters of the model with the transition probability constrained to
zero can be problematic: the measurement parameters are biased toward weaker state
separation. When the bias in the measurement parameters cannot compensate (in terms of
the log-likelihood value) for the misspecification of the transition model, this may lead to overestimation of
power. Future research investigating parameter estimation with constraints on transition
parameters would be interesting.
For simulation conditions in which the latent states are highly separated, when the
true underlying model generates some fraction of cross-over observations, the likelihood of
the restricted model decreases, because each impossible transition, say, a transition from
state 1 to state 2, needs to be accommodated by assigning the observed pattern from
state 1 to state 2, or vice versa. The decrease in the likelihood of the restricted model will
be accompanied by biased parameter estimates: the estimates of the conditional response
probabilities in the 2 states will be biased toward each other, to increase the
likelihood of observing a typical state 1 pattern given state 2 or a typical state 2 pattern given
state 1. This could also result in overestimation of the LR power for rejecting the null
hypothesis that the transition probability is zero.
4.8 Appendix
4.8.1 Latent GOLD syntax for power computation
This appendix illustrates the application of the proposed power computation methods
using the Latent GOLD program. As an example, we use a 2-state LM model with six
binary response variables (y1 through y6). The H1 population model contains unequal
transition probabilities, and we test the H0 model with equal transition probabilities
against the H1 model.
In order to perform a power computation, one should first define a data file indicating
the time structure and the variables in the model. With T = 4 and P = 6, this file could
be of the form
id time y1 y2 y3 y4 y5 y6 n100000 n300
1 1 0 0 0 0 0 0 100000 300
1 2 0 0 0 0 0 0 100000 300
1 3 0 0 0 0 0 0 100000 300
1 4 0 0 0 0 0 0 100000 300
This data file contains 4 records (one for each measurement occasion) which are connected
by an identifier variable, arbitrary values for the response variables, and variables indicating
sample sizes to be used later on.
A Latent GOLD syntax model consists of three sections: “options”, “variables”, and
“equations”. The relevant LM model is defined as follows:
// basic model
options
output parameters=first standarderrors profile;
variables
caseid id;
dependent y1 nominal 2, y2 nominal 2, y3 nominal 2,
y4 nominal 2,y5 nominal 2,y6 nominal 2;
latent State nominal dynamic 2;
equations
State[=0] <- 1;
State <- (beta~tra) 1 | State[-1];
y1-y6 <- 1 | State;
The “output” option indicates that we wish to use dummy coding for the logit parameters
with the first category as the reference category. Subsequently, we define the variables
which are part of the model. Note that the latent variable “State” is specified to be
dynamic, which yields a latent variable whose value can change across measurement
occasions.
The three equations represent the logit equations for the initial state, the transitions,
and the measurement part of the model, respectively. Note that “1” indicates an intercept,
and “|” that the intercept depends on the variable concerned. A special type of coding
(called transition coding and indicated with “~tra”) is used for the logit parameters of
the transition model and, moreover, a label (beta) is specified for these parameters. This
label will be used below to impose restrictions.
a). Standard case
Option 1. Implementation of the Monte Carlo based power computation method (MC1)
involves defining the H0 and H1 models in a single input file. Denoting the parts that
remain the same as in the basic model defined above by “...”, the H1 model may be specified as:
// H1 model for MC-based method
...
equations
...
{0.000000000
-0.54729786 -1.14729786
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361
1.386294361 -1.386294361}
The numbers shown inside the curly brackets represent the values of the logit
parameters in the H1 model. The first row contains the initial state parameter(s),
the second row the transition parameters, and the remaining rows the measurement
parameters.
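To read these population values on the probability scale, one can pass them through the inverse logit. The Python sketch below assumes that each value acts as a binary log-odds. It does not assert how Latent GOLD's dummy and transition coding assign individual numbers to particular states or transitions; the Latent GOLD syntax documentation is the authority on that.

```python
import math


def inv_logit(eta: float) -> float:
    """Probability implied by a binary logit (log-odds) eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# Values copied from the H1 specification above.
initial = [0.000000000]
transitions = [-0.54729786, -1.14729786]
measurement = [1.386294361, -1.386294361]  # identical rows for y1 through y6

for label, logits in (("initial", initial), ("transition", transitions),
                      ("measurement", measurement)):
    print(label, [round(inv_logit(b), 3) for b in logits])
```

The measurement values are ±log 4 ≈ ±1.386294361, so they correspond to probabilities of .8 and .2, and an initial-state logit of 0 corresponds to .5, which under a binary logit means equal initial state proportions in a 2-state model.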
The H0 model in which the transitions are restricted to be equal is defined as follows:
// H0 model for MC-based method
options
...
montecarlo replicates=1000 power='H1' N=300 alpha=0.05;
variables
...
equations
...
beta[1,1] = beta[2,1];
As can be seen, the equality restriction on the transition logits is defined at the end
of the equations section. Importantly, the H0 model should
contain the “montecarlo” option indicating the number of Monte Carlo replications, the
“power” command with the name of the H1 model, the sample size “N”, and the level of
significance “alpha”. Running the H0 model yields the power of the LR test comparing
the two models.
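Conceptually, MC1 counts how often LR values simulated under H1 exceed the critical value of the theoretical central chi-square. Below is a minimal Python sketch for a one-degree-of-freedom test, such as the single restriction beta[1,1] = beta[2,1]; for 1 df the critical value is the squared standard-normal quantile z(1 - α/2)². The LR draws (Z + δ)² are a non-central chi-square stand-in for fitted LM models (an assumption made only so the code runs on its own), and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def chi2_1df_critical(alpha: float = 0.05) -> float:
    """Critical value of the central chi-square with 1 df: squared z_{1 - alpha/2}."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


def mc1_power(alt_lr, alpha: float = 0.05) -> float:
    """Share of LR values simulated under H1 that exceed the theoretical critical value."""
    cv = chi2_1df_critical(alpha)
    return sum(lr > cv for lr in alt_lr) / len(alt_lr)


rng = random.Random(7)
delta = 7.849 ** 0.5  # stand-in non-centrality 7.849, for which the power is about .80
alt_lr = [rng.gauss(delta, 1.0) ** 2 for _ in range(20000)]  # placeholder for LM fits
print(round(chi2_1df_critical(), 4), round(mc1_power(alt_lr), 3))
```

Only the H1 side is simulated here, which is the difference from the non-standard case, where the critical value must be simulated as well.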
Option 2. When using the large data set method, one should first simulate a large data
set from the population defined by H1 and subsequently analyze this data set using both
the H0 and H1 model. Simulating the large data set is done as follows:
// H1 model for simulating a large data file
options
...
outfile 'sim.dat' simulation;
variables
...
caseweight n100000;
equations
...
{...}
Compared to the basic specification, we use the “outfile” option to indicate that a data
file should be simulated, use the “caseweight” to indicate the size of the large data set
(here 100000), and specify the parameter values of the population model.
To obtain the power we analyze the large data set with an input file containing both
the H0 and H1 model. The H1 model equals:
// H1 model for large data based power computation method
options
...
output LLdiff='H0' LLdiffPower=300;
...
That is, we indicate that a log-likelihood difference test should be performed (“LLdiff”)
and that the power of this test should be computed for the specified sample size
(“LLdiffPower”). We also define the H0 model itself, which again is the basic LM model
with the constraint “beta[1,1] = beta[2,1]”.
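Because the large data set yields the non-centrality parameter for any n, it also gives the smallest sample size that reaches a target power. Below is a Python sketch for a one-degree-of-freedom test. It again assumes the non-centrality scales as n · LR_large / N_large, and it uses the fact that for 1 df the non-central chi-square tail has the exact normal form P((Z + √λ)² > z(1 - α/2)²). The LR value is hypothetical, and the code does not read Latent GOLD output.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_1df(ncp: float, alpha: float = 0.05) -> float:
    """Exact power of a 1-df chi-square test with non-centrality ncp."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    delta = ncp ** 0.5
    # P((Z + delta)^2 > z_crit^2) = P(Z > z_crit - delta) + P(Z < -z_crit - delta)
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def required_n(lr_large: float, n_large: int, target: float = 0.8,
               alpha: float = 0.05) -> int:
    """Smallest n for which the scaled non-centrality reaches the target power."""
    n = 1
    while power_1df(n * lr_large / n_large, alpha) < target:
        n += 1
    return n


lr_large, n_large = 2616.3, 100_000  # hypothetical LR from a large data set
print(round(power_1df(300 * lr_large / n_large), 3), required_n(lr_large, n_large))
```

Since power is monotone in λ, the linear search could be replaced by bisection when the required n is large.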
Option 3. Power computation using the exemplary data method is similar to the large
data method. First an exemplary data file which is exactly in agreement with the H1
model is created, and subsequently this data file is analyzed with both the null and the
alternative model. That is, first create an exemplary data file as
// H1 model for creating the exemplary data file
options
...
output WriteExemplaryData='exemplary.dat';
variables
...
equations
...
{...}
Next compute power using the created exemplary data file, by specifying the H0 and
H1 model in the same way as the power computation using the simulated large data file
method. The only difference, when compared with the simulated large data file method
discussed above, is that the case weight of the exemplary data file has to be specified in
both the H0 and H1 model. This requires adding the line “caseweight frequency;” to the
“variables” section.
b). Non-standard case
We will illustrate MC-based power computation for the non-standard case using an
example in which one of the transition probabilities is fixed to 0, implying that the
transition logit concerned is fixed to a large negative value (say -100). Power computation
in the non-standard case proceeds in two steps. First, we obtain the critical value under
H0 by simulation, and subsequently we obtain the power given this critical value.
To obtain the critical value, we define the H0 and H1 model in the same input file.
In the H1 model, we use the “MCstudy” option and specify the number of Monte Carlo
replications, the H0 model, the sample size, and level of significance “alpha”, that is,
// H1 model for obtaining CV by simulation
options
...
montecarlo replicates=5000 MCstudy='H0' N=300 alpha=0.05;
...
The H0 model contains the population values for the free parameters as well as the
constraint. That is,
// H0 model for obtaining CV by simulation
...
equations
...
beta[1,1] = -100;
{...}
Running the H1 model gives us the critical value (CV).
In the final step, we obtain power by running the H1 and H0 models; that is, define
the H0 and H1 models in a single input file as
// H0 model for obtaining power by simulation
options
...
montecarlo replicates=5000 power='H1' N=300 CV=2.2344;
variables
...
equations
...
beta[1,1] = -100;
The H1 model is again equal to the basic model with the population values for the
parameters. Running the H0 model will give us the power for the specified sample size N
and the estimated critical value CV, which we set here to N = 300 and CV = 2.2344.
CHAPTER 5
Power Analysis for the Likelihood-Ratio Test in Latent Markov
Models: Short-cutting the Bootstrap p-value Based Method
Abstract
The latent Markov (LM) model is a popular method for identifying distinct unobserved
states and transitions between these states over time in longitudinally observed responses.
The bootstrap likelihood-ratio (BLR) test yields the most rigorous test for determining
the number of latent states, yet little is known about power analysis for this test. Power
could be computed as the proportion of the bootstrap p-values (PBP) for which the null
hypothesis is rejected. This requires performing the full bootstrap procedure for a large
number of samples generated from the model under the alternative hypothesis, which is
computationally infeasible in most situations. This chapter presents a computationally
feasible short-cut method for power computation for the BLR test. The short-cut method
involves the following simple steps: 1) obtaining the parameters of the model under the
null hypothesis, 2) constructing the empirical distributions of the likelihood-ratio statistic
under the null and alternative hypotheses via Monte Carlo simulations, and 3) using these
empirical distributions to compute the power. We evaluate the performance of the short-cut
method by comparing it to the PBP method, and moreover show how the short-cut method
can be used for sample size determination.
This chapter has been submitted for publication.
5.1 Introduction
In recent years, the latent Markov (LM) model has proven useful to identify distinct
underlying states and the transitions over time between these states in longitudinally
observed responses. In LM models, as in latent class models, or more generally in finite
mixture models, the observed responses are governed by a set of discrete underlying
categories, which are named states, classes, or mixture components. Moreover, the LM
model allows transitions between these states from one time-point to another, that is,
the state membership of respondents can change during the period of observation. The
LM model finds its application, for example, in educational sciences to study how the
interests of students in certain subjects change over time (Vermunt et al., 1999), and in
medical sciences to study the change in health behavior of patients suffering from certain
diseases (Bartolucci et al., 2010). Various examples of applications in social, behavioral,
and health sciences are presented in the textbooks by Bartolucci, Farcomeni, and Pennoni
(2013) and Collins and Lanza (2010).
In most research situations, including those just mentioned, the number of states is
unknown and must be inferred from the data itself. The bootstrap likelihood-ratio (BLR)
test, proposed by McLachlan (1987) and extended by Feng and McCulloch (1996) and
Nylund et al. (2007), is often used to test hypotheses about the number of mixture
components. These previous studies focused on p-value computation, rather than on
power computation for the BLR test, which is the topic of the current study.
The assessment of the power of a test, that is, the probability that the test will
correctly reject the null hypothesis when indeed the alternative hypothesis is true, is
important at several stages of a research study. At the planning stage, an a priori power
analysis is useful for determining the data requirements of the study: e.g., the sample
size or number of time points at which measurement takes place. In general, the smaller
the sample size, the less power we have to reject the null hypothesis when it is false.
Therefore, too small a sample size may result in an under-extraction of the number of
states (see, for example, Nylund et al. (2007) and C. Yang (2006)). This not only leads to
wrong conclusions about the number of states but also distorts the interpretation of the
state-specific parameters. Moreover, when the sample size is too small, the parameter
estimates are prone to be unstable and inaccurate (Marsh, Hau, Balla, & Grayson, 1998).
Performing an a priori power analysis helps to determine the smallest sample size required
to achieve a certain power level, usually .8 or larger, thereby also allowing the researcher
to avoid excessively large, uneconomical sample sizes. For these reasons, funding agencies
may ask applicants to justify the number of subjects to be enrolled in the study through a
power analysis. At the analysis stage, a post hoc assessment of the power achieved, given
the specific design scenario and the parameter values obtained, should aid the interpretation
of the study results. Therefore, to assure confidence in the study results (or conclusions),
journal editors often ask authors to report the power.
Power computation is straightforward if under certain regularity conditions the
theoretical distributions of the test statistic under the null and the alternative hypothesis
are known. This is not the case for the BLR test in LM models. The power of a statistical
test can be computed as the proportion of p-values (stemming from multiple data sets
simulated under the alternative hypothesis) for which the null hypothesis is
rejected. When using the BLR statistic to test for the number of states in LM models, such
a power calculation becomes computationally expensive, because it requires performing
the bootstrap p-value computation for multiple sets of data. As explained in detail below,
it requires generating M data sets from the model under the alternative hypothesis,
and for each data set, estimating the models under the null and alternative hypotheses
to obtain the LR value. Whether the null hypothesis will be rejected for a particular
generated data set is determined by computing the bootstrap p-value, which in turn
requires (a) generating B data sets from the model estimates under the null hypothesis
and (b) estimating the models under the null and alternative hypotheses using these
B data sets. Hereafter, we refer to this computationally demanding procedure, which
involves calculating the power as the proportion of bootstrap p-values for which the
model under the null hypothesis is rejected, as the PBP method.
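The cost of the PBP method follows directly from this nesting. Count one estimation per model per data set, and ignore the multiple random starts usually needed for mixture models (they multiply every count). Each of the M outer data sets then needs two fits for its LR and 2B fits for its bootstrap p-value, giving 2M(1 + B) estimations. The Python sketch below uses hypothetical function names and illustrative values of M and B. It compares that total with an assumed count for the short-cut method described next: one H0 fit to a large data set, plus two fits for each of M0 data sets generated under H0 and each of M data sets generated under H1.

```python
def pbp_estimations(M: int, B: int) -> int:
    """Model fits for the PBP method: both models on M outer and M * B bootstrap sets."""
    return 2 * M * (1 + B)


def shortcut_estimations(M: int, M0: int) -> int:
    """Assumed fits for the short-cut: one H0 fit to a large data set, then both
    models on M0 data sets generated under H0 and M generated under H1."""
    return 1 + 2 * (M0 + M)


print(pbp_estimations(M=500, B=500), shortcut_estimations(M=500, M0=500))
```

With M = B = M0 = 500, this amounts to 501,000 model estimations for the PBP method against 2,001 for the short-cut.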
Because using the PBP method is infeasible in most situations, we propose an
alternative method which we refer to as the short-cut method. Computing the power using
the short-cut method involves constructing the empirical distributions of the LR under
both the null and alternative hypotheses. We show how the “population” parameters of
the model under the null hypothesis can be obtained from a large data set,
and these parameters will in turn be used in the process to obtain the distribution of the
LR statistic under the null hypothesis. As explained in detail below, the distribution of the
LR under the null hypothesis is used to obtain the critical value, given a predetermined
level of significance. Given this critical value, we compute the power by simulating the
distribution of the LR under the alternative hypothesis. Using numerical experiments, we
examine the data requirements (e.g., the sample size, the number of time points, and the
number of response variables) that yield reasonable levels of power for given population
characteristics.
The remainder of this chapter is organized as follows. First, we describe the LM
model and the BLR test for determining the number of states. Second, we provide power
computation methods for the BLR test and discuss how these methods can be applied
to determine the required sample size. Third, numerical experiments that illustrate the
proposed methods of power and sample size computation are presented. Finally, we
provide a concluding discussion on the main results of our study.
5.2 The LM model
Let $Y_t = (Y_{t1}, Y_{t2}, Y_{t3}, \ldots, Y_{tP})$ for $t = 1, 2, 3, \ldots, T$ be the
$P$-dimensional response variable of interest at time point $t$. Denoting the latent variable
at time point $t$ by $X_t$, in an LM model the relationships among the latent and observed
response variables at the different time points can be represented by the following simple
path diagram.
[Path diagram: latent variables $X_1, X_2, \ldots, X_T$ linked over time, each $X_t$ linked to its response vector $Y_t$]
An LM model is a probabilistic model defining the relationships between the time-specific
latent variables $X_t$ (e.g., between $X_1$, $X_2$, and $X_3$) and the relationships between
the latent variables $X_t$ and the time-specific vectors of observed responses $Y_t$ (e.g.,
$X_1$ with $Y_1$). In the basic LM model, the latent variables are assumed to follow a
first-order Markov process (i.e., the state membership at time point $t+1$ depends only on
the state occupied at time point $t$), and the response variables are assumed to be locally
independent given the latent states. Based on these assumptions, we define the $S$-state LM
model as a mixture density of the form
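$$
p(y_i, \Phi) = \sum_{x_1=1}^{S} \sum_{x_2=1}^{S} \sum_{x_3=1}^{S} \cdots \sum_{x_T=1}^{S}
p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}) \prod_{t=1}^{T} \prod_{j=1}^{P} p(y_{tji} \mid x_t),
$$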
where $y_i$ denotes the vector of responses for subject $i$ over all the time points, $y_{tji}$
the response of subject $i$ to the $j$-th variable measured at time point $t$, $x_t$ a
particular latent state at time point $t$, and $\Phi$ the vector of model parameters
(Bartolucci et al., 2013; Vermunt et al., 1999).
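For small T the mixture density can be evaluated literally, by summing over all S^T state sequences. The Python sketch below does this for binary responses, with θ_{j|s} = P(y = 1 | X_t = s) and time-homogeneous parameters (as assumed later in this section). The parameter values and function names are hypothetical, and because the number of terms grows exponentially in T, this direct sum serves only as a check, not as a practical estimator.

```python
from itertools import product


def lm_density_bruteforce(y, init, trans, theta):
    """p(y_i) for the basic LM model, summing over all S**T state paths.

    y[t][j]:     binary response to item j at time point t
    init[s]:     initial state probability pi_s
    trans[r][s]: transition probability pi_{s|r}
    theta[s][j]: P(y_tj = 1 | X_t = s), the same at every time point
    """
    S, T = len(init), len(y)

    def emission(s, y_t):
        # Local independence: multiply item probabilities given the state.
        prob = 1.0
        for j, y_tj in enumerate(y_t):
            prob *= theta[s][j] if y_tj == 1 else 1.0 - theta[s][j]
        return prob

    total = 0.0
    for path in product(range(S), repeat=T):  # (x_1, ..., x_T)
        prob = init[path[0]]
        for t in range(1, T):
            prob *= trans[path[t - 1]][path[t]]
        for t in range(T):
            prob *= emission(path[t], y[t])
        total += prob
    return total


init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
theta = [[0.2], [0.7]]  # one binary item
y = [[1], [0]]          # T = 2 measurement occasions
print(lm_density_bruteforce(y, init, trans, theta))
```

For this toy case the four state paths contribute .072, .003, .056, and .084, so p(y_i) = .215 up to floating-point rounding.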
The LM model has three sets of parameters:
1. The initial state probabilities (or proportions) $p(X_1 = s) = \pi_s$, satisfying
$\sum_{s=1}^{S} \pi_s = 1$, that is, the probability of being in state $s$ at the first time point;
2. The transition probabilities $p(X_t = s \mid X_{t-1} = r) = \pi^{t}_{s|r}$, satisfying
$\sum_{s=1}^{S} \pi^{t}_{s|r} = 1$. These transition probabilities indicate the probabilities of
remaining in a state or switching to another state, conditional on the state membership at
the previous time point. All transition probabilities are conveniently collected in a
transition matrix, in which the entry in row $r$ and column $s$ represents the probability
of a transition from state $r$ at time point $t-1$ to state $s$ at time point $t$;
3. The state-specific parameters of the density function $p(y_{tji} \mid x_t)$, which govern the
association between the latent states and the observed response variables. The choice of
the specific density form for $p(y_{tji} \mid x_t)$, which depends on the scale type of the
response variable, determines the state-specific parameters for this density function.
With continuous responses, one may, for example, define the state-specific density to be a
normal distribution, for which the parameters are the mean $\mu^{t}_{j|s}$ and the variance
$\sigma^{2t}_{j|s}$. With dichotomous and nominal responses, the multinomial distribution is
assumed, for which the parameters become the conditional response probabilities
$p(y_{tji} \mid x_t = s) = \theta^{t}_{j|s}$. The state-specific parameters and the transition
probabilities may vary across time, hence the index $t$, but are assumed to be
time-homogeneous in the remainder of this chapter.
Given a sample of size n, the parameters are typically estimated by maximizing the
log-likelihood function:
$$
l(\Phi) = \sum_{i=1}^{n} \log p(y_i, \Phi). \qquad (5.1)
$$
The search for the values of Φ that maximize the log-likelihood function in equation (5.1)
can be carried out with the Expectation-Maximization (EM) algorithm (Dempster, Laird,
& Rubin, 1977; McLachlan & Krishnan, 2007), which alternates between computing
the expected complete data log-likelihood function (E step) and updating the unknown
parameters of interest by maximizing this function (M step). For LM models, a special
version of the EM algorithm with a computationally more efficient implementation of the
E step may be used. This algorithm is referred to as the Baum-Welch or forward-backward
algorithm (Bartolucci et al., 2010; Baum, Petrie, Soules, & Weiss, 1970; Vermunt, Tran,
& Magidson, 2008).
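The computational gain of the forward-backward implementation comes from the forward recursion α_1(s) = π_s p(y_1 | s), α_t(s) = [Σ_r α_{t-1}(r) π_{s|r}] p(y_t | s), and p(y_i) = Σ_s α_T(s), which costs on the order of T·S² operations instead of S^T. The Python sketch below uses this recursion to evaluate the log-likelihood of equation (5.1) for binary responses with time-homogeneous parameters. It covers only the likelihood evaluation, not the EM (Baum-Welch) updates, and the names and parameter values are hypothetical. For long series the α values should be rescaled at each step to avoid numerical underflow.

```python
import math


def forward_density(y, init, trans, theta):
    """p(y_i) via the forward recursion (binary items, time-homogeneous parameters)."""
    S = len(init)

    def emission(s, y_t):
        # Local independence: product over items of Bernoulli probabilities.
        prob = 1.0
        for j, y_tj in enumerate(y_t):
            prob *= theta[s][j] if y_tj == 1 else 1.0 - theta[s][j]
        return prob

    # alpha[s] = p(y_1, ..., y_t, X_t = s), initialised at t = 1.
    alpha = [init[s] * emission(s, y[0]) for s in range(S)]
    for y_t in y[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(S)) * emission(s, y_t)
            for s in range(S)
        ]
    return sum(alpha)


def log_likelihood(data, init, trans, theta):
    """l(Phi) = sum over subjects of log p(y_i, Phi), as in equation (5.1)."""
    return sum(math.log(forward_density(y, init, trans, theta)) for y in data)


init = [0.5, 0.5]                 # pi_s
trans = [[0.9, 0.1], [0.2, 0.8]]  # trans[r][s] = pi_{s|r}
theta = [[0.2], [0.7]]            # P(y = 1 | state s), one binary item
data = [[[1], [0]], [[1], [1]]]   # two subjects, T = 2
print(log_likelihood(data, init, trans, theta))
```

For these two subjects the densities are .215 and .235, so the log-likelihood equals log(.215) + log(.235).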
As already discussed in the introduction, identifying the number of latent states is a
common goal in LM modeling, and typically the first step in the analysis. Testing
hypotheses about the number of states involves estimating LM models with increasing
numbers of states and checking whether the model fit is significantly improved by adding
one or more states. More formally, the hypotheses about the number of states may be
specified as $H_0: S = r$ versus $H_1: S = s$, where $r < s$. Usually, the $r$- and $s$-state
models differ by one state; for example, one tests $H_0$: 2-state LM model against $H_1$:
3-state LM model. However, in principle, the comparison can also be between the 3-state and
the 1-state LM model. In this chapter, we restrict ourselves to the situation in which
$r = s - 1$.
The LR statistic for this type of test is defined as
$$LR = 2\,[\,l(\hat{\Phi}_s) - l(\hat{\Phi}_r)\,], \qquad (5.2)$$
where $l(\cdot)$ is the log-likelihood function and $\hat{\Phi}_s$ and $\hat{\Phi}_r$ are the maximum likelihood estimates under the alternative and null hypothesis, respectively. In the standard case, under certain regularity conditions, the LR statistic in equation (5.2) asymptotically follows a central chi-square distribution under the null hypothesis and a non-central chi-square distribution under the alternative hypothesis (Steiger, Shapiro, & Browne, 1985). In such a case, one may use the theoretical chi-square distribution with the appropriate number of degrees of freedom to compute the p-value of the LR test, or, given a predetermined significance level α and the population characteristics of the H1 model, the power of the LR test. These asymptotic distributions, however, do not apply when the LR statistic is used to test the number of latent states (Aitkin, Anderson, & Hinde, 1981), because the regularity conditions are violated: the r-state model is obtained from the s-state model by fixing parameters at the boundary of the parameter space (e.g., setting a state proportion to zero).
One may, however, apply parametric bootstrapping to construct the empirical distribution of the LR statistic, and subsequently use this empirical distribution to compute the p-value. With current computing facilities, this approach can readily be applied. Using parametric bootstrapping, the empirical distribution of the LR statistic under the null hypothesis is constructed by generating B independent (bootstrap) samples from the parametric (probability) model $p(\mathbf{y}, \hat{\Phi}_r)$, where $\hat{\Phi}_r$ itself is an estimate
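computed from the observed sample of size n (Feng & McCulloch, 1996; McLachlan, 1987; Nylund et al., 2007). Denoting the bootstrap samples by $\mathbf{y}_b$ (for $b = 1, 2, \ldots, B$), equation (5.2) becomes
$$BLR_b = 2\,[\,l(\hat{\Phi}_s^{\,b}) - l(\hat{\Phi}_r^{\,b})\,], \qquad (5.3)$$
where $BLR_b$ denotes the bootstrap likelihood-ratio (BLR) statistic computed for bootstrap sample $\mathbf{y}_b$, and $\hat{\Phi}_s^{\,b}$ and $\hat{\Phi}_r^{\,b}$ are the estimates of the s- and r-state models obtained from that sample.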
Sampling B data sets from the r-state LM model defined by $p(\mathbf{y}, \hat{\Phi}_r)$ and computing the BLR statistic as in equation (5.3) for each of these data sets yields the BLR distribution under the null hypothesis. This distribution is then used to compute the bootstrap p-value, which proceeds as follows:
Step 1. Treating the ML estimates of the r-state model parameters as if they were the "true" parameter values, generate B independent (bootstrap) samples, each of size n, from the r-state LM model.
Step 2. Compute the $BLR_b$ values as in equation (5.3), which requires fitting the r- and s-state models to each of the bootstrap samples generated in Step 1.
Step 3. Compute the bootstrap p-value as $p = \frac{1}{B}\sum_{b=1}^{B} I(BLR_b > LR)$, where $I(\cdot)$ is the indicator function, which takes the value 1 if $BLR_b > LR$ and 0 otherwise. Whether the r-state LM model is retained or rejected in favor of the s-state model is then decided by comparing this p-value with the predetermined significance level α.
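Steps 1 to 3 can be sketched generically by passing the model-specific parts in as functions. This is a minimal sketch, not the implementation used in this study: `lr_stat`, `fit_null`, and `simulate_null` are hypothetical callables that, in an LM analysis, would fit the r- and s-state models and simulate from the fitted r-state model. To keep the example runnable, they are replaced by a toy normal-mean test (H0: N(0, sd²) versus H1: N(mu, sd²)).

```python
import numpy as np


def bootstrap_p_value(y, lr_stat, fit_null, simulate_null, B=99, seed=None):
    """Parametric bootstrap p-value of an LR test.

    lr_stat(y)                 -> LR = 2[l(Phi_s) - l(Phi_r)] for data y
    fit_null(y)                -> ML estimates of the H0 (r-state) model
    simulate_null(par, n, rng) -> one data set of size n from the H0 model
    """
    rng = np.random.default_rng(seed)
    lr_obs = lr_stat(y)
    par0 = fit_null(y)  # Step 1: treat the H0 estimates as the true values
    blr = np.array([lr_stat(simulate_null(par0, len(y), rng)) for _ in range(B)])  # Step 2
    return float(np.mean(blr > lr_obs))  # Step 3: (1/B) * sum_b I(BLR_b > LR)


# Hypothetical toy stand-ins (H0: N(0, sd^2) vs H1: N(mu, sd^2)), not LM routines
def toy_lr(y):
    return len(y) * np.log(np.mean(y**2) / np.var(y))


def toy_fit_null(y):
    return np.sqrt(np.mean(y**2))


def toy_simulate_null(sd, n, rng):
    return rng.normal(0.0, sd, size=n)


y = np.random.default_rng(1).normal(0.5, 1.0, size=80)
print(bootstrap_p_value(y, toy_lr, toy_fit_null, toy_simulate_null, B=199, seed=2))
```

Passing the fitting and simulation steps as functions keeps the bootstrap logic separate from the (expensive) LM estimation, which is where multiple starting values would be handled.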
5.3 Power analysis for the BLR test
As mentioned, two common goals of power analysis are (a) to determine the post hoc power of a study (i.e., given a certain sample size, number of time points, and number of response variables) and (b) to determine a priori the sample size (or other design factors, such as the number of time points or the number of response variables) required to achieve a
certain power level. In both cases, we assume that the population parameters are known (in a priori analyses, a range of expected parameter values may be used) and that the other factors are fixed. In what follows, we first show how the bootstrap procedure discussed above can be used for power computation, and subsequently present the computationally more efficient short-cut method for power and sample size computation in LM models.
5.3.1 Power computation
In this subsection, we present two alternative methods for computing the power of the BLR test. The first, the PBP method, computes the power as the proportion of significant bootstrap p-values (PBP), that is, the proportion of samples from the H1 model for which H0 is rejected. More specifically, the PBP method for power computation involves the following steps:
Step 1. Generate M independent samples, each of size n, from the true H1 model.
Step 2. For each sample m from Step 1, compute the likelihood ratio $LR_m$ as in equation (5.2).
Step 3. Obtain the bootstrap p-value of each sample m as $p_m = \frac{1}{B}\sum_{b=1}^{B} I(BLR_{bm} > LR_m)$, where $LR_m$ is the LR of sample m from the H1 population, $BLR_{bm}$ is the corresponding BLR for bootstrap sample b (generated from the r-state model estimated on sample m, as in the bootstrap procedure above), and $I(\cdot)$ is the indicator function defined above.
Step 4. The power associated with a sample of size n is computed as the proportion of the M H1 data sets in which H0 is rejected, that is,
$$PBP = \frac{1}{M}\sum_{m=1}^{M} I(p_m < \alpha), \qquad (5.4)$$
where the indicator function I(·) and α are as defined above.
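The nested structure of the PBP method (a full bootstrap inside each of the M samples from H1) is easiest to see in code. A minimal sketch under the same assumptions as a generic bootstrap: the callables are hypothetical placeholders for LM fitting and simulation, and the toy normal-mean model below merely stands in for them so that the example runs.

```python
import numpy as np


def pbp_power(simulate_h1, lr_stat, fit_null, simulate_null, n,
              M=200, B=99, alpha=0.05, seed=None):
    """Power as the proportion of significant bootstrap p-values (eq. 5.4)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(M):
        y = simulate_h1(n, rng)      # Step 1: one sample from the true H1 model
        lr_m = lr_stat(y)            # Step 2: LR_m
        par0 = fit_null(y)           # H0 estimates for this particular sample
        blr = np.array([lr_stat(simulate_null(par0, n, rng)) for _ in range(B)])
        p_m = np.mean(blr > lr_m)    # Step 3: bootstrap p-value p_m
        rejections += int(p_m < alpha)  # Step 4: count rejections of H0
    return rejections / M


# Hypothetical toy stand-ins (H0: N(0, sd^2) vs H1: N(mu, sd^2)), not LM routines
def toy_lr(y):
    return len(y) * np.log(np.mean(y**2) / np.var(y))


def toy_fit_null(y):
    return np.sqrt(np.mean(y**2))


def toy_simulate_null(sd, n, rng):
    return rng.normal(0.0, sd, size=n)


def toy_simulate_h1(n, rng, mu=0.3):
    return rng.normal(mu, 1.0, size=n)


print(pbp_power(toy_simulate_h1, toy_lr, toy_fit_null, toy_simulate_null, n=100, seed=3))
```

The inner loop runs M × B times and, in an LM application, each call to `lr_stat` means fitting two models with multiple starting values; this is the cost the short-cut method avoids.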
As mentioned above, this method of power computation is computationally expensive and requires a considerable amount of computer memory. For example, setting M = 500 and B = 99 requires us to store and analyze M(B + 1) = 50,000 data sets, each of which has to be fitted under both the r- and the s-state model. Also, in order to achieve a good approximation to the sampling distribution, which, if not
well approximated, could affect the p-value (and subsequently the power), both M and
B should be large enough.
For LM models, whose estimation requires iterative procedures, power computation using the PBP method is often too computationally intensive to be practical. We therefore propose a computationally more efficient method, which we call the short-cut method. It works much like standard power computation (see, for example, Brown et al. (1999), Satorra and Saris (1985), and Self et al. (1992)), except that we construct the distributions under H0 and H1 by Monte Carlo simulation instead of relying on theoretical chi-square distributions. As explained below, the distribution under H0 is used to obtain the critical value (CV), and the distribution under H1 is used to compute the power given the CV.
First, the H0 “population” parameters needed to compute the CV should be obtained. This can be achieved by creating an exemplary data set, that is, a data file with all possible response patterns and their population proportions under H1 as weights (O’Brien, 1986; Self et al., 1992). Because the number of possible response patterns is very large in LM models with more than a few indicators and/or time points, this method cannot always be applied. As an alternative, we use the parameter values of the H1 model to generate a large data set (e.g., 100,000 observations), which is assumed to represent the hypothetical H1 population. Estimating the H0 model (i.e., the r-state LM model) on this large data set yields the pseudo-population parameter values for the r-state model. These H0 parameters are then used to construct the distribution of the LR under the null hypothesis: given the estimated parameters of the H0 model, generate K data sets (each of size n) and, for each of these data sets, compute the LR as in equation (5.2). Next, order the LR values such that $LR_{[1]} \le LR_{[2]} \le \cdots \le LR_{[K]}$. Given the nominal level α, compute the CV as
$$CV_{1-\alpha} = \{\, LR_{[k]} : p(LR > LR_{[k]} \mid H_0) = \alpha \,\}. \qquad (5.5)$$
Similarly, the distribution of the LR under the alternative hypothesis is constructed
using M samples of the H1 model. That is, given the parameters of the H1 model,
we generate M independent samples from the s-state LM model and, for each of these samples, compute the LR as in equation (5.2). For sufficiently large M, this closely approximates the sampling distribution of the LR statistic under the alternative hypothesis. The power is then computed from this empirical distribution as the probability that the LR value exceeds the CV, that is,
$$\text{power} = p(LR > CV_{1-\alpha} \mid H_1) = \frac{1}{M}\sum_{m=1}^{M} I(LR_m > CV_{1-\alpha}), \qquad (5.6)$$
where $I(\cdot)$ is the indicator function, indicating whether the LR value computed for the m-th sample from the H1 population exceeds $CV_{1-\alpha}$.
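Given the K simulated LR values under H0 and the M values under H1, equations (5.5) and (5.6) reduce to an order statistic and a proportion. A minimal sketch, assuming the convention that $\lfloor \alpha K \rfloor$ of the H0 values lie above the CV (other quantile conventions differ slightly for small K); the function names are hypothetical.

```python
import numpy as np


def critical_value(lr_h0, alpha=0.05):
    """CV_{1-alpha}: the order statistic LR_[k] exceeded by floor(alpha * K)
    of the K LR values simulated under H0 (eq. 5.5, ties ignored)."""
    lr_sorted = np.sort(np.asarray(lr_h0, dtype=float))  # LR_[1] <= ... <= LR_[K]
    k = lr_sorted.size - int(np.floor(alpha * lr_sorted.size))
    return float(lr_sorted[k - 1])


def shortcut_power(lr_h1, cv):
    """Proportion of the M LR values simulated under H1 that exceed the CV (eq. 5.6)."""
    return float(np.mean(np.asarray(lr_h1, dtype=float) > cv))


# Sanity check: if LR were chi-square(1) under H0, the CV should be close to 3.84
rng = np.random.default_rng(0)
print(critical_value(rng.chisquare(df=1, size=100_000)))
```

For the test of the number of states, the simulated H0 distribution generally differs from a chi-square distribution; the chi-square draw above only checks that the quantile is computed correctly.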
Thus, both the PBP and the short-cut method require M samples from H1 and the calculation of the LR for each of these samples (i.e., Steps 1 and 2 of the PBP power calculation). The short-cut method saves computation time by omitting the full bootstrap for each of the M samples from the H1 model; instead, the LRs under H1 are evaluated against the approximated distribution of LRs under H0. Therefore, compared with PBP-based power computation, far fewer data sets need to be stored and analyzed with the short-cut method. For example, for M = 500 and K = 500, we analyze M + K = 1000 data sets (plus the single large data set used to obtain the H0 parameters), compared with the 50,000 data sets required by the PBP method with M = 500 and B = 99.
The short-cut method of power computation presented above can easily be implemented using statistical software for LM analysis, as outlined below.
1. Obtain the H0 population parameters: Given the parameters of the H1 model, generate a large data set (e.g., 100,000 observations) from the H1 population. For this purpose, any software that can generate a sample from an LM model with fixed parameter values can be used. For the numerical studies shown below, we used the syntax module of the Latent GOLD 5.0 program (Vermunt & Magidson, 2013a). Then, using this large data set, estimate the parameters of the H0 model.
2. Compute the CV: Given the estimated parameters of the H0 model, generate K data sets (each of size n) and, for each of these data sets, compute the LR as shown
in equation (5.2). Note that this requires estimating both the r- and the s-state
model. For a sufficiently large K, the LR distribution approximates the population
distribution of the LR under the null hypothesis. We use this distribution to compute
the CV of the LR test as shown in equation (5.5).
3. Compute the power: Given the parameters of the H1 model, obtain the empirical distribution of the LR. That is, generate M data sets from the H1 model and, using these data sets, compute the LR as shown in equation (5.2). Given the CV and the empirical distribution of the LR under H1, compute the power as shown in equation (5.6).
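The three steps can be chained into a single function. The sketch below is illustrative only: the toy normal-mean model (H0: N(0, sd²), H1: N(0.3, 1)) and all function names are hypothetical stand-ins for the LM estimation and simulation that Latent GOLD or similar software would perform. Step 1 mirrors the large-data-set idea: the H0 model is fitted once to a large H1 sample to obtain pseudo-population H0 parameters.

```python
import numpy as np

# Hypothetical toy stand-ins for the LM routines: H0 is N(0, sd^2) and H1 is
# N(mu, sd^2). In an LM analysis, lr_stat would fit the r- and s-state models,
# fit_null the r-state model, and the simulators would draw LM data sets.
def lr_stat(y):
    return len(y) * np.log(np.mean(y**2) / np.var(y))


def fit_null(y):
    return np.sqrt(np.mean(y**2))


def simulate_null(sd, n, rng):
    return rng.normal(0.0, sd, size=n)


def simulate_h1(n, rng, mu=0.3, sd=1.0):
    return rng.normal(mu, sd, size=n)


def shortcut_power(n, K=1000, M=1000, alpha=0.05, n_large=100_000, seed=None):
    rng = np.random.default_rng(seed)
    # Step 1: pseudo-population H0 parameters estimated on one large H1 data set
    par0 = fit_null(simulate_h1(n_large, rng))
    # Step 2: K LR values under H0 and the (1 - alpha) critical value (eq. 5.5)
    lr_h0 = np.sort([lr_stat(simulate_null(par0, n, rng)) for _ in range(K)])
    cv = lr_h0[K - int(np.floor(alpha * K)) - 1]
    # Step 3: M LR values under H1; power = share exceeding the CV (eq. 5.6)
    lr_h1 = np.array([lr_stat(simulate_h1(n, rng)) for _ in range(M)])
    return float(np.mean(lr_h1 > cv)), float(cv)


power, cv = shortcut_power(n=100, seed=1)
print(f"CV = {cv:.2f}, power = {power:.3f}")
```

Only K + M samples of size n (plus one large sample) are analyzed, whereas the PBP approach would require M(B + 1).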
5.3.2 Sample size computation
In this subsection, we show how the short-cut procedure described above for power computation can be applied for sample size determination. For sample size determination, Step 1 of the power computation procedure (discussed under the software implementation) remains the same, whereas the last two steps are repeated for different trial sample sizes. More specifically, suppose the investigator wishes to achieve a certain pre-specified power level (say, .8 or larger) without making the sample size unnecessarily large. Then, the LR power computation is performed as outlined in Steps 2 and 3, starting with a certain sample size n1. Below we provide power curves that can be used as guidance in choosing this starting sample size. If the power obtained with these n1 observations is lower than .8, repeat Steps 2 and 3 with a sample size n2 larger than n1. If n1 instead results in a larger power (and we want to optimize the sample size), choose n2 smaller than n1 and repeat Steps 2 and 3. In this way, the power computation procedure is repeated for trial samples of varying sizes, and the trial sample size that best approximates the desired power level is used as the sample size for the study concerned. In our numerical study, we repeated this power computation procedure for different sample sizes, which resulted in a series of power values. By plotting these power values against the corresponding sample sizes, we obtain a power curve from which one can easily determine the minimum sample size that satisfies
the power requirements, for example that the power should be larger than .8.
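The trial-and-error search over sample sizes amounts to evaluating a power curve or bisecting on n. A minimal sketch, assuming power increases with n (with Monte Carlo power estimates this holds only approximately, so neighbouring trial sizes may need a larger M). `power_fn` is a hypothetical stand-in for the short-cut power computation, replaced here by a smooth toy function.

```python
import math


def power_curve(power_fn, trial_sizes):
    """Evaluate power_fn at each trial sample size (the points of a power curve)."""
    return [(n, power_fn(n)) for n in sorted(trial_sizes)]


def minimum_sample_size(power_fn, lo, hi, target=0.8):
    """Smallest n in (lo, hi] with power_fn(n) >= target, found by bisection.

    Assumes power_fn(lo) < target <= power_fn(hi) and power increasing in n.
    """
    if not (power_fn(lo) < target <= power_fn(hi)):
        raise ValueError("target power must be bracketed by [lo, hi]")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_fn(mid) >= target:
            hi = mid  # power high enough: try a smaller sample
        else:
            lo = mid  # power too low: try a larger sample
    return hi


def toy_power(n):
    # Hypothetical smooth stand-in for the short-cut power computation
    return 1 - math.exp(-n / 300)


print(power_curve(toy_power, [200, 300, 500, 700, 1000]))
print(minimum_sample_size(toy_power, 100, 2000))
```

The same search applies to the number of time points T by varying T instead of n.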
When designing a longitudinal study, it is also of interest to determine the number
of time points required to achieve a certain power level. For a fixed sample size, a fixed
number of response variables, and a priori specified H1 parameter values, the procedures discussed above for sample size determination can also be applied to determine the number of time points. More specifically, in Steps 2 and 3 of the power computation procedure, the number of time points T is varied instead of the sample size n.
5.4 Numerical study
A numerical study is conducted to (a) illustrate the proposed power and sample size
computation methods, and (b) investigate whether the short-cut method and the PBP
method give similar results. This numerical study has an additional benefit for applied
researchers using the LM model: given the population characteristics, the resulting BLR
power tables and the power curves shown below may help to make an informed decision
about the data requirements in testing the number of states for the LM model. More
specifically, applied researchers may use the results of this numerical study as guidance in choosing the initial trial sample size when computing the sample size required to achieve a desired power level, as discussed in section 5.3.2.
5.4.1 Numerical study set up
The power of the BLR test for the number of states in LM models depends on several design factors and population characteristics; see, for example, Gudicha et al. (2015), who studied factors affecting the power in LM models. The design factors include the
sample size, the number of time points, and the number of response variables. The
number of latent states, and the various model parameter values (i.e., parameter values
for the initial state proportions, for the state transition probabilities, and for the state
specific densities) define the population characteristics.
In this numerical study, we varied both the design factors and the population
characteristics. The design factors varied were the sample size (n = 300, 500, or 700),
the number of time points (T = 3 or 5), and the number of response variables (P = 6 or
10). The population characteristics under the alternative hypothesis (i.e., the s-state LM model with S = 3 or 4) were specified to meet varying levels of a) initial state proportions
(balanced, moderately imbalanced, highly imbalanced), b) stability of state membership
(stable, moderately stable, unstable), and c) state-response associations (weak, moderate,
strong) as follows.
In line with Dias (2006), the initial state proportions were specified using $\pi_s = \delta^{s-1} / \sum_{h=1}^{S} \delta^{h-1}$. We set the values of δ to 1, 2, and 3, which correspond to balanced, moderately imbalanced, and highly imbalanced initial state proportions, respectively. For the transition matrix, we used the specification suggested by Bacci et al. (2014), which under the assumption of time homogeneity gives $\pi_{s|r} = \rho^{|s-r|} / \sum_{h=1}^{S} \rho^{|h-r|}$. Setting ρ to 0.1, 0.15, and 0.3 yields what we referred to above as stable, moderately stable, and unstable state membership. In this numerical study, we restricted ourselves to the situation in which the response variables of interest are binary and the state-specific conditional response probabilities are time-homogeneous. We set $\theta_{j|1}$ to .75, .80, and .85, $\theta_{j|S}$ to 1 − .75, 1 − .80, and 1 − .85 (i.e., .25, .20, and .15), and, for S = 3, $\theta_{j|2}$ to .58, .65, and .70, which yields the structure shown in Table 5.1. For S = 4, we used the same conditional response probabilities as for S = 3, but defined the conditional response probabilities of the remaining state as high (= $\theta_{j|1}$) for half of the response variables and low (= $\theta_{j|S}$) for the other half.
Table 5.1: Values of conditional response probabilities
State-response      S = 3                  S = 4
association level   s=1   s=2   s=3        s=1   s=2   s=3          s=4
Weak                .75   .58   .25        .75   .58   .75 or .25   .25
Moderate            .80   .65   .20        .80   .65   .80 or .20   .20
Strong              .85   .70   .15        .85   .70   .85 or .15   .15
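The H1 population models follow directly from these specifications. A minimal sketch, assuming binary items and time-homogeneous parameters; the function names are hypothetical, and which half of the items is "high" for the remaining state in the 4-state model is an assumption, since the text only says "half".

```python
import numpy as np


def initial_probs(S, delta):
    """pi_s = delta^(s-1) / sum_h delta^(h-1); delta = 1 gives equal proportions."""
    w = float(delta) ** np.arange(S)
    return w / w.sum()


def transition_matrix(S, rho):
    """pi_{s|r} = rho^|s-r| / sum_h rho^|h-r|; rows are origin states r."""
    idx = np.arange(S)
    w = float(rho) ** np.abs(idx[None, :] - idx[:, None])
    return w / w.sum(axis=1, keepdims=True)


def response_probs(S, J, level):
    """theta[s, j] = P(y_j = 1 | state s) following Table 5.1 (S = 3 or 4)."""
    if S not in (3, 4):
        raise ValueError("Table 5.1 covers S = 3 and S = 4 only")
    high, middle = {"weak": (0.75, 0.58), "moderate": (0.80, 0.65),
                    "strong": (0.85, 0.70)}[level]
    theta = np.empty((S, J))
    theta[0], theta[1], theta[S - 1] = high, middle, 1 - high
    if S == 4:
        # Remaining state: high for the first half of the items, low for the rest
        # (assumed assignment of the halves)
        theta[2] = np.where(np.arange(J) < J // 2, high, 1 - high)
    return theta


print(initial_probs(3, 2))          # moderately imbalanced
print(transition_matrix(3, 0.1))    # stable state membership
print(response_probs(4, 6, "weak"))
```

With ρ = 0.1 and S = 3, for example, a respondent in state 1 stays there with probability 1/1.11 ≈ .90, which is why this setting is labelled "stable".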
The design factors and population characteristics were fully crossed, resulting in 3
(sample size) × 2 (number of time points) × 2 (number of response variables) × 2 (number of states) × 3 (initial state proportions) × 3 (transition probability matrices) × 3 (state-response association levels) = 648 simulation conditions. For each simulation condition, a large data set (of 100,000 observations) was generated according to the H1 model, and the H0 parameters were estimated using this data set. Next, for each simulation condition, K = 1000 samples were generated according to the H0 parameters and the CV was computed, assuming α = .05. Given a specified sample size, number of time points, and the parameter values under the alternative hypothesis, the power was then computed based on M = 1000 samples generated according to the H1 model, as discussed in section 5.3. To minimize the problem of local maxima, we ran all models using multiple starting values.
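Enumerating the fully crossed design makes the condition count easy to verify and provides the loop over which the short-cut computations would be run. A small sketch using only the Python standard library; the dictionary keys are illustrative labels, not names from the original study code.

```python
from itertools import product

# Levels of each factor in the fully crossed simulation design
factors = {
    "n": [300, 500, 700],                            # sample size
    "T": [3, 5],                                     # number of time points
    "P": [6, 10],                                    # number of response variables
    "S": [3, 4],                                     # number of states under H1
    "delta": [1, 2, 3],                              # initial state proportion index
    "rho": [0.1, 0.15, 0.3],                         # state transition index
    "association": ["weak", "moderate", "strong"],   # state-response association
}
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
print(len(conditions))  # 3 * 2 * 2 * 2 * 3 * 3 * 3 = 648
```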
5.4.2 Results
The results obtained from the numerical study for power computation by the short-cut
method are shown in Tables 5.2 and 5.3. Table 5.2 presents the power values for various
combinations of data and population characteristics. As expected, the power of the BLR
test increases with sample size, the number of time points, and the number of response
variables. Also, the more uniform (or balanced) the initial state proportions the larger
the power. Keeping the other design factors constant, the power of the BLR test in
general increases with stronger measurement conditions (i.e., weak to moderate to strong
state-response variable associations) and with more stable state membership transition
probabilities. Comparing the results in Table 5.2 with those in Table 5.3 and holding the other factors constant, the power of the BLR test to reject H0: S = 2 in favor of H1: S = 3 is in general larger than the power to reject H0: S = 3 in favor of H1: S = 4.
In the weak measurement condition and/or the highly imbalanced initial state
proportion condition, the power of the BLR test is in general very low, indicating that
very large sample sizes may be required to achieve an acceptable power level in these
conditions. Although the quality of state-response association plays a dominant role, the
power computed for the weak measurement condition improved substantially by increasing
the number of response variables or time points. Also, situations in which the state
membership is unstable (e.g., ρ = 0.3 or larger) need special care, since the power is low
in such situations.
Table 5.2: Power of the BLR test for H0: S = 2 versus H1: S = 3
                      State-response associations
                      Weak                   Moderate               Strong
Condition       n     ρ=0.1  ρ=0.15  ρ=0.3   ρ=0.1  ρ=0.15  ρ=0.3   ρ=0.1  ρ=0.15  ρ=0.3
δ=1, P=6, T=3   300   .188   .145    .104    .301   .26     .176    .568   .494    .339
                500   .398   .301    .178    .581   .534    .294    .869   .809    .631
                700   .642   .439    .238    .842   .704    .405    .978   .957    .796
δ=1, P=6, T=5   300   .698   .559    .228    .849   .727    .394    .972   .927    .687
                500   .955   .868    .416    .990   .959    .726    1      .999    .942
                700   .995   .97     .654    1      .999    .887    1      1       .992
δ=1, P=10, T=3  300   .791   .646    .402    .872   .786    .551    .987   .952    .885
                500   .973   .941    .702    .993   .976    .866    1      1       .993
                700   1      .994    .894    1      1       .974    1      1       1
δ=2, P=6, T=3   300   .147   .130    .080    .247   .197    .135    .346   .308    .249
                500   .29    .210    .127    .357   .351    .244    .637   .559    .457
                700   .445   .367    .193    .594   .517    .337    .801   .763    .574
δ=3, P=6, T=3   300   .114   .075    .073    .138   .099    .090    .171   .147    .155
                500   .146   .112    .104    .196   .173    .131    .307   .281    .220
                700   .231   .186    .124    .306   .245    .195    .515   .456    .378
Note. n = sample size, T = number of time points, P = number of response variables, δ = initial state proportion index, and ρ = state transition probability index.
Table 5.3: The power of the BLR test for testing H0: S = 3 versus H1: S = 4
                   State-response associations
                   Weak                   Moderate               Strong
Time points  n     ρ=0.1  ρ=0.15  ρ=0.3   ρ=0.1  ρ=0.15  ρ=0.3   ρ=0.1  ρ=0.15  ρ=0.3
T=3          300   .121   .099    .074    .170   .120    .093    .377   .307    .195
             500   .199   .158    .122    .272   .230    .171    .643   .539    .341
             700   .273   .218    .151    .464   .387    .233    .811   .717    .516
T=5          300   .387   .237    .147    .534   .483    .212    .872   .737    .401
             500   .738   .551    .214    .882   .802    .361    .994   .962    .706
             700   .919   .736    .356    .985   .918    .572    1      1       .886
Note. The power values are reported for the simulation condition P = 6 and δ = 1.
Figures 5.1 and 5.2 present power curves (as a function of sample size) for different settings of the parameter values of the 3-state LM population model with equal initial state proportions, 6 response variables, and 3 time points. Figure 5.1 shows that when the state-response associations are weak, a sample of 1000 or more may be required to achieve a power of .8 or larger when state membership is stable, and a sample of 2000 or more when state membership is unstable. The same figure also shows that when the state-response associations are strong, the required sample sizes may drop to less than 500 and 700 for the stable and unstable state membership conditions, respectively. As can be seen from Figure 5.2, to achieve a power level of .8 when the state memberships are moderately stable, sample sizes of at least 1200, 850, and 500 may be required in the weak, moderate, and strong measurement conditions, respectively. When the state memberships are unstable, this power level is reached with samples of 2000, 1300, and 700 for the weak, moderate, and strong measurement conditions, respectively.
[Figure 5.1: three panels (weak, moderate, and strong state-indicator association) plotting power (0.2 to 1.0) against sample size (200 to 2000), with separate curves for stable, moderately stable, and unstable state transitions.]
Figure 5.1: Power by sample size for a 3-state LM population model with varying levels of the measurement parameters, equal initial state proportions, 6 response variables, and 3 time points
Table 5.4 compares the short-cut method of BLR power computation with the PBP method. As shown, the power values of the two methods are in general comparable. Although the short-cut method yields slightly larger power values for some of the simulation conditions (the largest difference, .050, occurs for n = 700 with weak state-response associations and ρ = 0.1), the differences do not lead to
different conclusions regarding the hypotheses about the number of states.

[Figure 5.2: three panels (stable, moderately stable, and unstable transitions) plotting power (0.2 to 1.0) against sample size (200 to 2000), with separate curves for weak, moderate, and strong measurement conditions.]
Figure 5.2: Power by sample size for a 3-state LM population model with varying levels of the transition parameters, equal initial state proportions, 6 response variables, and 3 time points
Table 5.4: Power of the BLR test according to the short-cut and the PBP method for several 3-state LM population models
                           State-response associations
                           Weak                   Strong
n         Method           ρ=0.1  ρ=0.15  ρ=0.3   ρ=0.1  ρ=0.15  ρ=0.3
n = 300   PBP              .180   .148    .116    .550   .496    .320
          short-cut        .188   .145    .104    .568   .494    .339
n = 500   PBP              .394   .280    .150    .858   .804    .610
          short-cut        .398   .301    .178    .869   .809    .631
n = 700   PBP              .592   .442    .224    .968   .960    .800
          short-cut        .642   .439    .238    .978   .957    .796
Note: The values reported in this table are for the design condition δ = 1, P = 6, T = 3.
5.5 Discussion and conclusions
The current study addressed methods of power analysis for the BLR test when testing hypotheses about the number of states in LM models. Two alternative methods of power
computation were discussed: the proportion of significant bootstrap p-values (PBP) and
the short-cut method. Using the PBP method, power is computed by first generating
a number of independent data sets under the alternative hypothesis, and then, for each
of these data sets, computing the p-value by applying a parametric bootstrap procedure
(McLachlan, 1987). The PBP method is computationally very demanding as it requires
performing the full bootstrap for each of M samples from the H1 model. We proposed
solving this computation time problem using the short-cut method. The short-cut method
works very much as a standard power computation, with the difference that instead of
relying on the theoretical distributions (a central chi-square under the null hypothesis and
a non-central chi-square under the alternative hypothesis), the distributions under H0 and
H1 are constructed by Monte Carlo simulation.
A numerical study was conducted to (a) illustrate the proposed power analysis methods
and (b) compare the power obtained by the short-cut and the PBP methods. As expected,
the power of the BLR test in the LM models increased with sample size. Likewise, power
increased with more time points and more response variables. In addition to these design
factors, the power of the BLR test was shown to depend on the following population
characteristics: the initial state proportions, the state transition probabilities, and the
state-response associations. Holding the other design factors constant, power was larger
with more balanced initial state proportions, more stable state memberships, and stronger
state-response associations. Conversely, when initial state proportions are highly
imbalanced, state membership is unstable, and the state-response association is weak,
the power of the BLR test is low.
For the simulation conditions that we have considered in this study, the sample size
required to achieve a power level of .8 or larger ranged from a few hundred to thousands
of cases. Also, the required sample size depended on other design factors and population
characteristics, which are highly interdependent. In general, the more time points, the
more response variables, the more balanced the initial state proportions, the more stable
the state memberships, and the stronger the state-response associations, the smaller the
sample size needed to achieve a certain power level. Because of mutual dependencies
among the LM model parameters, and since the required sample size is also influenced by
the number of time points, response variables, and state-indicator variable associations,
a sample size of 300 or 500 will often not suffice in LM analysis. Therefore, we strongly recommend that applied researchers perform a power analysis for their specific research situation instead of relying on rules of thumb about the sample size. The same applies to questions about the minimum number of time points and/or response variables.
Limitations to the current numerical experiments need to be acknowledged. Firstly, in
the current study, we assumed time homogeneity for both state transition and conditional
response probabilities. Future research should assess the power of the BLR test if this
assumption is relaxed. Secondly, the conditional response probabilities of the binary response variables were set to equal values across items, and, for simplicity, we considered a specific structure of the transition matrix: $\pi_{s|r} = \rho^{|s-r|} / \sum_{h=1}^{S} \rho^{|h-r|}$. However, in practice the conditional
response probabilities may differ across response variables, the response variables may be
nominal with more than two categories, continuous or of mixed type, and the structure
of the transition matrix can be completely unconstrained, or, for example, symmetric
or triangular (Bartolucci, 2006). Thus, more intensive simulations that address these
different scenarios in the H1 population may be needed to establish more knowledge and
guidelines about the power and sample size requirements of the BLR test for the number
of states in LM models.
CHAPTER 6
Summary and discussions
6.1 Summary
This dissertation aimed to study power analysis methods for latent class and latent Markov
models. The most important requirement when setting up a study using such a model
is that it should be possible to detect the relevant classes (or states). Other, more
specific, requirements concern particular model parameters: the measurement parameters,
which specify the associations between the latent classes and the indicator variables,
the transition parameters, which describe transitions between states across successive
measurement occasions, and for models with covariates, the structural parameters, which
describe relationships between classes and explanatory variables. For these four aspects (the number of classes or states and the three sets of parameters), we identified the relevant null hypotheses, studied the study design requirements for achieving sufficient power in the relevant statistical tests, and presented tools that applied researchers may use for power and sample size computation.
More specifically, in Chapter 2 we studied power analysis for the Wald test for the
measurement parameters in latent class models. The objectives of this chapter were
twofold: one was presenting a method for power and sample size computation and the
other was identifying the design factors affecting the power of these Wald tests. We
presented a simple procedure for power or sample size computation for the Wald test,
which makes use of the asymptotic distribution of this statistic under the alternative
hypothesis. In order to compute the power or the sample size, the proposed power analysis
method requires obtaining the expected information matrix for the model parameters,
which can, among other approaches, be computed by creating an "exemplary" data set; that is, a
data set which contains all possible response patterns with weights equal to the population
proportions according to the model under the alternative hypothesis. Using this exemplary
data set, one can obtain the expected information matrix with standard software for latent
class analysis. The expected information matrix is subsequently used to obtain the non-
centrality parameter.
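Once the expected information matrix is available, the Wald power computation itself is short. A minimal sketch of the standard asymptotic calculation, not the thesis's own code: it assumes the relevant block of the inverse expected information matrix has been scaled to a single observation (`cov1`), so that the non-centrality parameter is $\lambda = n\,\boldsymbol\beta'\,\texttt{cov1}^{-1}\boldsymbol\beta$. The function names and the numerical values in the usage line are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def wald_power(n, beta, cov1, alpha=0.05):
    """Asymptotic power of the Wald test of H0: beta = 0 for sample size n.

    beta : values of the tested parameters under H1
    cov1 : their asymptotic covariance matrix for a single observation, i.e. the
           relevant block of the inverse per-case expected information matrix
    """
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    df = beta.size
    lam = n * beta @ np.linalg.solve(cov1, beta)  # non-centrality: beta' (cov1 / n)^-1 beta
    return float(ncx2.sf(chi2.ppf(1 - alpha, df), df, lam))


def wald_sample_size(beta, cov1, target=0.8, alpha=0.05, n_max=10**7):
    """Smallest n with power >= target, by bisection (assumes power(1) < target)."""
    lo, hi = 1, n_max
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if wald_power(mid, beta, cov1, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical single tested parameter: effect 0.1 with per-case variance 1.0
print(wald_sample_size([0.1], [[1.0]]))
```

Because λ grows linearly in n, the same per-case covariance serves every candidate sample size; class separation enters only through `cov1`.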
The power of the Wald test in latent class models is shown to depend on the effect
size, the sample size, the level of significance, the number of classes, the class proportions,
and the number of indicator variables. The first three factors may be considered as the
standard factors for statistical power analysis (Cohen, 1988), whereas the others are
specific to latent class models. Analytic derivations that address how these latent class
specific design factors affect the separation between classes, which is one of the key
elements of the study design in latent class modeling, were provided. Based on these
derivations, we discussed how the information matrix (and the power of the Wald test,
which indirectly involves this information matrix) is affected by the fact that latent class
membership is not observable.
Effect size, which in latent class models refers to the differences in responses between
classes, plays a double role in power analysis for tests concerning the measurement
parameters. As in standard statistical models (e.g., ANOVA, logistic regression analysis), larger effects require smaller samples to be detected with a power of, say, .8 or larger. However, effect size also affects the separation between the classes,
and thus the certainty about respondents’ class memberships. That is, the larger the
effect sizes, the smaller the loss of power resulting from the fact that we are uncertain
about the subjects’ class memberships. Other factors affecting the class separation are
the number of classes, the class proportions, and the number of response variables. The
larger the number of classes, the more unequal the class proportions, and the smaller the
number of response variables, the more uncertain we are about the respondents’ class
memberships, and thus the lower the power. These results further support the idea of
Moerbeek (2014), who suggested for discrete-time survival analysis mixture models that
lower class separation requires larger sample size.
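The link between the number of indicators, effect size, and class separation can be illustrated on an exemplary data set. The sketch below is an assumption-laden proxy, not the measure used in the thesis: it computes the expected proportion of respondents misclassified under modal assignment, 1 minus the sum over patterns of the largest joint probability P(c, y), for a two-class model in which every binary item has the same response probability within a class.

```python
import itertools

import numpy as np


def expected_classification_error(class_probs, item_probs, n_items):
    """Expected share misclassified under modal assignment, 1 - sum_y max_c P(c, y),
    over all binary patterns. Each class uses one response probability for all items."""
    patterns = np.array(list(itertools.product([0, 1], repeat=n_items)))
    joint = np.column_stack([
        w * np.prod(np.where(patterns == 1, p, 1 - p), axis=1)
        for w, p in zip(class_probs, item_probs)
    ])  # joint[y, c] = P(c) * P(y | c)
    return 1.0 - joint.max(axis=1).sum()


for j in (2, 4, 6, 8):
    strong = expected_classification_error((0.5, 0.5), (0.8, 0.2), j)
    weak = expected_classification_error((0.5, 0.5), (0.7, 0.3), j)
    print(j, round(strong, 3), round(weak, 3))
```

More items and larger response-probability differences both lower the expected error, that is, they improve class separation.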
In Chapter 3 we studied the statistical power of the likelihood-ratio and Wald tests
for testing the structural parameters in latent class analysis. For both tests we assumed
the asymptotic distributions: a central chi-square under the null hypothesis and a
non-central chi-square under the alternative hypothesis. When using these asymptotic
distributions for power or sample size computation, the most difficult problem is estimating
the non-centrality parameters. For the likelihood-ratio test, the non-centrality parameter
is shown to be a function of the log-likelihood difference between the models under the
alternative and null hypotheses. For the Wald test, it is a function of the logit parameters
for the covariate effects on the latent classes and the expected information matrix.
We proposed estimating the non-centrality parameter by simulating a large data set
from the population under the alternative hypothesis. For the likelihood-ratio
test, this amounts to fitting the models under the null and alternative hypotheses to a
large data set simulated under the alternative hypothesis. For the Wald
test, the large simulated data set is used to estimate the expected information matrix
(or the variance-covariance matrix) of the parameters under the alternative hypothesis. As an
alternative to the large simulated data set method, the exemplary data method that we
discussed in Chapter 2 could also be used. However, when the covariates are continuous
instead of categorical, or when the number of indicator variables is large, the
exemplary data method is generally impractical.
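Once the two models have been fitted to a large data set of N_large cases simulated under the alternative hypothesis, the per-observation non-centrality 2(ℓ1 − ℓ0)/N_large can be scaled to any planned sample size. The following sketch assumes the asymptotic non-central chi-square approximation holds; the function names are hypothetical, and the sample-size search simply exploits the fact that power increases with n.

```python
from scipy.stats import chi2, ncx2


def ncp_per_observation(ll_alt, ll_null, n_large):
    """Per-observation non-centrality from fitting the H0 and H1 models
    to one large data set of n_large cases simulated under H1."""
    return 2.0 * (ll_alt - ll_null) / n_large


def lr_power(ncp_obs, n, df, alpha=0.05):
    """Asymptotic power: P(noncentral chi2(df, n * ncp_obs) > central critical value)."""
    return ncx2.sf(chi2.ppf(1 - alpha, df), df, n * ncp_obs)


def required_n(ncp_obs, df, target=0.80, alpha=0.05):
    """Smallest n with power >= target, by doubling and then bisection."""
    hi = 1
    while lr_power(ncp_obs, hi, df, alpha) < target:
        hi *= 2
    lo = hi // 2  # last size that fell short (0 when hi == 1)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if lr_power(ncp_obs, mid, df, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


print(required_n(0.0785, df=1), required_n(0.0785, df=2))
```

The log-likelihood values themselves would come from whatever latent class software is used; only their difference and N_large enter the computation.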
A numerical study was conducted to illustrate the proposed power analysis methods
and to compare the power of the two types of tests. The results of this numerical
study indicated that, for a given effect size of a covariate on the latent classes, a desired
level of power can be obtained not only by manipulating the sample size, but also by
varying the number or the quality of the indicator variables. This implies
that the statistical power of tests concerning the structural parameters also depends on the
population values of the measurement parameters. Based on the reported
results of the numerical study, we also concluded that the power of the likelihood-ratio test
is slightly larger than that of the Wald test, which supports the results of previous work
comparing the power of the Wald and likelihood-ratio tests (Williamson et al., 2007).
In Chapter 4 we presented power analysis methods for testing hypotheses about the
transition parameters in latent Markov models. We distinguished power computation for
the standard case from power computation for the non-standard case, where the latter
arises when probabilities are fixed to zero, that is, at the boundary of the parameter space.
For the former case, we presented a power computation method that relies on the theoretical
distribution of the likelihood-ratio statistic, i.e., a central chi-square under the null and a
non-central chi-square under the alternative hypothesis. A problem arising when using these
theoretical distributions is that the non-centrality parameter is generally unknown, which
makes it difficult to use the non-central chi-square distribution in computing the power.
We proposed two solutions for this problem. One is estimating the power by simulating
the distribution of the likelihood-ratio statistic under the alternative hypothesis. The other is
estimating the non-centrality parameter using the exemplary data set method
described in Chapter 2. When the number of measurement occasions or the number
of response variables is large, the number of response patterns quickly becomes very large,
and the exemplary data method becomes impractical. We proposed resolving
this problem by using a large simulated data set, as discussed in Chapter 3.
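A quick count shows why the exemplary data set grows so fast. Assuming J response variables with K categories each, observed at T measurement occasions without missing data, the exemplary data set needs one record per possible pattern:

```latex
\#\text{patterns} = K^{J \times T}, \qquad
K = 2,\ J = 5,\ T = 4:\ 2^{20} = 1{,}048{,}576, \qquad
K = 3,\ J = 5,\ T = 4:\ 3^{20} = 3{,}486{,}784{,}401.
```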
For the tests considered in the non-standard case, the distribution of the likelihood-ratio
statistic is neither a central chi-square under the null hypothesis nor a non-central chi-square
under the alternative hypothesis. Therefore, for the non-standard case, we discussed power
computation by Monte Carlo simulation. It requires setting up two Monte Carlo
simulations: one yielding the distribution of the likelihood-ratio statistic under the null
hypothesis and the other yielding its distribution under the alternative hypothesis.
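Once the two Monte Carlo simulations have produced likelihood-ratio statistics, the power computation itself is short. In this sketch the critical value is the empirical (1 − α) quantile of the statistics simulated under the null hypothesis, and power is the share of statistics simulated under the alternative that exceed it. The simulated inputs below are stand-ins chosen only so the code runs (a 50:50 mixture of a point mass at zero and a χ²₁ for the null, a non-central χ²₁ for the alternative); they are assumptions, not results from the thesis.

```python
import numpy as np


def monte_carlo_power(lr_null, lr_alt, alpha=0.05):
    """Empirical critical value from LR statistics simulated under H0, and power
    as the share of LR statistics simulated under H1 exceeding it."""
    crit = float(np.quantile(lr_null, 1 - alpha))
    power = float(np.mean(np.asarray(lr_alt) > crit))
    return crit, power


# Stand-in draws; in practice each value is the LR statistic obtained by fitting
# both models to one data set simulated under H0 (lr_null) or H1 (lr_alt).
rng = np.random.default_rng(1)
m = 20_000
lr_null = np.where(rng.random(m) < 0.5, 0.0, rng.chisquare(1, m))
lr_alt = rng.noncentral_chisquare(1, 6.0, m)
crit, power = monte_carlo_power(lr_null, lr_alt)
print(round(crit, 2), round(power, 3))
```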
The design factors studied in Chapter 2 for testing the measurement parameters in latent class
models were extended in Chapter 4 to testing transition parameters in latent Markov
models. Latent class and latent Markov models have the same kind of measurement model, and
thus factors affecting the uncertainty about individuals’ class memberships play an
important role in power analysis for tests in latent Markov models as well. Additional factors
specific to latent Markov models are the number of measurement occasions and the size of
the transition probabilities. The results of the numerical experiment indicated that when
the transition probabilities are large, either a much larger sample size or a larger number of
measurement occasions is required to achieve an acceptable level of power.
In Chapter 5 we studied power analysis for the bootstrap likelihood-ratio test for the
number of states in latent Markov models. The power of the bootstrap likelihood-ratio
test may be computed as the proportion of bootstrap p-values (PBP) for which the null
hypothesis is rejected. However, this method of power computation is computationally
very demanding, as it requires performing the bootstrap p-value computation for multiple
data sets simulated according to the model under the alternative hypothesis. For example,
if we use 500 Monte Carlo samples and 500 bootstrap replications per Monte Carlo sample,
we need to estimate both the null and the alternative model 250,000 times. Clearly, such an
approach is infeasible, especially for more complex models.
We proposed a computationally more efficient method, which we referred to as the
short-cut method. It works much like standard power computation (see, for example,
Satorra & Saris, 1985), with the difference that the distributions under the null and
alternative hypotheses are constructed by simulation. Based on the presented numerical
studies, we concluded that the short-cut method is generally superior to the PBP method.
Additional advantages of the short-cut method are that (a) it is computationally cheaper
and (b) it can easily be applied to determine the necessary sample size or number of
measurement occasions in a study design.
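The bookkeeping behind the cost comparison can be written out explicitly. The PBP figure follows the text (Monte Carlo samples times bootstrap replications). For a method that simulates the null and alternative distributions of the likelihood-ratio statistic once each, the sketch assumes, purely for illustration, 500 simulated data sets per hypothesis; each simulated data set requires one fit of each model. The function names are hypothetical.

```python
import numpy as np


def pbp_power(bootstrap_p_values, alpha=0.05):
    """PBP power: share of Monte Carlo samples (simulated under H1) whose
    bootstrap p-value is at or below alpha."""
    return float(np.mean(np.asarray(bootstrap_p_values) <= alpha))


def pbp_fits_per_model(n_mc, n_boot):
    """Fits of each model for the bootstrap replications under PBP
    (ignores the n_mc fits to the Monte Carlo samples themselves)."""
    return n_mc * n_boot


def simulated_distribution_fits_per_model(n_null_sims, n_alt_sims):
    """Fits of each model when the H0 and H1 distributions of the LR statistic
    are each simulated once (the structure assumed here)."""
    return n_null_sims + n_alt_sims


print(pbp_fits_per_model(500, 500))  # 250000, as in the text
print(simulated_distribution_fits_per_model(500, 500))
```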
6.2 Direction for future research and study limitations
Whereas this dissertation focused on power analysis methods for tests in latent class and
latent Markov models, the same methods may be applied to other types of mixture
models, such as mixtures of normal distributions, mixture growth models, and multilevel
latent class models. Generally, the same design factors can be expected to affect
the statistical power of the tests in those models, though the distribution of the response
variables within the classes and other population characteristics may differ from those of the
mixture models studied in this thesis. For example, in mixture models for continuous
responses, the responses will typically be assumed to be normally distributed
within classes, and in multilevel latent class models, the higher-level model parameters
also need to be specified.
Specific aspects that require further research when extending the proposed power
analysis methods to other mixture modeling techniques are the following: (a) The
population characteristics we varied were the conditional response probabilities, class
proportions, initial probabilities, and transition probabilities. Different population
characteristics may be relevant for other mixture models. For example, class-specific
means and (co)variances in finite mixture models for continuous responses (McLachlan &
Peel, 2000), class-specific growth trajectories and variance components in growth mixture
models (Tofighi & Enders, 2008; B. Muthen & Muthen, 2000), and both higher- and
lower-level class distributions in multilevel mixture models (Vermunt, 2003). (b) We
proposed approximating the non-centrality parameters of the tests under the alternative
hypothesis by using either the exemplary data or the large simulated data set method.
Whether these methods can be applied to other types of mixture models requires further
research. For example, the exemplary data method seems problematic with
continuous responses or with multilevel data, since it is not possible to list all possible
response patterns. However, using a large simulated data set to obtain the non-centrality
parameter may still work.
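For continuous responses, the large simulated data set route can be sketched as follows. Everything here is an illustrative assumption rather than a setting from the thesis: a two-class univariate normal mixture, a 1-df likelihood-ratio test of equal within-class variances (H0) against class-specific variances (H1), arbitrary population values, and EM started at the population values to sidestep local maxima. The per-observation non-centrality 2(ℓ1 − ℓ0)/N_large is then scaled to the planned sample size.

```python
import numpy as np
from scipy.stats import chi2, ncx2, norm


def fit_mixture(y, pi, mu, sd, equal_var, max_iter=1000, rel_tol=1e-10):
    """EM for a two-class univariate normal mixture started at the given values;
    equal_var=True imposes a common within-class variance (the H0 model).
    Returns the maximized log-likelihood."""
    mu, sd = np.asarray(mu, dtype=float), np.asarray(sd, dtype=float)
    ll_old = -np.inf
    for _ in range(max_iter):
        # E-step: joint densities P(c) f(y | c) and posterior class probabilities
        dens = np.column_stack([pi * norm.pdf(y, mu[0], sd[0]),
                                (1 - pi) * norm.pdf(y, mu[1], sd[1])])
        total = dens.sum(axis=1)
        ll = np.log(total).sum()
        if ll - ll_old < rel_tol * abs(ll):
            break
        ll_old = ll
        post = dens / total[:, None]
        # M-step: class proportion, means, and common or class-specific variances
        w = post.sum(axis=0)
        pi = w[0] / len(y)
        mu = (post * y[:, None]).sum(axis=0) / w
        ss = (post * (y[:, None] - mu) ** 2).sum(axis=0)
        sd = np.sqrt(np.full(2, ss.sum() / len(y)) if equal_var else ss / w)
    return ll


# Large data set simulated under the (hypothetical) H1 population.
rng = np.random.default_rng(2015)
n_large = 100_000
pi_true, mu_true, sd_true = 0.5, (0.0, 2.0), (1.0, 1.5)
in_class1 = rng.random(n_large) < pi_true
y = np.where(in_class1, rng.normal(mu_true[0], sd_true[0], n_large),
             rng.normal(mu_true[1], sd_true[1], n_large))

ll_alt = fit_mixture(y, pi_true, mu_true, sd_true, equal_var=False)
ll_null = fit_mixture(y, pi_true, mu_true, (1.3, 1.3), equal_var=True)
ncp_obs = 2.0 * (ll_alt - ll_null) / n_large


def power_at(n, df=1, alpha=0.05):
    """Asymptotic LR power at sample size n using the scaled non-centrality."""
    return ncx2.sf(chi2.ppf(1 - alpha, df), df, n * ncp_obs)


print(round(ncp_obs, 4), [round(float(power_at(n)), 3) for n in (100, 200, 400)])
```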
The main limitations of the presented work concern the reported numerical experiments,
which could be expanded in future research. Firstly, the numerical experiments were
limited to binary response variables, whereas in practice the response variables used in
latent class and latent Markov models are often nominal or ordinal variables with three or
more response categories. Secondly, the number of parameters defining a latent class or
latent Markov model can become rather large, especially with large numbers of classes or
response variables. We simplified the numerical experiments by assuming that the
conditional response probabilities were the same for each response variable (e.g., high in
one class and low in the others for all indicator variables), whereas in practice they may
take on different values. The same applies to the transition parameters, which were
assumed to be constant over time, whereas in practice these may be time-varying.
6.3 Conclusion
Mixture models, which include techniques such as latent class, latent Markov, mixture
growth, and multilevel mixture models, are used in many research areas. These models
are used not only in fundamental research, but also in applied research in both the profit
and non-profit sectors. Whereas power analysis methods have been developed for many
statistical techniques, including logistic regression models (Demidenko, 2007; Whittemore,
1981), log-linear models (O’Brien, 1986; Shieh, 2000), linear multivariate models (Muller,
Lavange, Ramey, & Ramey, 1992), and structural equation models (Satorra & Saris, 1985;
R. MacCallum et al., 2010), such methods were lacking for mixture models. Moreover, previous
studies on mixture models did not address the study design requirements for achieving
sufficient power for the relevant statistical tests. Given the popularity of these models, we
argued that methods for performing power analysis in mixture models were needed. This
dissertation presented power analysis methods for tests in mixture models, with emphasis
on latent class and latent Markov models.
We discussed power analysis methods for different types of parameters in latent class
and latent Markov models, considering different specifications of the null hypothesis. For
some of these, the asymptotic distribution of the test statistic holds, while for others it
does not. For the situations in which the asymptotic distributions hold, we discussed
the estimation of the non-centrality parameter using the exemplary data and large simulated
data set methods. For non-standard testing situations, we presented computationally efficient
power computation methods based on Monte Carlo simulation. Tests for the number of
classes also fall under this non-standard testing situation, for which we discussed
power computation by the short-cut and PBP methods.
This dissertation contributes to the field of mixture modeling in various ways. Firstly,
the development of power computation methods for mixture models will contribute to
the validity of the results of applied research on which policy and business decisions are
based. The proposed power analysis methods make it possible to assess whether
empirical studies are performed with an appropriate level of statistical power. Secondly,
the various numerical experiments conducted to illustrate the proposed power analysis
methods contribute to the understanding of the research design requirements for achieving
a certain (acceptable) level of power. Thirdly, we provide important tools to make sure
that resources for research are used as efficiently as possible.
References
Agresti, A. (2007). An introduction to categorical data analysis. New Jersey: John Wiley
& Sons.
Aitkin, M., Anderson, D., & Hinde, J. (1981). Statistical modelling of data on teaching
styles. Journal of the Royal Statistical Society. Series A (General), 144(4), 419–461.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions
on Automatic Control, 19(6), 716–723.
Bacci, S., Pandolfi, S., & Pennoni, F. (2014). A comparison of some criteria for states
selection in the latent Markov model for longitudinal data. Advances in Data Analysis
and Classification, 8(2), 125–145.
Bakk, Z., Tekle, F. B., & Vermunt, J. K. (2013). Estimating the association between latent
class membership and external variables using bias-adjusted three-step approaches.
Sociological Methodology, 43(1), 272–311.
Bandeen-Roche, K., Miglioretti, D. L., Zeger, S. L., & Rathouz, P. J. (1997).
Latent variable regression for multiple discrete outcomes. Journal of the American
Statistical Association, 92(440), 1375–1386.
Bartolucci, F. (2006). Likelihood inference for a class of latent Markov models under
linear hypotheses on the transition probabilities. Journal of the Royal Statistical
Society: Series B (Statistical Methodology), 68(2), 155–178.
Bartolucci, F., & Farcomeni, A. (2009). A multivariate extension of the dynamic logit
model for longitudinal data based on a latent Markov heterogeneity structure.
Journal of the American Statistical Association, 104(486), 816–831.
Bartolucci, F., Farcomeni, A., & Pennoni, F. (2010). An overview of latent Markov
models for longitudinal categorical data. arXiv preprint arXiv:1003.2804.
Bartolucci, F., Farcomeni, A., & Pennoni, F. (2013). Latent Markov models for
longitudinal data. Boca Raton: Chapman and Hall/CRC Press.
Baum, L. E., Petrie, T., Soules, G., & Weiss, N. (1970). A maximization technique
occurring in the statistical analysis of probabilistic functions of Markov chains. The
Annals of Mathematical Statistics, 41(1), 164–171.
Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The
general theory and its analytical extensions. Psychometrika, 52(3), 345–370.
Bozdogan, H. (1994). Mixture-model cluster analysis using model selection criteria and
a new informational measure of complexity. In H. Bozdogan (Ed.), Proceedings
of the First US/Japan Conference on the Frontiers of Statistical Modeling: An
informational approach (Vol. 2, pp. 69–113). Boston, MA: Kluwer Academic
Publishers.
Brown, B. W., Lovato, J., & Russell, K. (1999). Asymptotic power calculations:
description, examples, computer code. Statistics in Medicine, 18(22), 3137–3151.
Buse, A. (1982). The likelihood ratio, Wald, and Lagrange multiplier tests: An expository
note. The American Statistician, 36(3), 153–157.
Chung, H., Park, Y., & Lanza, S. T. (2005). Latent transition analysis with covariates:
pubertal timing and substance use behaviours in adolescent females. Statistics in
Medicine, 24(18), 2895–2910.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:
Erlbaum.
Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis: With
applications in the social, behavioral, and health sciences. New Jersey: John Wiley
& Sons.
Collins, L. M., & Wugalter, S. E. (1992). Latent class models for stage-sequential dynamic
latent variables. Multivariate Behavioral Research, 27(1), 131–157.
Dayton, C. M., & Macready, G. B. (1976). A probabilistic model for validation of
behavioral hierarchies. Psychometrika, 41(2), 189–204.
Dayton, C. M., & Macready, G. B. (1988). Concomitant-variable latent-class models.
Journal of the American Statistical Association, 83(401), 173–178.
Demidenko, E. (2007). Sample size determination for logistic regression revisited.
Statistics in Medicine, 26(18), 3385–3397.
Demidenko, E. (2008). Sample size and optimal design for logistic regression with binary
interaction. Statistics in Medicine, 27(1), 36–46.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical Society.
Series B (Statistical Methodology), 39(1), 1–38.
Dias, J. (2006). Latent class analysis and model selection. In M. Spiliopoulou, R. Kruse,
C. Borgelt, A. Nurnberger, & W. Gaul (Eds.), From Data and Information Analysis
to Knowledge Engineering (pp. 95–102). Berlin: Springer-Verlag.
Dias, J., & Goncalves, M. (2004). Finite mixture models: Review, applications,
and computer-intensive methods. Doctoral Dissertation. Research School Systems,
Organisation and Management, University of Groningen, The Netherlands.
Dziak, J. J., Lanza, S. T., & Tan, X. (2014). Effect size, statistical power, and sample
size requirements for the bootstrap likelihood ratio test in latent class analysis.
Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 534–552.
Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses
using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research
Methods, 41(4), 1149–1160.
Feng, Z. D., & McCulloch, C. E. (1996). Using bootstrap likelihood ratios in finite mixture
models. Journal of the Royal Statistical Society. Series B (Statistical Methodology),
58(3), 609–617.
Fonseca, J. R., & Cardoso, M. G. (2007). Mixture-model cluster analysis using information
theoretical criteria. Intelligent Data Analysis, 11(2), 155–173.
Forcina, A. (2008). Identifiability of extended latent class models with individual
covariates. Computational Statistics & Data Analysis, 52(12), 5263–5268.
Formann, A. K. (1982). Linear logistic latent class analysis. Biometrical Journal, 24(2),
171–190.
Formann, A. K. (1992). Linear logistic latent class analysis for polytomous data. Journal
of the American Statistical Association, 87(418), 476–486.
Giudici, P., Ryden, T., & Vandekerkhove, P. (2000). Likelihood-ratio tests for hidden
Markov models. Biometrics, 56(3), 742–747.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and
unidentifiable models. Biometrika, 61(2), 215–231.
Gudicha, D. W., Schmittmann, V. D., & Vermunt, J. K. (2015). Power
computation for likelihood ratio tests for the transition parameters in latent
Markov models. Structural Equation Modeling: A Multidisciplinary Journal,
doi:10.1080/10705511.2015.1014040.
Gudicha, D. W., Tekle, F. B., & Vermunt, J. K. (in press). Power and sample size
computation for Wald tests in latent class models. Journal of Classification.
Gudicha, D. W., & Vermunt, J. K. (2013). Mixture model clustering with covariates using
adjusted three-step approaches. In B. Lausen, D. van den Poel, & A. Ultsch (Eds.),
Algorithms from and for Nature and Life; Studies in Classification, Data Analysis, and
Knowledge Organization (pp. 87–93). Heidelberg, Germany: Springer-Verlag.
Hagenaars, J. A. (1988). Latent structure models with direct effects between indicators:
Local dependence models. Sociological Methods & Research, 16(3), 379–405.
Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied latent class analysis. New York:
Cambridge University Press.
Hirtenlehner, H., Starzer, B., & Weber, C. (2012). A differential phenomenology of
stalking: Using latent class analysis to identify different types of stalking victimization.
International Review of Victimology, 18(3), 207–227.
Holt, J. A., & Macready, G. B. (1989). A simulation study of the difference chi-square
statistic for comparing latent class models under violation of regularity conditions.
Applied Psychological Measurement, 13(3), 221–231.
Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size
calculation for linear and logistic regression. Statistics in Medicine, 17(14), 1623–
1634.
Jackson, K. M., & Schulenberg, J. E. (2013). Alcohol use during the transition from
middle school to high school: national panel data on prevalence and moderators.
Developmental Psychology, 49(11), 2147–2158.
Keel, P. K., Fichter, M., Quadflieg, N., Bulik, C. M., Baxter, M. G., Thornton, L., . . .
others (2004). Application of a latent class analysis to empirically define eating
disorder phenotypes. Archives of General Psychiatry, 61(2), 192–200.
Langeheine, R., & Van de Pol, F. (1993). Multiple indicator Markov models. In R. Steyer,
K. F. Wender, & K. F. Widaman (Eds.), Proceedings of the 7th European Meeting
of the Psychometric Society in Trier (pp. 248–252). Stuttgart: Fischer.
Lanza, S. T., & Collins, L. M. (2008). A new SAS procedure for latent transition analysis:
transitions in dating and sexual risk behavior. Developmental Psychology, 44(2),
446–456.
Lanza, S. T., Collins, L. M., Lemmon, D. R., & Schafer, J. L. (2007). PROC LCA:
A SAS procedure for latent class analysis. Structural Equation Modeling: A
Multidisciplinary Journal, 14(4), 671–694.
Lazarsfeld, P. (1950). The logical and mathematical foundation of latent structure
analysis and the interpretation and mathematical foundation of latent structure
analysis. In S. A. Stouffer et al. (Eds.), Measurement and prediction (Vol. 4, pp.
362–472). Princeton, NJ: Princeton University Press.
Leisch, F. (2004). FlexMix: A general framework for finite mixture models and latent
class regression in R. Journal of Statistical Software, 11(8), 1–18.
Linzer, D. A., & Lewis, J. B. (2011). poLCA: An R package for polytomous variable
latent class analysis. Journal of Statistical Software, 42(10), 1–29.
Lukociene, O., Varriale, R., & Vermunt, J. K. (2010). The simultaneous decision(s)
about the number of lower- and higher-level classes in multilevel latent class analysis.
Sociological Methodology, 40(1), 247–283.
MacCallum, R., Lee, T., & Browne, M. W. (2010). The issue of isopower in power
analysis for tests of structural equation models. Structural Equation Modeling: A
Multidisciplinary Journal, 17(1), 23–41.
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested
covariance structure models: Power analysis and null hypotheses. Psychological
Methods, 11(1), 19–35.
Magidson, J., & Vermunt, J. K. (2004). Latent class models. In D. Kaplan (Ed.), The
Sage Handbook of Quantitative Methodology for the Social Sciences (pp. 175–198).
Thousand Oaks: Sage Publications.
Mann, H. B., & Wald, A. (1943). On stochastic limit and order relationships. The Annals
of Mathematical Statistics, 14(3), 217–226.
Marsh, H. W., Hau, K.-T., Balla, J. R., & Grayson, D. (1998). Is more ever too much?
The number of indicators per factor in confirmatory factor analysis. Multivariate
Behavioral Research, 33(2), 181–220.
Martin, R. A., Velicer, W. F., & Fava, J. L. (1996). Latent transition analysis to the
stages of change for smoking cessation. Addictive Behaviors, 21(1), 67–80.
McCutcheon, A. L. (1987). Latent class analysis. Sage University Papers Series:
Quantitative Applications in the Social Sciences Number 07–064. Newbury Park,
CA: Sage Publications.
McCutcheon, A. L. (2002). Basic concepts and procedures in single- and multiple-group
latent class analysis. In J. A. Hagenaars & A. L. McCutcheon (Eds.), Applied Latent
Class Analysis (pp. 56–85). Cambridge, UK: Cambridge University Press.
McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality
and goodness of fit. Psychological Bulletin, 107(2), 247–255.
McHugh, R. B. (1956). Efficient estimation and local identification in latent class analysis.
Psychometrika, 21(4), 331–347.
McLachlan, G. (1987). On bootstrapping the likelihood ratio test statistic for the number
of components in a normal mixture. Applied Statistics, 36(3), 318–324.
McLachlan, G., & Krishnan, T. (2007). The EM algorithm and extensions. New Jersey:
John Wiley & Sons.
McLachlan, G., & Peel, D. (2000). Finite mixture models. New York: John Wiley &
Sons.
Moerbeek, M. (2014). Sufficient sample sizes for discrete-time survival analysis mixture
models. Structural Equation Modeling: A Multidisciplinary Journal, 21(1), 63–67.
Mooijaart, A., & Van der Heijden, P. G. (1992). The EM algorithm for latent class
analysis with equality constraints. Psychometrika, 57(2), 261–269.
Muller, K. E., Lavange, L. M., Ramey, S. L., & Ramey, C. T. (1992). Power calculations for
general linear multivariate models including repeated measures applications. Journal
of the American Statistical Association, 87(420), 1209–1226.
Muthen, B., & Muthen, L. (2000). Integrating person-centered and variable-centered
analyses: Growth mixture modeling with latent trajectory classes. Alcoholism:
Clinical and Experimental Research, 24(6), 882–891.
Muthen, L., & Muthen, B. (1998–2007). Mplus user’s guide (5th ed.). Los Angeles:
Muthen & Muthen.
Nakagawa, S., & Foster, T. M. (2004). The case against retrospective statistical power
analyses with an introduction to power analysis. Acta Ethologica, 7(2), 103–108.
Nylund, K. L., Asparouhov, T., & Muthen, B. O. (2007). Deciding on the number
of classes in latent class analysis and growth mixture modeling: A Monte Carlo
simulation study. Structural Equation Modeling: A Multidisciplinary Journal, 14(4),
535–569.
O’Brien, R. G. (1986). Using the SAS system to perform power analyses for log-linear
models. Proceedings of the Eleventh Annual SAS Users Group Conference, Cary,
NC: SAS Institute, 778–784.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical
Transactions of the Royal Society of London, 185(1), 71–110.
Poulsen, C. S. (1990). Mixed Markov and latent Markov modelling applied to brand
choice behaviour. International Journal of Research in Marketing, 7(1), 5–19.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2004). Generalized multilevel structural
equation modeling. Psychometrika, 69(2), 167–190.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in
speech recognition. Proceedings of the IEEE, 77(2), 257–286.
Reboussin, B. A., Reboussin, D. M., Liang, K.-Y., & Anthony, J. C. (1998). Latent
transition modeling of progression of health-risk behavior. Multivariate Behavioral
Research, 33(4), 457–478.
Redner, R. (1981). Note on the consistency of the maximum likelihood estimate for
nonidentifiable distributions. The Annals of Statistics, 9(1), 225–228.
Rencher, A. C. (2000). Linear models in statistics. New York: John Wiley & Sons.
Rindskopf, D., & Rindskopf, W. (1986). The value of latent class analysis in medical
diagnosis. Statistics in Medicine, 5(1), 21–27.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance
structure analysis. Psychometrika, 50(1), 83–90.
Schoenfeld, D. A., & Borenstein, M. (2005). Calculating the power or sample size for the
logistic and proportional hazards models. Journal of Statistical Computation and
Simulation, 75(10), 771–785.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics,
6(2), 461–464.
Sclove, S. L. (1987). Application of model-selection criteria to some problems in
multivariate analysis. Psychometrika, 52(3), 333–343.
Self, S. G., Mauritsen, R. H., & Ohara, J. (1992). Power calculations for likelihood ratio
tests in generalized linear models. Biometrics, 48(1), 31–39.
Shapiro, A. (1988). Towards a unified theory of inequality constrained testing in multivariate analysis. International Statistical Review, 56(1),
49–62.
Shieh, G. (2000). On power and sample size calculations for likelihood ratio tests in
generalized linear models. Biometrics, 56(4), 1192–1196.
Sotres-Alvarez, D., Herring, A. H., & Siega-Riz, A.-M. (2013). Latent transition models to
study women’s changing of dietary patterns from pregnancy to 1 year postpartum.
American Journal of Epidemiology, 177(8), 852–861.
Steiger, J. H., Shapiro, A., & Browne, M. W. (1985). On the multivariate asymptotic
distribution of sequential chi-square statistics. Psychometrika, 50(3), 253–263.
Tein, J.-Y., Coxe, S., & Cham, H. (2013). Statistical power to detect the correct number of
classes in latent profile analysis. Structural Equation Modeling: A Multidisciplinary
Journal, 20(4), 640–657.
Tofighi, D., & Enders, C. K. (2008). Identifying the correct number of classes in
growth mixture models. In G.R. Hancock (eds.), Mixture Models in Latent Variable
Research (pp. 317–341). Charlotte, NC: Information Age.
Uebersax, J. S., & Grove, W. M. (1990). Latent class analysis of diagnostic agreement.
Statistics in Medicine, 9(5), 559–572.
Van de Pol, F., & De Leeuw, J. (1986). A latent Markov model to correct for measurement
error. Sociological Methods & Research, 15(1-2), 118–141.
Van der Heijden, P. G., Dessens, J., & Bockenholt, U. (1996). Estimating the
concomitant-variable latent-class model with the EM algorithm. Journal of
Educational and Behavioral Statistics, 21(3), 215–229.
Vermunt, J. K. (1996). Log-linear event history analysis: A general approach with missing
data, latent variables, and unobserved heterogeneity. Tilburg: Tilburg University
Press.
Vermunt, J. K. (1997). LEM: A general program for the analysis of categorical data.
Tilburg University, The Netherlands.
Vermunt, J. K. (2003). Multilevel latent class models. Sociological Methodology, 33(1),
213–239.
Vermunt, J. K. (2010a). Latent class modeling with covariates: Two improved three-step
approaches. Political Analysis, 18(4), 450–469.
Vermunt, J. K. (2010b). Latent class models. In P. Peterson, E. Baker, & B. McGaw
(Eds.), International Encyclopedia of Education (Vol. 7, pp. 238–244). Oxford:
Elsevier.
Vermunt, J. K., Langeheine, R., & Bockenholt, U. (1999). Discrete-time discrete-state
latent Markov models with time-constant and time-varying covariates. Journal of
Educational and Behavioral Statistics, 24(2), 179–207.
Vermunt, J. K., & Magidson, J. (2002). Latent class cluster analysis. In J. A. Hagenaars
& A. L. McCutcheon (Eds.), Applied Latent Class Analysis (pp. 56–85). Cambridge,
UK: Cambridge University Press.
Vermunt, J. K., & Magidson, J. (2013a). LG-Syntax user’s guide: Manual for Latent GOLD
5.0 Syntax module. Belmont, MA: Statistical Innovations Inc.
Vermunt, J. K., & Magidson, J. (2013b). Technical guide for Latent GOLD 5.0: Basic,
advanced, and syntax. Belmont, MA: Statistical Innovations Inc.
Vermunt, J. K., Tran, B., & Magidson, J. (2008). Latent class models in longitudinal
research. In S. Menard (Ed.), Handbook of Longitudinal Research: Design,
Measurement, and Analysis (pp. 373–385). Burlington, MA: Elsevier.
Visser, I., Raijmakers, M. E. J., & Molenaar, P. C. M. (2002). Fitting hidden Markov
models to psychological data. Scientific Programming, 10(3), 185–199.
Visser, I., & Speekenbrink, M. (2010). depmixS4: an R-package for hidden Markov
models. Journal of Statistical Software, 36(7), 1–21.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when
the number of observations is large. Transactions of the American Mathematical
Society, 54(3), 426–482.
Wall, M. M., & Li, R. (2009). Multiple indicator hidden Markov model with an application
to medical utilization data. Statistics in Medicine, 28(2), 293–310.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica:
Journal of the Econometric Society, 50(1), 1–25.
Whittemore, A. S. (1981). Sample size for logistic regression with small response
probability. Journal of the American Statistical Association, 76(373), 27–32.
Wiggins, L. M. (1973). Panel analysis: Latent probability models for attitude and behavior
processes. San Francisco: Elsevier Scientific.
Williamson, J. M., Lin, H., Lyles, R. H., & Hightower, A. W. (2007). Power calculations
for ZIP and ZINB models. Journal of Data Science, 5(4), 519–534.
Wolfe, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate
Behavioral Research, 5(3), 329–350.
Yamaguchi, K. (2000). Multinomial logit latent-class regression models: An analysis of
the predictors of gender-role attitudes among Japanese women. American Journal
of Sociology, 105(6), 1702–1740.
Yang, C. (2006). Evaluating latent class analysis models in qualitative phenotype
identification. Computational Statistics & Data Analysis, 50(4), 1090–1104.
Yang, I., & Becker, M. (1997). Latent variable modeling of diagnostic accuracy.
Biometrics, 53(3), 948–958.
Acknowledgments
I would like to express my sincere gratitude and appreciation to the many people who have
offered me unwavering support, encouragement, and inspiration throughout this research.
I feel incredibly privileged to have had the opportunity to share in the exceptional
scientific knowledge of prof. dr. Jeroen Vermunt in the field of mixture modeling. Prof. dr.
Jeroen Vermunt, during the last five years, as my professor of categorical data analysis,
as the supervisor of my first-year paper and master’s thesis, and as the supervisor (and
promoter) of my PhD thesis, you have given me continuous support and guidance from which
I have constantly benefited.
Back in 2011, when I applied for the NWO Research Talent grant, you believed that I could
write my PhD thesis in three years. Thank you for recognizing my potential and for
supporting me to grow as a research scientist. Working with you has been a very
enjoyable and rewarding experience.
Special gratitude is extended to my co-supervisors dr. Verena Schmittmann and dr.
Fetene Tekle, for the constructive suggestions they contributed to the various chapters
of my thesis. Dr. Verena Schmittmann, through my partnership with you I have learned a lot
about how to structure and write rigorous academic papers. Dr. Fetene Tekle, besides
professional support, you helped me with many practical issues by sharing your
experience of living in the Netherlands. I would also like to thank the members of my
thesis committee for their valuable time and encouraging comments. My thanks go to
the VIC group members, colleagues, and administrative staff at Tilburg University, who
directly or indirectly contributed to this thesis.
My sincere thanks also go to Lonneke van der Linde, former Oldendorff research
policy advisor, and dr. Andries van der Ark, former coordinator of the research master
program at Tilburg University, for creating a friendly and welcoming environment and for
making me feel at home while more than 3,500 miles away from home. I would also like to
express my gratitude to Tilburg University, the Oldendorff scholarship, and NWO for financial
support during my research master’s studies and PhD research. The Oldendorff scholarship
means a lot to me; without it, I would not have been here.
I am deeply indebted to my family, especially my wife Kelebet and my lovely daughter
Nenati. Kelebet, without your deep love and full understanding, I would never have
succeeded. You sacrificed your career to take care of our daughter and devoted countless
efforts to making this proud moment in my life a reality. Mam, you are the most special:
you never went to school yourself, yet you sent me from a remote rural area without schools
to cities in Ethiopia, and then to Europe for my education. I do not have enough words
to express my gratitude to you, but I simply pray that God will bless you with many
more healthy and joyful years. Furthermore, I would like to thank my brothers and sisters
at Eindhoven church for an amazing fellowship.
Above all, I wholeheartedly thank my mighty God who gives us richly all things to
enjoy, and whose perfect love, patience, and gifts are the real strength behind the greatest
accomplishments in my life.