Nonparametric Smoothing Estimation of Conditional Distribution Functions with
Longitudinal Data and Time-Varying Parametric Models
By Mohammed R. Chowdhury
B.Sc. in Statistics, August 2000, University of Chittagong, Bangladesh
M.Sc. in Statistics, November 2002, University of Chittagong, Bangladesh
M.A. in Statistics, July 2008, Ball State University, Muncie, Indiana, USA
A Dissertation submitted to
The Faculty of Columbian College of Arts and Sciences of The George Washington University
in partial satisfaction of the requirements for the degree of Doctor of Philosophy
January 31, 2014
Dissertation directed by
Colin Wu
Mathematical Statistician, NHLBI, National Institutes of Health, Bethesda, MD
Reza Modarres
Professor of Statistics
The Columbian College of Arts and Sciences of The George Washington Univer-
sity certifies that Mohammed R. Chowdhury has passed the Final Examination for
the degree of Doctor of Philosophy as of December 17, 2013. This is the final and
approved form of the dissertation.
Nonparametric Smoothing Estimation of Conditional Distribution Functions with
Longitudinal Data and Time-Varying Parametric Models
Mohammed R. Chowdhury
Dissertation Research Committee:
Colin Wu, Mathematical Statistician, NHLBI, National Institutes of Health, Bethesda,
MD, Dissertation Co-Director
Reza Modarres, Professor of Statistics, Dissertation Co-Director
Subrata Kundu, Associate Professor of Statistics, Committee Member
Yinglei Lai, Associate Professor of Statistics, Committee Member
Dedication
To my dear parents
Mohammed Mofizur Rahaman Chowdhury & Chemona Afroze
Acknowledgements
First, I thank The Almighty Allah for giving me the strength, patience
and ability to accomplish this research. I would like to thank my advisors Dr.
Colin Wu and Dr. Reza Modarres for their continuous support, encouragement and
guidance, which are evident throughout this work. I am indebted to Dr. Subrata
Kundu and Dr. Yinglei Lai for their helpful suggestions and constructive review of
this dissertation. I am also grateful to Dr. Paul Albert and Tatiyana Apanasovich
for their invaluable comments. Additionally, I would like to express appreciation to
my friends Li Cheung, Jorge Ivan Velez and others at the Department of Statistics
for their dear friendship and support. I also thank the National Heart, Lung, and
Blood Institute for providing the NGHS (National Growth and Health Study) data. The
National Growth and Health Study was supported by contracts NO1-HC-55023-26
and grants U01-HL48941-44 from the National Heart, Lung and Blood Institute. I
also thank my sister Zosna Afroze for her wholehearted support. Last, but not least,
I would like to thank my wife Nahida Akhter Irin, whose unconditional love and
devotion have provided comfort and joy. She has made all the difference at every
step of the way throughout this journey.
Abstract
Nonparametric Smoothing Estimation of Conditional Distribution Functions with Longitudinal Data and Time-Varying Parametric Models
The thesis is concerned with the nonparametric estimation of the conditional distri-
bution function with longitudinal data. Nonparametric estimation and inferences of
conditional distribution functions with longitudinal data have important applications
in biomedical studies, such as epidemiological studies and longitudinal clinical trials.
Estimation without any structural assumptions may lead to inadequate and numerically
unstable estimators in practice. In this dissertation, we propose a nonparametric
approach based on time-varying parametric models for smoothing estimation of the
conditional distribution functions with a longitudinal sample, and show that our local
polynomial smoothing estimator outperforms the existing Nadaraya-Watson kernel
smoothing estimator in terms of root MSE and confidence band length. In both
cases, we use the Epanechnikov kernel with bandwidth 2.5.
Our model assumes that the conditional distribution of the outcome variable at
each given time point can be approximated by a parametric model after a log
transformation or a local Box-Cox transformation, but the parameters are smooth functions
of time. Our estimation is based on a two-step smoothing method, in which we first
obtain the raw estimators of the conditional distribution functions at a set of dis-
joint time points, and then compute the final estimators at any time by smoothing
the raw estimators. Pointwise bootstrap confidence bands have been constructed for
both local polynomial smoothing estimators and Nadaraya-Watson kernel smoothing
estimators, resulting in a wider bootstrap confidence band for the Nadaraya-Watson
kernel smoothing estimator. Asymptotic properties, including the asymptotic bi-
ases, variances and mean squared errors, have been derived for the local polynomial
smoothed estimators. Asymptotic distribution of the raw estimators of the condi-
tional distribution functions has been derived.
Applications of our two-step estimation method have been demonstrated through
a large epidemiological study of childhood growth and blood pressure. In our NGHS
(National Growth and Health Study) application, we report that
(a) the Structural Nonparametric Model (SNM) performs better than the Unstructured
Nonparametric Model (UNM) in estimating raw as well as smoothed probabilities
over the entire set of time design points;
(b) African American (AA) girls have a higher probability of developing hypertension
than Caucasian (CC) girls;
(c) the Box-Cox transformation gives better results than the Log transformation;
(d) Smoothing-Early and Smoothing-Later give the same results when the Log
transformation is involved;
(e) Smoothing-Later is the only option when the Box-Cox transformation is involved.
Finite sample properties of our procedures are investigated through a simulation
study. We report that the root MSE at each of the 101 time design points is smaller
for the local polynomial smoothing estimator than for the Nadaraya-Watson kernel
smoothing estimator. The advantage in root MSE of the structural nonparametric
model over the unstructured nonparametric model is even stronger when extreme
conditional tail probabilities are estimated and smoothed.
Contents
Dedication iii
Acknowledgements iv
Abstract v
Contents vii
List of Figures x
List of Tables xiv
1 Chapter One 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Conditional Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Time-Varying Parameter Models . . . . . . . . . . . . . . . . 7
1.3.2 Extension to Continuous and Time-Varying Covariates . . . . 8
1.3.3 Time-Varying Nonparametric Models . . . . . . . . . . . . . . 9
1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . 9
2 Chapter Two 12
2.1 Two-Step estimation methods and inferences for time variant paramet-
ric models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Raw estimates of parameter curves and distributions . . . . . 12
2.2 Smoothing Estimators of Conditional Distributions . . . . . . . . . . 13
2.2.1 Rationales of Smoothing Step . . . . . . . . . . . . . . . . . . 13
2.2.2 Smoothing-Early conditional CDF Estimators . . . . . . . . . 14
2.2.3 Smoothing-Later conditional CDF Estimators . . . . . . . . . 15
2.3 Two-Step estimation methods and inferences for unstructured non-
parametric models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Raw estimates of unstructured nonparametric CDF . . . . . . 17
2.3.2 Smoothing estimates of unstructured nonparametric CDF . . . 17
2.4 Bandwidth Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Bootstrap Pointwise Confidence Intervals . . . . . . . . . . . . . . . . 22
3 Chapter Three 24
3.1 Application to NGHS BP data . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4 Chapter Four 41
4.1 Asymptotic Properties of the Raw Estimators . . . . . . . . . . . . . 41
4.2 Asymptotic Properties of the Smoothing Estimators: . . . . . . . . . 43
5 Chapter Five 47
5.1 Time-Varying Models with Locally Transformed Variables . . . . . . 47
5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6 Chapter Six 63
6.1 Discussion and Future Research . . . . . . . . . . . . . . . . . . . . . 63
7 Appendix 1: Preliminary Analysis 65
8 Appendix 2: Proof of Theoretical Results 133
8.1 A.1 Useful Approximation for the Equivalent Kernels . . . . . . . . . 133
8.2 A.2 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . 133
9 Appendix 3: R Code 137
10 References 178
List of Figures
1 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for Cau-
casian girls (CC) between 9.1 and 19.0 years old. (1a) and (1b): Esti-
mators based on the time-varying log-normal models. (1c)-(1d): Esti-
mators based on the unstructured kernel estimators. . . . . . . . . . . 30
2 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for African-
American (AA) girls between 9.1 and 19.0 years old. (1a) and (1b):
Estimators based on the time-varying log-normal models. (1c)-(1d):
Estimators based on the unstructured kernel estimators. . . . . . . . 31
3 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for all girls
between 9.1 and 19.0 years old. (1a) and (1b): Estimators based on
the time-varying log-normal models. (1c)-(1d): Estimators based on
the unstructured kernel estimators. . . . . . . . . . . . . . . . . . . . 32
4 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific mean and standard de-
viation of SBP for All girls between 9.1 and 19.0 years old. Estimators
based on the time-varying log-normal models. . . . . . . . . . . . . . 33
5 Local linear smoothing estimators (solid curves), and pointwise boot-
strap 95% confidence intervals (dashed curves, B = 1000 bootstrap
replications) of the age specific probabilities of SBP greater than the
90th and 95th population SBP percentiles for all girls between 9.1 and
19.0 years old. Estimators based on the time-varying Gaussian models
with smoothing early approach. . . . . . . . . . . . . . . . . . . . . . 34
6 Black solid line is local polynomial (a,b) and Nadaraya-Watson (c,d)
smoothing estimators with Epanechnikov kernel for SNM and UNM.
Dotted lines represent the 95% pointwise bootstrap confidence band
for 1000 simulated samples. . . . . . . . . . . . . . . . . . . . . . . . 38
7 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
Caucasian girls (CC) between 9.1 and 19.0 years old. (a),(c),(e): Esti-
mators based on the time-varying Gaussian models. (b),(d),(f): Esti-
mators based on the unstructured kernel estimators. . . . . . . . . . . 51
8 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
African American girls (AA) between 9.1 and 19.0 years old. (a),(c),(e):
Estimators based on the time-varying Gaussian models. (b),(d),(f):
Estimators based on the unstructured kernel estimators. . . . . . . . 52
9 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
entire cohort between 9.1 and 19.0 years old. (a),(c),(e): Estimators
based on the time-varying Gaussian models. (b),(d),(f): Estimators
based on the unstructured kernel estimators. . . . . . . . . . . . . . . 53
10 Black solid line is local linear (a,c,e) and Nadaraya-Watson (b,d,f)
smoothing estimators with Epanechnikov kernel and bandwidth 2.5.
Dotted lines represent the 95% pointwise bootstrap confidence band
for 1000 simulated samples. . . . . . . . . . . . . . . . . . . . . . . . 59
11 QQplot of SBP after log transformation from the 1st data set to 12th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
12 QQplot of SBP after log transformation from the 13th data set to 24th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13 QQplot of SBP after log transformation from the 25th data set to 36th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
14 QQplot of SBP after log transformation from the 37th data set to 48th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
15 QQplot of SBP after log transformation from the 49th data set to 60th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
16 QQplot of SBP after log transformation from the 61st data set to 72nd
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
17 QQplot of SBP after log transformation from the 73rd data set to 84th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
18 QQplot of SBP after log transformation from the 85th data set to 96th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
19 QQplot of SBP after log transformation from the 97th data set to 100th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
20 Local polynomial smoothing estimator of Box-Cox Lambda. . . . . . 132
List of Tables
1 Theoretical 90th and 95th quantiles for 10 different time points out of
101 different time points from the model of our simulation design. . . 35
2 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.10
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 39
3 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.05
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 40
4 Theoretical 90th, 95th and 99th quantiles for 10 different time points
out of 101 different time points from the model of our simulation design. 55
5 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.10
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 60
6 Averages of the biases, the square root of the mean squared errors, and
the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals (B = 1000 bootstrap replications)
for the estimation of P [Y (t) > y.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0
over 1000 simulated sample. The smoothing-later local linear estima-
tors based on the time-varying Gaussian model are shown in the left
panel. The kernel estimators based on the unstructured nonparametric
model are shown in the right panel. The Epanechnikov kernel and the
LTCV bandwidth h = 2.5 are used for all the smoothing estimators. . 61
7 Averages of the biases, the square root of the mean squared errors, and
the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals (B = 1000 bootstrap replications)
for the estimation of P [Y (t) > y.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0
over 1000 simulated sample. The smoothing-later local linear estima-
tors based on the time-varying Gaussian model are shown in the left
panel. The kernel estimators based on the unstructured nonparametric
model are shown in the right panel. The Epanechnikov kernel and the
LTCV bandwidth h = 2.5 are used for all the smoothing estimators. . 62
8 P-values for normality tests of 100 data sets. SW, AD, KS, CVM and
ChiSq stand for the Shapiro-Wilk, Anderson-Darling, Kolmogorov-
Smirnov, Cramer-von Mises and Chi-Square tests, respectively. 65
9 Estimated Raw Probabilities (ERP) of SBP for entire cohort that ex-
ceed different quantiles of yq(t) (q= .90,.95,.99) by SNM and UNM. . 70
10 Estimated Raw Probabilities (ERP) of SBP for Caucasian Girls that
exceed different quantiles of yq(t) (q = .90, .95, .99) by SNM and UNM. . . 78
11 Estimated Raw Probabilities (ERP) of SBP for African American Girls
that exceed different quantiles of yq(t)(q= .90,.95,.99) by SNM and UNM. 86
12 Local linear smoothing estimates for µ(t) and σ(t) for 100 data sets of
entire cohort. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
13 Girls with median height and age specific log scaled SBP percentile
values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14 Smoothing probabilities by local linear smoothing estimator and Nadaraya-
Watson Kernel smoothing estimator for entire cohort. . . . . . . . . . 104
15 Some values of bandwidth for entire cohort, Caucasian cohort and
African American cohort obtained by AIC cross validation method.
Cross validation scores are given in the parenthesis. . . . . . . . . . . 109
16 Some values of bandwidth for entire cohort, Caucasian cohort and
African American cohort obtained by LS cross validation method.
Cross validation scores are given in the parenthesis. . . . . . . . . . . 110
17 ML estimators and local polynomial smoothing estimators of Lambda
with their corresponding mean, minimum and maximum for each sub-
sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
18 ML estimators and their local polynomial smoothing estimators with
corresponding p-values by the Shapiro-Wilk (SW) Test for each sub-sample. 118
1 Chapter One
1.1 Introduction
Longitudinal data often appear in biomedical studies, where at least some of the
variables of interest from the independent subjects are repeatedly measured over
time. Existing methods of longitudinal analysis in the literature are mostly focused on
regression models based on the conditional means and variance-covariance structures.
Parametric and nonparametric methods for conditional mean and variance-covariance
based regression models may be found, among others, in Hart and Wehrly (1993),
Hoover et al. (1998), Fan and Zhang (2000), Lin and Carroll (2001), Verbeke and
Molenberghs (2005), James, Hastie and Sugar (2000), Diggle et al. (2002), Chiou,
Muller and Wang (2004), Hu, Wang and Carroll (2004), Qu and Li (2006), Senturk
and Muller (2006), Zhou, Huang and Carroll (2008) and Fitzmaurice et al. (2009).
These methods, although popular in practice, may not be adequate for estimating
the conditional distribution functions when the longitudinal variables of interest are
skewed or have significant deviation from normality.
A log transformation or a Box-Cox (1964) transformation is often needed to reduce
the skewness of the longitudinal data. More specifically, a time-variant local Box-Cox
transformation is required to induce normality in each of the data sets partitioned,
according to predetermined time points, from the original longitudinal sample. The
Box-Cox transformation system gives the power-normal (PN) family, whose members
include the normal and log-normal distributions. A detailed review of the Box-Cox
transformation can be found in Sakia (1992). Box and Cox (1964) proposed both
maximum likelihood and Bayesian methods for the estimation of the parameter λ.
We use only the maximum likelihood method for estimating the time-variant Box-Cox
parameter λ(t). Maximization of the likelihood function is done over a fixed power
transformation set, which we take to be [-2, 2]. When λ(t) = 0, a log transformation
is used. The Box-Cox (1964) transformation is defined as follows:
Y = (Z^λ − 1)/λ,  if λ ≠ 0
Y = log(Z),       if λ = 0
When λ varies with the time point t, we have the time-variant Box-Cox transformation:

Y(t) = (Z(t)^{λ(t)} − 1)/λ(t),  if λ(t) ≠ 0
Y(t) = log(Z(t)),               if λ(t) = 0
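To make the estimation step concrete, the grid-search maximum likelihood estimation of λ at a single time point can be sketched in Python as below. This is only an illustration of the standard Box-Cox profile likelihood over the fixed power set [-2, 2]; the function names and the 401-point grid are assumptions of this sketch, not part of the dissertation (whose own implementation is the R code in Appendix 3).

```python
import numpy as np

def boxcox_transform(z, lam):
    """Box-Cox transform: (z**lam - 1)/lam for lam != 0, log(z) for lam = 0."""
    if abs(lam) < 1e-8:
        return np.log(z)
    return (z ** lam - 1.0) / lam

def boxcox_loglik(z, lam):
    """Profile log-likelihood of lambda for positive data z, with mu and
    sigma^2 profiled out; the Jacobian term is (lam - 1) * sum(log z)."""
    y = boxcox_transform(z, lam)
    n = len(z)
    return -0.5 * n * np.log(np.var(y)) + (lam - 1.0) * np.sum(np.log(z))

def boxcox_mle(z, grid=None):
    """Grid-search ML estimate of lambda over the fixed power set [-2, 2]."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 401)  # illustrative grid resolution
    lls = np.array([boxcox_loglik(z, lam) for lam in grid])
    return grid[int(np.argmax(lls))]
```

For log-normally distributed data, the maximizer should fall near λ = 0, recovering the log transformation as a special case.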
The problem of modeling and estimating the conditional distribution functions of
a longitudinal variable is well motivated by a large prospective cohort study, namely
the National Growth and Health Study (NGHS), which has the main objective of
evaluating the temporal trends of cardiovascular risk factors, such as the systolic and
diastolic blood pressures (SBP, DBP) based on up to 10 annual follow-up visits of
2379 African-American and Caucasian girls during adolescence (NGHSRG, 1992).
Existing results of this study, such as Daniels et al. (1998), Thompson et al. (2007)
and Obarzanek et al. (2010), have evaluated the effects of age, height and other
covariates on the means and abnormal levels of SBP, DBP and other cardiovascular
risk factors. Because the conditional distributions of SBP and DBP over age are not
normal, the conditional mean based regression models employed by the above authors
do not lead to adequate estimates of the conditional distributions of SBP and DBP
for this adolescent population.
In an effort to directly model and estimate the conditional distribution functions of
longitudinal variables, Wu, Tian and Yu (2010) and Wu and Tian (2013b) studied an
estimation method based on a time-varying transformation model with time-varying
covariates, and Wu and Tian (2013a) studied a two-step smoothing method for time-
invariant covariates without assuming any specific modeling structures. Applying
their methods to the NGHS blood pressure (BP) data, these authors illustrated the
advantages of directly modeling and estimating the conditional distribution functions
over the conditional mean based models. Similar to the unstructured smoothing methods
of Hall, Wolff and Yao (1999) for independent and identically distributed (i.i.d.)
data, the smoothing method of Wu and Tian (2013a) could be numerically unstable
and lead to substantial estimation errors for estimating the conditional cumulative
distribution functions (CDF) near the boundary of the support. The structural mod-
eling approach of Wu, Tian and Yu (2010) is mainly for the purpose of reducing
the dimensionality associated with the time-varying covariates, which does not allevi-
ate potential estimation errors of the conditional CDF estimators near the boundary
points.
Motivated by the time-varying coefficient approaches in the literature (e.g., Hoover
et al., 1998), we propose in this dissertation a structural nonparametric approach for
the estimation of conditional distribution functions, and show that, when the struc-
tural assumptions hold, our approach may lead to estimators which are superior to the
unstructured smoothing estimators. Our approach relies on the assumption that the
conditional distribution function of the longitudinal variable of interest or its transfor-
mation (Log transformation or local Box-Cox transformation) at a given time point
follows a parametric family, but the parameters may vary when time changes. Our
longitudinal variable follows a “time-varying parametric model”, which is a special
case of the “structural nonparametric models” and maintains its flexibility by allow-
ing different parameter values at different time points. For the estimation method,
we propose a two-step smoothing procedure, which first obtains the raw estimators of
the conditional distribution functions based on the time-varying parametric family at
a set of distinct time points, and then computes the estimators at any time point by
smoothing the available raw estimators using a nonparametric smoothing procedure.
The two-step smoothing method can be applied in two ways, known as the
smoothing-early approach and the smoothing-later approach. In the smoothing-early
approach, we first smooth the raw parameter estimates and then plug the smoothed
estimates directly into the conditional distribution function to obtain the smoothing
estimate of the conditional distribution function. In the smoothing-later approach,
we use the raw parameter estimates to obtain raw estimates of the conditional
distribution function at each time point, and then smooth these raw estimates to
obtain the smoothing estimates of the conditional distribution function.
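The two approaches can be contrasted in code. The following is a hypothetical Python sketch for a time-varying Gaussian model with the Epanechnikov kernel, assuming the raw parameter estimates at the time design points are already available; all function names and the bandwidth default are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np
from math import erf, sqrt

def norm_sf(y, mu, sigma):
    """Gaussian tail probability P[Y > y] for Y ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 - erf((y - mu) / (sigma * sqrt(2.0))))

def local_linear_smooth(t_grid, raw, t0, h=2.5):
    """Local linear smoother with the Epanechnikov kernel, evaluated at t0."""
    t_grid = np.asarray(t_grid, dtype=float)
    u = (t_grid - t0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    # Weighted least squares fit of a line centered at t0;
    # the intercept is the smoothed value at t0.
    X = np.column_stack([np.ones_like(t_grid), t_grid - t0])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ np.asarray(raw, dtype=float))
    return beta[0]

def smoothing_early(t_grid, mu_raw, sd_raw, y, t0, h=2.5):
    """Smooth the parameter curves first, then plug into the Gaussian tail."""
    mu_s = local_linear_smooth(t_grid, mu_raw, t0, h)
    sd_s = local_linear_smooth(t_grid, sd_raw, t0, h)
    return norm_sf(y, mu_s, sd_s)

def smoothing_later(t_grid, mu_raw, sd_raw, y, t0, h=2.5):
    """Compute raw tail probabilities at each time point, then smooth them."""
    p_raw = [norm_sf(y, m, s) for m, s in zip(mu_raw, sd_raw)]
    return local_linear_smooth(t_grid, p_raw, t0, h)
```

For constant parameter curves the two approaches give the same answer; they can differ when the parameter curves vary with t, which is one reason the two approaches must be distinguished.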
It should be mentioned that the smoothing-early approach is not applicable when the
Box-Cox power transformation is involved. The Box-Cox transformed Y (tj) differ
substantially between time points under the ML estimates of λ(tj), and when smoothed
estimates of the time-variant Box-Cox λ(t) are used, the transformed variable Y (t)
no longer belongs to the power-normal family. Even with the raw estimates of λ(tj),
we cannot smooth the Gaussian parameter estimates, because the fitted Gaussian
parameters vary widely across the raw estimates of λ(tj). Applying the smoothing-early
approach to parameter estimates with such large variation across time points would
not yield meaningful smoothed parameter curves. The smoothing-later approach is
therefore the only option when the time-variant Box-Cox transformation is applied
to induce normality. Table 17 of Appendix 1 shows the variation of the Box-Cox
transformed Y (t) under the ML estimates of λ(t). In the same table, we have also shown
the transformed Y (t) for smoothed λ(t). Table 17 also gives the average, minimum
and maximum for each sub-sample for ML estimate of Box-Cox λ(t) and smoothing
estimate of Box-Cox λ(t). Table 18 of Appendix 1 shows that 13 of the 100 sub-samples
are non-normal under the time-variant smoothed Box-Cox λ(t), whereas only
4 sub-samples are non-normal under the ML estimates of λ(t). In Figure 20, we have
shown the smoothed λ(t) together with 95% pointwise bootstrap confidence band.
Tables 17 and 18 and Figure 20 explain why smoothing-early estimation of the conditional
distribution function is not feasible with the time-variant Box-Cox λ(t).
The two-step smoothing procedure, which is similar to the ones used in Fan and
Zhang (2000) and Wu, Tian and Yu (2010), is computationally simple and easy to
implement in practice. For the practical properties, we demonstrate the clinical
interpretations and implications of our structural nonparametric approach over the
unstructured nonparametric method through an application to the NGHS BP data,
and investigate the finite sample properties of our procedures through a simulation
study. For the theoretical properties, we derive the asymptotic distributions of the
raw estimators and the asymptotic expressions for the biases, variances and mean
squared error for the two-step local polynomial estimators. These results show that
the smoothing step has the advantage of reducing the variability of the raw estimators
as well as giving estimates at all time design points.
1.2 Data Structure
We focus on the longitudinal samples with similar structures as the NGHS (NGH-
SRG, 1992), which are mathematically convenient and commonly appear in large
epidemiological studies. Within each sample, we have n independent subjects. For
the ith subject with 1 ≤ i ≤ n, we have mi ≥ 1 observations at time points
Ti = {tij ∈ τ ; j = 1, . . . ,mi}, where τ is the time interval containing all the time
range of interest. The total number of observations is N = ∑_{i=1}^{n} m_i. For
mathematical simplicity, we assume that the time points T = {Ti; i = 1, . . . , n} for the
sample are contained in the vector of J “disjoint time points” t = (t1, . . . , tJ)^T. In
biomedical studies, t is often obtained by rounding off age or other time variables
within an acceptable range of accuracy. Let Y (t) be a real-valued outcome variable
at any time point t ∈ τ, and let X be a time-invariant categorical covariate that takes
values x ∈ {1, . . . , K}.
The longitudinal sample for {Y (t), X, t} is denoted by Z = {Yi(tj), Xi, tj; 1 ≤ j ≤
J, i ∈ Sj}, where Sj is the set of subjects which have observations at time point tj.
This data structure is a special case of the ones in Obarzanek et al. (2010), Wu, Tian
and Yu (2010) and Wu and Tian (2013b) with a categorical, time-invariant covariate.
Let nj = #{i ∈ Sj} be the number of subjects in Sj and nj1j2 = #{i ∈ Sj1 ∩ Sj2} the
number of subjects in both Sj1 and Sj2 . Then, nj1j2 ≤ min {nj1 , nj2}. In the NGHS
applications of Wu, Tian and Yu (2010) and Wu and Tian (2013b), the time points in
t are specified by rounding up the ages of the NGHS subjects at the first decimal place,
which are chosen using the clinical definition of age for pediatric studies. Although
in general Y (t) may be a multivariate random variable, for mathematical simplicity,
our main results are restricted to the univariate Y (t). Extension to multivariate Y (t)
requires further modeling of the joint distributions and is hence beyond the scope of
this dissertation.
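As an illustration of this data structure, the sets Sj can be formed by rounding observation times onto the design grid, as in the following hypothetical Python fragment (the tuple layout is an assumption of this sketch, not the actual NGHS data format):

```python
from collections import defaultdict

def bin_to_design_points(observations):
    """Group longitudinal observations onto disjoint time design points.

    `observations` is a list of (subject_id, age, y) triples; ages are
    rounded to one decimal place, mimicking the convention of defining
    the grid t = (t_1, ..., t_J) by rounded age.
    """
    S = defaultdict(set)   # S[t_j]: subjects with an observation at t_j
    Y = defaultdict(list)  # Y[t_j]: outcome values observed at t_j
    for subject, age, y in observations:
        t_j = round(age, 1)
        S[t_j].add(subject)
        Y[t_j].append(y)
    return S, Y
```

From these sets, nj = len(S[tj]) and nj1j2 = len(S[tj1] & S[tj2]), which makes the inequality nj1j2 ≤ min{nj1, nj2} immediate.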
Remark 1.1. We assume a categorical, time-invariant covariate X for simplicity of
the mathematical expressions and biological interpretations. When continuous
or time-dependent covariates are included in the model, the nonparametric estimation
methods require multivariate smoothing, which could be computationally intensive,
sometimes infeasible, and difficult to interpret in practice. In Section 1.3.2,
a useful dimension reduction approach based on time-varying transformation models
(Wu, Tian and Yu, 2010) is discussed. This approach could be extended to the
estimation of conditional probabilities with continuous and time-varying covariates,
but this extension requires further methodological and theoretical developments beyond
the ones provided in this dissertation. In the present context, X = 1 represents
Caucasian girls and X = 2 represents African American girls.
Remark 1.2. The data structure described in this section is consistent with
the data formulation used in the NGHS publications, such as Daniels et al. (1998),
Obarzanek et al. (2010), Wu, Tian and Yu (2010) and Wu and Tian (2013a). In
both data formulations, the time points {t_ij; 1 ≤ j ≤ mi, 1 ≤ i ≤ n} have J > 1
distinct possible values in t, which are referred to in Wu, Tian and Yu (2010) and Wu
and Tian (2013b) as the “time design points”. Each of the n independent subjects
has actual visit times within a subset of t. If the ith subject is observed at time point
tj, the corresponding outcome variable is Yi(tj), which may not be the same as Yi(t_ij),
since tj and t_ij are not necessarily equal.
1.3 Conditional Distribution
1.3.1 Time - Varying Parameter Models
For any given t ∈ τ , our objective is to estimate the conditional cumulative distribu-
tion functions (CDF) Ft,θ(t)[y(t)|x] = P [Y (t) ≤ y(t)|t,X = x] for some given curve
y(t) on τ based on the longitudinal sample Z. Other conditional probabilities may
be obtained using the functionals of Ft,θ(t)[y(t)|x]. The choices of y(t) depend on the
scientific objectives of the analysis. For example, hypertension and abnormal levels
of blood pressure for children and adolescents are defined by gender and age specific
blood pressure quantiles (e.g., NHBPEP, 2004), so that it is often meaningful in pe-
diatric studies to evaluate the conditional CDFs with y(t) chosen as a pre-determined
gender- and age-specific blood pressure quantile curve.
The two-step estimation method of Wu and Tian (2013a) relies on kernel smooth-
ing estimators of the raw empirical CDFs F_{tj}[y(tj)|X = x] for j = 1, . . . , J, without
assuming any parametric structure for F_t[y(t)|X = x]. When F_t(·) belongs to a
parametric family at each t ∈ τ, we have a time-varying parametric model
Fθ(t) = {Ft,θ(t)(·); θ(t) ∈ Θ}, (1)
where θ(t) is the vector of time-varying parameters, which belongs to an open parameter
space Θ. Under F_{θ(t)}, one would naturally expect that a smoothing estimator, which
effectively utilizes the local parameter structure of Ft,θ(t)(·), could be superior to the
unstructured two-step smoothing estimators of Wu and Tian (2013a). For the special
case of a time-varying normal distribution, F_{t,θ(t)}[y(t)] for given t ∈ τ with mean and
variance curves θ(t) = (µ(t), σ²(t))ᵀ is given by

F_{t,θ(t)}[y(t)] = ∫_{−∞}^{y(t)} [1/(√(2π) σ(t))] exp[−{s − µ(t)}²/(2σ²(t))] ds.  (2)
Conditional CDFs for time-varying parametric models other than (2) may be similarly
obtained. We note that the conditional CDF F_{t,θ(t)}(·) allows Y(t) to be either a
continuous or a discrete random variable. If a Box-Cox transformation is involved, the
parameter curves will be θ(t) = (µ(t), σ²(t), λ(t))ᵀ.
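For illustration, (2) can be evaluated directly through the error function. The following is a minimal Python sketch (the dissertation's own code, in R, is in Appendix 3); the parameter curves µ(t), σ²(t) and the threshold curve y(t) below are hypothetical values, not estimates from the NGHS data:

```python
from math import erf, sqrt

def gaussian_cdf(y, mu, sigma2):
    # Normal CDF via the error function: P[Y <= y] for Y ~ N(mu, sigma2).
    return 0.5 * (1.0 + erf((y - mu) / sqrt(2.0 * sigma2)))

# Hypothetical parameter curves mu(t), sigma^2(t) and threshold curve y(t)
# on a grid of time design points (all values assumed, for illustration).
t_grid = [9.1 + 0.1 * j for j in range(100)]
mu_curve = [4.6 + 0.004 * (t - 9.0) for t in t_grid]
sigma2_curve = [0.006] * len(t_grid)
y_curve = [4.75] * len(t_grid)

# F_{t,theta(t)}[y(t)] of (2), evaluated pointwise along the curves.
F_curve = [gaussian_cdf(y, m, s2)
           for y, m, s2 in zip(y_curve, mu_curve, sigma2_curve)]
```

Other members of the family F_{θ(t)} only change `gaussian_cdf`; the pointwise evaluation along the curves is the same.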
1.3.2 Extension to Continuous and Time-Varying Covariates
When there are continuous and time-varying covariates, we denote the covariate vec-
tor by X(t). The conditional CDF to be estimated is

F_t[y(t)|X(t) = x(t)] = P[Y(t) ≤ y(t)|t, X(t) = x(t)].  (3)
Nonparametric estimation of Ft [y(t)|X(t) = x(t)] would require a multivariate smooth-
ing method over both x(t) and t, which could be difficult to compute, particularly
when the dimensionality of X(t) is high, due to the well-known problem of the “curse
of dimensionality”. A potentially useful approach is to model the distribution of Y(t)
given X(t) and then to estimate the conditional CDF from the fitted model. The
time-varying transformation models of Wu, Tian and Yu (2010) and Wu and Tian
(2013b) rely on modeling the dependence of Ft [y(t)|X(t)] on X(t) through a “struc-
tural nonparametric model” determined by the coefficient curves. However, further
research is needed to develop appropriate time-varying parametric families Fθ(t) and
their estimation procedures for estimating the conditional CDFs Ft [y(t)|X(t)] when
X(t) involves continuous and time-varying components.
1.3.3 Time-Varying Nonparametric Models
For a given t ∈ τ, our interest is to estimate the conditional probability F_{Y(t)}[y(t)|x] =
P[Y(t) ≤ y(t)|X = x] by the empirical method. The choices of y(t) depend on the scien-
tific objectives of the analysis. More generally, it is often useful in practice to allow
y(t) to change with t. For example, health status for children and adolescents is
often defined by gender- and age-specific risk categories (e.g., NHBPEP, 2004). Thus,
in pediatric studies, it is often meaningful to evaluate the conditional CDFs defined
as above, where y(t) is a pre-determined gender- and age-specific risk-threshold curve.
For both the structural nonparametric and unstructured nonparametric methods, we
use the 90th, 95th and 99th percentile values as y(t) for girls with median height.
1.4 Organization of the Dissertation
Chapter 2 proposes a structural nonparametric model (SNM) for smoothing estima-
tion of the time-varying conditional distribution function. A local polynomial smoothing
estimator for the structural nonparametric model is developed in this chapter.
Chapter 2 also discusses the smoothing-early and smoothing-later approaches, and
examines the existing Nadaraya-Watson kernel smoothing approach for the
conditional distribution function. To estimate the time-varying conditional CDF from
the structural nonparametric model, we assume that our variable of interest, systolic
blood pressure (SBP), follows a parametric model after log transformation. The
parameters of the model vary over the time design points and are estimated by the
maximum likelihood method. By plugging these raw estimates into the time-dependent
parametric model, we estimate the time-varying conditional CDF at some pre-specified
percentile curves, popularly known as quantile curves y(t). These estimates of the
conditional CDF are considered the raw estimates. As most time-dependent raw
estimates in biomedical studies show spiky behavior, we smooth these raw conditional
CDF estimates by local polynomial smoothing so that a smooth curve estimator over
the entire set of time design points can be constructed. This approach is known as the
smoothing-later approach. In the smoothing-early approach, we instead smooth the
raw estimators of the parameters and then plug these smoothed parameter estimators
into the conditional CDF formula to obtain the smoothing estimates of the conditional
CDF over the entire set of time design points. The unstructured nonparametric model
(UNM) and the Nadaraya-Watson kernel smoothing estimator are discussed for
comparison with the structural nonparametric model and the local polynomial
smoothing estimator.
In Chapter 3, we present an application of the above two methods to the NGHS
data and also conduct a simulation study designed to be similar to the NGHS
study. In the simulation study, we show that the root MSE is smaller for the structural
nonparametric model than for the unstructured nonparametric model at each of the 101
time design points. The relative root MSE is always less than 1 at each time design
point, which indicates that the SNM is more efficient than the UNM. The wider 95%
pointwise bootstrap confidence band for the UNM also indicates that the SNM is
better than the UNM.
In Chapter 4, we derive the asymptotic distribution of the estimators of the
conditional CDF. The variance, bias and MSE of the smoothing estimators of the
conditional distribution function from the structural nonparametric model are explicitly
presented in this chapter.
In Chapter 5, we use the local Box-Cox transformation, repeat the application of
Chapter 3, and show that the local Box-Cox power transformation produces better
results than the global log transformation. Asymptotic results for the smoothing
estimator involving the Box-Cox λ(t) are not derived and are left for future research.
Other future research directions are discussed in Chapter 6. Further details of the
theoretical derivations are given in Appendix 2. Preliminary data analysis, exploration
and visualization are presented in Appendix 1. R code is provided in Appendix 3.
2 Chapter Two
2.1 Two-step estimation methods and inference for time-varying parametric models
Similar to the estimation approach of Wu and Tian (2013b), we derive here a two-
step smoothing method for the estimation of the conditional distribution functions,
in which we first compute the raw estimates of θ(tj|x) and F_{tj,θ(tj|x)}[y(tj)|x] for all
j = 1, . . . , J, and then derive the smoothing estimates of θ(t|x) and F_{t,θ(t|x)}[y(t)|x] for
any t ∈ τ by applying a smoothing procedure over the corresponding raw estimates
at t. This two-step smoothing approach is computationally simple and does not require
correlation assumptions across different time points.
2.1.1 Raw estimates of parameter curves and distributions
We derive the estimators θ̂(tj|x) and F̂_{tj,θ̂(tj|x)}[y(tj)|x] of θ(tj|x) and F_{tj,θ(tj|x)}[y(tj)|x],
respectively, using the observations at time tj ∈ t. Suppose that we have enough obser-
vations nj at tj, so that θ(tj|x), tj ∈ t, can be estimated by the maximum likeli-
hood estimator (MLE) θ̂(tj|x) using the subjects in Sj. Substituting θ(tj|x) with
θ̂(tj|x), the corresponding raw estimator of F_{tj,θ(tj|x)}[y(tj)|x] is F̂_{tj,θ̂(tj|x)}[y(tj)|x] =
F_{tj,θ̂(tj|x)}[y(tj)|x]. For the time-varying Gaussian model, F̂_{tj,θ̂(tj|x)}[y(tj)|x] is given
by substituting θ(tj|x) with the MLE θ̂(tj|x) = (µ̂(tj|x), σ̂²(tj|x))ᵀ in (2).
In practice, these raw estimators require the number of observations nj at tj to be
sufficiently large, so that they can be computed numerically. When the local sample
size nj is not sufficiently large, we can round off or group some of the adjacent time
points into small bins, and compute the raw estimates within each bin. This round-off
or binning approach has been used by Fan and Zhang (2000), Wu, Tian and Yu
(2010) and Wu and Tian (2013b). But the effects of round-off or binning on the
asymptotic properties of the smoothing estimators have not been investigated in the
literature. In biomedical studies, such as the NGHS, the unit of time is often rounded
off into an acceptable precision, so that numerical computations of the raw estimators
in such studies are possible.
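The raw estimation step for the time-varying Gaussian model can be sketched as follows; the data are simulated stand-ins for the longitudinal observations (sample sizes and parameter values assumed, not NGHS values):

```python
import random
from math import erf, sqrt

def raw_estimates(obs_by_time, y):
    """Raw MLEs (mu_hat, sigma2_hat) at each time design point t_j, and the
    plug-in raw CDF estimate obtained by substituting them into (2)."""
    out = []
    for obs in obs_by_time:                      # obs = {Y_i(t_j) : i in S_j}
        n_j = len(obs)
        mu_hat = sum(obs) / n_j
        sigma2_hat = sum((v - mu_hat) ** 2 for v in obs) / n_j  # MLE divides by n_j
        F_hat = 0.5 * (1.0 + erf((y - mu_hat) / sqrt(2.0 * sigma2_hat)))
        out.append((mu_hat, sigma2_hat, F_hat))
    return out

# Simulated data: 60 subjects observed at each of J = 5 time points.
random.seed(1)
obs_by_time = [[random.gauss(4.6, 0.08) for _ in range(60)] for _ in range(5)]
raw = raw_estimates(obs_by_time, y=4.75)
```

When some nj are too small to support the MLE, the binning described above amounts to concatenating the lists for adjacent time points before calling `raw_estimates`.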
2.2 Smoothing Estimators of Conditional Distributions
2.2.1 Rationales of Smoothing Step
There are two reasons to use the smoothing step in addition to the raw estimates.
First, the raw estimates are only for the coefficients or conditional CDFs at time
points in t, while the smoothing step leads to curve estimates over the entire time
range τ . Second, the raw estimates usually have excessive variations, so that their
values may change dramatically among adjacent time points in t. Given that spiky
estimates may not have meaningful biological interpretations, the smoothing step
should be used to reduce the variation by sharing information from the adjacent time
points. Theoretical justifications of the smoothing are discussed in Chapter 4.
There are two different two-step smoothing estimators for F_{t,θ(t|x)}[y(t)|x]. The
first approach, referred to herein as the “smoothing-early approach”, is to obtain the
smoothing estimators θ̂(t|x) of θ(t|x), and then estimate F_{t,θ(t|x)}[y(t)|x] by the “plug-
in” smoothing estimator F_{t,θ̂(t|x)}[y(t)|x]. The second approach, referred to herein as
the “smoothing-later approach”, is to obtain the raw estimators F̂_{tj,θ̂(tj|x)}[y(tj)|x]
for all j = 1, . . . , J, and then estimate F_{t,θ(t|x)}[y(t)|x] by a smoothing estimator
F̂_{t,θ̂(t|x)}[y(t)|x] based on F̂_{tj,θ̂(tj|x)}[y(tj)|x], j = 1, . . . , J. Both of these smoothing
approaches lead to appropriate conditional CDF estimators in practice.
2.2.2 Smoothing-Early conditional CDF Estimators
Suppose that θ(t|x) is (p + 1) times differentiable with respect to t ∈ τ. Let θ^{(q)}(t|x)
be the qth derivative of θ(t|x), 0 ≤ q ≤ p, and βq(t|x) = θ^{(q)}(t|x)/q!. By the Taylor
expansion of θ(t|x),

θ(t|x) ≈ Σ_{q=0}^{p} βq(s0|x) (t − s0)^q

for t in some neighborhood of s0. We can treat the raw estimates θ̂(tj|x) as the
“observations” of θ(tj|x) at tj, j = 1, . . . , J, and obtain the pth order local polynomial
estimators by minimizing

Σ_{j=1}^{J} {θ̂(tj|x) − Σ_{q=0}^{p} βq(t|x)(tj − t)^q}² Kh(tj − t),

where Kh(tj − t) = K[(tj − t)/h]/h, K(·) is a non-negative kernel function, and h > 0 is
a bandwidth. Using the matrix formulation, we define the vector of raw estimates
θ̂ = (θ̂(t1|x), . . . , θ̂(tJ|x))ᵀ, β(t|x) = (β0(t|x), . . . , βp(t|x))ᵀ, G(t; h) = diag{Kh(tj − t)}
with jth column Gj(t; h) = (0, . . . , Kh(tj − t), . . . , 0)ᵀ, and Tp(t) the J × (p + 1) matrix
with its jth row given by Tj,p(t) = (1, tj − t, . . . , (tj − t)^p). The local polynomial
estimators β̂q(t|x) minimize

QG[β(t|x)] = [θ̂ − Tp(t)β(t|x)]ᵀ G(t; h) [θ̂ − Tp(t)β(t|x)].

The pth order local polynomial estimator of θ^{(q)}(t|x) based on θ̂(tj|x), which minimizes
QG[β(t|x)], is

θ̂^{(q)}(t|x) = Σ_{j=1}^{J} Wq,p+1(tj, t; h) θ̂(tj|x),  (4)

where Wq,p+1(tj, t; h) = q! e_{q+1,p+1} [Tpᵀ(t) G(t; h) Tp(t)]^{−1} Tpᵀ(t) Gj(t; h) is the “equiv-
alent kernel function” (e.g., Fan and Zhang, 2000) and e_{q+1,p+1} is the row vector of
length p + 1 with 1 at its (q + 1)th place and 0 elsewhere.
By the definition of β(t|x), we have β̂(t|x) = (β̂0(t|x), . . . , β̂p(t|x))ᵀ and θ̂^{(q)}(t|x) =
β̂q(t|x) q! for q = 0, . . . , p. The pth order local polynomial estimator θ̂(t|x) of θ(t|x)
is θ̂(t|x) = θ̂^{(0)}(t|x). For the special case of p = 1, we get the local linear esti-
mator θ̂L(t|x) = β̂0(t|x) of θ(t|x) based on (4) and the equivalent kernel function
W0,2(tj, t; h). Following (1) and (4), we substitute θ(t|x) with θ̂(t|x) and define the
smoothing estimator of F_{t,θ(t|x)}[y(t)|x] based on θ̂(t|x) to be

F̂_{t,θ̂(t|x)}[y(t)|x] = F_{t,θ̂(t|x)}[y(t)|x],  (5)

where a common choice of θ̂(t|x) is the local linear estimator θ̂L(t|x).
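For the local linear case, the smoothing step reduces to a weighted least squares fit at each t, whose intercept is the equivalent-kernel estimate. A minimal Python sketch, with a hypothetical spiky raw series standing in for the raw parameter estimates, is:

```python
def epanechnikov(u):
    # Epanechnikov kernel, the weighting function used in this chapter.
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_linear(t_grid, raw, t, h):
    """Local linear smoother (p = 1): weighted least squares of the raw
    estimates on (t_j - t); the intercept beta_0(t) estimates theta(t)."""
    w = [epanechnikov((tj - t) / h) / h for tj in t_grid]
    s0 = sum(w)
    s1 = sum(wj * (tj - t) for wj, tj in zip(w, t_grid))
    s2 = sum(wj * (tj - t) ** 2 for wj, tj in zip(w, t_grid))
    m0 = sum(wj * r for wj, r in zip(w, raw))
    m1 = sum(wj * r * (tj - t) for wj, r, tj in zip(w, raw, t_grid))
    det = s0 * s2 - s1 * s1
    return (s2 * m0 - s1 * m1) / det       # closed-form beta_0(t)

# Spiky "raw estimates" around a linear trend on an assumed grid.
t_grid = [0.1 * j for j in range(51)]
raw = [2.0 + 0.3 * t + (0.05 if j % 2 else -0.05) for j, t in enumerate(t_grid)]
smooth = [local_linear(t_grid, raw, t, h=0.5) for t in t_grid]
```

The same function smooths either the raw parameter estimates (smoothing-early) or the raw conditional CDF estimates (smoothing-later); only the `raw` series changes.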
2.2.3 Smoothing-Later conditional CDF Estimators
Suppose that F_{t,θ(t|x)}[y(t)|x] is (p + 1) times differentiable with respect to t ∈ τ. Let
F^{(q)}_{t,θ(t|x)}[y(t)|x] be the qth derivative of F_{t,θ(t|x)}[y(t)|x], 1 ≤ q ≤ p, and γq(t|x) =
F^{(q)}_{t,θ(t|x)}[y(t)|x]/q!. By the Taylor expansion of F_{t,θ(t|x)}[y(t)|x],

F_{t,θ(t|x)}[y(t)|x] ≈ Σ_{q=0}^{p} γq(s0|x) (t − s0)^q

for t in some neighborhood of s0, we can treat the raw estimates F̂_{tj,θ̂(tj|x)}[y(tj)|x] as
the “observations” of F_{tj,θ(tj|x)}[y(tj)|x] at tj, j = 1, . . . , J, and obtain the pth local
polynomial estimators by minimizing

Σ_{j=1}^{J} {F̂_{tj,θ̂(tj|x)}[y(tj)|x] − Σ_{q=0}^{p} γq(t|x)(tj − t)^q}² Kh(tj − t)

with Kh(tj − t) = K[(tj − t)/h]/h and h > 0.

Let F(t|x) = F_{t,θ(t|x)}[y(t)|x] and let F̂ = (F̂(t1|x), . . . , F̂(tJ|x))ᵀ be the vector of raw
estimates F̂(tj|x) = F̂_{tj,θ̂(tj|x)}[y(tj)|x]. Also let γ(t|x) = (γ0(t|x), . . . , γp(t|x))ᵀ,
G(t; h) = diag{Kh(tj − t)} with jth column Gj(t; h) = (0, . . . , Kh(tj − t), . . . , 0)ᵀ, and
Tp(t) the J × (p + 1) matrix with its jth row given by Tj,p(t) = (1, tj − t, . . . , (tj − t)^p).
The local polynomial estimators γ̂q(t|x) minimize

QG[γ(t|x)] = {F̂ − Tp(t)γ(t|x)}ᵀ G(t; h) {F̂ − Tp(t)γ(t|x)},

and, consequently, the pth order local polynomial estimator of F^{(q)}_{t,θ(t|x)}[y(t)|x] based
on F̂_{tj,θ̂(tj|x)}[y(tj)|x], which minimizes QG[γ(t|x)], is

F̂^{(q)}_{t,θ̂(t|x)}[y(t)|x] = Σ_{j=1}^{J} Wq,p+1(tj, t; h) F̂_{tj,θ̂(tj|x)}[y(tj)|x],  (6)

where Wq,p+1(tj, t; h) is the “equivalent kernel function” defined in (4). By the defini-
tion of γ(t|x), we have γ̂(t|x) = (γ̂0(t|x), . . . , γ̂p(t|x))ᵀ and F̂^{(q)}_{t,θ̂(t|x)}[y(t)|x] = γ̂q(t|x) q!
for q = 0, . . . , p. Following (6), the smoothing-later pth order local polynomial esti-
mator of F_{t,θ(t|x)}[y(t)|x] is

F̂_{t,θ̂(t|x)}[y(t)|x] = F̂^{(0)}_{t,θ̂(t|x)}[y(t)|x].  (7)

For the special case of p = 1, the local linear estimator of F_{t,θ(t|x)}[y(t)|x] based on
(7) is F̂_{t,θ̂(t|x)}[y(t)|x] = γ̂0(t|x) with the equivalent kernel W0,2(tj, t; h).
2.3 Two-step estimation methods and inference for unstructured nonparametric models
In this approach, we estimate the empirical conditional CDF at each time point
and then apply the Nadaraya-Watson kernel smoothing method to obtain the smoothing
estimators of the conditional distribution function over the entire set of time design points.
2.3.1 Raw estimates of unstructured nonparametric CDF
We define the indicator variable

I{Y(t) ≤ y(t)} = 1 if Y(t) ≤ y(t), and 0 otherwise.

The percentile values used for y(t) come from a previous standard study. The raw
estimate of the CDF at each time design point tj is the empirical proportion

(1/nj) Σ_{i∈Sj} I{Yi(tj) ≤ y(tj)}, j = 1, . . . , J,

where i indexes subjects and j indexes time design points. These raw empirical estimates
of the conditional CDF show spiky behavior at different time points; the dots in
Figures 1(c, d), 2(c, d) and 3(c, d) show these spiky patterns. In addition, this approach
suffers from a major problem of non-existence when the quantile y(t) is at an extreme
tail. For example, if y(t) is the 99th percentile, the empirical CDF estimates at some
time points can equal 1, which leaves no estimate of the tail probability, i.e., no
estimate of P[Y(t) > y(t)|t].
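A small simulation illustrates both the raw empirical estimates and this tail non-existence problem; the data and sample sizes below are assumed, chosen only to make the degeneracy visible:

```python
import random

def empirical_cdf_raw(obs_by_time, y_curve):
    # (1/n_j) * sum over i in S_j of I{Y_i(t_j) <= y(t_j)}, for each j.
    return [sum(1 for v in obs if v <= y) / len(obs)
            for obs, y in zip(obs_by_time, y_curve)]

# Simulated example: 40 subjects per time point and y(t) fixed at the
# 99th percentile (2.326) of the standard normal distribution.
random.seed(7)
obs_by_time = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(20)]
F_raw = empirical_cdf_raw(obs_by_time, y_curve=[2.326] * 20)

# Many raw estimates equal exactly 1, so the tail probability
# P[Y(t) > y(t)|t] has no estimate at those time points.
n_degenerate = sum(1 for f in F_raw if f == 1.0)
```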
2.3.2 Smoothing estimates of unstructured nonparametric CDF
Estimation of the conditional CDF F_t[y(t)|x] = P[Y(t) ≤ y(t)|t, X = x] with
unstructured nonparametric models has been investigated by Wu and Tian (2013a)
using a kernel smoothing method. Let wi be a weight function, which may be either
1/(nmi) or 1/N. The kernel estimator of Wu and Tian (2013a) for F_t[y(t)|x] is

F̂_t[y(t)|x] = [Σ_{j=1}^{J} Σ_{i∈Sj} wi 1[Yi(tj) ≤ y(t), Xi = x] Kh(tj − t)] / [Σ_{j=1}^{J} Σ_{i∈Sj} wi Kh(tj − t)],  (8)
where 1[·] is an indicator function and Kh(tj − t) = K[(tj − t)/h]/h for some ker-
nel function K(·). By smoothing the indicator 1[Yi(tj) ≤ y(t), Xi = x] through the
kernel weight wi Kh(tj − t) for all i ∈ Sj and j = 1, . . . , J, the estimator F̂_t[y(t)|x] of
(8) can be generally applied to conditional distributions which may not belong to the
time-varying parametric family F_{θ(t)}. In contrast, the local polynomial estimators of (5)
and (7) depend on the time-varying parametric assumption of F_{θ(t|x)}, and may lead to
biased estimates when the structural assumption of F_{θ(t|x)} is not satisfied. Table 14
of Appendix 1 gives the smoothing probabilities obtained by the local linear smoothing
estimator and the Nadaraya-Watson kernel smoothing estimator for the entire cohort.
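A minimal Python sketch of the Nadaraya-Watson estimator (8), with the weight wi = 1/N and the Epanechnikov kernel, on simulated data (all values assumed), is:

```python
import random

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def nw_cdf(sample, t, y, h):
    """Nadaraya-Watson estimator (8) with weights w_i = 1/N: a kernel
    weighted average of the indicators 1[Y_i(t_j) <= y] over all pairs."""
    num = den = 0.0
    for tj, yij in sample:
        k = epanechnikov((tj - t) / h) / h
        num += (1.0 if yij <= y else 0.0) * k
        den += k
    return num / den if den > 0.0 else float("nan")

# Simulated longitudinal sample: Y(t) ~ N(t, 1) at each of the time design
# points 0.0, 0.1, ..., 2.0, with 30 subjects per point (values assumed).
random.seed(3)
t_points = [0.1 * j for j in range(21)]
sample = [(tj, random.gauss(tj, 1.0)) for tj in t_points for _ in range(30)]

# At t = 1.0 and y = 1.0 the true conditional CDF is 0.5.
F_hat = nw_cdf(sample, t=1.0, y=1.0, h=0.3)
```

Because it averages raw indicators rather than a fitted parametric CDF, this estimator needs no distributional assumption, at the price of the boundary behavior discussed in Remark 2.1.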
Remark 2.1. Although it has been shown by Wu and Tian (2013a) that F̂_t[y(t)|x]
has adequate properties when y(t) is within the interior of the support of Y(t), in
practice F̂_t[y(t)|x] may have large bias and variance when y(t) is near the boundary
of the support of Y(t). Intuitively, when y(t) is near the boundary of the support of
Y(t) and the number of observed Yi(tj) within the small neighborhood of t defined
by Kh(tj − t) is small, the values of 1[Yi(tj) ≤ y(t), Xi = x] are likely to be all 0 or all 1,
which may force F̂_t[y(t)|x] to be either 0 or 1. The smoothing estimators of (5)
and (7), however, compute the conditional distribution function F_{t,θ(t)}[y(t)|x] based
on the parametric model F_{θ(t|x)} and the estimators of the time-varying parameter
θ(t|x). The mean squared errors of the estimators (5) and (7) mainly depend
on the estimators of θ(t|x) and are less affected by the values of y(t). We compare
the estimators of (5) and (7) with F̂_t[y(t)|x] in the simulation study of Chapter 3.
2.4 Bandwidth Choices
The bandwidths of (5) and (7) may be selected either subjectively, by examining the
plots of the estimated parameter curves, or using a data-driven bandwidth selection
procedure. As demonstrated by the simulation studies in nonparametric estimation
with two-step local polynomial estimators, such as Fan and Zhang (2000), Wu, Tian
and Yu (2010) and Wu and Tian (2013a, 2013b), subjective bandwidth choices ob-
tained from examining the fitted curves of the estimators often produce appropriate
bandwidths in real applications.
Two cross validation approaches, the “Leave-One-Subject-Out Cross Validation”
(LSCV) and the “Leave-One-Time-Point-Out Cross Validation” (LTCV), have been
proposed by Wu and Tian (2013a, 2013b) for the selection of data-driven bandwidths
under the unstructured nonparametric models. These cross validation approaches can
be extended to the smoothing estimators (5) and (7) to provide a potential range of
suitable bandwidths. Let F̂^{(−i)}_{t,θ̂(t|x)}[y(t)|x], 1 ≤ i ≤ n, be either the estimator (5)
or (7) of F_{t,θ(t|x)}[y(t)|x] computed from the sample with all the observations of the
ith subject deleted, and let wi be a weight function which could be either 1/(nmi) or
1/N. The LSCV bandwidth h_{x,LSCV} is the minimizer of the LSCV score

LSCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}².  (9)
For a heuristic justification of LSCV[y(·), x], we can consider the expansion

LSCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F_{tj,θ(tj|x)}[y(tj)|x]}²
  + Σ_{j=1}^{J} Σ_{i∈Sj} wi {F_{tj,θ(tj|x)}[y(tj)|x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}²
  + 2 Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F_{tj,θ(tj|x)}[y(tj)|x]} {F_{tj,θ(tj|x)}[y(tj)|x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}.  (10)
The first term on the right-hand side of (10) does not involve the smoothing estimator
and hence does not depend on the bandwidth. The expected value of the third term
on the right-hand side of (10) is zero, since the observations of the ith subject are
not included in F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]. Thus, by minimizing LSCV[y(·), x], the LSCV
bandwidth h_{x,LSCV} approximately minimizes the second term on the right-hand side
of (10), which is approximately the average squared error

ASE[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {F̂_{tj,θ̂(tj|x)}[y(tj)|x] − F_{tj,θ(tj|x)}[y(tj)|x]}².  (11)
A potential drawback of the LSCV approach is that the minimization of the LSCV
score (9) is often computationally intensive, particularly when the number of subjects
n is large, which limits its use in real applications. Thus, it is usually more practical
to consider the alternative of k-fold LSCV, which is computed by deleting the
observations of k > 1 subjects at a time in the computation of F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x].

Instead of deleting the subjects one at a time, the LTCV procedure deletes the
observations at the time design points t = {t1, . . . , tJ}. When J is smaller than n,
the LTCV procedure may be computationally simpler than the LSCV procedure. Let
F̂^{(−j)}_{t,θ̂(t|x)}[y(t)|x], 1 ≤ j ≤ J, be either the estimator (5) or (7) of F_{t,θ(t|x)}[y(t)|x]
computed from the sample with all the observations at the time point tj deleted.
Then the value of F̂^{(−j)}_{t,θ̂(t|x)}[y(t)|x] at time point tj is F̂^{(−j)}_{tj,θ̂(tj|x)}[y(tj)|x], and the LTCV
score for F̂_{t,θ̂(t|x)}[y(t)|x] is

LTCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F̂^{(−j)}_{tj,θ̂(tj|x)}[y(tj)|x]}².  (12)

The LTCV bandwidth h_{x,LTCV} is the minimizer of LTCV[y(·), x]. Similar to the
k-fold alternative for the LSCV, the k-fold LTCV bandwidths, which are obtained by
deleting k > 1 time points in t each time, may also be used in practical applications to
reduce the computational burden when J is large. Table 15 in the preliminary analysis
of Appendix 1 gives bandwidth values, together with cross validation scores in
parentheses, obtained by the AIC method for the entire cohort, the Caucasian cohort
and the African American cohort. Table 16 in the preliminary analysis of Appendix 1
gives the corresponding bandwidths and cross validation scores obtained by the least
squares method.
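A minimal sketch of the LTCV idea follows, using a simple kernel CDF estimator as a stand-in for (5) or (7) and simulated data (all numbers assumed, for illustration):

```python
import random
from math import isnan

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_cdf(sample, t, y, h):
    # Kernel-smoothed conditional CDF estimate at (t, y) from (t_j, Y) pairs.
    num = den = 0.0
    for tj, yij in sample:
        k = epanechnikov((tj - t) / h)
        num += (1.0 if yij <= y else 0.0) * k
        den += k
    return num / den if den > 0.0 else float("nan")

def ltcv_score(sample, y_curve, h):
    """LTCV score in the spirit of (12), with w_i = 1/N: the fit at each
    time design point t_j uses the sample with all observations at t_j
    deleted, and is scored against the indicators observed at t_j."""
    t_points = sorted({tj for tj, _ in sample})
    n, score = len(sample), 0.0
    for tj in t_points:
        reduced = [(s, v) for s, v in sample if s != tj]
        f = kernel_cdf(reduced, tj, y_curve[tj], h)
        if isnan(f):
            continue                      # no data left near t_j for this h
        for s, v in sample:
            if s == tj:
                score += ((1.0 if v <= y_curve[tj] else 0.0) - f) ** 2 / n
    return score

# Choose h from a small candidate grid by minimizing the LTCV score.
random.seed(5)
grid = [round(0.1 * j, 1) for j in range(1, 21)]
sample = [(tj, random.gauss(0.0, 1.0)) for tj in grid for _ in range(25)]
y_curve = {tj: 0.0 for tj in grid}
h_best = min([0.15, 0.3, 0.6, 1.2], key=lambda h: ltcv_score(sample, y_curve, h))
```

Only J refits are needed per candidate bandwidth, which is why LTCV is cheaper than LSCV when J is much smaller than n.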
Remark 2.2. Since the observations at different time points are potentially cor-
related, the heuristic justification for LSCV[y(·), x] based on (10) and (11) does not
apply to LTCV[y(·), x], and the effects of the intra-subject correlations on the ap-
propriateness of the LTCV approach have not been systematically investigated. But
the simulation results of Wu and Tian (2013a, 2013b) have shown that the LTCV
approach may lead to appropriate bandwidths under the unstructured nonparametric
models. Theoretical and practical properties of both the LSCV and LTCV bandwidths
warrant substantial further investigation. In practice, the LSCV and LTCV band-
widths may only be used to provide a rough range of the appropriate bandwidths. The
bandwidths for an actual dataset may be selected by evaluating the overall information
from LSCV[y(·), x] and LTCV[y(·), x], the scientific interpretations, and the smooth-
ness of the estimates.
Remark 2.3. The bandwidth is known as the smoothing parameter and the kernel
as the weighting function. The bandwidth controls the smoothness of the probability
estimates and determines the tradeoff between the bias and variance of the estimation:
in general, the smaller the bandwidth, the smaller the bias and the larger the variance.
Intuitively, the smoothing estimator is a summation of many bumps, each centered at
an observation time tj; the kernel function K determines the shape of the bumps and
the bandwidth h determines their width. For the local polynomial smoothing estimator
of the structural nonparametric model, the Epanechnikov kernel is used as the weighting
function. If the Gaussian kernel is used instead, very similar results are obtained, since
the choice of kernel has little effect on the shape of the smoothing estimates. For the
unstructured nonparametric model, the Nadaraya-Watson estimator with the Epanech-
nikov kernel is used to obtain the smoothing estimates. For selecting an appropriate
bandwidth, we use subjective choices obtained by examining the plots and data-driven
choices obtained by examining the CV scores from the above two methods. Too large a
bandwidth yields over-smoothed results, and too small a bandwidth under-smoothed
ones. For our NGHS BP data, a bandwidth of around 2.5 is the best choice.
2.5 Bootstrap Pointwise Confidence Intervals
Since different asymptotic distributions of the smoothing estimators may be obtained
depending on the longitudinal designs and on whether and how fast mi, i = 1, . . . , n, con-
verge to infinity, statistical inference based on asymptotic approximations may
not be an appropriate option in practice, and a widely used inference approach for non-
parametric longitudinal analysis is the “resampling-subject” bootstrap sug-
gested in Hoover et al. (1998). Under the current context, we can obtain a pointwise
bootstrap confidence interval for F_{t,θ(t|x)}[y(t)|x] by first obtaining B bootstrap sam-
ples through resampling the subjects of the longitudinal sample with replacement,
and then computing the B two-step smoothing estimators {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b = 1, . . . , B}
using (5) or (7) with each of the bootstrap samples. The lower and upper bound-
aries of the [100 × (1 − α)]% empirical quantile bootstrap pointwise confidence inter-
val of F_{t,θ(t|x)}[y(t)|x] are the empirical [100 × (α/2)]th and [100 × (1 − α/2)]th percentiles
of the bootstrap estimators {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b = 1, . . . , B}. Alternatively,
if SD{F̂^b_{t,θ̂(t|x)}[y(t)|x]} is the empirical standard deviation of {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b =
1, . . . , B}, the [100 × (1 − α)]% normal approximation bootstrap pointwise confidence
interval of F_{t,θ(t|x)}[y(t)|x] is

F̂_{t,θ̂(t|x)}[y(t)|x] ± Z_{1−α/2} × SD{F̂^b_{t,θ̂(t|x)}[y(t)|x]},

where Z_{1−α/2} is the [100 × (1 − α/2)]th percentile of the standard normal distribution.
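A minimal sketch of the resampling-subject bootstrap follows, with a simple proportion estimator standing in for the smoothing estimators (5) or (7); the data are simulated and all numerical values are assumed:

```python
import random

def subject_bootstrap_ci(subjects, estimator, B=200, alpha=0.05, seed=0):
    """Resampling-subject bootstrap: draw n subjects with replacement,
    recompute the estimator on each bootstrap sample, and return the
    empirical alpha/2 and 1 - alpha/2 percentiles."""
    rng = random.Random(seed)
    n = len(subjects)
    stats = []
    for _ in range(B):
        boot = [subjects[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(boot))
    stats.sort()
    lo = stats[int((alpha / 2.0) * (B - 1))]
    hi = stats[int((1.0 - alpha / 2.0) * (B - 1))]
    return lo, hi

# Toy stand-in for (5) or (7): each "subject" is a list of repeated
# observations, and the estimator is an overall proportion.
random.seed(11)
subjects = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]

def prop_below(subs, y=1.28):
    total = sum(len(s) for s in subs)
    return sum(1 for s in subs for v in s if v <= y) / total

lo, hi = subject_bootstrap_ci(subjects, prop_below, B=200)
```

Resampling whole subjects, rather than individual observations, preserves the intra-subject correlation structure in each bootstrap sample.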
3 Chapter Three
Application and Simulation
3.1 Application to NGHS BP data
We apply our method to the NGHS BP data to estimate the conditional distribu-
tion functions of SBP, a main cardiovascular risk outcome, for Caucasian and
African American girls, and their trends over ages 9 to 19 years. The
NGHS is a multicenter, population-based observational study designed to evaluate
the prevalence and incidence of cardiovascular risk factors in Caucasian and African-
American girls during childhood and adolescence. The study involves 1166 Caucasian
girls and 1213 African-American girls who had up to 10 annual follow-up visits be-
tween ages 9 and 19 and whose numbers of follow-up visits have median 9, mean
8.2 and standard deviation 2. Among all the important risk factors that have been
studied by the NGHS investigators, childhood systolic blood pressure (SBP) is an
important one. Detailed information about NGHS data can be found at the National
Heart, Lung and Blood Institute Biologic Specimen and Data Repository website
(https://biolincc.nhlbi.nih.gov).
Following the practical definition of age in pediatric studies (e.g., Obarzanek et
al., 2010), we round up the observed age to one decimal point with J = 100 distinct
time design points {t1, t2, . . . , t100} = {9.1, 9.2, . . . , 19.0}. Since our objective is to
estimate the conditional distribution functions of SBP for age t within the interior of
the observed age range, we omit the boundary age of t = 9.0 years in this analysis.
According to these time design points, the entire NGHS dataset has been partitioned
into sub-samples at the 100 time design points. The first subsample, corresponding to
age 9.1, includes all girls in the age interval [9, 9.1), and so on for the rest of the
subsamples; the last subsample, with age 19, includes all girls in the age interval
[18.9, 19). In our preliminary analysis based on the goodness-of-fit tests of normality
for the SBP distributions at the time design points {t1, t2, . . . , t100}, we
observed that the conditional distributions of the natural logarithmic transformed
SBP given age and race can be reasonably approximated by normal distributions,
while the conditional distributions of the actual SBP given age and race are not
approximately normal. Thus, for a given 1 ≤ j ≤ J = 100 and i ∈ Sj, we denote by
Yi(tj) the natural logarithmic transformed SBP observation of the ith girl at age tj.
The time-invariant categorical covariate Xi is race, which is defined by Xi = 0 if the
ith girl is Caucasian, and Xi = 1 if she is African-American. The random variables
for the natural logarithmic transformed SBP at age t and race are Y (t) and X,
respectively. Given t and X = x, we consider the family of log-normal distributions
of SBP for this population of girls; that is, the conditional CDF of Y(t) is given by
the normal distribution F_{t,θ(t|x)}[y(t)|x] of (2).
We have used all five goodness-of-fit tests of normality for SBP: the Shapiro-
Wilk test, the Kolmogorov-Smirnov test, the Anderson-Darling test, the Cramér-von
Mises test and the Chi-square test. From the p-values, as well as visual inspection of
the QQ plots (Table 8 and Figures 11 to 19 in the preliminary analysis part of
Appendix 1), we see that 94 of the 100 subsamples follow a normal distribution by the
Shapiro-Wilk test, 90 of 100 by the Anderson-Darling test, all 100 by the Kolmogorov-
Smirnov test, 86 of 100 by the Cramér-von Mises test and 72 of 100 by the Chi-square
test. Hence, according to these goodness-of-fit tests, we can conclude that almost all
subsamples are approximately normal. If we apply the above normality tests to SBP
when the data are restricted to individuals with above-median height, only 7 of the
100 subsamples show non-normality by the Shapiro-Wilk, Anderson-Darling and
Cramér-von Mises tests; the Kolmogorov-Smirnov results remain the same as before,
and the Chi-square test shows that only 13 of the 100 subsamples do not follow a
normal distribution. For the goodness-of-fit tests, we have used the 5% significance
level. It is worth noting that before splitting the data, SBP over the entire set of time
design points is not normal, whether SBP is log-scaled or not; but a lognormality test
shows that SBP follows a lognormal distribution. P-values of these five normality
tests for the 100 subsamples are given in Table 8. QQ plots of the log-scaled SBP
for these 100 subsamples are also given in Figure 11 to Figure 19 in the preliminary
analysis part of Appendix 1. Estimated raw probabilities (ERP) by the estimators
from the both structural nonparametric model (Gaussian Model) and unstructured
nonparametric model (Empirical approach) for entire cohort as well as for Caucasian
girls and for African American girls are given in Table 9, Table 10 and Table 11. Each
of these Tables gives the probability that the SBP exceeds the 90th percentile, the
95th percentile and the 99th percentile values of SBP determined by the gender and
age specific blood pressure quantiles (e.g., NHBPEP, 2004). Expected SBP (µ) for
girls of age t years and height h inches is given by
µ = α +∑4
r=1 βr(t− 10)r +∑4
s=1 τs(Zht)s
where α, the β_r and the τ_s are given in Table B1 (e.g., NHBPEP, 2004). When median height is considered, Z_ht equals zero. The percentiles y_q(t) for q = 0.90, 0.95 and 0.99, i.e., the 90th, 95th and 99th percentiles of SBP, are computed as µ + 1.28 SD, µ + 1.645 SD and µ + 2.326 SD, respectively, where the SD is found in the same table. Table 13 in the preliminary analysis of Appendix 1 gives the age-specific log-scaled
percentile values of SBP of girls with median height.
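The percentile computation above can be sketched in code. This is a minimal illustration, not the dissertation's own program: the function names and the zero placeholder coefficients are hypothetical, and the actual α, β_r, τ_s and SD values must be taken from Table B1 of NHBPEP (2004).

```python
import numpy as np

def expected_sbp(t, z_ht, alpha, beta, tau):
    """mu = alpha + sum_{r=1}^{4} beta_r (t - 10)^r + sum_{s=1}^{4} tau_s z_ht^s."""
    powers = np.arange(1, 5)
    return alpha + np.sum(beta * (t - 10.0) ** powers) + np.sum(tau * z_ht ** powers)

def sbp_percentile(mu, sd, q):
    """y_q = mu + z_q * SD, with z_0.90 = 1.28, z_0.95 = 1.645, z_0.99 = 2.326."""
    z = {0.90: 1.28, 0.95: 1.645, 0.99: 2.326}
    return mu + z[q] * sd
```

For girls of median height, Z_ht = 0 and the height terms vanish, so only α and the β_r contribute to µ.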
A columnwise comparison between the estimated raw probabilities from the SNM and the UNM can be made from Tables 9, 10 and 11. From Table 9, we see that the unstructured nonparametric model is unable to estimate the tail probabilities at 5, 17 and 59 of the 100 time design points when SBP exceeds the 90th, 95th and 99th percentiles, respectively, whereas the structural nonparametric model does not suffer from this type of estimation problem. Tables 10 and 11, which give the estimated raw probabilities for Caucasian girls and African-American girls, show even worse results than Table 9 when the SNM and UNM estimates are compared. In general, if the number of observations in a subsample is small, the existing unstructured nonparametric model may yield no raw estimate of the tail probabilities, and without raw estimates we cannot proceed to the smoothing estimates. Tables 8 to 18 and Figures 11 to 20 are presented in the preliminary results part of Appendix 1.
Applying the two-step local linear estimators of (5) and (7) to the observed data {Y_i(t_j), X_i, t_j; 1 ≤ j ≤ J, 1 ≤ i ≤ n}, we compute the smoothing estimators F̂_{t,θ̂(t|x)}[y_q(t)|x] and F̃_{t,θ̃(t|x)}[y_q(t)|x] of F_{t,θ(t|x)}[y_q(t)|x], where y_q(t) is the natural logarithmic transformed (100 × q)th percentile of SBP for girls with median height at age t (NHBPEP, 2004). By the monotonicity of the transformation, F_{t,θ(t|x)}[y_q(t)|x] is also the conditional probability of SBP at or below the (100 × q)th SBP percentile given in NHBPEP (2004) for girls with age t and race x. In addition to the smoothing estimators based on the time-varying normal distributions F_{t,θ(t|x)}[y_q(t)|x] in (2), we also compute the kernel estimator F̂_t[y_q(t)|x] of the unstructured conditional CDF F_t[y_q(t)|x] of Y(t) based on (8) with the w_i = 1/(n m_i) and 1/N weights. Smoothing estimators of these conditional probabilities are also computed for the entire cohort, ignoring the race covariate. These smoothing probabilities are given in Table 14 of Appendix 1.
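For a concrete picture of the second smoothing step, here is a minimal local linear smoother with the Epanechnikov kernel, in the spirit of the estimators of (5) and (7). It is an illustrative sketch only (the function names are ours); the actual estimators also involve the race covariate and the raw-estimate step described above.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero outside."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear(t_grid, raw, t0, h):
    """Local linear smoothing of raw estimates at the point t0 with bandwidth h:
    a kernel-weighted least squares line through (t_j, raw_j); the fitted
    intercept at t0 is the smoothing estimate."""
    w = epanechnikov((t_grid - t0) / h)
    X = np.column_stack([np.ones_like(t_grid), t_grid - t0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * raw))
    return beta[0]
```

Because the fit is locally linear, the smoother reproduces a linear trend in the raw estimates exactly, which is the usual boundary-bias advantage over a Nadaraya-Watson (local constant) fit.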
Figure 1 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] based on (7) (q = 0.90 in Figure 1a; q = 0.95 in Figure 1b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] based on (8) with w_i = 1/N (q = 0.90 in Figure 1c; q = 0.95 in Figure 1d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for Caucasian girls (x = 0). The Epanechnikov kernel and the bandwidth h = 2.5 were used for both the local linear smoothing estimators and the unstructured kernel estimators. The bandwidth h = 2.5 was chosen by examining the LSCV and LTCV scores and the smoothness of the fitted plots.
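The empirical quantile bootstrap behind these pointwise intervals can be sketched as follows. This is a generic resample-subjects bootstrap, assuming an `estimator` callable supplied by the caller; it is not the dissertation's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(subjects, estimator, B=1000, level=0.95):
    """Empirical-quantile bootstrap pointwise confidence interval: resample whole
    subjects with replacement (preserving within-subject correlation),
    re-estimate, and take the (1-level)/2 and (1+level)/2 empirical quantiles."""
    n = len(subjects)
    stats = [estimator([subjects[i] for i in rng.integers(0, n, size=n)])
             for _ in range(B)]
    lo, hi = np.quantile(stats, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return lo, hi
```

Resampling entire subjects, rather than individual visits, is the standard choice for longitudinal data because the within-subject measurements are correlated.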
Figure 2 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] (2a, 2b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] with w_i = 1/N (2c, 2d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for African-American girls (x = 1) with q = 0.90 and 0.95. As in Figure 1, the estimators of Figure 2 are based on the Epanechnikov kernel and the bandwidth h = 2.5.

Figure 3 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] (3a, 3b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] with w_i = 1/N (3c, 3d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the entire cohort with q = 0.90 and 0.95. As in Figures 1 and 2, the estimators of Figure 3 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
The smoothing estimates in Figures 1 and 2 exhibit similar trends over t. The estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] are slightly lower than the estimates 1 − F̂_t[y_q(t)|x] based on the unstructured approach, and the 95% confidence intervals of 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] are narrower than those of 1 − F̂_t[y_q(t)|x]. These narrower confidence bands imply that the SNM gives better results than the UNM in smoothing estimation of the conditional distribution function. A comparison of the corresponding estimates in Figures 1 and 2 shows that African-American girls are more likely to have higher SBP values than Caucasian girls; in other words, an African-American girl has a higher probability of developing hypertension than a Caucasian girl.

In each of the above figures, dots represent raw estimates, the solid middle line represents the smoothing curve, and the dotted lines represent the 95% pointwise bootstrap confidence band. We have also noticed that, in the UNM, many raw estimates of the tail probabilities are zero; the scenario is worse if we take y_q(t) to be the 99th percentile. The smoothing-later approach was adopted in Figures 1, 2 and 3. We also computed the local linear estimators of 1 − F̂_{t,θ̂(t|x)}[y_q(t)|x] by the smoothing-early approach of (5). Figure 4 shows the local linear smoothing estimators of µ(t) and σ(t) for the whole cohort, and Table 12 gives the local polynomial smoothing estimates of µ(t) and σ(t) for the 100 subsamples. Figure 5 shows the local linear smoothing estimators of the conditional probability by the smoothing-early approach for the whole cohort. Comparing Figure 3 with Figure 5, we end up with essentially the same results under either smoothing approach (smoothing-early and smoothing-later); that is, the numerical results of 1 − F̂_{t,θ̂(t|x)}[y_q(t)|x] are similar to those of 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x]. To avoid redundancy, the smoothing-early approach for the Caucasian girls and African-American girls has been omitted.
[Figure 1 appears here: panels (a)-(d), ages of CC girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 1: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for Caucasian girls (CC) between 9.1 and 19.0 years old. (1a) and (1b): estimators based on the time-varying log-normal models. (1c) and (1d): estimators based on the unstructured kernel estimators.
[Figure 2 appears here: panels (a)-(d), ages of AA girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 2: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for African-American (AA) girls between 9.1 and 19.0 years old. (2a) and (2b): estimators based on the time-varying log-normal models. (2c) and (2d): estimators based on the unstructured kernel estimators.
[Figure 3 appears here: panels (a)-(d), ages of all girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 3: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for all girls between 9.1 and 19.0 years old. (3a) and (3b): estimators based on the time-varying log-normal models. (3c) and (3d): estimators based on the unstructured kernel estimators.
[Figure 4 appears here: two panels, ages of all girls (10-18) vs µ(t) and vs σ(t).]

Figure 4: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific mean and standard deviation of SBP for all girls between 9.1 and 19.0 years old. Estimators are based on the time-varying log-normal models.
[Figure 5 appears here: two panels, age vs the estimated exceedance probabilities for the 90th and 95th percentiles.]

Figure 5: Local linear smoothing estimators (solid curves) and pointwise bootstrap 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for all girls between 9.1 and 19.0 years old. Estimators are based on the time-varying Gaussian models with the smoothing-early approach.
3.2 Simulation Results
Following the data structure of Section 1.2, we generate in each sample n = 1000 subjects with 10 visits per subject. The jth visit time t_ij of the ith subject is generated from the uniform distribution U(j − 1, j) for j = 1, . . . , 10. Given each t_ij, we generate the observations Y_i(t_ij) for Y(t) from the following simulation design:

Y_ij = 21.5 + 0.7 (t_ij − 5) − 0.05 (t_ij − 5)² + a_0i + ε_ij,   a_0i ∼ N(0, 2.5²),   ε_ij ∼ N(0, 0.5²),   (13)
where the ε_ij are independent for all (i, j), and a_0i and ε_ij are independent. From (13), E[Y_i(t_ij)|t_ij] = 21.5 + 0.7 (t_ij − 5) − 0.05 (t_ij − 5)² and Var[Y_i(t_ij)|t_ij] = 6.5. For each simulated sample {(Y_i(t_ij), t_ij) : i = 1, . . . , 1000; j = 1, . . . , 10}, we round the time points so that each t_ij belongs to one of the equally spaced time design points {t_0, . . . , t_100} = {0, 0.1, 0.2, . . . , 10}. Let y_q(t) be the (100 × q)th percentile of Y(t), so that P[Y(t) > y_q(t)] = 1 − q. More specifically, let y_.90(t) and y_.95(t) be the 90th and 95th quantiles of Y(t); since Y(t) follows a normal distribution, P[Y(t) > y_.90(t)] = 0.10 and P[Y(t) > y_.95(t)] = 0.05. The theoretical 90th and 95th quantiles at 10 of the 101 time design points under the above model are given in Table 1. We repeatedly generate 1000 simulation samples. Within
Table 1: Theoretical 90th and 95th quantiles at 10 of the 101 time design points under the model of our simulation design.

Time (t):  1     2     3     4     5     6     7     8     9     10
y_.90(t):  21.2  22.2  23.2  24.0  24.8  25.4  26.0  26.4  26.8  27.0
y_.95(t):  22.1  23.1  24.1  24.9  25.7  26.3  26.9  27.3  27.7  27.9
each simulation sample, we compute the smoothing estimates of P [Y (t) > yq(t)] for
q = 0.90 and 0.95 by the local linear estimators (5) and (7) based on the time-varying
Gaussian model (2) and the unstructured kernel estimator (8) using the Epanechnikov
kernel and the LTCV bandwidths. The bootstrap pointwise 95% confidence intervals
for the smoothing estimators are constructed using empirical quantiles of B = 1000
bootstrap samples.
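The simulation design (13) can be generated directly. A minimal sketch (function name ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n=1000, J=10):
    """One longitudinal sample from design (13):
    Y_ij = 21.5 + 0.7 (t_ij - 5) - 0.05 (t_ij - 5)^2 + a_0i + eps_ij,
    with t_ij ~ U(j-1, j), a_0i ~ N(0, 2.5^2), eps_ij ~ N(0, 0.5^2)."""
    t = rng.uniform(np.arange(J), np.arange(1, J + 1), size=(n, J))
    a0 = rng.normal(0.0, 2.5, size=(n, 1))     # shared random subject effect
    eps = rng.normal(0.0, 0.5, size=(n, J))    # independent measurement errors
    y = 21.5 + 0.7 * (t - 5.0) - 0.05 * (t - 5.0) ** 2 + a0 + eps
    return np.round(t, 1), y                   # snap times to the grid {0, 0.1, ..., 10}
```

The subject effect a_0i is drawn once per subject and added to all 10 of that subject's visits, which produces the within-subject correlation of the longitudinal design; a_0i and ε_ij together give the conditional variance 2.5² + 0.5² = 6.5.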
Let P̂_q(t) be a smoothing estimator of P[Y(t) > y_q(t)] = 1 − q, which could be one of the local polynomial estimators, e.g., (5) or (7), or the unstructured kernel estimator of (8). We measure the accuracy of P̂_q(t) by the average of the bias, Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]/M, the empirical mean squared error, MSE[P̂_q(t)] = Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]²/M, or the square root of MSE[P̂_q(t)] (root-MSE), where P̂_q^{(m)}(t) is the estimate from the mth sample and M = 1000 is the total number of simulated samples. We assess the accuracy of a pointwise confidence interval of P̂_q(t) by the empirical coverage probability of the confidence interval covering the true value P[Y(t) > y_q(t)] = 1 − q.
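These accuracy measures translate directly into code. A minimal sketch (names ours), taking the Monte Carlo estimates of the true tail probability and, optionally, the per-sample confidence limits:

```python
import numpy as np

def accuracy(estimates, p_true, ci_lower=None, ci_upper=None):
    """Monte Carlo accuracy of M estimates of the tail probability p_true =
    P[Y(t) > y_q(t)]: average bias, root-MSE, and empirical CI coverage."""
    e = np.asarray(estimates, dtype=float)
    bias = np.mean(e - p_true)
    root_mse = np.sqrt(np.mean((e - p_true) ** 2))
    coverage = None
    if ci_lower is not None and ci_upper is not None:
        covered = (np.asarray(ci_lower) <= p_true) & (p_true <= np.asarray(ci_upper))
        coverage = np.mean(covered)
    return bias, root_mse, coverage
```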
Table 2 shows the averages of the estimates, averages of the biases, the root-MSEs,
and the empirical coverage probabilities of the empirical quantile bootstrap pointwise
95% confidence intervals based on B = 1000 bootstrap replications for the estimation
of P [Y (t) > y.90(t)] = 0.10 at t = 1.0, 2.0, . . . , 10.0. For all the 10 time points, the
smoothing-later local linear estimators based on the time-varying Gaussian model
have smaller root-MSEs than the kernel estimators based on the unstructured non-
parametric model. Comparing the empirical coverage probabilities of the bootstrap
pointwise 95% confidence intervals, we observe that the smoothing estimators based
on the time-varying Gaussian model have higher coverage probabilities than the un-
structured kernel estimators at most of the time points.
When q increases to 0.95, Table 3 shows the averages of the biases, the root-
MSEs, and the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the
estimation of P [Y (t) > y.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0. Again, the smoothing
estimators based on the time-varying Gaussian model have smaller root-MSEs than
the unstructured kernel estimators at all 10 time points.
From Figure 6, comparing (a) with (c) and (b) with (d), we see that the smoothing estimators from the unstructured nonparametric model give wider confidence bands than the smoothing estimators from the structural nonparametric model. In Figure 6, (a) and (b) are estimators from the time-varying Gaussian models, and (c) and (d) are estimators from the unstructured kernel method. The root-MSE of the SNM at each time point is smaller than that of the UNM, so the relative root-MSE at each time point is smaller than 1, which means that the SNM outperforms the UNM. In Tables 2 and 3 we present the simulation results at only the 10 integer time points; the results between integer time points are similar to those presented in Tables 2 and 3.
The results of Table 2 and Table 3 suggest that, when the time-varying paramet-
ric model is appropriate, the structural two-step smoothing estimators have smaller
mean squared errors than the unstructured smoothing estimators under a practical
longitudinal sample with moderate sample sizes. However, these results may not
hold if the time-varying parametric model is not an appropriate approximation to the
time-varying distribution functions of the longitudinal variable being considered.
[Figure 6 appears here: four panels (a)-(d) plotting t ∈ [0, 10] against the estimated exceedance probabilities.]

Figure 6: The black solid lines are the local polynomial (a, b) and Nadaraya-Watson (c, d) smoothing estimators with the Epanechnikov kernel for the SNM and UNM, respectively. Dotted lines represent the 95% pointwise bootstrap confidence band over the 1000 simulated samples.
Table 2: Averages of the estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.90(t)] = 0.10, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

          Structural Nonparametric Model          Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage   Estimate  Ave. Bias  √MSE    Coverage   Relative √MSE
1     0.0998    -0.0002    0.0092  0.935      0.0998    -0.0002    0.0111  0.930      0.8288
2     0.1005     0.0005    0.0095  0.935      0.1004     0.0004    0.0116  0.940      0.8190
3     0.1003     0.0003    0.0091  0.915      0.1003     0.0003    0.0113  0.890      0.8053
4     0.1000     0.0000    0.0093  0.935      0.1000     0.0000    0.0113  0.925      0.8230
5     0.0999    -0.0001    0.0092  0.940      0.1000     0.0000    0.0115  0.945      0.8000
6     0.1004     0.0004    0.0092  0.920      0.1006     0.0006    0.0112  0.920      0.8214
7     0.1001     0.0001    0.0093  0.945      0.1000     0.0000    0.0117  0.905      0.7949
8     0.1002     0.0002    0.0090  0.940      0.1002     0.0002    0.0111  0.925      0.8108
9     0.1000     0.0000    0.0091  0.915      0.0999    -0.0001    0.0113  0.925      0.8053
10    0.1002     0.0002    0.0110  0.925      0.1003     0.0003    0.0140  0.915      0.7857
Table 3: Averages of the estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.95(t)] = 0.05, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

          Structural Nonparametric Model          Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage   Estimate  Ave. Bias  √MSE    Coverage   Relative √MSE
1     0.0499    -0.0001    0.0061  0.915      0.0499    -0.0001    0.0081  0.915      0.7531
2     0.0504     0.0004    0.0063  0.935      0.0504     0.0004    0.0085  0.930      0.7412
3     0.0502     0.0002    0.0061  0.900      0.0504     0.0004    0.0081  0.885      0.7531
4     0.0500     0.0000    0.0061  0.940      0.0497    -0.0003    0.0084  0.920      0.7262
5     0.0500    -0.0000    0.0062  0.925      0.0498    -0.0002    0.0084  0.930      0.7381
6     0.0505     0.0005    0.0061  0.915      0.0503     0.0003    0.0081  0.885      0.7531
7     0.0501     0.0001    0.0062  0.945      0.0499    -0.0001    0.0082  0.905      0.7561
8     0.0502     0.0002    0.0060  0.960      0.0502     0.0002    0.0080  0.925      0.7500
9     0.0500     0.0000    0.0060  0.910      0.0501     0.0001    0.0083  0.915      0.7229
10    0.0502     0.0002    0.0073  0.935      0.0500     0.0000    0.0102  0.915      0.7157
4 Chapter Four
Asymptotic Results
We establish in this section the asymptotic bias, variance and mean squared error (MSE) of the smoothing-later local polynomial estimator F̃^(q)_{t,θ̃(t|x)}[y(t)|x] of (6). Because the smoothing-early estimator F̂_{t,θ̂(t|x)}[y(t)|x] of (5) is a function of the two-step local polynomial estimator θ̂(t|x), and the asymptotic properties of θ̂(t|x) have been established in Fan and Zhang (2000), the asymptotic properties of θ̂(t|x) are not presented here in order to avoid redundancy. The asymptotic bias, variance and MSE of F̂_{t,θ̂(t|x)}[y(t)|x] can be derived by applying the delta method to the asymptotic results of θ̂(t|x) (e.g., van der Vaart, 1998, Ch. 3). The asymptotic distributions of θ̂(t|x) and F̂_{t,θ̂(t|x)}[y(t)|x] have been derived under the Gaussian model assumption.
4.1 Asymptotic Properties of the Raw Estimators

Following Section 3.1, θ(t_j|x) at each time design point t_j ∈ t is estimated by the MLE θ̂(t_j|x), and the raw estimator of F_{t_j,θ(t_j|x)}[y(t_j)|x] is F_{t_j,θ̂(t_j|x)}[y(t_j)|x]. Suppose that the classical regularity conditions of the MLEs, i.e., the conditions of Theorem 5.41 of van der Vaart (1998), are satisfied. Then, for all t_j ∈ t, n_j^{1/2} [θ̂(t_j|x) − θ(t_j|x)] asymptotically has the N(0, I^{−1}[θ(t_j|x)]) distribution, where I[θ(t_j|x)] is the Fisher information matrix at θ(t_j|x). It follows that θ̂(t_j|x) is asymptotically unbiased for θ(t_j|x), i.e., E[θ̂(t_j|x)] = θ(t_j|x), and the asymptotic variance of θ̂(t_j|x) is n_j^{−1} I^{−1}[θ(t_j|x)].
At different time points t_j ≠ t_k, θ̂(t_j|x) and θ̂(t_k|x) are possibly correlated, and the
covariance Cov[θ̂(t_j|x), θ̂(t_k|x)] may depend on the design and the unknown correlation structure of the longitudinal sample. If Cov[θ̂(t_j|x), θ̂(t_k|x)] has the convergence rate r(n_j, n_k, n_jk), which depends on the numbers of subjects observed at t_j and t_k, the asymptotic expression of Cov[θ̂(t_j|x), θ̂(t_k|x)] can be written as

lim_{n→∞} r(n_j, n_k, n_jk) Cov[θ̂(t_j|x), θ̂(t_k|x)] = ρ_θ(t_j, t_k|x),   (14)

for some limiting function ρ_θ(t_j, t_k|x). Since the correlation structure of the longitudinal sample is unknown, the exact expression of ρ_θ(t_j, t_k|x) is unknown, while it is known that ρ_θ(t_j, t_k|x) is bounded for all (t_j, t_k) and may depend on the model F_θ(t|x), the expression of θ(t|x) at t = t_j and t_k, and the distance t_j − t_k = d_jk.
By the delta method and Theorem 5.41 of van der Vaart (1998), it follows from the asymptotic properties of θ̂(t_j|x) that, as n → ∞,

E{F_{t_j,θ̂(t_j|x)}[y(t_j)|x]} − F_{t_j,θ(t_j|x)}[y(t_j)|x] = o(n_j^{−1/2}),

n_j Var{F_{t_j,θ̂(t_j|x)}[y(t_j)|x]} → F′_{t_j,θ(t_j|x)}[y(t_j)|x]^T I^{−1}[θ(t_j|x)] F′_{t_j,θ(t_j|x)}[y(t_j)|x],

r(n_j, n_k, n_jk) Cov{F_{t_j,θ̂(t_j|x)}[y(t_j)|x], F_{t_k,θ̂(t_k|x)}[y(t_k)|x]} → ρ_F(t_j, t_k|x),   j ≠ k,   (15)
and n_j^{1/2} {F_{t_j,θ̂(t_j|x)}[y(t_j)|x] − F_{t_j,θ(t_j|x)}[y(t_j)|x]} asymptotically has the normal distribution with mean 0 and variance F′_{t_j,θ(t_j|x)}[y(t_j)|x]^T I^{−1}[θ(t_j|x)] F′_{t_j,θ(t_j|x)}[y(t_j)|x], where F′_{t_j,θ(t_j|x)}[y(t_j)|x] is the column vector of partial derivatives of F_{t_j,θ(t_j|x)}[y(t_j)|x] with respect to θ(t_j|x), and the bounded limiting covariance function ρ_F(t_j, t_k|x) depends on the unknown covariance ρ_θ(t_j, t_k|x). When θ(t_j) represents the parameters of a Gaussian model, we have θ(t_j) = (µ(t_j), σ²(t_j))^T. The asymptotic distributions of µ̂(t_j) and σ̂²(t_j) are, respectively,

√n(t_j) (µ̂(t_j) − µ(t_j)) ∼ N(0, σ²(t_j)),

√n(t_j) (σ̂²(t_j) − σ²(t_j)) ∼ √n(t_j) (S²(t_j) − σ²(t_j)) ∼ N(0, 2σ⁴(t_j)),
where S²(t_j) = Σ_i (Y_i(t_j) − Ȳ(t_j))² / (n(t_j) − 1). If n is large, σ̂²(t_j) = Σ_i (Y_i(t_j) − Ȳ(t_j))² / n(t_j) and S²(t_j) are equivalent.
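The asymptotic variance 2σ⁴(t_j) of the MLE σ̂²(t_j) can be checked by simulation. A minimal sketch (names ours), drawing repeated Gaussian subsamples and looking at the sampling distribution of √n(σ̂² − σ²):

```python
import numpy as np

rng = np.random.default_rng(2)

def mle_variance_draws(n=500, reps=2000, mu=0.0, sigma=1.0):
    """Draws of sqrt(n) (sigma_hat^2 - sigma^2), where sigma_hat^2 is the
    Gaussian MLE of the variance (divisor n); the limit is N(0, 2 sigma^4)."""
    y = rng.normal(mu, sigma, size=(reps, n))
    sigma2_hat = np.mean((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return np.sqrt(n) * (sigma2_hat - sigma ** 2)
```

With σ = 1 the empirical variance of these draws should be close to 2σ⁴ = 2, and their mean close to 0, illustrating the limit stated above.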
When F_{t_j,θ(t_j|x)}[y(t_j)|x] is a normal CDF, the asymptotic distribution of its estimator is as follows. After plugging the MLEs of µ(t_j) and σ²(t_j) into F_{t_j,θ(t_j)}[y(t_j)|x] and doing some algebraic manipulation, we have

F̂_{t_j,θ̂(t_j)}[y(t_j)|x] = ∫_{−∞}^{y(t_j)} [√(2π) √(s²(t_j))]^{−1} exp{ −(u − ȳ(t_j))² / (2 s²(t_j)) } du = h[ȳ(t_j), ȳ²(t_j)],

where ȳ(t_j) = Σ_i y_i(t_j)/n is the sample mean, ȳ²(t_j) = Σ_i y_i²(t_j)/n is the sample second moment, and s²(t_j) = ȳ²(t_j) − [ȳ(t_j)]² is the MLE of the variance. Let α_1(t_j) = E[y(t_j)] and α_2(t_j) = E[y²(t_j)]. By the multivariate delta method, we can show that

√n ( h[ȳ(t_j), ȳ²(t_j)] − h[α_1(t_j), α_2(t_j)] ) ∼ N(0, τ²),

where τ² = σ_11 (∂h/∂α_1)² + σ_22 (∂h/∂α_2)² + 2σ_12 (∂h/∂α_1)(∂h/∂α_2), with

σ_11 = Var[ȳ(t_j)] = σ²(t_j)/n,
σ_22 = Var[ȳ²(t_j)] = [4µ²(t_j)σ²(t_j) + 2σ⁴(t_j)]/n,
σ_12 = Cov[ȳ(t_j), ȳ²(t_j)] = {E[y³(t_j)] − µ(t_j)[µ²(t_j) + σ²(t_j)]}/n = 2µ(t_j)σ²(t_j)/n.

The first and second order derivatives of h with respect to the α's are a straightforward computation (Theorem 8.16, Lehmann and Casella, 1998).
4.2 Asymptotic Properties of the Smoothing Estimators

We assume the following asymptotic assumptions for the two-step local polynomial estimators F̃^(q)_{t,θ̃(t|x)}[y(t)|x] given in (6):
A1. If n → ∞, then h → 0, n^{1/2} h^{p−q+1} → ∞, Jh → ∞, and nJh^{2q+1} → ∞.

A2. The design time points {t_1, t_2, . . . , t_J} are independent and identically distributed with density function g(t). For all 1 ≤ j ≤ J, 1 ≤ j_1 ≤ J and 1 ≤ j_2 ≤ J with j_1 ≠ j_2, there are known constants 0 < c_j ≤ 1 and 0 < c_{j_1 j_2} ≤ 1 such that lim_{n→∞}(n_j/n) = c_j and lim_{n→∞}(n_{j_1 j_2}/n) = c_{j_1 j_2}.

A3. The conditional CDFs F_{t,θ(t|x)}[y(t)|x] are p + 1 times continuously differentiable with respect to t.

A4. The kernel function K(·) is a bounded symmetric probability density function with support within a bounded set [−a, a] for some a > 0.

A5. There is a δ, which may tend to 0 as n → ∞, such that the visit times of the subjects satisfy |t_ij − t_i,j−1| > δ for all 1 ≤ i ≤ n and j = 2, . . . , m_i. If δ < ah, the convergence rate r(n_j, n_k, n_jk) and the bandwidth h satisfy the relationship Σ_{j=1}^{J} Σ_{k: δ ≤ |t_k − t_j| ≤ ah} r(n_j, n_k, n_jk) = o(Jh) when n is sufficiently large.
Assumptions A1-A4 are similar to the asymptotic conditions used in the estimation of conditional distribution functions with longitudinal data, such as Wu, Tian and Yu (2010) and Wu and Tian (2013a, 2013b). Assumption A5 is specifically motivated by the designs of practical longitudinal studies, such as the NGHS, in which there is usually a prespecified gap δ between the visit times of the same subject. Although the assumption Σ_{j=1}^{J} Σ_{k: δ ≤ |t_k − t_j| ≤ ah} r(n_j, n_k, n_jk) = o(Jh) does not appear to be intuitive, Assumption A5 suggests that, when |t_j − t_k| is close to zero, the number of subjects having measurements at both t_j and t_k, i.e., n_jk, is small, which leads to small correlation between the raw estimates F_{t_j,θ̂(t_j|x)}[y(t_j)|x] and F_{t_k,θ̂(t_k|x)}[y(t_k)|x].
∣∣x].Let Kq,p+1(t) = eTq,p+1S
−1(1, t, . . . , tp
)TK(t) be the equivalent kernel of local poly-
nomial fit with S =(skl)k,l=0,1,...,p
and skl =∫K(u)uk+ldu, Bp+1(K) =
∫K(u)up+1du
44
and V (K) =∫K2(u)du. The next theorem summarizes the asymptotic expressions
of the bias, variance and mean squared errors of F(q)
t,θ(t|x)
[y(t)|x
].
Theorem 1. Suppose that Assumptions A1-A5 are satisfied with c = c_j and that the asymptotic mean, variance and covariance of the raw estimators F_{t_j,θ̂(t_j|x)}[y(t_j)|x] for j = 1, . . . , J are given by (15). When n is sufficiently large,

Bias{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = E{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} − F^(q)_{t,θ(t|x)}[y(t)|x]
    = [q! h^{p−q+1}/(p + 1)!] F^{(p+1)}_{t,θ(t|x)}[y(t)|x] B_{p+1}(K_{q,p+1}) [1 + o_p(1)],   (16)

Var{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = [(q!)²/(c n J h^{2q+1} g(t))] V(K_{q,p+1})
    × F′_{t,θ(t|x)}[y(t)|x]^T I^{−1}[θ(t|x)] F′_{t,θ(t|x)}[y(t)|x] [1 + o_p(1)],   (17)

and the asymptotic expression of the mean squared error (MSE),

MSE{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = Bias{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]}² + Var{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]},   (18)

is given by substituting the bias and variance terms of (18) with the right-hand sides of (16) and (17), respectively.
Proof. See Appendix A2.
Remark 4.1. Special cases of Theorem 1 can easily be derived from (16), (17) and (18). For the local linear estimator of F_{t,θ(t|x)}[y(t)|x], we have q = 0 and p = 1, so that the asymptotic MSE of F̃_{t,θ̃(t|x)}[y(t)|x] is

MSE{F̃_{t,θ̃(t|x)}[y(t)|x]} = {h⁴ B²(t|x) + h^{−1}(nJ)^{−1} V(t|x)} [1 + o_p(1)],   (19)
where B(t|x) = F″_{t,θ(t|x)}[y(t)|x] B₂(K_{0,2})/2 and

V(t|x) = [c g(t)]^{−1} V(K_{0,2}) F′_{t,θ(t|x)}[y(t)|x]^T I^{−1}[θ(t|x)] F′_{t,θ(t|x)}[y(t)|x].

Setting ∂MSE{F̃_{t,θ̃(t|x)}[y(t)|x]}/∂h to zero, the theoretically optimal bandwidth h_opt, which minimizes the dominating term on the right side of (19), is

h_opt = (nJ)^{−1/5} [V(t|x)]^{1/5} [4B²(t|x)]^{−1/5}.

Substituting h_opt into (19), the MSE of the local linear estimator F̃_{t,θ̃(t|x)}[y(t)|x] is

MSE{F̃_{t,θ̃(t|x)}[y(t)|x]; h_opt} = (nJ)^{−4/5} [V(t|x)]^{4/5} [B(t|x)]^{2/5} (2^{−8/5} + 2^{2/5}),   (20)

which suggests that the optimal rate at which the MSE of F̃_{t,θ̃(t|x)}[y(t)|x] converges to zero is (nJ)^{−4/5}.
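The bandwidth formula of Remark 4.1 is straightforward to evaluate; the sketch below (names ours) also verifies numerically that h_opt minimizes the dominating term h⁴B² + (nJh)^{−1}V of (19).

```python
def optimal_bandwidth(n, J, B, V):
    """h_opt = (nJ)^(-1/5) [V]^(1/5) [4 B^2]^(-1/5), the minimizer of the
    dominating MSE term h^4 B^2 + (nJ h)^(-1) V in (19)."""
    return (n * J) ** (-0.2) * V ** 0.2 * (4.0 * B ** 2) ** (-0.2)
```

In practice B(t|x) and V(t|x) involve unknown quantities, which is why the data-driven LSCV and LTCV bandwidth choices are used in the numerical work.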
Remark 4.2. By Assumption A5, the covariances of the raw estimators do not affect the asymptotic MSE of the smoothing estimator F̃_{t,θ̃(t|x)}[y(t)|x]. In many practical situations, the visit times of the same subject are set to be larger than a fixed value, so that δ > 0 is fixed. It can be seen from the proof of Theorem 1 (A.2 of the Appendix) that the contribution of the covariances of the raw estimators to the MSE of F̃_{t,θ̃(t|x)}[y(t)|x] is negligible because of the local smoothing nature of F̃_{t,θ̃(t|x)}[y(t)|x]. The assumption that δ > 0 is fixed is appropriate for the NGHS, since the actual visit times for each subject of the NGHS are at least 6 months apart.
5 Chapter Five
5.1 Time-Varying Models with Locally Transformed Variables
Our method is applied to the NGHS BP data to evaluate the conditional CDF of SBP for African-American girls, Caucasian girls and the entire cohort, and their trends over ages 9 to 19 years. We also apply local linear smoothing estimators to obtain the smoothing estimates of the conditional CDF over the entire set of time design points. When the variable of interest is locally transformed, only the smoothing-later local linear smoothing estimator can be used. In many instances, locally transformed variables show more stability, in the sense of normality, than a global transformation applied to the subsamples partitioned over the time design points. The NGHS is a multicenter population-based cohort study designed to evaluate the prevalence and incidence of cardiovascular risk factors in Caucasian and African-American girls during childhood and adolescence. The study included 1166 Caucasian and 1213 African-American girls for follow-up visits, with the number of visits ranging from 1 to 10 (median 9, mean 8.2, standard deviation 2). Among the important risk factors that have been studied by the NGHS investigators, childhood systolic blood pressure (SBP) is a prominent one.

Because the entry age starts at 9, the observed age in our analysis is limited to T = [9.1, 19] and rounded up to the first decimal point. This age round-up has the required clinical accuracy for age (Obarzanek et al., 2010), which leads to J = 100 distinct time design points {t_1 = 9.1, t_2 = 9.2, . . . , t_100 = 19}. According to these time design points, the entire NGHS data set has been partitioned into 100 subsamples. The
first subsample, corresponding to age 9.1, includes all girls in the age interval [9, 9.1), and so on for the rest of the subsamples; the last subsample, with age 19, includes all girls in the age interval [18.9, 19). In our preliminary analysis based on the goodness-of-fit tests of normality for the SBP distributions at the time design points {t_1, t_2, . . . , t_100}, we observed that the conditional distributions of the Box-Cox transformed SBP given age and race can be reasonably approximated by normal distributions, while the conditional distributions of the actual SBP given age and race are not approximately normal. Thus, for a given 1 ≤ j ≤ J = 100 and i ∈ S_j, we denote by Y_i(t_j) the Box-Cox transformed SBP observation of the ith girl at age t_j. The time-invariant categorical covariate X_i is race, defined by X_i = 0 if the ith girl is Caucasian and X_i = 1 if she is African-American. The random variables for the Box-Cox transformed SBP at age t and race are Y(t) and X, respectively. Given t and X = x, we consider the family of power normal (PN) distributions of SBP for this population of girls; that is, the conditional CDF of Y(t) is given by the normal distributions F_{t,θ(t|x)}[y(t)|x] of (2).
The Box-Cox transformation is applied to Z(t) to achieve normality. This transformation is known as the local Box-Cox transformation because λ(t) varies across time points. The parameter λ(t), which takes values in the closed interval [−2, 2], is estimated by the maximum likelihood method. When λ(t) = 0, a log transformation is used. The local Box-Cox transformation of Z(t) at time point t is given by

Y(t) = (Z(t)^{λ(t)} − 1) / λ(t)   if λ(t) ≠ 0,
Y(t) = log(Z(t))                  if λ(t) = 0.
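The local Box-Cox step can be sketched as follows. This is an illustrative numpy-only implementation and not the analysis code used in the dissertation: the function names are ours, and λ(t) is chosen by a grid search over [−2, 2] that maximizes the standard Box-Cox profile log-likelihood, one subsample at a time.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform with a given exponent lambda."""
    if abs(lam) < 1e-8:
        return np.log(z)               # log transform when lambda(t) = 0
    return (z ** lam - 1.0) / lam      # (Z(t)^lambda - 1)/lambda otherwise

def local_box_cox(z, grid=np.linspace(-2.0, 2.0, 401)):
    """Transform one subsample Z(t), choosing lambda(t) in [-2, 2] by
    maximizing the Box-Cox profile log-likelihood over a grid."""
    z = np.asarray(z, dtype=float)
    n, log_z = z.size, np.log(z)
    best_lam, best_llf = 0.0, -np.inf
    for lam in grid:
        y = box_cox(z, lam)
        # profile log-likelihood of the normal model after transformation
        llf = -0.5 * n * np.log(y.var()) + (lam - 1.0) * log_z.sum()
        if llf > best_llf:
            best_lam, best_llf = lam, llf
    return box_cox(z, best_lam), best_lam

# Example: a positive, right-skewed "SBP-like" subsample.
rng = np.random.default_rng(0)
z = rng.lognormal(mean=4.7, sigma=0.5, size=800)
y, lam = local_box_cox(z)   # for lognormal data, lambda(t) should be near 0
```

Because the Box-Cox family is monotone for each λ(t), probability statements about SBP percentiles carry over directly to the transformed scale.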
In a series of preliminary goodness-of-fit tests of normality for SBP (the Shapiro-Wilk test, the Kolmogorov-Smirnov test, the Anderson-Darling test and the Cramér-von Mises test, together with visual inspection of the QQ plots), we found that 96 of the 100 subsamples are consistent with a normal distribution by the Shapiro-Wilk test, 92 of 100 by the Anderson-Darling test, all 100 by the Kolmogorov-Smirnov test, and 92 of 100 by the Cramér-von Mises test. According to these goodness-of-fit tests, we can conclude that almost all subsamples are approximately normal. If we apply the same normality tests to SBP when each subsample is restricted to the individuals whose height exceeds the median, then only 2 of the 100 subsamples show nonnormality by the Shapiro-Wilk test, 4 of 100 by the Anderson-Darling test and 2 of 100 by the Cramér-von Mises test; the Kolmogorov-Smirnov results remain the same as before.
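This per-subsample screening amounts to applying a normality test to each of the 100 subsamples at the 5% level and counting the non-rejections. The dissertation uses the Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests; as a self-contained stand-in, the sketch below uses the Jarque-Bera statistic, whose asymptotic null distribution is chi-square with 2 degrees of freedom (survival function exp(−jb/2)).

```python
import numpy as np

def jarque_bera_pvalue(x):
    """Jarque-Bera normality test: jb = n/6 (S^2 + K^2/4) is asymptotically
    chi-square with 2 df, whose survival function is exp(-jb/2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    s2 = (d ** 2).mean()
    skew = (d ** 3).mean() / s2 ** 1.5
    kurt = (d ** 4).mean() / s2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)
    return float(np.exp(-jb / 2.0))

def count_normal_subsamples(subsamples, alpha=0.05):
    """How many subsamples are NOT rejected as normal at level alpha."""
    return sum(jarque_bera_pvalue(s) > alpha for s in subsamples)

# Synthetic check: 96 normal subsamples plus 4 heavily skewed ones.
rng = np.random.default_rng(1)
subsamples = [rng.normal(110.0, 10.0, 150) for _ in range(96)]
subsamples += [rng.exponential(10.0, 150) for _ in range(4)]
n_normal = count_normal_subsamples(subsamples)
```

On data like these, the skewed subsamples are rejected decisively while most normal subsamples pass, mirroring the pattern reported above.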
All goodness-of-fit tests were carried out at the 5% significance level. It is noteworthy that before splitting the NGHS longitudinal data, SBP is not normally distributed; a lognormality test, however, shows that SBP follows a lognormal distribution. The tabulated P-values, the estimated probabilities and the QQ plots for the Box-Cox transformed SBP are omitted to avoid redundancy. Applying the two-step local linear estimator of (7) to the observed data {Y_i(t_j), X_i, t_j; 1 ≤ j ≤ J, 1 ≤ i ≤ n}, we compute the smoothing estimator F̂_{t, θ̂(t|x)}[y_q(t) | x] of F_{t, θ(t|x)}[y_q(t) | x], where y_q(t) is the Box-Cox transform of the (100 × q)th percentile of SBP for girls with median height at age t (NHBPEP, 2004). By the monotonicity of the transformation, F̂_{t, θ̂(t|x)}[y_q(t) | x] also estimates the conditional probability of SBP at or below the (100 × q)th SBP percentile given in NHBPEP (2004) for girls with age t and race x. In addition to the smoothing estimators based on the time-varying normal distributions F_{t, θ(t|x)}[y_q(t) | x] in (2), we also compute the kernel estimator F̂_t[y_q(t) | x] of the unstructured conditional CDF F_t[y_q(t) | x] of Y(t) based on (8) with the w_i = 1/(n m_i) or 1/N weights. Smoothing estimators of these conditional probabilities are also computed for the entire cohort, ignoring the race covariate.
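To make the unstructured estimator concrete, a simplified Nadaraya-Watson kernel CDF estimator with the Epanechnikov kernel and the 1/N weighting can be sketched as below. This is an illustrative stand-in under our own simplifications, not the exact estimator (8) or the two-step local linear estimator (7) of the dissertation.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_cdf(t, y, obs_t, obs_y, h=2.5, w=None):
    """Kernel (Nadaraya-Watson) estimate of F_t[y] = P(Y(t) <= y) from
    pooled observations (obs_t, obs_y); w gives per-observation weights
    such as 1/(n m_i) or 1/N (the default)."""
    obs_t = np.asarray(obs_t, dtype=float)
    obs_y = np.asarray(obs_y, dtype=float)
    if w is None:
        w = np.full(obs_t.size, 1.0 / obs_t.size)  # the 1/N weighting
    k = w * epanechnikov((obs_t - t) / h)
    return float(np.sum(k * (obs_y <= y)) / np.sum(k))

# Sanity check: Y ~ N(0, 1) independent of t, so F_t[0] should be near 0.5,
# and the exceedance probability P(Y(t) > 0) = 1 - kernel_cdf(t, 0, ...).
rng = np.random.default_rng(3)
obs_t = rng.uniform(0.0, 10.0, 20000)
obs_y = rng.normal(0.0, 1.0, 20000)
est = kernel_cdf(5.0, 0.0, obs_t, obs_y)
```

The exceedance probabilities plotted in Figures 7-9 are of the form 1 minus such a CDF estimate, evaluated at the population percentile y_q(t).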
Figure 7 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] based on (7) (q = 0.90 in panel (a), q = 0.95 in panel (c) and q = 0.99 in panel (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] based on (8) with w_i = 1/N (q = 0.90 in panel (b), q = 0.95 in panel (d) and q = 0.99 in panel (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for Caucasian girls (x = 0). The Epanechnikov kernel and the bandwidth h = 2.5 were used for both the local linear smoothing estimators and the unstructured kernel estimators; h = 2.5 was chosen by examining the LSCV and LTCV scores and the smoothness of the fitted plots.
Figure 8 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] (panels (a), (c), (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] with w_i = 1/N (panels (b), (d), (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for African-American girls (x = 1) with q = 0.90, 0.95 and 0.99. As in Figure 7, the estimators of Figure 8 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
Figure 9 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] (panels (a), (c), (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] with w_i = 1/N (panels (b), (d), (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the entire cohort with q = 0.90, 0.95 and 0.99. As in Figures 7 and 8, the estimators of Figure 9 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
[Figure 7 here: six panels (a)-(f) plotting age (9.1-19) of CC girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 7: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for Caucasian girls (CC) between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
[Figure 8 here: six panels (a)-(f) plotting age (9.1-19) of AA girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 8: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for African American girls (AA) between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
[Figure 9 here: six panels (a)-(f) plotting age (9.1-19) of all girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 9: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for the entire cohort between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
5.2 Simulation Results
For the simulation design, we generate in each sample n = 1000 subjects with 10 visits per subject, following the data structure of NGHS and Section 1.2. The jth visit time t_ij of the ith subject is generated from the uniform distribution U(j − 1, j) for j = 1, . . . , 10. Given each t_ij, we generate the observations Y_ij for Y(t) from the following simulation design:

Y_ij = 210 + 28(t_ij − 5) − 2(t_ij − 5)^2 + a_0i + ε_ij,
a_0i ~ N(0, 3^2), ε_ij ~ N(0, 0.9^2),

where the ε_ij are independent for all (i, j), and a_0i and ε_ij are independent. For this design, E(Y_ij | t_ij) = 210 + 28(t_ij − 5) − 2(t_ij − 5)^2 and Var(Y_ij | t_ij) = 9.81. For each simulated sample {(Y_i(t_ij), t_ij) : i = 1, . . . , 1000; j = 1, . . . , 10}, we round the time points so that each t_ij belongs to one of the equally spaced time-design points {t_0, . . . , t_100} = {0, 0.1, 0.2, . . . , 10}. Let y_q(t) be the (100 × q)th percentile of Y(t), so that P[Y(t) > y_q(t)] = 1 − q. More specifically, let y_.90(t), y_.95(t) and y_.99(t) be the 90th, 95th and 99th percentiles of Y(t). Since Y(t) follows a normal distribution, P[Y(t) > y_.90(t)] = 0.10, P[Y(t) > y_.95(t)] = 0.05 and P[Y(t) > y_.99(t)] = 0.01. The theoretical 90th, 95th and 99th percentiles at 10 of the 101 time points for the above model are given in Table 4.
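The simulation design above can be generated in a few lines. The sketch below is illustrative (the function name and the rounding step are ours) and follows the stated design.

```python
import numpy as np

def simulate_sample(n=1000, m=10, rng=None):
    """One simulated sample from the design above:
    t_ij ~ U(j-1, j), Y_ij = 210 + 28(t_ij-5) - 2(t_ij-5)^2 + a_0i + eps_ij,
    a_0i ~ N(0, 3^2), eps_ij ~ N(0, 0.9^2)."""
    if rng is None:
        rng = np.random.default_rng()
    j = np.arange(m)                          # visits j = 1, ..., m
    t = rng.uniform(j, j + 1, size=(n, m))    # t_ij ~ U(j-1, j)
    a0 = rng.normal(0.0, 3.0, size=(n, 1))    # subject-level random intercept
    eps = rng.normal(0.0, 0.9, size=(n, m))   # measurement error
    y = 210 + 28 * (t - 5) - 2 * (t - 5) ** 2 + a0 + eps
    t = np.round(t, 1)                        # snap to the grid {0, 0.1, ..., 10}
    return t, y

t, y = simulate_sample(rng=np.random.default_rng(2))
```

Note that Var(Y_ij | t_ij) = 3^2 + 0.9^2 = 9.81, matching the value stated above.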
Table 4: Theoretical 90th, 95th and 99th quantiles at 10 of the 101 time points from the model of our simulation design.

Time (t)   1      2       3       4       5       6       7       8       9       10
y_.90(t)   70.01  112.01  150.01  184.01  214.01  240.01  262.01  280.01  294.01  304.01
y_.95(t)   71.15  113.15  151.15  185.15  215.15  241.15  263.15  281.15  295.15  305.15
y_.99(t)   73.29  115.29  153.29  187.29  217.29  243.29  265.29  283.29  297.29  307.29
We repeatedly generate 1000 simulation samples. Within each simulation sample, we compute the smoothing estimates of P[Y(t) > y_q(t)] for q = 0.90, 0.95, 0.99 by the local linear estimator (7) based on the time-varying Gaussian model (2) and by the unstructured kernel estimator (8), using the Epanechnikov kernel and the LTCV bandwidths. The bootstrap pointwise 95% confidence intervals for the smoothing estimators are constructed using empirical quantiles of B = 1000 bootstrap samples. We then compute, from both the structural nonparametric model and the unstructured nonparametric model, the smoothing estimates of

P_.90(t) = P[Y(t) > y_.90(t) | t],
P_.95(t) = P[Y(t) > y_.95(t) | t],
P_.99(t) = P[Y(t) > y_.99(t) | t].

Figure 10 and Tables 5, 6 and 7 present the results of our simulation study in pictorial and tabular form. From Figure 10, we see that the smoothing estimators from the unstructured nonparametric model give wider confidence bands than the estimators from the structural nonparametric model (the Gaussian model) when the smoothing estimates of the conditional probability for the top 10%, top 5% and top 1% are computed. Numerical results on the average estimates, average bias, average root-MSE and confidence interval coverage probability are given in Tables 5, 6 and 7 for the top 10%, top 5% and top 1%, respectively. The root-MSE of the SNM at each time point is smaller than the root-MSE of the UNM, so the relative root-MSE at each time point is smaller than 1, which means that the SNM outperforms the UNM. From Tables 5, 6 and 7, we also notice better results from the smoothing estimators of the SNM when extreme tail probabilities are estimated.
Let P̂_q(t) be a smoothing estimator of P[Y(t) > y_q(t)] = 1 − q, which could be either the local polynomial estimator of (7) or the unstructured kernel estimator of (8). We measure the accuracy of P̂_q(t) by the average bias Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)] / M, the empirical mean squared error MSE[P̂_q(t)] = Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]^2 / M, or the square root of MSE[P̂_q(t)] (root-MSE), where P̂_q^{(m)}(t) is the estimate from the mth simulated sample and M = 1000 is the total number of simulated samples. We assess the accuracy of a pointwise confidence interval for P_q(t) by the empirical probability of the confidence interval covering the true value P[Y(t) > y_q(t)] = 1 − q.
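These accuracy measures can be computed directly from the M replicated estimates; the helper below is an illustrative sketch (the function and field names are ours).

```python
import numpy as np

def summarize(estimates, truth, ci_lower, ci_upper):
    """Average estimate, average bias, root-MSE, and empirical coverage of
    the pointwise confidence intervals over M simulation replications."""
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(ci_lower, dtype=float)
    hi = np.asarray(ci_upper, dtype=float)
    bias = est - truth
    return {
        "estimate": est.mean(),
        "ave_bias": bias.mean(),
        "root_mse": float(np.sqrt(np.mean(bias ** 2))),
        "coverage": float(np.mean((lo <= truth) & (truth <= hi))),
    }

# Toy example with truth P[Y(t) > y_.90(t)] = 0.10 and M = 3 replications.
out = summarize([0.10, 0.12, 0.08], 0.10, [0.05] * 3, [0.15] * 3)
```

Applying this per time point and per estimator yields exactly the columns reported in Tables 5, 6 and 7.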
Table 5 shows the averages of the estimates, the averages of the biases, the root-MSEs, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the estimation of P[Y(t) > y_.90(t)] = 0.10 at t = 1.0, 2.0, . . . , 10.0. For all 10 time points, the smoothing-later local linear estimators based on the time-varying Gaussian model have smaller root-MSEs than the kernel estimators based on the unstructured nonparametric model. Comparing the empirical coverage probabilities of the bootstrap pointwise 95% confidence intervals, we observe that the smoothing estimators based on the time-varying Gaussian model have higher coverage probabilities than the unstructured kernel estimators at most of the time points.

When q increases to 0.95 and 0.99, Tables 6 and 7 show the corresponding averages of the estimates, averages of the biases, root-MSEs, and empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the estimation of P[Y(t) > y_.95(t)] = 0.05 and P[Y(t) > y_.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0. Again, the smoothing estimators based on the time-varying Gaussian model have smaller root-MSEs than the unstructured kernel estimators at all 10 time points.
From Figure 10, comparing (a) with (b), (c) with (d) and (e) with (f), we see that the smoothing estimators from the unstructured nonparametric models give wider confidence bands than the smoothing estimators from the structural nonparametric models. The simulation results of Tables 5, 6 and 7 also suggest that, when the time-varying parametric model is appropriate, the structural two-step smoothing estimators have smaller mean squared errors than the unstructured smoothing estimators for a practical longitudinal sample with moderate sample sizes. However, these results may not hold if the time-varying parametric model is not an appropriate approximation to the time-varying distribution functions of the longitudinal variable under consideration. Looking at the relative root-MSEs in Tables 5, 6 and 7, we see that as q increases, the relative root-MSE decreases, which means that smoothing estimation of extreme tail probabilities by the existing unstructured kernel method is increasingly inefficient and potentially misleading when the data at different time points follow a parametric family.
[Figure 10 here: six panels (a)-(f) plotting age (0-10) against P(Y > y_q(t)), q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 10: The black solid lines are the local linear (a, c, e) and Nadaraya-Watson (b, d, f) smoothing estimators with the Epanechnikov kernel and bandwidth 2.5. The dotted lines represent the 95% pointwise bootstrap confidence bands over the 1000 simulated samples.
Table 5: Averages of estimates, averages of the biases, the square root of the mean squared errors, the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.90(t)] = 0.10, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0986    -0.0014    0.0247  0.948     0.0984    -0.0016    0.0306  0.95      0.805670133
2     0.1002     0.0002    0.0243  0.947     0.0995    -0.0005    0.0319  0.952     0.761484532
3     0.0998    -0.0002    0.0244  0.948     0.0991    -0.0009    0.0306  0.957     0.797875854
4     0.1010     0.0010    0.0252  0.948     0.1013     0.0013    0.0324  0.951     0.777396196
5     0.1010     0.0010    0.0244  0.958     0.1002     0.0002    0.0307  0.943     0.794362515
6     0.0988    -0.0012    0.0238  0.948     0.0988    -0.0012    0.0306  0.947     0.778519871
7     0.1013     0.0013    0.0248  0.948     0.1014     0.0014    0.0317  0.942     0.78320848
8     0.0991    -0.0009    0.0239  0.948     0.0989    -0.0011    0.0302  0.952     0.790241931
9     0.1007     0.0007    0.0242  0.949     0.1008     0.0008    0.0313  0.954     0.771904612
10    0.0986    -0.0014    0.0340  0.95      0.0983    -0.0017    0.0430  0.955     0.790419587
Table 6: Averages of estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0497    -0.0003    0.0165  0.949     0.0486    -0.0014    0.0223  0.959     0.738030611
2     0.0506     0.0006    0.0162  0.945     0.0497    -0.0003    0.0231  0.96      0.701322291
3     0.0503     0.0003    0.0162  0.955     0.0500     0.0000    0.0211  0.963     0.768899154
4     0.0513     0.0013    0.0170  0.949     0.0503     0.0003    0.0222  0.951     0.7679902
5     0.0512     0.0012    0.0165  0.962     0.0502     0.0002    0.0230  0.958     0.719378841
6     0.0496    -0.0004    0.0158  0.959     0.0488    -0.0012    0.0221  0.96      0.711784655
7     0.0514     0.0014    0.0167  0.95      0.0515     0.0015    0.0233  0.961     0.71589335
8     0.0499    -0.0001    0.0160  0.952     0.0494    -0.0006    0.0214  0.949     0.746505216
9     0.0509     0.0009    0.0162  0.957     0.0505     0.0005    0.0223  0.959     0.725402593
10    0.0499    -0.0001    0.0226  0.955     0.0484    -0.0016    0.0312  0.965     0.72287276
Table 7: Averages of estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0104     0.0004    0.0056  0.948     0.0097    -0.0003    0.0102  0.951     0.54270417
2     0.0107     0.0007    0.0055  0.947     0.0102     0.0002    0.0100  0.948     0.546403262
3     0.0105     0.0005    0.0054  0.942     0.0100     0.0000    0.0100  0.947     0.543178659
4     0.0110     0.0010    0.0059  0.948     0.0104     0.0004    0.0103  0.951     0.572775648
5     0.0109     0.0009    0.0057  0.949     0.0102     0.0002    0.0101  0.947     0.567613808
6     0.0103     0.0003    0.0052  0.952     0.0099    -0.0001    0.0097  0.953     0.541232998
7     0.0109     0.0009    0.0057  0.94      0.0105     0.0005    0.0106  0.952     0.5394098
8     0.0104     0.0004    0.0054  0.947     0.0096    -0.0004    0.0097  0.947     0.55396326
9     0.0108     0.0008    0.0055  0.951     0.0104     0.0004    0.0104  0.948     0.529256297
10    0.0108     0.0008    0.0078  0.948     0.0097    -0.0003    0.0146  0.941     0.536028584
6 Chapter Six
6.1 Discussion and Future Research
We have proposed a time-varying structural nonparametric model to estimate conditional distribution functions in longitudinal studies. Such a method is usually appropriate for long-term follow-up studies, such as the NGHS, which have a large number of subjects and sufficient numbers of repeated measurements over time. This approach has practical advantages over the well-established conditional-mean-based models in longitudinal analysis when the scientific objective is better achieved by evaluating the conditional distribution functions. Our application to the NGHS SBP data demonstrates that estimating the conditional distribution function by the SNM provides a useful quantitative measure of the risk of developing hypertension over time for adolescents. Although our estimation of the conditional distribution function by the SNM does not include any time-varying covariates and is limited to local polynomial smoothing estimators and a specific set of asymptotic assumptions, it provides some useful insight into the accuracy of the statistical results under different repeated measurement scenarios.
There are a number of theoretical and methodological aspects that warrant further investigation. First, further theoretical and simulation studies are needed to investigate the properties of other smoothing methods, such as global smoothing methods through splines, wavelets and other basis approximations, and their corresponding asymptotic inference procedures. Second, flexible conditional-distribution-based statistical models incorporating both time-dependent and time-invariant covariates are still not well understood and need to be developed. Third, many longitudinal studies have multivariate outcome variables, so appropriate statistical models and estimation methods for multivariate conditional distribution functions deserve to be systematically investigated. In our future research, we are interested in using copula models to estimate bivariate and multivariate conditional distribution functions. Incorporating continuous and time-varying covariates in the estimation of conditional distribution functions is another direction for our future research.
7 Appendix 1: Preliminary Analysis
Table 8: P-values for the normality tests of the 100 data sets.
SW, AD, KS, CVM and ChiSq stand for the Shapiro-Wilk
test, Anderson-Darling test, Kolmogorov-Smirnov test,
Cramér-von Mises test and Chi-Square test, respec-
tively.
Data Sets SW AD KS CVM Chisq
1 0.752 0.836 0.929 0.796 0.488
2 0.24 0.169 0.956 0.217 0.27
3 0.062 0.295 0.959 0.3 0.229
4 0.197 0.47 0.945 0.573 0.4
5 0.385 0.192 0.959 0.194 0.094
6 0.022 0.035 0.956 0.046 0.338
7 0.811 0.389 0.917 0.251 0.006
8 0.685 0.343 0.963 0.363 0.008
9 0.117 0.091 0.94 0.096 0.04
10 0.284 0.29 0.958 0.29 0.001
11 0.339 0.401 0.958 0.406 0.419
12 0.567 0.333 0.951 0.408 0.017
13 0.318 0.269 0.96 0.197 0.043
14 0.105 0.013 0.949 0.011 0
15 0.205 0.585 0.96 0.658 0.008
16 0.87 0.765 0.947 0.707 0.028
17 0.059 0.102 0.951 0.195 0
18 0.548 0.236 0.948 0.224 0.006
19 0.288 0.117 0.956 0.099 0.004
20 0.369 0.198 0.955 0.181 0
21 0.359 0.197 0.914 0.18 0
22 0.577 0.38 0.952 0.363 0.106
23 0.308 0.325 0.937 0.35 0.075
24 0.324 0.268 0.954 0.224 0.003
25 0.477 0.488 0.944 0.456 0.021
26 0.473 0.096 0.961 0.057 0.006
27 0.122 0.082 0.93 0.058 0
28 0.298 0.095 0.93 0.073 0.049
29 0.318 0.098 0.939 0.061 0.001
30 0.55 0.476 0.96 0.493 0.129
31 0.083 0.023 0.955 0.017 0.012
32 0.423 0.248 0.948 0.216 0.448
33 0.742 0.485 0.942 0.408 0.012
34 0.373 0.141 0.961 0.092 0
35 0.043 0.019 0.887 0.013 0
36 0.074 0.009 0.955 0.009 0
37 0.137 0.035 0.947 0.027 0.013
38 0.525 0.448 0.963 0.405 0.039
39 0.552 0.465 0.946 0.503 0.398
40 0.136 0.142 0.943 0.166 0
41 0.056 0.051 0.959 0.066 0.003
42 0.102 0.056 0.916 0.04 0
43 0.412 0.186 0.961 0.141 0
44 0.494 0.237 0.965 0.28 0.011
45 0.078 0.027 0.941 0.025 0.005
46 0.138 0.241 0.958 0.262 0.265
47 0.058 0.094 0.961 0.094 0.144
48 0.392 0.349 0.927 0.298 0.316
49 0.203 0.128 0.951 0.139 0.089
50 0.651 0.58 0.945 0.614 0.079
51 0.13 0.21 0.96 0.282 0
52 0.351 0.515 0.93 0.556 0.288
53 0.438 0.466 0.963 0.49 0.183
54 0.121 0.125 0.924 0.099 0.713
55 0.58 0.264 0.954 0.187 0.398
56 0.028 0.009 0.927 0.012 0.02
57 0.271 0.139 0.963 0.132 0.003
58 0.331 0.115 0.939 0.079 0.222
59 0.271 0.074 0.924 0.038 0.005
60 0.041 0.099 0.957 0.156 0.118
61 0.277 0.135 0.954 0.092 0.219
62 0.15 0.309 0.962 0.468 0.153
63 0.793 0.551 0.943 0.404 0.629
64 0.07 0.033 0.962 0.061 0.001
65 0.45 0.298 0.903 0.243 0.285
66 0.78 0.77 0.957 0.768 0.877
67 0.691 0.591 0.942 0.646 0.677
68 0.185 0.244 0.947 0.335 0.038
69 0.088 0.036 0.951 0.044 0.001
70 0.687 0.567 0.955 0.637 0.111
71 0.482 0.383 0.962 0.476 0.497
72 0.543 0.54 0.957 0.634 0.969
73 0.85 0.772 0.962 0.653 0.421
74 0.149 0.082 0.914 0.08 0.099
75 0.262 0.379 0.937 0.277 0.613
76 0.001 0.022 0.915 0.044 0.323
77 0.001 0 0.966 0.001 0.061
78 0.592 0.549 0.962 0.548 0.326
79 0.339 0.507 0.905 0.419 0.811
80 0.253 0.117 0.961 0.112 0.254
81 0.038 0.285 0.904 0.338 0.733
82 0.804 0.77 0.957 0.705 0.241
83 0.082 0.339 0.95 0.336 0.562
84 0.26 0.563 0.962 0.803 0.341
85 0.294 0.438 0.936 0.451 0.413
86 0.155 0.115 0.904 0.093 0.38
87 0.693 0.676 0.922 0.686 0.115
88 0.143 0.094 0.962 0.142 0.397
89 0.897 0.798 0.966 0.723 0.901
90 0.012 0.015 0.937 0.03 0.071
91 0.777 0.494 0.962 0.422 0.391
92 0.058 0.156 0.89 0.135 0.328
93 0.403 0.273 0.946 0.278 0.124
94 0.341 0.361 0.957 0.443 0.303
95 0.129 0.198 0.934 0.231 0.02
96 0.245 0.166 0.936 0.203 0.185
97 0.334 0.136 0.953 0.141 0
98 0.004 0.006 0.925 0.008 0.005
99 0.807 0.517 0.966 0.42 0.386
100 0.166 0.111 0.96 0.097 0.374
Table 9: Estimated raw probabilities (ERP) of SBP for the
entire cohort that exceed different quantiles y_q(t)
(q = .90, .95, .99), by SNM and UNM.

Data  ERP ≥ y.90(t)  ERP ≥ y.90(t)  ERP ≥ y.95(t)  ERP ≥ y.95(t)  ERP ≥ y.99(t)  ERP ≥ y.99(t)
Sets  by SNM         by UNM         by SNM         by UNM         by SNM         by UNM
1     0.03405        0.00000        0.01175        0.00000        0.00134        0.00000
2     0.10142        0.10000        0.04422        0.06000        0.00771        0.00000
3     0.05089        0.06122        0.01627        0.00000        0.00146        0.00000
4     0.04017        0.06977        0.01329        0.02326        0.00135        0.00000
5     0.05909        0.06122        0.02017        0.06122        0.00209        0.00000
6     0.11787        0.09615        0.05464        0.07692        0.01084        0.00000
7     0.08758        0.08696        0.03778        0.07246        0.00656        0.01449
8     0.10021        0.10938        0.03930        0.04688        0.00526        0.01563
9     0.12884        0.16667        0.06201        0.06061        0.01332        0.01515
10    0.10468        0.14286        0.03614        0.07143        0.00344        0.00000
11    0.05833        0.07071        0.02094        0.03030        0.00245        0.01010
12    0.05474        0.06667        0.01815        0.04762        0.00176        0.00000
13    0.05924        0.07619        0.02006        0.04762        0.00203        0.00000
14    0.07963        0.14851        0.03110        0.05941        0.00428        0.00000
15    0.05873        0.10101        0.02065        0.03030        0.00229        0.00000
16    0.07206        0.09278        0.02577        0.05155        0.00289        0.02062
17    0.06971        0.07965        0.02681        0.05310        0.00362        0.01770
18    0.14326        0.12844        0.06655        0.10092        0.01283        0.02752
19    0.10262        0.14286        0.04089        0.04464        0.00567        0.00893
20    0.08101        0.11215        0.02976        0.03738        0.00350        0.00935
21    0.08207        0.11494        0.03284        0.05747        0.00477        0.01149
22    0.06958        0.12500        0.02805        0.02500        0.00425        0.01250
23    0.07176        0.08750        0.02740        0.05000        0.00361        0.00000
24    0.08902        0.13043        0.03397        0.03261        0.00432        0.02174
25    0.12812        0.15464        0.05416        0.06186        0.00836        0.01031
26    0.10420        0.11628        0.04345        0.05814        0.00673        0.00000
27    0.05847        0.11111        0.01811        0.05556        0.00146        0.02222
28    0.10516        0.14118        0.03956        0.05882        0.00469        0.01176
29    0.12949        0.18280        0.05565        0.06452        0.00894        0.01075
30    0.11839        0.14286        0.04802        0.07143        0.00677        0.02041
31    0.07528        0.05952        0.03127        0.02381        0.00503        0.01190
32    0.05028        0.09459        0.01568        0.05405        0.00132        0.00000
33    0.10093        0.15909        0.04488        0.07955        0.00819        0.01136
34    0.06224        0.13253        0.02382        0.03614        0.00322        0.00000
35    0.07395        0.15000        0.02825        0.05000        0.00371        0.00000
36    0.08491        0.11765        0.03580        0.03529        0.00587        0.01176
37    0.07233        0.07692        0.02768        0.03846        0.00366        0.02564
38    0.09192        0.12658        0.03642        0.06329        0.00506        0.01266
39    0.10101        0.14737        0.03702        0.02105        0.00413        0.00000
40    0.06737        0.13187        0.01925        0.06593        0.00122        0.00000
41    0.03936        0.05952        0.01657        0.03571        0.00103        0.00000
42    0.04897        0.08824        0.02178        0.04412        0.00161        0.00000
43    0.07484        0.12766        0.03618        0.06383        0.00341        0.00000
44    0.03052        0.03846        0.01171        0.02564        0.00054        0.00000
45    0.06858        0.10390        0.03545        0.06494        0.00437        0.02597
46    0.06777        0.08219        0.03423        0.02740        0.00387        0.01370
47    0.10617        0.11111        0.06058        0.08333        0.01009        0.04167
48    0.05858        0.11765        0.02981        0.05882        0.00355        0.01471
49    0.07748        0.13699        0.03983        0.09589        0.00471        0.00000
50    0.05643        0.08219        0.02740        0.04110        0.00276        0.01370
51    0.01757        0.03333        0.00449        0.00000        0.00027        0.00000
52    0.06000        0.10448        0.02303        0.04478        0.00315        0.00000
53    0.06007        0.07143        0.02446        0.03571        0.00385        0.01786
54    0.05693        0.08475        0.01997        0.05085        0.00220        0.03390
55    0.03706        0.06349        0.01327        0.04762        0.00163        0.01587
56    0.08368        0.14286        0.03314        0.10714        0.00465        0.00000
57    0.04978        0.06780        0.01852        0.03390        0.00240        0.00000
58    0.04289        0.03509        0.01338        0.03509        0.00115        0.00000
59    0.10727        0.15873        0.05042        0.09524        0.01043        0.03175
60    0.01241        0.00000        0.00234        0.00000        0.00007        0.00000
61    0.05101        0.06977        0.01993        0.02326        0.00289        0.00000
62    0.07042        0.10000        0.02745        0.04000        0.00380        0.02000
63    0.03203        0.04167        0.01027        0.02083        0.00098        0.00000
64    0.06375        0.08511        0.02650        0.06383        0.00435        0.00000
65    0.04035        0.06667        0.01217        0.04444        0.00097        0.00000
66    0.02275        0.05405        0.00673        0.02703        0.00056        0.00000
67    0.04450        0.10638        0.01641        0.04255        0.00211        0.00000
68    0.04017        0.06250        0.01412        0.04167        0.00164        0.00000
69    0.11770        0.10870        0.06222        0.06522        0.01688        0.04348
70    0.06691        0.12245        0.02549        0.04082        0.00336        0.00000
71    0.05297        0.03030        0.02092        0.03030        0.00310        0.00000
72    0.07859        0.10811        0.03379        0.08108        0.00586        0.00000
73    0.02304        0.05405        0.00609        0.00000        0.00038        0.00000
74    0.06671        0.06818        0.02876        0.04545        0.00512        0.02273
75    0.06807        0.10256        0.02714        0.02564        0.00399        0.02564
76    0.02997        0.02778        0.00839        0.02778        0.00058        0.00000
77    0.00471        0.04762        0.00081        0.00000        0.00002        0.00000
78    0.05011        0.06522        0.01787        0.02174        0.00209        0.00000
79    0.06095        0.06977        0.02337        0.04651        0.00317        0.02326
80    0.08914        0.12903        0.03923        0.12903        0.00708        0.03226
81    0.05443        0.02632        0.02344        0.00000        0.00425        0.00000
82    0.01344        0.02564        0.00306        0.00000        0.00014        0.00000
83    0.02170        0.06818        0.00654        0.00000        0.00056        0.00000
84    0.03888        0.00000        0.01393        0.00000        0.00170        0.00000
85    0.02388        0.02632        0.00749        0.02632        0.00070        0.00000
86    0.07203        0.11429        0.03178        0.02857        0.00592        0.00000
87    0.01216        0.05405        0.00345        0.02703        0.00028        0.00000
88    0.06629        0.09524        0.02637        0.04762        0.00387        0.02381
89    0.04784        0.05128        0.01970        0.00000        0.00326        0.00000
90    0.05751        0.04444        0.02246        0.04444        0.00321        0.02222
91    0.01716        0.04348        0.00501        0.00000        0.00041        0.00000
92    0.03057        0.02381        0.01187        0.02381        0.00178        0.02381
93    0.00281        0.00000        0.00041        0.00000        0.00001        0.00000
94    0.02735        0.07692        0.00842        0.00000        0.00074        0.00000
95    0.04911        0.02041        0.02108        0.00000        0.00383        0.00000
96    0.01757        0.03333        0.00427        0.00000        0.00023        0.00000
97    0.01466        0.01754        0.00457        0.01754        0.00044        0.00000
98    0.07442        0.13636        0.0352
50.
0909
10.
0077
30.
0227
3
990.
0091
80.
0000
00.
0020
00.
0000
00.
0000
90.
0000
0
100
0.03
238
0.05
263
0.01
139
0.05
263
0.00
135
0.00
000
77
Table 10: Estimated Raw Probabilities (ERP) of SBP for
Caucasian Girls that exceed different quantiles of y_q(t)
(q = .90, .95, .99) by SNM and UNM.
Data Sets   ERP of SBP exceeding y.90(t), y.95(t), y.99(t)
            by SNM   by UNM   by SNM   by UNM   by SNM   by UNM
1   0.05600  0.00000  0.02396  0.00000  0.00430  0.00000
2   0.09230  0.15000  0.04413  0.10000  0.00975  0.00000
3   0.02503  0.00000  0.00727  0.00000  0.00057  0.00000
4   0.00925  0.00000  0.00183  0.00000  0.00007  0.00000
5   0.03449  0.03704  0.00950  0.03704  0.00063  0.00000
6   0.08627  0.00000  0.04152  0.00000  0.00939  0.00000
7   0.13626  0.12903  0.07384  0.12903  0.02102  0.03226
8   0.09567  0.13333  0.04084  0.06667  0.00682  0.03333
9   0.09935  0.14286  0.04489  0.00000  0.00857  0.00000
10  0.11516  0.17241  0.04428  0.10345  0.00548  0.00000
11  0.11565  0.12821  0.05076  0.05128  0.00880  0.02564
12  0.08031  0.10000  0.03101  0.07500  0.00414  0.00000
13  0.04595  0.05882  0.01444  0.03922  0.00126  0.00000
14  0.03961  0.12500  0.01257  0.02083  0.00115  0.00000
15  0.05228  0.09091  0.01926  0.02273  0.00244  0.00000
16  0.13833  0.18519  0.06967  0.14815  0.01655  0.07407
17  0.06734  0.04545  0.02832  0.04545  0.00477  0.02273
18  0.06435  0.02778  0.02337  0.02778  0.00277  0.00000
19  0.12724  0.15556  0.05812  0.06667  0.01096  0.02222
20  0.06577  0.08163  0.02169  0.02041  0.00201  0.02041
21  0.06573  0.06250  0.02646  0.03125  0.00402  0.00000
22  0.05372  0.10345  0.01668  0.00000  0.00138  0.00000
23  0.03757  0.05714  0.01286  0.02857  0.00142  0.00000
24  0.05887  0.10000  0.01848  0.00000  0.00154  0.00000
25  0.09961  0.10526  0.03920  0.05263  0.00528  0.02632
26  0.13415  0.18750  0.06387  0.12500  0.01324  0.00000
27  0.00938  0.03030  0.00118  0.00000  0.00001  0.00000
28  0.03147  0.05714  0.00548  0.00000  0.00011  0.00000
29  0.07307  0.11111  0.02316  0.05556  0.00190  0.00000
30  0.16318  0.16129  0.08306  0.12903  0.01972  0.06452
31  0.07813  0.06250  0.03281  0.03125  0.00539  0.00000
32  0.03032  0.05714  0.00830  0.02857  0.00055  0.00000
33  0.07304  0.14286  0.03040  0.02381  0.00493  0.00000
34  0.02934  0.10811  0.00916  0.02703  0.00083  0.00000
35  0.03207  0.10000  0.00973  0.03333  0.00082  0.00000
36  0.06544  0.06250  0.02553  0.03125  0.00359  0.00000
37  0.02443  0.05000  0.00681  0.00000  0.00048  0.00000
38  0.06534  0.09677  0.02288  0.03226  0.00246  0.00000
39  0.08195  0.12821  0.01804  0.00000  0.00053  0.00000
40  0.07964  0.14286  0.02434  0.08571  0.00178  0.00000
41  0.01262  0.02500  0.00389  0.00000  0.00009  0.00000
42  0.05932  0.10345  0.02892  0.06897  0.00293  0.00000
43  0.04577  0.09091  0.01882  0.02273  0.00104  0.00000
44  0.01991  0.04545  0.00632  0.04545  0.00015  0.00000
45  0.01724  0.05128  0.00639  0.00000  0.00028  0.00000
46  0.04338  0.03846  0.02002  0.00000  0.00173  0.00000
47  0.13027  0.12821  0.08413  0.12821  0.02154  0.05128
48  0.08197  0.14286  0.04558  0.08571  0.00717  0.02857
49  0.07341  0.11111  0.03757  0.11111  0.00441  0.00000
50  0.02023  0.06667  0.00683  0.00000  0.00021  0.00000
51  0.02666  0.06250  0.00694  0.00000  0.00041  0.00000
52  0.03870  0.08824  0.01389  0.00000  0.00170  0.00000
53  0.04269  0.05882  0.01414  0.00000  0.00141  0.00000
54  0.04123  0.08333  0.01302  0.08333  0.00116  0.04167
55  0.03127  0.05405  0.01114  0.05405  0.00138  0.00000
56  0.01001  0.05263  0.00194  0.00000  0.00006  0.00000
57  0.02210  0.02941  0.00633  0.00000  0.00049  0.00000
58  0.03655  0.02941  0.01060  0.02941  0.00078  0.00000
59  0.10613  0.16129  0.05120  0.09677  0.01130  0.03226
60  0.01804  0.00000  0.00432  0.00000  0.00022  0.00000
61  0.02401  0.05882  0.00662  0.05882  0.00046  0.00000
62  0.06309  0.10345  0.02396  0.03448  0.00316  0.00000
63  0.02621  0.03846  0.00833  0.00000  0.00080  0.00000
64  0.02760  0.03226  0.00863  0.03226  0.00079  0.00000
65  0.01333  0.00000  0.00255  0.00000  0.00008  0.00000
66  0.00401  0.00000  0.00068  0.00000  0.00002  0.00000
67  0.03140  0.03571  0.01066  0.00000  0.00117  0.00000
68  0.02668  0.00000  0.00861  0.00000  0.00086  0.00000
69  0.06852  0.09524  0.02884  0.04762  0.00483  0.00000
70  0.04660  0.04545  0.01640  0.00000  0.00188  0.00000
71  0.06646  0.05263  0.02841  0.05263  0.00496  0.00000
72  0.05259  0.04762  0.02071  0.04762  0.00305  0.00000
73  0.02872  0.08696  0.00765  0.00000  0.00047  0.00000
74  0.06945  0.08000  0.02984  0.04000  0.00525  0.00000
75  0.03927  0.10000  0.01168  0.00000  0.00091  0.00000
76  0.03213  0.05000  0.01066  0.05000  0.00111  0.00000
77  0.00108  0.00000  0.00013  0.00000  0.00000  0.00000
78  0.03942  0.04167  0.01210  0.00000  0.00101  0.00000
79  0.02630  0.00000  0.00642  0.00000  0.00032  0.00000
80  0.03590  0.06667  0.01241  0.06667  0.00140  0.00000
81  0.03906  0.04348  0.01364  0.00000  0.00156  0.00000
82  0.01336  0.03448  0.00295  0.00000  0.00013  0.00000
83  0.02119  0.03571  0.00656  0.00000  0.00060  0.00000
84  0.02804  0.00000  0.00953  0.00000  0.00106  0.00000
85  0.03554  0.04545  0.01234  0.04545  0.00141  0.00000
86  0.04938  0.05882  0.01984  0.00000  0.00309  0.00000
87  0.00182  0.00000  0.00028  0.00000  0.00001  0.00000
88  0.05670  0.04762  0.02291  0.00000  0.00355  0.00000
89  0.02178  0.00000  0.00770  0.00000  0.00096  0.00000
90  0.03218  0.00000  0.01032  0.00000  0.00099  0.00000
91  0.02967  0.06897  0.01126  0.00000  0.00161  0.00000
92  0.01577  0.00000  0.00517  0.00000  0.00056  0.00000
93  0.00274  0.00000  0.00041  0.00000  0.00001  0.00000
94  0.02164  0.09091  0.00644  0.00000  0.00054  0.00000
95  0.03409  0.04167  0.01403  0.00000  0.00239  0.00000
96  0.04378  0.05556  0.01497  0.00000  0.00161  0.00000
97  0.01485  0.03125  0.00476  0.03125  0.00049  0.00000
98  0.01444  0.04762  0.00421  0.04762  0.00035  0.00000
99  0.00132  0.00000  0.00016  0.00000  0.00000  0.00000
100 0.03632  0.00000  0.01567  0.00000  0.00295  0.00000
Table 11: Estimated Raw Probabilities (ERP) of SBP for
African American Girls that exceed different quantiles of y_q(t)
(q = .90, .95, .99) by SNM and UNM.
Data Sets   ERP of SBP exceeding y.90(t), y.95(t), y.99(t)
            by SNM   by UNM   by SNM   by UNM   by SNM   by UNM
1   0.02416  0.00000  0.00720  0.00000  0.00061  0
2   0.10446  0.06667  0.04083  0.03333  0.00538  0
3   0.06906  0.11111  0.02029  0.00000  0.00139  0
4   0.09005  0.14286  0.03870  0.04762  0.00663  0
5   0.09751  0.09091  0.04070  0.09091  0.00641  0
6   0.13525  0.17241  0.05408  0.13793  0.00715  0
7   0.04459  0.05263  0.01333  0.02632  0.00104  0
8   0.10229  0.08824  0.03583  0.02941  0.00356  0
9   0.15434  0.18421  0.07791  0.10526  0.01833  0.02631579
10  0.09895  0.12195  0.03181  0.04878  0.00253  0
11  0.02840  0.03333  0.00804  0.01667  0.00058  0
12  0.04159  0.04615  0.01238  0.03077  0.00096  0
13  0.07430  0.09259  0.02698  0.05556  0.00313  0
14  0.12200  0.16981  0.05299  0.09434  0.00886  0
15  0.06306  0.10909  0.02089  0.03636  0.00198  0
16  0.04680  0.05714  0.01322  0.01429  0.00088  0
17  0.06925  0.10145  0.02440  0.05797  0.00266  0.01449275
18  0.18628  0.17808  0.09239  0.13699  0.02006  0.04109589
19  0.08640  0.13433  0.03082  0.02985  0.00332  0
20  0.09564  0.13793  0.03834  0.05172  0.00546  0
21  0.09243  0.14545  0.03655  0.07273  0.00505  0.01818182
22  0.07560  0.13725  0.03342  0.03922  0.00625  0.01960784
23  0.09919  0.11111  0.03765  0.06667  0.00463  0
24  0.11464  0.15385  0.04929  0.05769  0.00811  0.03846154
25  0.14897  0.18644  0.06586  0.06780  0.01108  0
26  0.08799  0.07407  0.03360  0.01852  0.00429  0
27  0.09637  0.15789  0.03759  0.08772  0.00498  0.03508772
28  0.15429  0.20000  0.07283  0.10000  0.01441  0.02
29  0.16512  0.22807  0.08037  0.07018  0.01699  0.01754386
30  0.09499  0.13433  0.03279  0.04478  0.00318  0
31  0.07553  0.05769  0.03161  0.01923  0.00518  0.01923077
32  0.07191  0.12821  0.02430  0.07692  0.00235  0
33  0.12982  0.17391  0.06043  0.13043  0.01188  0.02173913
34  0.09438  0.15217  0.03979  0.04348  0.00642  0
35  0.10402  0.18000  0.04292  0.06000  0.00646  0
36  0.09914  0.15094  0.04396  0.03774  0.00798  0.01886792
37  0.13431  0.10526  0.05802  0.07895  0.00937  0.05263158
38  0.11205  0.14583  0.04784  0.08333  0.00775  0.02083333
39  0.09799  0.16071  0.04049  0.03571  0.00618  0
40  0.06168  0.12500  0.01715  0.05357  0.00103  0
41  0.07327  0.09091  0.03620  0.06818  0.00372  0
42  0.04288  0.07692  0.01783  0.02564  0.00104  0
43  0.10368  0.16000  0.05574  0.10000  0.00743  0
44  0.04586  0.02941  0.02097  0.00000  0.00174  0
45  0.13767  0.15789  0.08218  0.13158  0.01556  0.05263158
46  0.08438  0.10638  0.04461  0.04255  0.00579  0.0212766
47  0.05392  0.09091  0.02069  0.03030  0.00084  0.03030303
48  0.03836  0.09091  0.01749  0.03030  0.00148  0
49  0.08219  0.15217  0.04286  0.08696  0.00530  0
50  0.08504  0.09302  0.04748  0.06977  0.00755  0.02325581
51  0.00818  0.00000  0.00182  0.00000  0.00009  0
52  0.08410  0.12121  0.03292  0.09091  0.00449  0
53  0.08526  0.09091  0.04262  0.09091  0.01048  0.04545455
54  0.07112  0.08571  0.02703  0.02857  0.00351  0.02857143
55  0.04715  0.07692  0.01700  0.03846  0.00206  0.03846154
56  0.13506  0.18919  0.06038  0.16216  0.01062  0
57  0.09990  0.12000  0.04590  0.08000  0.00907  0
58  0.05640  0.04348  0.02004  0.04348  0.00229  0
59  0.11162  0.15625  0.05195  0.09375  0.01043  0.03125
60  0.00757  0.00000  0.00103  0.00000  0.00002  0
61  0.07207  0.07692  0.03291  0.00000  0.00665  0
62  0.08591  0.09524  0.03576  0.04762  0.00566  0.04761905
63  0.04135  0.04545  0.01354  0.04545  0.00132  0
64  0.15352  0.18750  0.08445  0.12500  0.02443  0
65  0.06163  0.10345  0.02211  0.06897  0.00255  0
66  0.05509  0.10526  0.02120  0.05263  0.00294  0
67  0.07059  0.21053  0.02920  0.10526  0.00468  0
68  0.05242  0.10345  0.01963  0.06897  0.00257  0
69  0.15940  0.12000  0.09542  0.08000  0.03364  0.08
70  0.08810  0.18519  0.03580  0.07407  0.00532  0
71  0.04194  0.00000  0.01550  0.00000  0.00202  0
72  0.12197  0.18750  0.05764  0.12500  0.01185  0
73  0.01623  0.00000  0.00425  0.00000  0.00027  0
74  0.06806  0.05263  0.03049  0.05263  0.00593  0.05263158
75  0.10106  0.10526  0.04939  0.05263  0.01130  0.05263158
76  0.01923  0.00000  0.00325  0.00000  0.00007  0
77  0.01144  0.09524  0.00238  0.00000  0.00009  0
78  0.06436  0.09091  0.02665  0.04545  0.00432  0
79  0.07792  0.10000  0.03395  0.06667  0.00608  0.03333333
80  0.15231  0.18750  0.07536  0.18750  0.01678  0.0625
81  0.07490  0.00000  0.03930  0.00000  0.01092  0
82  0.01734  0.00000  0.00476  0.00000  0.00034  0
83  0.02543  0.12500  0.00767  0.00000  0.00065  0
84  0.06251  0.00000  0.02438  0.00000  0.00343  0
85  0.01344  0.00000  0.00368  0.00000  0.00027  0
86  0.10098  0.16667  0.04891  0.05556  0.01095  0
87  0.07419  0.18182  0.03738  0.09091  0.00947  0
88  0.07895  0.14286  0.03113  0.09524  0.00436  0.04761905
89  0.08225  0.10526  0.03631  0.00000  0.00667  0
90  0.09859  0.10000  0.04625  0.10000  0.00961  0.05
91  0.00129  0.00000  0.00010  0.00000  0.00000  0
92  0.07069  0.06667  0.03385  0.06667  0.00766  0.06666667
93  0.00368  0.00000  0.00056  0.00000  0.00001  0
94  0.03887  0.05882  0.01288  0.00000  0.00130  0
95  0.06662  0.00000  0.02940  0.00000  0.00552  0
96  0.00076  0.00000  0.00005  0.00000  0.00000  0
97  0.01597  0.00000  0.00500  0.00000  0.00049  0
98  0.14795  0.21739  0.08393  0.13043  0.02623  0.04347826
99  0.02447  0.00000  0.00703  0.00000  0.00053  0
100 0.02870  0.08333  0.00839  0.08333  0.00065  0
Table 12: Local linear smoothing estimates for µ(t) and
σ(t) for 100 data sets of the entire cohort.
Data Sets Smoothed Mean Smoothed SD
1 4.607932 0.08324157
2 4.61047 0.08305636
3 4.612976 0.08287314
4 4.615447 0.08269112
5 4.617884 0.08251192
6 4.620288 0.08233448
7 4.622656 0.08215882
8 4.624989 0.08198541
9 4.627286 0.08181361
10 4.629547 0.08164475
11 4.631771 0.08147822
12 4.633957 0.0813142
13 4.636104 0.08115307
14 4.638213 0.08099445
15 4.640281 0.08083881
16 4.642309 0.08068629
17 4.644294 0.08053695
18 4.646237 0.08039086
19 4.648137 0.08024827
20 4.649992 0.08010934
21 4.651802 0.07997427
22 4.653567 0.07984321
23 4.655285 0.07971635
24 4.656956 0.07959387
25 4.65858 0.07947592
26 4.660157 0.07936274
27 4.661685 0.07925439
28 4.663166 0.07915105
29 4.664599 0.0790529
30 4.665984 0.07895995
31 4.667322 0.07887228
32 4.668613 0.07879008
33 4.669857 0.07871307
34 4.671056 0.07864152
35 4.672209 0.07857545
36 4.673319 0.07851449
37 4.674386 0.07845881
38 4.675411 0.07840828
39 4.676396 0.07836244
40 4.677342 0.07832134
41 4.678252 0.07828468
42 4.679125 0.07825221
43 4.679965 0.07822363
44 4.680772 0.07819863
45 4.681549 0.07817689
46 4.682298 0.07815806
47 4.683019 0.07814178
48 4.683715 0.07812769
49 4.684388 0.07811543
50 4.685038 0.07810463
51 4.685669 0.07809494
52 4.68628 0.07808602
53 4.686874 0.07807754
54 4.687451 0.07806919
55 4.688013 0.07806069
56 4.688561 0.07805178
57 4.689094 0.07804222
58 4.689615 0.07803181
59 4.690124 0.07802037
60 4.69062 0.07800776
61 4.691104 0.07799387
62 4.691577 0.07797857
63 4.692037 0.07796183
64 4.692486 0.07794371
65 4.692922 0.07792409
66 4.693345 0.07790303
67 4.693755 0.07788049
68 4.694152 0.07785662
69 4.694534 0.07783156
70 4.694902 0.07780524
71 4.695255 0.07777797
72 4.695592 0.07774966
73 4.695913 0.07772052
74 4.696218 0.07769065
75 4.696505 0.07766014
76 4.696774 0.07762916
77 4.697026 0.07759781
78 4.697259 0.07756618
79 4.697474 0.07753438
80 4.69767 0.0775025
81 4.697848 0.07747058
82 4.698008 0.07743869
83 4.698149 0.07740685
84 4.698271 0.07737503
85 4.698377 0.0773432
86 4.698465 0.07731135
87 4.698536 0.07727941
88 4.698592 0.0772471
89 4.698632 0.07721448
90 4.698658 0.07718144
91 4.69867 0.07714765
92 4.69867 0.07711316
93 4.698658 0.07707707
94 4.698636 0.07704011
95 4.698603 0.07700127
96 4.698563 0.07696029
97 4.698515 0.07691734
98 4.698462 0.0768714
99 4.698404 0.07682238
100 4.698343 0.07676942
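The smoothed means and standard deviations in Table 12 come from local linear smoothing. As a minimal illustrative sketch (not the dissertation's actual code; the toy trend, bandwidth, and function names below are all assumptions), a local linear estimate at a point t0 can be obtained by weighted least squares with a kernel weight centered at t0, the intercept of the local fit being the estimate:

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    sw = np.sqrt(w)                                  # sqrt-weights for WLS via lstsq
    X = np.column_stack([np.ones_like(x), x - x0])   # local linear design centered at x0
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                                   # intercept = fitted value at x0

# toy illustration on a known smooth trend (purely hypothetical data)
ages = np.linspace(9, 19, 200)
sbp = 4.6 + 0.01 * (ages - 9) + 0.001 * np.sin(ages)
est = local_linear(ages, sbp, x0=14.0, h=1.0)
print(round(est, 3))
```

Because the local fit is linear, the estimator reproduces linear trends exactly; only the small sinusoidal component contributes smoothing bias here.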
Table 13: Girls with median height and age-specific
log-scaled SBP percentile values.
Age 90th Percentile 95th Percentile 99th Percentile
9.1 4.733495 4.766604 4.825591
9.2 4.735181 4.768234 4.827128
9.3 4.736867 4.769866 4.828667
9.4 4.738555 4.771499 4.830206
9.5 4.740243 4.773133 4.831747
9.6 4.741932 4.774767 4.833288
9.7 4.74362 4.776401 4.834829
9.8 4.745308 4.778034 4.836369
9.9 4.746994 4.779666 4.837909
10 4.748679 4.781297 4.839448
10.1 4.750362 4.782926 4.840985
10.2 4.752043 4.784553 4.84252
10.3 4.753721 4.786177 4.844053
10.4 4.755396 4.787799 4.845584
10.5 4.757067 4.789417 4.847111
10.6 4.758735 4.791032 4.848635
10.7 4.760398 4.792642 4.850156
10.8 4.762057 4.794248 4.851672
10.9 4.76371 4.795849 4.853184
11 4.765358 4.797445 4.854691
11.1 4.767001 4.799036 4.856193
11.2 4.768637 4.80062 4.857689
11.3 4.770266 4.802199 4.85918
11.4 4.771889 4.80377 4.860665
11.5 4.773504 4.805335 4.862143
11.6 4.775112 4.806892 4.863614
11.7 4.776711 4.808441 4.865078
11.8 4.778302 4.809983 4.866535
11.9 4.779884 4.811516 4.867984
12 4.781458 4.81304 4.869424
12.1 4.783021 4.814555 4.870856
12.2 4.784575 4.816061 4.87228
12.3 4.786119 4.817557 4.873694
12.4 4.787652 4.819042 4.875098
12.5 4.789174 4.820517 4.876493
12.6 4.790685 4.821982 4.877878
12.7 4.792184 4.823435 4.879252
12.8 4.793672 4.824877 4.880616
12.9 4.795147 4.826306 4.881968
13 4.796609 4.827724 4.883309
13.1 4.798059 4.829129 4.884638
13.2 4.799495 4.830522 4.885955
13.3 4.800918 4.831901 4.88726
13.4 4.802326 4.833266 4.888552
13.5 4.803721 4.834618 4.889832
13.6 4.8051 4.835956 4.891098
13.7 4.806465 4.837279 4.89235
13.8 4.807815 4.838588 4.893588
13.9 4.809148 4.839881 4.894813
14 4.810466 4.841159 4.896023
14.1 4.811768 4.842422 4.897218
14.2 4.813053 4.843668 4.898397
14.3 4.814321 4.844898 4.899562
14.4 4.815572 4.846111 4.900711
14.5 4.816806 4.847308 4.901844
14.6 4.818021 4.848487 4.90296
14.7 4.819219 4.849648 4.90406
14.8 4.820398 4.850792 4.905143
14.9 4.821558 4.851917 4.906209
15 4.822698 4.853024 4.907257
15.1 4.82382 4.854112 4.908288
15.2 4.824922 4.855181 4.9093
15.3 4.826003 4.85623 4.910294
15.4 4.827064 4.85726 4.91127
15.5 4.828105 4.858269 4.912226
15.6 4.829124 4.859259 4.913164
15.7 4.830122 4.860227 4.914081
15.8 4.831099 4.861174 4.914979
15.9 4.832053 4.862101 4.915857
16 4.832985 4.863005 4.916714
16.1 4.833895 4.863888 4.917551
16.2 4.834782 4.864748 4.918366
16.3 4.835645 4.865586 4.91916
16.4 4.836485 4.866401 4.919933
16.5 4.837301 4.867193 4.920683
16.6 4.838093 4.867962 4.921412
16.7 4.83886 4.868706 4.922118
16.8 4.839602 4.869427 4.922801
16.9 4.84032 4.870123 4.923461
17 4.841011 4.870795 4.924098
17.1 4.841678 4.871441 4.924711
17.2 4.842318 4.872063 4.9253
17.3 4.842931 4.872658 4.925864
17.4 4.843518 4.873228 4.926404
17.5 4.844077 4.873771 4.92692
17.6 4.84461 4.874288 4.92741
17.7 4.845114 4.874777 4.927874
17.8 4.845591 4.87524 4.928313
17.9 4.846039 4.875675 4.928725
18 4.846458 4.876082 4.929111
18.1 4.846848 4.876461 4.929471
18.2 4.847209 4.876811 4.929803
18.3 4.84754 4.877133 4.930108
18.4 4.847841 4.877425 4.930385
18.5 4.848112 4.877688 4.930634
18.6 4.848352 4.87792 4.930855
18.7 4.84856 4.878123 4.931047
18.8 4.848738 4.878295 4.93121
18.9 4.848883 4.878436 4.931344
19 4.848996 4.878546 4.931448
Table 14: Smoothing probabilities by the local linear smooth-
ing estimator and the Nadaraya-Watson kernel smoothing
estimator for the entire cohort.
Age 1-p90 by SNM 1-p95 by SNM 1-np90 by UNM 1-np95 by UNM
9.1 0.05803788 0.022803294 0.07195069 0.03444722
9.2 0.05811078 0.022827506 0.07224472 0.03451883
9.3 0.05817845 0.022849613 0.07253242 0.03458641
9.4 0.05824079 0.022869543 0.07281305 0.03465001
9.5 0.05829699 0.02288699 0.07308645 0.03470968
9.6 0.05834656 0.022901787 0.073352 0.03476539
9.7 0.05838847 0.022913498 0.07360678 0.0348157
9.8 0.05842182 0.022921738 0.0738512 0.034861
9.9 0.05844624 0.022926387 0.07408485 0.03490098
10 0.05846104 0.022927176 0.07430589 0.03493536
10.1 0.05846525 0.022923712 0.07451408 0.03496384
10.2 0.05845849 0.022915888 0.07470823 0.03498642
10.3 0.05843926 0.022903063 0.07488735 0.03500228
10.4 0.05840743 0.02288521 0.07505088 0.03501158
10.5 0.0583618 0.022861864 0.07519683 0.03501363
10.6 0.05830167 0.022832749 0.07532518 0.03500853
10.7 0.05822601 0.022797462 0.07543442 0.03499562
10.8 0.05813429 0.02275582 0.07552378 0.03497503
10.9 0.05802547 0.022707414 0.07559211 0.0349461
11 0.0578991 0.022652075 0.07563932 0.03490875
11.1 0.05775452 0.022589587 0.07566352 0.03486276
11.2 0.05759073 0.022519531 0.07566411 0.03480789
11.3 0.0574073 0.022441781 0.07564059 0.0347441
11.4 0.05720408 0.022356339 0.07559289 0.0346713
11.5 0.05697988 0.02226268 0.07551881 0.0345888
11.6 0.05673479 0.022160899 0.07541893 0.03449675
11.7 0.05646867 0.022050977 0.07529256 0.03439532
11.8 0.05618111 0.021932769 0.07513914 0.03428418
11.9 0.055872 0.021806263 0.07495833 0.03416328
12 0.05554136 0.021671495 0.07474988 0.03403258
12.1 0.0551893 0.02152854 0.07451364 0.03389208
12.2 0.05481605 0.021377518 0.07424957 0.03374178
12.3 0.05442196 0.021218597 0.0739577 0.03358171
12.4 0.05400749 0.021051985 0.07363819 0.03341195
12.5 0.05357321 0.020877935 0.07329128 0.03323256
12.6 0.0531198 0.020696741 0.07291732 0.03304365
12.7 0.05264806 0.02050874 0.07251677 0.03284533
12.8 0.05215888 0.020314301 0.07209016 0.03263773
12.9 0.05165323 0.020113831 0.07163812 0.03242101
13 0.0511322 0.019907766 0.07116138 0.03219532
13.1 0.05059693 0.01969657 0.07066074 0.03196083
13.2 0.05004863 0.019480728 0.07013708 0.03171773
13.3 0.04948859 0.019260743 0.06959136 0.03146619
13.4 0.04891811 0.019037134 0.0690246 0.03120641
13.5 0.04833856 0.018810425 0.06843787 0.03093859
13.6 0.04775131 0.018581147 0.06783231 0.03066294
13.7 0.04715775 0.018349829 0.06720909 0.03037967
13.8 0.04655925 0.018116996 0.06656944 0.03008899
13.9 0.0459572 0.017883164 0.06591459 0.02979113
14 0.04535295 0.017648834 0.06524583 0.0294863
14.1 0.0447478 0.017414491 0.06456446 0.02917475
14.2 0.04414304 0.017180602 0.06387178 0.02885671
14.3 0.04353989 0.016947609 0.06316913 0.02853245
14.4 0.04293952 0.01671593 0.06245782 0.02820223
14.5 0.04234302 0.016485956 0.06173919 0.02786633
14.6 0.04175143 0.01625805 0.06101456 0.02752503
14.7 0.04116572 0.016032546 0.06028523 0.02717865
14.8 0.04058677 0.015809747 0.05955252 0.02682753
14.9 0.04001538 0.015589929 0.05881769 0.026472
15 0.0394523 0.015373336 0.05808202 0.02611243
15.1 0.03889817 0.015160184 0.05734674 0.02574921
15.2 0.03835357 0.014950662 0.05661306 0.02538274
15.3 0.037819 0.014744931 0.05588216 0.02501347
15.4 0.03729489 0.014543128 0.05515519 0.02464182
15.5 0.03678161 0.014345365 0.05443325 0.02426827
15.6 0.03627945 0.014151734 0.05371742 0.0238933
15.7 0.03578865 0.013962305 0.05300872 0.0235174
15.8 0.03530938 0.01377713 0.05230815 0.02314108
15.9 0.03484177 0.013596244 0.05161663 0.02276487
16 0.0343859 0.013419668 0.05093506 0.02238928
16.1 0.0339418 0.01324741 0.05026429 0.02201485
16.2 0.03350946 0.013079465 0.04960508 0.0216421
16.3 0.03308885 0.012915818 0.04895819 0.02127156
16.4 0.0326799 0.012756447 0.04832429 0.02090374
16.5 0.03228249 0.012601319 0.04770401 0.02053916
16.6 0.03189654 0.012450415 0.04709821 0.02017844
16.7 0.03152149 0.012303487 0.04650637 0.01982162
16.8 0.03115794 0.012160876 0.04593015 0.01946977
16.9 0.03080541 0.012022344 0.04536957 0.01912283
17 0.03046354 0.011887786 0.04482458 0.01878114
17.1 0.03013192 0.011756999 0.04429566 0.01844542
17.2 0.02981092 0.011630239 0.0437832 0.01811609
17.3 0.02950006 0.011507334 0.04328741 0.01779333
17.4 0.02919899 0.011388082 0.04280823 0.01747768
17.5 0.02890791 0.011272732 0.04234566 0.01716916
17.6 0.02862638 0.011161014 0.04190018 0.01686785
17.7 0.02835416 0.011052894 0.04147152 0.01657398
17.8 0.02809098 0.01094827 0.04105959 0.01628787
17.9 0.0278365 0.010847015 0.04066411 0.01600972
18 0.0275908 0.010749217 0.0402854 0.0157398
18.1 0.02735343 0.010654692 0.03992308 0.01547763
18.2 0.02712425 0.010563398 0.03957669 0.01522356
18.3 0.02690254 0.010475015 0.03924569 0.01497729
18.4 0.02668897 0.010389954 0.03893077 0.01473981
18.5 0.02648311 0.010307996 0.03863127 0.01451039
18.6 0.02628428 0.01022884 0.03834664 0.01428872
18.7 0.02609273 0.010152629 0.03807706 0.01407526
18.8 0.02590809 0.010079245 0.03782172 0.01386954
18.9 0.02573003 0.010008548 0.03758043 0.01367179
19 0.02555815 0.009940343 0.03735263 0.01348134
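The unstructured (UNM) exceedance probabilities in Table 14 are kernel-smoothed conditional probabilities. As a hedged sketch of the Nadaraya-Watson idea (not the author's implementation; the simulated data, threshold, and bandwidth below are assumptions for illustration only), P(SBP > threshold | age = t0) can be estimated by kernel-weighting the exceedance indicators:

```python
import numpy as np

def nw_exceed_prob(ages, sbp, threshold, t0, h):
    """Nadaraya-Watson estimate of P(SBP > threshold | age = t0)."""
    ind = (sbp > threshold).astype(float)            # exceedance indicators
    w = np.exp(-0.5 * ((ages - t0) / h) ** 2)        # Gaussian kernel weights
    return np.sum(w * ind) / np.sum(w)               # weighted average of indicators

# hypothetical data roughly on the log-SBP scale of Table 12
rng = np.random.default_rng(0)
ages = rng.uniform(9, 19, 5000)
sbp = rng.normal(4.65, 0.08, 5000)
# threshold at the normal 90th percentile, so the true exceedance probability is 0.10
p = nw_exceed_prob(ages, sbp, threshold=4.65 + 1.2816 * 0.08, t0=14.0, h=1.0)
print(round(p, 3))
```

Since the indicators are constant in age here, the estimate should land near the nominal 0.10 tail probability.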
Table 15: Some values of bandwidth for the entire cohort, Caucasian cohort and African American cohort obtained by the AIC cross-validation method. Cross-validation scores are given in parentheses.
Prob.   Entire Cohort         Caucasian Cohort      African American Cohort
1-p90   1.488871 (-6.26955)   20353050 (-5.97297)   2.48544 (-5.43709)
1-p95   2.450879 (-7.58673)   63760225 (-7.18375)   2.877118 (-6.61335)
1-p99   28514439 (-10.5807)   5637720 (-9.94674)    17.86149 (-9.18483)
1-np90  1.148214 (-5.60183)   1.443465 (-5.23541)   1.767473 (-4.62549)
1-np95  2.451945 (-6.26849)   15246194 (-5.63706)   3.393213 (-5.27116)
1-np99  3.497903 (-7.95842)   11684023 (-7.60147)   4.861644 (-6.83946)
Table 16: Some values of bandwidth for the entire cohort, Caucasian cohort and African American cohort obtained by the LS cross-validation method. Cross-validation scores are given in parentheses.
Prob.   Entire Cohort          Caucasian Cohort      African American Cohort
1-p90   1.487693 (0.000683)    13116003 (0.000916)   2.59700 (0.001572)
1-p95   2.452068 (0.000182)    57713608 (0.000272)   2.92834 (0.000484)
1-p99   22484681 (0.0000091)   4244505 (0.000017)    22.9261 (0.000036)
1-np90  1.213203 (0.00134)     1.507063 (0.001940)   1.85182 (0.003550)
1-np95  2.924783 (0.000687)    50740390 (0.001281)   3.53802 (0.001855)
1-np99  2.705329 (0.000124)    1.122294 (0.000181)   3.87748 (0.000383)
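Tables 15 and 16 report bandwidths chosen by cross-validation. As a minimal sketch of least-squares (leave-one-out) cross-validation for a kernel smoother (not the dissertation's code; the simulated data and candidate grid are illustrative assumptions), the selected bandwidth minimizes the leave-one-out prediction error over a grid:

```python
import numpy as np

def nw_fit(x, y, x0, h):
    """Nadaraya-Watson fit at x0 with Gaussian kernel bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def lscv_score(x, y, h):
    """Leave-one-out least-squares cross-validation score."""
    n = len(x)
    err = []
    for i in range(n):
        mask = np.arange(n) != i                  # drop observation i
        err.append((y[i] - nw_fit(x[mask], y[mask], x[i], h)) ** 2)
    return np.mean(err)

# hypothetical smooth signal plus noise over the study's age range
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(9, 19, 150))
y = np.sin(x / 2) + rng.normal(0, 0.1, 150)
grid = [0.1, 0.3, 0.5, 1.0, 2.0, 4.0]
best = min(grid, key=lambda h: lscv_score(x, y, h))
print(best)
```

A grossly oversmoothing bandwidth flattens the curved signal and inflates the leave-one-out error, so the criterion favors a moderate bandwidth.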
Table 17: MLE estimators and local polynomial smooth-
ing estimators of Lambda with their corresponding mean,
minimum and maximum for each sub-sample.
Age   Raw Lambda  Raw mean  Raw min   Raw max   Smooth Lambda  Smooth mean  Smooth min  Smooth max
9.1   0.400       13.211    11.999    14.467    0.445          15.126       13.640      16.677
9.2   0.800       48.650    39.543    57.094    0.432          14.621       12.885      16.125
9.3   2.000       4932.786  2664.000  6727.500  0.419          13.951       12.005      15.084
9.4   1.700       1450.948  885.027   1985.419  0.405          13.378       11.644      14.638
9.5   0.000       4.604     4.382     4.812     0.391          12.936       11.639      14.241
9.6   2.000       5148.310  2812.000  7564.000  0.377          12.462       10.863      13.636
9.7   0.600       24.799    21.264    28.965    0.363          11.922       10.707      13.285
9.8   0.600       24.888    21.783    28.387    0.349          11.465       10.469      12.539
9.9   0.400       13.296    12.070    15.019    0.334          10.972       10.062      12.236
10.0  1.000       101.089   82.000    119.000   0.320          10.589       9.719       11.326
10.1  0.700       34.914    27.636    41.690    0.305          10.143       8.904       11.187
10.2  -0.100      3.701     3.556     3.849     0.290          9.729        8.884       10.665
10.3  0.900       69.719    52.353    84.588    0.275          9.295        8.238       10.079
10.4  0.100       5.880     5.499     6.219     0.260          8.948        8.165       9.669
10.5  0.800       48.900    40.378    57.094    0.244          8.540        7.848       9.145
10.6  -0.600      1.563     1.547     1.577     0.229          8.234        7.578       8.968
10.7  -0.900      1.094     1.090     1.097     0.213          7.867        7.317       8.514
10.8  0.000       4.634     4.357     4.942     0.198          7.592        6.915       8.384
10.9  0.400       13.455    11.708    15.073    0.182          7.280        6.625       7.857
11.0  0.500       18.242    15.436    20.804    0.167          6.978        6.350       7.506
11.1  -0.100      3.709     3.548     3.859     0.151          6.715        6.215       7.208
11.2  0.100       5.892     5.480     6.270     0.136          6.448        5.963       6.895
11.3  0.300       10.062    9.170     10.923    0.120          6.204        5.810       6.573
11.4  -0.600      1.564     1.551     1.578     0.105          5.977        5.657       6.387
11.5  1.000       104.256   81.000    127.000   0.089          5.772        5.402       6.076
11.6  0.400       13.565    11.927    15.443    0.074          5.554        5.180       5.951
11.7  -1.300      0.767     0.767     0.768     0.059          5.332        5.084       5.644
11.8  0.000       4.661     4.443     4.875     0.045          5.181        4.914       5.446
11.9  0.900       72.725    56.884    91.346    0.030          5.002        4.699       5.295
12.0  -0.100      3.723     3.595     3.854     0.016          4.835        4.616       5.061
12.1  1.200       223.542   161.722   304.577   0.002          4.681        4.414       4.944
12.2  -0.500      1.806     1.786     1.824     -0.012         4.540        4.353       4.726
12.3  -0.100      3.725     3.587     3.881     -0.025         4.402        4.208       4.627
12.4  -0.200      3.029     2.944     3.105     -0.037         4.273        4.094       4.438
12.5  -0.600      1.564     1.553     1.576     -0.050         4.157        4.015       4.307
12.6  -0.200      3.032     2.929     3.123     -0.061         4.055        3.862       4.231
12.7  1.000       105.157   81.000    133.000   -0.072         3.956        3.773       4.123
12.8  -0.500      1.806     1.788     1.825     -0.083         3.869        3.748       4.010
12.9  0.400       13.784    12.488    15.180    -0.093         3.797        3.663       3.930
13.0  0.800       51.196    42.035    60.135    -0.102         3.713        3.564       3.835
13.1  -1.100      0.904     0.903     0.905     -0.111         3.651        3.557       3.758
13.2  -0.800      1.220     1.216     1.226     -0.119         3.587        3.487       3.735
13.3  -1.100      0.904     0.903     0.905     -0.126         3.537        3.430       3.665
13.4  0.200       7.732     7.186     8.256     -0.132         3.487        3.368       3.595
13.5  -1.100      0.904     0.903     0.905     -0.138         3.449        3.349       3.575
13.6  -0.600      1.566     1.555     1.581     -0.142         3.415        3.329       3.549
13.7  -1.300      0.767     0.767     0.768     -0.146         3.390        3.286       3.540
13.8  -1.200      0.830     0.829     0.831     -0.149         3.362        3.266       3.479
13.9  -0.100      3.738     3.602     3.849     -0.151         3.353        3.246       3.441
14.0  -0.300      2.514     2.454     2.570     -0.153         3.343        3.226       3.456
14.1  0.700       36.087    29.266    41.690    -0.153         3.334        3.192       3.431
14.2  1.000       107.108   85.000    126.000   -0.153         3.343        3.231       3.422
14.3  -1.100      0.904     0.903     0.905     -0.152         3.349        3.256       3.467
14.4  -0.100      3.738     3.624     3.899     -0.150         3.366        3.275       3.493
14.5  -0.900      1.095     1.091     1.098     -0.147         3.383        3.282       3.493
14.6  0.000       4.685     4.407     4.920     -0.143         3.413        3.268       3.531
14.7  -1.300      …
767
0.76
70.
768
-0.1
393.
440
3.35
23.
550
conti
nued
...
114
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
14.8
0.70
036
.513
31.9
0441
.922
-0.1
343.
480
3.38
13.
581
14.9
-1.1
000.
904
0.90
30.
905
-0.1
283.
534
3.42
73.
647
15.0
-1.4
000.
713
0.71
30.
713
-0.1
223.
565
3.49
73.
666
15.1
1.10
015
8.22
812
8.97
418
8.12
4-0
.114
3.63
03.
523
3.72
3
15.2
-1.2
000.
830
0.83
00.
831
-0.1
073.
690
3.57
33.
840
15.3
0.60
025
.830
22.6
3128
.965
-0.0
993.
743
3.61
33.
857
15.4
0.20
07.
793
7.35
28.
236
-0.0
903.
831
3.71
63.
943
15.5
-1.0
000.
991
0.98
90.
993
-0.0
813.
899
3.77
34.
046
15.6
0.30
010
.293
9.56
711
.057
-0.0
713.
990
3.85
94.
120
15.7
1.00
010
6.37
285
.000
127.
000
-0.0
614.
065
3.89
94.
198
15.8
-1.3
000.
767
0.76
70.
768
-0.0
514.
169
4.03
94.
324
15.9
0.00
04.
696
4.48
94.
963
-0.0
404.
278
4.10
64.
498
16.0
0.50
019
.018
17.0
7920
.978
-0.0
294.
391
4.22
44.
548
16.1
1.60
011
70.2
3483
6.26
415
25.2
33-0
.018
4.50
64.
318
4.66
2
16.2
-0.5
001.
810
1.79
41.
828
-0.0
074.
626
4.46
94.
818
conti
nued
...
115
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
16.3
0.10
05.
979
5.68
36.
232
0.00
44.
730
4.54
04.
891
16.4
0.30
010
.297
9.48
111
.409
0.01
54.
866
4.64
65.
149
16.5
-1.6
000.
625
0.62
50.
625
0.02
75.
013
4.84
25.
299
16.6
-1.0
000.
991
0.98
80.
994
0.03
85.
145
4.81
25.
599
16.7
0.00
04.
677
4.48
94.
844
0.04
95.
263
5.02
65.
474
16.8
0.60
026
.204
22.9
6428
.965
0.06
15.
430
5.16
05.
643
16.9
-1.3
000.
768
0.76
70.
768
0.07
25.
596
5.35
95.
945
17.0
-0.4
002.
118
2.07
92.
149
0.08
35.
747
5.39
16.
056
17.1
2.00
060
80.9
1633
61.5
0079
37.5
000.
094
5.90
75.
461
6.12
4
17.2
1.20
023
1.35
918
6.09
227
5.39
20.
105
6.05
85.
769
6.30
0
17.3
-0.5
001.
808
1.79
11.
834
0.11
66.
229
5.93
96.
739
17.4
1.70
017
61.4
7512
81.7
1421
29.3
550.
126
6.42
36.
095
6.63
3
17.5
-1.3
000.
767
0.76
70.
768
0.13
66.
568
6.25
27.
053
17.6
2.00
060
94.4
0736
12.0
0087
11.5
000.
146
6.75
86.
259
7.13
0
17.7
0.30
010
.214
9.60
910
.990
0.15
66.
884
6.57
37.
276
conti
nued
...
116
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
17.8
1.30
035
0.19
825
4.75
546
0.32
90.
166
7.13
16.
617
7.60
6
17.9
0.70
036
.724
31.3
8441
.922
0.17
57.
277
6.79
97.
702
18.0
1.40
052
1.79
736
4.18
272
1.14
30.
184
7.49
56.
906
8.06
5
18.1
-0.4
002.
117
2.08
32.
145
0.19
37.
634
7.11
88.
120
18.2
-2.0
000.
500
0.50
00.
500
0.20
27.
821
7.41
58.
594
18.3
-1.3
000.
767
0.76
70.
768
0.21
07.
972
7.60
78.
436
18.4
0.00
04.
690
4.45
44.
860
0.21
88.
177
7.53
58.
656
18.5
2.00
060
99.2
0439
60.0
0083
20.0
000.
226
8.37
97.
784
8.85
4
18.6
2.00
060
14.2
3843
24.0
0079
37.5
000.
234
8.54
38.
069
8.97
8
18.7
0.00
04.
684
4.48
94.
898
0.24
18.
693
8.10
09.
371
18.8
-2.0
000.
500
0.50
00.
500
0.24
98.
926
8.22
110
.048
18.9
1.00
010
7.63
288
.000
128.
000
0.25
69.
047
8.41
09.
636
19.0
1.90
039
53.4
6024
38.0
8455
46.5
210.
262
9.24
08.
414
9.88
3
117
Table 18: ML estimators and their local polynomial
smoothing estimators with corresponding p-values from the
Shapiro-Wilk (SW) test for each sub-sample.
Age ML.Lambda SW p-value Smooth.Lambda SW p-value
9.1 0.4 0.798 0.445119729 0.893
9.2 0.8 0.367 0.43195112 0.254
9.3 2 0.791 0.418588776 0.441
9.4 1.7 0.843 0.405032313 0.576
9.5 0 0.385 0.391255708 0.207
9.6 2 0.476 0.377294702 0.072
9.7 0.6 0.694 0.36317325 0.375
9.8 0.6 0.568 0.348865167 0.331
9.9 0.4 0.127 0.334367136 0.108
10 1 0.407 0.319726542 0.369
10.1 0.7 0.732 0.304901839 0.449
10.2 -0.1 0.562 0.28996274 0.389
10.3 0.9 0.877 0.27487598 0.395
10.4 0.1 0.102 0.259678107 0.009
10.5 0.8 0.42 0.244369525 0.686
10.6 -0.6 0.922 0.228976998 0.699
10.7 -0.9 0.112 0.213495789 0.091
10.8 0 0.548 0.197943409 0.226
10.9 0.4 0.353 0.182357136 0.115
11 0.5 0.501 0.166763948 0.222
11.1 -0.1 0.357 0.151194377 0.206
11.2 0.1 0.583 0.135659244 0.35
11.3 0.3 0.338 0.120168606 0.325
11.4 -0.6 0.515 0.104747617 0.241
11.5 1 0.882 0.089463339 0.526
11.6 0.4 0.537 0.074334151 0.106
11.7 -1.3 0.469 0.059372383 0.075
11.8 0 0.298 0.04462395 0.097
11.9 0.9 0.637 0.030124114 0.105
12 -0.1 0.571 0.015909271 0.471
12.1 1.2 0.251 0.002016778 0.024
12.2 -0.5 0.509 -0.011515247 0.25
12.3 -0.1 0.762 -0.024648152 0.488
12.4 -0.2 0.389 -0.037343003 0.144
12.5 -0.6 0.068 -0.049560851 0.021
12.6 -0.2 0.076 -0.061263029 0.01
12.7 1 0.134 -0.07241146 0.032
12.8 -0.5 0.659 -0.082968974 0.475
12.9 0.4 0.662 -0.092899646 0.444
13 0.8 0.382 -0.102169121 0.124
13.1 -1.1 0.137 -0.110744956 0.059
13.2 -0.8 0.286 -0.118596941 0.061
13.3 -1.1 0.836 -0.125697421 0.219
13.4 0.2 0.472 -0.132021594 0.26
13.5 -1.1 0.367 -0.137547793 0.038
13.6 -0.6 0.26 -0.142257739 0.264
13.7 -1.3 0.634 -0.146136763 0.119
13.8 -1.2 0.952 -0.149174001 0.439
13.9 -0.1 0.216 -0.151362541 0.15
14 -0.3 0.679 -0.152699535 0.553
14.1 0.7 0.212 -0.153186271 0.226
14.2 1 0.345 -0.152828196 0.505
14.3 -1.1 0.783 -0.151634893 0.505
14.4 -0.1 0.137 -0.14962002 0.118
14.5 -0.9 0.708 -0.146801197 0.286
14.6 0 0.028 -0.143199859 0.012
14.7 -1.3 0.6 -0.138841059 0.155
14.8 0.7 0.303 -0.133753241 0.101
14.9 -1.1 0.58 -0.127967976 0.088
15 -1.4 0.058 -0.12151967 0.106
15.1 1.1 0.422 -0.114445245 0.121
15.2 -1.2 0.582 -0.106783802 0.346
15.3 0.6 0.799 -0.098576264 0.536
15.4 0.2 0.068 -0.089865014 0.033
15.5 -1 0.552 -0.080693526 0.302
15.6 0.3 0.735 -0.071105997 0.767
15.7 1 0.412 -0.061146984 0.607
15.8 -1.3 0.309 -0.050861052 0.246
15.9 0 0.088 -0.040292435 0.036
16 0.5 0.695 -0.029484719 0.564
16.1 1.6 0.934 -0.018480543 0.377
16.2 -0.5 0.647 -0.00732133 0.541
16.3 0.1 0.847 0.003952961 0.771
16.4 0.3 0.116 0.015304047 0.083
16.5 -1.6 0.855 0.026695509 0.369
16.6 -1 0.004 0.03809056 0.021
16.7 0 0.001 0.049457222 0
16.8 0.6 0.636 0.060760408 0.553
16.9 -1.3 0.959 0.071983753 0.471
17 -0.4 0.275 0.083110792 0.111
17.1 2 0.554 0.094090441 0.314
17.2 1.2 0.909 0.104928501 0.783
17.3 -0.5 0.197 0.115594631 0.324
17.4 1.7 0.358 0.126074211 0.577
17.5 -1.3 0.632 0.136347126 0.416
17.6 2 0.95 0.146409111 0.143
17.7 0.3 0.614 0.156253827 0.633
17.8 1.3 0.244 0.165858147 0.109
17.9 0.7 0.936 0.175231839 0.823
18 1.4 0.032 0.184356962 0.021
18.1 -0.4 0.809 0.193249469 0.441
18.2 -2 0.872 0.201896656 0.116
18.3 -1.3 0.617 0.210287399 0.238
18.4 0 0.341 0.218428597 0.307
18.5 2 0.467 0.226325664 0.249
18.6 2 0.583 0.233988684 0.201
18.7 0 0.334 0.241412567 0.108
18.8 -2 0.284 0.248602215 0.003
18.9 1 0.891 0.255575468 0.563
19 1.9 0.409 0.262308056 0.125
Figure 11: QQ-plot of SBP after log transformation, from the 1st data set to the 12th data set.
Figure 12: QQ-plot of SBP after log transformation, from the 13th data set to the 24th data set.
Figure 13: QQ-plot of SBP after log transformation, from the 25th data set to the 36th data set.
Figure 14: QQ-plot of SBP after log transformation, from the 37th data set to the 48th data set.
Figure 15: QQ-plot of SBP after log transformation, from the 49th data set to the 60th data set.
Figure 16: QQ-plot of SBP after log transformation, from the 61st data set to the 72nd data set.
Figure 17: QQ-plot of SBP after log transformation, from the 73rd data set to the 84th data set.
Figure 18: QQ-plot of SBP after log transformation, from the 85th data set to the 96th data set.
Figure 19: QQ-plot of SBP after log transformation, from the 97th data set to the 100th data set.
Figure 20: Local polynomial smoothing estimator of the Box-Cox Lambda.
[Figures 11-19 show normal QQ-plots (theoretical vs. sample quantiles) for the indicated data sets; Figure 20 plots lambda against age. The plot panels themselves are not reproducible from the extracted text.]
8 Appendix 2: Proof of Theoretical Results
8.1 A.1 Useful Approximation for the Equivalent Kernels
The following approximations for the equivalent kernel function $W_{q,p+1}(t_j,t;h)$ are
used in computing the asymptotic bias and variance of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$:
$$W_{q,p+1}(t_j,t;h) = \frac{q!}{Jh^{q+1}g(t)}\,K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)\big[1+o_p(1)\big], \qquad j=1,\ldots,J; \qquad (A.1)$$
$$\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,(t_j-t)^k = q!\,1_{[k=q]}, \qquad k=0,1,\ldots,p; \qquad (A.2)$$
$$\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,(t_j-t)^{p+1} = q!\,h^{p-q+1}\,B_{p+1}\big(K_{q,p+1}\big)\big[1+o_p(1)\big]; \qquad (A.3)$$
$$\sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h) = \frac{(q!)^2}{Jh^{2q+1}g(t)}\,V\big(K_{q,p+1}\big)\big[1+o_p(1)\big], \qquad (A.4)$$
where $K_{q,p+1}(t)$, $B_{p+1}(K)$ and $V(K)$ are defined in Chapter 4. Proofs of equations
(A.1)-(A.4) are given in Fan and Zhang (2000, Appendix A, Lemmas 1 and 2).
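As a point of orientation not stated in the original text: in the local linear case ($p=1$, $q=0$) with a symmetric kernel $K$, the equivalent kernel reduces to the kernel itself, $K_{0,2}=K$ (a standard fact for local polynomial fitting), so (A.1) collapses to the familiar kernel weight:

```latex
% Local linear special case (p = 1, q = 0, symmetric K): K_{0,2} = K, and (A.1) becomes
W_{0,2}(t_j,t;h) \;=\; \frac{1}{J\,h\,g(t)}\,K\!\Big(\frac{t_j-t}{h}\Big)\,\big[1+o_p(1)\big],
\qquad j=1,\ldots,J.
```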
8.2 A.2 Proof of Theorem 1
First, note that $\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ is the raw estimator of $F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ at the design time point $t_j$. Using equation (6), the bias of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ is
$$E\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x] = W_1 + W_2, \qquad (A.5)$$
where
$$W_1 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,\big\{E\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\},$$
$$W_2 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]. \qquad (A.6)$$
It then follows from (15) and (A.1) that
$$W_1 = \sum_{j=1}^{J} \frac{q!}{Jh^{q+1}g(t)}\,K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)\big[1+o_p(1)\big]\,o_p\big(n_j^{-1/2}\big) = o_p\big(n^{-1/2}\big), \qquad (A.7)$$
where the second equality holds because, by Assumption A2, $\lim_{n\to\infty}(n_j/n)$ is bounded
between 0 and 1, and $\sum_{j=1}^{J}\big|q!\,[Jh^{q+1}g(t)]^{-1}K_{q,p+1}[(t_j-t)/h]\big|$ is bounded. By the
Taylor expansion of $F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ around $t$ and equations (A.2) and (A.3), we have
$$W_2 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\Big\{\sum_{k=0}^{p+1}F^{(k)}_{t,\theta(t|x)}[y(t)\,|\,x]\,\frac{(t_j-t)^k}{k!} + o_p\big[(t_j-t)^{p+1}\big]\Big\} - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$$
$$= \frac{q!\,h^{p-q+1}}{(p+1)!}\,F^{(p+1)}_{t,\theta(t|x)}[y(t)\,|\,x]\,B_{p+1}\big(K_{q,p+1}\big)\big[1+o_p(1)\big]. \qquad (A.8)$$
By Assumption A1 and the asymptotic expressions (A.7) and (A.8), $W_2$ is the dominating
term over $W_1$. The asymptotic expression for the bias of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ then
follows from (A.6)-(A.8).
Let $\mu_F(t_j) = E\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\}$. Then, by equation (6),
$$\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = E\Big\{\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - \mu_F(t_j)\big]\Big\}^2 = W_3 + W_4, \qquad (A.9)$$
where, by (15), (A.4) and Assumption A2 with $c_j = c$,
$$W_3 = \sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h)\,E\big\{\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - \mu_F(t_j)\big]^2\big\} = \sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h)\,\mathrm{Var}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\}$$
$$= \frac{(q!)^2}{c\,n\,J\,h^{2q+1}g(t)}\,V\big(K_{q,p+1}\big)\,F'_{t,\theta(t|x)}[y(t)\,|\,x]^{T}\,I^{-1}\big[\theta(t|x)\big]\,F'_{t,\theta(t|x)}[y(t)\,|\,x]\,\big[1+o_p(1)\big] \qquad (A.10)$$
and, by (14), Assumptions A2 and A4, the equation (A.1) and $\lim_{n\to\infty} n_{jk}/n = c_{jk}$,
there is a constant $C_1 > 0$ such that, when $n$ is sufficiently large,
$$W_4 = \sum_{j\neq k}\Big\{W_{q,p+1}(t_j,t;h)\,W_{q,p+1}(t_k,t;h)\,E\big\{\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]-\mu_F(t_j)\big]\big[\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]-\mu_F(t_k)\big]\big\}\Big\}$$
$$\leq \sum_{j\neq k}\Big\{\big|W_{q,p+1}(t_j,t;h)\,W_{q,p+1}(t_k,t;h)\big|\,\big|\mathrm{Cov}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x],\,\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]\big\}\big|\Big\} \qquad (A.11)$$
$$\leq \frac{C_1}{J^2h^{2q+2}g^2(t)}\sum_{j\neq k}\Big|K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)K_{q,p+1}\Big(\frac{t_k-t}{h}\Big)\,\frac{\rho_F(t_j,t_k|x)}{r(n_j,n_k,n_{jk})}\Big|.$$
The bounded support of $K(\cdot)$ implies that, for any $t$, $K_{q,p+1}[(t_j-t)/h]\,K_{q,p+1}[(t_k-t)/h] = 0$ for any $j\neq k$ such that $|t_j-t_k| > ah$, for some constant $a>0$.
We now consider the following three situations:
(i) If $|t_j-t_k| \le \delta$, by Assumption A2, $\mathrm{Cov}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x],\,\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]\big\} = 0$, so that $W_4 = 0$, and, by (A.9), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3$.
(ii) If $|t_j-t_k| > \delta \ge ah$, then $K_{q,p+1}[(t_j-t)/h]\,K_{q,p+1}[(t_k-t)/h] = 0$, so that, by (A.11), $W_4 = 0$, and it still follows from (A.9) that $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3$.
(iii) If $\delta < ah$ and $\delta \le |t_j-t_k| \le ah$, since $K_{q,p+1}(s)$ and $\rho_F(t_j,t_k)$ are bounded,
there is $C_2 > 0$, so that, by (A.11) and $\sum_{j=1}^{J}\sum_{k:\,\delta\le|t_k-t_j|\le ah} r^{-1}(n_j,n_k,n_{jk}) = o(Jh)$,
$$W_4 \le \frac{C_2}{J^2h^{2q+2}g^2(t)}\Big\{\sum_{j=1}^{J}\sum_{k:\,\delta\le|t_k-t_j|\le ah} r^{-1}(n_j,n_k,n_{jk})\Big\}\big[1+o_p(1)\big] = o_p\big[(nJh^{2q+1})^{-1}\big]. \qquad (A.12)$$
Then, by (A.9) and (A.12), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3\big[1+o_p(1)\big]$. Since, for all three
situations (i), (ii) and (iii), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3\big[1+o_p(1)\big]$, the asymptotic
expression for the variance of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ follows from (A.9)-(A.11).
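As a side remark, not part of the original proof: the bias order $h^{p-q+1}$ from (A.8) and the variance order $(nJh^{2q+1})^{-1}$ from (A.10) imply the usual bandwidth trade-off. Balancing the squared bias against the variance gives

```latex
% Squared bias O(h^{2(p-q+1)}) balanced against variance O((nJh^{2q+1})^{-1}):
h^{2(p-q+1)} \;\asymp\; \big(nJh^{2q+1}\big)^{-1}
\;\Longrightarrow\;
h_{\mathrm{opt}} \;\asymp\; (nJ)^{-1/(2p+3)},
% which yields the optimal mean squared error rate (nJ)^{-2(p-q+1)/(2p+3)}.
```

so that, for example, estimating the distribution function itself ($q=0$) with a local linear fit ($p=1$) gives the familiar rate $h_{\mathrm{opt}} \asymp (nJ)^{-1/5}$.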
9 Appendix 3: R Code
# Bootstrap code for confidence bands
scatboot <- function(x, y, nreps=100, confidence=0.9, degree=2, span=2/3,
                     family="gaussian"){
  # Put input data into a data frame, sorted by x, with no missing values.
  dat <- na.omit(data.frame(x=x, y=y))
  if(nrow(dat) == 0) {
    print("Error: No data left after dropping NAs")
    print(dat)
    return(NULL)
  }
  ndx <- order(dat$x)
  dat$x <- dat$x[ndx]
  dat$y <- dat$y[ndx]
  r <- range(dat$x, na.rm=TRUE)
  x.out <- seq(r[1], r[2], length.out=40)
  # Fit a loess curve to the original data
  f <- loess(y ~ x, data=dat, degree=degree, span=span, family=family)
  y.fit <- approx(f$x, fitted(f), x.out, rule=2)$y
  len <- length(dat$x)
  # Generate bootstrap replicates of the fitted curve
  mat <- matrix(0, nreps, length(x.out))
  for(i in seq(nreps)){
    ndx <- sample(len, replace=TRUE)
    x.repl <- x[ndx]
    y.repl <- y[ndx]
    f <- loess(y.repl ~ x.repl, degree=degree, span=span, family=family)
    mat[i,] <- predict(f, newdata=x.out)
  }
  n.na <- apply(is.na(mat), 2, sum)
  nx <- ncol(mat)
  up.lim <- rep(NA, nx)
  low.lim <- rep(NA, nx)
  stddev <- rep(NA, nx)
  for(i in 1:nx) {
    if(n.na[i] > nreps*(1.0-confidence)) {
      # Too few good values to get estimate
      next
    }
    conf <- confidence*nreps/(nreps-n.na[i])
    pr <- 0.5*(1.0 - conf)
    up.lim[i] <- quantile(mat[,i], 1.0-pr, na.rm=TRUE)
    low.lim[i] <- quantile(mat[,i], pr, na.rm=TRUE)
    stddev[i] <- sd(mat[,i], na.rm=TRUE)
  }
  ndx <- !is.na(up.lim)  # indices of good values
  fit <- data.frame(x=x.out[ndx], y.fit=y.fit[ndx], up.lim=up.lim[ndx],
                    low.lim=low.lim[ndx], stddev=stddev[ndx])
  return(list(nreps=nreps, confidence=confidence, degree=degree,
              span=span, family=family, data=dat, fit=fit))
}
scatboot.plot <- function(sb, ...) {
  require(lattice)
  require(grid)
  p <- xyplot(y ~ x, data=sb$data,
              panel=function(x, y, ...) {
                panel.xyplot(x, y, ...)
                panel.xyplot(sb$fit$x, sb$fit$y.fit, type="l", ...)
                panel.xyplot(sb$fit$x, sb$fit$up.lim, type="l", col="gray", ...)
                panel.xyplot(sb$fit$x, sb$fit$low.lim, type="l", col="gray", ...)
                pg.x <- c(sb$fit$x, rev(sb$fit$x))
                pg.y <- c(sb$fit$up.lim, rev(sb$fit$low.lim))
                grid.polygon(pg.x, pg.y,
                             gp=gpar(fill="pink", col="transparent", alpha=0.5),
                             default.units="native")
              }, ...)
  return(p)
}
# Alan R. Rogers, 26 Feb 2011
scatboot.test <- function(nreps=100, confidence=0.9, span=2/3,
                          degree=2, family="gaussian") {
  x <- seq(0, 4, length.out=25)
  y <- sin(2*x) + 0.5*x + rnorm(25, sd=0.5)
  sb <- scatboot(x, y, nreps=nreps, confidence=confidence, span=span,
                 degree=degree, family=family)
  scatboot.plot(sb)
}
# My bootstrap code
# Fits a kernel smoother and calculates symmetric
# nonparametric bootstrap confidence intervals.
# Arguments:
# x, y : data values
# nreps : number of bootstrap replicates
# mohammed, May 5, 2012
ksboot <- function(x, y, nreps=1000, band=5, confidence=0.95){
  # Put input data into a data frame, sorted by x, with no missing values.
  dat <- na.omit(data.frame(x=x, y=y))
  if(nrow(dat) == 0) {
    print("Error: No data left after dropping NAs")
    print(dat)
    return(NULL)
  }
  ndx <- order(dat$x)
  dat$x <- dat$x[ndx]
  dat$y <- dat$y[ndx]
  # Fit curve to data
  require(KernSmooth)
  len <- length(dat$x)
  f0 <- ksmooth(x, y, kernel="normal", bandwidth=band, n.points=len)
  y.fit <- f0$y
  # Generate bootstrap replicates
  mat <- matrix(0, NROW(dat), nreps)
  for(i in seq(nreps)){
    ndx <- sample(len, replace=TRUE)
    x.repl <- x[ndx]
    y.repl <- y[ndx]
    f <- ksmooth(x.repl, y.repl, kernel="normal", bandwidth=band, n.points=len)
    mat[, i] <- f$y
  }
  # Calculate pointwise confidence intervals from the bootstrap quantiles
  ci <- t(apply(mat, 1, quantile, probs=c((1-confidence)/2, (1+confidence)/2)))
  res <- cbind(as.data.frame(f0), ci)
  colnames(res) <- c('x', 'y', 'lwr.limit', 'upr.limit')
  res
}
# ———–
# example
# ———–
m <- with(cars, ksboot(speed, dist, nreps = 5000))
with(cars, plot(speed, dist, las = 1))
with(m, matpoints(x, m[, -1], type = 'l', col = c(1, 2, 2), lty = c(1, 2, 2)))
# MY R code Starts from here
u<-.10
v<-.05
w<-.01
setwd("C:\\Users\\mchowdhury\\Desktop\\phd")
library(MASS)
library(TeachingDemos)
library(car)
library(nnet)
library(fitdistrplus)
library(survival)
library(splines)
library(KernSmooth)
library(copula)
library(mvtnorm)
library(scatterplot3d)
library(pspline)
library(grid)
require(graphics)
require(KernSmooth)
require(gridExtra)
require(copula)
require(np)
library(nortest)
# With Missing Data
new<-read.csv("new.csv",header=TRUE)
x<- new[c(1,2,4,6,7,8,9,10,12)]
x<- new[c(1,2,4,6,7,8,9,10)]
#x1<-x[which(x$RACE==1),]
#x2<-x[which(x$RACE==2),]
dim(x)
head(x)
summary(x$AGE)
# Without Missing Data
newx<-na.omit(x)
dim(newx)
head(newx)
summary(newx$AGE)
xx <- subset(newx, AGE >= 9.09 & AGE < 19.01)
xx$AGE <- round(xx$AGE, 1)
uage<-unique(xx$AGE)
# Normality Test of whole Data Set before log transformation
aa<-round(shapiro.test(xx$SYSAV)$p.value,3)
bb<-round(ad.test(xx$SYSAV)$p.value,3)
cc<-round(ks.test(xx$SYSAV, "pnorm", mean(xx$SYSAV), sd(xx$SYSAV))$p.value,3)
dd<-round(cvm.test(xx$SYSAV)$p.value,3)
ee<-round(pearson.test(xx$SYSAV)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
# Normality Test of whole Data Set after log transformation
a<-log(xx$SYSAV)
aa<-round(shapiro.test(a)$p.value,3)
bb<-round(ad.test(a)$p.value,3)
cc<-round(ks.test(a, "pnorm", mean(a), sd(a))$p.value,3)
dd<-round(cvm.test(a)$p.value,3)
ee<-round(pearson.test(a)$p.value,3)
# frequency checking
table(xx$AGE)
# Splitting the data set
sdata<-with(xx,split(xx,xx$AGE))
a<-length(sdata)
names(sdata)<-paste('int',1:a,sep=" ")
nn<-names(sdata)
# Dimension Checking of the Data Set before median height
dimen<-matrix(NA,100,2)
for(i in 1:100){
d<-sdata[[i]]
one<-dim(d)
dimen[i,]<-one }
dimen
# Log Transformation of SBP data
fda <- vector("list", 100)
for(i in 1:100){
d <- sdata[[i]]  # i-th data set
d$logsbp <- log(d$SYSAV)
fda[[i]] <- d}
# Normality Test after Log Transformation of the data
pvaluelogsbp<-matrix(NA,100,5)
for(i in 1:100){
d<-fda[[i]]
aa<-round(shapiro.test(d$logsbp)$p.value,3)
bb<-round(ad.test(d$logsbp)$p.value,3)
cc<-round(ks.test(d$logsbp, "pnorm", mean(d$logsbp), sd(d$logsbp))$p.value,3)
dd<-round(cvm.test(d$logsbp)$p.value,3)
ee<-round(pearson.test(d$logsbp)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
pvaluelogsbp[i,]<-ff}
colnames(pvaluelogsbp)<-c('SW','AD','KS','CVM','P.ChiSq')
pvaluelogsbp
# lognormality Test of DBP4 Data Set
gofsum <- function(x){
result <- gofstat(x)
c(cramer = result$cvm, anderson = result$ad, kolmogorov = result$ks)}
newp<-matrix(NA,100,3)
for(i in 1:100){
d<-fda[[i]]
d1<-fitdist(d$DIA4AV,"lnorm")
d2<-gofsum(d1)
newp[i,]<- d2}
colnames(newp) <- c('cramer','anderson','kolmogorov')
newp
# Data Set with median height
fff<-vector("list", 100)
for(i in 1:100){
d<-fff[[i]]
sd1<-d[d$HTAV>median(d$HTAV),]
fff[[i]]<-sd1}
# Dimension Checking of the Data Set after median height
dimen1<-matrix(NA,100,2)
for(i in 1:100){
d<-fff[[i]]
one<-dim(d)
dimen1[i,]<-one}
dimen1
# Normality Test after Log Transformation after median height
pvaluelogsbp<-matrix(NA,100,5)
for(i in 1:100){
d<-fff[[i]]
aa<-round(shapiro.test(d$logsbp)$p.value,3)
bb<-round(ad.test(d$logsbp)$p.value,3)
cc<-round(ks.test(d$logsbp, "pnorm", mean(d$logsbp), sd(d$logsbp))$p.value,3)
dd<-round(cvm.test(d$logsbp)$p.value,3)
ee<-round(pearson.test(d$logsbp)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
pvaluelogsbp[i,]<-ff}
colnames(pvaluelogsbp)<-c('SW','AD','KS','CVM','P.ChiSq')
pvaluelogsbp
# lognormality Test of DBP4 after median height
gofsum <- function(x){
result <- gofstat(x)
c(cramer = result$cvm, anderson = result$ad, kolmogorov = result$ks)
}
newp<-matrix(NA,100,3)
for(i in 1:100){
d<-fff[[i]]
d1<-fitdist(d$DIA4AV,"lnorm")
d2<-gofsum(d1)
newp[i,]<- d2
}
colnames(newp) <- c('cramer','anderson','kolmogorov')
newp
# Univariate Case
# qqplot of SBP without qqline
#pdf("qq.pdf")
par(mfrow=c(3,3))
for(i in 1:100){
d <- fda[[i]]
qqnorm(d$logsbp, main="qqplot after bct of sbp")}
#dev.off()
#getwd()
# qqplot of SBP with qqline
#pdf("qqline.pdf")
par(mfrow=c(3,3))
for(i in 1:100){
d <- fda[[i]]
qqnorm(d$logsbp, main="", xlab = "", ylab = "", las = 1)
mtext('SBP', 3, line = .3, cex = .8)
qqline(d$logsbp, col = 2)
mtext(paste('Data', i), 3, line = 2, font = 2)
Sys.sleep(.01)}
#dev.off()
#getwd()
# percentile values of SBP
t<-seq(9.1,19,.1)
y<-102.01027+1.94397*(t-10)+.00598*((t-10)^2)-.00789*((t-10)^3)-.00059*((t-10)^4)
s1<-log(y+1.28*10.4855)
s2<-log(y+1.645*10.4855)
s3<-log(y+2.326*10.4855)
# computation of cross validation score
mat<-matrix(0,150,100)
for(j in 1:100){
dataN<-fff[[j]]$logsbp
for(i in 1:length(dataN)){
one<-ifelse(dataN[[i]]<s1[[j]],1,0)
mcv<-mean(dataN[-i])
scv<-(length(dataN[-i])-1)*sd(dataN[-i])/length(dataN[-i])
p90cv<-pnorm(s1[j],mcv,scv)
diffsq<-((one-p90cv)^2)/150
mat[i,j]<-diffsq
}
}
newmat<-c(mat)
sum(newmat)
setwd("C:\\Users\\mchowdhury\\Desktop")
ndata1<-read.csv("entirecv.csv",header=TRUE)
ndata2<-read.csv("cccv.csv",header=TRUE)
ndata3<-read.csv("aacv.csv",header=TRUE)
library(np)
require(np)
#by CV.AIC
#entire cohort
on1<-npregbw(ndata1$age,ndata1$s10,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata1$age,ndata1$u10,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata1$age,ndata1$s5,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata1$age,ndata1$u5,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata1$age,ndata1$s1,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata1$age,ndata1$u1,regtype="ll",bwmethod="cv.aic")
summary(on6)
#caucasian
on1<-npregbw(ndata2$age,ndata2$p90,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata2$age,ndata2$np90,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata2$age,ndata2$p95,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata2$age,ndata2$np95,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata2$age,ndata2$p99,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata2$age,ndata2$np99,regtype="ll",bwmethod="cv.aic")
summary(on6)
#african american
on1<-npregbw(ndata3$age,ndata3$p90,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata3$age,ndata3$np90,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata3$age,ndata3$p95,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata3$age,ndata3$np95,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata3$age,ndata3$p99,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata3$age,ndata3$np99,regtype="ll",bwmethod="cv.aic")
summary(on6)
#by CV.ls
#entire cohort
on1<-npregbw(ndata1$age,ndata1$s10,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata1$age,ndata1$u10,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata1$age,ndata1$s5,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata1$age,ndata1$u5,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata1$age,ndata1$s1,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata1$age,ndata1$u1,regtype="ll",bwmethod="cv.ls")
summary(on6)
#caucasian
on1<-npregbw(ndata2$age,ndata2$p90,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata2$age,ndata2$np90,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata2$age,ndata2$p95,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata2$age,ndata2$np95,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata2$age,ndata2$p99,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata2$age,ndata2$np99,regtype="ll",bwmethod="cv.ls")
summary(on6)
#african american
on1<-npregbw(ndata3$age,ndata3$p90,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata3$age,ndata3$np90,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata3$age,ndata3$p95,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata3$age,ndata3$np95,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata3$age,ndata3$p99,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata3$age,ndata3$np99,regtype="ll",bwmethod="cv.ls")
summary(on6)
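The bandwidth searches above can also be written with the formula interface of npregbw from the np package. A minimal self-contained sketch on simulated data (the variables x and y here are illustrative stand-ins for age and a raw probability column such as ndata1$s10, not the NGHS data):

```r
# Illustrative only: compare AIC- and least-squares-cross-validated
# bandwidths for a local linear fit on toy data.
library(np)
set.seed(1)
x <- runif(200, 9, 19)                          # ages
y <- 0.10 + 0.02 * sin(x) + rnorm(200, 0, 0.01) # toy probability curve
bw.aic <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.aic")
bw.ls  <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.ls")
c(aic = bw.aic$bw, ls = bw.ls$bw)               # the two selected bandwidths
```

The two criteria generally select different bandwidths; summary() on either object reports the objective value alongside the bandwidth.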
# computation of raw probabilities for all girls
all<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
all[i,]<-prob}
colnames(all)<-c('p1','np1','p2','np2','p3','np3')
all
b1<-data.frame(all)
aveall<-c(mean(b1$p1),mean(b1$np1),mean(b1$p2),
mean(b1$np2),mean(b1$p3),mean(b1$np3))
all1<-sum(ifelse(all[,1]>all[,2],1,0))
all2<-sum(ifelse(all[,3]>all[,4],1,0))
all3<-sum(ifelse(all[,5]>all[,6],1,0))
allsum<-c(all1,all2,all3)
# difference between structural nonparametric and nonparametric
a<-data.frame(all)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2),b3 = (a$p3-a$np3))
new1<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new1
biasall<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),
bp2=mean(a$p2-v), bnp2=mean(a$np2-v),bp3=mean(a$p3-w),
bnp3=mean(a$np3-w))
biasall
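As a quick sanity check on the two estimators compared above (the normal-model tail probability versus the empirical exceedance proportion), a toy example with known truth; all names here are illustrative and not part of the NGHS analysis:

```r
# Illustrative only: for N(0,1) data and the true 90th percentile
# qnorm(.90), both the model-based estimate 1 - pnorm(q, m, s) and the
# empirical proportion mean(y >= q) should be close to 0.10.
set.seed(2)
y <- rnorm(10000)
q <- qnorm(0.90)
m <- mean(y)
s <- sd(y) * sqrt((length(y) - 1) / length(y))  # MLE (population) SD
p.model     <- 1 - pnorm(q, m, s)
p.empirical <- mean(y >= q)
c(model = p.model, empirical = p.empirical)     # both near 0.10
```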
# Bootstrap Confidence Band for all girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,all)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),
ylab=one2,xlab="Ages of all girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of all girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of all girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of all girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of all girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of all girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# computation of raw probabilities for Caucasian girls
cau<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]][which(fff[[i]]$RACE==1),]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
cau[i,]<-prob}
colnames(cau)<-c('p1','np1','p2','np2','p3','np3')
cau
cau1<-sum(ifelse(cau[,1]>cau[,2],1,0))
cau2<-sum(ifelse(cau[,3]>cau[,4],1,0))
cau3<-sum(ifelse(cau[,5]>cau[,6],1,0))
causum<-c(cau1,cau2,cau3)
causum
# difference between structural nonparametric and unstructured nonparametric
# for Caucasian girls
a<-data.frame(cau)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2), b3 = (a$p3-a$np3))
new2<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new2
biasca<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),bp2=mean(a$p2-v),
bnp2=mean(a$np2-v),bp3=mean(a$p3-w),bnp3=mean(a$np3-w))
# Bootstrap Confidence Band for Caucasian girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,cau)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),ylab=one2,
xlab="Ages of CC girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of CC girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of CC girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of CC girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of CC girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of CC girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# computation of raw probabilities for African American girls
aa<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]][which(fff[[i]]$RACE==2),]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
aa[i,]<-prob}
colnames(aa)<-c('p1','np1','p2','np2','p3','np3')
aa
aa1<-sum(ifelse(aa[,1]>aa[,2],1,0))
aa2<-sum(ifelse(aa[,3]>aa[,4],1,0))
aa3<-sum(ifelse(aa[,5]>aa[,6],1,0))
aasum<-c(aa1,aa2,aa3)
aasum
a<-data.frame(aa)
biasaa<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),bp2=mean(a$p2-v),
bnp2=mean(a$np2-v),bp3=mean(a$p3-w),bnp3=mean(a$np3-w))
naa1<-sum(ifelse(aa[,2]==0,1,0))
naa2<-sum(ifelse(aa[,4]==0,1,0))
naa3<-sum(ifelse(aa[,6]==0,1,0))
naasum<-c(naa1,naa2,naa3)
ncau1<-sum(ifelse(cau[,2]==0,1,0))
ncau2<-sum(ifelse(cau[,4]==0,1,0))
ncau3<-sum(ifelse(cau[,6]==0,1,0))
ncausum<-c(ncau1,ncau2,ncau3)
nall1<-sum(ifelse(all[,2]==0,1,0))
nall2<-sum(ifelse(all[,4]==0,1,0))
nall3<-sum(ifelse(all[,6]==0,1,0))
nallsum<-c(nall1,nall2,nall3)
# difference between structural nonparametric and nonparametric
# for African American girls
a<-data.frame(aa)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2), b3 = (a$p3-a$np3))
new3<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new3
finalnew<-rbind(new1,new2,new3)
colnames(finalnew)<-c('90th','95th','99th')
rownames(finalnew)<-c('all','cau','aa')
finalnew
finalbiases<-rbind(biasall,biasca,biasaa)
colnames(finalbiases)<-c('p90th','np90th','p95th','np95th','p99th','np99th')
rownames(finalbiases)<-c('all','cau','aa')
finalbiases
finalsum<-rbind(allsum,causum,aasum)
colnames(finalsum)<-c('90th','95th','99th')
rownames(finalsum)<-c('all','cau','aa')
finalsum
nfinalsum<-rbind(nallsum,ncausum,naasum)
colnames(nfinalsum)<-c('90th','95th','99th')
rownames(nfinalsum)<-c('all','cau','aa')
nfinalsum
# Bootstrap Confidence Band for African American girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,aa)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),ylab=one2,
xlab="Ages of AA girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of AA girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of AA girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of AA girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of AA girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of AA girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# simulating one data set
simdata <- function(n){
# i is the subject (columns)
# j is the time (rows)
# Yij <- 21.5 + 0.7*(tij - 5) - 0.05*(tij - 5)^2 + a0i + e1ij
# a0i ~ N(0, 2.5^2), e1ij ~ N(0, 0.5^2)
m <- 10
j <- seq(1, 10, by = 1)
u <- replicate(n, sapply(j, function(j) runif(1, j - 1, j)))
ui <- round(c(u))
a0i <- rep(rnorm(n, 0, 2.5), each = 10)
e1ij <- rnorm(n*m, 0, 0.5)
ccaa <- rep(rbinom(n, 1, .51), each = 10)  # race is fixed within subject
y <- 21.5 + 0.7*(ui - 5) - 0.05*(ui - 5)^2 + a0i + e1ij
d1 <- data.frame(ID = 1:length(y), age = ui, y1 = y, race = ccaa)
with(d1, split(d1, age))
}
# Quantiles from Regression Model
ti<-seq(0,10,by=1)
z<-21.5 + 0.7*(ti - 5) - 0.05*(ti - 5)^2
y90<-qnorm(.90,z,rep(sqrt(6.5),11))
y95<-qnorm(.95,z,rep(sqrt(6.5),11))
y99<-qnorm(.99,z,rep(sqrt(6.5),11))
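The quantiles y90, y95 and y99 above are defined so that, under the simulation model Y(t) ~ N(z(t), 6.5) with z(t) the quadratic mean curve (var 6.5 = 2.5^2 + 0.5^2), the true exceedance probabilities are exactly 0.10, 0.05 and 0.01. A quick check of this identity:

```r
# Under Y(t) ~ N(z(t), 6.5), the exceedance probability at the .90
# quantile is 0.10 at every time point by construction.
ti  <- seq(0, 10, by = 1)
z   <- 21.5 + 0.7 * (ti - 5) - 0.05 * (ti - 5)^2
y90 <- qnorm(0.90, z, rep(sqrt(6.5), 11))
1 - pnorm(y90, z, rep(sqrt(6.5), 11))  # all equal to 0.10
```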
# this function calculates the probabilities for a given n
# computation of overall probability
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1>y90[i],1,0))
np2<- mean(ifelse(y1>y95[i],1,0))
np3<- mean(ifelse(y1>y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
n<-1000
# simulating B probabilities
B <- 100
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2), two=m((a2-u)^2), three=m((a3-v)^2),
four=m((a4-v)^2), five=m((a5-w)^2), six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.03,.07)
m3<-c(0,.02)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of All girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of All girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of All girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of All girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of All girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of All girls", ylab = six2, ylim=m3)
# computing probabilities for caucasian girls
# for a given n
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]][which(X[[i]]$race==1),]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1> y90[i],1,0))
np2<- mean(ifelse(y1> y95[i],1,0))
np3<- mean(ifelse(y1> y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
# simulating B probabilities
B <- 1000
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2),two=m((a2-u)^2),three=m((a3-v)^2),
four=m((a4-v)^2),five=m((a5-w)^2),six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.025,.08)
m3<-c(0,.025)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of CC girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of CC girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of CC girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of CC girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of CC girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of CC girls", ylab = six2, ylim=m3)
# computation of african american probability
# for a given n
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]][which(X[[i]]$race==0),]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1> y90[i],1,0))
np2<- mean(ifelse(y1> y95[i],1,0))
np3<- mean(ifelse(y1> y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
# simulating B probabilities
B <- 1000
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2),two=m((a2-u)^2),three=m((a3-v)^2),
four=m((a4-v)^2),five=m((a5-w)^2),six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.025,.08)
m3<-c(0,.025)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of AA girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of AA girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of AA girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of AA girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of AA girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of AA girls", ylab = six2, ylim=m3)
# R code for Box-Cox Transformation
#lambda <- seq(-5, 5, length = 500)
require(MASS)  # boxcox()
t1data<-lapply(sdata,function(l) with(l, boxcox(SYSAV ~ AGE+RACE+HTAV,
data = l, plotit = FALSE)))
# lambdas for SBP
result_sbp <- t(sapply(t1data, function(d){
lambdas <- d$x[which.max(d$y)]
ll <- d$y[which.max(d$y)]
c(lambdas, ll)
}))
colnames(result_sbp) <- c('lambdas','logLik')
c<-result_sbp
lamda<-result_sbp[,1]
age<-seq(9.1,19,by=.1)
nlamda<-data.frame(cbind(age,lamda))
# Bootstrap confidence band
m <- with(nlamda, ksboot(age, lamda, nreps = 5000))
with(nlamda, plot(age, lamda, las = 1))
with(m, matpoints(x, m[, -1], type = 'l', col = c(1, 2, 2), lty = c(1, 2, 2)))
newdat<-cbind(age,lamda,slamda=m[,2])
# data sets with transformed SBP variable with raw lamda and smooth lamda
require(car)
newsbp <- vector("list", NROW(result_sbp))
for(i in 1:NROW(result_sbp)){
d <- sdata[[i]] # i-th data set
d$sbp_bcraw <- bcPower(d$SYSAV, newdat[i, 2])
d$sbp_bcsmooth <- bcPower(d$SYSAV, newdat[i, 3])
newsbp[[i]] <- d
}
# mean, min and max
msd<-matrix(NA,100,6)
for(i in 1:100){
d<-newsbp[[i]]
muraw<-mean(d$sbp_bcraw)
minraw<-min(d$sbp_bcraw)
maxraw<-max(d$sbp_bcraw)
musmooth<-mean(d$sbp_bcsmooth)
minsmooth<-min(d$sbp_bcsmooth)
maxsmooth<-max(d$sbp_bcsmooth)
msd[i,]<-c(muraw,minraw,maxraw,musmooth,minsmooth,maxsmooth)
}
colnames(msd)<-c('rmean','rmin','rmax','smean','smin','smax')
msd<-data.frame(msd)
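bcPower from the car package implements the Box-Cox power (y^lambda - 1)/lambda for lambda away from zero, approaching log(y) as lambda tends to 0. A small check of that relation on illustrative values (the toy SBP numbers below are not from the data):

```r
# Illustrative check of the Box-Cox transform used above.
library(car)
y      <- c(110, 120, 135)  # toy SBP values
lambda <- 0.5
all.equal(bcPower(y, lambda), (y^lambda - 1) / lambda)  # TRUE
bcPower(y, 1e-8)            # essentially log(y) as lambda -> 0
```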