Nonparametric Smoothing Estimation of Conditional Distribution Functions with
Longitudinal Data and Time-Varying Parametric Models
By Mohammed R. Chowdhury
B.Sc. in Statistics, August 2000, University of Chittagong, Bangladesh
M.Sc. in Statistics, November 2002, University of Chittagong, Bangladesh
M.A. in Statistics, July 2008, Ball State University, Muncie, Indiana, USA
A Dissertation submitted to
The Faculty of Columbian College of Arts and Sciences of The George Washington University
in partial satisfaction of the requirements for the degree of Doctor of Philosophy
January 31, 2014
Dissertation directed by
Colin Wu
Mathematical Statistician, NHLBI, National Institutes of Health, Bethesda, MD
Reza Modarres
Professor of Statistics
The Columbian College of Arts and Sciences of The George Washington Univer-
sity certifies that Mohammed R. Chowdhury has passed the Final Examination for
the degree of Doctor of Philosophy as of December 17, 2013. This is the final and
approved form of the dissertation.
Nonparametric Smoothing Estimation of Conditional Distribution Functions with
Longitudinal Data and Time-Varying Parametric Models
Mohammed R. Chowdhury
Dissertation Research Committee:
Colin Wu, Mathematical Statistician, NHLBI, National Institutes of Health, Bethesda,
MD, Dissertation Co-Director
Reza Modarres, Professor of Statistics, Dissertation Co-Director
Subrata Kundu, Associate Professor of Statistics, Committee Member
Yinglei Lai, Associate Professor of Statistics, Committee Member
Dedication
To my dear parents
Mohammed Mofizur Rahaman Chowdhury & Chemona Afroze
Acknowledgements
First, I thank The Almighty Allah for giving me the strength, patience
and ability to accomplish this research. I would like to thank my advisors Dr.
Colin Wu and Dr. Reza Modarres for their continuous support, encouragement and
guidance, which are evident throughout this work. I am indebted to Dr. Subrata
Kundu and Dr. Yinglei Lai for their helpful suggestions and constructive review of
this dissertation. I am also grateful to Dr. Paul Albert and Tatiyana Apanasovich
for their invaluable comments. Additionally, I would like to express appreciation to
my friends Li Cheung, Jorge Ivan Velez and others at the Department of Statistics
for their dear friendship and support. I also thank the National Heart, Lung, and
Blood Institute for providing the NGHS (National Growth and Health Study) data. The
National Growth and Health Study was supported by contracts NO1-HC-55023-26
and grants U01-HL48941-44 from the National Heart, Lung and Blood Institute. I
also thank my sister Zosna Afroze for her wholehearted support. Last, but not least,
I would like to thank my wife Nahida Akhter Irin, whose unconditional love and
devotion have provided comfort and joy. She has made all the difference at every
step of the way throughout this journey.
Abstract
Nonparametric Smoothing Estimation of Conditional Distribution Functions with Longitudinal Data and Time-Varying Parametric Models
The thesis is concerned with the nonparametric estimation of the conditional distri-
bution function with longitudinal data. Nonparametric estimation and inferences of
conditional distribution functions with longitudinal data have important applications
in biomedical studies, such as epidemiological studies and longitudinal clinical trials.
Estimation without any structural assumptions may lead to inadequate and numerically
unstable estimators in practice. In this dissertation, we propose a nonparametric
approach based on time-varying parametric models for smoothing estimation of the
conditional distribution functions with a longitudinal sample, and show that our local
polynomial smoothing estimator outperforms the existing Nadaraya-Watson kernel
smoothing estimator in terms of root MSE and confidence band length. In both
cases, we use the Epanechnikov kernel with bandwidth 2.5.
Our model assumes that the conditional distribution of the outcome variable at
each given time point can be approximated by a parametric model after a log
transformation or a local Box-Cox transformation, but the parameters are smooth functions
of time. Our estimation is based on a two-step smoothing method, in which we first
obtain the raw estimators of the conditional distribution functions at a set of dis-
joint time points, and then compute the final estimators at any time by smoothing
the raw estimators. Pointwise bootstrap confidence bands have been constructed for
both local polynomial smoothing estimators and Nadaraya-Watson kernel smoothing
estimators, resulting in a wider bootstrap confidence band for the Nadaraya-Watson
kernel smoothing estimator. Asymptotic properties, including the asymptotic bi-
ases, variances and mean squared errors, have been derived for the local polynomial
smoothed estimators. Asymptotic distribution of the raw estimators of the condi-
tional distribution functions has been derived.
Applications of our two-step estimation method have been demonstrated through
a large epidemiological study of childhood growth and blood pressure. In our NGHS
(National Growth and Health Study) application, we report that
(a) the Structural Nonparametric Model (SNM) performs better than the Unstructured
Nonparametric Model (UNM) in estimating raw as well as smoothed probabilities
over the entire set of time design points;
(b) African American (AA) girls have a higher probability of developing hypertension
than Caucasian (CC) girls;
(c) the Box-Cox transformation gives better results than the Log transformation;
(d) Smoothing-Early and Smoothing-Later give the same results when the Log
transformation is involved;
(e) Smoothing-Later is the only option when the Box-Cox transformation is involved.
Finite sample properties of our procedures are investigated through a simulation
study. We report that the root MSE at each of the 101 time design points is smaller
for the local polynomial smoothing estimator than for the Nadaraya-Watson kernel
smoothing estimator. The advantage in root MSE of the structural nonparametric
model over the unstructured nonparametric model is even stronger when extreme
conditional tail probabilities are estimated and smoothed.
Contents
Dedication iii
Acknowledgements iv
Abstract v
Contents vii
List of Figures x
List of Tables xiv
1 Chapter One 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Conditional Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Time-Varying Parameter Models . . . . . . . . . . . . . . . . 7
1.3.2 Extension to Continuous and Time-Varying Covariates . . . . 8
1.3.3 Time-Varying Nonparametric Models . . . . . . . . . . . . . . 9
1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . 9
2 Chapter Two 12
2.1 Two-Step estimation methods and inferences for time variant paramet-
ric models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.1 Raw estimates of parameter curves and distributions . . . . . 12
2.2 Smoothing Estimators of Conditional Distributions . . . . . . . . . . 13
2.2.1 Rationales of Smoothing Step . . . . . . . . . . . . . . . . . . 13
2.2.2 Smoothing-Early conditional CDF Estimators . . . . . . . . . 14
2.2.3 Smoothing-Later conditional CDF Estimators . . . . . . . . . 15
2.3 Two-Step estimation methods and inferences for unstructured non-
parametric models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Raw estimates of unstructured nonparametric CDF . . . . . . 17
2.3.2 Smoothing estimates of unstructured nonparametric CDF . . . 17
2.4 Bandwidth Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Bootstrap Pointwise Confidence Intervals . . . . . . . . . . . . . . . . 22
3 Chapter Three 24
3.1 Application to NGHS BP data . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4 Chapter Four 41
4.1 Asymptotic Properties of the Raw Estimators . . . . . . . . . . . . . 41
4.2 Asymptotic Properties of the Smoothing Estimators: . . . . . . . . . 43
5 Chapter Five 47
5.1 Time-Varying Models with Locally Transformed Variables . . . . . . 47
5.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6 Chapter Six 63
6.1 Discussion and Future Research . . . . . . . . . . . . . . . . . . . . . 63
7 Appendix 1: Preliminary Analysis 65
8 Appendix 2: Proof of Theoretical Results 133
8.1 A.1 Useful Approximation for the Equivalent Kernels . . . . . . . . . 133
8.2 A.2 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . 133
9 Appendix 3: R Code 137
10 References 178
List of Figures
1 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for Cau-
casian girls (CC) between 9.1 and 19.0 years old. (1a) and (1b): Esti-
mators based on the time-varying log-normal models. (1c)-(1d): Esti-
mators based on the unstructured kernel estimators. . . . . . . . . . . 30
2 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for African-
American (AA) girls between 9.1 and 19.0 years old. (1a) and (1b):
Estimators based on the time-varying log-normal models. (1c)-(1d):
Estimators based on the unstructured kernel estimators. . . . . . . . 31
3 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th and 95th population SBP percentiles for all girls
between 9.1 and 19.0 years old. (1a) and (1b): Estimators based on
the time-varying log-normal models. (1c)-(1d): Estimators based on
the unstructured kernel estimators. . . . . . . . . . . . . . . . . . . . 32
4 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific mean and standard de-
viation of SBP for All girls between 9.1 and 19.0 years old. Estimators
based on the time-varying log-normal models. . . . . . . . . . . . . . 33
5 Local linear smoothing estimators (solid curves), and pointwise boot-
strap 95% confidence intervals (dashed curves, B = 1000 bootstrap
replications) of the age specific probabilities of SBP greater than the
90th and 95th population SBP percentiles for all girls between 9.1 and
19.0 years old. Estimators based on the time-varying Gaussian models
with smoothing early approach. . . . . . . . . . . . . . . . . . . . . . 34
6 Black solid line is local polynomial (a,b) and Nadaraya-Watson (c,d)
smoothing estimators with Epanechnikov kernel for SNM and UNM.
Dotted lines represent the 95% pointwise bootstrap confidence band
for 1000 simulated samples. . . . . . . . . . . . . . . . . . . . . . . . 38
7 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
Caucasian girls (CC) between 9.1 and 19.0 years old. (a),(c),(e): Esti-
mators based on the time-varying Gaussian models. (b),(d),(f): Esti-
mators based on the unstructured kernel estimators. . . . . . . . . . . 51
8 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
African American girls (AA) between 9.1 and 19.0 years old. (a),(c),(e):
Estimators based on the time-varying Gaussian models. (b),(d),(f):
Estimators based on the unstructured kernel estimators. . . . . . . . 52
9 Raw estimators (scatter plots), smoothing estimators (solid curves),
and bootstrap pointwise 95% confidence intervals (dashed curves, B =
1000 bootstrap replications) of the age specific probabilities of SBP
greater than the 90th, 95th and 99th population SBP percentiles for
entire cohort between 9.1 and 19.0 years old. (a),(c),(e): Estimators
based on the time-varying Gaussian models. (b),(d),(f): Estimators
based on the unstructured kernel estimators. . . . . . . . . . . . . . . 53
10 Black solid line is local linear (a,c,e) and Nadaraya-Watson (b,d,f)
smoothing estimators with Epanechnikov kernel and bandwidth 2.5.
Dotted lines represent the 95% pointwise bootstrap confidence band
for 1000 simulated samples. . . . . . . . . . . . . . . . . . . . . . . . 59
11 QQplot of SBP after log transformation from the 1st data set to 12th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
12 QQplot of SBP after log transformation from the 13th data set to 24th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13 QQplot of SBP after log transformation from the 25th data set to 36th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
14 QQplot of SBP after log transformation from the 37th data set to 48th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
15 QQplot of SBP after log transformation from the 49th data set to 60th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
16 QQplot of SBP after log transformation from the 61st data set to 72nd
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
17 QQplot of SBP after log transformation from the 73rd data set to 84th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
18 QQplot of SBP after log transformation from the 85th data set to 96th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
19 QQplot of SBP after log transformation from the 97th data set to 100th
data set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
20 Local polynomial smoothing estimator of Box-Cox Lambda. . . . . . 132
List of Tables
1 Theoretical 90th and 95th quantiles for 10 different time points out of
101 different time points from the model of our simulation design. . . 35
2 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.10
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 39
3 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.05
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 40
4 Theoretical 90th, 95th and 99th quantiles for 10 different time points
out of 101 different time points from the model of our simulation design. 55
5 Averages of estimates, averages of the biases, the square root of the
mean squared errors, the empirical coverage probabilities of the empir-
ical quantile bootstrap pointwise 95% confidence intervals (B = 1000
bootstrap replications) for the estimation of P [Y (t) > y.90(t)] = 0.10
and relative Root MSE at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated
sample. The smoothing-later local linear estimators based on the time-
varying Gaussian model are shown in the left panel. The kernel esti-
mators based on the unstructured nonparametric model are shown in
the right panel. The Epanechnikov kernel and the LTCV bandwidth
h = 2.5 are used for all the smoothing estimators. . . . . . . . . . . . 60
6 Averages of the biases, the square root of the mean squared errors, and
the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals (B = 1000 bootstrap replications)
for the estimation of P [Y (t) > y.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0
over 1000 simulated sample. The smoothing-later local linear estima-
tors based on the time-varying Gaussian model are shown in the left
panel. The kernel estimators based on the unstructured nonparametric
model are shown in the right panel. The Epanechnikov kernel and the
LTCV bandwidth h = 2.5 are used for all the smoothing estimators. . 61
7 Averages of the biases, the square root of the mean squared errors, and
the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals (B = 1000 bootstrap replications)
for the estimation of P [Y (t) > y.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0
over 1000 simulated sample. The smoothing-later local linear estima-
tors based on the time-varying Gaussian model are shown in the left
panel. The kernel estimators based on the unstructured nonparametric
model are shown in the right panel. The Epanechnikov kernel and the
LTCV bandwidth h = 2.5 are used for all the smoothing estimators. . 62
8 P-values for normality tests of 100 data sets. SW, AD, KS, CVM and
ChiSq stand for the Shapiro-Wilk, Anderson-Darling, Kolmogorov-
Smirnov, Cramer-von Mises and Chi-Square tests, respectively. 65
9 Estimated Raw Probabilities (ERP) of SBP for entire cohort that ex-
ceed different quantiles of yq(t) (q= .90,.95,.99) by SNM and UNM. . 70
10 Estimated Raw Probabilities (ERP) of SBP for Caucasian Girls that
exceed different quantiles of yq(t) (q = .90, .95, .99) by SNM and UNM. . . 78
11 Estimated Raw Probabilities (ERP) of SBP for African American Girls
that exceed different quantiles of yq(t)(q= .90,.95,.99) by SNM and UNM. 86
12 Local linear smoothing estimates for µ(t) and σ(t) for 100 data sets of
entire cohort. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
13 Girls with median height and age specific log scaled SBP percentile
values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
14 Smoothing probabilities by local linear smoothing estimator and Nadaraya-
Watson Kernel smoothing estimator for entire cohort. . . . . . . . . . 104
15 Some values of bandwidth for entire cohort, Caucasian cohort and
African American cohort obtained by AIC cross validation method.
Cross validation scores are given in the parenthesis. . . . . . . . . . . 109
16 Some values of bandwidth for entire cohort, Caucasian cohort and
African American cohort obtained by LS cross validation method.
Cross validation scores are given in the parenthesis. . . . . . . . . . . 110
17 ML estimators and local polynomial smoothing estimators of Lambda
with their corresponding mean, minimum and maximum for each sub-
sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
18 ML estimators and their local polynomial smoothing estimators with
corresponding p-values by the Shapiro-Wilk (SW) Test for each sub-sample. 118
1 Chapter One
1.1 Introduction
Longitudinal data often appear in biomedical studies, where at least some of the
variables of interest from the independent subjects are repeatedly measured over
time. Existing methods of longitudinal analysis in the literature are mostly focused on
regression models based on the conditional means and variance-covariance structures.
Parametric and nonparametric methods for conditional mean and variance-covariance
based regression models may be found, among others, in Hart and Wehrly (1993),
Hoover et al. (1998), Fan and Zhang (2000), Lin and Carroll (2001), Verbeke and
Molenberghs (2005), James, Hastie and Sugar (2000), Diggle et al. (2002), Chiou,
Muller and Wang (2004), Hu, Wang and Carroll (2004), Qu and Li (2006), Senturk
and Muller (2006), Zhou, Huang and Carroll (2008) and Fitzmaurice et al. (2009).
These methods, although popular in practice, may not be adequate for estimating
the conditional distribution functions when the longitudinal variables of interest are
skewed or have significant deviation from normality.
A log transformation or a Box-Cox (1964) transformation is often needed to reduce
the skewness of the longitudinal data. More specifically, a time-variant local Box-Cox
transformation is required to induce normality in each of the data sets partitioned,
according to predetermined time points, from the original longitudinal sample. The
Box-Cox transformation system gives the power-normal (PN) family, whose members
include the normal and log-normal distributions. A detailed review of the Box-Cox
transformation can be found in Sakia (1992). Box and Cox (1964) proposed both
maximum likelihood and Bayesian methods for the estimation of the parameter λ.
We use only the maximum likelihood method for estimating the time-variant Box-Cox
parameter λ(t). Maximization of the likelihood function is done over a fixed power
transformation set, which we take to be [-2, 2]. When λ(t) = 0, a log transformation
is used. The Box-Cox (1964) transformation is defined as follows:
Y = (Z^λ − 1)/λ,  if λ ≠ 0
Y = log(Z),       if λ = 0
When λ varies with the time point t, we have the time-variant Box-Cox transformation:

Y(t) = (Z(t)^{λ(t)} − 1)/λ(t),  if λ(t) ≠ 0
Y(t) = log(Z(t)),               if λ(t) = 0
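To make the estimation step concrete, the grid-search maximum likelihood estimation of λ at a single time point can be sketched in Python as below. This is only an illustration of the standard Box-Cox profile likelihood over the fixed power set [-2, 2]; the function names and the 401-point grid are assumptions of this sketch, not part of the dissertation (whose own implementation is the R code in Appendix 3).

```python
import numpy as np

def boxcox_transform(z, lam):
    """Box-Cox transform: (z**lam - 1)/lam for lam != 0, log(z) for lam = 0."""
    if abs(lam) < 1e-8:
        return np.log(z)
    return (z ** lam - 1.0) / lam

def boxcox_loglik(z, lam):
    """Profile log-likelihood of lambda for positive data z, with mu and
    sigma^2 profiled out; the Jacobian term is (lam - 1) * sum(log z)."""
    y = boxcox_transform(z, lam)
    n = len(z)
    return -0.5 * n * np.log(np.var(y)) + (lam - 1.0) * np.sum(np.log(z))

def boxcox_mle(z, grid=None):
    """Grid-search ML estimate of lambda over the fixed power set [-2, 2]."""
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 401)  # illustrative grid resolution
    lls = np.array([boxcox_loglik(z, lam) for lam in grid])
    return grid[int(np.argmax(lls))]
```

For log-normally distributed data, the maximizer should fall near λ = 0, recovering the log transformation as a special case.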
The problem of modeling and estimating the conditional distribution functions of
a longitudinal variable is well motivated by a large prospective cohort study, namely
the National Growth and Health Study (NGHS), which has the main objective of
evaluating the temporal trends of cardiovascular risk factors, such as the systolic and
diastolic blood pressures (SBP, DBP) based on up to 10 annual follow-up visits of
2379 African-American and Caucasian girls during adolescence (NGHSRG, 1992).
Existing results of this study, such as Daniels et al. (1998), Thompson et al. (2007)
and Obarzanek et al. (2010), have evaluated the effects of age, height and other
covariates on the means and abnormal levels of SBP, DBP and other cardiovascular
risk factors. Because the conditional distributions of SBP and DBP over age are not
normal, the conditional mean based regression models employed by the above authors
do not lead to adequate estimates of the conditional distributions of SBP and DBP
for this adolescent population.
In an effort to directly model and estimate the conditional distribution functions of
longitudinal variables, Wu, Tian and Yu (2010) and Wu and Tian (2013b) studied an
estimation method based on a time-varying transformation model with time-varying
covariates, and Wu and Tian (2013a) studied a two-step smoothing method for time-
invariant covariates without assuming any specific modeling structures. Applying
their methods to the NGHS blood pressure (BP) data, these authors illustrated the
advantages of directly modeling and estimating the conditional distribution functions
over the conditional mean based models. Similar to the unstructured smoothing methods
of Hall, Wolff and Yao (1999) for independent and identically distributed (i.i.d.)
data, the smoothing method of Wu and Tian (2013a) could be numerically unstable
and lead to substantial estimation errors for estimating the conditional cumulative
distribution functions (CDF) near the boundary of the support. The structural mod-
eling approach of Wu, Tian and Yu (2010) is mainly for the purpose of reducing
the dimensionality associated with the time-varying covariates, which does not allevi-
ate potential estimation errors of the conditional CDF estimators near the boundary
points.
Motivated by the time-varying coefficient approaches in the literature (e.g., Hoover
et al., 1998), we propose in this dissertation a structural nonparametric approach for
the estimation of conditional distribution functions, and show that, when the struc-
tural assumptions hold, our approach may lead to estimators which are superior to the
unstructured smoothing estimators. Our approach relies on the assumption that the
conditional distribution function of the longitudinal variable of interest or its transfor-
mation (Log transformation or local Box-Cox transformation) at a given time point
follows a parametric family, but the parameters may vary when time changes. Our
longitudinal variable follows a “time-varying parametric model”, which is a special
case of the “structural nonparametric models” and maintains its flexibility by allow-
ing different parameter values at different time points. For the estimation method,
we propose a two-step smoothing procedure, which first obtains the raw estimators of
the conditional distribution functions based on the time-varying parametric family at
a set of distinct time points, and then computes the estimators at any time point by
smoothing the available raw estimators using a nonparametric smoothing procedure.
The two-step smoothing method can be applied in two ways, known as the
smoothing-early approach and the smoothing-later approach. In the smoothing-early
approach, we first smooth the raw parameter estimates and then plug the smoothed
estimates directly into the conditional distribution function to obtain the smoothing
estimate of the conditional distribution function. In the smoothing-later approach,
we use the raw parameter estimates to obtain raw estimates of the conditional
distribution function at each time point, and then smooth these raw estimates to
obtain the smoothing estimates of the conditional distribution function.
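The two approaches can be contrasted in code. The following is a hypothetical Python sketch for a time-varying Gaussian model with the Epanechnikov kernel, assuming the raw parameter estimates at the time design points are already available; all function names and the bandwidth default are illustrative assumptions, not the dissertation's implementation.

```python
import numpy as np
from math import erf, sqrt

def norm_sf(y, mu, sigma):
    """Gaussian tail probability P[Y > y] for Y ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 - erf((y - mu) / (sigma * sqrt(2.0))))

def local_linear_smooth(t_grid, raw, t0, h=2.5):
    """Local linear smoother with the Epanechnikov kernel, evaluated at t0."""
    t_grid = np.asarray(t_grid, dtype=float)
    u = (t_grid - t0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    # Weighted least squares fit of a line centered at t0;
    # the intercept is the smoothed value at t0.
    X = np.column_stack([np.ones_like(t_grid), t_grid - t0])
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ np.asarray(raw, dtype=float))
    return beta[0]

def smoothing_early(t_grid, mu_raw, sd_raw, y, t0, h=2.5):
    """Smooth the parameter curves first, then plug into the Gaussian tail."""
    mu_s = local_linear_smooth(t_grid, mu_raw, t0, h)
    sd_s = local_linear_smooth(t_grid, sd_raw, t0, h)
    return norm_sf(y, mu_s, sd_s)

def smoothing_later(t_grid, mu_raw, sd_raw, y, t0, h=2.5):
    """Compute raw tail probabilities at each time point, then smooth them."""
    p_raw = [norm_sf(y, m, s) for m, s in zip(mu_raw, sd_raw)]
    return local_linear_smooth(t_grid, p_raw, t0, h)
```

For constant parameter curves the two approaches give the same answer; they can differ when the parameter curves vary with t, which is one reason the two approaches must be distinguished.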
It should be mentioned that the smoothing-early approach is not applicable when the
Box-Cox power transformation is involved. The Box-Cox transformed Y (tj) differ
substantially between time points under the ML estimates of λ(tj), and when smoothed
estimates of the time-variant Box-Cox λ(t) are used, the transformed variable Y (t)
no longer belongs to the power-normal family. Even with the raw estimates of λ(tj),
we cannot smooth the Gaussian parameter estimates, because the fitted Gaussian
parameters vary widely across the raw estimates of λ(tj). Applying the smoothing-early
approach to parameter estimates with such large variation across time points would
not yield meaningful smoothed parameter curves. The smoothing-later approach is
therefore the only option when the time-variant Box-Cox transformation is applied
to induce normality. Table 17 of Appendix 1 shows the variation of the Box-Cox
transformed Y (t) under the ML estimates of λ(t). In the same table, we have also shown
the transformed Y (t) for smoothed λ(t). Table 17 also gives the average, minimum
and maximum for each sub-sample for ML estimate of Box-Cox λ(t) and smoothing
estimate of Box-Cox λ(t). Table 18 of Appendix 1 shows that 13 of the 100 sub-samples
are non-normal under the time-variant smoothed Box-Cox λ(t), whereas only
4 sub-samples are non-normal under the ML estimates of λ(t). In Figure 20, we have
shown the smoothed λ(t) together with 95% pointwise bootstrap confidence band.
Tables 17 and 18 and Figure 20 explain why smoothing-early estimation of the conditional
distribution function is not feasible with the time-variant Box-Cox λ(t).
The two-step smoothing procedure, which is similar to the ones used in Fan and
Zhang (2000) and Wu, Tian and Yu (2010), is computationally simple and easy to
implement in practice. For the practical properties, we demonstrate the clinical
interpretations and implications of our structural nonparametric approach over the
unstructured nonparametric method through an application to the NGHS BP data,
and investigate the finite sample properties of our procedures through a simulation
study. For the theoretical properties, we derive the asymptotic distributions of the
raw estimators and the asymptotic expressions for the biases, variances and mean
squared error for the two-step local polynomial estimators. These results show that
the smoothing step has the advantage of reducing the variability of the raw estimators
as well as giving estimates at all time design points.
1.2 Data Structure
We focus on the longitudinal samples with similar structures as the NGHS (NGH-
SRG, 1992), which are mathematically convenient and commonly appear in large
epidemiological studies. Within each sample, we have n independent subjects. For
the ith subject with 1 ≤ i ≤ n, we have mi ≥ 1 observations at time points
Ti = {tij ∈ τ ; j = 1, . . . ,mi}, where τ is the time interval containing all the time
range of interest. The total number of observations is N = ∑_{i=1}^{n} m_i. For
mathematical simplicity, we assume that the time points T = {Ti; i = 1, . . . , n} for the
sample are contained in the vector of J “disjoint time points” t = (t1, . . . , tJ)^T. In
biomedical studies, t is often obtained by rounding off age or other time variables
within an acceptable range of accuracy. Let Y (t) be a real-valued outcome variable
at any time point t ∈ τ, and let X be a time-invariant categorical covariate that takes
values x ∈ {1, . . . , K}.
The longitudinal sample for {Y (t), X, t} is denoted by Z = {Yi(tj), Xi, tj; 1 ≤ j ≤
J, i ∈ Sj}, where Sj is the set of subjects which have observations at time point tj.
This data structure is a special case of the ones in Obarzanek et al. (2010), Wu, Tian
and Yu (2010) and Wu and Tian (2013b) with a categorical, time-invariant covariate.
Let nj = #{i ∈ Sj} be the number of subjects in Sj and nj1j2 = #{i ∈ Sj1 ∩ Sj2} the
number of subjects in both Sj1 and Sj2 . Then, nj1j2 ≤ min {nj1 , nj2}. In the NGHS
applications of Wu, Tian and Yu (2010) and Wu and Tian (2013b), the time points in
t are specified by rounding up the ages of the NGHS subjects at the first decimal place,
which are chosen using the clinical definition of age for pediatric studies. Although
in general Y (t) may be a multivariate random variable, for mathematical simplicity,
our main results are restricted to the univariate Y (t). Extension to multivariate Y (t)
requires further modeling of the joint distributions and is hence beyond the scope of
this dissertation.
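As an illustration of this data structure, the sets Sj can be formed by rounding observation times onto the design grid, as in the following hypothetical Python fragment (the tuple layout is an assumption of this sketch, not the actual NGHS data format):

```python
from collections import defaultdict

def bin_to_design_points(observations):
    """Group longitudinal observations onto disjoint time design points.

    `observations` is a list of (subject_id, age, y) triples; ages are
    rounded to one decimal place, mimicking the convention of defining
    the grid t = (t_1, ..., t_J) by rounded age.
    """
    S = defaultdict(set)   # S[t_j]: subjects with an observation at t_j
    Y = defaultdict(list)  # Y[t_j]: outcome values observed at t_j
    for subject, age, y in observations:
        t_j = round(age, 1)
        S[t_j].add(subject)
        Y[t_j].append(y)
    return S, Y
```

From these sets, nj = len(S[tj]) and nj1j2 = len(S[tj1] & S[tj2]), which makes the inequality nj1j2 ≤ min{nj1, nj2} immediate.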
Remark 1.1. We assume a categorical, time-invariant covariate X for simplicity of
the mathematical expressions and biological interpretations. When continuous
or time-dependent covariates are included in the model, the nonparametric estimation
methods require multivariate smoothing, which could be computationally intensive,
sometimes infeasible, and difficult to interpret in practice. In Section 1.3.2,
a useful dimension reduction approach based on time-varying transformation models
(Wu, Tian and Yu, 2010) is discussed. This approach could be extended to the
estimation of conditional probabilities with continuous and time-varying covariates,
but this extension requires further methodological and theoretical developments beyond
the ones provided in this dissertation. In the present context, X = 1 represents
Caucasian girls and X = 2 represents African American girls.
Remark 1.2. The data structure described in this section is consistent with
the data formulation used in the NGHS publications, such as Daniels et al. (1998),
Obarzanek et al. (2010), Wu, Tian and Yu (2010) and Wu and Tian (2013a). In
both data formulations, the time points {t_ij; 1 ≤ j ≤ mi, 1 ≤ i ≤ n} have J > 1
distinct possible values in t, which are referred to in Wu, Tian and Yu (2010) and Wu
and Tian (2013b) as the “time design points”. Each of the n independent subjects
has actual visit times within a subset of t. If the ith subject is observed at time point
tj, the corresponding outcome variable is Yi(tj), which may not be the same as Yi(t_ij),
since tj and t_ij are not necessarily equal.
1.3 Conditional Distribution
1.3.1 Time - Varying Parameter Models
For any given t ∈ τ , our objective is to estimate the conditional cumulative distribu-
tion functions (CDF) Ft,θ(t)[y(t)|x] = P [Y (t) ≤ y(t)|t,X = x] for some given curve
y(t) on τ based on the longitudinal sample Z. Other conditional probabilities may
be obtained using the functionals of Ft,θ(t)[y(t)|x]. The choices of y(t) depend on the
scientific objectives of the analysis. For example, hypertension and abnormal levels
of blood pressure for children and adolescents are defined by gender and age specific
blood pressure quantiles (e.g., NHBPEP, 2004), so that it is often meaningful in pe-
diatric studies to evaluate the conditional CDFs with y(t) chosen as a pre-determined
gender- and age-specific blood pressure quantile curve.
The two-step estimation method of Wu and Tian (2013a) relies on kernel smooth-
ing estimators of the raw empirical CDFs F_{tj}[y(tj)|X = x] for j = 1, . . . , J, without
assuming any parametric structure for F_t[y(t)|X = x]. When F_t(·) belongs to a
parametric family at each t ∈ τ, we have a time-varying parametric model
Fθ(t) = {Ft,θ(t)(·); θ(t) ∈ Θ}, (1)
where θ(t) is the vector of time-varying parameters, which belongs to an open parameter
space Θ. Under F_{θ(t)}, one would naturally expect that a smoothing estimator, which
effectively utilizes the local parameter structure of Ft,θ(t)(·), could be superior to the
unstructured two-step smoothing estimators of Wu and Tian (2013a). For the special
case of a time-varying normal distribution, F_{t,θ(t)}[y(t)] for given t ∈ τ with mean and
variance curves θ(t) = (µ(t), σ²(t))ᵀ is given by

F_{t,θ(t)}[y(t)] = ∫_{−∞}^{y(t)} [1/(√(2π) σ(t))] exp[−{s − µ(t)}²/(2σ²(t))] ds.  (2)
Conditional CDFs for time-varying parametric models other than (2) may be similarly
obtained. We note that the conditional CDF F_{t,θ(t)}(·) allows Y(t) to be either a
continuous or a discrete random variable. If a Box-Cox transformation is involved, the
parameter curves will be θ(t) = (µ(t), σ²(t), λ(t))ᵀ.
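For illustration, (2) can be evaluated directly through the error function. The following is a minimal Python sketch (the dissertation's own code, in R, is in Appendix 3); the parameter curves µ(t), σ²(t) and the threshold curve y(t) below are hypothetical values, not estimates from the NGHS data:

```python
from math import erf, sqrt

def gaussian_cdf(y, mu, sigma2):
    # Normal CDF via the error function: P[Y <= y] for Y ~ N(mu, sigma2).
    return 0.5 * (1.0 + erf((y - mu) / sqrt(2.0 * sigma2)))

# Hypothetical parameter curves mu(t), sigma^2(t) and threshold curve y(t)
# on a grid of time design points (all values assumed, for illustration).
t_grid = [9.1 + 0.1 * j for j in range(100)]
mu_curve = [4.6 + 0.004 * (t - 9.0) for t in t_grid]
sigma2_curve = [0.006] * len(t_grid)
y_curve = [4.75] * len(t_grid)

# F_{t,theta(t)}[y(t)] of (2), evaluated pointwise along the curves.
F_curve = [gaussian_cdf(y, m, s2)
           for y, m, s2 in zip(y_curve, mu_curve, sigma2_curve)]
```

Other members of the family F_{θ(t)} only change `gaussian_cdf`; the pointwise evaluation along the curves is the same.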
1.3.2 Extension to Continuous and Time-Varying Covariates
When there are continuous and time-varying covariates, we denote the covariate vec-
tor by X(t). The conditional CDF to be estimated is

F_t[y(t)|X(t) = x(t)] = P[Y(t) ≤ y(t)|t, X(t) = x(t)].  (3)
Nonparametric estimation of Ft [y(t)|X(t) = x(t)] would require a multivariate smooth-
ing method over both x(t) and t, which could be difficult to compute, particularly
when the dimensionality of X(t) is high, due to the well-known problem of the “curse
of dimensionality”. A potentially useful approach is to model the distribution of Y(t)
given X(t) and then to estimate the conditional CDF from the fitted model. The
time-varying transformation models of Wu, Tian and Yu (2010) and Wu and Tian
(2013b) rely on modeling the dependence of Ft [y(t)|X(t)] on X(t) through a “struc-
tural nonparametric model” determined by the coefficient curves. However, further
research is needed to develop appropriate time-varying parametric families Fθ(t) and
their estimation procedures for estimating the conditional CDFs Ft [y(t)|X(t)] when
X(t) involves continuous and time-varying components.
1.3.3 Time-Varying Nonparametric Models
For a given t ∈ τ, our interest is to estimate the conditional probability F_{Y(t)}[y(t)|x] =
P[Y(t) ≤ y(t)|X = x] by the empirical method. The choices of y(t) depend on the scien-
tific objectives of the analysis. More generally, it is often useful in practice to allow
y(t) to change with t. For example, health status for children and adolescents is
often defined by gender- and age-specific risk categories (e.g., NHBPEP, 2004). Thus,
in pediatric studies, it is often meaningful to evaluate the conditional CDFs defined
as above, where y(t) is a pre-determined gender- and age-specific risk-threshold curve.
For both the structural nonparametric and unstructured nonparametric methods, we
use the 90th, 95th and 99th percentile values as y(t) for girls with median height.
1.4 Organization of the Dissertation
Chapter 2 proposes a structural nonparametric model (SNM) for smoothing estima-
tion of the time-varying conditional distribution function. A local polynomial smoothing
estimator for the structural nonparametric model is developed in this chapter.
Chapter 2 also discusses the smoothing-early and smoothing-later approaches, and
examines the existing Nadaraya-Watson kernel smoothing approach for the
conditional distribution function. To estimate the time-varying conditional CDF from
the structural nonparametric model, we assume that our variable of interest, systolic
blood pressure (SBP), follows a parametric model after log transformation. The
parameters of the model vary over the time design points and are estimated by the
maximum likelihood method. By plugging these raw estimates into the time-dependent
parametric model, we estimate the time-varying conditional CDF at some pre-specified
percentile curves, popularly known as quantile curves y(t). These estimates of the
conditional CDF are considered the raw estimates. As most time-dependent raw
estimates in biomedical studies show spiky behavior, we smooth these raw conditional
CDF estimates by local polynomial smoothing so that a smooth curve estimator over
the entire set of time design points can be constructed. This approach is known as the
smoothing-later approach. In the smoothing-early approach, we instead smooth the
raw estimators of the parameters and then plug these smoothed parameter estimators
into the conditional CDF formula to obtain the smoothing estimates of the conditional
CDF over the entire set of time design points. The unstructured nonparametric model
(UNM) and the Nadaraya-Watson kernel smoothing estimator are discussed for
comparison with the structural nonparametric model and the local polynomial
smoothing estimator.
In Chapter 3, we present an application of the above two methods to the NGHS
data and also conduct a simulation study designed to be similar to the NGHS
study. In the simulation study, we show that the root MSE is smaller for the structural
nonparametric model than for the unstructured nonparametric model at each of the 101
time design points. The relative root MSE is always less than 1 at each time design
point, which indicates that the SNM is more efficient than the UNM. The wider 95%
pointwise bootstrap confidence band for the UNM also indicates that the SNM is
better than the UNM.
In Chapter 4, we derive the asymptotic distribution of the estimators of the
conditional CDF. The variance, bias and MSE of the smoothing estimators of the
conditional distribution function from the structural nonparametric model are explicitly
presented in this chapter.
In Chapter 5, we use the local Box-Cox transformation, repeat the application of
Chapter 3, and show that the local Box-Cox power transformation produces better
results than the global log transformation. Asymptotic results for the smoothing
estimator involving the Box-Cox λ(t) are not derived and are left for future research.
Other future research directions are discussed in Chapter 6. Further details of the
theoretical derivations are given in Appendix 2. Preliminary data analysis, exploration
and visualization are presented in Appendix 1. R code is provided in Appendix 3.
2 Chapter Two
2.1 Two-step estimation methods and inference for time-varying parametric models
Similar to the estimation approach of Wu and Tian (2013b), we derive here a two-
step smoothing method for the estimation of the conditional distribution functions,
in which we first compute the raw estimates of θ(tj|x) and F_{tj,θ(tj|x)}[y(tj)|x] for all
j = 1, . . . , J, and then derive the smoothing estimates of θ(t|x) and F_{t,θ(t|x)}[y(t)|x] for
any t ∈ τ by applying a smoothing procedure over the corresponding raw estimates
at t. This two-step smoothing approach is computationally simple and does not require
correlation assumptions across different time points.
2.1.1 Raw estimates of parameter curves and distributions
We derive the estimators θ̂(tj|x) and F̂_{tj,θ̂(tj|x)}[y(tj)|x] of θ(tj|x) and F_{tj,θ(tj|x)}[y(tj)|x],
respectively, using the observations at time tj ∈ t. Suppose that we have enough obser-
vations nj at tj, so that θ(tj|x), tj ∈ t, can be estimated by the maximum likeli-
hood estimator (MLE) θ̂(tj|x) using the subjects in Sj. Substituting θ(tj|x) with
θ̂(tj|x), the corresponding raw estimator of F_{tj,θ(tj|x)}[y(tj)|x] is F̂_{tj,θ̂(tj|x)}[y(tj)|x] =
F_{tj,θ̂(tj|x)}[y(tj)|x]. For the time-varying Gaussian model, F̂_{tj,θ̂(tj|x)}[y(tj)|x] is given
by substituting θ(tj|x) with the MLE θ̂(tj|x) = (µ̂(tj|x), σ̂²(tj|x))ᵀ in (2).
In practice, these raw estimators require the number of observations nj at tj to be
sufficiently large, so that they can be computed numerically. When the local sample
size nj is not sufficiently large, we can round off or group some of the adjacent time
points into small bins, and compute the raw estimates within each bin. This round-off
or binning approach has been used by Fan and Zhang (2000), Wu, Tian and Yu
(2010) and Wu and Tian (2013b). But the effects of round-off or binning on the
asymptotic properties of the smoothing estimators have not been investigated in the
literature. In biomedical studies, such as the NGHS, the unit of time is often rounded
off into an acceptable precision, so that numerical computations of the raw estimators
in such studies are possible.
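The raw estimation step for the time-varying Gaussian model can be sketched as follows; the data are simulated stand-ins for the longitudinal observations (sample sizes and parameter values assumed, not NGHS values):

```python
import random
from math import erf, sqrt

def raw_estimates(obs_by_time, y):
    """Raw MLEs (mu_hat, sigma2_hat) at each time design point t_j, and the
    plug-in raw CDF estimate obtained by substituting them into (2)."""
    out = []
    for obs in obs_by_time:                      # obs = {Y_i(t_j) : i in S_j}
        n_j = len(obs)
        mu_hat = sum(obs) / n_j
        sigma2_hat = sum((v - mu_hat) ** 2 for v in obs) / n_j  # MLE divides by n_j
        F_hat = 0.5 * (1.0 + erf((y - mu_hat) / sqrt(2.0 * sigma2_hat)))
        out.append((mu_hat, sigma2_hat, F_hat))
    return out

# Simulated data: 60 subjects observed at each of J = 5 time points.
random.seed(1)
obs_by_time = [[random.gauss(4.6, 0.08) for _ in range(60)] for _ in range(5)]
raw = raw_estimates(obs_by_time, y=4.75)
```

When some nj are too small to support the MLE, the binning described above amounts to concatenating the lists for adjacent time points before calling `raw_estimates`.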
2.2 Smoothing Estimators of Conditional Distributions
2.2.1 Rationales of Smoothing Step
There are two reasons to use the smoothing step in addition to the raw estimates.
First, the raw estimates are only for the coefficients or conditional CDFs at time
points in t, while the smoothing step leads to curve estimates over the entire time
range τ . Second, the raw estimates usually have excessive variations, so that their
values may change dramatically among adjacent time points in t. Given that spiky
estimates may not have meaningful biological interpretations, the smoothing step
should be used to reduce the variation by sharing information from the adjacent time
points. Theoretical justifications of the smoothing are discussed in Chapter 4.
There are two different two-step smoothing estimators for F_{t,θ(t|x)}[y(t)|x]. The
first approach, referred to herein as the “smoothing-early approach”, is to obtain the
smoothing estimators θ̂(t|x) of θ(t|x), and then estimate F_{t,θ(t|x)}[y(t)|x] by the “plug-
in” smoothing estimator F_{t,θ̂(t|x)}[y(t)|x]. The second approach, referred to herein as
the “smoothing-later approach”, is to obtain the raw estimators F̂_{tj,θ̂(tj|x)}[y(tj)|x]
for all j = 1, . . . , J, and then estimate F_{t,θ(t|x)}[y(t)|x] by a smoothing estimator
F̂_{t,θ̂(t|x)}[y(t)|x] based on F̂_{tj,θ̂(tj|x)}[y(tj)|x], j = 1, . . . , J. Both of these smoothing
approaches lead to appropriate conditional CDF estimators in practice.
2.2.2 Smoothing-Early conditional CDF Estimators
Suppose that θ(t|x) is (p + 1) times differentiable with respect to t ∈ τ. Let θ^{(q)}(t|x)
be the qth derivative of θ(t|x), 0 ≤ q ≤ p, and βq(t|x) = θ^{(q)}(t|x)/q!. By the Taylor
expansion of θ(t|x),

θ(t|x) ≈ Σ_{q=0}^{p} βq(s0|x) (t − s0)^q

for t in some neighborhood of s0. We can treat the raw estimates θ̂(tj|x) as the
“observations” of θ(tj|x) at tj, j = 1, . . . , J, and obtain the pth order local polynomial
estimators by minimizing

Σ_{j=1}^{J} {θ̂(tj|x) − Σ_{q=0}^{p} βq(t|x)(tj − t)^q}² Kh(tj − t),

where Kh(tj − t) = K[(tj − t)/h]/h, K(·) is a non-negative kernel function, and h > 0 is
a bandwidth. Using the matrix formulation, we define the vector of raw estimates
θ̂ = (θ̂(t1|x), . . . , θ̂(tJ|x))ᵀ, β(t|x) = (β0(t|x), . . . , βp(t|x))ᵀ, G(t; h) = diag{Kh(tj − t)}
with jth column Gj(t; h) = (0, . . . , Kh(tj − t), . . . , 0)ᵀ, and Tp(t) the J × (p + 1) matrix
with its jth row given by Tj,p(t) = (1, tj − t, . . . , (tj − t)^p). The local polynomial
estimators β̂q(t|x) minimize

QG[β(t|x)] = [θ̂ − Tp(t)β(t|x)]ᵀ G(t; h) [θ̂ − Tp(t)β(t|x)].

The pth order local polynomial estimator of θ^{(q)}(t|x) based on θ̂(tj|x), which minimizes
QG[β(t|x)], is

θ̂^{(q)}(t|x) = Σ_{j=1}^{J} Wq,p+1(tj, t; h) θ̂(tj|x),  (4)

where Wq,p+1(tj, t; h) = q! e_{q+1,p+1} [Tpᵀ(t) G(t; h) Tp(t)]^{−1} Tpᵀ(t) Gj(t; h) is the “equiv-
alent kernel function” (e.g., Fan and Zhang, 2000) and e_{q+1,p+1} is the row vector of
length p + 1 with 1 at its (q + 1)th place and 0 elsewhere.
By the definition of β(t|x), we have β̂(t|x) = (β̂0(t|x), . . . , β̂p(t|x))ᵀ and θ̂^{(q)}(t|x) =
β̂q(t|x) q! for q = 0, . . . , p. The pth order local polynomial estimator θ̂(t|x) of θ(t|x)
is θ̂(t|x) = θ̂^{(0)}(t|x). For the special case of p = 1, we get the local linear esti-
mator θ̂L(t|x) = β̂0(t|x) of θ(t|x) based on (4) and the equivalent kernel function
W0,2(tj, t; h). Following (1) and (4), we substitute θ(t|x) with θ̂(t|x) and define the
smoothing estimator of F_{t,θ(t|x)}[y(t)|x] based on θ̂(t|x) to be

F̂_{t,θ̂(t|x)}[y(t)|x] = F_{t,θ̂(t|x)}[y(t)|x],  (5)

where a common choice of θ̂(t|x) is the local linear estimator θ̂L(t|x).
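For the local linear case, the smoothing step reduces to a weighted least squares fit at each t, whose intercept is the equivalent-kernel estimate. A minimal Python sketch, with a hypothetical spiky raw series standing in for the raw parameter estimates, is:

```python
def epanechnikov(u):
    # Epanechnikov kernel, the weighting function used in this chapter.
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_linear(t_grid, raw, t, h):
    """Local linear smoother (p = 1): weighted least squares of the raw
    estimates on (t_j - t); the intercept beta_0(t) estimates theta(t)."""
    w = [epanechnikov((tj - t) / h) / h for tj in t_grid]
    s0 = sum(w)
    s1 = sum(wj * (tj - t) for wj, tj in zip(w, t_grid))
    s2 = sum(wj * (tj - t) ** 2 for wj, tj in zip(w, t_grid))
    m0 = sum(wj * r for wj, r in zip(w, raw))
    m1 = sum(wj * r * (tj - t) for wj, r, tj in zip(w, raw, t_grid))
    det = s0 * s2 - s1 * s1
    return (s2 * m0 - s1 * m1) / det       # closed-form beta_0(t)

# Spiky "raw estimates" around a linear trend on an assumed grid.
t_grid = [0.1 * j for j in range(51)]
raw = [2.0 + 0.3 * t + (0.05 if j % 2 else -0.05) for j, t in enumerate(t_grid)]
smooth = [local_linear(t_grid, raw, t, h=0.5) for t in t_grid]
```

The same function smooths either the raw parameter estimates (smoothing-early) or the raw conditional CDF estimates (smoothing-later); only the `raw` series changes.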
2.2.3 Smoothing-Later conditional CDF Estimators
Suppose that F_{t,θ(t|x)}[y(t)|x] is (p + 1) times differentiable with respect to t ∈ τ. Let
F^{(q)}_{t,θ(t|x)}[y(t)|x] be the qth derivative of F_{t,θ(t|x)}[y(t)|x], 1 ≤ q ≤ p, and γq(t|x) =
F^{(q)}_{t,θ(t|x)}[y(t)|x]/q!. By the Taylor expansion of F_{t,θ(t|x)}[y(t)|x],

F_{t,θ(t|x)}[y(t)|x] ≈ Σ_{q=0}^{p} γq(s0|x) (t − s0)^q

for t in some neighborhood of s0, we can treat the raw estimates F̂_{tj,θ̂(tj|x)}[y(tj)|x] as
the “observations” of F_{tj,θ(tj|x)}[y(tj)|x] at tj, j = 1, . . . , J, and obtain the pth local
polynomial estimators by minimizing

Σ_{j=1}^{J} {F̂_{tj,θ̂(tj|x)}[y(tj)|x] − Σ_{q=0}^{p} γq(t|x)(tj − t)^q}² Kh(tj − t)

with Kh(tj − t) = K[(tj − t)/h]/h and h > 0.

Let F(t|x) = F_{t,θ(t|x)}[y(t)|x] and let F̂ = (F̂(t1|x), . . . , F̂(tJ|x))ᵀ be the vector of raw
estimates F̂(tj|x) = F̂_{tj,θ̂(tj|x)}[y(tj)|x]. Also let γ(t|x) = (γ0(t|x), . . . , γp(t|x))ᵀ,
G(t; h) = diag{Kh(tj − t)} with jth column Gj(t; h) = (0, . . . , Kh(tj − t), . . . , 0)ᵀ, and
Tp(t) the J × (p + 1) matrix with its jth row given by Tj,p(t) = (1, tj − t, . . . , (tj − t)^p).
The local polynomial estimators γ̂q(t|x) minimize

QG[γ(t|x)] = {F̂ − Tp(t)γ(t|x)}ᵀ G(t; h) {F̂ − Tp(t)γ(t|x)},

and, consequently, the pth order local polynomial estimator of F^{(q)}_{t,θ(t|x)}[y(t)|x] based
on F̂_{tj,θ̂(tj|x)}[y(tj)|x], which minimizes QG[γ(t|x)], is

F̂^{(q)}_{t,θ̂(t|x)}[y(t)|x] = Σ_{j=1}^{J} Wq,p+1(tj, t; h) F̂_{tj,θ̂(tj|x)}[y(tj)|x],  (6)

where Wq,p+1(tj, t; h) is the “equivalent kernel function” defined in (4). By the defini-
tion of γ(t|x), we have γ̂(t|x) = (γ̂0(t|x), . . . , γ̂p(t|x))ᵀ and F̂^{(q)}_{t,θ̂(t|x)}[y(t)|x] = γ̂q(t|x) q!
for q = 0, . . . , p. Following (6), the smoothing-later pth order local polynomial esti-
mator of F_{t,θ(t|x)}[y(t)|x] is

F̂_{t,θ̂(t|x)}[y(t)|x] = F̂^{(0)}_{t,θ̂(t|x)}[y(t)|x].  (7)

For the special case of p = 1, the local linear estimator of F_{t,θ(t|x)}[y(t)|x] based on
(7) is F̂_{t,θ̂(t|x)}[y(t)|x] = γ̂0(t|x) with the equivalent kernel W0,2(tj, t; h).
2.3 Two-step estimation methods and inference for unstructured nonparametric models
In this approach, we estimate the empirical conditional CDF at each time point
and then apply the Nadaraya-Watson kernel smoothing method to obtain the smoothing
estimators of the conditional distribution function over the entire set of time design points.
2.3.1 Raw estimates of unstructured nonparametric CDF
We define the indicator variable

I{Y(t) ≤ y(t)} = 1 if Y(t) ≤ y(t), and 0 otherwise.

The percentile values used for y(t) come from a previous standard study. The raw
estimate of the CDF at each time design point tj is the empirical proportion

(1/nj) Σ_{i∈Sj} I{Yi(tj) ≤ y(tj)}, j = 1, . . . , J,

where i indexes subjects and j indexes time design points. These raw empirical estimates
of the conditional CDF show spiky behavior at different time points; the dots in
Figures 1(c, d), 2(c, d) and 3(c, d) show these spiky patterns. In addition, this approach
suffers from a major problem of non-existence when the quantile y(t) is at an extreme
tail. For example, if y(t) is the 99th percentile, the empirical CDF estimates at some
time points can equal 1, which leaves no estimate of the tail probability, i.e., no
estimate of P[Y(t) > y(t)|t].
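A small simulation illustrates both the raw empirical estimates and this tail non-existence problem; the data and sample sizes below are assumed, chosen only to make the degeneracy visible:

```python
import random

def empirical_cdf_raw(obs_by_time, y_curve):
    # (1/n_j) * sum over i in S_j of I{Y_i(t_j) <= y(t_j)}, for each j.
    return [sum(1 for v in obs if v <= y) / len(obs)
            for obs, y in zip(obs_by_time, y_curve)]

# Simulated example: 40 subjects per time point and y(t) fixed at the
# 99th percentile (2.326) of the standard normal distribution.
random.seed(7)
obs_by_time = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(20)]
F_raw = empirical_cdf_raw(obs_by_time, y_curve=[2.326] * 20)

# Many raw estimates equal exactly 1, so the tail probability
# P[Y(t) > y(t)|t] has no estimate at those time points.
n_degenerate = sum(1 for f in F_raw if f == 1.0)
```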
2.3.2 Smoothing estimates of unstructured nonparametric CDF
Estimation of the conditional CDF F_t[y(t)|x] = P[Y(t) ≤ y(t)|t, X = x] with
unstructured nonparametric models has been investigated by Wu and Tian (2013a)
using a kernel smoothing method. Let wi be a weight function, which may be either
1/(nmi) or 1/N. The kernel estimator of Wu and Tian (2013a) for F_t[y(t)|x] is

F̂_t[y(t)|x] = [Σ_{j=1}^{J} Σ_{i∈Sj} wi 1[Yi(tj) ≤ y(t), Xi = x] Kh(tj − t)] / [Σ_{j=1}^{J} Σ_{i∈Sj} wi Kh(tj − t)],  (8)
where 1[·] is an indicator function and Kh(tj − t) = K[(tj − t)/h]/h for some ker-
nel function K(·). By smoothing the indicator 1[Yi(tj) ≤ y(t), Xi = x] through the
kernel weight wi Kh(tj − t) for all i ∈ Sj and j = 1, . . . , J, the estimator F̂_t[y(t)|x] of
(8) can be generally applied to conditional distributions which may not belong to the
time-varying parametric family F_{θ(t)}. In contrast, the local polynomial estimators of (5)
and (7) depend on the time-varying parametric assumption of F_{θ(t|x)}, and may lead to
biased estimates when the structural assumption of F_{θ(t|x)} is not satisfied. Table 14
of Appendix 1 gives the smoothing probabilities obtained by the local linear smoothing
estimator and the Nadaraya-Watson kernel smoothing estimator for the entire cohort.
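A minimal Python sketch of the Nadaraya-Watson estimator (8), with the weight wi = 1/N and the Epanechnikov kernel, on simulated data (all values assumed), is:

```python
import random

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def nw_cdf(sample, t, y, h):
    """Nadaraya-Watson estimator (8) with weights w_i = 1/N: a kernel
    weighted average of the indicators 1[Y_i(t_j) <= y] over all pairs."""
    num = den = 0.0
    for tj, yij in sample:
        k = epanechnikov((tj - t) / h) / h
        num += (1.0 if yij <= y else 0.0) * k
        den += k
    return num / den if den > 0.0 else float("nan")

# Simulated longitudinal sample: Y(t) ~ N(t, 1) at each of the time design
# points 0.0, 0.1, ..., 2.0, with 30 subjects per point (values assumed).
random.seed(3)
t_points = [0.1 * j for j in range(21)]
sample = [(tj, random.gauss(tj, 1.0)) for tj in t_points for _ in range(30)]

# At t = 1.0 and y = 1.0 the true conditional CDF is 0.5.
F_hat = nw_cdf(sample, t=1.0, y=1.0, h=0.3)
```

Because it averages raw indicators rather than a fitted parametric CDF, this estimator needs no distributional assumption, at the price of the boundary behavior discussed in Remark 2.1.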
Remark 2.1. Although it has been shown by Wu and Tian (2013a) that F̂_t[y(t)|x]
has adequate properties when y(t) is within the interior of the support of Y(t), in
practice F̂_t[y(t)|x] may have large bias and variance when y(t) is near the boundary
of the support of Y(t). Intuitively, when y(t) is near the boundary of the support of
Y(t) and the number of observed Yi(tj) within the small neighborhood of t defined
by Kh(tj − t) is small, the values of 1[Yi(tj) ≤ y(t), Xi = x] are likely to be all 0 or all 1,
which may force F̂_t[y(t)|x] to be either 0 or 1. The smoothing estimators of (5)
and (7), however, compute the conditional distribution function F_{t,θ(t)}[y(t)|x] based
on the parametric model F_{θ(t|x)} and the estimators of the time-varying parameter
θ(t|x). The mean squared errors of the estimators (5) and (7) mainly depend
on the estimators of θ(t|x) and are less affected by the values of y(t). We compare
the estimators of (5) and (7) with F̂_t[y(t)|x] in the simulation study of Chapter 3.
2.4 Bandwidth Choices
The bandwidths of (5) and (7) may be selected either subjectively, by examining the
plots of the estimated parameter curves, or using a data-driven bandwidth selection
procedure. As demonstrated by the simulation studies in nonparametric estimation
with two-step local polynomial estimators, such as Fan and Zhang (2000), Wu, Tian
and Yu (2010) and Wu and Tian (2013a, 2013b), subjective bandwidth choices ob-
tained from examining the fitted curves of the estimators often produce appropriate
bandwidths in real applications.
Two cross validation approaches, the “Leave-One-Subject-Out Cross Validation”
(LSCV) and the “Leave-One-Time-Point-Out Cross Validation” (LTCV), have been
proposed by Wu and Tian (2013a, 2013b) for the selection of data-driven bandwidths
under the unstructured nonparametric models. These cross validation approaches can
be extended to the smoothing estimators (5) and (7) to provide a potential range of
suitable bandwidths. Let F̂^{(−i)}_{t,θ̂(t|x)}[y(t)|x], 1 ≤ i ≤ n, be either the estimator (5)
or (7) of F_{t,θ(t|x)}[y(t)|x] computed from the sample with all the observations of the
ith subject deleted, and let wi be a weight function which could be either 1/(nmi) or
1/N. The LSCV bandwidth h_{x,LSCV} is the minimizer of the LSCV score

LSCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}².  (9)
For a heuristic justification of LSCV[y(·), x], we can consider the expansion

LSCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F_{tj,θ(tj|x)}[y(tj)|x]}²
  + Σ_{j=1}^{J} Σ_{i∈Sj} wi {F_{tj,θ(tj|x)}[y(tj)|x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}²
  + 2 Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F_{tj,θ(tj|x)}[y(tj)|x]} {F_{tj,θ(tj|x)}[y(tj)|x] − F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]}.  (10)
The first term on the right-hand side of (10) does not involve the smoothing estimator
and hence does not depend on the bandwidth. The expected value of the third term
on the right-hand side of (10) is zero, since the observations of the ith subject are
not included in F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x]. Thus, by minimizing LSCV[y(·), x], the LSCV
bandwidth h_{x,LSCV} approximately minimizes the second term on the right-hand side
of (10), which is approximately the average squared error

ASE[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {F̂_{tj,θ̂(tj|x)}[y(tj)|x] − F_{tj,θ(tj|x)}[y(tj)|x]}².  (11)
A potential drawback of the LSCV approach is that the minimization of the LSCV
score (9) is often computationally intensive, particularly when the number of subjects
n is large, which limits its use in real applications. Thus, it is usually more practical
to consider the alternative of k-fold LSCV, which is computed by deleting the
observations of k > 1 subjects at a time in the computation of F̂^{(−i)}_{tj,θ̂(tj|x)}[y(tj)|x].

Instead of deleting the subjects one at a time, the LTCV procedure deletes the
observations at the time design points t = {t1, . . . , tJ}. When J is smaller than n,
the LTCV procedure may be computationally simpler than the LSCV procedure. Let
F̂^{(−j)}_{t,θ̂(t|x)}[y(t)|x], 1 ≤ j ≤ J, be either the estimator (5) or (7) of F_{t,θ(t|x)}[y(t)|x]
computed from the sample with all the observations at the time point tj deleted.
Then the value of F̂^{(−j)}_{t,θ̂(t|x)}[y(t)|x] at time point tj is F̂^{(−j)}_{tj,θ̂(tj|x)}[y(tj)|x], and the LTCV
score for F̂_{t,θ̂(t|x)}[y(t)|x] is

LTCV[y(·), x] = Σ_{j=1}^{J} Σ_{i∈Sj} wi {1[Yi(tj) ≤ y(tj), Xi = x] − F̂^{(−j)}_{tj,θ̂(tj|x)}[y(tj)|x]}².  (12)

The LTCV bandwidth h_{x,LTCV} is the minimizer of LTCV[y(·), x]. Similar to the
k-fold alternative for the LSCV, the k-fold LTCV bandwidths, which are obtained by
deleting k > 1 time points in t each time, may also be used in practical applications to
reduce the computational burden when J is large. Table 15 in the preliminary analysis
of Appendix 1 gives bandwidth values, together with cross validation scores in
parentheses, obtained by the AIC method for the entire cohort, the Caucasian cohort
and the African American cohort. Table 16 in the preliminary analysis of Appendix 1
gives the corresponding bandwidths and cross validation scores obtained by the least
squares method.
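A minimal sketch of the LTCV idea follows, using a simple kernel CDF estimator as a stand-in for (5) or (7) and simulated data (all numbers assumed, for illustration):

```python
import random
from math import isnan

def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_cdf(sample, t, y, h):
    # Kernel-smoothed conditional CDF estimate at (t, y) from (t_j, Y) pairs.
    num = den = 0.0
    for tj, yij in sample:
        k = epanechnikov((tj - t) / h)
        num += (1.0 if yij <= y else 0.0) * k
        den += k
    return num / den if den > 0.0 else float("nan")

def ltcv_score(sample, y_curve, h):
    """LTCV score in the spirit of (12), with w_i = 1/N: the fit at each
    time design point t_j uses the sample with all observations at t_j
    deleted, and is scored against the indicators observed at t_j."""
    t_points = sorted({tj for tj, _ in sample})
    n, score = len(sample), 0.0
    for tj in t_points:
        reduced = [(s, v) for s, v in sample if s != tj]
        f = kernel_cdf(reduced, tj, y_curve[tj], h)
        if isnan(f):
            continue                      # no data left near t_j for this h
        for s, v in sample:
            if s == tj:
                score += ((1.0 if v <= y_curve[tj] else 0.0) - f) ** 2 / n
    return score

# Choose h from a small candidate grid by minimizing the LTCV score.
random.seed(5)
grid = [round(0.1 * j, 1) for j in range(1, 21)]
sample = [(tj, random.gauss(0.0, 1.0)) for tj in grid for _ in range(25)]
y_curve = {tj: 0.0 for tj in grid}
h_best = min([0.15, 0.3, 0.6, 1.2], key=lambda h: ltcv_score(sample, y_curve, h))
```

Only J refits are needed per candidate bandwidth, which is why LTCV is cheaper than LSCV when J is much smaller than n.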
Remark 2.2. Since the observations at different time points are potentially cor-
related, the heuristic justification for LSCV[y(·), x] based on (10) and (11) does not
apply to LTCV[y(·), x], and the effects of the intra-subject correlations on the ap-
propriateness of the LTCV approach have not been systematically investigated. But
the simulation results of Wu and Tian (2013a, 2013b) have shown that the LTCV
approach may lead to appropriate bandwidths under the unstructured nonparametric
models. Theoretical and practical properties of both the LSCV and LTCV bandwidths
warrant substantial further investigation. In practice, the LSCV and LTCV band-
widths may only be used to provide a rough range of the appropriate bandwidths. The
bandwidths for an actual dataset may be selected by evaluating the overall information
from LSCV[y(·), x] and LTCV[y(·), x], the scientific interpretations, and the smooth-
ness of the estimates.
Remark 2.3. The bandwidth is known as the smoothing parameter and the kernel
as the weighting function. The bandwidth controls the smoothness of the probability
estimates and determines the tradeoff between the bias and variance of the estimation:
in general, the smaller the bandwidth, the smaller the bias and the larger the variance.
Intuitively, the smoothing estimator is a summation of many bumps, each centered at
an observation time tj; the kernel function K determines the shape of the bumps and
the bandwidth h determines their width. For the local polynomial smoothing estimator
of the structural nonparametric model, the Epanechnikov kernel is used as the weighting
function. If the Gaussian kernel is used instead, very similar results are obtained, since
the choice of kernel has little effect on the shape of the smoothing estimates. For the
unstructured nonparametric model, the Nadaraya-Watson estimator with the Epanech-
nikov kernel is used to obtain the smoothing estimates. For selecting an appropriate
bandwidth, we use subjective choices obtained by examining the plots and data-driven
choices obtained by examining the CV scores from the above two methods. Too large a
bandwidth yields over-smoothed results, and too small a bandwidth under-smoothed
ones. For our NGHS BP data, a bandwidth of around 2.5 is the best choice.
2.5 Bootstrap Pointwise Confidence Intervals
Since different asymptotic distributions of the smoothing estimators may be obtained
depending on the longitudinal designs and on whether and how fast mi, i = 1, . . . , n, con-
verge to infinity, statistical inference based on asymptotic approximations may
not be an appropriate option in practice, and a widely used inference approach for non-
parametric longitudinal analysis is the “resampling-subject” bootstrap sug-
gested in Hoover et al. (1998). Under the current context, we can obtain a pointwise
bootstrap confidence interval for F_{t,θ(t|x)}[y(t)|x] by first obtaining B bootstrap sam-
ples through resampling the subjects of the longitudinal sample with replacement,
and then computing the B two-step smoothing estimators {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b = 1, . . . , B}
using (5) or (7) with each of the bootstrap samples. The lower and upper bound-
aries of the [100 × (1 − α)]% empirical quantile bootstrap pointwise confidence inter-
val of F_{t,θ(t|x)}[y(t)|x] are the empirical [100 × (α/2)]th and [100 × (1 − α/2)]th percentiles
of the bootstrap estimators {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b = 1, . . . , B}. Alternatively,
if SD{F̂^b_{t,θ̂(t|x)}[y(t)|x]} is the empirical standard deviation of {F̂^b_{t,θ̂(t|x)}[y(t)|x] : b =
1, . . . , B}, the [100 × (1 − α)]% normal approximation bootstrap pointwise confidence
interval of F_{t,θ(t|x)}[y(t)|x] is

F̂_{t,θ̂(t|x)}[y(t)|x] ± Z_{1−α/2} × SD{F̂^b_{t,θ̂(t|x)}[y(t)|x]},

where Z_{1−α/2} is the [100 × (1 − α/2)]th percentile of the standard normal distribution.
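A minimal sketch of the resampling-subject bootstrap follows, with a simple proportion estimator standing in for the smoothing estimators (5) or (7); the data are simulated and all numerical values are assumed:

```python
import random

def subject_bootstrap_ci(subjects, estimator, B=200, alpha=0.05, seed=0):
    """Resampling-subject bootstrap: draw n subjects with replacement,
    recompute the estimator on each bootstrap sample, and return the
    empirical alpha/2 and 1 - alpha/2 percentiles."""
    rng = random.Random(seed)
    n = len(subjects)
    stats = []
    for _ in range(B):
        boot = [subjects[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(boot))
    stats.sort()
    lo = stats[int((alpha / 2.0) * (B - 1))]
    hi = stats[int((1.0 - alpha / 2.0) * (B - 1))]
    return lo, hi

# Toy stand-in for (5) or (7): each "subject" is a list of repeated
# observations, and the estimator is an overall proportion.
random.seed(11)
subjects = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]

def prop_below(subs, y=1.28):
    total = sum(len(s) for s in subs)
    return sum(1 for s in subs for v in s if v <= y) / total

lo, hi = subject_bootstrap_ci(subjects, prop_below, B=200)
```

Resampling whole subjects, rather than individual observations, preserves the intra-subject correlation structure in each bootstrap sample.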
3 Chapter Three
Application and Simulation
3.1 Application to NGHS BP data
We apply our method to the NGHS BP data to estimate the conditional distribu-
tion functions of SBP, a main cardiovascular risk outcome, for Caucasian and
African American girls, and their trends over ages 9 to 19 years. The
NGHS is a multicenter, population-based observational study designed to evaluate
the prevalence and incidence of cardiovascular risk factors in Caucasian and African-
American girls during childhood and adolescence. The study involves 1166 Caucasian
girls and 1213 African-American girls who had up to 10 annual follow-up visits be-
tween ages 9 and 19 and whose numbers of follow-up visits have median 9, mean
8.2 and standard deviation 2. Among all the important risk factors that have been
studied by the NGHS investigators, childhood systolic blood pressure (SBP) is an
important one. Detailed information about NGHS data can be found at the National
Heart, Lung and Blood Institute Biologic Specimen and Data Repository website
(https://biolincc.nhlbi.nih.gov).
Following the practical definition of age in pediatric studies (e.g., Obarzanek et
al., 2010), we round up the observed age to one decimal point with J = 100 distinct
time design points {t1, t2, . . . , t100} = {9.1, 9.2, . . . , 19.0}. Since our objective is to
estimate the conditional distribution functions of SBP for age t within the interior of
the observed age range, we omit the boundary age of t = 9.0 years in this analysis.
According to these time design points, the entire NGHS dataset has been partitioned
into sub-samples at the 100 time design points. The first subsample, corresponding to
age 9.1, includes all girls in the age interval [9, 9.1), and so on for the rest of the
subsamples; the last subsample, with age 19, includes all girls in the age interval
[18.9, 19). In our preliminary analysis based on the goodness-of-fit tests of normality
for the SBP distributions at the time design points {t1, t2, . . . , t100}, we
observed that the conditional distributions of the natural logarithmic transformed
SBP given age and race can be reasonably approximated by normal distributions,
while the conditional distributions of the actual SBP given age and race are not
approximately normal. Thus, for a given 1 ≤ j ≤ J = 100 and i ∈ Sj, we denote by
Yi(tj) the natural logarithmic transformed SBP observation of the ith girl at age tj.
The time-invariant categorical covariate Xi is race, which is defined by Xi = 0 if the
ith girl is Caucasian, and Xi = 1 if she is African-American. The random variables
for the natural logarithmic transformed SBP at age t and race are Y (t) and X,
respectively. Given t and X = x, we consider the family of log-normal distributions
of SBP for this population of girls; that is, the conditional CDF of Y(t) is given by
the normal distribution F_{t,θ(t|x)}[y(t)|x] of (2).
We have used all five goodness-of-fit tests of normality for SBP: the Shapiro-
Wilk test, the Kolmogorov-Smirnov test, the Anderson-Darling test, the Cramér-von
Mises test and the Chi-square test. From the p-values, as well as visual inspection of
the QQ plots (Table 8 and Figures 11 to 19 in the preliminary analysis part of
Appendix 1), we see that 94 of the 100 subsamples follow a normal distribution by the
Shapiro-Wilk test, 90 of 100 by the Anderson-Darling test, all 100 by the Kolmogorov-
Smirnov test, 86 of 100 by the Cramér-von Mises test and 72 of 100 by the Chi-square
test. Hence, according to these goodness-of-fit tests, we can conclude that almost all
subsamples are approximately normal. If we apply the above normality tests to SBP
when the data are restricted to individuals with above-median height, only 7 of the
100 subsamples show non-normality by the Shapiro-Wilk, Anderson-Darling and
Cramér-von Mises tests; the Kolmogorov-Smirnov results remain the same as before,
and the Chi-square test shows that only 13 of the 100 subsamples do not follow a
normal distribution. For the goodness-of-fit tests, we have used the 5% significance
level. It is worth noting that before splitting the data, SBP over the entire set of time
design points is not normal, whether SBP is log-scaled or not; but a lognormality test
shows that SBP follows a lognormal distribution. P-values of these five normality
tests for the 100 subsamples are given in Table 8. QQ plots of the log-scaled SBP
for these 100 subsamples are also given in Figure 11 to Figure 19 in the preliminary
analysis part of Appendix 1. Estimated raw probabilities (ERP) by the estimators
from the both structural nonparametric model (Gaussian Model) and unstructured
nonparametric model (Empirical approach) for entire cohort as well as for Caucasian
girls and for African American girls are given in Table 9, Table 10 and Table 11. Each
of these Tables gives the probability that the SBP exceeds the 90th percentile, the
95th percentile and the 99th percentile values of SBP determined by the gender and
age specific blood pressure quantiles (e.g., NHBPEP, 2004). Expected SBP (µ) for
girls of age t years and height h inches is given by
µ = α +∑4
r=1 βr(t− 10)r +∑4
s=1 τs(Zht)s
where α, the β_r and the τ_s are given in Table B1 (e.g., NHBPEP, 2004). When median height is considered, Z_ht equals zero. The percentiles y_q(t) for q = 0.90, 0.95 and 0.99, i.e., the 90th, 95th and 99th percentiles of SBP, are computed as µ + 1.28 SD, µ + 1.645 SD and µ + 2.326 SD, respectively, where the SD is found in the same table. Table 13 in the preliminary analysis of Appendix 1 gives the age-specific log-scaled
percentile values of SBP of girls with median height.
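The percentile computation above can be sketched in code. This is a minimal illustration, not the dissertation's own program: the function names and the zero placeholder coefficients are hypothetical, and the actual α, β_r, τ_s and SD values must be taken from Table B1 of NHBPEP (2004).

```python
import numpy as np

def expected_sbp(t, z_ht, alpha, beta, tau):
    """mu = alpha + sum_{r=1}^{4} beta_r (t - 10)^r + sum_{s=1}^{4} tau_s z_ht^s."""
    powers = np.arange(1, 5)
    return alpha + np.sum(beta * (t - 10.0) ** powers) + np.sum(tau * z_ht ** powers)

def sbp_percentile(mu, sd, q):
    """y_q = mu + z_q * SD, with z_0.90 = 1.28, z_0.95 = 1.645, z_0.99 = 2.326."""
    z = {0.90: 1.28, 0.95: 1.645, 0.99: 2.326}
    return mu + z[q] * sd
```

For girls of median height, Z_ht = 0 and the height terms vanish, so only α and the β_r contribute to µ.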
A columnwise comparison between the estimated raw probabilities from the SNM and the UNM can be made from Tables 9, 10 and 11. From Table 9, we see that the unstructured nonparametric model is unable to estimate the tail probabilities at 5, 17 and 59 of the 100 time design points when SBP exceeds the 90th, 95th and 99th percentiles, respectively, whereas the structural nonparametric model does not suffer from this type of estimation problem. Tables 10 and 11, which give the estimated raw probabilities for Caucasian girls and African-American girls, show even worse results than Table 9 when the SNM and UNM estimates are compared. In general, if the number of observations in a subsample is small, the existing unstructured nonparametric model may yield no raw estimate of the tail probabilities, and without raw estimates we cannot proceed to the smoothing estimates. Tables 8 to 18 and Figures 11 to 20 are presented in the preliminary results part of Appendix 1.
Applying the two-step local linear estimators of (5) and (7) to the observed data {Y_i(t_j), X_i, t_j; 1 ≤ j ≤ J, 1 ≤ i ≤ n}, we compute the smoothing estimators F̂_{t,θ̂(t|x)}[y_q(t)|x] and F̃_{t,θ̃(t|x)}[y_q(t)|x] of F_{t,θ(t|x)}[y_q(t)|x], where y_q(t) is the natural logarithmic transformed (100 × q)th percentile of SBP for girls with median height at age t (NHBPEP, 2004). By the monotonicity of the transformation, F_{t,θ(t|x)}[y_q(t)|x] is also the conditional probability of SBP at or below the (100 × q)th SBP percentile given in NHBPEP (2004) for girls with age t and race x. In addition to the smoothing estimators based on the time-varying normal distributions F_{t,θ(t|x)}[y_q(t)|x] in (2), we also compute the kernel estimator F̂_t[y_q(t)|x] of the unstructured conditional CDF F_t[y_q(t)|x] of Y(t) based on (8) with the w_i = 1/(n m_i) and 1/N weights. Smoothing estimators of these conditional probabilities are also computed for the entire cohort, ignoring the race covariate. These smoothing probabilities are given in Table 14 of Appendix 1.
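For a concrete picture of the second smoothing step, here is a minimal local linear smoother with the Epanechnikov kernel, in the spirit of the estimators of (5) and (7). It is an illustrative sketch only (the function names are ours); the actual estimators also involve the race covariate and the raw-estimate step described above.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero outside."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def local_linear(t_grid, raw, t0, h):
    """Local linear smoothing of raw estimates at the point t0 with bandwidth h:
    a kernel-weighted least squares line through (t_j, raw_j); the fitted
    intercept at t0 is the smoothing estimate."""
    w = epanechnikov((t_grid - t0) / h)
    X = np.column_stack([np.ones_like(t_grid), t_grid - t0])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * raw))
    return beta[0]
```

Because the fit is locally linear, the smoother reproduces a linear trend in the raw estimates exactly, which is the usual boundary-bias advantage over a Nadaraya-Watson (local constant) fit.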
Figure 1 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] based on (7) (q = 0.90 in Figure 1a; q = 0.95 in Figure 1b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] based on (8) with w_i = 1/N (q = 0.90 in Figure 1c; q = 0.95 in Figure 1d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for Caucasian girls (x = 0). The Epanechnikov kernel and the bandwidth h = 2.5 were used for both the local linear smoothing estimators and the unstructured kernel estimators. The bandwidth h = 2.5 was chosen by examining the LSCV and LTCV scores and the smoothness of the fitted plots.
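The empirical quantile bootstrap behind these pointwise intervals can be sketched as follows. This is a generic resample-subjects bootstrap, assuming an `estimator` callable supplied by the caller; it is not the dissertation's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(subjects, estimator, B=1000, level=0.95):
    """Empirical-quantile bootstrap pointwise confidence interval: resample whole
    subjects with replacement (preserving within-subject correlation),
    re-estimate, and take the (1-level)/2 and (1+level)/2 empirical quantiles."""
    n = len(subjects)
    stats = [estimator([subjects[i] for i in rng.integers(0, n, size=n)])
             for _ in range(B)]
    lo, hi = np.quantile(stats, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return lo, hi
```

Resampling entire subjects, rather than individual visits, is the standard choice for longitudinal data because the within-subject measurements are correlated.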
Figure 2 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] (2a, 2b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] with w_i = 1/N (2c, 2d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for African-American girls (x = 1) with q = 0.90 and 0.95. As in Figure 1, the estimators of Figure 2 are based on the Epanechnikov kernel and the bandwidth h = 2.5.

Figure 3 shows the local linear smoothing estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] (3a, 3b), the unstructured kernel estimators 1 − F̂_t[y_q(t)|x] with w_i = 1/N (3c, 3d), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the entire cohort with q = 0.90 and 0.95. As in Figures 1 and 2, the estimators of Figure 3 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
The smoothing estimates in Figures 1 and 2 exhibit similar trends over t. The estimates 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] are slightly lower than the estimates 1 − F̂_t[y_q(t)|x] based on the unstructured approach, and the 95% confidence intervals of 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x] are narrower than those of 1 − F̂_t[y_q(t)|x]. These narrower confidence bands imply that the SNM gives better results than the UNM in smoothing estimation of the conditional distribution function. A comparison of the corresponding estimates in Figures 1 and 2 shows that African-American girls are more likely to have higher SBP values than Caucasian girls; in other words, an African-American girl has a higher probability of developing hypertension than a Caucasian girl.

In each of the above figures, dots represent raw estimates, the solid middle line represents the smoothing curve, and the dotted lines represent the 95% pointwise bootstrap confidence band. We have also noticed that, in the UNM, many raw estimates of the tail probabilities are zero; the scenario is worse if we take y_q(t) to be the 99th percentile. The smoothing-later approach was adopted in Figures 1, 2 and 3. We also computed the local linear estimators of 1 − F̂_{t,θ̂(t|x)}[y_q(t)|x] by the smoothing-early approach of (5). Figure 4 shows the local linear smoothing estimators of µ(t) and σ(t) for the whole cohort, and Table 12 gives the local polynomial smoothing estimates of µ(t) and σ(t) for the 100 subsamples. Figure 5 shows the local linear smoothing estimators of the conditional probability by the smoothing-early approach for the whole cohort. Comparing Figure 3 with Figure 5, we end up with essentially the same results under either smoothing approach (smoothing-early and smoothing-later); that is, the numerical results of 1 − F̂_{t,θ̂(t|x)}[y_q(t)|x] are similar to those of 1 − F̃_{t,θ̃(t|x)}[y_q(t)|x]. To avoid redundancy, the smoothing-early approach for the Caucasian girls and African-American girls has been omitted.
[Figure 1 appears here: panels (a)-(d), ages of CC girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 1: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for Caucasian girls (CC) between 9.1 and 19.0 years old. (1a) and (1b): estimators based on the time-varying log-normal models. (1c) and (1d): estimators based on the unstructured kernel estimators.
[Figure 2 appears here: panels (a)-(d), ages of AA girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 2: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for African-American (AA) girls between 9.1 and 19.0 years old. (2a) and (2b): estimators based on the time-varying log-normal models. (2c) and (2d): estimators based on the unstructured kernel estimators.
[Figure 3 appears here: panels (a)-(d), ages of all girls (10-18) vs P(Y(t) > y_0.9(t)) and P(Y(t) > y_0.95(t)) at median height.]

Figure 3: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for all girls between 9.1 and 19.0 years old. (3a) and (3b): estimators based on the time-varying log-normal models. (3c) and (3d): estimators based on the unstructured kernel estimators.
[Figure 4 appears here: two panels, ages of all girls (10-18) vs µ(t) and vs σ(t).]

Figure 4: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific mean and standard deviation of SBP for all girls between 9.1 and 19.0 years old. Estimators are based on the time-varying log-normal models.
[Figure 5 appears here: two panels, age vs the estimated exceedance probabilities for the 90th and 95th percentiles.]

Figure 5: Local linear smoothing estimators (solid curves) and pointwise bootstrap 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th and 95th population SBP percentiles for all girls between 9.1 and 19.0 years old. Estimators are based on the time-varying Gaussian models with the smoothing-early approach.
3.2 Simulation Results
Following the data structure of Section 1.2, we generate in each sample n = 1000 subjects with 10 visits per subject. The jth visit time t_ij of the ith subject is generated from the uniform distribution U(j − 1, j) for j = 1, . . . , 10. Given each t_ij, we generate the observations Y_i(t_ij) for Y(t) from the following simulation design:

Y_ij = 21.5 + 0.7 (t_ij − 5) − 0.05 (t_ij − 5)² + a_0i + ε_ij,   a_0i ∼ N(0, 2.5²),   ε_ij ∼ N(0, 0.5²),   (13)
where the ε_ij are independent for all (i, j), and a_0i and ε_ij are independent. From (13), E[Y_i(t_ij)|t_ij] = 21.5 + 0.7 (t_ij − 5) − 0.05 (t_ij − 5)² and Var[Y_i(t_ij)|t_ij] = 6.5. For each simulated sample {(Y_i(t_ij), t_ij) : i = 1, . . . , 1000; j = 1, . . . , 10}, we round the time points so that each t_ij belongs to one of the equally spaced time design points {t_0, . . . , t_100} = {0, 0.1, 0.2, . . . , 10}. Let y_q(t) be the (100 × q)th percentile of Y(t), so that P[Y(t) > y_q(t)] = 1 − q. More specifically, let y_.90(t) and y_.95(t) be the 90th and 95th quantiles of Y(t); since Y(t) follows a normal distribution, P[Y(t) > y_.90(t)] = 0.10 and P[Y(t) > y_.95(t)] = 0.05. The theoretical 90th and 95th quantiles at 10 of the 101 time design points under the above model are given in Table 1. We repeatedly generate 1000 simulation samples. Within
Table 1: Theoretical 90th and 95th quantiles at 10 of the 101 time design points under the model of our simulation design.

Time (t):  1     2     3     4     5     6     7     8     9     10
y_.90(t):  21.2  22.2  23.2  24.0  24.8  25.4  26.0  26.4  26.8  27.0
y_.95(t):  22.1  23.1  24.1  24.9  25.7  26.3  26.9  27.3  27.7  27.9
each simulation sample, we compute the smoothing estimates of P [Y (t) > yq(t)] for
q = 0.90 and 0.95 by the local linear estimators (5) and (7) based on the time-varying
Gaussian model (2) and the unstructured kernel estimator (8) using the Epanechnikov
kernel and the LTCV bandwidths. The bootstrap pointwise 95% confidence intervals
for the smoothing estimators are constructed using empirical quantiles of B = 1000
bootstrap samples.
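The simulation design (13) can be generated directly. A minimal sketch (function name ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n=1000, J=10):
    """One longitudinal sample from design (13):
    Y_ij = 21.5 + 0.7 (t_ij - 5) - 0.05 (t_ij - 5)^2 + a_0i + eps_ij,
    with t_ij ~ U(j-1, j), a_0i ~ N(0, 2.5^2), eps_ij ~ N(0, 0.5^2)."""
    t = rng.uniform(np.arange(J), np.arange(1, J + 1), size=(n, J))
    a0 = rng.normal(0.0, 2.5, size=(n, 1))     # shared random subject effect
    eps = rng.normal(0.0, 0.5, size=(n, J))    # independent measurement errors
    y = 21.5 + 0.7 * (t - 5.0) - 0.05 * (t - 5.0) ** 2 + a0 + eps
    return np.round(t, 1), y                   # snap times to the grid {0, 0.1, ..., 10}
```

The subject effect a_0i is drawn once per subject and added to all 10 of that subject's visits, which produces the within-subject correlation of the longitudinal design; a_0i and ε_ij together give the conditional variance 2.5² + 0.5² = 6.5.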
Let P̂_q(t) be a smoothing estimator of P[Y(t) > y_q(t)] = 1 − q, which could be one of the local polynomial estimators, e.g., (5) or (7), or the unstructured kernel estimator of (8). We measure the accuracy of P̂_q(t) by the average of the bias, Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]/M, the empirical mean squared error, MSE[P̂_q(t)] = Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]²/M, or the square root of MSE[P̂_q(t)] (root-MSE), where P̂_q^{(m)}(t) is the estimate from the mth sample and M = 1000 is the total number of simulated samples. We assess the accuracy of a pointwise confidence interval of P̂_q(t) by the empirical coverage probability of the confidence interval covering the true value P[Y(t) > y_q(t)] = 1 − q.
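These accuracy measures translate directly into code. A minimal sketch (names ours), taking the Monte Carlo estimates of the true tail probability and, optionally, the per-sample confidence limits:

```python
import numpy as np

def accuracy(estimates, p_true, ci_lower=None, ci_upper=None):
    """Monte Carlo accuracy of M estimates of the tail probability p_true =
    P[Y(t) > y_q(t)]: average bias, root-MSE, and empirical CI coverage."""
    e = np.asarray(estimates, dtype=float)
    bias = np.mean(e - p_true)
    root_mse = np.sqrt(np.mean((e - p_true) ** 2))
    coverage = None
    if ci_lower is not None and ci_upper is not None:
        covered = (np.asarray(ci_lower) <= p_true) & (p_true <= np.asarray(ci_upper))
        coverage = np.mean(covered)
    return bias, root_mse, coverage
```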
Table 2 shows the averages of the estimates, averages of the biases, the root-MSEs,
and the empirical coverage probabilities of the empirical quantile bootstrap pointwise
95% confidence intervals based on B = 1000 bootstrap replications for the estimation
of P [Y (t) > y.90(t)] = 0.10 at t = 1.0, 2.0, . . . , 10.0. For all the 10 time points, the
smoothing-later local linear estimators based on the time-varying Gaussian model
have smaller root-MSEs than the kernel estimators based on the unstructured non-
parametric model. Comparing the empirical coverage probabilities of the bootstrap
pointwise 95% confidence intervals, we observe that the smoothing estimators based
on the time-varying Gaussian model have higher coverage probabilities than the un-
structured kernel estimators at most of the time points.
When q increases to 0.95, Table 3 shows the averages of the biases, the root-
MSEs, and the empirical coverage probabilities of the empirical quantile bootstrap
pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the
estimation of P [Y (t) > y.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0. Again, the smoothing
estimators based on the time-varying Gaussian model have smaller root-MSEs than
the unstructured kernel estimators at all 10 time points.
From Figure 6, comparing (a) with (c) and (b) with (d), we see that the smoothing estimators from the unstructured nonparametric model give wider confidence bands than the smoothing estimators from the structural nonparametric model. In Figure 6, (a) and (b) are estimators from the time-varying Gaussian models, and (c) and (d) are estimators from the unstructured kernel method. The root-MSE of the SNM at each time point is smaller than that of the UNM, so the relative root-MSE at each time point is smaller than 1, which means that the SNM outperforms the UNM. In Tables 2 and 3 we present the simulation results at only the 10 integer time points; the results between integer time points are similar to those presented in Tables 2 and 3.
The results of Table 2 and Table 3 suggest that, when the time-varying paramet-
ric model is appropriate, the structural two-step smoothing estimators have smaller
mean squared errors than the unstructured smoothing estimators under a practical
longitudinal sample with moderate sample sizes. However, these results may not
hold if the time-varying parametric model is not an appropriate approximation to the
time-varying distribution functions of the longitudinal variable being considered.
[Figure 6 appears here: four panels (a)-(d) plotting t ∈ [0, 10] against the estimated exceedance probabilities.]

Figure 6: The black solid lines are the local polynomial (a, b) and Nadaraya-Watson (c, d) smoothing estimators with the Epanechnikov kernel for the SNM and UNM, respectively. Dotted lines represent the 95% pointwise bootstrap confidence band over the 1000 simulated samples.
Table 2: Averages of the estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.90(t)] = 0.10, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

          Structural Nonparametric Model          Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage   Estimate  Ave. Bias  √MSE    Coverage   Relative √MSE
1     0.0998    -0.0002    0.0092  0.935      0.0998    -0.0002    0.0111  0.930      0.8288
2     0.1005     0.0005    0.0095  0.935      0.1004     0.0004    0.0116  0.940      0.8190
3     0.1003     0.0003    0.0091  0.915      0.1003     0.0003    0.0113  0.890      0.8053
4     0.1000     0.0000    0.0093  0.935      0.1000     0.0000    0.0113  0.925      0.8230
5     0.0999    -0.0001    0.0092  0.940      0.1000     0.0000    0.0115  0.945      0.8000
6     0.1004     0.0004    0.0092  0.920      0.1006     0.0006    0.0112  0.920      0.8214
7     0.1001     0.0001    0.0093  0.945      0.1000     0.0000    0.0117  0.905      0.7949
8     0.1002     0.0002    0.0090  0.940      0.1002     0.0002    0.0111  0.925      0.8108
9     0.1000     0.0000    0.0091  0.915      0.0999    -0.0001    0.0113  0.925      0.8053
10    0.1002     0.0002    0.0110  0.925      0.1003     0.0003    0.0140  0.915      0.7857
Table 3: Averages of the estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.95(t)] = 0.05, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

          Structural Nonparametric Model          Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage   Estimate  Ave. Bias  √MSE    Coverage   Relative √MSE
1     0.0499    -0.0001    0.0061  0.915      0.0499    -0.0001    0.0081  0.915      0.7531
2     0.0504     0.0004    0.0063  0.935      0.0504     0.0004    0.0085  0.930      0.7412
3     0.0502     0.0002    0.0061  0.900      0.0504     0.0004    0.0081  0.885      0.7531
4     0.0500     0.0000    0.0061  0.940      0.0497    -0.0003    0.0084  0.920      0.7262
5     0.0500    -0.0000    0.0062  0.925      0.0498    -0.0002    0.0084  0.930      0.7381
6     0.0505     0.0005    0.0061  0.915      0.0503     0.0003    0.0081  0.885      0.7531
7     0.0501     0.0001    0.0062  0.945      0.0499    -0.0001    0.0082  0.905      0.7561
8     0.0502     0.0002    0.0060  0.960      0.0502     0.0002    0.0080  0.925      0.7500
9     0.0500     0.0000    0.0060  0.910      0.0501     0.0001    0.0083  0.915      0.7229
10    0.0502     0.0002    0.0073  0.935      0.0500     0.0000    0.0102  0.915      0.7157
4 Chapter Four
Asymptotic Results
We establish in this section the asymptotic bias, variance and mean squared error (MSE) of the smoothing-later local polynomial estimator F̃^(q)_{t,θ̃(t|x)}[y(t)|x] of (6). Because the smoothing-early estimator F̂_{t,θ̂(t|x)}[y(t)|x] of (5) is a function of the two-step local polynomial estimator θ̂(t|x), and the asymptotic properties of θ̂(t|x) have been established in Fan and Zhang (2000), the asymptotic properties of θ̂(t|x) are not presented here in order to avoid redundancy. The asymptotic bias, variance and MSE of F̂_{t,θ̂(t|x)}[y(t)|x] can be derived by applying the delta method to the asymptotic results of θ̂(t|x) (e.g., van der Vaart, 1998, Ch. 3). The asymptotic distributions of θ̂(t|x) and F̂_{t,θ̂(t|x)}[y(t)|x] have been derived under the Gaussian model assumption.
4.1 Asymptotic Properties of the Raw Estimators

Following Section 3.1, θ(t_j|x) at each time design point t_j ∈ t is estimated by the MLE θ̂(t_j|x), and the raw estimator of F_{t_j,θ(t_j|x)}[y(t_j)|x] is F_{t_j,θ̂(t_j|x)}[y(t_j)|x]. Suppose that the classical regularity conditions of the MLEs, i.e., the conditions of Theorem 5.41 of van der Vaart (1998), are satisfied. Then, for all t_j ∈ t, n_j^{1/2} [θ̂(t_j|x) − θ(t_j|x)] asymptotically has the N(0, I^{−1}[θ(t_j|x)]) distribution, where I[θ(t_j|x)] is the Fisher information matrix at θ(t_j|x). It follows that θ̂(t_j|x) is asymptotically unbiased for θ(t_j|x), i.e., E[θ̂(t_j|x)] = θ(t_j|x), and the asymptotic variance of θ̂(t_j|x) is n_j^{−1} I^{−1}[θ(t_j|x)].
At different time points t_j ≠ t_k, θ̂(t_j|x) and θ̂(t_k|x) are possibly correlated, and the
covariance Cov[θ̂(t_j|x), θ̂(t_k|x)] may depend on the design and the unknown correlation structure of the longitudinal sample. If Cov[θ̂(t_j|x), θ̂(t_k|x)] has the convergence rate r(n_j, n_k, n_jk), which depends on the numbers of subjects observed at t_j and t_k, the asymptotic expression of Cov[θ̂(t_j|x), θ̂(t_k|x)] can be written as

lim_{n→∞} r(n_j, n_k, n_jk) Cov[θ̂(t_j|x), θ̂(t_k|x)] = ρ_θ(t_j, t_k|x),   (14)

for some limiting function ρ_θ(t_j, t_k|x). Since the correlation structure of the longitudinal sample is unknown, the exact expression of ρ_θ(t_j, t_k|x) is unknown, while it is known that ρ_θ(t_j, t_k|x) is bounded for all (t_j, t_k) and may depend on the model F_θ(t|x), the expression of θ(t|x) at t = t_j and t_k, and the distance t_j − t_k = d_jk.
By the delta method and Theorem 5.41 of van der Vaart (1998), it follows from the asymptotic properties of θ̂(t_j|x) that, as n → ∞,

E{F_{t_j,θ̂(t_j|x)}[y(t_j)|x]} − F_{t_j,θ(t_j|x)}[y(t_j)|x] = o(n_j^{−1/2}),

n_j Var{F_{t_j,θ̂(t_j|x)}[y(t_j)|x]} → F′_{t_j,θ(t_j|x)}[y(t_j)|x]^T I^{−1}[θ(t_j|x)] F′_{t_j,θ(t_j|x)}[y(t_j)|x],

r(n_j, n_k, n_jk) Cov{F_{t_j,θ̂(t_j|x)}[y(t_j)|x], F_{t_k,θ̂(t_k|x)}[y(t_k)|x]} → ρ_F(t_j, t_k|x),   j ≠ k,   (15)
and n_j^{1/2} {F_{t_j,θ̂(t_j|x)}[y(t_j)|x] − F_{t_j,θ(t_j|x)}[y(t_j)|x]} asymptotically has the normal distribution with mean 0 and variance F′_{t_j,θ(t_j|x)}[y(t_j)|x]^T I^{−1}[θ(t_j|x)] F′_{t_j,θ(t_j|x)}[y(t_j)|x], where F′_{t_j,θ(t_j|x)}[y(t_j)|x] is the column vector of partial derivatives of F_{t_j,θ(t_j|x)}[y(t_j)|x] with respect to θ(t_j|x), and the bounded limiting covariance function ρ_F(t_j, t_k|x) depends on the unknown covariance ρ_θ(t_j, t_k|x). When θ(t_j) represents the parameters of a Gaussian model, we have θ(t_j) = (µ(t_j), σ²(t_j))^T. The asymptotic distributions of µ̂(t_j) and σ̂²(t_j) are, respectively,

√n(t_j) (µ̂(t_j) − µ(t_j)) ∼ N(0, σ²(t_j)),

√n(t_j) (σ̂²(t_j) − σ²(t_j)) ∼ √n(t_j) (S²(t_j) − σ²(t_j)) ∼ N(0, 2σ⁴(t_j)),
where S²(t_j) = Σ_i (Y_i(t_j) − Ȳ(t_j))² / (n(t_j) − 1). If n is large, σ̂²(t_j) = Σ_i (Y_i(t_j) − Ȳ(t_j))² / n(t_j) and S²(t_j) are equivalent.
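The asymptotic variance 2σ⁴(t_j) of the MLE σ̂²(t_j) can be checked by simulation. A minimal sketch (names ours), drawing repeated Gaussian subsamples and looking at the sampling distribution of √n(σ̂² − σ²):

```python
import numpy as np

rng = np.random.default_rng(2)

def mle_variance_draws(n=500, reps=2000, mu=0.0, sigma=1.0):
    """Draws of sqrt(n) (sigma_hat^2 - sigma^2), where sigma_hat^2 is the
    Gaussian MLE of the variance (divisor n); the limit is N(0, 2 sigma^4)."""
    y = rng.normal(mu, sigma, size=(reps, n))
    sigma2_hat = np.mean((y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)
    return np.sqrt(n) * (sigma2_hat - sigma ** 2)
```

With σ = 1 the empirical variance of these draws should be close to 2σ⁴ = 2, and their mean close to 0, illustrating the limit stated above.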
When F_{t_j,θ(t_j|x)}[y(t_j)|x] is a normal CDF, the asymptotic distribution of its estimator is as follows. After plugging the MLEs of µ(t_j) and σ²(t_j) into F_{t_j,θ(t_j)}[y(t_j)|x] and doing some algebraic manipulation, we have

F̂_{t_j,θ̂(t_j)}[y(t_j)|x] = ∫_{−∞}^{y(t_j)} [√(2π) √(s²(t_j))]^{−1} exp{ −(u − ȳ(t_j))² / (2 s²(t_j)) } du = h[ȳ(t_j), ȳ²(t_j)],

where ȳ(t_j) = Σ_i y_i(t_j)/n is the sample mean, ȳ²(t_j) = Σ_i y_i²(t_j)/n is the sample second moment, and s²(t_j) = ȳ²(t_j) − [ȳ(t_j)]² is the MLE of the variance. Let α_1(t_j) = E[y(t_j)] and α_2(t_j) = E[y²(t_j)]. By the multivariate delta method, we can show that

√n ( h[ȳ(t_j), ȳ²(t_j)] − h[α_1(t_j), α_2(t_j)] ) ∼ N(0, τ²),

where τ² = σ_11 (∂h/∂α_1)² + σ_22 (∂h/∂α_2)² + 2σ_12 (∂h/∂α_1)(∂h/∂α_2), with

σ_11 = Var[ȳ(t_j)] = σ²(t_j)/n,
σ_22 = Var[ȳ²(t_j)] = [4µ²(t_j)σ²(t_j) + 2σ⁴(t_j)]/n,
σ_12 = Cov[ȳ(t_j), ȳ²(t_j)] = {E[y³(t_j)] − µ(t_j)[µ²(t_j) + σ²(t_j)]}/n = 2µ(t_j)σ²(t_j)/n.

The first and second order derivatives of h with respect to the α's are a straightforward computation (Theorem 8.16, Lehmann and Casella, 1998).
4.2 Asymptotic Properties of the Smoothing Estimators

We assume the following asymptotic assumptions for the two-step local polynomial estimators F̃^(q)_{t,θ̃(t|x)}[y(t)|x] given in (6):
A1. If n → ∞, then h → 0, n^{1/2} h^{p−q+1} → ∞, Jh → ∞, and nJh^{2q+1} → ∞.

A2. The design time points {t_1, t_2, . . . , t_J} are independent and identically distributed with density function g(t). For all 1 ≤ j ≤ J, 1 ≤ j_1 ≤ J and 1 ≤ j_2 ≤ J with j_1 ≠ j_2, there are known constants 0 < c_j ≤ 1 and 0 < c_{j_1 j_2} ≤ 1 such that lim_{n→∞}(n_j/n) = c_j and lim_{n→∞}(n_{j_1 j_2}/n) = c_{j_1 j_2}.

A3. The conditional CDFs F_{t,θ(t|x)}[y(t)|x] are p + 1 times continuously differentiable with respect to t.

A4. The kernel function K(·) is a bounded symmetric probability density function with support within a bounded set [−a, a] for some a > 0.

A5. There is a δ, which may tend to 0 as n → ∞, such that the visit times of the subjects satisfy |t_ij − t_i,j−1| > δ for all 1 ≤ i ≤ n and j = 2, . . . , m_i. If δ < ah, the convergence rate r(n_j, n_k, n_jk) and the bandwidth h satisfy the relationship Σ_{j=1}^{J} Σ_{k: δ ≤ |t_k − t_j| ≤ ah} r(n_j, n_k, n_jk) = o(Jh) when n is sufficiently large.
Assumptions A1-A4 are similar to the asymptotic conditions used in the estimation of conditional distribution functions with longitudinal data, such as Wu, Tian and Yu (2010) and Wu and Tian (2013a, 2013b). Assumption A5 is specifically motivated by the designs of practical longitudinal studies, such as the NGHS, in which there is usually a prespecified gap δ between the visit times of the same subject. Although the assumption Σ_{j=1}^{J} Σ_{k: δ ≤ |t_k − t_j| ≤ ah} r(n_j, n_k, n_jk) = o(Jh) does not appear to be intuitive, Assumption A5 suggests that, when |t_j − t_k| is close to zero, the number of subjects having measurements at both t_j and t_k, i.e., n_jk, is small, which leads to small correlation between the raw estimates F_{t_j,θ̂(t_j|x)}[y(t_j)|x] and F_{t_k,θ̂(t_k|x)}[y(t_k)|x].
∣∣x].Let Kq,p+1(t) = eTq,p+1S
−1(1, t, . . . , tp
)TK(t) be the equivalent kernel of local poly-
nomial fit with S =(skl)k,l=0,1,...,p
and skl =∫K(u)uk+ldu, Bp+1(K) =
∫K(u)up+1du
44
and V (K) =∫K2(u)du. The next theorem summarizes the asymptotic expressions
of the bias, variance and mean squared errors of F(q)
t,θ(t|x)
[y(t)|x
].
Theorem 1. Suppose that Assumptions A1-A5 are satisfied with c = c_j and that the asymptotic mean, variance and covariance of the raw estimators F_{t_j,θ̂(t_j|x)}[y(t_j)|x] for j = 1, . . . , J are given by (15). When n is sufficiently large,

Bias{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = E{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} − F^(q)_{t,θ(t|x)}[y(t)|x]
    = [q! h^{p−q+1}/(p + 1)!] F^{(p+1)}_{t,θ(t|x)}[y(t)|x] B_{p+1}(K_{q,p+1}) [1 + o_p(1)],   (16)

Var{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = [(q!)²/(c n J h^{2q+1} g(t))] V(K_{q,p+1})
    × F′_{t,θ(t|x)}[y(t)|x]^T I^{−1}[θ(t|x)] F′_{t,θ(t|x)}[y(t)|x] [1 + o_p(1)],   (17)

and the asymptotic expression of the mean squared error (MSE),

MSE{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]} = Bias{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]}² + Var{F̃^(q)_{t,θ̃(t|x)}[y(t)|x]},   (18)

is given by substituting the bias and variance terms of (18) with the right-hand sides of (16) and (17), respectively.
Proof. See Appendix A2.
Remark 4.1. Special cases of Theorem 1 can easily be derived from (16), (17) and (18). For the local linear estimator of F_{t,θ(t|x)}[y(t)|x], we have q = 0 and p = 1, so that the asymptotic MSE of F̃_{t,θ̃(t|x)}[y(t)|x] is

MSE{F̃_{t,θ̃(t|x)}[y(t)|x]} = {h⁴ B²(t|x) + h^{−1}(nJ)^{−1} V(t|x)} [1 + o_p(1)],   (19)
where B(t|x) = F″_{t,θ(t|x)}[y(t)|x] B₂(K_{0,2})/2 and

V(t|x) = [c g(t)]^{−1} V(K_{0,2}) F′_{t,θ(t|x)}[y(t)|x]^T I^{−1}[θ(t|x)] F′_{t,θ(t|x)}[y(t)|x].

Setting ∂MSE{F̃_{t,θ̃(t|x)}[y(t)|x]}/∂h to zero, the theoretically optimal bandwidth h_opt, which minimizes the dominating term on the right side of (19), is

h_opt = (nJ)^{−1/5} [V(t|x)]^{1/5} [4B²(t|x)]^{−1/5}.

Substituting h_opt into (19), the MSE of the local linear estimator F̃_{t,θ̃(t|x)}[y(t)|x] is

MSE{F̃_{t,θ̃(t|x)}[y(t)|x]; h_opt} = (nJ)^{−4/5} [V(t|x)]^{4/5} [B(t|x)]^{2/5} (2^{−8/5} + 2^{2/5}),   (20)

which suggests that the optimal rate at which the MSE of F̃_{t,θ̃(t|x)}[y(t)|x] converges to zero is (nJ)^{−4/5}.
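The bandwidth formula of Remark 4.1 is straightforward to evaluate; the sketch below (names ours) also verifies numerically that h_opt minimizes the dominating term h⁴B² + (nJh)^{−1}V of (19).

```python
def optimal_bandwidth(n, J, B, V):
    """h_opt = (nJ)^(-1/5) [V]^(1/5) [4 B^2]^(-1/5), the minimizer of the
    dominating MSE term h^4 B^2 + (nJ h)^(-1) V in (19)."""
    return (n * J) ** (-0.2) * V ** 0.2 * (4.0 * B ** 2) ** (-0.2)
```

In practice B(t|x) and V(t|x) involve unknown quantities, which is why the data-driven LSCV and LTCV bandwidth choices are used in the numerical work.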
Remark 4.2. By Assumption A5, the covariances of the raw estimators do not affect the asymptotic MSE of the smoothing estimator F̃_{t,θ̃(t|x)}[y(t)|x]. In many practical situations, the visit times of the same subject are set to be larger than a fixed value, so that δ > 0 is fixed. It can be seen from the proof of Theorem 1 (A.2 of the Appendix) that the contribution of the covariances of the raw estimators to the MSE of F̃_{t,θ̃(t|x)}[y(t)|x] is negligible because of the local smoothing nature of F̃_{t,θ̃(t|x)}[y(t)|x]. The assumption that δ > 0 is fixed is appropriate for the NGHS, since the actual visit times for each subject of the NGHS are at least 6 months apart.
5 Chapter Five
5.1 Time-Varying Models with Locally Transformed Variables
Our method is applied to the NGHS BP data to evaluate the conditional CDF of SBP for African-American girls, Caucasian girls and the entire cohort, and their trends over ages 9 to 19 years. We also apply local linear smoothing estimators to obtain the smoothing estimates of the conditional CDF over the entire set of time design points. When the variable of interest is locally transformed, only the smoothing-later local linear smoothing estimator can be used. In many instances, locally transformed variables show more stability, in the sense of normality, than a global transformation applied to the subsamples partitioned over the time design points. The NGHS is a multicenter population-based cohort study designed to evaluate the prevalence and incidence of cardiovascular risk factors in Caucasian and African-American girls during childhood and adolescence. The study included 1166 Caucasian and 1213 African-American girls for follow-up visits, with the number of visits ranging from 1 to 10 (median 9, mean 8.2, standard deviation 2). Among the important risk factors that have been studied by the NGHS investigators, childhood systolic blood pressure (SBP) is a prominent one.

Because the entry age starts at 9, the observed age in our analysis is limited to T = [9.1, 19] and rounded up to the first decimal point. This age round-up has the required clinical accuracy for age (Obarzanek et al., 2010), which leads to J = 100 distinct time design points {t_1 = 9.1, t_2 = 9.2, . . . , t_100 = 19}. According to these time design points, the entire NGHS data set has been partitioned into 100 subsamples. The
first subsample, corresponding to age 9.1, includes all girls in the age interval [9, 9.1), and so on for the rest of the subsamples; the last subsample, with age 19, includes all girls in the age interval [18.9, 19). In our preliminary analysis based on the goodness-of-fit tests of normality for the SBP distributions at the time design points {t_1, t_2, . . . , t_100}, we observed that the conditional distributions of the Box-Cox transformed SBP given age and race can be reasonably approximated by normal distributions, while the conditional distributions of the actual SBP given age and race are not approximately normal. Thus, for a given 1 ≤ j ≤ J = 100 and i ∈ S_j, we denote by Y_i(t_j) the Box-Cox transformed SBP observation of the ith girl at age t_j. The time-invariant categorical covariate X_i is race, defined by X_i = 0 if the ith girl is Caucasian and X_i = 1 if she is African-American. The random variables for the Box-Cox transformed SBP at age t and race are Y(t) and X, respectively. Given t and X = x, we consider the family of power normal (PN) distributions of SBP for this population of girls; that is, the conditional CDF of Y(t) is given by the normal distributions F_{t,θ(t|x)}[y(t)|x] of (2).
The Box-Cox transformation is applied to Z(t) to achieve normality. This transformation is known as the local Box-Cox transformation because λ(t) varies across time points. The parameter λ(t), which takes values in the closed interval [−2, 2], is estimated by the maximum likelihood method. When λ(t) = 0, a log transformation is used. The local Box-Cox transformation of Z(t) at time point t is given by

Y(t) = (Z(t)^{λ(t)} − 1) / λ(t)   if λ(t) ≠ 0,
Y(t) = log(Z(t))                  if λ(t) = 0.
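The local Box-Cox step can be sketched as follows. This is an illustrative numpy-only implementation and not the analysis code used in the dissertation: the function names are ours, and λ(t) is chosen by a grid search over [−2, 2] that maximizes the standard Box-Cox profile log-likelihood, one subsample at a time.

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform with a given exponent lambda."""
    if abs(lam) < 1e-8:
        return np.log(z)               # log transform when lambda(t) = 0
    return (z ** lam - 1.0) / lam      # (Z(t)^lambda - 1)/lambda otherwise

def local_box_cox(z, grid=np.linspace(-2.0, 2.0, 401)):
    """Transform one subsample Z(t), choosing lambda(t) in [-2, 2] by
    maximizing the Box-Cox profile log-likelihood over a grid."""
    z = np.asarray(z, dtype=float)
    n, log_z = z.size, np.log(z)
    best_lam, best_llf = 0.0, -np.inf
    for lam in grid:
        y = box_cox(z, lam)
        # profile log-likelihood of the normal model after transformation
        llf = -0.5 * n * np.log(y.var()) + (lam - 1.0) * log_z.sum()
        if llf > best_llf:
            best_lam, best_llf = lam, llf
    return box_cox(z, best_lam), best_lam

# Example: a positive, right-skewed "SBP-like" subsample.
rng = np.random.default_rng(0)
z = rng.lognormal(mean=4.7, sigma=0.5, size=800)
y, lam = local_box_cox(z)   # for lognormal data, lambda(t) should be near 0
```

Because the Box-Cox family is monotone for each λ(t), probability statements about SBP percentiles carry over directly to the transformed scale.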
In a series of preliminary goodness-of-fit tests of normality for SBP (the Shapiro-Wilk test, the Kolmogorov-Smirnov test, the Anderson-Darling test and the Cramér-von Mises test, together with visual inspection of the QQ plots), we found that 96 of the 100 subsamples are consistent with a normal distribution by the Shapiro-Wilk test, 92 of 100 by the Anderson-Darling test, all 100 by the Kolmogorov-Smirnov test, and 92 of 100 by the Cramér-von Mises test. According to these goodness-of-fit tests, we can conclude that almost all subsamples are approximately normal. If we apply the same normality tests to SBP when each subsample is restricted to the individuals whose height exceeds the median, then only 2 of the 100 subsamples show nonnormality by the Shapiro-Wilk test, 4 of 100 by the Anderson-Darling test and 2 of 100 by the Cramér-von Mises test; the Kolmogorov-Smirnov results remain the same as before.
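This per-subsample screening amounts to applying a normality test to each of the 100 subsamples at the 5% level and counting the non-rejections. The dissertation uses the Shapiro-Wilk, Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests; as a self-contained stand-in, the sketch below uses the Jarque-Bera statistic, whose asymptotic null distribution is chi-square with 2 degrees of freedom (survival function exp(−jb/2)).

```python
import numpy as np

def jarque_bera_pvalue(x):
    """Jarque-Bera normality test: jb = n/6 (S^2 + K^2/4) is asymptotically
    chi-square with 2 df, whose survival function is exp(-jb/2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    s2 = (d ** 2).mean()
    skew = (d ** 3).mean() / s2 ** 1.5
    kurt = (d ** 4).mean() / s2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)
    return float(np.exp(-jb / 2.0))

def count_normal_subsamples(subsamples, alpha=0.05):
    """How many subsamples are NOT rejected as normal at level alpha."""
    return sum(jarque_bera_pvalue(s) > alpha for s in subsamples)

# Synthetic check: 96 normal subsamples plus 4 heavily skewed ones.
rng = np.random.default_rng(1)
subsamples = [rng.normal(110.0, 10.0, 150) for _ in range(96)]
subsamples += [rng.exponential(10.0, 150) for _ in range(4)]
n_normal = count_normal_subsamples(subsamples)
```

On data like these, the skewed subsamples are rejected decisively while most normal subsamples pass, mirroring the pattern reported above.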
All goodness-of-fit tests were carried out at the 5% significance level. It is noteworthy that before splitting the NGHS longitudinal data, SBP is not normally distributed; a lognormality test, however, shows that SBP follows a lognormal distribution. The tabulated P-values, the estimated probabilities and the QQ plots for the Box-Cox transformed SBP are omitted to avoid redundancy. Applying the two-step local linear estimator of (7) to the observed data {Y_i(t_j), X_i, t_j; 1 ≤ j ≤ J, 1 ≤ i ≤ n}, we compute the smoothing estimator F̂_{t, θ̂(t|x)}[y_q(t) | x] of F_{t, θ(t|x)}[y_q(t) | x], where y_q(t) is the Box-Cox transform of the (100 × q)th percentile of SBP for girls with median height at age t (NHBPEP, 2004). By the monotonicity of the transformation, F̂_{t, θ̂(t|x)}[y_q(t) | x] also estimates the conditional probability of SBP at or below the (100 × q)th SBP percentile given in NHBPEP (2004) for girls with age t and race x. In addition to the smoothing estimators based on the time-varying normal distributions F_{t, θ(t|x)}[y_q(t) | x] in (2), we also compute the kernel estimator F̂_t[y_q(t) | x] of the unstructured conditional CDF F_t[y_q(t) | x] of Y(t) based on (8) with the w_i = 1/(n m_i) or 1/N weights. Smoothing estimators of these conditional probabilities are also computed for the entire cohort, ignoring the race covariate.
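To make the unstructured estimator concrete, a simplified Nadaraya-Watson kernel CDF estimator with the Epanechnikov kernel and the 1/N weighting can be sketched as below. This is an illustrative stand-in under our own simplifications, not the exact estimator (8) or the two-step local linear estimator (7) of the dissertation.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on |u| <= 1."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_cdf(t, y, obs_t, obs_y, h=2.5, w=None):
    """Kernel (Nadaraya-Watson) estimate of F_t[y] = P(Y(t) <= y) from
    pooled observations (obs_t, obs_y); w gives per-observation weights
    such as 1/(n m_i) or 1/N (the default)."""
    obs_t = np.asarray(obs_t, dtype=float)
    obs_y = np.asarray(obs_y, dtype=float)
    if w is None:
        w = np.full(obs_t.size, 1.0 / obs_t.size)  # the 1/N weighting
    k = w * epanechnikov((obs_t - t) / h)
    return float(np.sum(k * (obs_y <= y)) / np.sum(k))

# Sanity check: Y ~ N(0, 1) independent of t, so F_t[0] should be near 0.5,
# and the exceedance probability P(Y(t) > 0) = 1 - kernel_cdf(t, 0, ...).
rng = np.random.default_rng(3)
obs_t = rng.uniform(0.0, 10.0, 20000)
obs_y = rng.normal(0.0, 1.0, 20000)
est = kernel_cdf(5.0, 0.0, obs_t, obs_y)
```

The exceedance probabilities plotted in Figures 7-9 are of the form 1 minus such a CDF estimate, evaluated at the population percentile y_q(t).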
Figure 7 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] based on (7) (q = 0.90 in panel (a), q = 0.95 in panel (c) and q = 0.99 in panel (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] based on (8) with w_i = 1/N (q = 0.90 in panel (b), q = 0.95 in panel (d) and q = 0.99 in panel (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for Caucasian girls (x = 0). The Epanechnikov kernel and the bandwidth h = 2.5 were used for both the local linear smoothing estimators and the unstructured kernel estimators; h = 2.5 was chosen by examining the LSCV and LTCV scores and the smoothness of the fitted plots.
Figure 8 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] (panels (a), (c), (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] with w_i = 1/N (panels (b), (d), (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for African-American girls (x = 1) with q = 0.90, 0.95 and 0.99. As in Figure 7, the estimators of Figure 8 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
Figure 9 shows the local linear smoothing estimates 1 − F̂_{t, θ̂(t|x)}[y_q(t) | x] (panels (a), (c), (e)), the unstructured kernel estimators 1 − F̂_t[y_q(t) | x] with w_i = 1/N (panels (b), (d), (f)), and their corresponding empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the entire cohort with q = 0.90, 0.95 and 0.99. As in Figures 7 and 8, the estimators of Figure 9 are based on the Epanechnikov kernel and the bandwidth h = 2.5.
[Figure 7 here: six panels (a)-(f) plotting age (9.1-19) of CC girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 7: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for Caucasian girls (CC) between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
[Figure 8 here: six panels (a)-(f) plotting age (9.1-19) of AA girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 8: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for African American girls (AA) between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
[Figure 9 here: six panels (a)-(f) plotting age (9.1-19) of all girls against P(Y(t) > y_q(t)) at median height, q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 9: Raw estimators (scatter plots), smoothing estimators (solid curves), and bootstrap pointwise 95% confidence intervals (dashed curves, B = 1000 bootstrap replications) of the age-specific probabilities of SBP greater than the 90th, 95th and 99th population SBP percentiles for the entire cohort between 9.1 and 19.0 years old. (a), (c), (e): estimators based on the time-varying Gaussian models. (b), (d), (f): estimators based on the unstructured kernel estimators.
5.2 Simulation Results
For the simulation design, we generate in each sample n = 1000 subjects with 10 visits per subject, following the data structure of NGHS and Section 1.2. The jth visit time t_ij of the ith subject is generated from the uniform distribution U(j − 1, j) for j = 1, . . . , 10. Given each t_ij, we generate the observations Y_ij for Y(t) from the following simulation design:

Y_ij = 210 + 28(t_ij − 5) − 2(t_ij − 5)^2 + a_0i + ε_ij,
a_0i ~ N(0, 3^2), ε_ij ~ N(0, 0.9^2),

where the ε_ij are independent for all (i, j), and a_0i and ε_ij are independent. For this design, E(Y_ij | t_ij) = 210 + 28(t_ij − 5) − 2(t_ij − 5)^2 and Var(Y_ij | t_ij) = 9.81. For each simulated sample {(Y_i(t_ij), t_ij) : i = 1, . . . , 1000; j = 1, . . . , 10}, we round the time points so that each t_ij belongs to one of the equally spaced time-design points {t_0, . . . , t_100} = {0, 0.1, 0.2, . . . , 10}. Let y_q(t) be the (100 × q)th percentile of Y(t), so that P[Y(t) > y_q(t)] = 1 − q. More specifically, let y_.90(t), y_.95(t) and y_.99(t) be the 90th, 95th and 99th percentiles of Y(t). Since Y(t) follows a normal distribution, P[Y(t) > y_.90(t)] = 0.10, P[Y(t) > y_.95(t)] = 0.05 and P[Y(t) > y_.99(t)] = 0.01. The theoretical 90th, 95th and 99th percentiles at 10 of the 101 time points for the above model are given in Table 4.
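The simulation design above can be generated in a few lines. The sketch below is illustrative (the function name and the rounding step are ours) and follows the stated design.

```python
import numpy as np

def simulate_sample(n=1000, m=10, rng=None):
    """One simulated sample from the design above:
    t_ij ~ U(j-1, j), Y_ij = 210 + 28(t_ij-5) - 2(t_ij-5)^2 + a_0i + eps_ij,
    a_0i ~ N(0, 3^2), eps_ij ~ N(0, 0.9^2)."""
    if rng is None:
        rng = np.random.default_rng()
    j = np.arange(m)                          # visits j = 1, ..., m
    t = rng.uniform(j, j + 1, size=(n, m))    # t_ij ~ U(j-1, j)
    a0 = rng.normal(0.0, 3.0, size=(n, 1))    # subject-level random intercept
    eps = rng.normal(0.0, 0.9, size=(n, m))   # measurement error
    y = 210 + 28 * (t - 5) - 2 * (t - 5) ** 2 + a0 + eps
    t = np.round(t, 1)                        # snap to the grid {0, 0.1, ..., 10}
    return t, y

t, y = simulate_sample(rng=np.random.default_rng(2))
```

Note that Var(Y_ij | t_ij) = 3^2 + 0.9^2 = 9.81, matching the value stated above.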
Table 4: Theoretical 90th, 95th and 99th quantiles at 10 of the 101 time points from the model of our simulation design.

Time (t)   1      2       3       4       5       6       7       8       9       10
y_.90(t)   70.01  112.01  150.01  184.01  214.01  240.01  262.01  280.01  294.01  304.01
y_.95(t)   71.15  113.15  151.15  185.15  215.15  241.15  263.15  281.15  295.15  305.15
y_.99(t)   73.29  115.29  153.29  187.29  217.29  243.29  265.29  283.29  297.29  307.29
We repeatedly generate 1000 simulation samples. Within each simulation sample, we compute the smoothing estimates of P[Y(t) > y_q(t)] for q = 0.90, 0.95, 0.99 by the local linear estimator (7) based on the time-varying Gaussian model (2) and by the unstructured kernel estimator (8), using the Epanechnikov kernel and the LTCV bandwidths. The bootstrap pointwise 95% confidence intervals for the smoothing estimators are constructed using empirical quantiles of B = 1000 bootstrap samples. We then compute, from both the structural nonparametric model and the unstructured nonparametric model, the smoothing estimates of

P_.90(t) = P[Y(t) > y_.90(t) | t],
P_.95(t) = P[Y(t) > y_.95(t) | t],
P_.99(t) = P[Y(t) > y_.99(t) | t].

Figure 10 and Tables 5, 6 and 7 present the results of our simulation study in pictorial and tabular form. From Figure 10, we see that the smoothing estimators from the unstructured nonparametric model give wider confidence bands than the estimators from the structural nonparametric model (the Gaussian model) when the smoothing estimates of the conditional probability for the top 10%, top 5% and top 1% are computed. Numerical results on the average estimates, average bias, average root-MSE and confidence interval coverage probability are given in Tables 5, 6 and 7 for the top 10%, top 5% and top 1%, respectively. The root-MSE of the SNM at each time point is smaller than the root-MSE of the UNM, so the relative root-MSE at each time point is smaller than 1, which means that the SNM outperforms the UNM. From Tables 5, 6 and 7, we also notice better results from the smoothing estimators of the SNM when extreme tail probabilities are estimated.
Let P̂_q(t) be a smoothing estimator of P[Y(t) > y_q(t)] = 1 − q, which could be either the local polynomial estimator of (7) or the unstructured kernel estimator of (8). We measure the accuracy of P̂_q(t) by the average bias Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)] / M, the empirical mean squared error MSE[P̂_q(t)] = Σ_{m=1}^{M} [P̂_q^{(m)}(t) − (1 − q)]^2 / M, or the square root of MSE[P̂_q(t)] (root-MSE), where P̂_q^{(m)}(t) is the estimate from the mth simulated sample and M = 1000 is the total number of simulated samples. We assess the accuracy of a pointwise confidence interval for P_q(t) by the empirical probability of the confidence interval covering the true value P[Y(t) > y_q(t)] = 1 − q.
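These accuracy measures can be computed directly from the M replicated estimates; the helper below is an illustrative sketch (the function and field names are ours).

```python
import numpy as np

def summarize(estimates, truth, ci_lower, ci_upper):
    """Average estimate, average bias, root-MSE, and empirical coverage of
    the pointwise confidence intervals over M simulation replications."""
    est = np.asarray(estimates, dtype=float)
    lo = np.asarray(ci_lower, dtype=float)
    hi = np.asarray(ci_upper, dtype=float)
    bias = est - truth
    return {
        "estimate": est.mean(),
        "ave_bias": bias.mean(),
        "root_mse": float(np.sqrt(np.mean(bias ** 2))),
        "coverage": float(np.mean((lo <= truth) & (truth <= hi))),
    }

# Toy example with truth P[Y(t) > y_.90(t)] = 0.10 and M = 3 replications.
out = summarize([0.10, 0.12, 0.08], 0.10, [0.05] * 3, [0.15] * 3)
```

Applying this per time point and per estimator yields exactly the columns reported in Tables 5, 6 and 7.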
Table 5 shows the averages of the estimates, the averages of the biases, the root-MSEs, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the estimation of P[Y(t) > y_.90(t)] = 0.10 at t = 1.0, 2.0, . . . , 10.0. For all 10 time points, the smoothing-later local linear estimators based on the time-varying Gaussian model have smaller root-MSEs than the kernel estimators based on the unstructured nonparametric model. Comparing the empirical coverage probabilities of the bootstrap pointwise 95% confidence intervals, we observe that the smoothing estimators based on the time-varying Gaussian model have higher coverage probabilities than the unstructured kernel estimators at most of the time points.

When q increases to 0.95 and 0.99, Tables 6 and 7 show the corresponding averages of the estimates, averages of the biases, root-MSEs, and empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals based on B = 1000 bootstrap replications for the estimation of P[Y(t) > y_.95(t)] = 0.05 and P[Y(t) > y_.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0. Again, the smoothing estimators based on the time-varying Gaussian model have smaller root-MSEs than the unstructured kernel estimators at all 10 time points.
From Figure 10, comparing (a) with (b), (c) with (d) and (e) with (f), we see that the smoothing estimators from the unstructured nonparametric models give wider confidence bands than the smoothing estimators from the structural nonparametric models. The simulation results of Tables 5, 6 and 7 also suggest that, when the time-varying parametric model is appropriate, the structural two-step smoothing estimators have smaller mean squared errors than the unstructured smoothing estimators for a practical longitudinal sample with moderate sample sizes. However, these results may not hold if the time-varying parametric model is not an appropriate approximation to the time-varying distribution functions of the longitudinal variable under consideration. Looking at the relative root-MSEs in Tables 5, 6 and 7, we see that as q increases, the relative root-MSE decreases, which means that smoothing estimation of extreme tail probabilities by the existing unstructured kernel method is increasingly inefficient and potentially misleading when the data at different time points follow a parametric family.
[Figure 10 here: six panels (a)-(f) plotting age (0-10) against P(Y > y_q(t)), q = 0.90 (a, b), 0.95 (c, d), 0.99 (e, f), by SNM (left column) and UNM (right column).]

Figure 10: The black solid lines are the local linear (a, c, e) and Nadaraya-Watson (b, d, f) smoothing estimators with the Epanechnikov kernel and bandwidth 2.5. The dotted lines represent the 95% pointwise bootstrap confidence bands over the 1000 simulated samples.
Table 5: Averages of estimates, averages of the biases, the square root of the mean squared errors, the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.90(t)] = 0.10, and the relative root-MSE, at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0986    -0.0014    0.0247  0.948     0.0984    -0.0016    0.0306  0.95      0.805670133
2     0.1002     0.0002    0.0243  0.947     0.0995    -0.0005    0.0319  0.952     0.761484532
3     0.0998    -0.0002    0.0244  0.948     0.0991    -0.0009    0.0306  0.957     0.797875854
4     0.1010     0.0010    0.0252  0.948     0.1013     0.0013    0.0324  0.951     0.777396196
5     0.1010     0.0010    0.0244  0.958     0.1002     0.0002    0.0307  0.943     0.794362515
6     0.0988    -0.0012    0.0238  0.948     0.0988    -0.0012    0.0306  0.947     0.778519871
7     0.1013     0.0013    0.0248  0.948     0.1014     0.0014    0.0317  0.942     0.78320848
8     0.0991    -0.0009    0.0239  0.948     0.0989    -0.0011    0.0302  0.952     0.790241931
9     0.1007     0.0007    0.0242  0.949     0.1008     0.0008    0.0313  0.954     0.771904612
10    0.0986    -0.0014    0.0340  0.95      0.0983    -0.0017    0.0430  0.955     0.790419587
Table 6: Averages of estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.95(t)] = 0.05 at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0497    -0.0003    0.0165  0.949     0.0486    -0.0014    0.0223  0.959     0.738030611
2     0.0506     0.0006    0.0162  0.945     0.0497    -0.0003    0.0231  0.96      0.701322291
3     0.0503     0.0003    0.0162  0.955     0.0500     0.0000    0.0211  0.963     0.768899154
4     0.0513     0.0013    0.0170  0.949     0.0503     0.0003    0.0222  0.951     0.7679902
5     0.0512     0.0012    0.0165  0.962     0.0502     0.0002    0.0230  0.958     0.719378841
6     0.0496    -0.0004    0.0158  0.959     0.0488    -0.0012    0.0221  0.96      0.711784655
7     0.0514     0.0014    0.0167  0.95      0.0515     0.0015    0.0233  0.961     0.71589335
8     0.0499    -0.0001    0.0160  0.952     0.0494    -0.0006    0.0214  0.949     0.746505216
9     0.0509     0.0009    0.0162  0.957     0.0505     0.0005    0.0223  0.959     0.725402593
10    0.0499    -0.0001    0.0226  0.955     0.0484    -0.0016    0.0312  0.965     0.72287276
Table 7: Averages of estimates, averages of the biases, the square root of the mean squared errors, and the empirical coverage probabilities of the empirical quantile bootstrap pointwise 95% confidence intervals (B = 1000 bootstrap replications) for the estimation of P[Y(t) > y_.99(t)] = 0.01 at t = 1.0, 2.0, . . . , 10.0 over 1000 simulated samples. The smoothing-later local linear estimators based on the time-varying Gaussian model are shown in the left panel. The kernel estimators based on the unstructured nonparametric model are shown in the right panel. The Epanechnikov kernel and the LTCV bandwidth h = 2.5 are used for all the smoothing estimators.

      Structural Nonparametric Model        Unstructured Nonparametric Model
Time  Estimate  Ave. Bias  √MSE    Coverage  Estimate  Ave. Bias  √MSE    Coverage  Relative √MSE
1     0.0104     0.0004    0.0056  0.948     0.0097    -0.0003    0.0102  0.951     0.54270417
2     0.0107     0.0007    0.0055  0.947     0.0102     0.0002    0.0100  0.948     0.546403262
3     0.0105     0.0005    0.0054  0.942     0.0100     0.0000    0.0100  0.947     0.543178659
4     0.0110     0.0010    0.0059  0.948     0.0104     0.0004    0.0103  0.951     0.572775648
5     0.0109     0.0009    0.0057  0.949     0.0102     0.0002    0.0101  0.947     0.567613808
6     0.0103     0.0003    0.0052  0.952     0.0099    -0.0001    0.0097  0.953     0.541232998
7     0.0109     0.0009    0.0057  0.94      0.0105     0.0005    0.0106  0.952     0.5394098
8     0.0104     0.0004    0.0054  0.947     0.0096    -0.0004    0.0097  0.947     0.55396326
9     0.0108     0.0008    0.0055  0.951     0.0104     0.0004    0.0104  0.948     0.529256297
10    0.0108     0.0008    0.0078  0.948     0.0097    -0.0003    0.0146  0.941     0.536028584
6 Chapter Six
6.1 Discussion and Future Research
We have proposed a time-varying structural nonparametric model to estimate conditional distribution functions in longitudinal studies. Such a method is usually appropriate for long-term follow-up studies, such as the NGHS, which have a large number of subjects and sufficient numbers of repeated measurements over time. This approach has practical advantages over the well-established conditional-mean-based models in longitudinal analysis when the scientific objective is better achieved by evaluating the conditional distribution functions. Our application to the NGHS SBP data demonstrates that estimating the conditional distribution function by the SNM provides a useful quantitative measure of the risk of developing hypertension over time for adolescents. Although our estimation of the conditional distribution function by the SNM does not include any time-varying covariates and is limited to local polynomial smoothing estimators and a specific set of asymptotic assumptions, it provides some useful insight into the accuracy of the statistical results under different repeated measurement scenarios.
There are a number of theoretical and methodological aspects that warrant further investigation. First, further theoretical and simulation studies are needed to investigate the properties of other smoothing methods, such as global smoothing methods through splines, wavelets and other basis approximations, and their corresponding asymptotic inference procedures. Second, flexible conditional-distribution-based statistical models incorporating both time-dependent and time-invariant covariates are still not well understood and need to be developed. Third, many longitudinal studies have multivariate outcome variables, so appropriate statistical models and estimation methods for multivariate conditional distribution functions deserve to be systematically investigated. In our future research, we are interested in using copula models to estimate bivariate and multivariate conditional distribution functions. Incorporating continuous and time-varying covariates in the estimation of conditional distribution functions is another direction for our future research.
7 Appendix 1: Preliminary Analysis
Table 8: P-values for the normality tests of the 100 data sets.
SW, AD, KS, CVM and ChiSq stand for the Shapiro-Wilk
test, Anderson-Darling test, Kolmogorov-Smirnov test,
Cramér-von Mises test and Chi-Square test, respec-
tively.
Data Sets SW AD KS CVM Chisq
1 0.752 0.836 0.929 0.796 0.488
2 0.24 0.169 0.956 0.217 0.27
3 0.062 0.295 0.959 0.3 0.229
4 0.197 0.47 0.945 0.573 0.4
5 0.385 0.192 0.959 0.194 0.094
6 0.022 0.035 0.956 0.046 0.338
7 0.811 0.389 0.917 0.251 0.006
8 0.685 0.343 0.963 0.363 0.008
9 0.117 0.091 0.94 0.096 0.04
10 0.284 0.29 0.958 0.29 0.001
11 0.339 0.401 0.958 0.406 0.419
12 0.567 0.333 0.951 0.408 0.017
13 0.318 0.269 0.96 0.197 0.043
14 0.105 0.013 0.949 0.011 0
15 0.205 0.585 0.96 0.658 0.008
16 0.87 0.765 0.947 0.707 0.028
17 0.059 0.102 0.951 0.195 0
18 0.548 0.236 0.948 0.224 0.006
19 0.288 0.117 0.956 0.099 0.004
20 0.369 0.198 0.955 0.181 0
21 0.359 0.197 0.914 0.18 0
22 0.577 0.38 0.952 0.363 0.106
23 0.308 0.325 0.937 0.35 0.075
24 0.324 0.268 0.954 0.224 0.003
25 0.477 0.488 0.944 0.456 0.021
26 0.473 0.096 0.961 0.057 0.006
27 0.122 0.082 0.93 0.058 0
28 0.298 0.095 0.93 0.073 0.049
29 0.318 0.098 0.939 0.061 0.001
30 0.55 0.476 0.96 0.493 0.129
31 0.083 0.023 0.955 0.017 0.012
32 0.423 0.248 0.948 0.216 0.448
33 0.742 0.485 0.942 0.408 0.012
34 0.373 0.141 0.961 0.092 0
35 0.043 0.019 0.887 0.013 0
36 0.074 0.009 0.955 0.009 0
37 0.137 0.035 0.947 0.027 0.013
38 0.525 0.448 0.963 0.405 0.039
39 0.552 0.465 0.946 0.503 0.398
40 0.136 0.142 0.943 0.166 0
41 0.056 0.051 0.959 0.066 0.003
42 0.102 0.056 0.916 0.04 0
43 0.412 0.186 0.961 0.141 0
44 0.494 0.237 0.965 0.28 0.011
45 0.078 0.027 0.941 0.025 0.005
46 0.138 0.241 0.958 0.262 0.265
47 0.058 0.094 0.961 0.094 0.144
48 0.392 0.349 0.927 0.298 0.316
49 0.203 0.128 0.951 0.139 0.089
50 0.651 0.58 0.945 0.614 0.079
51 0.13 0.21 0.96 0.282 0
52 0.351 0.515 0.93 0.556 0.288
53 0.438 0.466 0.963 0.49 0.183
54 0.121 0.125 0.924 0.099 0.713
55 0.58 0.264 0.954 0.187 0.398
56 0.028 0.009 0.927 0.012 0.02
57 0.271 0.139 0.963 0.132 0.003
58 0.331 0.115 0.939 0.079 0.222
59 0.271 0.074 0.924 0.038 0.005
60 0.041 0.099 0.957 0.156 0.118
61 0.277 0.135 0.954 0.092 0.219
62 0.15 0.309 0.962 0.468 0.153
63 0.793 0.551 0.943 0.404 0.629
64 0.07 0.033 0.962 0.061 0.001
65 0.45 0.298 0.903 0.243 0.285
66 0.78 0.77 0.957 0.768 0.877
67 0.691 0.591 0.942 0.646 0.677
68 0.185 0.244 0.947 0.335 0.038
69 0.088 0.036 0.951 0.044 0.001
70 0.687 0.567 0.955 0.637 0.111
71 0.482 0.383 0.962 0.476 0.497
72 0.543 0.54 0.957 0.634 0.969
73 0.85 0.772 0.962 0.653 0.421
74 0.149 0.082 0.914 0.08 0.099
75 0.262 0.379 0.937 0.277 0.613
76 0.001 0.022 0.915 0.044 0.323
77 0.001 0 0.966 0.001 0.061
78 0.592 0.549 0.962 0.548 0.326
79 0.339 0.507 0.905 0.419 0.811
80 0.253 0.117 0.961 0.112 0.254
81 0.038 0.285 0.904 0.338 0.733
82 0.804 0.77 0.957 0.705 0.241
83 0.082 0.339 0.95 0.336 0.562
84 0.26 0.563 0.962 0.803 0.341
85 0.294 0.438 0.936 0.451 0.413
86 0.155 0.115 0.904 0.093 0.38
87 0.693 0.676 0.922 0.686 0.115
88 0.143 0.094 0.962 0.142 0.397
89 0.897 0.798 0.966 0.723 0.901
90 0.012 0.015 0.937 0.03 0.071
91 0.777 0.494 0.962 0.422 0.391
92 0.058 0.156 0.89 0.135 0.328
93 0.403 0.273 0.946 0.278 0.124
94 0.341 0.361 0.957 0.443 0.303
95 0.129 0.198 0.934 0.231 0.02
96 0.245 0.166 0.936 0.203 0.185
97 0.334 0.136 0.953 0.141 0
98 0.004 0.006 0.925 0.008 0.005
99 0.807 0.517 0.966 0.42 0.386
100 0.166 0.111 0.96 0.097 0.374
Table 9: Estimated raw probabilities (ERP) of SBP for the
entire cohort that exceed different quantiles y_q(t)
(q = .90, .95, .99), by SNM and UNM.

Data  ERP ≥ y.90(t)  ERP ≥ y.90(t)  ERP ≥ y.95(t)  ERP ≥ y.95(t)  ERP ≥ y.99(t)  ERP ≥ y.99(t)
Sets  by SNM         by UNM         by SNM         by UNM         by SNM         by UNM
1     0.03405        0.00000        0.01175        0.00000        0.00134        0.00000
2     0.10142        0.10000        0.04422        0.06000        0.00771        0.00000
3     0.05089        0.06122        0.01627        0.00000        0.00146        0.00000
4     0.04017        0.06977        0.01329        0.02326        0.00135        0.00000
5     0.05909        0.06122        0.02017        0.06122        0.00209        0.00000
6     0.11787        0.09615        0.05464        0.07692        0.01084        0.00000
7     0.08758        0.08696        0.03778        0.07246        0.00656        0.01449
8     0.10021        0.10938        0.03930        0.04688        0.00526        0.01563
9     0.12884        0.16667        0.06201        0.06061        0.01332        0.01515
10    0.10468        0.14286        0.03614        0.07143        0.00344        0.00000
11    0.05833        0.07071        0.02094        0.03030        0.00245        0.01010
12    0.05474        0.06667        0.01815        0.04762        0.00176        0.00000
13    0.05924        0.07619        0.02006        0.04762        0.00203        0.00000
14    0.07963        0.14851        0.03110        0.05941        0.00428        0.00000
15    0.05873        0.10101        0.02065        0.03030        0.00229        0.00000
16    0.07206        0.09278        0.02577        0.05155        0.00289        0.02062
17    0.06971        0.07965        0.02681        0.05310        0.00362        0.01770
18    0.14326        0.12844        0.06655        0.10092        0.01283        0.02752
19    0.10262        0.14286        0.04089        0.04464        0.00567        0.00893
20    0.08101        0.11215        0.02976        0.03738        0.00350        0.00935
21    0.08207        0.11494        0.03284        0.05747        0.00477        0.01149
22    0.06958        0.12500        0.02805        0.02500        0.00425        0.01250
23    0.07176        0.08750        0.02740        0.05000        0.00361        0.00000
24    0.08902        0.13043        0.03397        0.03261        0.00432        0.02174
25    0.12812        0.15464        0.05416        0.06186        0.00836        0.01031
26    0.10420        0.11628        0.04345        0.05814        0.00673        0.00000
27    0.05847        0.11111        0.01811        0.05556        0.00146        0.02222
28    0.10516        0.14118        0.03956        0.05882        0.00469        0.01176
29    0.12949        0.18280        0.05565        0.06452        0.00894        0.01075
30    0.11839        0.14286        0.04802        0.07143        0.00677        0.02041
31    0.07528        0.05952        0.03127        0.02381        0.00503        0.01190
32    0.05028        0.09459        0.01568        0.05405        0.00132        0.00000
33    0.10093        0.15909        0.04488        0.07955        0.00819        0.01136
34    0.06224        0.13253        0.02382        0.03614        0.00322        0.00000
35    0.07395        0.15000        0.02825        0.05000        0.00371        0.00000
36    0.08491        0.11765        0.03580        0.03529        0.00587        0.01176
37    0.07233        0.07692        0.02768        0.03846        0.00366        0.02564
38    0.09192        0.12658        0.03642        0.06329        0.00506        0.01266
39    0.10101        0.14737        0.03702        0.02105        0.00413        0.00000
40    0.06737        0.13187        0.01925        0.06593        0.00122        0.00000
41    0.03936        0.05952        0.01657        0.03571        0.00103        0.00000
42    0.04897        0.08824        0.02178        0.04412        0.00161        0.00000
43    0.07484        0.12766        0.03618        0.06383        0.00341        0.00000
44    0.03052        0.03846        0.01171        0.02564        0.00054        0.00000
45    0.06858        0.10390        0.03545        0.06494        0.00437        0.02597
46    0.06777        0.08219        0.03423        0.02740        0.00387        0.01370
47    0.10617        0.11111        0.06058        0.08333        0.01009        0.04167
48    0.05858        0.11765        0.02981        0.05882        0.00355        0.01471
49    0.07748        0.13699        0.03983        0.09589        0.00471        0.00000
50    0.05643        0.08219        0.02740        0.04110        0.00276        0.01370
51    0.01757        0.03333        0.00449        0.00000        0.00027        0.00000
52    0.06000        0.10448        0.02303        0.04478        0.00315        0.00000
53    0.06007        0.07143        0.02446        0.03571        0.00385        0.01786
54    0.05693        0.08475        0.01997        0.05085        0.00220        0.03390
55    0.03706        0.06349        0.01327        0.04762        0.00163        0.01587
56    0.08368        0.14286        0.03314        0.10714        0.00465        0.00000
57    0.04978        0.06780        0.01852        0.03390        0.00240        0.00000
58    0.04289        0.03509        0.01338        0.03509        0.00115        0.00000
59    0.10727        0.15873        0.05042        0.09524        0.01043        0.03175
60    0.01241        0.00000        0.00234        0.00000        0.00007        0.00000
61    0.05101        0.06977        0.01993        0.02326        0.00289        0.00000
62    0.07042        0.10000        0.02745        0.04000        0.00380        0.02000
63    0.03203        0.04167        0.01027        0.02083        0.00098        0.00000
64    0.06375        0.08511        0.02650        0.06383        0.00435        0.00000
65    0.04035        0.06667        0.01217        0.04444        0.00097        0.00000
66    0.02275        0.05405        0.00673        0.02703        0.00056        0.00000
67    0.04450        0.10638        0.01641        0.04255        0.00211        0.00000
68    0.04017        0.06250        0.01412        0.04167        0.00164        0.00000
69    0.11770        0.10870        0.06222        0.06522        0.01688        0.04348
70    0.06691        0.12245        0.02549        0.04082        0.00336        0.00000
71    0.05297        0.03030        0.02092        0.03030        0.00310        0.00000
72    0.07859        0.10811        0.03379        0.08108        0.00586        0.00000
73    0.02304        0.05405        0.00609        0.00000        0.00038        0.00000
74    0.06671        0.06818        0.02876        0.04545        0.00512        0.02273
75    0.06807        0.10256        0.02714        0.02564        0.00399        0.02564
76    0.02997        0.02778        0.00839        0.02778        0.00058        0.00000
77    0.00471        0.04762        0.00081        0.00000        0.00002        0.00000
78    0.05011        0.06522        0.01787        0.02174        0.00209        0.00000
79    0.06095        0.06977        0.02337        0.04651        0.00317        0.02326
80    0.08914        0.12903        0.03923        0.12903        0.00708        0.03226
81    0.05443        0.02632        0.02344        0.00000        0.00425        0.00000
82    0.01344        0.02564        0.00306        0.00000        0.00014        0.00000
83    0.02170        0.06818        0.00654        0.00000        0.00056        0.00000
84    0.03888        0.00000        0.01393        0.00000        0.00170        0.00000
85    0.02388        0.02632        0.00749        0.02632        0.00070        0.00000
86    0.07203        0.11429        0.03178        0.02857        0.00592        0.00000
87    0.01216        0.05405        0.00345        0.02703        0.00028        0.00000
88    0.06629        0.09524        0.02637        0.04762        0.00387        0.02381
89    0.04784        0.05128        0.01970        0.00000        0.00326        0.00000
90    0.05751        0.04444        0.02246        0.04444        0.00321        0.02222
91    0.01716        0.04348        0.00501        0.00000        0.00041        0.00000
92    0.03057        0.02381        0.01187        0.02381        0.00178        0.02381
93    0.00281        0.00000        0.00041        0.00000        0.00001        0.00000
94    0.02735        0.07692        0.00842        0.00000        0.00074        0.00000
95    0.04911        0.02041        0.02108        0.00000        0.00383        0.00000
96    0.01757        0.03333        0.00427        0.00000        0.00023        0.00000
97    0.01466        0.01754        0.00457        0.01754        0.00044        0.00000
98    0.07442        0.13636        0.0352
50.
0909
10.
0077
30.
0227
3
990.
0091
80.
0000
00.
0020
00.
0000
00.
0000
90.
0000
0
100
0.03
238
0.05
263
0.01
139
0.05
263
0.00
135
0.00
000
77
Table 10: Estimated Raw Probabilities (ERP) of SBP for
Caucasian Girls that exceed different quantiles of y_q(t)
(q = .90, .95, .99) by SNM and UNM.
Data Sets   ERP of SBP exceeding y.90(t), y.95(t), y.99(t)
            by SNM   by UNM   by SNM   by UNM   by SNM   by UNM
1   0.05600  0.00000  0.02396  0.00000  0.00430  0.00000
2   0.09230  0.15000  0.04413  0.10000  0.00975  0.00000
3   0.02503  0.00000  0.00727  0.00000  0.00057  0.00000
4   0.00925  0.00000  0.00183  0.00000  0.00007  0.00000
5   0.03449  0.03704  0.00950  0.03704  0.00063  0.00000
6   0.08627  0.00000  0.04152  0.00000  0.00939  0.00000
7   0.13626  0.12903  0.07384  0.12903  0.02102  0.03226
8   0.09567  0.13333  0.04084  0.06667  0.00682  0.03333
9   0.09935  0.14286  0.04489  0.00000  0.00857  0.00000
10  0.11516  0.17241  0.04428  0.10345  0.00548  0.00000
11  0.11565  0.12821  0.05076  0.05128  0.00880  0.02564
12  0.08031  0.10000  0.03101  0.07500  0.00414  0.00000
13  0.04595  0.05882  0.01444  0.03922  0.00126  0.00000
14  0.03961  0.12500  0.01257  0.02083  0.00115  0.00000
15  0.05228  0.09091  0.01926  0.02273  0.00244  0.00000
16  0.13833  0.18519  0.06967  0.14815  0.01655  0.07407
17  0.06734  0.04545  0.02832  0.04545  0.00477  0.02273
18  0.06435  0.02778  0.02337  0.02778  0.00277  0.00000
19  0.12724  0.15556  0.05812  0.06667  0.01096  0.02222
20  0.06577  0.08163  0.02169  0.02041  0.00201  0.02041
21  0.06573  0.06250  0.02646  0.03125  0.00402  0.00000
22  0.05372  0.10345  0.01668  0.00000  0.00138  0.00000
23  0.03757  0.05714  0.01286  0.02857  0.00142  0.00000
24  0.05887  0.10000  0.01848  0.00000  0.00154  0.00000
25  0.09961  0.10526  0.03920  0.05263  0.00528  0.02632
26  0.13415  0.18750  0.06387  0.12500  0.01324  0.00000
27  0.00938  0.03030  0.00118  0.00000  0.00001  0.00000
28  0.03147  0.05714  0.00548  0.00000  0.00011  0.00000
29  0.07307  0.11111  0.02316  0.05556  0.00190  0.00000
30  0.16318  0.16129  0.08306  0.12903  0.01972  0.06452
31  0.07813  0.06250  0.03281  0.03125  0.00539  0.00000
32  0.03032  0.05714  0.00830  0.02857  0.00055  0.00000
33  0.07304  0.14286  0.03040  0.02381  0.00493  0.00000
34  0.02934  0.10811  0.00916  0.02703  0.00083  0.00000
35  0.03207  0.10000  0.00973  0.03333  0.00082  0.00000
36  0.06544  0.06250  0.02553  0.03125  0.00359  0.00000
37  0.02443  0.05000  0.00681  0.00000  0.00048  0.00000
38  0.06534  0.09677  0.02288  0.03226  0.00246  0.00000
39  0.08195  0.12821  0.01804  0.00000  0.00053  0.00000
40  0.07964  0.14286  0.02434  0.08571  0.00178  0.00000
41  0.01262  0.02500  0.00389  0.00000  0.00009  0.00000
42  0.05932  0.10345  0.02892  0.06897  0.00293  0.00000
43  0.04577  0.09091  0.01882  0.02273  0.00104  0.00000
44  0.01991  0.04545  0.00632  0.04545  0.00015  0.00000
45  0.01724  0.05128  0.00639  0.00000  0.00028  0.00000
46  0.04338  0.03846  0.02002  0.00000  0.00173  0.00000
47  0.13027  0.12821  0.08413  0.12821  0.02154  0.05128
48  0.08197  0.14286  0.04558  0.08571  0.00717  0.02857
49  0.07341  0.11111  0.03757  0.11111  0.00441  0.00000
50  0.02023  0.06667  0.00683  0.00000  0.00021  0.00000
51  0.02666  0.06250  0.00694  0.00000  0.00041  0.00000
52  0.03870  0.08824  0.01389  0.00000  0.00170  0.00000
53  0.04269  0.05882  0.01414  0.00000  0.00141  0.00000
54  0.04123  0.08333  0.01302  0.08333  0.00116  0.04167
55  0.03127  0.05405  0.01114  0.05405  0.00138  0.00000
56  0.01001  0.05263  0.00194  0.00000  0.00006  0.00000
57  0.02210  0.02941  0.00633  0.00000  0.00049  0.00000
58  0.03655  0.02941  0.01060  0.02941  0.00078  0.00000
59  0.10613  0.16129  0.05120  0.09677  0.01130  0.03226
60  0.01804  0.00000  0.00432  0.00000  0.00022  0.00000
61  0.02401  0.05882  0.00662  0.05882  0.00046  0.00000
62  0.06309  0.10345  0.02396  0.03448  0.00316  0.00000
63  0.02621  0.03846  0.00833  0.00000  0.00080  0.00000
64  0.02760  0.03226  0.00863  0.03226  0.00079  0.00000
65  0.01333  0.00000  0.00255  0.00000  0.00008  0.00000
66  0.00401  0.00000  0.00068  0.00000  0.00002  0.00000
67  0.03140  0.03571  0.01066  0.00000  0.00117  0.00000
68  0.02668  0.00000  0.00861  0.00000  0.00086  0.00000
69  0.06852  0.09524  0.02884  0.04762  0.00483  0.00000
70  0.04660  0.04545  0.01640  0.00000  0.00188  0.00000
71  0.06646  0.05263  0.02841  0.05263  0.00496  0.00000
72  0.05259  0.04762  0.02071  0.04762  0.00305  0.00000
73  0.02872  0.08696  0.00765  0.00000  0.00047  0.00000
74  0.06945  0.08000  0.02984  0.04000  0.00525  0.00000
75  0.03927  0.10000  0.01168  0.00000  0.00091  0.00000
76  0.03213  0.05000  0.01066  0.05000  0.00111  0.00000
77  0.00108  0.00000  0.00013  0.00000  0.00000  0.00000
78  0.03942  0.04167  0.01210  0.00000  0.00101  0.00000
79  0.02630  0.00000  0.00642  0.00000  0.00032  0.00000
80  0.03590  0.06667  0.01241  0.06667  0.00140  0.00000
81  0.03906  0.04348  0.01364  0.00000  0.00156  0.00000
82  0.01336  0.03448  0.00295  0.00000  0.00013  0.00000
83  0.02119  0.03571  0.00656  0.00000  0.00060  0.00000
84  0.02804  0.00000  0.00953  0.00000  0.00106  0.00000
85  0.03554  0.04545  0.01234  0.04545  0.00141  0.00000
86  0.04938  0.05882  0.01984  0.00000  0.00309  0.00000
87  0.00182  0.00000  0.00028  0.00000  0.00001  0.00000
88  0.05670  0.04762  0.02291  0.00000  0.00355  0.00000
89  0.02178  0.00000  0.00770  0.00000  0.00096  0.00000
90  0.03218  0.00000  0.01032  0.00000  0.00099  0.00000
91  0.02967  0.06897  0.01126  0.00000  0.00161  0.00000
92  0.01577  0.00000  0.00517  0.00000  0.00056  0.00000
93  0.00274  0.00000  0.00041  0.00000  0.00001  0.00000
94  0.02164  0.09091  0.00644  0.00000  0.00054  0.00000
95  0.03409  0.04167  0.01403  0.00000  0.00239  0.00000
96  0.04378  0.05556  0.01497  0.00000  0.00161  0.00000
97  0.01485  0.03125  0.00476  0.03125  0.00049  0.00000
98  0.01444  0.04762  0.00421  0.04762  0.00035  0.00000
99  0.00132  0.00000  0.00016  0.00000  0.00000  0.00000
100 0.03632  0.00000  0.01567  0.00000  0.00295  0.00000
Table 11: Estimated Raw Probabilities (ERP) of SBP for
African American Girls that exceed different quantiles of y_q(t)
(q = .90, .95, .99) by SNM and UNM.
Data Sets   ERP of SBP exceeding y.90(t), y.95(t), y.99(t)
            by SNM   by UNM   by SNM   by UNM   by SNM   by UNM
1   0.02416  0.00000  0.00720  0.00000  0.00061  0
2   0.10446  0.06667  0.04083  0.03333  0.00538  0
3   0.06906  0.11111  0.02029  0.00000  0.00139  0
4   0.09005  0.14286  0.03870  0.04762  0.00663  0
5   0.09751  0.09091  0.04070  0.09091  0.00641  0
6   0.13525  0.17241  0.05408  0.13793  0.00715  0
7   0.04459  0.05263  0.01333  0.02632  0.00104  0
8   0.10229  0.08824  0.03583  0.02941  0.00356  0
9   0.15434  0.18421  0.07791  0.10526  0.01833  0.02631579
10  0.09895  0.12195  0.03181  0.04878  0.00253  0
11  0.02840  0.03333  0.00804  0.01667  0.00058  0
12  0.04159  0.04615  0.01238  0.03077  0.00096  0
13  0.07430  0.09259  0.02698  0.05556  0.00313  0
14  0.12200  0.16981  0.05299  0.09434  0.00886  0
15  0.06306  0.10909  0.02089  0.03636  0.00198  0
16  0.04680  0.05714  0.01322  0.01429  0.00088  0
17  0.06925  0.10145  0.02440  0.05797  0.00266  0.01449275
18  0.18628  0.17808  0.09239  0.13699  0.02006  0.04109589
19  0.08640  0.13433  0.03082  0.02985  0.00332  0
20  0.09564  0.13793  0.03834  0.05172  0.00546  0
21  0.09243  0.14545  0.03655  0.07273  0.00505  0.01818182
22  0.07560  0.13725  0.03342  0.03922  0.00625  0.01960784
23  0.09919  0.11111  0.03765  0.06667  0.00463  0
24  0.11464  0.15385  0.04929  0.05769  0.00811  0.03846154
25  0.14897  0.18644  0.06586  0.06780  0.01108  0
26  0.08799  0.07407  0.03360  0.01852  0.00429  0
27  0.09637  0.15789  0.03759  0.08772  0.00498  0.03508772
28  0.15429  0.20000  0.07283  0.10000  0.01441  0.02
29  0.16512  0.22807  0.08037  0.07018  0.01699  0.01754386
30  0.09499  0.13433  0.03279  0.04478  0.00318  0
31  0.07553  0.05769  0.03161  0.01923  0.00518  0.01923077
32  0.07191  0.12821  0.02430  0.07692  0.00235  0
33  0.12982  0.17391  0.06043  0.13043  0.01188  0.02173913
34  0.09438  0.15217  0.03979  0.04348  0.00642  0
35  0.10402  0.18000  0.04292  0.06000  0.00646  0
36  0.09914  0.15094  0.04396  0.03774  0.00798  0.01886792
37  0.13431  0.10526  0.05802  0.07895  0.00937  0.05263158
38  0.11205  0.14583  0.04784  0.08333  0.00775  0.02083333
39  0.09799  0.16071  0.04049  0.03571  0.00618  0
40  0.06168  0.12500  0.01715  0.05357  0.00103  0
41  0.07327  0.09091  0.03620  0.06818  0.00372  0
42  0.04288  0.07692  0.01783  0.02564  0.00104  0
43  0.10368  0.16000  0.05574  0.10000  0.00743  0
44  0.04586  0.02941  0.02097  0.00000  0.00174  0
45  0.13767  0.15789  0.08218  0.13158  0.01556  0.05263158
46  0.08438  0.10638  0.04461  0.04255  0.00579  0.0212766
47  0.05392  0.09091  0.02069  0.03030  0.00084  0.03030303
48  0.03836  0.09091  0.01749  0.03030  0.00148  0
49  0.08219  0.15217  0.04286  0.08696  0.00530  0
50  0.08504  0.09302  0.04748  0.06977  0.00755  0.02325581
51  0.00818  0.00000  0.00182  0.00000  0.00009  0
52  0.08410  0.12121  0.03292  0.09091  0.00449  0
53  0.08526  0.09091  0.04262  0.09091  0.01048  0.04545455
54  0.07112  0.08571  0.02703  0.02857  0.00351  0.02857143
55  0.04715  0.07692  0.01700  0.03846  0.00206  0.03846154
56  0.13506  0.18919  0.06038  0.16216  0.01062  0
57  0.09990  0.12000  0.04590  0.08000  0.00907  0
58  0.05640  0.04348  0.02004  0.04348  0.00229  0
59  0.11162  0.15625  0.05195  0.09375  0.01043  0.03125
60  0.00757  0.00000  0.00103  0.00000  0.00002  0
61  0.07207  0.07692  0.03291  0.00000  0.00665  0
62  0.08591  0.09524  0.03576  0.04762  0.00566  0.04761905
63  0.04135  0.04545  0.01354  0.04545  0.00132  0
64  0.15352  0.18750  0.08445  0.12500  0.02443  0
65  0.06163  0.10345  0.02211  0.06897  0.00255  0
66  0.05509  0.10526  0.02120  0.05263  0.00294  0
67  0.07059  0.21053  0.02920  0.10526  0.00468  0
68  0.05242  0.10345  0.01963  0.06897  0.00257  0
69  0.15940  0.12000  0.09542  0.08000  0.03364  0.08
70  0.08810  0.18519  0.03580  0.07407  0.00532  0
71  0.04194  0.00000  0.01550  0.00000  0.00202  0
72  0.12197  0.18750  0.05764  0.12500  0.01185  0
73  0.01623  0.00000  0.00425  0.00000  0.00027  0
74  0.06806  0.05263  0.03049  0.05263  0.00593  0.05263158
75  0.10106  0.10526  0.04939  0.05263  0.01130  0.05263158
76  0.01923  0.00000  0.00325  0.00000  0.00007  0
77  0.01144  0.09524  0.00238  0.00000  0.00009  0
78  0.06436  0.09091  0.02665  0.04545  0.00432  0
79  0.07792  0.10000  0.03395  0.06667  0.00608  0.03333333
80  0.15231  0.18750  0.07536  0.18750  0.01678  0.0625
81  0.07490  0.00000  0.03930  0.00000  0.01092  0
82  0.01734  0.00000  0.00476  0.00000  0.00034  0
83  0.02543  0.12500  0.00767  0.00000  0.00065  0
84  0.06251  0.00000  0.02438  0.00000  0.00343  0
85  0.01344  0.00000  0.00368  0.00000  0.00027  0
86  0.10098  0.16667  0.04891  0.05556  0.01095  0
87  0.07419  0.18182  0.03738  0.09091  0.00947  0
88  0.07895  0.14286  0.03113  0.09524  0.00436  0.04761905
89  0.08225  0.10526  0.03631  0.00000  0.00667  0
90  0.09859  0.10000  0.04625  0.10000  0.00961  0.05
91  0.00129  0.00000  0.00010  0.00000  0.00000  0
92  0.07069  0.06667  0.03385  0.06667  0.00766  0.06666667
93  0.00368  0.00000  0.00056  0.00000  0.00001  0
94  0.03887  0.05882  0.01288  0.00000  0.00130  0
95  0.06662  0.00000  0.02940  0.00000  0.00552  0
96  0.00076  0.00000  0.00005  0.00000  0.00000  0
97  0.01597  0.00000  0.00500  0.00000  0.00049  0
98  0.14795  0.21739  0.08393  0.13043  0.02623  0.04347826
99  0.02447  0.00000  0.00703  0.00000  0.00053  0
100 0.02870  0.08333  0.00839  0.08333  0.00065  0
Table 12: Local linear smoothing estimates for µ(t) and
σ(t) for 100 data sets of the entire cohort.
Data Sets Smoothed Mean Smoothed SD
1 4.607932 0.08324157
2 4.61047 0.08305636
3 4.612976 0.08287314
4 4.615447 0.08269112
5 4.617884 0.08251192
6 4.620288 0.08233448
7 4.622656 0.08215882
8 4.624989 0.08198541
9 4.627286 0.08181361
10 4.629547 0.08164475
11 4.631771 0.08147822
12 4.633957 0.0813142
13 4.636104 0.08115307
14 4.638213 0.08099445
15 4.640281 0.08083881
16 4.642309 0.08068629
17 4.644294 0.08053695
18 4.646237 0.08039086
19 4.648137 0.08024827
20 4.649992 0.08010934
21 4.651802 0.07997427
22 4.653567 0.07984321
23 4.655285 0.07971635
24 4.656956 0.07959387
25 4.65858 0.07947592
26 4.660157 0.07936274
27 4.661685 0.07925439
28 4.663166 0.07915105
29 4.664599 0.0790529
30 4.665984 0.07895995
31 4.667322 0.07887228
32 4.668613 0.07879008
33 4.669857 0.07871307
34 4.671056 0.07864152
35 4.672209 0.07857545
36 4.673319 0.07851449
37 4.674386 0.07845881
38 4.675411 0.07840828
39 4.676396 0.07836244
40 4.677342 0.07832134
41 4.678252 0.07828468
42 4.679125 0.07825221
43 4.679965 0.07822363
44 4.680772 0.07819863
45 4.681549 0.07817689
46 4.682298 0.07815806
47 4.683019 0.07814178
48 4.683715 0.07812769
49 4.684388 0.07811543
50 4.685038 0.07810463
51 4.685669 0.07809494
52 4.68628 0.07808602
53 4.686874 0.07807754
54 4.687451 0.07806919
55 4.688013 0.07806069
56 4.688561 0.07805178
57 4.689094 0.07804222
58 4.689615 0.07803181
59 4.690124 0.07802037
60 4.69062 0.07800776
61 4.691104 0.07799387
62 4.691577 0.07797857
63 4.692037 0.07796183
64 4.692486 0.07794371
65 4.692922 0.07792409
66 4.693345 0.07790303
67 4.693755 0.07788049
68 4.694152 0.07785662
69 4.694534 0.07783156
70 4.694902 0.07780524
71 4.695255 0.07777797
72 4.695592 0.07774966
73 4.695913 0.07772052
74 4.696218 0.07769065
75 4.696505 0.07766014
76 4.696774 0.07762916
77 4.697026 0.07759781
78 4.697259 0.07756618
79 4.697474 0.07753438
80 4.69767 0.0775025
81 4.697848 0.07747058
82 4.698008 0.07743869
83 4.698149 0.07740685
84 4.698271 0.07737503
85 4.698377 0.0773432
86 4.698465 0.07731135
87 4.698536 0.07727941
88 4.698592 0.0772471
89 4.698632 0.07721448
90 4.698658 0.07718144
91 4.69867 0.07714765
92 4.69867 0.07711316
93 4.698658 0.07707707
94 4.698636 0.07704011
95 4.698603 0.07700127
96 4.698563 0.07696029
97 4.698515 0.07691734
98 4.698462 0.0768714
99 4.698404 0.07682238
100 4.698343 0.07676942
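The smoothed means and standard deviations in Table 12 come from local linear smoothing. As a minimal illustrative sketch (not the dissertation's actual code; the toy trend, bandwidth, and function names below are all assumptions), a local linear estimate at a point t0 can be obtained by weighted least squares with a kernel weight centered at t0, the intercept of the local fit being the estimate:

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # kernel weights
    sw = np.sqrt(w)                                  # sqrt-weights for WLS via lstsq
    X = np.column_stack([np.ones_like(x), x - x0])   # local linear design centered at x0
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                                   # intercept = fitted value at x0

# toy illustration on a known smooth trend (purely hypothetical data)
ages = np.linspace(9, 19, 200)
sbp = 4.6 + 0.01 * (ages - 9) + 0.001 * np.sin(ages)
est = local_linear(ages, sbp, x0=14.0, h=1.0)
print(round(est, 3))
```

Because the local fit is linear, the estimator reproduces linear trends exactly; only the small sinusoidal component contributes smoothing bias here.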
Table 13: Girls with median height and age-specific
log-scaled SBP percentile values.
Age 90th Percentile 95th Percentile 99th Percentile
9.1 4.733495 4.766604 4.825591
9.2 4.735181 4.768234 4.827128
9.3 4.736867 4.769866 4.828667
9.4 4.738555 4.771499 4.830206
9.5 4.740243 4.773133 4.831747
9.6 4.741932 4.774767 4.833288
9.7 4.74362 4.776401 4.834829
9.8 4.745308 4.778034 4.836369
9.9 4.746994 4.779666 4.837909
10 4.748679 4.781297 4.839448
10.1 4.750362 4.782926 4.840985
10.2 4.752043 4.784553 4.84252
10.3 4.753721 4.786177 4.844053
10.4 4.755396 4.787799 4.845584
10.5 4.757067 4.789417 4.847111
10.6 4.758735 4.791032 4.848635
10.7 4.760398 4.792642 4.850156
10.8 4.762057 4.794248 4.851672
10.9 4.76371 4.795849 4.853184
11 4.765358 4.797445 4.854691
11.1 4.767001 4.799036 4.856193
11.2 4.768637 4.80062 4.857689
11.3 4.770266 4.802199 4.85918
11.4 4.771889 4.80377 4.860665
11.5 4.773504 4.805335 4.862143
11.6 4.775112 4.806892 4.863614
11.7 4.776711 4.808441 4.865078
11.8 4.778302 4.809983 4.866535
11.9 4.779884 4.811516 4.867984
12 4.781458 4.81304 4.869424
12.1 4.783021 4.814555 4.870856
12.2 4.784575 4.816061 4.87228
12.3 4.786119 4.817557 4.873694
12.4 4.787652 4.819042 4.875098
12.5 4.789174 4.820517 4.876493
12.6 4.790685 4.821982 4.877878
12.7 4.792184 4.823435 4.879252
12.8 4.793672 4.824877 4.880616
12.9 4.795147 4.826306 4.881968
13 4.796609 4.827724 4.883309
13.1 4.798059 4.829129 4.884638
13.2 4.799495 4.830522 4.885955
13.3 4.800918 4.831901 4.88726
13.4 4.802326 4.833266 4.888552
13.5 4.803721 4.834618 4.889832
13.6 4.8051 4.835956 4.891098
13.7 4.806465 4.837279 4.89235
13.8 4.807815 4.838588 4.893588
13.9 4.809148 4.839881 4.894813
14 4.810466 4.841159 4.896023
14.1 4.811768 4.842422 4.897218
14.2 4.813053 4.843668 4.898397
14.3 4.814321 4.844898 4.899562
14.4 4.815572 4.846111 4.900711
14.5 4.816806 4.847308 4.901844
14.6 4.818021 4.848487 4.90296
14.7 4.819219 4.849648 4.90406
14.8 4.820398 4.850792 4.905143
14.9 4.821558 4.851917 4.906209
15 4.822698 4.853024 4.907257
15.1 4.82382 4.854112 4.908288
15.2 4.824922 4.855181 4.9093
15.3 4.826003 4.85623 4.910294
15.4 4.827064 4.85726 4.91127
15.5 4.828105 4.858269 4.912226
15.6 4.829124 4.859259 4.913164
15.7 4.830122 4.860227 4.914081
15.8 4.831099 4.861174 4.914979
15.9 4.832053 4.862101 4.915857
16 4.832985 4.863005 4.916714
16.1 4.833895 4.863888 4.917551
16.2 4.834782 4.864748 4.918366
16.3 4.835645 4.865586 4.91916
16.4 4.836485 4.866401 4.919933
16.5 4.837301 4.867193 4.920683
16.6 4.838093 4.867962 4.921412
16.7 4.83886 4.868706 4.922118
16.8 4.839602 4.869427 4.922801
16.9 4.84032 4.870123 4.923461
17 4.841011 4.870795 4.924098
17.1 4.841678 4.871441 4.924711
17.2 4.842318 4.872063 4.9253
17.3 4.842931 4.872658 4.925864
17.4 4.843518 4.873228 4.926404
17.5 4.844077 4.873771 4.92692
17.6 4.84461 4.874288 4.92741
17.7 4.845114 4.874777 4.927874
17.8 4.845591 4.87524 4.928313
17.9 4.846039 4.875675 4.928725
18 4.846458 4.876082 4.929111
18.1 4.846848 4.876461 4.929471
18.2 4.847209 4.876811 4.929803
18.3 4.84754 4.877133 4.930108
18.4 4.847841 4.877425 4.930385
18.5 4.848112 4.877688 4.930634
18.6 4.848352 4.87792 4.930855
18.7 4.84856 4.878123 4.931047
18.8 4.848738 4.878295 4.93121
18.9 4.848883 4.878436 4.931344
19 4.848996 4.878546 4.931448
Table 14: Smoothing probabilities by the local linear smooth-
ing estimator and the Nadaraya-Watson kernel smoothing
estimator for the entire cohort.
Age 1-p90 by SNM 1-p95 by SNM 1-np90 by UNM 1-np95 by UNM
9.1 0.05803788 0.022803294 0.07195069 0.03444722
9.2 0.05811078 0.022827506 0.07224472 0.03451883
9.3 0.05817845 0.022849613 0.07253242 0.03458641
9.4 0.05824079 0.022869543 0.07281305 0.03465001
9.5 0.05829699 0.02288699 0.07308645 0.03470968
9.6 0.05834656 0.022901787 0.073352 0.03476539
9.7 0.05838847 0.022913498 0.07360678 0.0348157
9.8 0.05842182 0.022921738 0.0738512 0.034861
9.9 0.05844624 0.022926387 0.07408485 0.03490098
10 0.05846104 0.022927176 0.07430589 0.03493536
10.1 0.05846525 0.022923712 0.07451408 0.03496384
10.2 0.05845849 0.022915888 0.07470823 0.03498642
10.3 0.05843926 0.022903063 0.07488735 0.03500228
10.4 0.05840743 0.02288521 0.07505088 0.03501158
10.5 0.0583618 0.022861864 0.07519683 0.03501363
10.6 0.05830167 0.022832749 0.07532518 0.03500853
10.7 0.05822601 0.022797462 0.07543442 0.03499562
10.8 0.05813429 0.02275582 0.07552378 0.03497503
10.9 0.05802547 0.022707414 0.07559211 0.0349461
11 0.0578991 0.022652075 0.07563932 0.03490875
11.1 0.05775452 0.022589587 0.07566352 0.03486276
11.2 0.05759073 0.022519531 0.07566411 0.03480789
11.3 0.0574073 0.022441781 0.07564059 0.0347441
11.4 0.05720408 0.022356339 0.07559289 0.0346713
11.5 0.05697988 0.02226268 0.07551881 0.0345888
11.6 0.05673479 0.022160899 0.07541893 0.03449675
11.7 0.05646867 0.022050977 0.07529256 0.03439532
11.8 0.05618111 0.021932769 0.07513914 0.03428418
11.9 0.055872 0.021806263 0.07495833 0.03416328
12 0.05554136 0.021671495 0.07474988 0.03403258
12.1 0.0551893 0.02152854 0.07451364 0.03389208
12.2 0.05481605 0.021377518 0.07424957 0.03374178
12.3 0.05442196 0.021218597 0.0739577 0.03358171
12.4 0.05400749 0.021051985 0.07363819 0.03341195
12.5 0.05357321 0.020877935 0.07329128 0.03323256
12.6 0.0531198 0.020696741 0.07291732 0.03304365
12.7 0.05264806 0.02050874 0.07251677 0.03284533
12.8 0.05215888 0.020314301 0.07209016 0.03263773
12.9 0.05165323 0.020113831 0.07163812 0.03242101
13 0.0511322 0.019907766 0.07116138 0.03219532
13.1 0.05059693 0.01969657 0.07066074 0.03196083
13.2 0.05004863 0.019480728 0.07013708 0.03171773
13.3 0.04948859 0.019260743 0.06959136 0.03146619
13.4 0.04891811 0.019037134 0.0690246 0.03120641
13.5 0.04833856 0.018810425 0.06843787 0.03093859
13.6 0.04775131 0.018581147 0.06783231 0.03066294
13.7 0.04715775 0.018349829 0.06720909 0.03037967
13.8 0.04655925 0.018116996 0.06656944 0.03008899
13.9 0.0459572 0.017883164 0.06591459 0.02979113
14 0.04535295 0.017648834 0.06524583 0.0294863
14.1 0.0447478 0.017414491 0.06456446 0.02917475
14.2 0.04414304 0.017180602 0.06387178 0.02885671
14.3 0.04353989 0.016947609 0.06316913 0.02853245
14.4 0.04293952 0.01671593 0.06245782 0.02820223
14.5 0.04234302 0.016485956 0.06173919 0.02786633
14.6 0.04175143 0.01625805 0.06101456 0.02752503
14.7 0.04116572 0.016032546 0.06028523 0.02717865
14.8 0.04058677 0.015809747 0.05955252 0.02682753
14.9 0.04001538 0.015589929 0.05881769 0.026472
15 0.0394523 0.015373336 0.05808202 0.02611243
15.1 0.03889817 0.015160184 0.05734674 0.02574921
15.2 0.03835357 0.014950662 0.05661306 0.02538274
15.3 0.037819 0.014744931 0.05588216 0.02501347
15.4 0.03729489 0.014543128 0.05515519 0.02464182
15.5 0.03678161 0.014345365 0.05443325 0.02426827
15.6 0.03627945 0.014151734 0.05371742 0.0238933
15.7 0.03578865 0.013962305 0.05300872 0.0235174
15.8 0.03530938 0.01377713 0.05230815 0.02314108
15.9 0.03484177 0.013596244 0.05161663 0.02276487
16 0.0343859 0.013419668 0.05093506 0.02238928
16.1 0.0339418 0.01324741 0.05026429 0.02201485
16.2 0.03350946 0.013079465 0.04960508 0.0216421
16.3 0.03308885 0.012915818 0.04895819 0.02127156
16.4 0.0326799 0.012756447 0.04832429 0.02090374
16.5 0.03228249 0.012601319 0.04770401 0.02053916
16.6 0.03189654 0.012450415 0.04709821 0.02017844
16.7 0.03152149 0.012303487 0.04650637 0.01982162
16.8 0.03115794 0.012160876 0.04593015 0.01946977
16.9 0.03080541 0.012022344 0.04536957 0.01912283
17 0.03046354 0.011887786 0.04482458 0.01878114
17.1 0.03013192 0.011756999 0.04429566 0.01844542
17.2 0.02981092 0.011630239 0.0437832 0.01811609
17.3 0.02950006 0.011507334 0.04328741 0.01779333
17.4 0.02919899 0.011388082 0.04280823 0.01747768
17.5 0.02890791 0.011272732 0.04234566 0.01716916
17.6 0.02862638 0.011161014 0.04190018 0.01686785
17.7 0.02835416 0.011052894 0.04147152 0.01657398
17.8 0.02809098 0.01094827 0.04105959 0.01628787
17.9 0.0278365 0.010847015 0.04066411 0.01600972
18 0.0275908 0.010749217 0.0402854 0.0157398
18.1 0.02735343 0.010654692 0.03992308 0.01547763
18.2 0.02712425 0.010563398 0.03957669 0.01522356
18.3 0.02690254 0.010475015 0.03924569 0.01497729
18.4 0.02668897 0.010389954 0.03893077 0.01473981
18.5 0.02648311 0.010307996 0.03863127 0.01451039
18.6 0.02628428 0.01022884 0.03834664 0.01428872
18.7 0.02609273 0.010152629 0.03807706 0.01407526
18.8 0.02590809 0.010079245 0.03782172 0.01386954
18.9 0.02573003 0.010008548 0.03758043 0.01367179
19 0.02555815 0.009940343 0.03735263 0.01348134
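The unstructured (UNM) exceedance probabilities in Table 14 are kernel-smoothed conditional probabilities. As a hedged sketch of the Nadaraya-Watson idea (not the author's implementation; the simulated data, threshold, and bandwidth below are assumptions for illustration only), P(SBP > threshold | age = t0) can be estimated by kernel-weighting the exceedance indicators:

```python
import numpy as np

def nw_exceed_prob(ages, sbp, threshold, t0, h):
    """Nadaraya-Watson estimate of P(SBP > threshold | age = t0)."""
    ind = (sbp > threshold).astype(float)            # exceedance indicators
    w = np.exp(-0.5 * ((ages - t0) / h) ** 2)        # Gaussian kernel weights
    return np.sum(w * ind) / np.sum(w)               # weighted average of indicators

# hypothetical data roughly on the log-SBP scale of Table 12
rng = np.random.default_rng(0)
ages = rng.uniform(9, 19, 5000)
sbp = rng.normal(4.65, 0.08, 5000)
# threshold at the normal 90th percentile, so the true exceedance probability is 0.10
p = nw_exceed_prob(ages, sbp, threshold=4.65 + 1.2816 * 0.08, t0=14.0, h=1.0)
print(round(p, 3))
```

Since the indicators are constant in age here, the estimate should land near the nominal 0.10 tail probability.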
Table 15: Some values of bandwidth for the entire cohort, Caucasian cohort and African American cohort obtained by the AIC cross-validation method. Cross-validation scores are given in parentheses.
Prob.   Entire Cohort         Caucasian Cohort      African American Cohort
1-p90   1.488871 (-6.26955)   20353050 (-5.97297)   2.48544 (-5.43709)
1-p95   2.450879 (-7.58673)   63760225 (-7.18375)   2.877118 (-6.61335)
1-p99   28514439 (-10.5807)   5637720 (-9.94674)    17.86149 (-9.18483)
1-np90  1.148214 (-5.60183)   1.443465 (-5.23541)   1.767473 (-4.62549)
1-np95  2.451945 (-6.26849)   15246194 (-5.63706)   3.393213 (-5.27116)
1-np99  3.497903 (-7.95842)   11684023 (-7.60147)   4.861644 (-6.83946)
Table 16: Some values of bandwidth for the entire cohort, Caucasian cohort and African American cohort obtained by the LS cross-validation method. Cross-validation scores are given in parentheses.
Prob.   Entire Cohort          Caucasian Cohort      African American Cohort
1-p90   1.487693 (0.000683)    13116003 (0.000916)   2.59700 (0.001572)
1-p95   2.452068 (0.000182)    57713608 (0.000272)   2.92834 (0.000484)
1-p99   22484681 (0.0000091)   4244505 (0.000017)    22.9261 (0.000036)
1-np90  1.213203 (0.00134)     1.507063 (0.001940)   1.85182 (0.003550)
1-np95  2.924783 (0.000687)    50740390 (0.001281)   3.53802 (0.001855)
1-np99  2.705329 (0.000124)    1.122294 (0.000181)   3.87748 (0.000383)
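Tables 15 and 16 report bandwidths chosen by cross-validation. As a minimal sketch of least-squares (leave-one-out) cross-validation for a kernel smoother (not the dissertation's code; the simulated data and candidate grid are illustrative assumptions), the selected bandwidth minimizes the leave-one-out prediction error over a grid:

```python
import numpy as np

def nw_fit(x, y, x0, h):
    """Nadaraya-Watson fit at x0 with Gaussian kernel bandwidth h."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

def lscv_score(x, y, h):
    """Leave-one-out least-squares cross-validation score."""
    n = len(x)
    err = []
    for i in range(n):
        mask = np.arange(n) != i                  # drop observation i
        err.append((y[i] - nw_fit(x[mask], y[mask], x[i], h)) ** 2)
    return np.mean(err)

# hypothetical smooth signal plus noise over the study's age range
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(9, 19, 150))
y = np.sin(x / 2) + rng.normal(0, 0.1, 150)
grid = [0.1, 0.3, 0.5, 1.0, 2.0, 4.0]
best = min(grid, key=lambda h: lscv_score(x, y, h))
print(best)
```

A grossly oversmoothing bandwidth flattens the curved signal and inflates the leave-one-out error, so the criterion favors a moderate bandwidth.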
Table 17: MLE estimators and local polynomial smooth-
ing estimators of Lambda with their corresponding mean,
minimum and maximum for each sub-sample.
Age   Raw Lambda  Raw mean  Raw min   Raw max   Smooth Lambda  Smooth mean  Smooth min  Smooth max
9.1   0.400       13.211    11.999    14.467    0.445          15.126       13.640      16.677
9.2   0.800       48.650    39.543    57.094    0.432          14.621       12.885      16.125
9.3   2.000       4932.786  2664.000  6727.500  0.419          13.951       12.005      15.084
9.4   1.700       1450.948  885.027   1985.419  0.405          13.378       11.644      14.638
9.5   0.000       4.604     4.382     4.812     0.391          12.936       11.639      14.241
9.6   2.000       5148.310  2812.000  7564.000  0.377          12.462       10.863      13.636
9.7   0.600       24.799    21.264    28.965    0.363          11.922       10.707      13.285
9.8   0.600       24.888    21.783    28.387    0.349          11.465       10.469      12.539
9.9   0.400       13.296    12.070    15.019    0.334          10.972       10.062      12.236
10.0  1.000       101.089   82.000    119.000   0.320          10.589       9.719       11.326
10.1  0.700       34.914    27.636    41.690    0.305          10.143       8.904       11.187
10.2  -0.100      3.701     3.556     3.849     0.290          9.729        8.884       10.665
10.3  0.900       69.719    52.353    84.588    0.275          9.295        8.238       10.079
10.4  0.100       5.880     5.499     6.219     0.260          8.948        8.165       9.669
10.5  0.800       48.900    40.378    57.094    0.244          8.540        7.848       9.145
10.6  -0.600      1.563     1.547     1.577     0.229          8.234        7.578       8.968
10.7  -0.900      1.094     1.090     1.097     0.213          7.867        7.317       8.514
10.8  0.000       4.634     4.357     4.942     0.198          7.592        6.915       8.384
10.9  0.400       13.455    11.708    15.073    0.182          7.280        6.625       7.857
11.0  0.500       18.242    15.436    20.804    0.167          6.978        6.350       7.506
11.1  -0.100      3.709     3.548     3.859     0.151          6.715        6.215       7.208
11.2  0.100       5.892     5.480     6.270     0.136          6.448        5.963       6.895
11.3  0.300       10.062    9.170     10.923    0.120          6.204        5.810       6.573
11.4  -0.600      1.564     1.551     1.578     0.105          5.977        5.657       6.387
11.5  1.000       104.256   81.000    127.000   0.089          5.772        5.402       6.076
11.6  0.400       13.565    11.927    15.443    0.074          5.554        5.180       5.951
11.7  -1.300      0.767     0.767     0.768     0.059          5.332        5.084       5.644
11.8  0.000       4.661     4.443     4.875     0.045          5.181        4.914       5.446
11.9  0.900       72.725    56.884    91.346    0.030          5.002        4.699       5.295
12.0  -0.100      3.723     3.595     3.854     0.016          4.835        4.616       5.061
12.1  1.200       223.542   161.722   304.577   0.002          4.681        4.414       4.944
12.2  -0.500      1.806     1.786     1.824     -0.012         4.540        4.353       4.726
12.3  -0.100      3.725     3.587     3.881     -0.025         4.402        4.208       4.627
12.4  -0.200      3.029     2.944     3.105     -0.037         4.273        4.094       4.438
12.5  -0.600      1.564     1.553     1.576     -0.050         4.157        4.015       4.307
12.6  -0.200      3.032     2.929     3.123     -0.061         4.055        3.862       4.231
12.7  1.000       105.157   81.000    133.000   -0.072         3.956        3.773       4.123
12.8  -0.500      1.806     1.788     1.825     -0.083         3.869        3.748       4.010
12.9  0.400       13.784    12.488    15.180    -0.093         3.797        3.663       3.930
13.0  0.800       51.196    42.035    60.135    -0.102         3.713        3.564       3.835
13.1  -1.100      0.904     0.903     0.905     -0.111         3.651        3.557       3.758
13.2  -0.800      1.220     1.216     1.226     -0.119         3.587        3.487       3.735
13.3  -1.100      0.904     0.903     0.905     -0.126         3.537        3.430       3.665
13.4  0.200       7.732     7.186     8.256     -0.132         3.487        3.368       3.595
13.5  -1.100      0.904     0.903     0.905     -0.138         3.449        3.349       3.575
13.6  -0.600      1.566     1.555     1.581     -0.142         3.415        3.329       3.549
13.7  -1.300      0.767     0.767     0.768     -0.146         3.390        3.286       3.540
13.8  -1.200      0.830     0.829     0.831     -0.149         3.362        3.266       3.479
13.9  -0.100      3.738     3.602     3.849     -0.151         3.353        3.246       3.441
14.0  -0.300      2.514     2.454     2.570     -0.153         3.343        3.226       3.456
14.1  0.700       36.087    29.266    41.690    -0.153         3.334        3.192       3.431
14.2  1.000       107.108   85.000    126.000   -0.153         3.343        3.231       3.422
14.3  -1.100      0.904     0.903     0.905     -0.152         3.349        3.256       3.467
14.4  -0.100      3.738     3.624     3.899     -0.150         3.366        3.275       3.493
14.5  -0.900      1.095     1.091     1.098     -0.147         3.383        3.282       3.493
14.6  0.000       4.685     4.407     4.920     -0.143         3.413        3.268       3.531
14.7  -1.300      …
767
0.76
70.
768
-0.1
393.
440
3.35
23.
550
conti
nued
...
114
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
14.8
0.70
036
.513
31.9
0441
.922
-0.1
343.
480
3.38
13.
581
14.9
-1.1
000.
904
0.90
30.
905
-0.1
283.
534
3.42
73.
647
15.0
-1.4
000.
713
0.71
30.
713
-0.1
223.
565
3.49
73.
666
15.1
1.10
015
8.22
812
8.97
418
8.12
4-0
.114
3.63
03.
523
3.72
3
15.2
-1.2
000.
830
0.83
00.
831
-0.1
073.
690
3.57
33.
840
15.3
0.60
025
.830
22.6
3128
.965
-0.0
993.
743
3.61
33.
857
15.4
0.20
07.
793
7.35
28.
236
-0.0
903.
831
3.71
63.
943
15.5
-1.0
000.
991
0.98
90.
993
-0.0
813.
899
3.77
34.
046
15.6
0.30
010
.293
9.56
711
.057
-0.0
713.
990
3.85
94.
120
15.7
1.00
010
6.37
285
.000
127.
000
-0.0
614.
065
3.89
94.
198
15.8
-1.3
000.
767
0.76
70.
768
-0.0
514.
169
4.03
94.
324
15.9
0.00
04.
696
4.48
94.
963
-0.0
404.
278
4.10
64.
498
16.0
0.50
019
.018
17.0
7920
.978
-0.0
294.
391
4.22
44.
548
16.1
1.60
011
70.2
3483
6.26
415
25.2
33-0
.018
4.50
64.
318
4.66
2
16.2
-0.5
001.
810
1.79
41.
828
-0.0
074.
626
4.46
94.
818
conti
nued
...
115
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
16.3
0.10
05.
979
5.68
36.
232
0.00
44.
730
4.54
04.
891
16.4
0.30
010
.297
9.48
111
.409
0.01
54.
866
4.64
65.
149
16.5
-1.6
000.
625
0.62
50.
625
0.02
75.
013
4.84
25.
299
16.6
-1.0
000.
991
0.98
80.
994
0.03
85.
145
4.81
25.
599
16.7
0.00
04.
677
4.48
94.
844
0.04
95.
263
5.02
65.
474
16.8
0.60
026
.204
22.9
6428
.965
0.06
15.
430
5.16
05.
643
16.9
-1.3
000.
768
0.76
70.
768
0.07
25.
596
5.35
95.
945
17.0
-0.4
002.
118
2.07
92.
149
0.08
35.
747
5.39
16.
056
17.1
2.00
060
80.9
1633
61.5
0079
37.5
000.
094
5.90
75.
461
6.12
4
17.2
1.20
023
1.35
918
6.09
227
5.39
20.
105
6.05
85.
769
6.30
0
17.3
-0.5
001.
808
1.79
11.
834
0.11
66.
229
5.93
96.
739
17.4
1.70
017
61.4
7512
81.7
1421
29.3
550.
126
6.42
36.
095
6.63
3
17.5
-1.3
000.
767
0.76
70.
768
0.13
66.
568
6.25
27.
053
17.6
2.00
060
94.4
0736
12.0
0087
11.5
000.
146
6.75
86.
259
7.13
0
17.7
0.30
010
.214
9.60
910
.990
0.15
66.
884
6.57
37.
276
conti
nued
...
116
...c
onti
nued
Age
Raw
Lam
bda
Raw
mea
nR
awm
inR
awm
axSm
oot
hL
amb
da
Sm
oot
hm
ean
Sm
oot
hm
inSm
oot
hm
ax
17.8
1.30
035
0.19
825
4.75
546
0.32
90.
166
7.13
16.
617
7.60
6
17.9
0.70
036
.724
31.3
8441
.922
0.17
57.
277
6.79
97.
702
18.0
1.40
052
1.79
736
4.18
272
1.14
30.
184
7.49
56.
906
8.06
5
18.1
-0.4
002.
117
2.08
32.
145
0.19
37.
634
7.11
88.
120
18.2
-2.0
000.
500
0.50
00.
500
0.20
27.
821
7.41
58.
594
18.3
-1.3
000.
767
0.76
70.
768
0.21
07.
972
7.60
78.
436
18.4
0.00
04.
690
4.45
44.
860
0.21
88.
177
7.53
58.
656
18.5
2.00
060
99.2
0439
60.0
0083
20.0
000.
226
8.37
97.
784
8.85
4
18.6
2.00
060
14.2
3843
24.0
0079
37.5
000.
234
8.54
38.
069
8.97
8
18.7
0.00
04.
684
4.48
94.
898
0.24
18.
693
8.10
09.
371
18.8
-2.0
000.
500
0.50
00.
500
0.24
98.
926
8.22
110
.048
18.9
1.00
010
7.63
288
.000
128.
000
0.25
69.
047
8.41
09.
636
19.0
1.90
039
53.4
6024
38.0
8455
46.5
210.
262
9.24
08.
414
9.88
3
117
Table 18: ML estimators and their local polynomial
smoothing estimators with corresponding p-values from the
Shapiro-Wilk (SW) test for each sub-sample.
Age ML.Lambda SW p-value Smooth.Lambda SW p-value
9.1 0.4 0.798 0.445119729 0.893
9.2 0.8 0.367 0.43195112 0.254
9.3 2 0.791 0.418588776 0.441
9.4 1.7 0.843 0.405032313 0.576
9.5 0 0.385 0.391255708 0.207
9.6 2 0.476 0.377294702 0.072
9.7 0.6 0.694 0.36317325 0.375
9.8 0.6 0.568 0.348865167 0.331
9.9 0.4 0.127 0.334367136 0.108
10 1 0.407 0.319726542 0.369
10.1 0.7 0.732 0.304901839 0.449
10.2 -0.1 0.562 0.28996274 0.389
10.3 0.9 0.877 0.27487598 0.395
10.4 0.1 0.102 0.259678107 0.009
10.5 0.8 0.42 0.244369525 0.686
10.6 -0.6 0.922 0.228976998 0.699
10.7 -0.9 0.112 0.213495789 0.091
10.8 0 0.548 0.197943409 0.226
10.9 0.4 0.353 0.182357136 0.115
11 0.5 0.501 0.166763948 0.222
11.1 -0.1 0.357 0.151194377 0.206
11.2 0.1 0.583 0.135659244 0.35
11.3 0.3 0.338 0.120168606 0.325
11.4 -0.6 0.515 0.104747617 0.241
11.5 1 0.882 0.089463339 0.526
11.6 0.4 0.537 0.074334151 0.106
11.7 -1.3 0.469 0.059372383 0.075
11.8 0 0.298 0.04462395 0.097
11.9 0.9 0.637 0.030124114 0.105
12 -0.1 0.571 0.015909271 0.471
12.1 1.2 0.251 0.002016778 0.024
12.2 -0.5 0.509 -0.011515247 0.25
12.3 -0.1 0.762 -0.024648152 0.488
12.4 -0.2 0.389 -0.037343003 0.144
12.5 -0.6 0.068 -0.049560851 0.021
12.6 -0.2 0.076 -0.061263029 0.01
12.7 1 0.134 -0.07241146 0.032
12.8 -0.5 0.659 -0.082968974 0.475
12.9 0.4 0.662 -0.092899646 0.444
13 0.8 0.382 -0.102169121 0.124
13.1 -1.1 0.137 -0.110744956 0.059
13.2 -0.8 0.286 -0.118596941 0.061
13.3 -1.1 0.836 -0.125697421 0.219
13.4 0.2 0.472 -0.132021594 0.26
13.5 -1.1 0.367 -0.137547793 0.038
13.6 -0.6 0.26 -0.142257739 0.264
13.7 -1.3 0.634 -0.146136763 0.119
13.8 -1.2 0.952 -0.149174001 0.439
13.9 -0.1 0.216 -0.151362541 0.15
14 -0.3 0.679 -0.152699535 0.553
14.1 0.7 0.212 -0.153186271 0.226
14.2 1 0.345 -0.152828196 0.505
14.3 -1.1 0.783 -0.151634893 0.505
14.4 -0.1 0.137 -0.14962002 0.118
14.5 -0.9 0.708 -0.146801197 0.286
14.6 0 0.028 -0.143199859 0.012
14.7 -1.3 0.6 -0.138841059 0.155
14.8 0.7 0.303 -0.133753241 0.101
14.9 -1.1 0.58 -0.127967976 0.088
15 -1.4 0.058 -0.12151967 0.106
15.1 1.1 0.422 -0.114445245 0.121
15.2 -1.2 0.582 -0.106783802 0.346
15.3 0.6 0.799 -0.098576264 0.536
15.4 0.2 0.068 -0.089865014 0.033
15.5 -1 0.552 -0.080693526 0.302
15.6 0.3 0.735 -0.071105997 0.767
15.7 1 0.412 -0.061146984 0.607
15.8 -1.3 0.309 -0.050861052 0.246
15.9 0 0.088 -0.040292435 0.036
16 0.5 0.695 -0.029484719 0.564
16.1 1.6 0.934 -0.018480543 0.377
16.2 -0.5 0.647 -0.00732133 0.541
16.3 0.1 0.847 0.003952961 0.771
16.4 0.3 0.116 0.015304047 0.083
16.5 -1.6 0.855 0.026695509 0.369
16.6 -1 0.004 0.03809056 0.021
16.7 0 0.001 0.049457222 0
16.8 0.6 0.636 0.060760408 0.553
16.9 -1.3 0.959 0.071983753 0.471
17 -0.4 0.275 0.083110792 0.111
17.1 2 0.554 0.094090441 0.314
17.2 1.2 0.909 0.104928501 0.783
17.3 -0.5 0.197 0.115594631 0.324
17.4 1.7 0.358 0.126074211 0.577
17.5 -1.3 0.632 0.136347126 0.416
17.6 2 0.95 0.146409111 0.143
17.7 0.3 0.614 0.156253827 0.633
17.8 1.3 0.244 0.165858147 0.109
17.9 0.7 0.936 0.175231839 0.823
18 1.4 0.032 0.184356962 0.021
18.1 -0.4 0.809 0.193249469 0.441
18.2 -2 0.872 0.201896656 0.116
18.3 -1.3 0.617 0.210287399 0.238
18.4 0 0.341 0.218428597 0.307
18.5 2 0.467 0.226325664 0.249
18.6 2 0.583 0.233988684 0.201
18.7 0 0.334 0.241412567 0.108
18.8 -2 0.284 0.248602215 0.003
18.9 1 0.891 0.255575468 0.563
19 1.9 0.409 0.262308056 0.125
Figure 11: QQ-plot of SBP after log transformation, from the 1st data set to the 12th data set.
Figure 12: QQ-plot of SBP after log transformation, from the 13th data set to the 24th data set.
Figure 13: QQ-plot of SBP after log transformation, from the 25th data set to the 36th data set.
Figure 14: QQ-plot of SBP after log transformation, from the 37th data set to the 48th data set.
Figure 15: QQ-plot of SBP after log transformation, from the 49th data set to the 60th data set.
Figure 16: QQ-plot of SBP after log transformation, from the 61st data set to the 72nd data set.
Figure 17: QQ-plot of SBP after log transformation, from the 73rd data set to the 84th data set.
Figure 18: QQ-plot of SBP after log transformation, from the 85th data set to the 96th data set.
Figure 19: QQ-plot of SBP after log transformation, from the 97th data set to the 100th data set.
Figure 20: Local polynomial smoothing estimator of the Box-Cox Lambda.
[Figures 11-19 show normal QQ-plots (theoretical vs. sample quantiles) for the indicated data sets; Figure 20 plots lambda against age. The plot panels themselves are not reproducible from the extracted text.]
8 Appendix 2: Proof of Theoretical Results
8.1 A.1 Useful Approximation for the Equivalent Kernels
The following approximations for the equivalent kernel function $W_{q,p+1}(t_j,t;h)$ are
used in computing the asymptotic bias and variance of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$:
$$W_{q,p+1}(t_j,t;h) = \frac{q!}{Jh^{q+1}g(t)}\,K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)\big[1+o_p(1)\big], \qquad j=1,\ldots,J; \qquad (A.1)$$
$$\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,(t_j-t)^k = q!\,1_{[k=q]}, \qquad k=0,1,\ldots,p; \qquad (A.2)$$
$$\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,(t_j-t)^{p+1} = q!\,h^{p-q+1}\,B_{p+1}\big(K_{q,p+1}\big)\big[1+o_p(1)\big]; \qquad (A.3)$$
$$\sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h) = \frac{(q!)^2}{Jh^{2q+1}g(t)}\,V\big(K_{q,p+1}\big)\big[1+o_p(1)\big], \qquad (A.4)$$
where $K_{q,p+1}(t)$, $B_{p+1}(K)$ and $V(K)$ are defined in Chapter 4. Proofs of equations
(A.1)-(A.4) are given in Fan and Zhang (2000, Appendix A, Lemmas 1 and 2).
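As a point of orientation not stated in the original text: in the local linear case ($p=1$, $q=0$) with a symmetric kernel $K$, the equivalent kernel reduces to the kernel itself, $K_{0,2}=K$ (a standard fact for local polynomial fitting), so (A.1) collapses to the familiar kernel weight:

```latex
% Local linear special case (p = 1, q = 0, symmetric K): K_{0,2} = K, and (A.1) becomes
W_{0,2}(t_j,t;h) \;=\; \frac{1}{J\,h\,g(t)}\,K\!\Big(\frac{t_j-t}{h}\Big)\,\big[1+o_p(1)\big],
\qquad j=1,\ldots,J.
```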
8.2 A.2 Proof of Theorem 1
First, note that $\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ is the raw estimator of $F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ at the design time point $t_j$. Using equation (6), the bias of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ is
$$E\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x] = W_1 + W_2, \qquad (A.5)$$
where
$$W_1 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,\big\{E\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\},$$
$$W_2 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\,F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]. \qquad (A.6)$$
It then follows from (15) and (A.1) that
$$W_1 = \sum_{j=1}^{J} \frac{q!}{Jh^{q+1}g(t)}\,K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)\big[1+o_p(1)\big]\,o_p\big(n_j^{-1/2}\big) = o_p\big(n^{-1/2}\big), \qquad (A.7)$$
where the second equality holds because, by Assumption A2, $\lim_{n\to\infty}(n_j/n)$ is bounded
between 0 and 1, and $\sum_{j=1}^{J}\big|q!\,[Jh^{q+1}g(t)]^{-1}K_{q,p+1}[(t_j-t)/h]\big|$ is bounded. By the
Taylor expansion of $F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]$ around $t$ and equations (A.2) and (A.3), we have
$$W_2 = \sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\Big\{\sum_{k=0}^{p+1}F^{(k)}_{t,\theta(t|x)}[y(t)\,|\,x]\,\frac{(t_j-t)^k}{k!} + o_p\big[(t_j-t)^{p+1}\big]\Big\} - F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$$
$$= \frac{q!\,h^{p-q+1}}{(p+1)!}\,F^{(p+1)}_{t,\theta(t|x)}[y(t)\,|\,x]\,B_{p+1}\big(K_{q,p+1}\big)\big[1+o_p(1)\big]. \qquad (A.8)$$
By Assumption A1 and the asymptotic expressions (A.7) and (A.8), $W_2$ is the dominating
term over $W_1$. The asymptotic expression for the bias of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ then
follows from (A.6)-(A.8).
Let $\mu_F(t_j) = E\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\}$. Then, by equation (6),
$$\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = E\Big\{\sum_{j=1}^{J} W_{q,p+1}(t_j,t;h)\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - \mu_F(t_j)\big]\Big\}^2 = W_3 + W_4, \qquad (A.9)$$
where, by (15), (A.4) and Assumption A2 with $c_j = c$,
$$W_3 = \sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h)\,E\big\{\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x] - \mu_F(t_j)\big]^2\big\} = \sum_{j=1}^{J} W^2_{q,p+1}(t_j,t;h)\,\mathrm{Var}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]\big\}$$
$$= \frac{(q!)^2}{c\,n\,J\,h^{2q+1}g(t)}\,V\big(K_{q,p+1}\big)\,F'_{t,\theta(t|x)}[y(t)\,|\,x]^{T}\,I^{-1}\big[\theta(t|x)\big]\,F'_{t,\theta(t|x)}[y(t)\,|\,x]\,\big[1+o_p(1)\big] \qquad (A.10)$$
and, by (14), Assumptions A2 and A4, the equation (A.1) and $\lim_{n\to\infty} n_{jk}/n = c_{jk}$,
there is a constant $C_1 > 0$ such that, when $n$ is sufficiently large,
$$W_4 = \sum_{j\neq k}\Big\{W_{q,p+1}(t_j,t;h)\,W_{q,p+1}(t_k,t;h)\,E\big\{\big[\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x]-\mu_F(t_j)\big]\big[\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]-\mu_F(t_k)\big]\big\}\Big\}$$
$$\leq \sum_{j\neq k}\Big\{\big|W_{q,p+1}(t_j,t;h)\,W_{q,p+1}(t_k,t;h)\big|\,\big|\mathrm{Cov}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x],\,\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]\big\}\big|\Big\} \qquad (A.11)$$
$$\leq \frac{C_1}{J^2h^{2q+2}g^2(t)}\sum_{j\neq k}\Big|K_{q,p+1}\Big(\frac{t_j-t}{h}\Big)K_{q,p+1}\Big(\frac{t_k-t}{h}\Big)\,\frac{\rho_F(t_j,t_k|x)}{r(n_j,n_k,n_{jk})}\Big|.$$
The bounded support of $K(\cdot)$ implies that, for any $t$, $K_{q,p+1}[(t_j-t)/h]\,K_{q,p+1}[(t_k-t)/h] = 0$ for any $j\neq k$ such that $|t_j-t_k| > ah$, for some constant $a>0$.
We now consider the following three situations:
(i) If $|t_j-t_k| \le \delta$, by Assumption A2, $\mathrm{Cov}\big\{\hat F_{t_j,\theta(t_j|x)}[y(t_j)\,|\,x],\,\hat F_{t_k,\theta(t_k|x)}[y(t_k)\,|\,x]\big\} = 0$, so that $W_4 = 0$, and, by (A.9), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3$.
(ii) If $|t_j-t_k| > \delta \ge ah$, then $K_{q,p+1}[(t_j-t)/h]\,K_{q,p+1}[(t_k-t)/h] = 0$, so that, by (A.11), $W_4 = 0$, and it still follows from (A.9) that $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3$.
(iii) If $\delta < ah$ and $\delta \le |t_j-t_k| \le ah$, since $K_{q,p+1}(s)$ and $\rho_F(t_j,t_k)$ are bounded,
there is $C_2 > 0$, so that, by (A.11) and $\sum_{j=1}^{J}\sum_{k:\,\delta\le|t_k-t_j|\le ah} r^{-1}(n_j,n_k,n_{jk}) = o(Jh)$,
$$W_4 \le \frac{C_2}{J^2h^{2q+2}g^2(t)}\Big\{\sum_{j=1}^{J}\sum_{k:\,\delta\le|t_k-t_j|\le ah} r^{-1}(n_j,n_k,n_{jk})\Big\}\big[1+o_p(1)\big] = o_p\big[(nJh^{2q+1})^{-1}\big]. \qquad (A.12)$$
Then, by (A.9) and (A.12), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3\big[1+o_p(1)\big]$. Since, for all three
situations (i), (ii) and (iii), $\mathrm{Var}\big\{\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]\big\} = W_3\big[1+o_p(1)\big]$, the asymptotic
expression for the variance of $\hat F^{(q)}_{t,\theta(t|x)}[y(t)\,|\,x]$ follows from (A.9)-(A.11).
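As a side remark, not part of the original proof: the bias order $h^{p-q+1}$ from (A.8) and the variance order $(nJh^{2q+1})^{-1}$ from (A.10) imply the usual bandwidth trade-off. Balancing the squared bias against the variance gives

```latex
% Squared bias O(h^{2(p-q+1)}) balanced against variance O((nJh^{2q+1})^{-1}):
h^{2(p-q+1)} \;\asymp\; \big(nJh^{2q+1}\big)^{-1}
\;\Longrightarrow\;
h_{\mathrm{opt}} \;\asymp\; (nJ)^{-1/(2p+3)},
% which yields the optimal mean squared error rate (nJ)^{-2(p-q+1)/(2p+3)}.
```

so that, for example, estimating the distribution function itself ($q=0$) with a local linear fit ($p=1$) gives the familiar rate $h_{\mathrm{opt}} \asymp (nJ)^{-1/5}$.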
9 Appendix 3: R Code
# Bootstrap code for confidence bands
scatboot <- function(x, y, nreps=100, confidence=0.9, degree=2, span=2/3,
                     family="gaussian"){
  # Put input data into a data frame, sorted by x, with no missing values.
  dat <- na.omit(data.frame(x=x, y=y))
  if(nrow(dat) == 0) {
    print("Error: No data left after dropping NAs")
    print(dat)
    return(NULL)
  }
  ndx <- order(dat$x)
  dat$x <- dat$x[ndx]
  dat$y <- dat$y[ndx]
  r <- range(dat$x, na.rm=TRUE)
  x.out <- seq(r[1], r[2], length.out=40)
  # Fit a loess curve to the original data
  f <- loess(y ~ x, data=dat, degree=degree, span=span, family=family)
  y.fit <- approx(f$x, fitted(f), x.out, rule=2)$y
  len <- length(dat$x)
  # Generate bootstrap replicates of the fitted curve
  mat <- matrix(0, nreps, length(x.out))
  for(i in seq(nreps)){
    ndx <- sample(len, replace=TRUE)
    x.repl <- x[ndx]
    y.repl <- y[ndx]
    f <- loess(y.repl ~ x.repl, degree=degree, span=span, family=family)
    mat[i,] <- predict(f, newdata=x.out)
  }
  n.na <- apply(is.na(mat), 2, sum)
  nx <- ncol(mat)
  up.lim <- rep(NA, nx)
  low.lim <- rep(NA, nx)
  stddev <- rep(NA, nx)
  for(i in 1:nx) {
    if(n.na[i] > nreps*(1.0-confidence)) {
      # Too few good values to get estimate
      next
    }
    conf <- confidence*nreps/(nreps-n.na[i])
    pr <- 0.5*(1.0 - conf)
    up.lim[i] <- quantile(mat[,i], 1.0-pr, na.rm=TRUE)
    low.lim[i] <- quantile(mat[,i], pr, na.rm=TRUE)
    stddev[i] <- sd(mat[,i], na.rm=TRUE)
  }
  ndx <- !is.na(up.lim)  # indices of good values
  fit <- data.frame(x=x.out[ndx], y.fit=y.fit[ndx], up.lim=up.lim[ndx],
                    low.lim=low.lim[ndx], stddev=stddev[ndx])
  return(list(nreps=nreps, confidence=confidence, degree=degree,
              span=span, family=family, data=dat, fit=fit))
}
scatboot.plot <- function(sb, ...) {
  require(lattice)
  require(grid)
  p <- xyplot(y ~ x, data=sb$data,
              panel=function(x, y, ...) {
                panel.xyplot(x, y, ...)
                panel.xyplot(sb$fit$x, sb$fit$y.fit, type="l", ...)
                panel.xyplot(sb$fit$x, sb$fit$up.lim, type="l", col="gray", ...)
                panel.xyplot(sb$fit$x, sb$fit$low.lim, type="l", col="gray", ...)
                pg.x <- c(sb$fit$x, rev(sb$fit$x))
                pg.y <- c(sb$fit$up.lim, rev(sb$fit$low.lim))
                grid.polygon(pg.x, pg.y,
                             gp=gpar(fill="pink", col="transparent", alpha=0.5),
                             default.units="native")
              }, ...)
  return(p)
}
# Alan R. Rogers, 26 Feb 2011
scatboot.test <- function(nreps=100, confidence=0.9, span=2/3,
                          degree=2, family="gaussian") {
  x <- seq(0, 4, length.out=25)
  y <- sin(2*x) + 0.5*x + rnorm(25, sd=0.5)
  sb <- scatboot(x, y, nreps=nreps, confidence=confidence, span=span,
                 degree=degree, family=family)
  scatboot.plot(sb)
}
# My bootstrap code
# Fits a kernel smoother and calculates symmetric
# nonparametric bootstrap confidence intervals.
# Arguments:
# x, y : data values
# nreps : number of bootstrap replicates
# mohammed, May 5, 2012
ksboot <- function(x, y, nreps=1000, band=5, confidence=0.95){
  # Put input data into a data frame, sorted by x, with no missing values.
  dat <- na.omit(data.frame(x=x, y=y))
  if(nrow(dat) == 0) {
    print("Error: No data left after dropping NAs")
    print(dat)
    return(NULL)
  }
  ndx <- order(dat$x)
  dat$x <- dat$x[ndx]
  dat$y <- dat$y[ndx]
  # Fit curve to data
  require(KernSmooth)
  len <- length(dat$x)
  f0 <- ksmooth(x, y, kernel="normal", bandwidth=band, n.points=len)
  y.fit <- f0$y
  # Generate bootstrap replicates
  mat <- matrix(0, NROW(dat), nreps)
  for(i in seq(nreps)){
    ndx <- sample(len, replace=TRUE)
    x.repl <- x[ndx]
    y.repl <- y[ndx]
    f <- ksmooth(x.repl, y.repl, kernel="normal", bandwidth=band, n.points=len)
    mat[, i] <- f$y
  }
  # Calculate pointwise confidence intervals from the bootstrap quantiles
  ci <- t(apply(mat, 1, quantile, probs=c((1-confidence)/2, (1+confidence)/2)))
  res <- cbind(as.data.frame(f0), ci)
  colnames(res) <- c('x', 'y', 'lwr.limit', 'upr.limit')
  res
}
# ———–
# example
# ———–
m <- with(cars, ksboot(speed, dist, nreps = 5000))
with(cars, plot(speed, dist, las = 1))
with(m, matpoints(x, m[, -1], type = 'l', col = c(1, 2, 2), lty = c(1, 2, 2)))
# MY R code Starts from here
u<-.10
v<-.05
w<-.01
setwd("C:\\Users\\mchowdhury\\Desktop\\phd")
library(MASS)
library(TeachingDemos)
library(car)
library(nnet)
library(fitdistrplus)
library(survival)
library(splines)
library(KernSmooth)
library(copula)
library(mvtnorm)
library(scatterplot3d)
library(pspline)
library(grid)
require(graphics)
require(KernSmooth)
require(gridExtra)
require(copula)
require(np)
library(nortest)
# With Missing Data
new<-read.csv("new.csv",header=TRUE)
x<- new[c(1,2,4,6,7,8,9,10,12)]
x<- new[c(1,2,4,6,7,8,9,10)]
#x1<-x[which(x$RACE==1),]
#x2<-x[which(x$RACE==2),]
dim(x)
head(x)
summary(x$AGE)
# Without Missing Data
newx<-na.omit(x)
dim(newx)
head(newx)
summary(newx$AGE)
xx <- subset(newx, AGE >= 9.09 & AGE < 19.01)
xx$AGE <- round(xx$AGE, 1)
uage<-unique(xx$AGE)
# Normality Test of whole Data Set before log transformation
aa<-round(shapiro.test(xx$SYSAV)$p.value,3)
bb<-round(ad.test(xx$SYSAV)$p.value,3)
cc<-round(ks.test(xx$SYSAV, "pnorm", mean(xx$SYSAV), sd(xx$SYSAV))$p.value,3)
dd<-round(cvm.test(xx$SYSAV)$p.value,3)
ee<-round(pearson.test(xx$SYSAV)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
# Normality Test of whole Data Set after log transformation
a<-log(xx$SYSAV)
aa<-round(shapiro.test(a)$p.value,3)
bb<-round(ad.test(a)$p.value,3)
cc<-round(ks.test(a, "pnorm", mean(a), sd(a))$p.value,3)
dd<-round(cvm.test(a)$p.value,3)
ee<-round(pearson.test(a)$p.value,3)
# frequency checking
table(xx$AGE)
# Splitting the data set
sdata<-with(xx,split(xx,xx$AGE))
a<-length(sdata)
names(sdata)<-paste('int',1:a,sep=" ")
nn<-names(sdata)
# Dimension Checking of the Data Set before median height
dimen<-matrix(NA,100,2)
for(i in 1:100){
d<-sdata[[i]]
one<-dim(d)
dimen[i,]<-one }
dimen
# Log Transformation of SBP data
fda <- vector("list", 100)
for(i in 1:100){
d <- sdata[[i]]  # i-th data set
d$logsbp <- log(d$SYSAV)
fda[[i]] <- d}
# Normality Test after Log Transformation of the data
pvaluelogsbp<-matrix(NA,100,5)
for(i in 1:100){
d<-fda[[i]]
aa<-round(shapiro.test(d$logsbp)$p.value,3)
bb<-round(ad.test(d$logsbp)$p.value,3)
cc<-round(ks.test(d$logsbp, "pnorm", mean(d$logsbp), sd(d$logsbp))$p.value,3)
dd<-round(cvm.test(d$logsbp)$p.value,3)
ee<-round(pearson.test(d$logsbp)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
pvaluelogsbp[i,]<-ff}
colnames(pvaluelogsbp)<-c('SW','AD','KS','CVM','P.ChiSq')
pvaluelogsbp
# lognormality Test of DBP4 Data Set
gofsum <- function(x){
result <- gofstat(x)
c(cramer = result$cvm, anderson = result$ad, kolmogorov = result$ks)}
newp<-matrix(NA,100,3)
for(i in 1:100){
d<-fda[[i]]
d1<-fitdist(d$DIA4AV,"lnorm")
d2<-gofsum(d1)
newp[i,]<- d2}
colnames(newp) <- c('cramer','anderson','kolmogorov')
newp
# Data Set with median height
fff<-vector("list", 100)
for(i in 1:100){
d<-fff[[i]]
sd1<-d[d$HTAV>median(d$HTAV),]
fff[[i]]<-sd1}
# Dimension Checking of the Data Set after median height
dimen1<-matrix(NA,100,2)
for(i in 1:100){
d<-fff[[i]]
one<-dim(d)
dimen1[i,]<-one}
dimen1
# Normality Test after Log Transformation after median height
pvaluelogsbp<-matrix(NA,100,5)
for(i in 1:100){
d<-fff[[i]]
aa<-round(shapiro.test(d$logsbp)$p.value,3)
bb<-round(ad.test(d$logsbp)$p.value,3)
cc<-round(ks.test(d$logsbp, "pnorm", mean(d$logsbp), sd(d$logsbp))$p.value,3)
dd<-round(cvm.test(d$logsbp)$p.value,3)
ee<-round(pearson.test(d$logsbp)$p.value,3)
ff<-c(aa,bb,cc,dd,ee)
pvaluelogsbp[i,]<-ff}
colnames(pvaluelogsbp)<-c('SW','AD','KS','CVM','P.ChiSq')
pvaluelogsbp
# lognormality Test of DBP4 after median height
gofsum <- function(x){
result <- gofstat(x)
c(cramer = result$cvm, anderson = result$ad, kolmogorov = result$ks)
}
newp<-matrix(NA,100,3)
for(i in 1:100){
d<-fff[[i]]
d1<-fitdist(d$DIA4AV,"lnorm")
d2<-gofsum(d1)
newp[i,]<- d2
}
colnames(newp) <- c('cramer','anderson','kolmogorov')
newp
# Univariate Case
# qqplot of SBP without qqline
#pdf("qq.pdf")
par(mfrow=c(3,3))
for(i in 1:100){
d <- fda[[i]]
qqnorm(d$logsbp, main="qqplot after bct of sbp")}
#dev.off()
#getwd()
# qqplot of SBP with qqline
#pdf("qqline.pdf")
par(mfrow=c(3,3))
for(i in 1:100){
d <- fda[[i]]
qqnorm(d$logsbp, main="", xlab = "", ylab = "", las = 1)
mtext('SBP', 3, line = .3, cex = .8)
qqline(d$logsbp, col = 2)
mtext(paste('Data', i), 3, line = 2, font = 2)
Sys.sleep(.01)}
#dev.off()
#getwd()
# percentile values of SBP
t<-seq(9.1,19,.1)
y<-102.01027+1.94397*(t-10)+.00598*((t-10)^2)-.00789*((t-10)^3)-.00059*((t-10)^4)
s1<-log(y+1.28*10.4855)
s2<-log(y+1.645*10.4855)
s3<-log(y+2.326*10.4855)
# computation of cross validation score
mat<-matrix(0,150,100)
for(j in 1:100){
dataN<-fff[[j]]$logsbp
for(i in 1:length(dataN)){
one<-ifelse(dataN[[i]]<s1[[j]],1,0)
mcv<-mean(dataN[-i])
scv<-(length(dataN[-i])-1)*sd(dataN[-i])/length(dataN[-i])
p90cv<-pnorm(s1[j],mcv,scv)
diffsq<-((one-p90cv)^2)/150
mat[i,j]<-diffsq
}
}
newmat<-c(mat)
sum(newmat)
setwd("C:\\Users\\mchowdhury\\Desktop")
ndata1<-read.csv("entirecv.csv",header=TRUE)
ndata2<-read.csv("cccv.csv",header=TRUE)
ndata3<-read.csv("aacv.csv",header=TRUE)
library(np)
require(np)
#by CV.AIC
#entire cohort
on1<-npregbw(ndata1$age,ndata1$s10,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata1$age,ndata1$u10,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata1$age,ndata1$s5,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata1$age,ndata1$u5,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata1$age,ndata1$s1,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata1$age,ndata1$u1,regtype="ll",bwmethod="cv.aic")
summary(on6)
#caucasian
on1<-npregbw(ndata2$age,ndata2$p90,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata2$age,ndata2$np90,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata2$age,ndata2$p95,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata2$age,ndata2$np95,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata2$age,ndata2$p99,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata2$age,ndata2$np99,regtype="ll",bwmethod="cv.aic")
summary(on6)
#african american
on1<-npregbw(ndata3$age,ndata3$p90,regtype="ll",bwmethod="cv.aic")
summary(on1)
on2<-npregbw(ndata3$age,ndata3$np90,regtype="ll",bwmethod="cv.aic")
summary(on2)
on3<-npregbw(ndata3$age,ndata3$p95,regtype="ll",bwmethod="cv.aic")
summary(on3)
on4<-npregbw(ndata3$age,ndata3$np95,regtype="ll",bwmethod="cv.aic")
summary(on4)
on5<-npregbw(ndata3$age,ndata3$p99,regtype="ll",bwmethod="cv.aic")
summary(on5)
on6<-npregbw(ndata3$age,ndata3$np99,regtype="ll",bwmethod="cv.aic")
summary(on6)
#by CV.ls
#entire cohort
on1<-npregbw(ndata1$age,ndata1$s10,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata1$age,ndata1$u10,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata1$age,ndata1$s5,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata1$age,ndata1$u5,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata1$age,ndata1$s1,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata1$age,ndata1$u1,regtype="ll",bwmethod="cv.ls")
summary(on6)
#caucasian
on1<-npregbw(ndata2$age,ndata2$p90,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata2$age,ndata2$np90,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata2$age,ndata2$p95,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata2$age,ndata2$np95,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata2$age,ndata2$p99,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata2$age,ndata2$np99,regtype="ll",bwmethod="cv.ls")
summary(on6)
#african american
on1<-npregbw(ndata3$age,ndata3$p90,regtype="ll",bwmethod="cv.ls")
summary(on1)
on2<-npregbw(ndata3$age,ndata3$np90,regtype="ll",bwmethod="cv.ls")
summary(on2)
on3<-npregbw(ndata3$age,ndata3$p95,regtype="ll",bwmethod="cv.ls")
summary(on3)
on4<-npregbw(ndata3$age,ndata3$np95,regtype="ll",bwmethod="cv.ls")
summary(on4)
on5<-npregbw(ndata3$age,ndata3$p99,regtype="ll",bwmethod="cv.ls")
summary(on5)
on6<-npregbw(ndata3$age,ndata3$np99,regtype="ll",bwmethod="cv.ls")
summary(on6)
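The bandwidth searches above can also be written with the formula interface of npregbw from the np package. A minimal self-contained sketch on simulated data (the variables x and y here are illustrative stand-ins for age and a raw probability column such as ndata1$s10, not the NGHS data):

```r
# Illustrative only: compare AIC- and least-squares-cross-validated
# bandwidths for a local linear fit on toy data.
library(np)
set.seed(1)
x <- runif(200, 9, 19)                          # ages
y <- 0.10 + 0.02 * sin(x) + rnorm(200, 0, 0.01) # toy probability curve
bw.aic <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.aic")
bw.ls  <- npregbw(y ~ x, regtype = "ll", bwmethod = "cv.ls")
c(aic = bw.aic$bw, ls = bw.ls$bw)               # the two selected bandwidths
```

The two criteria generally select different bandwidths; summary() on either object reports the objective value alongside the bandwidth.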
# computation of raw probabilities for all girls
all<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
all[i,]<-prob}
colnames(all)<-c('p1','np1','p2','np2','p3','np3')
all
b1<-data.frame(all)
aveall<-c(mean(b1$p1),mean(b1$np1),mean(b1$p2),
mean(b1$np2),mean(b1$p3),mean(b1$np3))
all1<-sum(ifelse(all[,1]>all[,2],1,0))
all2<-sum(ifelse(all[,3]>all[,4],1,0))
all3<-sum(ifelse(all[,5]>all[,6],1,0))
allsum<-c(all1,all2,all3)
# difference between structural nonparametric and nonparametric
a<-data.frame(all)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2),b3 = (a$p3-a$np3))
new1<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new1
biasall<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),
bp2=mean(a$p2-v), bnp2=mean(a$np2-v),bp3=mean(a$p3-w),
bnp3=mean(a$np3-w))
biasall
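As a quick sanity check on the two estimators compared above (the normal-model tail probability versus the empirical exceedance proportion), a toy example with known truth; all names here are illustrative and not part of the NGHS analysis:

```r
# Illustrative only: for N(0,1) data and the true 90th percentile
# qnorm(.90), both the model-based estimate 1 - pnorm(q, m, s) and the
# empirical proportion mean(y >= q) should be close to 0.10.
set.seed(2)
y <- rnorm(10000)
q <- qnorm(0.90)
m <- mean(y)
s <- sd(y) * sqrt((length(y) - 1) / length(y))  # MLE (population) SD
p.model     <- 1 - pnorm(q, m, s)
p.empirical <- mean(y >= q)
c(model = p.model, empirical = p.empirical)     # both near 0.10
```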
# Bootstrap Confidence Band for all girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,all)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),
ylab=one2,xlab="Ages of all girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of all girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of all girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of all girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of all girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of all girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# computation of raw probabilities for Caucasian girls
cau<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]][which(fff[[i]]$RACE==1),]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
cau[i,]<-prob}
colnames(cau)<-c('p1','np1','p2','np2','p3','np3')
cau
cau1<-sum(ifelse(cau[,1]>cau[,2],1,0))
cau2<-sum(ifelse(cau[,3]>cau[,4],1,0))
cau3<-sum(ifelse(cau[,5]>cau[,6],1,0))
causum<-c(cau1,cau2,cau3)
causum
# difference between structural nonparametric and unstructured nonparametric
# for Caucasian girls
a<-data.frame(cau)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2), b3 = (a$p3-a$np3))
new2<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new2
biasca<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),bp2=mean(a$p2-v),
bnp2=mean(a$np2-v),bp3=mean(a$p3-w),bnp3=mean(a$np3-w))
# Bootstrap Confidence Band for Caucasian girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,cau)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),ylab=one2,
xlab="Ages of CC girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of CC girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of CC girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of CC girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of CC girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of CC girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# computation of raw probabilities for African American girls
aa<-matrix(NA,100,6)
for(i in 1:100){
d<-fff[[i]][which(fff[[i]]$RACE==2),]
m<-mean(d$logsbp)
n<-nrow(d)
s<-sd(d$logsbp)*sqrt((n-1)/n)  # MLE (population) SD; note nrow(), not length()
p90<-1-pnorm(s1[i],m,s)
p95<-1-pnorm(s2[i],m,s)
p99<-1-pnorm(s3[i],m,s)
np90<-mean(ifelse(d$logsbp>=s1[i],1,0))
np95<-mean(ifelse(d$logsbp>=s2[i],1,0))
np99<-mean(ifelse(d$logsbp>=s3[i],1,0))
prob<-c(p90,np90,p95,np95,p99,np99)
aa[i,]<-prob}
colnames(aa)<-c('p1','np1','p2','np2','p3','np3')
aa
aa1<-sum(ifelse(aa[,1]>aa[,2],1,0))
aa2<-sum(ifelse(aa[,3]>aa[,4],1,0))
aa3<-sum(ifelse(aa[,5]>aa[,6],1,0))
aasum<-c(aa1,aa2,aa3)
aasum
a<-data.frame(aa)
biasaa<-data.frame(bp1=mean(a$p1-u),bnp1=mean(a$np1-u),bp2=mean(a$p2-v),
bnp2=mean(a$np2-v),bp3=mean(a$p3-w),bnp3=mean(a$np3-w))
naa1<-sum(ifelse(aa[,2]==0,1,0))
naa2<-sum(ifelse(aa[,4]==0,1,0))
naa3<-sum(ifelse(aa[,6]==0,1,0))
naasum<-c(naa1,naa2,naa3)
ncau1<-sum(ifelse(cau[,2]==0,1,0))
ncau2<-sum(ifelse(cau[,4]==0,1,0))
ncau3<-sum(ifelse(cau[,6]==0,1,0))
ncausum<-c(ncau1,ncau2,ncau3)
nall1<-sum(ifelse(all[,2]==0,1,0))
nall2<-sum(ifelse(all[,4]==0,1,0))
nall3<-sum(ifelse(all[,6]==0,1,0))
nallsum<-c(nall1,nall2,nall3)
# difference between structural nonparametric and nonparametric
# for African American girls
a<-data.frame(aa)
bias<-data.frame(b1 = (a$p1-a$np1), b2 = (a$p2-a$np2), b3 = (a$p3-a$np3))
new3<-c(mean(bias$b1),mean(bias$b2),mean(bias$b3))
new3
finalnew<-rbind(new1,new2,new3)
colnames(finalnew)<-c('90th','95th','99th')
rownames(finalnew)<-c('all','cau','aa')
finalnew
finalbiases<-rbind(biasall,biasca,biasaa)
colnames(finalbiases)<-c('p90th','np90th','p95th','np95th','p99th','np99th')
rownames(finalbiases)<-c('all','cau','aa')
finalbiases
finalsum<-rbind(allsum,causum,aasum)
colnames(finalsum)<-c('90th','95th','99th')
rownames(finalsum)<-c('all','cau','aa')
finalsum
nfinalsum<-rbind(nallsum,ncausum,naasum)
colnames(nfinalsum)<-c('90th','95th','99th')
rownames(nfinalsum)<-c('all','cau','aa')
nfinalsum
# Bootstrap Confidence Band for African American girls
#pdf("OPSBP.pdf")
age<-uage
a1<-data.frame(age,aa)
par(mfrow=c(2,3))
#one
m1<-with(a1, ksboot(age, p1, nreps=5000))
one1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
one2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,p1,las=1,main=one1,ylim=c(0,.20),ylab=one2,
xlab="Ages of AA girls"))
with(m1,matpoints(x,m1[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#two
m2<-with(a1, ksboot(age, p2, nreps=5000))
two1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
two2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,p2,las=1,main=two1,ylim=c(0,.10),ylab=two2,
xlab="Ages of AA girls"))
with(m2,matpoints(x,m2[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#three
m3<-with(a1, ksboot(age, p3, nreps=5000))
three1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
three2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,p3,las=1,main=three1,ylim=c(0,.02),ylab=three2,
xlab="Ages of AA girls"))
with(m3,matpoints(x,m3[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#four
m4<-with(a1, ksboot(age, np1, nreps=5000))
four1<-expression(atop(Age~vs~P(Y>y[.90](t)),"at Median height"))
four2<-expression(P(Y>y[.90](t)))
with(a1,plot(age,np1,las=1,main=four1,ylim=c(0,.20),ylab=four2,
xlab="Ages of AA girls"))
with(m4,matpoints(x,m4[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#five
m5<-with(a1, ksboot(age, np2, nreps=5000))
five1<-expression(atop(Age~vs~P(Y>y[.95](t)),"at Median height"))
five2<-expression(P(Y>y[.95](t)))
with(a1,plot(age,np2,las=1,main=five1,ylim=c(0,.10),ylab=five2,
xlab="Ages of AA girls"))
with(m5,matpoints(x,m5[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#six
m6<-with(a1, ksboot(age, np3, nreps=5000))
six1<-expression(atop(Age~vs~P(Y>y[.99](t)),"at Median height"))
six2<-expression(P(Y>y[.99](t)))
with(a1,plot(age,np3,las=1,main=six1,ylim=c(0,.02),ylab=six2,
xlab="Ages of AA girls"))
with(m6,matpoints(x,m6[,-1],type='l',col=c(1,2,2),lty=c(1,2,2)))
#dev.off()
#getwd()
# simulating one data set
simdata <- function(n){
# i is the subject (columns)
# j is the time (rows)
# Yij <- 21.5 + 0.7*(tij - 5) - 0.05*(tij - 5)^2 + a0i + e1ij
# a0i ~ N(0, 2.5^2), e1ij ~ N(0, 0.5^2)
m <- 10
j <- seq(1, 10, by = 1)
u <- replicate(n, sapply(j, function(j) runif(1, j - 1, j)))
ui <- round(c(u))
a0i <- rep(rnorm(n, 0, 2.5), each = 10)
e1ij <- rnorm(n*m, 0, 0.5)
ccaa <- rep(rbinom(n, 1, .51), each = 10)  # race is fixed within subject
y <- 21.5 + 0.7*(ui - 5) - 0.05*(ui - 5)^2 + a0i + e1ij
d1 <- data.frame(ID = 1:length(y), age = ui, y1 = y, race = ccaa)
with(d1, split(d1, age))
}
# Quantiles from Regression Model
ti<-seq(0,10,by=1)
z<-21.5 + 0.7*(ti - 5) - 0.05*(ti - 5)^2
y90<-qnorm(.90,z,rep(sqrt(6.5),11))
y95<-qnorm(.95,z,rep(sqrt(6.5),11))
y99<-qnorm(.99,z,rep(sqrt(6.5),11))
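The quantiles y90, y95 and y99 above are defined so that, under the simulation model Y(t) ~ N(z(t), 6.5) with z(t) the quadratic mean curve (var 6.5 = 2.5^2 + 0.5^2), the true exceedance probabilities are exactly 0.10, 0.05 and 0.01. A quick check of this identity:

```r
# Under Y(t) ~ N(z(t), 6.5), the exceedance probability at the .90
# quantile is 0.10 at every time point by construction.
ti  <- seq(0, 10, by = 1)
z   <- 21.5 + 0.7 * (ti - 5) - 0.05 * (ti - 5)^2
y90 <- qnorm(0.90, z, rep(sqrt(6.5), 11))
1 - pnorm(y90, z, rep(sqrt(6.5), 11))  # all equal to 0.10
```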
# this function calculates the probabilities for a given n
# computation of overall probability
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1>y90[i],1,0))
np2<- mean(ifelse(y1>y95[i],1,0))
np3<- mean(ifelse(y1>y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
n<-1000
# simulating B probabilities
B <- 100
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2), two=m((a2-u)^2), three=m((a3-v)^2),
four=m((a4-v)^2), five=m((a5-w)^2), six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.03,.07)
m3<-c(0,.02)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of All girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of All girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of All girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of All girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of All girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of All girls", ylab = six2, ylim=m3)
# computing probabilities for caucasian girls
# for a given n
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]][which(X[[i]]$race==1),]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1> y90[i],1,0))
np2<- mean(ifelse(y1> y95[i],1,0))
np3<- mean(ifelse(y1> y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
# simulating B probabilities
B <- 1000
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2),two=m((a2-u)^2),three=m((a3-v)^2),
four=m((a4-v)^2),five=m((a5-w)^2),six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.025,.08)
m3<-c(0,.025)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of CC girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of CC girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of CC girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of CC girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of CC girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of CC girls", ylab = six2, ylim=m3)
# computation of african american probability
# for a given n
foo <- function(n){
X <- simdata(n)
k <- length(X)
res <- sapply(1:k, function(i){
d <- X[[i]][which(X[[i]]$race==0),]
y1 <- d[, "y1"]
m <- mean(y1)
s <- sd(y1)
p1 <- 1 - pnorm(y90[i], m, s)
p2 <- 1 - pnorm(y95[i], m, s)
p3 <- 1 - pnorm(y99[i], m, s)
np1<- mean(ifelse(y1> y90[i],1,0))
np2<- mean(ifelse(y1> y95[i],1,0))
np3<- mean(ifelse(y1> y99[i],1,0))
c(p1, np1, p2, np2, p3, np3)})
rownames(res) <- c('p1', 'np1', 'p2', 'np2', 'p3', 'np3')
list(res)
}
# example
foo(1000)
# simulating B probabilities
B <- 1000
a<-replicate(B, foo(n))
a1 <- sapply(a, function(x) x[1,])
a2 <- sapply(a, function(x) x[2,])
a3 <- sapply(a, function(x) x[3,])
a4 <- sapply(a, function(x) x[4,])
a5 <- sapply(a, function(x) x[5,])
a6 <- sapply(a, function(x) x[6,])
# estimated value
m<-rowMeans
u<-.10
v<-.05
w<-.01
ALLr<-cbind(ALLp1est=m(a1),ALLnp1est=m(a2),ALLp2est=m(a3),
ALLnp2est=m(a4),ALLp3est=m(a5),ALLnp3est=m(a6))
# average bias (AB - Average Bias)
ALLr1<-cbind(p1AB=m(a1-u),n1AB=m(a2-u),p2AB=m(a3-v),
n2AB=m(a4-v),p3AB=m(a5-w),n3AB=m(a6-w))
colnames(ALLr1)<-c('ALLp1AveBias','ALLnp1AveBias','ALLp2AveBias',
'ALLnp2AveBias','ALLp3AveBias','ALLnp3AveBias')
# MSE
ALLr2<-cbind(one=m((a1-u)^2),two=m((a2-u)^2),three=m((a3-v)^2),
four=m((a4-v)^2),five=m((a5-w)^2),six=m((a6-w)^2))
colnames(ALLr2)<-c('ALLp1AveMSE','ALLnp1AveMSE','ALLp2AveMSE',
'ALLnp2AveMSE','ALLp3AveMSE','ALLnp3AveMSE')
# coverage probabilities
CI1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
CI2 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
CI3 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
CI4 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
CI5 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
CI6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
Cov1 <- sapply(1:NROW(CI1), function(i) mean(a1[i, ] >= CI1[i, 1] &
a1[i, ] <= CI1[i, 2]))
Cov2 <- sapply(1:NROW(CI2), function(i) mean(a2[i, ] >= CI2[i, 1] &
a2[i, ] <= CI2[i, 2]))
Cov3 <- sapply(1:NROW(CI3), function(i) mean(a3[i, ] >= CI3[i, 1] &
a3[i, ] <= CI3[i, 2]))
Cov4 <- sapply(1:NROW(CI4), function(i) mean(a4[i, ] >= CI4[i, 1] &
a4[i, ] <= CI4[i, 2]))
Cov5 <- sapply(1:NROW(CI5), function(i) mean(a5[i, ] >= CI5[i, 1] &
a5[i, ] <= CI5[i, 2]))
Cov6 <- sapply(1:NROW(CI6), function(i) mean(a6[i, ] >= CI6[i, 1] &
a6[i, ] <= CI6[i, 2]))
coverage<-cbind(Cov1,Cov2,Cov3,Cov4,Cov5,Cov6)
# plot
par(mfrow=c(2,3))
m1<-c(0.06,.14)
m2<-c(0.025,.08)
m3<-c(0,.025)
o1<-seq(0,10, by=1)
ALLp1Ave <- m(a1)
N1 <- t(apply(a1, 1, quantile, probs = c(0.025, 0.975)))
n1<-cbind(ALLp1Ave, N1)
one1<-expression(Age~vs~P(Y> y[.90](t)))
one2<-expression(P(Y> y[.90](t)))
matplot(o1, n1, type = 'l', lty = c(1, 2, 2), col = 1, main=one1,
xlab = "Ages of AA girls", ylab = one2, ylim=m1)
ALLp2Ave <- m(a3)
N2 <- t(apply(a3, 1, quantile, probs = c(0.025, 0.975)))
n2<-cbind(ALLp2Ave, N2)
two1<-expression(Age~vs~P(Y> y[.95](t)))
two2<-expression(P(Y> y[.95](t)))
matplot(o1, n2, type = 'l', lty = c(1, 2, 2), col = 1, main=two1,
xlab = "Ages of AA girls", ylab = two2, ylim=m2)
ALLp3Ave <- m(a5)
N3 <- t(apply(a5, 1, quantile, probs = c(0.025, 0.975)))
n3<-cbind(ALLp3Ave, N3)
three1<-expression(Age~vs~P(Y> y[.99](t)))
three2<-expression(P(Y> y[.99](t)))
matplot(o1, n3, type = 'l', lty = c(1, 2, 2), col = 1, main=three1,
xlab = "Ages of AA girls", ylab = three2, ylim=m3)
ALLnp4Ave <- m(a2)
N4 <- t(apply(a2, 1, quantile, probs = c(0.025, 0.975)))
n4<-cbind(ALLnp4Ave, N4)
four1<-expression(Age~vs~P(Y> y[.90](t)))
four2<-expression(P(Y> y[.90](t)))
matplot(o1, n4, type = 'l', lty = c(1, 2, 2), col = 1, main=four1,
xlab = "Ages of AA girls", ylab = four2, ylim=m1)
ALLnp5Ave <- m(a4)
N5 <- t(apply(a4, 1, quantile, probs = c(0.025, 0.975)))
n5<-cbind(ALLnp5Ave, N5)
five1<-expression(Age~vs~P(Y> y[.95](t)))
five2<-expression(P(Y> y[.95](t)))
matplot(o1, n5, type = 'l', lty = c(1, 2, 2), col = 1, main=five1,
xlab = "Ages of AA girls", ylab = five2, ylim=m2)
ALLnp6Ave <- m(a6)
N6 <- t(apply(a6, 1, quantile, probs = c(0.025, 0.975)))
n6<-cbind(ALLnp6Ave, N6)
six1<-expression(Age~vs~P(Y> y[.99](t)))
six2<-expression(P(Y> y[.99](t)))
matplot(o1, n6, type = 'l', lty = c(1, 2, 2), col = 1, main=six1,
xlab = "Ages of AA girls", ylab = six2, ylim=m3)
# R code for Box-Cox Transformation
#lambda <- seq(-5, 5, length = 500)
require(MASS)  # boxcox()
t1data<-lapply(sdata,function(l) with(l, boxcox(SYSAV ~ AGE+RACE+HTAV,
data = l, plotit = FALSE)))
# lambdas for SBP
result_sbp <- t(sapply(t1data, function(d){
lambdas <- d$x[which.max(d$y)]
ll <- d$y[which.max(d$y)]
c(lambdas, ll)
}))
colnames(result_sbp) <- c('lambdas','logLik')
c<-result_sbp
lamda<-result_sbp[,1]
age<-seq(9.1,19,by=.1)
nlamda<-data.frame(cbind(age,lamda))
# Bootstrap confidence band
m <- with(nlamda, ksboot(age, lamda, nreps = 5000))
with(nlamda, plot(age, lamda, las = 1))
with(m, matpoints(x, m[, -1], type = 'l', col = c(1, 2, 2), lty = c(1, 2, 2)))
newdat<-cbind(age,lamda,slamda=m[,2])
# data sets with transformed SBP variable with raw lamda and smooth lamda
require(car)
newsbp <- vector("list", NROW(result_sbp))
for(i in 1:NROW(result_sbp)){
d <- sdata[[i]] # i-th data set
d$sbp_bcraw <- bcPower(d$SYSAV, newdat[i, 2])
d$sbp_bcsmooth <- bcPower(d$SYSAV, newdat[i, 3])
newsbp[[i]] <- d
}
# mean, min and max
msd<-matrix(NA,100,6)
for(i in 1:100){
d<-newsbp[[i]]
muraw<-mean(d$sbp_bcraw)
minraw<-min(d$sbp_bcraw)
maxraw<-max(d$sbp_bcraw)
musmooth<-mean(d$sbp_bcsmooth)
minsmooth<-min(d$sbp_bcsmooth)
maxsmooth<-max(d$sbp_bcsmooth)
msd[i,]<-c(muraw,minraw,maxraw,musmooth,minsmooth,maxsmooth)
}
colnames(msd)<-c('rmean','rmin','rmax','smean','smin','smax')
msd<-data.frame(msd)
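bcPower from the car package implements the Box-Cox power (y^lambda - 1)/lambda for lambda away from zero, approaching log(y) as lambda tends to 0. A small check of that relation on illustrative values (the toy SBP numbers below are not from the data):

```r
# Illustrative check of the Box-Cox transform used above.
library(car)
y      <- c(110, 120, 135)  # toy SBP values
lambda <- 0.5
all.equal(bcPower(y, lambda), (y^lambda - 1) / lambda)  # TRUE
bcPower(y, 1e-8)            # essentially log(y) as lambda -> 0
```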