To save our environment, you are advised not to print these abstracts: every participant will receive a hard copy in the conference bag.
Plenary Talks
Testing Multinormality in Two-level Structural Equation Models
Peter M. Bentler
Department of Psychology & Statistics, University of California, Los Angeles, USA
Jiajuan Liang
Department of Quantitative Analysis, University of New Haven, USA
Multinormality is a common assumption in the maximum likelihood analysis of two-level struc-
tural equation models (Bentler, Liang & Yuan, 2005). In these models, the independence condition
on level-1 observations is no longer satisfied. As a result, existing statistics for testing multinor-
mality based on independent observations cannot be directly used in two-level structural equation
models. In this talk we will tackle this problem by constructing three types of necessary tests: 1)
tests based on the theory of spherical and spherical matrix distributions (Fang, Kotz & Ng, 1990;
Fang & Zhang, 1990). A series of necessary tests and a graphical method with a balanced design
are constructed by using the same techniques as in Fang, Li and Liang (1998) and Liang, Li, Fang
and Fang (2000); 2) tests based on extended Mardias skewness and kurtosis statistics with missing
data (Yuan, Lambert & Fouladi, 2004); and 3) tests based on imputation of independent missing
data (Tan, Fang, Tian & Wei, 2004). These necessary tests can be applied without requiring a large
level-1 or level-2 sample size. Monte Carlo studies are carried out to demonstrate the performance of the proposed tests with respect to control of type I error rates and power against departures from normality in the level-1 and level-2 variables. An application of these tests to a real data set is illustrated.
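For reference, the type 2) tests extend Mardia's classical skewness and kurtosis statistics; a minimal sketch of the classical single-level versions (illustrative Python, assuming i.i.d. rows — exactly the independence condition that fails in the two-level setting) is:

```python
import numpy as np
from scipy import stats

def mardia_test(X):
    """Classical Mardia skewness/kurtosis tests for an (n, p) i.i.d. sample."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                        # ML estimate of the covariance
    D = Xc @ np.linalg.solve(S, Xc.T)        # Mahalanobis cross-products
    b1 = (D ** 3).sum() / n ** 2             # multivariate skewness b_{1,p}
    b2 = (np.diag(D) ** 2).mean()            # multivariate kurtosis b_{2,p}
    df = p * (p + 1) * (p + 2) / 6           # n*b1/6 is asympt. chi^2_df under H0
    p_skew = stats.chi2.sf(n * b1 / 6, df)
    z_kurt = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return p_skew, 2 * stats.norm.sf(abs(z_kurt))

rng = np.random.default_rng(0)
print(mardia_test(rng.standard_normal((200, 4))))  # both p-values should be large
```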
Projection Properties of Factorial Designs for Factor Screening
Ching-Shui Cheng
Academia Sinica, Taiwan and University of California, Berkeley, USA
The projection of a factorial design onto a subset of factors is the subdesign consisting of the
given subset of factors (or, equivalently, the subdesign obtained by deleting the complementary
set of factors). A factor-screening design with good projections onto small subsets of factors can
provide useful information when a small number of active factors have been identified. I shall give
a review of projection properties of factorial designs, in particular those of nonregular designs with
complex aliasing.
Bayesian Networks for Forensic DNA Identification
Philip Dawid
Department of Statistical Science, University College London, United Kingdom
Problems of forensic identification from DNA profile evidence can become extremely challenging,
both logically and computationally, in the presence of such complicating features as missing data
on individuals, mixed trace evidence, mutation, silent alleles, laboratory and handling errors, and so on. In recent years it has been shown how Bayesian networks can be used to represent and solve such problems.
"Object-oriented" Bayesian network systems, such as Hugin version 6, allow a network to contain repeated instances of other networks. This architecture proves particularly natural and useful for genetic problems, where there is repetition of such basic structures as Mendelian inheritance or mutation processes.
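To make the repeated unit concrete, here is a minimal sketch (hypothetical Python, not Hugin's API) of the Mendelian-inheritance fragment that such a network instantiates once per parent-child link:

```python
from itertools import product

def child_genotype_dist(father, mother):
    """Conditional distribution of a child's genotype given two parental
    genotypes (each an allele pair): each parent transmits one of its two
    alleles with probability 1/2, independently."""
    dist = {}
    for a, b in product(father, mother):
        g = tuple(sorted((a, b)))            # genotype is an unordered pair
        dist[g] = dist.get(g, 0.0) + 0.25    # each allele combination has prob 1/4
    return dist

# Both parents heterozygous at an STR marker:
print(child_genotype_dist(("10", "12"), ("10", "14")))
# {('10', '10'): 0.25, ('10', '14'): 0.25, ('10', '12'): 0.25, ('12', '14'): 0.25}
```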
I will describe a "construction set" of fundamental networks that can be pieced together, as required, to represent and solve a wide variety of problems arising in forensic genetics. Some examples of their use will be provided.
Joint work with Julia Mortera and Paola Vicard.
Statistical Foundation: Then and Now
Jianqing Fan
Department of Operations Research and Financial Engineering, Princeton University, USA
The theory of the distribution of maximum likelihood ratios is fundamental and indispensable to classical parametric inference. Despite this success in parametric settings, maximum likelihood ratio statistics might not exist in the nonparametric function estimation setting. Even if they exist, they may be hard to find and need not be optimal. Generalized likelihood ratio statistics will be introduced to overcome these drawbacks. A new Wilks phenomenon is unveiled in infinite-dimensional parameter spaces. We demonstrate that generalized likelihood ratio (GLR) statistics are
asymptotically distribution free and follow χ2-distributions for a number of testing problems and
a variety of useful semiparametric and nonparametric models. These include the Gaussian white
noise model, nonparametric regression models, varying coefficient models, additive models and
partial linear varying-coefficient models. We further demonstrate that generalized likelihood ratio
statistics are asymptotically optimal in the sense that they achieve optimal rates of convergence
given by Ingster (1993). They can even be adaptively optimal in the sense of Spokoiny (1996).
Issues on bias reduction will be addressed. The talk is based on a series of recent papers by my
collaborators, Chunming Zhang, Jian Zhang, Jiangcheng Jiang and Tao Huang.
PCA for FDA
Peter Hall
Centre for Mathematics and its Applications, Australian National University, Australia
Principal components analysis (PCA) is arguably more important a tool for functional data
analysis (FDA) than it is when one is analysing vector-valued data, since the distribution of
functional data cannot readily be treated in its full, infinite-dimensional context. For example,
even simple linear regression in functional data analysis requires dimension reduction, and there principal components analysis can help determine both the "angles" of projection and the number
of projections used. In this talk we shall discuss some of the theoretical issues that arise in principal
components analysis for functional data, and some of the methodology to which PCA for FDA
leads.
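As a minimal illustration of the idea (my own discretized sketch, assuming curves observed on a common dense grid, not the talk's methodology), PCA for functional data amounts to an eigendecomposition of the sample covariance operator:

```python
import numpy as np

def functional_pca(curves, n_components=3):
    """PCA for functional data on a common grid: curves is
    (n_curves, n_gridpoints).  Eigenvectors of the discretized sample
    covariance operator estimate the eigenfunctions ("angles" of
    projection); scores give the reduced-dimension coordinates."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / len(curves)
    evals, evecs = np.linalg.eigh(cov)               # ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    components = evecs[:, order]
    scores = centered @ components
    return mean, components, evals[order], scores

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 100)
curves = (rng.standard_normal((50, 1)) * np.sin(2 * np.pi * t)
          + rng.standard_normal((50, 1)) * np.cos(2 * np.pi * t))
mean, phi, lam, scores = functional_pca(curves)      # phi has shape (100, 3)
```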
Kai-Tai Fang’s Contributions to Quasi-Monte Carlo Methods
Fred J. Hickernell
Hong Kong Baptist University, Hong Kong and Illinois Institute of Technology, USA
Prof. Kai-Tai Fang’s contributions to quasi-Monte Carlo (or number-theoretic) methods have
been groundbreaking, opening new areas of fruitful research. Quasi-Monte Carlo methods use
evenly distributed, often deterministic, points instead of simple random points. Such points are
called low discrepancy points. In the 1980s Prof. Fang together with Prof. Yuan Wang proposed
using low discrepancy points for experimental design. Previously such points had been used pri-
marily for evaluating high dimensional integrals. Prof. Fang went on to explore various uses of
low discrepancy points for solving statistical problems. These were explained in the monograph
he co-authored with Prof. Wang. Prof. Fang and his collaborators developed many methods for
constructing low discrepancy sets of small size. These included both computational methods and
lower bounds that could be used to verify when those methods obtained the actual optima. This
talk provides an overview of Prof. Fang’s work on quasi-Monte Carlo methods.
Fang Kai-Tai: The Life of a Statistician
Dennis Lin
Department of Statistics, Pennsylvania State University, USA
Kai-Tai Fang was born into a poor family and received very little statistical education in his early career, and yet he became one of the most internationally influential statisticians. Academia typically evaluates its faculty on three components: teaching, research and service. From my personal view, KT Fang is simply the best with these three components combined. This talk attempts to review the history of Fang's life before his first retirement in 2005. Hopefully, we will be able to draw some general observations on the making of a successful statistician to be shared with young statisticians. Thanks to the program organizer for providing such a great opportunity for presenting my very primitive findings.
Sometimes it is Possible to Reduce Both Variance and Bias: The
Multi-process Parallel Antithetic Coupling For Backward and Forward
Markov Chain Monte Carlo
Xiao-Li Meng
Department of Statistics, Harvard University, USA
This talk is based on Craiu and Meng (2005, The Annals of Statistics), which has the following
abstract:
"Antithetic coupling is a general stratification strategy for reducing Monte Carlo variance with-
out increasing the simulation size. The use of the antithetic principle in the Monte Carlo literature
typically employs two strata via antithetic quantile coupling. We demonstrate here that further
stratification, obtained by using k > 2 (e.g., k = 3–10) antithetically coupled variates, can offer
substantial additional gain in Monte Carlo efficiency, in terms of both variance and bias. The
reason for reduced bias is that antithetically coupled chains can provide a more dispersed search
of the state space than multiple independent chains. The emerging area of perfect simulation
provides a perfect setting for implementing the k-process parallel antithetic coupling for MCMC
because, without antithetic coupling, this class of methods delivers genuine independent draws.
Furthermore, antithetic backward coupling provides a very convenient theoretical tool for inves-
tigating antithetic forward coupling. However, the generation of k > 2 antithetic variates that
are negatively associated, i.e., they preserve negative correlation under monotone transformations,
and extremely antithetic, i.e., they are as negatively correlated as possible, is more complicated
compared to the case with k = 2. In this paper, we establish a theoretical framework for investigat-
ing such issues. Among the generating methods that we compare, Latin hypercube sampling and
its iterative extension appear to be general-purpose choices, making another direct link between
Monte Carlo and quasi-Monte Carlo."
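A minimal sketch of the k = 2 antithetic quantile coupling that the paper generalizes (plain forward Monte Carlo, not the perfect-simulation setting; names and example are illustrative):

```python
import numpy as np

def antithetic_mc(g, n_pairs, rng):
    """Two-strata antithetic quantile coupling: evaluate g at U and 1 - U.
    For monotone g the two evaluations are negatively correlated, so the
    paired average has smaller variance than i.i.d. sampling of the same
    total simulation size."""
    u = rng.random(n_pairs)
    return 0.5 * (g(u) + g(1.0 - u))

rng = np.random.default_rng(42)
g = np.exp                                    # E[exp(U)] = e - 1 = 1.71828...
pairs = antithetic_mc(g, 100_000, rng)        # 200,000 evaluations in pairs
iid = g(rng.random(200_000))                  # 200,000 i.i.d. evaluations
print(pairs.mean(), pairs.std(ddof=1) / np.sqrt(len(pairs)))   # smaller s.e.
print(iid.mean(), iid.std(ddof=1) / np.sqrt(len(iid)))
```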
Optimal Factorial Designs for cDNA Microarray Experiments
Tathagata Banerjee
University of Calcutta, India
Rahul Mukerjee
Indian Institute of Management Calcutta, India
Optimal designing of cDNA microarray experiments is an area of enormous potential that
has started opening up only very recently. Although these experiments are structurally similar
to classical paired comparison experiments, the two can have quite different objects of principal
interest. Hence optimality results from the latter do not routinely carry forward to the former.
The present paper addresses the design problem for microarrays when the treatments have a
factorial structure, via deployment of tools like approximate theory, Kronecker representation and
unimodularity.
We begin by presenting an outline of cDNA microarray experiments. Next, analytical results
for the 2² factorial are obtained using the approximate theory. Thereafter, we consider general
factorials, obtain an exact result on optimal saturated designs and also study nearly saturated
cases. The situation where the underlying model includes dye-color effects is also considered and
the role of dye-swapping is rigorously investigated. The paper ends with a discussion of some open
issues.
Uniform Point Sets and Their Applications
Harald Niederreiter
Department of Mathematics, National University of Singapore, Singapore
Uniform point sets in probability spaces were introduced by the speaker in 2003. They are finite
point sets in a given probability space (X,B, µ) with a uniformity property relative to a family
of µ-measurable subsets of X. Uniform point sets have since found applications in areas such as
numerical integration, computer graphics, and computational finance. The talk will present this
work in a broad framework that will link it with related themes.
Probability Models in Reliability and Survival Analysis
Ingram Olkin
Department of Statistics, Stanford University, USA
Nonnegative random variables arise naturally in a wide variety of applications, in particular,
as life-lengths of devices or of biological organisms. By contrast, the normal distribution, which
has long played a central role in statistics, allows for the corresponding random variables to take
on all real values. For nonnegative random variables, as arise in reliability and survival analysis,
there is no distribution as pervasive as the normal distribution, with its foundation in the central
limit theorem.
Several approaches are in common use for the analysis of data: distribution-free methods,
qualitatively conditioned methods, semiparametric methods, and parametric methods.
Historically, methods of fitting were confined to distribution functions and density functions under the rubric of "curve fitting". Here the names of Pearson, Charlier, Edgeworth, Kapteyn, Thiele,
and others come to mind.
In this survey we describe characteristics of distributions that may serve in deciding on a
model to be used in data analysis. In particular we discuss the behavior of alternative characterizations of a distribution: the hazard rate, hazard function, residual life distribution, mean residual life, odds ratio, and inverse distribution function, among others.
An important aspect of distributions is whether some ordering holds, and we here review sev-
eral orderings. The discussion of nonparametric families includes descriptions and implications
of increasing failure rate families and new-better-than-used families, among others. Semiparametric families are those into which a parameter is introduced, and here we discuss how such families are ordered.
This is joint work with Albert W. Marshall.
Professor Fang’s Contribution to Multivariate Analysis
Dietrich von Rosen
Department of Biometry and Engineering, Swedish University of Agricultural Sciences,
Sweden
The talk will briefly consider Professor Fang’s very broad contribution to multivariate statistics.
In particular we focus on multivariate distribution theory (elliptical distribution theory, copulas,
among others) and multivariate linear models (growth curve models). Some connections to high
dimensional multivariate analysis will be shown. Moreover, we also present a new application
of elliptical distribution theory to the approximation of maximum likelihood estimators in the
classical Growth Curve model.
Key words and phrases. Multivariate analysis, elliptical distributions, growth curve model, distribution
approximation.
Statistical Methods for the Design and Analysis of Xenograft Experiments: Uniform Design and Constrained Parameter Models
Ming T. Tan, Hong-Bin Fang and Guo-Liang Tian
Division of Biostatistics, University of Maryland Greenebaum Cancer Center, USA
A xenograft model typically refers to a mouse model bearing human cancer, where human tumor tissues (e.g., sliced tissue blocks, or tumor cells) are grown in mice. In cancer drug development, demonstrated anti-tumor activity in this model is an important step in bringing a promising compound to humans. The key outcome variable is tumor volume measured over a period of time. Since cancer
therapy typically involves combinations of several drugs with the goal to achieve greater efficacy
with lesser toxicity, combination studies need to be optimally designed, so that with moderate
sample size, the joint action of two drugs can be estimated and best combinations identified with
reasonable cost. Since we typically do not have enough information on the joint action of the two compounds before the experiment, we propose a novel nonparametric model that does not impose
strong assumptions on the joint action. We then propose an experimental design for testing
joint action using uniform measure. Statistical analysis of these experiments typically involves incomplete data, since a mouse may die during the experiment or may be sacrificed when its tumor becomes unbearable, or the tumor volume may fall below the detectable limit. In addition, if no treatment were given to the tumor-bearing mice, the tumors would keep growing until the mice die or are sacrificed, thus resulting in some regression coefficients being constrained. We develop a maximum
likelihood method based on an EM-type algorithm to estimate the dose-response relationship while
accounting for the informative censoring and the constraints of model parameters. We illustrate
the current methods of experimental design and data analysis with a study on two new drugs.
Optimal Designs for Fourier Regression: Some Dynamical System
Constructions
Henry Wynn
Department of Statistics, London School of Economics, United Kingdom
The work on optimal design for Fourier regression in the joint paper (with R Schwabe, E.
Riccomagno, Annals of Statistics, 1997), and other papers, is revisited. This is a kind of mul-
tidimensional Nyquist sampling theory in which there is a trade-off between model complexity
and a generalised type of sampling frequency. As the model complexity increases more complex
sampling patterns are required. The basic type of design is an integer lattice of the kind pioneered
by Professor Fang. The problem lies in the special choice of generators. At a certain point in
the previous work a pattern similar to the Cantor set construction was discovered. This is the
starting point for the present work in an attempt to link the design problem to certain problems
in the construction of optimal designs using dynamical systems. New tools are used, in particular
methods of symbolic computation, to obtain solutions. Ad hoc methods can also be used which
can then be seen to give rise to special sequences.
Censoring and Truncation in Neutron Lifetime Estimation
Grace L. Yang
Department of Mathematics, University of Maryland, USA
In an international effort, a team of researchers demonstrated for the first time in 1999, at the NIST Cold Neutron Research Facility, that ultra cold neutrons can be confined in a three-dimensional magnetic trap filled with liquid helium. This technical breakthrough gives a new way to make more accurate measurements of the neutron lifetime and to answer other questions fundamental to physics and astrophysics. Since the demonstration, an experimental protocol for data collection has been under development. The experiment consists of two stages. In the first stage, ultra
cold neutrons are generated and confined by the magnetic trap. In the second stage, the decays
of the trapped neutrons are recorded for analysis. Sophisticated as the experiment is, the data it
collects are nevertheless incomplete. We do not know: (a) How many neutrons are captured in the
magnetic trap, (b) The birth time of each trapped neutron, (c) How many of them have decayed
during the trapping period, and (d) How many have not yet decayed so that their decays can be
detected during the observation period. Furthermore, a recorded signal can be either a true decay
time or a false background noise. As a result, an observation can be censored, truncated or a false
signal. Under these conditions, we use a birth-and-death process to model the lifetime of a neutron. From the model, likelihoods are constructed for estimation. Our unified approach is applicable to
other similar two-stage experiments for studying radioactive decay processes. Comparison is made
with the usual censoring model and cross-sectional sampling in biostatistics. Some open problems
are introduced.
Invited Talks
Mining Course Scheduling Data with Markov Chain
Fengshan Bai
Department of Mathematical Sciences, Tsinghua University, China
Course scheduling is one of the most classical timetabling problems in optimization. It is, however, NP-complete, hence hard to solve to global optimality.
This talk first introduces PageRank, Google's way of deciding a page's importance. Google figures that when one page links to another page, it is effectively casting a vote for the other page. This matters because it is one of the factors that determine a page's ranking in the search results. CourseRank, built on a Markov chain similar to PageRank's, is then introduced. Computational results show that CourseRank yields a ranking of courses and is useful in mining course scheduling data.
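A minimal sketch of the underlying computation (power iteration for the damped PageRank chain; in CourseRank the adjacency would encode relations between courses rather than hyperlinks, and this toy code is illustrative, not the speaker's):

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """Stationary distribution of the damped random-surfer Markov chain.
    adj[i, j] = 1 if node i links to (votes for) node j."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)  # dangling rows -> uniform
    r = np.full(n, 1.0 / n)
    while True:
        r_next = damping * (r @ P) + (1 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next

links = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)
print(pagerank(links))          # ranking vector; entries sum to 1
```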
Development of the Pearl River Delta Based on Statistical Data: Its Past, Present and Future
Xinmin Bu
Guangdong Provincial Bureau of Statistics, China
Over the 26 years of reform and opening-up, the Pearl River Delta in Guangdong has witnessed a hard-pioneering and brilliantly-developing period. At the beginning of reform and opening-up, the Pearl River Delta did not play a bright role in Guangdong's economy. Over these 26 years, the Pearl River Delta has realized its economic take-off, with industrialization playing the dominant role, the export-oriented economy performing remarkably, urbanization improving fast and informatization off to a good start. Its industrial
competitiveness is growing. The Pearl River Delta is not only the front-runner and backbone of
Guangdong but also one of the most dynamic powerhouses in China. In 2004, the GDP, foreign
direct investment actually utilized and total export reached 1,357.5 billion yuan, 9.02 billion US
dollars and 182.43 billion US dollars, accounting for 75.0 percent, 90.0 percent and 95.2 percent of
the provincial total; or 9.9 percent, 14.9 percent and 30.7 percent of the national total.
At present, the economic development in the Pearl River Delta is dynamic and continues to
take the leading role in China. In 2004, its GDP growth rate was 16.0 percent. The main features of its development include rapid economic growth, fast development of high-tech industry, an IT and household electrical appliance manufacturing base that is preliminarily taking shape, a fairly high level of economic globalization, a front-runner position in market-oriented reform, tremendous urbanization driven by fast economic growth, and outstanding industrial competitiveness.
Looking forward, Guangdong, Hong Kong and Macao will stand closer together. With their differentiated and complementary economic development, wide cooperation is expected in the days to come. It is believed that our future will be better with the common efforts of the three sides.
Nonparametric Modeling for Conditional Capital Asset Pricing Models
Zongwu Cai
Department of Mathematics and Statistics, University of North Carolina at Charlotte,
USA
In this paper, we study two classes of nonparametric capital asset pricing models. First, we
consider a unified nonparametric econometric model for the time-varying betas to capture the
time variation in the market betas by allowing the betas to change over time or to be a function
of some state variables. A local linear approach is developed to estimate the functional betas and
the asymptotic properties of the proposed estimators are established without specifying the error
distribution. Also, a simple nonparametric bootstrap test is adopted for testing misspecification. Secondly, we investigate a general nonparametric asset pricing model to avoid
functional form misspecification of betas, risk premia, and the stochastic discount factor. To
estimate the nonparametric functionals, we propose a new nonparametric estimation procedure, termed nonparametric generalized method of moments (NPGMM), which combines local linear fitting and the generalized method of moments, and we establish the asymptotic properties of the resulting estimators. An efficient and feasible estimation procedure is suggested and its asymptotic behavior is studied. Finally, finite sample properties of the proposed estimators are investigated by Monte Carlo simulations and empirical examples.
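As a sketch of the first ingredient (a generic local linear fit for a time-varying beta; the model and estimator details in the paper may differ, and all names here are illustrative):

```python
import numpy as np

def local_linear_beta(t_grid, t_obs, market, asset, h):
    """Local linear estimate of beta(t) in asset_t = beta(t) * market_t + error,
    using an Epanechnikov kernel with bandwidth h."""
    betas = []
    for t0 in t_grid:
        u = (t_obs - t0) / h
        w = np.maximum(0.75 * (1.0 - u ** 2), 0.0)            # kernel weights
        X = np.column_stack([market, market * (t_obs - t0)])  # local linear basis
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(sw[:, None] * X, sw * asset, rcond=None)
        betas.append(coef[0])                                 # beta(t0)
    return np.array(betas)

rng = np.random.default_rng(5)
t = np.linspace(0, 1, 500)
m = rng.standard_normal(500)                                  # market returns
y = (1.0 + 0.5 * np.sin(2 * np.pi * t)) * m + 0.2 * rng.standard_normal(500)
print(local_linear_beta(np.array([0.25, 0.75]), t, m, y, h=0.1))  # approx [1.5, 0.5]
```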
Orthogonal Arrays of 2- and 3-levels for Lean Designs
Ling-Yau Chan
Department of Industrial and Manufacturing Systems Engineering, University of Hong
Kong, Hong Kong
Chang-Xing Ma
Department of Statistics, University of Florida, USA
When an orthogonal array (OA) of n rows is used as the design matrix for an experiment, n
is the number of runs for the experiment. In an OA of q levels, n is an integer multiple of q². In
an experiment, if the number of runs cannot be set exactly equal to the number of rows of an OA
because of constraints in resources or other reasons, the experimenter may use a design matrix
formed by omitting some rows of an OA. If such a design matrix is used, the number of observed responses may not be enough for estimation of all the effects corresponding to columns of the orthogonal array. A lean design is a design matrix formed by deleting some rows and columns of an OA which still allows efficient estimation of the effects of the factors corresponding to the
remaining columns of the OA. The authors shall discuss lean designs of 2 and 3 levels, and provide
D-optimal OA’s from which lean designs can be formed.
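A minimal numerical sketch of the idea (my own illustration, not the authors' construction): delete rows and columns of the 2-level orthogonal array L8(2^7) and score the resulting lean design by the D-criterion for a main-effects model.

```python
import numpy as np
from itertools import combinations

# L8(2^7): one +/-1 column per nonempty subset (XOR) of three basic bits.
bits = np.array([[(i >> k) & 1 for k in range(3)] for i in range(8)])
cols = [np.bitwise_xor.reduce(bits[:, list(s)], axis=1)
        for r in (1, 2, 3) for s in combinations(range(3), r)]
OA = 2 * np.array(cols).T - 1                       # 8 runs x 7 factors

def d_value(design):
    """D-criterion det(X'X) for the main-effects model with intercept."""
    X = np.column_stack([np.ones(len(design)), design])
    return np.linalg.det(X.T @ X)

full = OA[:, :4]                                    # keep 4 of the 7 columns
lean = np.delete(full, [6, 7], axis=0)              # drop 2 runs: a lean design
print(d_value(full), d_value(lean))                 # efficiency lost by deletion
```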
Asymptotics for Estimation in a Partly Linear Errors-in-Variables
Regression Model with Time Series Errors
Min Chen and Dong Li
Academy of Mathematics and Systems Science, CAS, China
In this talk, we study the asymptotics of some estimators for a partly linear errors-in-variables regression model with time series errors. The estimators of the model parameters, of the autocovariance and autocorrelation functions, and of the smooth function are derived using the nearest neighbor-generalized least squares method. Under a set of weak conditions, the strong consistency and asymptotic normality of these estimators are obtained. It is shown that the estimator of the smooth function achieves an optimal rate of convergence.
Obtaining O(N^{-2+ε}) Convergence for Quadrature Rules Based on Digital Nets
Josef Dick
School of Mathematics, University of New South Wales, Australia
This is joint work with Friedrich Pillichshammer and Ligia-Loretta Cristea.
We are interested in approximating a high dimensional integral over the unit cube,

∫_{[0,1]^s} f(x) dx,

by a quadrature rule of the form

(1/N) ∑_{k=1}^N f(x_k),

where x_1, ..., x_N ∈ [0,1]^s are the quadrature points.
It has been shown that for functions lying in a Sobolev space of absolutely continuous, once differentiable functions we obtain a convergence rate of O(N^{-1+ε}) using, for example, randomly shifted lattice rules or randomly digitally shifted digital nets.
If we impose stronger smoothness conditions on the functions, Hickernell showed that, using the baker's transformation in conjunction with randomly shifted lattice rules, we can obtain a convergence rate of O(N^{-2+ε}). No such results have until now been known for digital nets. On
the other hand, Dick and Pillichshammer showed that digital nets and lattice rules have many
essential properties in common, and many of the results previously only known to hold for lattice
rules have now been shown to also hold for digital nets in a similar fashion.
In this talk we show that the analogy also carries over to the use of the baker's transformation in conjunction with digital nets and a digital shift; that is, we also obtain a convergence rate of O(N^{-2+ε}) for digital nets which are randomly digitally shifted and then folded by the baker's transformation. Though the analysis is somewhat more involved, a computer implementation of
this method is not. Furthermore, by using digital nets instead of lattice rules, one is not confined to using search algorithms to find good underlying deterministic point sets.
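A minimal sketch of the folding step (illustrative code; a random shift modulo 1 on a small rank-1 lattice stands in for the digitally shifted digital nets of the talk):

```python
import numpy as np

def baker(x):
    """Baker's (tent) transformation phi(x) = 1 - |2x - 1|, coordinatewise."""
    return 1.0 - np.abs(2.0 * x - 1.0)

def shifted_folded_mean(f, points, rng):
    """One randomization: shift the point set mod 1, fold by the baker's
    transformation, and average f.  Repeating over independent shifts gives
    an unbiased estimator together with an empirical error estimate."""
    shift = rng.random(points.shape[1])
    return f(baker((points + shift) % 1.0)).mean()

rng = np.random.default_rng(3)
n = 1024
pts = (np.outer(np.arange(n), [1, 433]) / n) % 1.0       # a small rank-1 lattice
f = lambda x: np.prod(1.0 + x, axis=1)                   # true integral = 2.25
est = [shifted_folded_mean(f, pts, rng) for _ in range(10)]
print(np.mean(est), np.std(est, ddof=1))
```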
Balanced Factorial Designs for cDNA Microarray Experiments
Sudhir Gupta
Division of Statistics, Northern Illinois University, USA
Balanced factorial designs are introduced for cDNA microarray experiments. Single replicate
designs obtained using the classical method of confounding are shown to be particularly useful for
deriving suitable balanced designs for cDNA microarrays. Classical factorial designs obtained using
methods other than the method of confounding are also shown to be useful. The paper provides a systematic method of deriving designs for microarray experiments, as opposed to the algorithmic and ad hoc methods given recently in the literature.
The Role of Official Statistics in the Development of the Pearl River
Delta Region
Frederick W H Ho
Census and Statistics Department, HKSAR
The process of globalization, China's accession to the WTO, implementation of the Closer Economic
Partnership Arrangement (CEPA) and ever-developing regional cooperation of territories in the
Pan-Pearl River Delta (Pan-PRD) Region provide a platform for further strengthening the eco-
nomic collaboration and development among Hong Kong, Macao and the Mainland. These new
circumstances bring about both challenges and opportunities to statistics practitioners in measur-
ing and analyzing social and economic phenomena for meeting an expected increase in demand for
statistical information, in terms of both quantity and variety. Also, it entails the compilation of statistical information in a more timely manner in order to capture as well as discern the rapidly changing and evolving economic conditions.
Owing to their close proximity, Guangdong, Hong Kong and Macao have long been maintaining
very close contacts in the arena of statistical matters. For years, they have been making concerted efforts to broaden the scope and raise the level of cooperation. The regional cooperation of the
Pan-PRD stimulates freer flow of people, goods and capital, thus injecting new impetus into the
closer partnership in statistical work of the Guangdong − Hong Kong − Macao Region. Building
upon the existing foundation of cooperation, statistical authorities of the Region can foster a more
synergic development by leveraging their unique strengths, complementing each other's statistical endeavours while staggering individual focuses.
To consolidate cooperation among the statistical authorities of the Guangdong − Hong Kong
− Macao Region, it is imperative that communication and coordination should be deepened and
broadened. Through strengthening the sharing of experiences gained and lessons learned, all can
certainly enhance the knowledge and understanding of such aspects as statistical systems, statistical
techniques and statistical methods adopted by each other, which will in turn help further improve
the availability and comparability of statistical data and raise the quality of statistical services of
respective authorities.
Affine α-resolvability of Group Divisible Designs
Sanpei Kageyama
Department of Mathematics Education, Hiroshima University, Japan
The concept of affine α-resolvability has been discussed for block designs in the literature since 1942 for α = 1 and, in particular, since 1963 for α ≥ 2. Among group divisible (GD) designs, affine α-resolvable designs are known for both the singular GD and semi-regular GD classes. However, no example of an affine α-resolvable regular GD design has been found in the literature. In this talk, the existence of such designs will be disproved in general: a regular GD design cannot possess the property of affine α-resolvability.
Large p Small n Asymptotics for Significance Analysis in High
Throughput Screening
Michael R. Kosorok
Department of Statistics, University of Wisconsin-Madison, USA
We develop large p small n asymptotics suitable for significance analysis after normalization
in microarray gene expression studies and other high throughput screening settings. We consider
one sample and two sample comparisons where the number n of replications (arrays) per group is
extremely small relative to the number p of items (genes). Provided the squared logarithm of p divided by n, (log p)²/n, goes to zero, we show under very general dependency structures that p-values based on a variety of marginal test statistics are uniformly valid in a manner which allows accurate control
of the false discovery rate. We demonstrate the results with simulation studies and several real
microarray studies. This is joint work with Shuangge Ma.
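For concreteness, the kind of false discovery rate control that such uniformly valid marginal p-values license is the Benjamini-Hochberg step-up procedure; a standard sketch (not code from the talk):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: reject the k smallest p-values,
    where k is the largest index with p_(k) <= q * k / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = (np.nonzero(below)[0].max() + 1) if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

rng = np.random.default_rng(9)
pvals = np.concatenate([rng.uniform(size=9_000),          # null genes
                        rng.uniform(size=1_000) ** 8])    # non-nulls: small p-values
print(benjamini_hochberg(pvals, q=0.05).sum(), "rejections among", len(pvals))
```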
Search for Relevant Sets of Variables in a High-Dimensional Setup
Jürgen Läuter
Otto von Guericke University Magdeburg and Interdisciplinary Centre of Bioinformatics,
University of Leipzig, Germany
Kai-Tai Fang has rendered great service to multivariate analysis and, in particular, to the theory of spherical distributions. The strategy of spherical distributions is a generalization of the classical multivariate approach. However, this spherical concept is also an effective tool for solving difficult problems of classical inference starting from the normality assumption.
In 1996 we developed the principle of spherical tests, utilizing the fact that linear combinations of the given normally distributed variables are left-spherically distributed under the null hypothesis, provided that the coefficients are defined as a function of the total sums-of-products matrix of the sample. This principle allows the testing of hypotheses even if the dimension p is huge and the sample size n is small. Such applications have arisen in recent years, for example, in gene expression analysis.
In the framework of spherical tests, multiple procedures for the recognition of relevant variables
and relevant sets of variables can also be constructed. Kropf (2000), Hommel (2004) and Westfall,
Kropf, Finos (2004) have proposed procedures to find significant single variables. The lecture to
be presented will contain some extensions to sets of variables. The multivariate spherical tests
are combined with the search for biologically interpretable structures in the data. Thus, groups
of highly correlated variables allowing the rejection of the null hypothesis are detected. In the
procedures, the emphasis lies on managing the large number of possible subsets of variables. Besides
this parametric procedure, a corresponding non-parametric strategy based on the principles of Westfall and Young (1993) is presented.
Key words and phrases. Multiple procedure, model choice, spherical test, Gene expression analysis.
Reduced Kernel on Support Vector Machines
Yuh-Jye Lee
Department of Computer Science and Information Engineering, National Taiwan
University of Science and Technology, Taiwan
The reduced support vector machine (RSVM) was proposed with the practical objective of overcoming computational difficulties as well as reducing model complexity when generating a nonlinear separating surface for a massive data set. It has also been successfully applied to other kernel-based learning algorithms, and experimental studies have shown its efficiency. In this talk, we first present a study of the RSVM from the viewpoint of robust design in model building, and consider the nonlinear separating surface as a mixture of kernels.
The RSVM uses a compressed model representation instead of a saturated full model. Our main
result shows that the uniform random selection of a reduced set to form the compressed model
in RSVM is the optimal robust selection scheme in terms of the following criteria: (1) it mini-
mizes an intrinsic model variation measure; (2) it minimizes the maximal model bias between the
compressed model and the full model; (3) it maximizes the minimal test power in distinguishing
the compressed model from the full model. In the second part of the talk, we propose a new
algorithm, Incremental Reduced Support Vector Machine (IRSVM). In contrast to the uniform
random selection of a reduced set used in RSVM, IRSVM begins with an extremely small re-
duced set and incrementally expands the reduced set according to an information criterion. This
information-criterion based incremental selection can be achieved by solving a series of small least
squares problems. In our approach, the size of the reduced set is determined automatically and dynamically rather than pre-specified. Experimental tests on four publicly available datasets from the University of California (UC) Irvine repository show that IRSVM uses a smaller reduced set than RSVM without sacrificing classification accuracy.
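A minimal sketch of the reduced-kernel idea itself (Gaussian kernel, uniform random reduced set; illustrative code, not the authors' implementation):

```python
import numpy as np

def reduced_kernel(X, m, gamma, rng):
    """Rectangular kernel matrix K(X, R) for a uniformly sampled reduced
    set R of size m: n x m columns instead of the full n x n matrix, so
    the separating surface is a mixture of only m kernel functions."""
    idx = rng.choice(len(X), size=m, replace=False)
    R = X[idx]
    sq = ((X[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq), idx

rng = np.random.default_rng(7)
X = rng.standard_normal((500, 10))
K, idx = reduced_kernel(X, m=50, gamma=0.1, rng=rng)
print(K.shape)     # (500, 50): the compressed model representation
```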
Data Mining in Chemistry
Yizeng Liang
Research Center of Modernization of Traditional Chinese Medicines, Central South
University, China
yizeng [email protected]
Huge amounts of chemical and biological data are being accumulated nowadays. How to mine the rules and knowledge hidden in them is an important task for chemists, especially for chemometricians. Here we report some results obtained in our research group in recent years, which show that data mining in chemistry has a prosperous future. Seeking quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR) has long been a dream in chemistry. In this work, the projection pursuit technique proposed in statistics was utilized for data mining from data on retention indices and structural descriptors. In order to find valuable information about the relationship, an algorithm based on projection pursuit was developed to search for a reasonable projection direction, reducing the dimension of the high-dimensional data and revealing its structure. Samples of alkanes, alkenes and cycloalkanes have been studied, and it was found that with a good projection direction, compounds can be separated into different classes based on chemical structure, such as the number of carbon atoms in the molecule, the number of branches, the number and position of double bonds, the presence of conjugated double bonds, and the number of rings. In order to build accurate regression models, the classification information obtained by projection pursuit was utilized to establish different models for different classes. With the help of the class distance, an excellent regression model was then obtained. Its estimation and prediction errors are all very small, within the measurement error level, which gives quite useful insight into why and how chemical structure influences the retention behavior of different molecules.
References
[1] Du YP, Liang YZ, Yun D, Data mining for seeking an accurate quantitative relationship between molecular structure and GC retention indices of alkenes by projection pursuit, Journal of Chemical Information and Computer Sciences 42 (6) (2002) 1283-1292.
[2] Du YP, Liang YZ, Wang WM, Studies on QSPR between GC retention indices and topological indices of cycloalkanes by using projection pursuit method, Chemical Journal of Chinese Universities (Chinese) 24 (10) (2003) 1795-1797.
[3] Du YP, Liang YZ, Data mining for seeking accurate quantitative relationship between molecular structure and GC retention indices of alkanes by projection pursuit, Computational Biology and Chemistry 27 (3) (2003) 339-353.
[4] Hu QN, Liang YZ, Yin H, et al., Structural interpretation of the topological index. 2. The molecular connectivity index, the Kappa index, and the atom-type E-State index, Journal of Chemical Information and Computer Sciences 44 (4) (2004) 1193-1201.
[5] Hu QN, Liang YZ, Peng XL, et al., Structural interpretation of a topological index. 1. External factor variable connectivity index (EFVCI), Journal of Chemical Information and Computer Sciences 44 (2) (2004) 437-446.
Self-weighted LAD Estimation for Infinite Variance Autoregressive Models
Shiqing Ling
Department of Mathematics, Hong Kong University of Science and Technology, Hong
Kong
How to undertake statistical inference for infinite variance autoregressive models has been a long-
standing open problem. In order to solve this problem, we propose a self-weighted LAD estimator and
show that this estimator is asymptotically normal if the density of errors and its derivative are uniformly
bounded. Furthermore, a Wald test statistic is developed for the linear restriction on the parameters, and
it is shown to have non-trivial local power.
Simulation experiments are carried out to assess the performance of the theory and method in finite
samples and a real data example is given. The results in this paper are entirely different from results in
the literature and should provide new insights for future research on heavy-tailed time series.
Toward the Mapping of Complex Human Disorders: Statistical Methods and Applications
Shaw-Hwa Lo
Department of Statistics, Columbia University, USA
The mapping of complex traits is one of the central areas of human genetics. Many common human disorders are believed to be "complex" or multifactorial, meaning that they cannot be attributed to alleles of a single gene or to one risk factor; many genes and environmental factors contribute modest effects to a combined action in determining these traits. Despite a number of novel methods proposed during the past 20 years to detect the responsible genes, success has been largely restricted to simple Mendelian diseases. For common/complex human disorders, progress has been slow and results are limited. This is perhaps due in part to the need for statistical methods capable of accommodating large amounts of correlated genotypic and phenotypic data. Most current methods make use of marginal information only and fail to include information on the interactions among disease loci, making it less likely that they have adequate power to detect the responsible loci. Since interactive information among markers reflects the joint information on the traits due to multiple genes (and perhaps other risk factors), we believe that mapping methodologies able to simultaneously inspect disjoint marker loci (possibly on different chromosomes) are crucial for the success of future gene mapping. I shall introduce an alternative approach to address these difficulties. I will first review the methods using family-trio data and several disease models, in particular the backward haplotype transmission association (BHTA) algorithm proposed in Lo and Zheng (2002, 2004). Applications and findings from recent projects applying this approach to large-scale datasets will be presented. Methods applicable to other types of data and designs will be discussed. Time permitting, the issues of multiple comparisons and statistical significance for a large number of tests will be addressed.
Return Distribution and Risk Estimation: Some Empirical Evidence
Dietmar Maringer
Faculty of Economics, Politics, and Social Sciences, University of Erfurt, Germany
Value at risk (VaR) has become a standard measure of portfolio risk over the last decade. It even became one of the cornerstones of the Basel II accord on banks' equity requirements. Nevertheless,
the practical application of the VaR concept suffers from two problems: how to estimate VaR and how
to optimize a portfolio for a given level of VaR? The optimization problem can be tackled using recent
advances in heuristic optimization algorithms. For the estimation problem, several approaches have been
suggested including the use of parametric and empirical distributions; the former are computationally
less demanding whereas the latter are often seen as more reliable. However, our application to bond portfolios shows that a solution to the two aforementioned problems gives rise to a third one: the actual VaR of bond portfolios optimized under a VaR constraint might exceed its nominal level to a large extent.
Thus, optimizing bond portfolios under a VaR constraint might increase risk. This finding is of relevance
not only for investors, but even more so for bank regulation authorities.
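A minimal sketch of the estimation side (empirical versus parametric normal VaR on a heavy-tailed sample; toy code under my own assumptions, not the paper's bond-portfolio setup):

```python
import numpy as np
from scipy.stats import norm

def empirical_var(returns, alpha=0.01):
    """Historical VaR at level alpha: the alpha-quantile of the return
    distribution, reported as a positive loss."""
    return -np.quantile(returns, alpha)

def normal_var(returns, alpha=0.01):
    """Parametric (normal) VaR fitted to the same sample."""
    return -(returns.mean() + norm.ppf(alpha) * returns.std(ddof=1))

rng = np.random.default_rng(11)
rets = 0.01 * rng.standard_t(df=4, size=100_000)    # heavy-tailed daily returns
print(empirical_var(rets), normal_var(rets))        # empirical 99% VaR exceeds normal
```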
Role of Official Statistics Under the Regional Cooperation Framework
Iun Lei Mok
Statistics and Census Service, Macao SAR
The regional economy, a widely discussed issue that has been attracting much attention in recent years, is a leading trend in economic development worldwide. Around the globe, various forms of regional cooperation are meant to sharpen overall regional competitiveness. The basis of regional cooperation
lies in the synergy of complementing one another to pursue economic growth. Formations of the Greater
Pearl River Delta region, the Yangtze Delta region and the Pan-Pearl River Delta region, epitomes of the need to coordinate development between different parts of China, represent a conducive move toward the promotion of regional cooperation.
To spur sustainable economic growth in Hong Kong and Macao, the Central Government has established the Closer Economic Partnership Arrangement (CEPA) with its two Special Administrative Regions. Furthermore, the signing of the "Pan-Pearl River Delta (9+2) Regional Cooperation Framework Agreement" will accelerate economic cooperation within the region.
Economic integration of the Pearl River Delta region has intensified closer economic ties between
Guangdong, Hong Kong and Macao. In light of these new circumstances, official statistics compilers have to fine-tune the scope of data collection and other statistical endeavours, so as to provide timely, relevant and accurate information that serves as reference for data users. This presentation will focus on the
challenges and opportunities faced by the statistical offices and the directions on future co-operation.
Linear Regression in Case-cohort Studies: Theory and Numerical Aspects
Bin Nan
Department of Biostatistics, University of Michigan, USA
Menggang Yu and John D. Kalbfleisch
Right censored data from a classical case-cohort design and a stratified case-cohort design are consid-
ered. In the classical case-cohort design, the subcohort is obtained as a simple random sample of the entire
cohort, whereas in the stratified design, the subcohort is selected by independent Bernoulli sampling with
arbitrary selection probabilities. For each design and under a linear regression model, methods for esti-
mating the regression parameters are proposed and analyzed. These methods are derived by modifying the
linear rank tests and estimating equations that arise from full-cohort data, using methods similar to the "pseudo-likelihood" estimating equation that has been used in relative risk regression for these models. The estimates so obtained are shown to be consistent and asymptotically normal. When generalized
Gehan-type weights are used, the estimating functions are shown to be monotone and a Newton-type iter-
ated method is proposed to solve the estimating equations. Variance estimation and numerical illustrations
are also provided.
Joint Modelling of Mean-Covariance Structures in Longitudinal Studies
Jianxin Pan
School of Mathematics, University of Manchester, United Kingdom
In the literature of longitudinal data analysis (LDA), modelling of mean structure was considered under
a specification of within-subject covariance structure. For example, compound symmetry and AR(1) are
common choices among others. Alternatively, it may be selected from a class of candidates according
to certain information criteria like AIC or BIC. If unfortunately the true covariance structure is not
contained in the class, the selected structure may be not close to the truth. Accordingly, misspecification
of covariance structure may severely compromise statistical inferences. Some approaches were proposed to
model the covariance structures, e.g., Chiu et al (1996), but they may suffer from either no clear statistical
interpretation or no guarantee of positive definiteness for resulted covariance matrices.
In this talk I will give a brief review of recent developments that do not have the above problems. These include a) using an iteratively re-weighted least squares algorithm to update the parameter estimates in the new models, b) selecting a reasonable model when polynomials of time are used to model the mean-covariance structures (Pan and MacKenzie, 2003), c) modelling heterogeneity arising often in LDA, d) modelling conditional covariance structures in linear mixed models, and e) modelling the covariance structure in generalized estimating equations.
These approaches will be illustrated through analyses of real data and simulation studies. It is concluded that covariance structures should be modelled simultaneously with the means.
References
[1] Chiu, T. Y. M., Leonard, T. & Tsui, K. W. (1996). The matrix-logarithm covariance model.
Journal of the American Statistical Association, 91, 198-210.
[2] Pan, J. X. & MacKenzie, G. (2003). On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239-244.
Key words and phrases. Covariance Modelling, longitudinal data.
Comparison of Discrimination Methods for High Dimensional Data
M.S. Srivastava
University of Toronto, Canada
T. Kubokawa
University of Tokyo, Japan
Dudoit, Fridlyand and Speed (2002) compared several discrimination methods for the classification of tumors using gene expression data. The comparison included Fisher's (1936) linear discriminant analysis (FLDA), the classification and regression tree (CART) method of Breiman et al. (1984), and aggregating classifiers (Breiman, 1996), which include "bagging" methods (Friedman, 1998) and the "boosting" method of Freund and Schapire (1997). The comparison also included two more methods, called the DQDA and DLDA methods respectively. In the DLDA method, it is assumed that the population covariance matrices are not only diagonal but also all equal. However, among all the preceding methods considered by Dudoit, Fridlyand and Speed (2002), only DLDA did well. While it is not possible to give reasons why the other methods did not perform well, the poor performance of the FLDA method may be due to the large dimension p of the data, even when the degrees of freedom associated with the sample covariance satisfy n > p. In large dimensions, the sample covariance may become nearly singular, with very small eigenvalues. For this reason, it may be reasonable to consider a version of the principal component method which is applicable even when p > n. Using the Moore-Penrose inverse, a general method based on a minimum distance rule is proposed. Another method, which uses an empirical Bayes estimate of the inverse of the covariance matrix, along with a variation of this method, is also proposed. We compare these three new methods with the DLDA method of Dudoit, Fridlyand and Speed (2002).
Key words and phrases. Classification, discrimination analysis, minimum distance, Moore-Penrose inverse.
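A minimal sketch of the Moore-Penrose minimum-distance rule (illustrative only; the empirical Bayes variants are not shown):

```python
import numpy as np

def min_distance_rule(x, class_means, pooled_cov):
    """Assign x to the class whose mean is closest in the Mahalanobis-type
    distance built from the Moore-Penrose inverse of the pooled covariance,
    which is usable even when p > n makes the covariance singular."""
    S_pinv = np.linalg.pinv(pooled_cov)
    d = [float((x - m) @ S_pinv @ (x - m)) for m in class_means]
    return int(np.argmin(d))

rng = np.random.default_rng(13)
p, n = 100, 20                                      # "large p, small n": p > n
X0 = rng.standard_normal((n, p))                    # class 0
X1 = rng.standard_normal((n, p)) + 0.5              # class 1, shifted mean
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
pooled = (np.cov(X0.T) * (n - 1) + np.cov(X1.T) * (n - 1)) / (2 * n - 2)
x_new = rng.standard_normal(p) + 0.5
print(min_distance_rule(x_new, [m0, m1], pooled))   # expect class 1
```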
Optimal and Efficient Crossover Designs when Subject Effects are Random
John Stufken
Department of Statistics, University of Georgia, USA
Crossover designs are often evaluated under the assumption that subject effects are fixed. One justifica-
tion for this is that most information about treatment comparisons is based on within subject information.
But how efficient are designs that are optimal for fixed subject effects when the subject effects are really
random? Which designs are optimal when subject effects are really random, and how does this change with
the size of the subject effects variance relative to the size of the random error variance? We investigate
these questions in the presence of carry-over effects for the situation where the number of periods is at most equal to the number of treatments.
A Semiparametric Partly Linear Model for Censored Survival Data
Gang Li
Department of Biostatistics, University of California, Los Angeles, USA
Qihua Wang
Academy of Mathematics and Systems Science, Chinese Academy of Science, China
This article studies a semiparametric partly linear model for regression analysis of right censored data.
The model postulates that the mean regression function is the sum of a linear part and a completely
unspecified nonlinear component and that the error distribution is unknown. An iterative estimation
procedure is proposed to estimate the regression coefficients of the linear part and the nonlinear function.
The partly linear model allows one to study the effects of certain covariates that are of primary interest,
while imposing minimal assumptions on other independent variables. The nonlinear component can also
be used to check the linearity assumption of a covariate and further suggest more parsimonious parametric
models. Some numerical studies are conducted to evaluate the performance of the proposed estimators.
We also illustrate our methods using two real data sets.
Polynomial-Time Algorithms for Multivariate Linear Problems with Small Effective Dimension: Average Case Setting
G. W. Wasilkowski
Department of Computer Science, University of Kentucky, USA
There is a host of practical problems that deal with functions of very many variables. As observed in a number of papers, some of those problems (e.g., in mathematical finance or physics) deal with functions of so-called small effective dimension, i.e., functions which essentially depend only on groups of few variables. For some applications, the effective dimension q* is fairly small, e.g., q* = 1 or 2. In the average case setting, such problems can be modeled by stochastic processes that are special weighted tensor products of Gaussian processes with finite-order weights. In this talk we present recent results on the tractability of such problems. More specifically, assuming that the univariate problem admits algorithms reducing the initial error by a factor of ε at cost proportional to ε^{-p}, we provide a construction of algorithms A_{d,ε} for the general d-variate problem reducing the initial error by a factor of ε at cost essentially bounded by ε^{-p} d^q for q independent of d and ε. The exponent q depends on q* and often q = q*, i.e., it is small. For some problems q = 0, i.e., such problems are essentially no more difficult than the corresponding scalar problem.
The Transformation of the Measures of Underemployment: When a Developing Country Transforms into a Developed One
Duan Wei
Tamkang University, Taiwan
Hsien-Tang Tsai
National Sun Yat-Sen University, Taiwan
Chung-Han Lu
Cheng Shiu University, Taiwan
Meng-Hsun Shih
National Sun Yat-Sen University, Taiwan
Most developing countries lack unemployment relief programs, and many unemployed workers have no choice but to engage in marginal economic activities to survive. On the other hand, employed persons in developed countries are experiencing a lack of adequate employment opportunities, with persons who have jobs often being compelled to use their skills less fully, to earn lower hourly wages or to work fewer hours than they are willing and able to. The traditional unemployment figure does not account for these workers. Statistics on underemployment, therefore, should be used to complement statistics on both employment and unemployment. During its transition from a developing country to a developed one, Taiwan has enacted and implemented measures of underemployment since 1977. But fundamental changes in Taiwan's labor market and employment law, and the expansion of higher education, have taken place over the past decades. Therefore, overall refinements of the measures of underemployment are urgently needed, because the current measurement methodology has not been modified since 1993 and hardly reflects the reality of today's labor force. The other aim of this study is to redesign the data collection, processing procedures and methods of reporting on the issue of underemployment in accordance with the principles set by the International Labour Organization (ILO), to enhance international comparability.
This study first decomposed the underemployment issue into three dimensions, inadequate working hours, low income and occupational mismatch, by reviewing the literature on this topic and by analyzing, with the Analytic Hierarchy Process (AHP) method, questionnaire responses from senior academics and government officers who have deep insight into underemployment problems. The results recommend that the measurement of low-hourly-income underemployment should take precedence over other forms of underemployment. This study also applied the proposed new measures retrospectively to historical labor force survey data and found that more workers were affected by inadequate employment situations than by time-related underemployment.
Key words and phrases. underemployment, low hourly income, low working hours, occupational mismatch,
AHP.
Confidence Band and Hypothesis Testing of Leaf Area Index Trend
Lijian Yang
Department of Statistics and Probability, Michigan State University, USA
Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, based on piecewise constant and piecewise linear polynomial spline estimation, respectively. Compared to the pointwise nonparametric confidence interval of Huang (2003), the confidence bands are inflated only by a factor of the square root of log(n). Simulation experiments provide strong evidence corroborating the asymptotic theory. The method has been applied to testing the trigonometric trend of the Leaf Area Index, prescribed by the popular RAMS model, on data collected from several land types in East Africa. The spline confidence band provides strong evidence against the RAMS trend model.
Strong Tractability of Quasi-Monte Carlo Quadrature Using Nets for Certain Banach Spaces¹
Rong-Xian Yue
E-Institute of Shanghai Universities and Shanghai Normal University, China
Fred J. Hickernell
Department of Mathematics, Hong Kong Baptist University, Hong Kong
We consider the problem of approximating the weighted multivariate integral

I_ρ(f) = ∫_D f(x) ρ(x) dx,

where D is a bounded or unbounded subset of the Euclidean space R^s, the dimension s can be large, and the
weight function ρ(x) is nonnegative. Two quasi-Monte Carlo rules are considered. One uses deterministic
Niederreiter (T, s)-sequences, and the other uses randomly scrambled Niederreiter digital (T, m, s)-nets. For
the deterministic Niederreiter sequence rules we assume that the integrands f lie in a weighted Banach space,
F^(1)_{p,q,γ,s}, of functions whose mixed anchored first derivatives, ∂^|u| f(x_u, c_u)/∂x_u, are bounded in L_p norms
with the anchor c fixed in the domain D, and the weights, γ = (γ_k), are introduced via ℓ_q
norms over the index u, where p, q ∈ [1, ∞]. For the randomly scrambled Niederreiter net rules, the class
of integrands is a weighted Banach space, F^(2)_{p,q,γ,s}, of functions whose unanchored mixed first derivatives,
∂^|u| f(x)/∂x_u, are bounded in L_p norms and the weights, γ = (γ_k), are introduced via ℓ_q
norms, where p, q ∈ [1, ∞]. The worst-case error and randomized error are investigated for quasi-Monte
Carlo quadrature rules: in the worst-case setting the quadrature rule uses deterministic Niederreiter
sequences, and in the randomized setting it uses randomly scrambled Niederreiter digital nets. Sufficient
conditions are found under which multivariate integration is strongly tractable in the worst-case and
randomized settings, respectively. The results presented in this article extend and improve upon those
found previously.
Key words and phrases. Multivariate integration, quasi-Monte Carlo quadrature rules, tractability.
¹This work was partially supported by Hong Kong Research Grants Council grant RGC/HKBU/2020/02P, and
by NSFC grant 10271078, E-Institutes of Shanghai Municipal Education Commission (Project Number E03004) and
the Special Funds for Major Specialties of Shanghai Education Committee.
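A runnable sketch of randomized quasi-Monte Carlo quadrature in the spirit of the randomized setting, using scipy's scrambled Sobol' points as a stand-in for the scrambled Niederreiter nets of the paper (an illustrative substitution; with D = [0,1]^s and ρ ≡ 1 the exact integral below is 1):

import numpy as np
from scipy.stats import qmc

s = 4                                                # dimension
f = lambda x: np.prod(1 + 0.5 * (x - 0.5), axis=1)   # smooth test integrand

sampler = qmc.Sobol(d=s, scramble=True, seed=0)
points = sampler.random_base2(m=12)            # 2^12 scrambled net points
estimate = f(points).mean()                    # equal-weight quadrature rule
print("QMC estimate:", estimate, "error:", abs(estimate - 1.0))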
Statistical Methods to Integrate Different Data Sources in Genomics Studies
Hongyu Zhao
Department of Epidemiology and Public Health, Yale University, USA
Recent advances in large-scale RNA expression measurements, DNA-protein interactions, protein-
protein interactions and the availability of genome sequences from many organisms have opened the op-
portunity for massively parallel biological data acquisition and integrated understanding of the genetic
networks underlying complex biological phenotypes. Many established statistical procedures have been
proposed to analyze a single data type, e.g. clustering algorithms for microarray data and motif find-
ing methods for sequence data. However, different data sources offer different perspectives on the same
underlying system, and they can be combined to increase our chance of uncovering underlying biological
mechanisms. In this talk, we will describe various statistical methods that have been developed (as well as
those that still need to be developed) to integrate diverse genomics and proteomics information to dissect
transcriptional regulatory networks and infer protein-protein interaction networks. Some of these methods
will be illustrated through their applications to understanding transcription regulation and protein-protein
interaction in yeast.
A Nonparametric Multipoint Screening Method for QTL Mapping
Tian Zheng, Hui Wang and Shaw-Hwa Lo
Department of Statistics, Columbia University, USA
It is believed that most human disorders are polygenic, meaning that the variation in the quantitative
traits of such disorders cannot be attributed to a single gene. Rather, multiple genes, with complicated
interactions, may contribute to the spectrum of variation of such traits. To study the genetics of such
traits, one should inspect multiple loci simultaneously. In this talk, we present an efficient and robust
statistical screening algorithm for the mapping of quantitative trait loci (QTL). The algorithm is based on
a measure of association between the trait and the genotypes at the multiple marker loci under investigation.
Through the use of multi-locus genotypes, one can take into consideration both the marginal and joint
association information with respect to the trait. The algorithm evaluates the genes in an iterative fashion
and screens out those marker loci that do not contain much information with respect to the trait. We will
show the advantages of this method through theoretical justification and simulation studies.
Key words and phrases. QTL mapping, nonparametric algorithm, rank-based, association mapping.
Reform and Improvement of China’s National Account System
Xiangdong Zhu
National Bureau of Statistics of China, China
I. Brief Review of Economic Accounting System in China
• Early 1950s : Set up economic accounting system and a series of balance sheets under the Material
Production System (MPS), with total product of society and national income as core indicators.
• 1985 : Compiled Gross Domestic Product (GDP).
• 1992 : Formulated the National Economic Accounting System of China (Pilot Programme), cap-
turing the international standards as stipulated in the 1968 version of System of National Accounts
(SNA) of the United Nations and some MPS modules.
• 1993 : Removed national income aggregates from the accounting system.
• 1999 : Started the revision of the 1992 Pilot Programme, bringing about the prevailing National
Economic Accounting System of China 2002.
II. Current Status of Economic Accounting System in China
• National Economic Accounting System of China 2002 consists of basic accounting tables, economic
accounts and satellite tables.
• Schedules
– Quarterly : GDP by production approach at both current and constant prices
– Annual : GDP by production and expenditure approaches at both current and constant
prices, flow-of-fund tables, balance of payments tables, assets and liability tables and eco-
nomic accounts
– Every 5 years : Input-output tables
III. Reform and Improvement of China’s National Accounts System
• To put in place a sample survey system for the service sector and to improve the mechanism for
data reporting/sharing with other ministries.
• To revamp historical GDP time series based on economic census results.
• To improve quarterly statistical systems for better compilation of production and use accounts of
quarterly GDP.
• To compile producer price indices for selected service sectors and price indices of trade in services
to cater for the estimation of constant price GDP.
• To enhance data collection methods to facilitate further breakdowns of source data for national
accounts statistics.
• To standardize the GDP compilation methodology at regional level for better management of
key series and to establish a joint assessment mechanism for GDP figures to improve consistency
between national and regional data.
• To implement studies and pilot projects on resource and environment accounting in collaboration
with viable agencies.
• To study non-observed economic activities in China for future integration of such activities into
the GDP estimation.
Contributed Talks
Generalized Liu Type Estimators Under the Balanced Loss Function
Fikri Akdeniz
Department of Management, University of Cukurova, Turkey
Alan T. K. Wan
Department of Statistics, City University of Hong Kong, Hong Kong
Esra Akdeniz
Department of Statistics, Pennsylvania State University, USA
In regression analysis, the ridge regression estimator and the Liu-type estimator (an alternative
biased estimator with 0 < d < 1) [Liu, Kejian, Commun. Statist. Theory Methods 22 (1993): 393-402]
are often used to overcome the problem of multicollinearity. These estimators have been evaluated
using the risk under a quadratic loss criterion, which places sole emphasis on the precision of estimation:
the traditional mean square error (MSE), as a measure of the efficiency of an estimator, only takes
the error of estimation into account. In 1994, Zellner [see Statistical Decision Theory and Related
Topics: 377-390, S.S. Gupta and J.O. Berger (Eds.)] proposed a balanced loss function. Here,
we consider the balanced loss function, which incorporates a measure of the goodness of fit of the
model as well as of the precision of estimation.
Key words and phrases. Balanced loss, collinearity, Liu type estimator, ridge regression, risk.
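In one common form (Zellner, 1994), for the linear model y = Xβ + ε and a weight w ∈ [0, 1] trading goodness of fit against precision of estimation, the balanced loss of an estimator β̂ is

L_w(β̂, β) = w (y − Xβ̂)'(y − Xβ̂) + (1 − w) (β̂ − β)'X'X(β̂ − β),

so that w = 1 gives a pure goodness-of-fit criterion and w = 0 the usual quadratic (precision) risk; this is a sketch of the general idea rather than the exact form used in the talk.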
Estimation of a Multivariate Normal Mean Under Balanced Loss Function
Akbar Asgharzadeh
Department of Statistics, University of Mazandaran, Iran
[email protected] or [email protected]
This paper considers the Bayesian analysis of the multivariate normal distribution using a loss
function that reflects both goodness of fit and precision of estimation. The Bayes estimators of the
mean vector are obtained and the admissibility of cX + d for the mean vector is also studied.
Key words and phrases. Admissibility, balanced loss function, Bayes estimator, inadmissibility, multivari-
ate normal distribution.
25
Contributed Talks
Mixture of Generalized Pareto Distributions
Abdurrazagh M. Baeshu
Department of Statistics, Al-Fatah University, Libya
We introduce a mixture of two-component generalized Pareto distributions (GPDs), together with its
special cases. The parameters of the mixture are estimated using the maximum likelihood method; the
quasi-Newton algorithm for finding the minimum (maximum) of a function of several variables is employed
to maximize the log-likelihood function of the mixture. As an application to modeling failure data, two
kinds of mixture distributions, both special cases of the two-component GPD mixture, are fitted to these
data.
Key words and phrases. Mixture distributions, maximum likelihood, EM algorithm, quasi-Newton algo-
rithm, simulated annealing, goodness of fit, P-P plots, Q-Q plots, confidence envelopes.
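A minimal sketch of direct likelihood maximization for a two-component GPD mixture via a quasi-Newton method (synthetic data and starting values are hypothetical; not the authors' implementation):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import genpareto

rng = np.random.default_rng(0)
# Synthetic "failure" data from a known two-component GPD mixture.
x = np.concatenate([genpareto.rvs(0.2, scale=1.0, size=300, random_state=rng),
                    genpareto.rvs(0.4, scale=3.0, size=200, random_state=rng)])

def nll(theta):
    # theta: (logit weight, shape1, log scale1, shape2, log scale2)
    w = 1.0 / (1.0 + np.exp(-theta[0]))
    c1, s1, c2, s2 = theta[1], np.exp(theta[2]), theta[3], np.exp(theta[4])
    dens = (w * genpareto.pdf(x, c1, scale=s1)
            + (1.0 - w) * genpareto.pdf(x, c2, scale=s2))
    return -np.sum(np.log(dens + 1e-300))    # guard against log(0)

fit = minimize(nll, x0=[0.0, 0.1, 0.0, 0.3, 1.0], method="L-BFGS-B")
print("estimates:", fit.x, "neg. log-likelihood:", fit.fun)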
Modelling of Mean-covariance Structures in Linear Mixed Models for Censored Data
Yanchun Bao and Jianxin Pan
Department of Mathematics, Manchester University, United Kingdom
[email protected], [email protected]
Linear mixed models (LMMs) have been widely used in multivariate survival studies; they
incorporate random effects into the model to account for possible correlation among the data. By
integrating random effects out of the model, Hughes et al. (1999) and Klein et al. (1999) proposed
to use a marginalized likelihood-based procedure to estimate the parameters. In order to avoid
the high-dimensional integrals, Ha et al. (2002) suggested maximizing the hierarchical likelihood
(Lee and Nelder, 1996) to obtain the parameter estimates. Their approach, however, assumes that
the data are conditionally independent given the random effects, which unfortunately may not be
true in practice.
In this paper we propose a new procedure to jointly model the mean and covariance structures
for multivariate survival data within the framework of LMMs. We remove Ha et al.'s (2002)
conditional independence assumption and develop a data-driven approach to model the covariance
structures. The main idea is as follows. First, we propose a new model for the extended pseudo-
response variables in the sense of Buckley and James (1979). Accordingly, modelling of multivariate
survival data reduces to modelling of ordinary longitudinal data. Second, we propose to use the
unconstrained parameterizations of Pourahmadi (1999) and Pan and MacKenzie (2003) to jointly
model the mean-covariance structures in the new model. The parameters are then estimated within
the framework of the h-likelihood (Lee and Nelder, 1996).
For illustration, the proposed approach is used to analyze the well-known Litter Rat data (Mantel
et al., 1977) and is compared to the results of Ha et al. (2002) and Klein et al. (1999). We find that
the proposed approach improves the estimation efficiency significantly. Simulation studies confirm
this finding and further show that the new procedure produces rather accurate estimates of both
the mean and covariance parameters even when the censoring rate is high.
2005 International Comparison Program
Marion Shui-yu Chan
Price Statistics Branch, Census and Statistics Department, Hong Kong
Inter-country comparison of Gross Domestic Product (GDP) based on exchange rate conversion
is subject to considerable limitations as it does not take into account differences in price levels across
countries. The International Comparison Program (ICP) aims at collecting and comparing price
data for a basket of comparable items across countries and producing a set of Purchasing Power
Parities (PPPs) which would enable meaningful volume comparisons of GDP and other expenditure
aggregates among different countries. A PPP is the rate of currency conversion at which a given
amount of currency will purchase the same volume of goods and services in two countries.
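As a toy illustration of this definition (hypothetical prices, and a simple bilateral comparison rather than the multilateral CPRD method discussed below):

# A toy bilateral PPP computation with hypothetical prices; the actual 2005
# ICP uses the multilateral CPRD method described in this paper.
# Basket entries: (item, price in currency A, price in currency B).
basket = [("rice, 1 kg",  8.0, 1.2),
          ("bus fare",    4.5, 1.0),
          ("haircut",    60.0, 9.0)]

cost_a = sum(p_a for _, p_a, _ in basket)
cost_b = sum(p_b for _, _, p_b in basket)

# PPP of currency A relative to currency B: units of A that buy the same
# basket as one unit of B.
ppp_a_per_b = cost_a / cost_b
print("PPP (A per B):", round(ppp_a_per_b, 3))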
The 2005 ICP is a global statistical initiative led by the World Bank and covers around 150
countries/territories worldwide. It is organized on a regional basis, and the regional results will
be linked through a "ring comparison" to obtain the global results. The ring comparison involves
choosing a subset of countries from each region to price a common product list in addition to their
regional lists. Hong Kong is currently participating in both the comparison for the Asia Pacific
region and the ring comparison.
In this paper, the methodological framework of the 2005 ICP is presented. The proposed method
for calculating the PPPs, known as the CPRD (which stands for country, product, representativity
and dummy) method, is explained and illustrated with examples. The CPRD method adopts a
multilateral approach to estimate a set of transitive parities simultaneously for a group of countries
using data for all countries in the group. The pricing survey being carried out in Hong Kong is
also introduced.
Key words and phrases. Purchasing power parity (PPP), ring comparison, CPRD method, representativ-
ity, transitivity.
Statistical Assessment for Dynamic Multimedia Transport and Transformation Model
Chu-Chih Chen
Department of Mathematics, Tamkang University, Taiwan
Kuen-Yuh Wu
Division of Environmental Health and Occupational Medicine, National Health Research
Institutes, Taiwan
A multimedia model is a dynamic model that can be used to assess time-varying concentrations
of contaminants introduced initially to soil layers or released continuously to air or water. A
typical multimedia model consists of the following "compartments": air, plants, ground-surface soil,
root soil, vadose soil, surface water, and sediments (McKone and Enoch, 2002). In this paper, we
apply a state-space model approach to estimate the dynamic multimedia model parameters, such as
the transfer and contamination input rates among different compartments. Missing values of certain
compartments are estimated by their theoretical expectations given the measurements of the other
compartments. The total amount of contaminant within a compartment is obtained by integrating out
the contaminant spatial distribution using the kriging method. The unobserved "true" contaminant
levels are then simulated from their posterior distributions following the Kalman filter procedure. The
model parameters and the true contaminant levels are simulated iteratively using Markov chain Monte
Carlo simulation.
Key words and phrases. Kalman filter, kriging, Markov chain Monte Carlo, missing values, state-space
model.
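As a minimal sketch of the filtering step (a one-dimensional state-space model with hypothetical parameters, far simpler than the multivariate multimedia model above):

import numpy as np

a, q, h, r = 0.9, 0.1, 1.0, 0.5      # transition, state noise, obs. map, obs. noise
rng = np.random.default_rng(1)

# Simulate a latent "true contaminant" series x and noisy measurements y of it.
T = 50
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(q))
    y[t] = h * x[t] + rng.normal(0.0, np.sqrt(r))

m, p = 0.0, 1.0                       # filtered mean and variance
for t in range(1, T):
    m_pred, p_pred = a * m, a * a * p + q           # predict
    gain = p_pred * h / (h * h * p_pred + r)        # Kalman gain
    m = m_pred + gain * (y[t] - h * m_pred)         # update
    p = (1.0 - gain * h) * p_pred
print("last filtered mean:", round(m, 3), "vs truth:", round(x[-1], 3))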
A Kind of Urn Model in Clinical Trials
Guijing Chen
Department of Mathematics, Anhui University, China
Chunhua Zhu
Department of Finance and Statistics, USTC, China
Yao Hung Wang
Department of Statistics, Tunghai University, Taiwan
In this paper we study urn models. Using some available estimates of success probabilities and
adding a particle parameter, we establish adaptive models. We obtain strong convergence theorems,
rates of convergence, asymptotic normality of the components in the urn, and estimates. With these
asymptotic results, we show that the adaptive designs given in this paper are asymptotically optimal
designs.
Key words and phrases. Urn model, sequential design, strong convergence, asymptotic normality, optimal
design.
Mathematics Subject Classification. Primary 60F15; Secondary 62L05, 60G42
Asymptotic Distributions of Error Density Estimators in First-order Autoregressive Models
Fuxia Cheng
Department of Mathematics, Illinois State University, USA
This paper considers the asymptotic distributions of the error density estimators in first-order
autoregressive models. At a fixed point, the distribution of the error density estimator is shown
to be normal. Globally, the asymptotic distribution of the maximum of a suitably normalized
deviation of the density estimator from the expectation of the kernel error density (based on the
true errors) is the same as in the one-sample setup, given in Bickel and Rosenblatt (1973).
Key words and phrases. AR(1) process, residuals, kernel density estimation.
Power Analysis and the Cochran-Mantel-Haenszel Test
Philip E. Cheng, Michelle Liou and John D. Aston
Institute of Statistical Science, Academia Sinica, Taiwan
[email protected], [email protected]
Fisher's exact test for testing independence in a 2x2 contingency table has been criticized as
conservative because of the discrete nature of the null distribution. A calibration study, however,
establishes that the conditional distributions of the Pearson chi-square, the Fisher exact and the
likelihood ratio tests are closely comparable, and are also invariant under the alternative hypotheses
based upon the mutual information identity. Power analysis for the conditional tests is thereby
validated, and the delusive nature of the conservative test is remedied. As an application of the
information identity, tests for homogeneity of odds ratios and for conditional independence across
strata in a series of 2x2 tables are analyzed. The Cochran-Mantel-Haenszel test is examined for
evidence of invalid conclusions. A simple algorithm for computing the maximum likelihood estimate
of the common odds ratio is developed using the relative entropy.
Key words and phrases. Cochran-Mantel-Haenszel test, conditional maximum likelihood estimate, entropy,
Fisher’s exact test, Kullback-Leibler divergence, likelihood ratio test, mutual information, odds ratio, Pear-
son’s Chi-square test, three-way effect.
Deseasonalisation of Official Statistical Series
Thomas C K Cheung
Census and Statistics Department, Hong Kong
As the seasonal component is usually one of the predominant components in a time series, its
presence often complicates the interpretation of the time series since it is difficult to discern whether
changes in data for a given period really reflect the trend-cycle movement or are merely due to
seasonal variations. Seasonal adjustment is a statistical technique commonly used to estimate and
remove the seasonal variations from a time series so as to make the underlying trend-cycle of the
series being analysed more discernible.
In Hong Kong, the Census and Statistics Department has adopted the X-11-ARIMA method
for deseasonalising official statistical series. The X-11-ARIMA method, developed by Statistics
Canada, is an internationally well-recognised method which is widely used for compiling season-
ally adjusted series by official statistical authorities. The presentation will give a brief account
of the basic principles of the X-11-ARIMA method and its application to time series of official
statistics in Hong Kong. Some practical considerations about seasonal adjustment, which are cru-
cial to the proper use of the technique, will also be discussed using the seasonal adjustment of the
unemployment rate as an illustration.
Key words and phrases. Seasonal adjustment, X-11-ARIMA method.
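As a schematic of the basic ratio-to-moving-average idea underlying seasonal adjustment (a deliberately simplified sketch on hypothetical monthly data, not the X-11-ARIMA method itself):

import numpy as np
import pandas as pd

idx = pd.date_range("2000-01", periods=72, freq="M")
rng = np.random.default_rng(2)
seasonal = 1 + 0.2 * np.sin(2 * np.pi * idx.month / 12)
series = pd.Series((100 + 0.5 * np.arange(72)) * seasonal + rng.normal(0, 1, 72),
                   index=idx)

trend = series.rolling(12, center=True).mean()        # centred moving average
ratios = series / trend                               # seasonal-irregular ratios
factors = ratios.groupby(ratios.index.month).mean()   # average factor per month
adjusted = series / series.index.month.map(factors).to_numpy()
print(adjusted.tail())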
Optimal Policy for Hiring of Expertise Service in Manpower Planning
Ramaiyan Elangovan
Department of Statistics, Annamalai University, India
Manpower planning is an interdisciplinary activity. It requires the combined technical skills of statis-
ticians, economists and behavioral scientists, together with the practical knowledge of managers and
planners. Manpower planning techniques have become an essential tool for modern managers,
especially in a climate of economic recession and government cutbacks. In manpower planning the
completed length of service until leaving is of great interest, since it enables us to predict staff
turnover. For a detailed account of this subject, refer to McClean et al. (1991). At the
level of the firm, the availability of appropriate manpower is an important factor that contributes
to the task force available for the completion of planned jobs and projects, including industrial
production. With the growing need for highly specialized and technical manpower to deal with
very complex real-life situations, suitable policies regarding recruitment, training and other aspects
should be evolved taking into consideration all the existing ground realities. This reiterates the
need for utilizing the expertise of specialists in manpower planning. Such service may be on an hourly
contract basis and is also likely to be fairly costly. If the number of man-hours on contract is in
excess of the requirements, there is wastage; on the other hand, if the requirement is larger than the
hired man-hours, there is a shortage loss arising from non-completion of the work.
In this paper an optimal policy for hiring expertise service in manpower planning is discussed,
and the optimal number of man-hours to be hired is determined. Numerical examples are provided
for exponential, truncated normal and Pearsonian type XI distributions.
Key words and phrases. Manpower planning, completed length of service, wastage, optimal policy.
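As a sketch of the underlying trade-off in newsvendor form (an assumed formulation with hypothetical unit costs, not necessarily the paper's exact model):

from math import log

# With wastage cost cw per excess man-hour, shortage cost cs per hour short,
# and requirement X ~ F, expected cost is minimized at h* = F^(-1)(cs / (cs + cw)).
cw, cs = 2.0, 5.0        # hypothetical unit costs
mean_req = 120.0         # mean requirement (exponential case)

q = cs / (cs + cw)                    # critical fractile
h_star = -mean_req * log(1.0 - q)     # exponential quantile F^(-1)(q)
print("optimal man-hours to hire:", round(h_star, 1))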
Optimization of Risk and Dividend for a Firm with the Presence of Liability and Transaction Costs
Rosana Fok
Department of Mathematical and Statistical Science, University of Alberta, Canada
A model of a financial corporation is considered. This corporation controls its business
activities, which include the timing and amount of dividends paid out to the shareholders, in
order to reduce its risk. We consider cases where the risk process has different bounds. The model
also includes a constant liability factor, such as bond liability or loan amortization. The general
objective is to find the optimal policy that maximizes the expected total discounted dividends paid
out until the time of bankruptcy. Due to the presence of a fixed transaction cost, a mixed
classical-impulse stochastic control problem results. The value function, together with the optimal
policy, is found as a solution to this problem and can be used to justify it. Finally, a numerical
analysis is used to support the results obtained for this model.
This talk is based on joint work with Tahir Choulli ([email protected]) and Yau Shu
Wong ([email protected]) (my supervisors).
References
[1] Cadenillas, A., Choulli, T., Taksar, M., Zhang, L. (2005). Classical and impulse control for the
dividend optimization and risk for an insurance firm. To appear in Mathematical Finance.
[2] Choulli, T., Taksar, M., and Zhou, X.Y. A diffusion model for optimal dividend distribution for a
company with constraints on risk control. SIAM Journal on Control and Optimization, forthcoming
(2003).
Key words and phrases. Dividends, risk, liability.
Study on the Techniques of Financial Risk Measurement in the Situation of Small Samples
He Sihui
Department of Actuarial Science, Xi'an University of Finance and Economics, China
Wang Mao
Rejoy Group Ltd. Cor., China
In the study of financial risk management, how to analyze qualitative and quantitative attributes of a
population by means of a small sample is very important in both theory and practice. The small-sample
fitting technique with the Weibull distribution is discussed in this paper, and a risk measurement method
that can be applied in financial market risk management is set up. Finally, real data are used to
demonstrate its characteristics in practical application.
Key words and phrases. Small sample database, financial risk measurement techniques, Weibull distribution,
Kaplan-Meier estimator, extreme value theory.
Comparison between 6 Sigma and 3 Sigma
Xiaoqun He
Department of Statistics, Renmin University of China, China
In the process of popularizing 6σ in enterprises, some students often cannot understand why
6σ is better than 3σ, and even think that the specification limits of 6σ are wider than those of 3σ, so
that 6σ has a higher pass rate. This paper argues that the comparison table shown in references [1]
and [2] leads to these misunderstandings, and gives another comparison and interpretation.
References
[1] Pyzdek, T., 2001. The Six Sigma Handbook: A Complete Guide for Greenbelts, Blackbelts and
Managers at All Levels. McGraw-Hill Companies, Inc.
[2] ,2003.
Key words and phrases. 6σ, 3σ, comparison.
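A short check of the defect rates behind the comparison (a sketch assuming, as in the conventional Six Sigma tables, a possible 1.5σ drift of the process mean):

from math import erf, sqrt

def phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Specification limits at +/- k sigma around the target; the process mean is
# allowed to drift by `shift` sigma.
for k, shift in [(3, 0.0), (3, 1.5), (6, 0.0), (6, 1.5)]:
    inside = phi(k - shift) - phi(-k - shift)
    print(f"k = {k}, shift = {shift}: {(1 - inside) * 1e6:,.2f} defects per million")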
Spatially Weighted Finite Population Sampling Design
Sami Helle
PvTT, Finland
Erkki Liski
Department of Mathematics, Statistics and Philosophy, University of Tampere, Finland
Often in finite population sampling we have some prior information about the subject of study, e.g.,
prior knowledge of the spatial height distribution of the trees in a forest. By using prior information on
the study variable it is possible to spatially adjust the net of sampling design points so that more design
points fall in areas that contribute more to the variance of the estimator and fewer sampling points in
areas that contribute less. By weighting the net of sampling design points using representative points of
the spatial prior distribution, it is possible to increase the efficiency of the estimator. In this paper we
present a method for using the representative points of the prior distribution as the sampling design points
in spatial finite population sampling; the efficiency of the method is compared with random plot sampling
and uniform sampling designs in a simulated forestry sampling case.
Key words and phrases. Finite population sampling, spatial sampling, representative points.
Information Theoretic Models for Dependence Analysis and Missing Data Estimation
D. S. Hooda
Jaypee Institute of Engineering and Technology, India
In the present paper we derive a new information theoretic model for testing and measuring depen-
dence among attributes in a contingency table. A relationship between the information theoretic measure
and the chi-square statistic is established and discussed with numerical problems. A new generalized infor-
mation theoretic measure is defined and studied in detail. A maximum entropy model for the estimation of
missing data in the design of experiments is also explained.
Factors Associated with Haemoglobin Level in 43-month-old Children
S. M. Hosseini, J. H. McColl
Department of Statistics, University of Glasgow, United Kingdom
A. Sherriff
Department of Child Health, University of Bristol, United Kingdom
Aims: To investigate the relation of haemoglobin levels at 43 months with nutrient covariates and
with sex, age, education level, ethnicity, birth weight, current weight, infection, and whether the child was
a twin or a singleton.
Methods: Normal values for haemoglobin were obtained from a representative cohort of children
in the Children in Focus sample at 43 months old, who were randomly selected from children taking part
in the Avon Longitudinal Study of Parents and Children (ALSPAC).
Results: Haemoglobin data were symmetrically, but not normally, distributed. The non-haem iron,
vitamin C, iron, calcium and NSP intakes were skewed and were transformed to approximate normality
using the natural logarithm. Haemoglobin concentration at 43 months was negatively associated with birth
weight, positively associated with energy-adjusted vitamin C intake, and was higher in children from
singleton pregnancies.
Conclusion: The prevalence of anaemia varies strongly with singleton/multiple pregnancy, which sug-
gests that the effects of singleton pregnancy may be most closely associated with higher haemoglobin levels
in children at 43 months. Higher vitamin C intakes are associated with higher haemoglobin levels in children
of this age, and the inclusion of vitamin C in the diet of children is advisable.
Infants born with low weight were at increased risk of developing iron deficiency; advice to mothers
should focus on the importance of introducing nutrient-dense complementary foods during pregnancy.
References
[1] Sherriff A., Emond A., Hawkins N. and Golding J., Haemoglobin and ferritin concentration in
children age 12 and 18 months, Archives of Disease in Childhood, 80 (1999), 0-5.
[2] Cowin I., Emond A., Emmett P. and ALSPAC study team, Association between composition of
the diet and haemoglobin and ferritin level in 18 month old children, European Journal of Clinical
Nutrition, 55 (2001), 278-286.
On Markov Processes in Space-Time Random Environments
Dihe Hu
School of Mathematics and Statistics, Wuhan University, China
Markov chains in time-random environments have been studied for some time. Nawrotzki (1981)
established a general theory of this topic. Cogburn (1980-1991) developed the theory in a wide context
by making use of more powerful tools, such as the theory of the Hopf Markov chain. Orey (1991) reviewed
the work in this field. Hu (2004) introduced the concept of Markov processes (continuous time) in time-
random environments and obtained a construction theorem and several equivalence theorems. Hu (2004),
in another paper, proved the existence and uniqueness of the q-process in a time-random environment.
Berard (2004) introduced the concept of random walks in a space-time random environment and proved
that the CLT holds a.s. In this paper we introduce general Markov chains in space-time random
environments, prove an existence theorem and an equivalence theorem, and give some properties of the
skew product Markov chain generated by the Markov chain in a space-time random environment.
The Sinh-Arcsinh Transformation
M. C. Jones
Department of Statistics, Open University, United Kingdom
Well, perhaps not the sinh(arcsinh) – or identity – function per se (!), but simple one- and two-parameter
variations thereon which, when their inverses are applied to normal random variables, yield a family of
distributions on the real line that:
(a) include symmetric distributions with both heavier and lighter tails than the normal;
(b) essentially incorporate the best of both Johnson's S_U distributions and the sinh-normal distribu-
tion but not the worst of the latter, i.e. they are all unimodal;
(c) are reasonably tractable;
(d) control tailweights separately, so that distributions with one heavier and one lighter tail than
normal are available;
(e) afford skewness;
(f) always have their median at zero;
(g) are, as four-parameter families with the addition of location and scale parameters in the usual
way, readily fitted to data; and
(h) are readily extended to the multivariate case.
I might also say a few general words about the interplay between scale and tailweight in the symmetric
case.
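A sampling sketch in one published parameterization of the family (Jones and Pewsey's; assumed here for illustration, and possibly differing in details such as centering from the exact variant of the talk):

import numpy as np

def sinh_arcsinh_sample(n, epsilon=0.0, delta=1.0, seed=None):
    # X = sinh((arcsinh(Z) + epsilon) / delta) for Z standard normal:
    # epsilon skews; delta < 1 gives heavier, delta > 1 lighter tails.
    z = np.random.default_rng(seed).standard_normal(n)
    return np.sinh((np.arcsinh(z) + epsilon) / delta)

x = sinh_arcsinh_sample(100_000, epsilon=0.5, delta=0.8, seed=0)
print("mean - median (skewness proxy):", x.mean() - np.median(x))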
The Identification of Outliers in Time Series Using the Autocorrelation Function and Partial Autocorrelation Function
Rida M. Khaga and Maryouma Elakder Enaami
Department of Statistics, Alfateh-University, Libya
Outliers have recently been studied more and more in the statistical time series literature. This work
considers the problem of detecting outliers in time series data and proposes general detection methods
based on the autocorrelation and partial autocorrelation functions. We present a simple diagnostic test
for detecting an additive single outlier (AO) and an additive step outlier (AS) using an influence function
based on the influences on the autocorrelation and partial autocorrelation functions. In this paper, the
work of Chernick et al. (1982) is extended to form quantitative outlier detection statistics. The statistics
are formed from the absolute elements of the influence function matrix, where each element gives the
influence of a pair of observations at time lag k on the estimated autocorrelation function r_k or partial
autocorrelation function φ_kk. Their behaviors are different for a single outlier and a step outlier. Hence,
they may be used not only to detect an outlier, but also to distinguish a single outlier from a step outlier.
The proposed methods are compared with other existing outlier detection procedures. Comparisons based
on various models, sample sizes, parameter values, and real and simulated data are used to illustrate the
effectiveness of the proposed methods in identifying outlier types, in particular the additive single outlier
(AO) and the additive step outlier (AS), and in distinguishing between them. The methods are evaluated
using both time series simulated from autoregressive and moving average models and a set of observed
time series.
Key words and phrases. Outliers, influence function, acf, pacf.
Profit Analysis of a Two-unit Hot Standby Programmable Logic Controller (PLC)
S. M. Rizwan
Department of Mathematics and Science, Sultanate of Oman, Oman
Vipin Khurana
Department of Mathematics, JSS Academy of Technical Education, India
Gulshan Taneja
Department of Statistics, M D University, India
A two-unit hot standby PLC system is described in which different types of failures and repairs are
noted. It is observed that the failure of a unit can be due to various reasons. The concept of inspection
for detecting the failure of a unit is also introduced; the repairman comes immediately on failure detection.
The system is analyzed, and expressions for various reliability measures, such as the mean time to system
failure (MTSF), steady-state availability, expected number of repairs, expected number of replacements
(Type I and Type II), expected number of reinstallations and inspections carried out, and busy period of
the repairman (repair time, Type I replacement time, Type II replacement time, reinstallation time), are
obtained using semi-Markov and regenerative processes. The profit incurred by the system is evaluated.
Official Tourism Statistics of Macao
Pek Fong Kong
Statistics and Census Service, Macao SAR
The tourism sector has always been an important economic growth engine of Macao over the years; in
light of its rapid development, particularly after the Handover, tourism statistics have become one of the
more popular subjects sought by data users.
In Macao, official tourism statistics comprise a number of indicators, viz. visitor arrivals, package tours
and hotel occupancy rates, visitor expenditure and the tourist price index. Compilation of Macao's official
tourism statistics demonstrates close collaboration among different government departments and the
business sector, for instance, the Immigration Department, the Macao Tourism Office, Macao International
Airport, hotels and travel agencies.
This presentation will briefly outline some of the special features of the indicators encompassed in
Macao's tourism statistics and, more importantly, the challenges that lie ahead.
Optimization of Surface Finish Parameters on the Ground Face of a Bevel Gear Using Parameter Design: A DoE Way of Improvement in the Process
C. S. Pathak, A. K. Bewoor
Department of Mechanical Engineering, Sinhgad College of Engineering, India
V. A. Kulkarni
Department of Production Engineering, D. Y. Patil College of Engineering, India
S. G. Tillu
D. Y. Patil College of Engineering, India
The Problem: A differential gearbox consisting of four bevel gears mounted at 90 degrees to each
other inside the casing was facing the serious problem of frequent bevel gear failure. The Kaizen team
identified that the failure of the back face of the bevel gear was due to poor surface finish, which results in
faster crack initiation and propagation. Because of this, the pre-tension in the gears was reduced, leading
to failure of the gearbox. The gear face was ground using CNC machines with a recommended Ra value of
0.8 µm. In actual practice, the Kaizen team found that the parameters were not in control and the Ra
values obtained were consistently around 1-1.2 µm, which is unacceptable.
The Method: An attempt has been made in this paper to recondition the grinding process by using
Taguchi's parameter design methodology. The parameters affecting surface finish on the ground face of the
bevel gear were identified, and a set of experiments known as an orthogonal array was formed to conduct
the statistical experiments.
The Result: Based on the experiments, an analysis of means (ANOM) was performed to determine the
relative magnitude of the effect of each factor and to estimate the error variance. ANOVA showed the
adequacy of the model and a 23.5% improvement in the process.
The complete methodology is presented in the paper.
How Neyman-Pearson Would Advise a Clearinghouse on its Margin Setting Methodology
K. Lam and C. Y. Sin
Hong Kong Baptist University, Hong Kong
The margin system is the first line of defense against the default risk of a clearinghouse. From the
perspective of a clearinghouse, the utmost concern is to have a prudential system to control the default
exposure. Once the level of prudentiality is set, the next concern is the opportunity cost of the investors,
because a high opportunity cost discourages people from hedging through futures trading and thus defeats
the function of a futures market. In this paper, we first develop different measures of prudentiality and
opportunity cost. We then borrow Neyman and Pearson's idea to formulate a statistical framework for
evaluating different margin-setting methodologies. Five margin-setting methodologies, namely, (1) simple
moving averages of historical volatility; (2) exponentially weighted moving averages of historical volatility;
(3) a GARCH approach on historical volatility; (4) implied volatility; and (5) realized volatility, are applied
to the Hang Seng Index futures. Keeping the same prudentiality level, it is shown that the implied
volatility approach by and large gives the lowest average overcharge.
Geometric Process
Yeh Lam
Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong
Lam [1988a, b] introduced the following geometric process.
Definition. A stochastic process {X_n, n = 1, 2, . . .} is a geometric process (GP) if there exists some
a > 0 such that {a^(n-1) X_n, n = 1, 2, . . .} forms a renewal process. The number a is called the ratio of the
geometric process.
Clearly, a GP is stochastically increasing if the ratio 0 < a ≤ 1; it is stochastically decreasing if the
ratio a ≥ 1. A GP becomes a renewal process if the ratio a = 1. Thus, the GP is a simple monotone
process and is a generalization of the renewal process.
Let E(X_1) = λ and Var(X_1) = σ². Then

E(X_n) = λ / a^(n-1),    Var(X_n) = σ² / a^(2(n-1)).

Therefore, a, λ and σ² are three important parameters of the GP.
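A small simulation sketch of the definition (illustrative parameters; with {Y_n} i.i.d., X_n = Y_n / a^(n-1) makes a^(n-1) X_n a renewal process):

import numpy as np

rng = np.random.default_rng(0)
a, lam, n = 1.05, 2.0, 10

# Replications of the n-th variable X_n = Y_n / a^(n-1), with Y_n exponential
# of mean lam, so that E(X_n) should equal lam / a^(n-1).
xn = rng.exponential(lam, size=100_000) / a ** (n - 1)
print("sample mean of X_10:", round(xn.mean(), 4),
      "theory:", round(lam / a ** (n - 1), 4))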
In this talk, we shall introduce the fundamental probability theory of GP, then study the statistical
inference of the GP. Furthermore, we shall consider the application of GP to reliability, especially to the
maintenance problem. We shall also investigate the application of GP to the analysis of data. From
real data analysis, it has been shown that on average, the GP model is the best one among four models
including a Poisson process model and two nonhomogeneous Poisson process models. See Lam (2005) for
a brief review and more references.
References
[1] Lam, Y. (1988a). A note on the optimal replacement problem. Adv. Appl. Prob., 20, 479-482.
[2] Lam, Y. (1988b). Geometric processes and replacement problem. Acta Math. Appl. Sinica, 4,
366-377.
[3] Lam, Y. (2005) Geometric process. In The Encyclopedia of Statistical Sciences, 2nd edition, N.
Balakrishnan, C. Read, S. Kotz and B. Vidakovic ed., John Wiley & Sons, Inc., New York. To
appear.
Key words and phrases. Geometric process, renewal process, reliability, data analysis.
Modeling for Analyzing and Evaluating the Stability of Supply Chain Systems
Jiukun Li and Philip L.Y. Chan
Department of Industrial and Manufacturing Systems Engineering, The University of
Hong Kong, Hong Kong
[email protected], [email protected]
In this paper, we examine the stability of supply chain systems under different demand situations,
such as stochastic or deterministic and dynamic or stable, by developing a theoretical model that combines
statistical methods with traffic network theory. A numerical example is used to demonstrate the
applicability of the model.
Key words and phrases. Stability analysis, supply chain, statistical methods, traffic network theory.
Variable Selection via MM Algorithms
Runze Li
Department of Statistics, Penn State University, USA
In this talk, I give a brief review of variable selection via nonconcave penalized likelihood. An algorithm
is proposed for maximizing the penalized likelihood via MM algorithms, extensions of the well-known EM
algorithm. I will demonstrate the convergence of the proposed algorithm using techniques related to
the EM algorithm (Wu, 1983). The proposed algorithm allows us to use numerical algorithms (such as
the Newton-Raphson algorithm) to deal with the combinatorial optimization in variable selection, which is
an NP-hard problem. Numerical comparisons will also be presented.
The Use of the Lee-Carter Model to Project Mortality Trends of Hong Kong
Billy Y G Li
Census and Statistics Department, Hong Kong
The Lee-Carter model was developed in the early 1990s to model and project the mortality trend of
a country/territory. The Census and Statistics Department of the Hong Kong Special Administrative
Region Government has been using this model since 2001. The Lee-Carter model decomposes age-sex
specific mortality rates into three components: a general shape of the age-sex mortality profile, the rate of
increase or decrease from the general age-sex profile, and an index of the mortality level. Modelling and
projecting the index of the mortality level provides the projected age-sex specific mortality rates.
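A minimal sketch of the standard singular-value-decomposition estimation of the model, on a synthetic matrix of log age-specific mortality rates (log m[x, t] = a[x] + b[x]·k[t] + error; all values hypothetical):

import numpy as np

rng = np.random.default_rng(3)
ages, years = 20, 30
true_a = np.linspace(-7.0, -2.0, ages)               # age profile of log rates
true_b = np.full(ages, 1.0 / ages)                   # age response to the index
true_k = -0.5 * np.arange(years)                     # declining mortality index
log_m = (true_a[:, None] + np.outer(true_b, true_k)
         + rng.normal(0.0, 0.01, (ages, years)))     # hypothetical log rates

a_x = log_m.mean(axis=1)                             # estimate of a[x]
U, s, Vt = np.linalg.svd(log_m - a_x[:, None], full_matrices=False)
b_x = U[:, 0] / U[:, 0].sum()                        # normalize so sum(b) = 1
k_t = s[0] * Vt[0] * U[:, 0].sum()                   # index; sums to ~0 by construction
print("drift of k_t:", round(np.diff(k_t).mean(), 4))  # basis of a random-walk forecast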
The presentation will provide a background on mortality projections together with the technical details
of the Lee-Carter model. Estimation procedures and limitations of the Lee-Carter model will also be
discussed.
Key words and phrases. Lee-Carter, mortality, projection.
A New Parametric Test for AR and Bilinear Time Series with Graphical Models
Wai-Cheung Ip, Heung Wong
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
Yuan Li
School of Mathematics and Information Science, Guangzhou University, China
The classic AR and bilinear time series models are expressed as directed graphical models. Based on
the directed graphical models, it is shown that a coefficient of the AR and bilinear models is the conditional
correlation coefficient conditioned on the other components of the time series. A new procedure is then
proposed to test the coefficients of AR and bilinear time series models. Simulations show that our
procedure is better than the classic tests in both size and power.
Key words and phrases. AR model, bilinear model, graphical models.
T3-Plot for Testing Spherical Symmetry for High-Dimensional Data with a Small Sample Size
Jiajuan Liang
Department of Quantitative Analysis, University of New Haven, USA
High-dimensional data with a small sample size, such as microarray data and image data, are commonly
encountered in practical problems for which many variables have to be measured but it is too costly
or time-consuming to repeat the measurements many times. Analysis of this kind of data poses a great
challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry
that is especially suitable for high-dimensional data with a small sample size. The new graphical method,
together with its associated local acceptance regions, can provide a quick visual check of the assumption of
spherical symmetry. The performance of the new graphical method is demonstrated by a Monte Carlo
study and illustrated by a real data set.
Model Based Approaches for Simultaneous Dimension Reduction and Clustering
Xiaodong Lin
University of Cincinnati and National Institute of Statistical Science, USA
Recently, very high dimensional data have been generated from a variety of sources. In many of
these high dimensional problems, it is often the case that the data-generating process is nonhomogeneous.
Traditional data analysis usually deals with such heterogeneity by performing dimension reduction and
clustering separately. In this talk, we present a constrained mixture of factor analyzers model to address
the problem of simultaneous clustering and dimension reduction. We propose a constraint on the factor
loading matrix so that the total variation explained by the noise component is restricted by a threshold.
Two approaches, namely, the mixture likelihood approach and the classification likelihood approach are
used to fit the model. A constrained EM algorithm is developed for parameter estimation. In our model,
the number of factors is allowed to differ across different mixture components. This flexibility generates
model selection challenges. To overcome the difficulty, we propose a two-step model selection procedure
during which parameter estimations and model specifications are altered dynamically. We show that this
procedure converges under different model selection criteria. Finally, we demonstrate the performance of
our methods on an image dataset and a metabolomic dataset.
Key words and phrases. Dimension reduction, clustering, EM algorithm, mixture models, model selection.
Strong Near-epoch Dependent Random Variables
Zhengyan Lin
Department of Mathematics, Zhejiang University, China
We introduce a new class of dependent sequences of random variables, which is a subclass of near-epoch
dependent (NED) sequences but can also be approximated by mixing sequences; we call them strong
near-epoch dependent (strong NED) sequences. Many important econometric models, such as linear
processes, a class of popular nonlinear difference equations, the ARMA model, the GARCH model, etc.,
are strong NED under the usual conditions. Under dependence conditions substantially weaker than for
NED sequences, we show a p-th order (p > 2) (maximum) moment inequality for strong NED sequences.
Using this inequality, we derive a central limit theorem and a functional central limit theorem; based on
these results, we can also obtain the limit distributions of many important processes with strong NED
innovations, such as linear processes with strong NED innovations. Moreover, we show a result on the
variances of partial sums of a strong NED sequence, which is usually taken as a prior assumption when
discussing the large-sample behavior of an NED sequence.
Normalized Maximum Likelihood and MDL in Model Selection
Erkki P. Liski
Department of Mathematics, Statistics and Philosophy, University of Tampere, Finland
By viewing models as a means of providing statistical descriptions of observed data, the comparison
between competing models is based on the stochastic complexity (SC) of each description. The Normal-
ized Maximum Likelihood (NML) form of the stochastic complexity (Rissanen, 1996) contains a
component that may be interpreted as the parametric complexity of the model class. The SC for the
data, relative to a class of suggested models, serves as a criterion for selecting the optimal model with the
smallest SC.
We calculate the SC for Gaussian linear regression by using the NML density and consider it as a
criterion for model selection. The final form of the selection criterion depends on the method for bounding
the parametric complexity. As opposed to traditional fixed-penalty criteria, this technique yields adaptive
criteria that have demonstrated success in certain applications.
Reference
[1] Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Infor-
mation Theory 42(1), 40–47.
Key words and phrases. Minimum description length, stochastic complexity, normalized maximum likeli-
hood, parametric complexity, adaptive selection criteria.
Connections Among Different Criteria for Asymmetrical Fractional Factorial Designs
Min-Qian Liu
Department of Statistics, Nankai University, China
Kai-Tai Fang
Department of Mathematics, Hong Kong Baptist University, Hong Kong
Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology, USA
In recent years, there has been increasing interest in the study of asymmetrical fractional fac-
torial designs. Various new optimality criteria have been proposed from different principles for design
construction and comparison, such as generalized minimum aberration, minimum moment aberration,
minimum projection uniformity and the χ²(D) (for design D) criterion. In this paper, these criteria are
reviewed and the χ²(D) criterion is generalized to the so-called minimum χ² criterion. Connections among
these different criteria are investigated, showing that there exist some equivalencies among them. These
connections provide strong statistical justification for each of them from other viewpoints. Some general
optimality results are developed, which not only unify several results (including results for the symmetrical
case), but are also useful for constructing asymmetrical supersaturated designs.
Key words and phrases. Discrepancy, fractional factorial design, generalized minimum aberration, mini-
mum moment aberration, orthogonal array, supersaturated design, uniformity.
Two-stage Response Surface Designs
Xuan Lu and Xi Wang
Department of Mathematical Science, Tsinghua University, China
The construction of two-stage response surface designs with high estimation efficiency is motivated
by a real case and studied. In the first stage, two-level points and central point replicates are used for
screening significant variables and testing the possible curvature of the response surface; then, in the
second stage, additional three-level points are used together with the first-stage points to identify a second-
order model. Besides the well-known D criterion, a new criterion, C, is proposed to find the points in the
second stage, given the points of the first stage. The log C is a weighted sum of log efficiency measures
for four subsets of parameters. By selecting suitable weights, one can construct two-stage response surface
designs with high estimation efficiency. A construction algorithm is introduced. The superiority of the new
designs is demonstrated by comparing them with existing response surface designs. An answer is given
to the motivating case.
Marginal Permutations in Linear Models
Tatjana Nahtman
Institute of Mathematical Statistics, University of Tartu, Estonia
Dietrich von Rosen
Department of Biometry and Engineering, Swedish University of Agricultural Sciences,
Sweden
The purpose of the project is to study covariance structures in balanced linear models that are
invariant with respect to marginal permutations (including shift-permutations). We focus on model
formulation and the interpretation of variance components rather than their prediction. Marginal
permutation invariance implies a Kronecker product structure with specific patterns on the covariance
matrices. In particular, under shift-invariant permutations, Kronecker products of Toeplitz matrices appear.
Useful results are obtained for the spectrum of these covariance matrices.
Via the spectrum of the covariance matrices, the reparameterization of factor levels, i.e. imposing
certain constraints on parameters, is studied. In particular we focus on the most commonly used con-
straints, the so-called "sum-to-zero" constraints. The constraints imposed on the spectrum lead to singular
covariance matrices, and one of the main results of the project is that only some constraints provide useful
reparameterizations.
Key words and phrases. Covariance structures, eigenspace, invariance, marginal permutations, reparame-
terization, spectrum, Toeplitz matrix.
Bayesian Semiparametric Structural Equation Models with Latent Variables
D. Dunson and J. Palomo
National Institute of Environmental Health Sciences, Duke University, USA
[email protected], [email protected]
Structural equation models (SEMs) provide a general framework for modeling of multivariate data,
particularly in settings in which measured variables are designed to measure one or more latent variables.
In implementing SEM analysis, it is typically assumed that the model structure is known and that the latent
variables have normal distributions. To relax these assumptions, this article proposes a semiparametric
Bayesian approach. Categorical latent variables with an unknown number of classes are accommodated
using Dirichlet process (DP) priors, while DP mixtures of normals allow continuous latent variables to have
unknown distributions. Robustness to the assumed SEM structure is accommodated by choosing mixture
priors, which allow uncertainty in the occurrence of certain links within the path diagram. A Gibbs
sampling algorithm is developed for posterior computation. The methods are illustrated using biomedical
and social science examples.
Key words and phrases. Covariance structural model, Dirichlet process, graphical model, latent class,
latent trait, MCMC algorithm, measurement error, mixture model, variable selection.
Stochastic Optimization Method to Estimate the Parameters of the Two-parameter Pareto Distribution - A Short Communication
Wan-Kai Pang
Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong
The Pareto distribution plays an important role in modelling income distribution in economic models.
Parameter estimation for the two-parameter Pareto distribution has been studied by others in the past,
and a number of optimization methods have been proposed. In this paper, we use Markov Chain Monte
Carlo (MCMC) techniques to estimate the Pareto parameters. The method is quite successful and performs
well in estimating the threshold parameter of the Pareto distribution.
Key words and phrases. Pareto probability distribution, Markov Chain Monte Carlo, maximum likelihood
estimation, Bayesian estimation.
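For reference, the closed-form maximum likelihood estimates that an MCMC approach can be checked against (a sketch on simulated data; density a m^a / x^(a+1) for x ≥ m):

import numpy as np

rng = np.random.default_rng(4)
m_true, a_true = 2.0, 1.5
u = rng.random(5000)
x = m_true * (1.0 - u) ** (-1.0 / a_true)   # inverse-CDF sampling from Pareto(m, a)

m_hat = x.min()                             # MLE of the threshold parameter
a_hat = len(x) / np.log(x / m_hat).sum()    # MLE of the shape parameter
print("threshold MLE:", round(m_hat, 4), "shape MLE:", round(a_hat, 4))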
Generalized Gamma Frailty Model
Yingwei Peng
Department of Mathematics and Statistics, Memorial University of Newfoundland, Canada
N. Balakrishnan
McMaster University, Canada
In this article, we present a frailty model using the generalized gamma distribution as the frailty
distribution. It is a generalization of the popular gamma frailty model, and it also includes other frailty
models, such as the lognormal and Weibull frailty models, as special cases. The flexibility of this frailty
distribution makes it possible to detect a complex frailty distribution structure which may otherwise be
missed. Due to the intractable integrals in the likelihood function and its derivatives, we propose to
approximate the integrals either by Monte Carlo simulation or by a quadrature method and then
find the maximum likelihood estimates of the parameters in the model. We explore the properties of the
proposed frailty model and the computation method through a simulation study. The study shows that the
proposed model can potentially reduce errors in the estimation and that it provides a viable alternative
for correlated data. We also illustrate the model with a real-life data set.
Key words and phrases. Censoring, clustered data, Generalized gamma distribution, lognormal distribu-
tion, Monte Carlo approximation, piecewise constant hazards.
Probabilistic Model of Learning
Jonny B. Pornel
Iloilo National High School, Philippines
Leonardo Sotaridona
CTB McGraw-Hill, USA
Leonardo [email protected]
This paper proposes an algorithm to simulate learning. It postulates that learning is divided into
three important phases: knowledge acquisition, recall, and connection. By using probability to quantify
these three processes of learning, the model draws incisive explanations of many phenomena in learning and
cognitive science. In the knowledge acquisition algorithm, each piece of incoming information (impulse)
augments the probability of the stored-up knowledge to which the information is attached. Each piece of
knowledge is then sorted by decreasing probability. These probabilities then become the factor that affects
the recall and connection functions. The model unifies the three currently dominant branches of learning
theories: behavioral, cognitive and constructivist. It is more flexible than the Self Organizing Map (Kohonen,
2001) and other neural network structures. The algorithm can have important applications in the fields of
artificial intelligence and information technology.
Key words and phrases. Probability, theory of learning, self organizing map.
Connection Between Uniformity and Orthogonality
Hong Qin
Faculty of Mathematics and Statistics, Central China Normal University, China
Considerable effort has been devoted to studying the usefulness of uniformity in fractional factorial designs. Uniformity is a geometric concept related to computer experiment design and quasi-Monte Carlo methods. Orthogonality is an important criterion for comparing factorial designs, and at first sight it appears unrelated to uniformity. In this talk, we give a justifiable interpretation of orthogonality in terms of uniformity. Two orthogonality criteria, proposed by Fang, Ma and Mukerjee
(2002) and Fang, Lu and Winker (2003) respectively, are employed, which can be viewed as extensions of the
concept of strength in orthogonal arrays and have been utilized to evaluate the orthogonality of factorials.
We will give an analytic link between uniformity measured by the discrete discrepancy (Hickernell and
Liu, 2002; Fang, Lin and Liu, 2003; Qin and Fang, 2004) and two orthogonality criteria mentioned above,
and show that comparing the orthogonality of two symmetrical factorials is equivalent to comparing their
uniformity.
Key words and phrases. Discrete discrepancy, factorial design, orthogonality, uniformity, uniform design.
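For readers unfamiliar with the discrete discrepancy, the following Python sketch evaluates its coincidence-number form as we recall it from this literature (e.g., Qin and Fang, 2004); the kernel constants a > b > 0 are user choices, and the exact normalization should be checked against the cited papers.

```python
import numpy as np

def discrete_discrepancy_sq(design, q, a=1.0, b=0.5):
    """Squared discrete discrepancy of a U-type design with n runs,
    m factors, q levels:
        DD^2 = -((a + (q-1)*b)/q)^m
               + (1/n^2) * sum_{i,j} a^(delta_ij) * b^(m - delta_ij),
    where delta_ij counts the columns at which rows i and j coincide."""
    d = np.asarray(design)
    n, m = d.shape
    delta = (d[:, None, :] == d[None, :, :]).sum(axis=2)   # coincidence numbers
    kernel = a ** delta * b ** (m - delta)
    return -((a + (q - 1) * b) / q) ** m + kernel.mean()

# a small two-level orthogonal array with 4 runs and 3 factors
L4 = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(discrete_discrepancy_sq(L4, q=2))
```

Comparing such values across designs of the same size is exactly the kind of uniformity comparison that, by the results announced above, matches the orthogonality comparison.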
Probabilistic Analysis of a Two-Unit Cold Standby System with Random Checks and Various Repair Policies
Syed Mohd Rizwan
Department of Mathematics and Science, Sultanate of Oman, Oman
A two-unit cold standby system with two repairmen is analyzed. A failed unit is first attended by an ordinary repairman, who is only rarely available; after his repair, he randomly checks whether the system is working satisfactorily, and if it is not, an efficient repairman, who is 100% efficient, is called in. Once the efficient repairman completes a repair, the unit becomes as good as new. Various reliability measures are calculated using semi-Markov processes and the regenerative point technique.
Key words and phrases. Cold standby system, semi-Markov process, regenerative point technique.
Single Server Queue with Batch Arrival and α-Poisson Distribution
V. R. Saji Kumar
Kerala University Library, University of Kerala, India
saji [email protected]
We consider a generalization of the M^X/G/1 queue with α-Poisson arrivals (or, equivalently, a Mittag-Leffler interarrival time distribution). When α = 1, it reduces to the classical M^X/G/1 model with batch arrivals. We analyze the single server retrial queue and the queue with a waiting server and vacations.
Key words and phrases. Batch arrival, infinite mean waiting time, Mittag Leffler function, retrial queue,
vacations, waiting server.
Robust R-estimation of a Consensus Value in Multi-Center Studies
Inkyung Jung
Harvard Medical Center, USA
Pranab K. Sen
University of North Carolina, Chapel Hill, USA
There are various situations where a multi-center experiment is conducted under possibly heterogeneous set-ups (for determining a consensus value). This results in unbalanced heteroscedastic one-way random
effects models. Standard parametric approaches based on the stringent assumption of normality of both
the error and the random effect components may not perform well when either of these conditions is
vitiated. Moreover, for heteroscedastic errors (across the centers or blocks), parametric estimators may lack
robustness properties to a greater extent. Two robust R-estimators for the common location parameter
(the consensus value) based on Wilcoxon signed-rank statistics are proposed and their properties studied.
When the extent of heteroscedasticity is large, or the distributions of the random effect and/or errors
deviate from normality, the proposed estimators perform better than the classical weighted least squares
and some other parametric estimators. Along with the supporting methodology, this robust-efficiency perspective is illustrated with an 'Arsenic in oyster tissue' problem. Some further simulation studies are also reported in this connection.
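For background, the classical one-sample R-estimator associated with the Wilcoxon signed-rank statistic is the Hodges-Lehmann estimator, the median of the pairwise Walsh averages; the proposed estimators are more elaborate, since they must handle center-specific heteroscedasticity, so the sketch below is context rather than the authors' procedure.

```python
import numpy as np

def hodges_lehmann(x):
    """Hodges-Lehmann one-sample location estimate: the median of all
    Walsh averages (x_i + x_j)/2 for i <= j.  This is the R-estimator
    associated with the Wilcoxon signed-rank statistic."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))
    return np.median((x[i] + x[j]) / 2.0)

rng = np.random.default_rng(3)
sample = rng.standard_t(df=3, size=40) + 5.0   # heavy-tailed data centered at 5
print(hodges_lehmann(sample))
```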
Robust Estimation for PAR Time Series
Qin Shao
Department of Mathematics, University of Toledo, USA
Considerable attention has been given to periodically stationary time series because of their wide applications in climatology, hydrology, electrical engineering, economics, etc. Periodic autoregressive models of order p (PAR(p)) are commonly applied when modelling periodically stationary time series. Discussion of parameter estimation has focused on moment estimation, least squares estimation, and maximum likelihood estimation. A known weakness of these methods is that the corresponding estimates are sensitive to outliers and to small changes in the distributions. Robust estimation procedures, which resist the adverse effects of abnormal random disturbances and should be applied in such scenarios, have not yet been considered for PAR time series. The primary aim of this talk is to propose a robust estimation method for the parameters of the PAR(p) model. The proposed method not only robustifies the residuals and their weights in the estimating equations with odd, bounded, differentiable functions, but utilizes the systematic structure of the time series as well.
Key words and phrases. Periodically stationary time series, periodic autoregressive models, robust estima-
tors, estimation equations, asymptotic relative efficiency
Empirical Bayes Estimation for the Reliability Indexes of a Cold Standby Series System Based on Type-II Censored Samples
Yimin Shi
Department of Applied Mathematics, Northwestern Polytechnical University, China
Based on Type-II censored samples, we investigate the empirical Bayes estimation and approximate confidence limits of the reliability indexes (such as the failure rate, the reliability function, and the average life) for a cold standby series system.
Suppose that a standby system consists of k working units in series and n independent cold standby units (a unit is said to be in cold standby if it does not fail during the standby period). The switching is perfect, i.e., it never fails and is instantaneous. When a working unit fails, a cold standby unit replaces it immediately and the system works as before. If the standby units are used up and one of the k series units becomes unusable, the system is said to have failed. Let the lifetimes of all units in the system be independent with the same exponential distribution exp(λ), where λ is the failure rate of a unit.
In this paper, based on Type-II censored samples, the empirical Bayes estimate (EBE) and an approximate upper confidence limit for the failure rate are obtained first, and then Bayesian approximate lower confidence limits for the reliability function and the average life are presented. Expressions for calculating the Bayesian lower confidence limits of the reliability function and the average life are also obtained. Furthermore, the maximum likelihood estimate (MLE) of the failure rate is obtained, and an illustrative example is examined numerically by means of Monte Carlo simulation. Finally, the accuracy of the confidence limits is discussed. The numerical results show that the EBE is more accurate than the MLE and that our confidence limits are also efficient.
Mathematics Subject Classification. 62F25, 62N05.
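For orientation, a textbook closed form underlies the estimation here (standard for Type-II censoring of exponential lifetimes, not specific to this paper): if testing stops at the $r$-th failure among $n$ items, with ordered failure times $x_{(1)} \le \cdots \le x_{(r)}$, the MLE of the failure rate is
$$\hat\lambda_{\mathrm{ML}} = \frac{r}{T_r}, \qquad T_r = \sum_{i=1}^{r} x_{(i)} + (n-r)\,x_{(r)},$$
where $T_r$ is the total time on test; and under a conjugate Gamma$(a,b)$ prior the posterior of $\lambda$ is Gamma$(a+r,\,b+T_r)$, which is the natural starting point for the (empirical) Bayes estimates and confidence limits studied in the paper.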
A Lagrange Multiplier Test for GARCH in Partially Nonstationary and Stationary Multivariate Autoregressive Models: with Applications to Economic Data
Chor-Yiu Sin
Department of Economics, Hong Kong Baptist University, Hong Kong
Economic time-series data are often modelled with partially nonstationary multivariate autoregression
and GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity). Notable examples include
those studies of price discovery, in which stock prices of the same underlying asset are cointegrated and they
exhibit multivariate GARCH. It was not until recently that Li, Ling and Wong (2001) formally derived the asymptotic distribution of the estimators of the parameters in this model. The efficiency gain in some of the parameters is huge even when the deflated error is symmetrically distributed (the symmetry assumption).
Taking into consideration the different rates of convergence, we derive the asymptotic distribution of the
usual LM (Lagrange multiplier) test in partially nonstationary multivariate autoregression. Under the
symmetry assumption, the distribution is the same as that in stationary multivariate autoregression. The
null can be no multivariate GARCH, or it can be a specific multivariate GARCH. We then apply our test
to the monthly or quarterly Nelson-Plosser data, embedded with some prototype multivariate models. We
also apply the tests to the intra-daily stock price indices and their derivatives. Comparisons are made
with the portmanteau test suggested in Ling and Li (1997) and the residual-based test suggested in Tse
(2002), both of which do not specify a definite alternative hypothesis.
Key words and phrases. Cointegration, efficiency gain, LM test, multivariate GARCH, portmanteau test,
residual-based test.
On Some Aspects of Data Integration Techniques with Environmental Applications
Bimal Sinha
Department of Mathematics, University of Maryland, Baltimore County, USA
In a canonical form, the problem is to meaningfully combine the columns of an N × K matrix whose elements represent some 'pollution' emissions, where the columns correspond to different kinds of pollution and the rows represent various sources of pollution, in order to define an overall or combined index of pollution. The Multiple Criteria Decision Making (MCDM) method will be described and used to accomplish this goal. Various modifications of the MCDM method will also be discussed.
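One standard MCDM aggregation (a generic illustration, not necessarily the variant presented in the talk) normalizes each pollutant column to a common scale and combines the columns with nonnegative weights:

```python
import numpy as np

def combined_pollution_index(X, weights=None):
    """Weighted-sum index for an N x K emissions matrix X: rows are
    pollution sources, columns are pollutant types.  Each column is
    min-max normalized to [0, 1] so that pollutants on different scales
    become comparable, then combined with weights summing to 1
    (equal weights by default).  Returns one index per source."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    w = np.full(k, 1.0 / k) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    lo, hi = X.min(axis=0), X.max(axis=0)
    Z = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    return Z @ w

emissions = [[10.0, 0.2, 300.0],
             [25.0, 0.1, 120.0],
             [ 5.0, 0.4, 500.0]]
print(combined_pollution_index(emissions))
```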
An M/M/1 Retrial Queue with Two Orbits
R Sivasamy
Department of Statistics, Annamalai University, India
This paper deals with a retrial queue in which customers arriving at an M/M/1-type facility are admitted into two orbits. An arriving customer who finds the server busy enters a group called Orbit-1; otherwise he occupies the server for his first service. The server applies a set Ω of specifications and, on completing the first service, declares a customer the 'satisfied type (ST)' if the customer satisfies all the specifications of Ω; otherwise he declares him the 'dissatisfied type (DST)'. For the present study these ST and DST events are assumed to occur with probabilities p and q respectively, where p + q = 1. Each ST customer leaves the service area, while each DST customer joins the other group, called 'Orbit-2', of unsatisfied customers. Customers may reapply for service from both orbits after the passage of a random amount of time, and can get in for service only when the server is free. The server does not apply the verification process to customers coming from Orbit-2; on receiving service for the second time, a customer is served to his satisfaction without any verification. Steady-state conditions and the joint distribution of X1, the number of busy servers, and X2, the number of customers present in the system at a random epoch, are derived. Some measures of X2 are then obtained.
Key words and phrases. Satisfied type of customer, unsatisfied type of customer, steady state condition,
and joint distribution.
Simultaneous Comparison of Several Population Dispersions with an Application to Livestock Bases
Yeong-Tzay Su
Department of Mathematics, National Kaohsiung Normal University, Taiwan
We present a robust testing procedure to compare the variability levels of an arbitrary number of populations. No specific distribution or moment condition on the populations is required. We also prove some asymptotic properties of the test and demonstrate the testing procedure by using the livestock bases data from three regional markets in the U.S.
Key words and phrases. Rank sums, homogeneity of variance, power of the test, small sample distribution,
asymptotic behavior.
Applicability of the Vitis GeneChip for Transcriptome Analysis in Heterologous Grapevine Species
Wenping Qiu, Laszlo G. Kovacs
Department of Fruit Science, Southwest Missouri State University, USA
Yingcai Su
Department of Mathematics, Southwest Missouri State University, USA
The Vitis GeneChip (Affymetrix) contains probe sets for 16,437 grape genes, most of which (> 14,000) are derived from Vitis vinifera, the cultivated grape. Transcriptome analysis, however, is also important in other Vitis species, because the agronomically most valuable genetic resources are represented by the wild, non-cultivated members of the genus. The purpose of this work is to determine whether the Vitis GeneChip can be utilized for transcriptome analysis in another grapevine species, namely Vitis aestivalis.
The Additive Hazards Model for Recurrent Gap Times
Liuquan Sun, Do-Hwan Park and Jianguo Sun
Chinese Academy of Sciences, China and University of Missouri, USA
Recurrent event data and gap times between recurrent events are often the targets in the analysis of
longitudinal follow-up or epidemiological studies. To analyze the gap times, among others, Huang and
Chen (2003) proposed fitting them with the proportional hazards model. It is well known, however, that the proportional hazards model may sometimes not fit the data well. To address this, this paper investigates the fit of the additive hazards model to gap time data, and an estimating equation approach is presented for inference about the regression parameters. Both asymptotic and finite sample properties of the proposed parameter estimates are established. One major advantage of the additive hazards model over the proportional hazards model is that the resulting parameter estimator has a closed form and thus can be easily obtained. The method is applied to a cancer study.
Key words and phrases. Additive hazards model, estimating equations, gap time, recurrent event data,
regression analysis.
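The closed form referred to is, in the standard Lin and Ying (1994) version of the additive hazards model $\lambda(t \mid Z) = \lambda_0(t) + \beta' Z(t)$ (the gap-time setting of this talk adapts it; shown here for orientation):
$$\hat\beta = \left[\sum_{i=1}^{n}\int_0^\tau Y_i(t)\,\{Z_i(t) - \bar Z(t)\}^{\otimes 2}\,dt\right]^{-1} \sum_{i=1}^{n}\int_0^\tau \{Z_i(t) - \bar Z(t)\}\,dN_i(t),$$
where $Y_i$ is the at-risk indicator, $N_i$ the counting process, $\bar Z(t) = \sum_j Y_j(t) Z_j(t) / \sum_j Y_j(t)$, and $v^{\otimes 2} = vv'$. No iterative maximization is needed, which is the computational advantage cited in the abstract.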
Stirling Numbers and the Variance of Chi-square Statistic
Ping Sun
Department of Mathematics, Northeastern University, China
The Stirling number of the second kind S(n, k) counts the ways of distributing n distinct elements x1, x2, ..., xn into k indistinguishable sets such that no set is empty. Noting that, in statistics, every state of a discrete population must have observed samples, Stirling numbers can help to derive the exact distributions of some statistics. This paper gives an exact formula for the variance of the chi-square statistic by applying the theory of Stirling numbers of the second kind from combinatorics.
Key words and phrases. Stirling numbers of the second kind, Chi-square statistics, variance.
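The recurrence behind S(n, k) is simple to compute; a minimal Python sketch (the generic recurrence, not the paper's variance formula) follows.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k): the number of ways to
    partition n distinct elements into k nonempty indistinguishable sets.
    Recurrence: S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# S(4, 2) = 7: the seven ways to split {1,2,3,4} into two nonempty sets
print([stirling2(4, k) for k in range(5)])   # [0, 1, 7, 6, 1]
```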
Robust Designs for GLMM Models when Covariates are Involved
Frans E. S. Tan
Department of Methodology and Statistics, University of Maastricht, Netherlands
When planning an experimental design, it is often advocated to randomly allocate an equal number
of subjects to each of the groups of the treatment factor. In general, the primary virtue of this random
allocation is to balance the treatment groups with respect to several covariates.
From an optimal design perspective, it is known that the equal assignment strategy (balanced design)
is not always Dk-optimal. A low efficiency for the balanced design may be encountered if the absolute
values of the model parameters are large, or if the distribution of the covariates is highly skewed.
In the presentation, we will review our main findings on Dk-optimal allocation problems for generalized
linear mixed models and discuss the optimal allocation problem when several important covariates are
present. Some general guidelines will be presented that characterize the Dk-optimal distribution of the
treatment factor or of a collection of independent variables.
Under certain conditions the minimum relative efficiency of the balanced design for all possible distri-
butions of covariates is maximized.
Sufficient conditions, under which the balanced design is maximin optimal, will be given for models
with an arbitrary number of continuous and discrete independent variables and an arbitrary number of
random effects. Still, the efficiency of balanced designs may be low. Additional information about (the sign of) the parameter values, obtained from the literature or from a pilot study, combined with the maximin procedure, may lead to designs with higher efficiency than the balanced design.
Key words and phrases. GLMM, Dk-optimality, maximin, balanced design, robust designs, relative effi-
ciency.
Empirical Likelihood Method for a Common Mean Problem
Min Tsao
Department of Mathematics and Statistics, University of Victoria, Canada
Changbao Wu
University of Waterloo, Canada
We discuss empirical likelihood (EL) based methods of inference for a common mean using data from
several independent but nonhomogeneous samples. For point estimation, we propose a maximum empirical
likelihood (MEL) estimator and show that it is root-n consistent and is asymptotically optimal. For
constructing confidence intervals, we discuss the weighted EL method and the naive application of the
EL method. Finite sample performances of the MEL estimator and the EL based confidence intervals are
evaluated through a simulation study. Numerical results indicate that overall the MEL estimator and the
weighted EL confidence interval are superior alternatives to the existing methods.
Regression Coefficient and Autoregressive Order Shrinkage and Selection via the RA-lasso
Hansheng Wang
Guanghua School of Management, Peking University, China
Chih-Ling Tsai
Graduate School of Management, University of California, Davis, USA
The least absolute shrinkage and selection operator (lasso) has been widely used in regression shrinkage
and selection. However, the lasso is not designed to take into account the autoregressive process in a nested
fashion. In this article, we propose the regression and autocorrelated lasso (RA-lasso) to jointly shrink
the regression and the nested autocorrelated coefficients in the REGression model with AutoRegressive
errors (REGAR). We show that the RA-lasso estimator performs as well as the oracle estimator (i.e., it
works as well as if the correct submodel were known). Our extensive simulation studies demonstrate that
the RA-lasso outperforms the lasso. An empirical example is also presented to illustrate the usefulness
of RA-lasso. Finally, the extension of RA-lasso to the autoregression with exogenous variables (ARX) is
discussed.
Key words and phrases. ARX, Lasso, RA-lasso, REGAR.
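A plausible rendering of the objective (an assumption consistent with the abstract, with tuning constants $\lambda_1, \lambda_2$ introduced here for illustration; the authors' exact formulation may differ): for the REGAR model $y_t = x_t'\beta + e_t$, $e_t = \sum_{k=1}^{q} \phi_k e_{t-k} + \varepsilon_t$, the RA-lasso would minimize
$$\sum_t \Bigl(y_t - x_t'\beta - \sum_{k=1}^{q} \phi_k\,(y_{t-k} - x_{t-k}'\beta)\Bigr)^2 + \lambda_1 \sum_j |\beta_j| + \lambda_2 \sum_k |\phi_k|,$$
so that both the regression coefficients and the nested autocorrelation coefficients are shrunk, with some of each set driven exactly to zero.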
Generalized Sphericity Test
Jing-Long Wang
Department of Statistics, East China Normal University, China
The generalized sphericity test and an asymptotic expansion of the generalized sphericity test are studied. Sphericity tests in the nested repeated measures model and in the one-way multivariate repeated measurements analysis of variance model are also given as applications of the generalized sphericity test.
Evaluating the Power of Minitab's Data Subsetting Lack of Fit Test in Multiple Linear Regression
Daniel Xiaohong Wang
Central Michigan University, USA
Michael D. Conerly
University of Alabama, USA
Minitab’s data sub-setting lack of fit test (denoted XLOF) is a combination of Burn & Ryan’s test
and Utts’ test for testing lack of fit in linear regression models. As an alternative to the classical or pure
error lack of fit test, it does not require replicates of predictor variables. However, due to the uncertainty
about its performance, XLOF still remains unfamiliar to regression users while the well-known classical
lack of fit test is not applicable to regression data without replicates. So far this procedure has not been
mentioned in any textbooks and has not been included in any other software packages. This study assesses the performance of XLOF in detecting lack of fit in linear regressions without replicates by comparing its power with that of the classical test. The power of XLOF is simulated using Minitab macros for variables with several forms of curvature. These comparisons lead to pragmatic suggestions on the use of XLOF. Based on the results, the performance of XLOF is shown to be superior to that of the classical test. It should be noted that the requirement of replicates makes the classical test unavailable for most regression data, whereas XLOF can be just as powerful even without replicates.
Key words and phrases. Minitab XLOF, lack of fit test, linear regression, diagnosis, power, simulation.
Local Equilibrium Ruin Probability and Ruin Probability for the Renewal Risk Model without the Lundberg Exponent
Yuebao Wang and Dongya Cheng
Department of Mathematics, Soochow University, China
It is well known that the ruin probability for the classical renewal risk model was extensively studied during the 1970s and 1980s; see Teugels (1975), Veraverbeke (1977), Embrechts, Goldie and Veraverbeke (1979), Embrechts and Veraverbeke (1982), and Embrechts and Goldie (1982), among others. But when the Lundberg exponent does not exist, there remain some interesting problems to be solved. This paper discusses these problems further. Theorems 1 and 2 of Veraverbeke (1977) gave an asymptotic estimate of the ruin probability for the renewal risk model, but it was pointed out in Embrechts and Goldie (1982) that the proof for the case γ > 0, when the Lundberg exponent does not exist, is incomplete. This paper improves the proof of Theorems 1 and 2 of Veraverbeke (1977) using a different method, so that all their conclusions remain valid.
Theorem 1. Let $B < \infty$ and let $-\gamma$ be the left abscissa of convergence of $\hat f_K(i\lambda) = \int_{-\infty}^{\infty} e^{-\lambda x}\,dK(x)$. If $\gamma > 0$ and $\hat f_K(-i\gamma) < 1$, then
$$\psi(x) = o(e^{-\gamma x}),$$
but
$$K \in \mathcal S(\gamma) \iff K_+ \in \mathcal S(\gamma) \iff W \in \mathcal S(\gamma),$$
and each of the above implies
$$W(x) \sim C_1 K(x),$$
where $C_1 = \bigl(e^{B}\,(1 - g_+(-\gamma))\,(1 - \hat f_K(-i\gamma))\bigr)^{-1} = g_W(-\gamma)\,\bigl(1 - \hat f_K(-i\gamma)\bigr)^{-1}$.
To prove Theorem 1, we first discuss local properties of the exponential equilibrium distribution, including asymptotic estimates of the local exponential equilibrium ruin probability.
Key words and phrases. Lundberg exponent, exponential equilibrium distribution, ruin probability, asymp-
totics.
(This work was supported by the National Natural Science Foundation of China, No. 10271087.)
Selection of Integrated Statistical Evaluation Indices
Jianwu Wen
Research Institute of Statistical Sciences, National Bureau of Statistics of China, China
I. Background
A distinguishing feature of integrated statistical evaluation is that it expresses evaluative conclusions in numerical terms. Integrated statistical evaluation can also help to scrutinize many types of issues comprehensively and systematically. Its methods are becoming more and more diversified and its areas of application more and more extensive, and the same can be expected of its future development.
II. Issues to be Discussed and Their Solutions
The following ten points should be considered when selecting integrated statistical evaluation indices:
1. Objectives of evaluation: Special attention should be paid to how well the objectives of the evaluation are understood. The choice of objectives depends on the selection of definitions, which is the basic problem to be resolved first.
2. Examining evaluation and achievement evaluation: Take care to differentiate properly between evaluation for examination and evaluation of achievements, a distinction commonly neglected in academic research; a great deal of published work lacks persuasiveness precisely because this difference is ignored.
3. Scientific and operational: Deal carefully with the relationship between a scientifically sound index system and an operational index system.
4. Independence and correlation: Deal properly with the relationship between independence and correlation among indices.
5. Stability and sensitivity: If extremely stagnant indices must be used, take measures to make them sensitive; at the same time, the stability of such indices must be ensured.
6. Unification and diversity: When comparing two different regions, deal properly with the relationship between unification and diversity.
7. Complexity and simplicity: Aim at an integrated evaluation that is not only comprehensive but also simple. It is right to attach importance to how well the major indices reflect circumstances, yet the secondary indices can still provide reference value.
8. Subjectivity and objectivity: When subjective indices have to be used, deal properly with the relationship between subjectivity and objectivity.
9. Value indices and volume indices: If the same index system contains both value indices and volume indices, make sure to handle the relationship between these two types of indices.
10. Limitation and infinity: Methods of integrated statistical evaluation are applied from a statistical perspective; interpretations of the evaluation results that go beyond the statistical domain are implausible.
III. Conclusion
When carrying out integrated statistical evaluation, the idea of "to what extent" is extremely significant. The critical factor in turning quantity into quality is to gauge this "extent" accurately.
The Ethical Experiment Model in Clinical Trials and its Statistical Inference
Minyu Xie, Bo Li and Jianhui Ning
Faculty of Statistics, Central China Normal University, China
To determine whether a new treatment or medicine for a disease is effective, or to compare it with an old one, modern clinical trials gain valid statistical information by designing an experiment and carrying it out. Because attention is concentrated on collecting information, the design may care less about whether the current patient receives the best treatment. Designers typically separate the pool of research subjects into distinct groups at the outset and then gather information about the responses of these groups to their assigned treatments. Such trials arguably contravene the obligation, stated in the Declaration of Helsinki [1], to apply knowledge for the best possible treatment of each individual patient, and they are criticized by more and more people, especially when used to treat desperate diseases. Hence many researchers in this field are interested in finding a trial design that can gather valid information while also caring about the current patient's well-being; the question has also received much attention as an essentially bandit problem [2]. In this paper, drawing on sequential statistical principles, we establish an experiment model that gives each patient in the trial a much greater chance of receiving the best treatment, at least given current knowledge. A difficulty is that no fully valid statistical methodology yet exists for such a model, and this is exactly what we discuss in this paper. We obtain the maximum likelihood estimators of the parameters of this medical statistical model, and we also find an effective statistical way to compare the efficiency of the different treatments involved in the model.
References
[1] World Medical Association (2000). Declaration of Helsinki. Reprinted in JAMA 2000, 284: 1043-1045.
[2] Pullman, D. and Wang, X. (2001). Adaptive design, informed consent, and the ethics of research. Controlled Clinical Trials, 22: 203-210.
Key words and phrases. Clinical trial, maximum likelihood estimator.
Mathematics Subject Classification. 62K05, 62G10.
Robust Designs and Weights for Biased Regression Models with Possible Heteroscedasticity in Accelerated Life Testing
Xiaojian Xu and Douglas P. Wiens
Department of Mathematical and Statistical Sciences, University of Alberta, Canada
[email protected], [email protected]
We consider the construction of designs for accelerated life testing, allowing both for possible het-
eroscedasticity and for imprecision in the specification of the response function. We find minimax designs
and corresponding optimal estimation weights in the context of the following problems: (1) For ordinary
least squares estimation with homoscedasticity, determine a design to minimize the maximum value of
the mean squared prediction error (MSPE), with the maximum being evaluated over the departure of the
response function; (2) For ordinary least squares estimation with heteroscedasticity, determine a design to
minimize the maximum value of MSPE, with the maximum being evaluated over both types of departure;
(3) For weighted least squares estimation, determine both weights and a design to minimize the maximum
MSPE; (4) Choose weights and design points to minimize the maximum MSPE, subject to a side con-
dition of unbiasedness. All solutions to (1)–(4) are given in complete generality. Applications to several
life-stress relationship models in accelerated life testing are discussed. Numerical comparisons indicate
that our designs and weights perform well in compromising robustness and efficiency.
Key words and phrases. Regression design, accelerated life testing, least squares estimates, generalized
linear response model, extrapolation, heteroscedasticity, mean squared prediction error.
Application of Design of Experiments in Computer Simulation Study
Shu Yamada and Hiroe Tsubaki
Graduate School of Business Sciences, University of Tsukuba, Japan
[email protected], [email protected]
This paper presents an approach to design of experiments in computer simulation, with some case studies from the automobile industry.
In recent years, computer simulation has been applied in many fields, such as computer-aided engineering in the manufacturing industry. In order to apply computer simulation effectively, we need to consider the following two points: (1) exploring a model for the computer simulation, and (2) effective application of the simulation based on the explored model. As regards (1), once a tentative model has been derived from knowledge in the field, it is necessary to examine the validity of the model, and at this examination design of experiments plays an important role. After exploring a computer model, the next stage is (2), for example optimization of the response by utilizing the computer simulation. This paper presents an approach to design of experiments in computer simulation in terms of (1) and (2), with case studies from the automobile industry. For example, in order to optimize a response depending on many factors, the first step may be screening the active factors from many candidate factors; designs such as supersaturated designs help with this screening problem. After finding some active factors, the next step may be approximation of the response by an appropriate function; composite designs and uniform designs are helpful for fitting a second-order model as an approximation.
Key words and phrases. Model fitting, supersaturated design, uniform design, screening factors, validation
& verification.
Probability Distributions in Infectious Disease Transmission Models and Risks of Major Outbreaks
Ping Yan
Surveillance and Risk Assessment Division, Centre for Infectious Disease Prevention and
Control, Public Health Agency of Canada, Canada
Ping [email protected]
When an infectious agent enters a susceptible population of size n, with probability π the outbreak of the disease terminates with few cases, whose average number remains constant as n → ∞, and the outbreak size as a proportion, f, concentrates at zero. This quantifies a "minor outbreak". With probability 1 − π, the initial growth of infected individuals over time t may be approximated by an exponential function $Ce^{rt}$ with rate r, and the outbreak size as a number, nf, scales linearly with n, where f > 0 is a proportionality constant. This quantifies a "major outbreak". Let N be the random number of infections produced by an infective individual throughout its infectious period. The distribution Pr{N = n} depends on the point process for contacts, the transmission probability, and the infectious period distribution, and it determines the risk 1 − π of a major outbreak. The intrinsic growth rate r during the early phase of an outbreak is also determined by the same set of assumptions that affect Pr{N = n}. In much of the classical infectious disease modelling literature, under the assumptions that contacts are homogeneous (Poisson), the probability per contact is homogeneous (Bernoulli), and the infectious period is exponentially distributed, $r = (E[N] - 1)/\mu$, where µ is the mean infectious period, and $\pi = 1/E[N]$. These assumptions, far from being realistic, give a geometric distribution for N, which is only a special case of a family of models for Pr{N = n}. A general model will be presented using probability generating functions, various forms of age-dependent and continuous time branching processes, and random effect models. This presentation will show how transmission heterogeneity, such as the "super-spreading events" observed in the outbreaks of Severe Acute Respiratory Syndrome (SARS) in 2003, affects π and r. A further extension of this topic is to examine the relationships between this family of probability models for Pr{N = n} and a family of distributions recently developed in random graph theory, with a focus on the contact network structure in the transmission of infectious diseases.
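To make π and r concrete under the classical assumptions just mentioned (a generic illustration, not the presenter's general model): for an offspring distribution N with probability generating function g, the minor-outbreak probability π is the smallest fixed point of g on [0, 1], and for geometric N with mean R0 this gives π = 1/R0.

```python
def extinction_probability(pgf, tol=1e-12, max_iter=10000):
    """Smallest fixed point of the offspring pgf g on [0, 1], i.e. the
    probability that the branching process dies out (a minor outbreak).
    Computed by iterating s <- g(s) starting from s = 0."""
    s = 0.0
    for _ in range(max_iter):
        s_new = pgf(s)
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

R0 = 3.0   # mean number of secondary infections E[N]
# geometric offspring under the classical assumptions: g(s) = 1/(1 + R0*(1-s))
print(extinction_probability(lambda s: 1.0 / (1.0 + R0 * (1.0 - s))))  # 1/R0

mu = 4.0                  # mean infectious period
print((R0 - 1.0) / mu)    # intrinsic growth rate r under the same assumptions
```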
Cure Rate Models: A Unified Approach
Guosheng Yin
Department of Biostatistics, University of Texas, USA
This is a joint work with Joseph G. Ibrahim.
The authors propose a novel class of cure rate models for right-censored failure time data. The class is
formulated through a transformation on the unknown population survival function. It includes the mixture
cure model and the promotion time cure model as two special cases. The authors propose a general form
of the covariate structure which automatically satisfies an inherent parameter constraint and includes the
corresponding binomial and exponential covariate structures in the two main formulations of cure models.
The proposed class provides a natural link between the mixture and promotion time cure models, and it
offers a wide variety of new modelling structures as well. Within the Bayesian paradigm, a Markov chain
Monte Carlo computational scheme is implemented for sampling from the full conditional distributions of
the parameters. Model selection is based on the conditional predictive ordinate criterion. The class of
models is illustrated with a real dataset involving a melanoma clinical trial.
Key words and phrases. Bayesian inference, Box-Cox transformation, cure fraction, Gibbs sampling, mix-
ture cure model, promotion time cure model.
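For orientation, the two named special cases have the standard population survival functions (standard forms; the authors' unifying transformation class itself is not reproduced here):
$$S_{\mathrm{pop}}(t) = \pi + (1 - \pi)\,S(t) \ \text{(mixture cure model)}, \qquad S_{\mathrm{pop}}(t) = \exp\{-\theta F(t)\} \ \text{(promotion time cure model)},$$
where $\pi$ and $e^{-\theta}$ are the respective cure fractions, $S(t)$ is the survival function of the susceptible subjects, and $F(t)$ is a proper distribution function. A Box-Cox-type transformation on the population survival function, as the keyword list suggests, can interpolate between these two forms.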
Estimating Secondary Parameters After Termination of a Multivariate Group Sequential Test
C. Wu, A. Liu and Kai Fun Yu
National Institute of Child Health and Human Development, USA
We consider estimation of secondary parameters following a group sequential test, with stopping re-
gions determined by testing hypotheses concerning a set of primary parameters. We derive statistics that
are jointly sufficient for the primary and secondary parameters and show that the maximum likelihood
estimators remain unchanged but no longer possess unbiasedness and minimum variance. We construct
bias-reduced and unbiased estimators for the vector of secondary parameters and show them to substan-
tially reduce the bias and improve the precision of estimation.
Key words and phrases. Bias and bias-reduction, correlated endpoints, medical trials, minimum variance, primary and secondary endpoints, restricted completeness, truncation adaptation.
Normal Theory Based Missing Data Procedure with Violation of Distribution Assumption
Ke-Hai Yuan
Department of Psychology, University of Notre Dame, USA
Missing data exist in almost all areas of empirical research. Many statistical developments have been
made towards the analysis of missing data. When missing data are either missing completely at random
(MCAR) or missing at random (MAR), the maximum likelihood (ML) estimation procedure preserves
many of its properties. However, in any statistical modeling, the distribution specification is at best only
an approximation to the real world, especially for higher-dimensional data. We study the properties of
the ML procedure based on the normal distribution assumption. Specifically, we study the consistency
and asymptotic normality of the MLE when the data are not normally distributed and the missing data are MAR or MCAR. When data are not missing at random, factors that affect the asymptotic biases of the MLE will be discussed. Consistent estimates of standard errors using the sandwich-type covariance matrix will be obtained. Our results indicate that the formulas and conclusions in the existing literature are not all correct.
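The sandwich-type covariance matrix referred to has the standard form (a standard construction; the missing-data specifics are the subject of the talk): with casewise log-likelihood contributions $\ell_i(\theta)$,
$$\widehat{\mathrm{Cov}}(\hat\theta) = A_n^{-1} B_n A_n^{-1}, \qquad A_n = -\sum_{i=1}^{n} \frac{\partial^2 \ell_i(\hat\theta)}{\partial\theta\,\partial\theta'}, \qquad B_n = \sum_{i=1}^{n} \frac{\partial \ell_i(\hat\theta)}{\partial\theta}\,\frac{\partial \ell_i(\hat\theta)}{\partial\theta'},$$
which remains a consistent variance estimator for the normal-theory MLE under suitable conditions even when the normality assumption is violated.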
A Weighted Bivariate Density Estimation Method
W. K. Yuen and M. L. Huang
Department of Mathematics, Brock University, Canada
[email protected], [email protected]
A method of nonparametric bivariate density estimation based on a bivariate sample level crossing function is introduced in this paper, leading to the construction of a weighted bivariate kernel density estimator (WBKDE). A mean square integrated error (MSIE) function and an efficiency function for this WBKDE relative to the classical bivariate kernel density estimator are derived. The WBKDE gives more efficient estimates and a better convergence rate than the classical method in the tails of any underlying continuous distribution, for both small and large sample sizes. We run simulations on various distributions; the results confirm the theoretical findings.
References
[1] Huang, M. L. and Brill, P. H. (2004). A distribution estimation method based on level crossing. Journal of Statistical Planning and Inference, 124, 45-62.
[2] Silverman, B. W. (1996). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Key words and phrases. Mean square integrated error, efficiency, multivariate level crossing sample func-
tion, nonparametric kernel density estimator, order statistics.
Mathematics Subject Classification. 62G07, 62E20
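For reference, the classical unweighted bivariate product-kernel estimator against which the WBKDE is compared can be sketched as below (the weighted version modifies each observation's contribution via the level-crossing function of [1], which is not reproduced here).

```python
import numpy as np

def bivariate_kde(x_pts, y_pts, data, h1, h2):
    """Classical bivariate product-kernel density estimator with Gaussian
    kernels:
        fhat(x, y) = (1/(n*h1*h2)) * sum_i K((x-X_i)/h1) * K((y-Y_i)/h2).
    data is an (n, 2) array; returns fhat at the paired points (x, y)."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    X, Y = data[:, 0], data[:, 1]
    x = np.asarray(x_pts, dtype=float)[:, None]
    y = np.asarray(y_pts, dtype=float)[:, None]
    return (K((x - X) / h1) * K((y - Y) / h2)).mean(axis=1) / (h1 * h2)

rng = np.random.default_rng(4)
sample = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=400)
print(bivariate_kde([0.0, 1.0], [0.0, -1.0], sample, h1=0.4, h2=0.4))
```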
Schur-power Discrimination Theory for Orthogonal Designs with Generalized Minimum Aberration
Aijun Zhang
Department of Statistics, University of Michigan, USA
The majorization framework for fractional factorial designs is a two-stage investigation scheme based
upon pairwise coincidences among experimental runs and their Schur-convex functions. Weak equidistant designs are argued by Zhang et al. (2005; Ann. Statist., to appear) to be universally optimum in the sense of majorization. In this talk we concentrate on orthogonal designs, introduce the mean and variance of the pairwise coincidences to analyze the existence of a weak equidistant benchmark, and develop a Schur-power discrimination theory. An equivalence is established between the Schur-power criteria and the state-of-the-art criterion of generalized minimum aberration. For both criteria, we derive lower and upper bounds via a constrained optimization approach, which extends Zhang et al. (2005) from resolution-II designs to resolution III and also lays down a foundation for the further study of orthogonal designs with higher resolution.
(Both authors are supported by NSERC Canada grants.)
Some Notes on the Business Survey
Yongguang Zhang
The Institute of Systems Science, Academia Sinica, China
In recent years the Chinese government has issued a business survey index each quarter. It is an important index for the running of the macro economy, and such indices are already used by many countries in the world. The construction of this index is not very difficult, but some problems remain for its proper use and understanding: for example, the design of the sampling questionnaires, how to weight the returned questionnaires, and how to explain the index with models such as the logistic model and the ordered probit model. In addition, we try to fit the index more precisely by using an arctangent model; the result is satisfactory.
Bayesian Inference for the Two-parameter Exponential Distribution under Type-II Double Censoring
Xuanmin Zhao and Yanlin Li
Department of Applied Mathematics, Northwestern Polytechnical University, China
In this paper, Bayesian estimates of the parameters and reliability indices of the two-parameter exponential distribution under Type-II double censoring are given, and prediction bounds for future observations are obtained using a Bayesian approach. Prediction intervals are derived for unobserved lifetimes in both one-sample and two-sample prediction based on Type-II doubly censored samples.
Key words and phrases. Type-II doubly censored, two-parameter exponential distribution, Bayesian esti-
mation, Bayesian prediction.
Empirical Likelihood for a Class of Functionals of the Survival Distribution with Censored Data
Ming Zheng, Sihua Li and Yi Yang
Department of Statistics, Fudan University, China
The empirical likelihood was first introduced by Owen in 1990 in the complete data case. Wang applied this method to a class of functionals of the survival function in the presence of censoring. In this paper, a generally adjusted empirical likelihood is defined. It is shown that the adjusted empirical likelihood also asymptotically follows a chi-square distribution. Simulation studies indicate that it may give better results than those of Wang.
Key words and phrases. Empirical likelihood, Kaplan-Meier estimate, censoring.
Marginal Hazard Models with Varying Coefficients for Multivariate Failure Time Data
Jianwen Cai
Department of Biostatistics, University of North Carolina, USA
Jianqing Fan
Department of Operations Research and Financial Engineering, Princeton University, USA
Haibo Zhou
Department of Biostatistics, University of North Carolina, USA
Yong Zhou
University of North Carolina, USA and Chinese Academy of Science, China
Statistical inference for the marginal hazard models with varying-coefficients for multivariate failure
time data is studied in this paper. A local pseudo-partial likelihood procedure is proposed for estimating
the unknown coefficients and the intercept function. A weighted average estimator is also proposed in
an attempt to improve the efficiency of the estimator. The consistency and the asymptotic normality
of the proposed estimators are established and the standard error formulas for the estimated coefficients
are derived and empirically tested. To reduce the computational burden of the maximum local pseudo-
partial likelihood estimator, a simple and useful one-step estimator is proposed. Statistical properties of the weighted average optimal estimator and the one-step estimator are established, and simulation studies are conducted to compare the performance of the one-step estimator to that of the maximum local pseudo-partial likelihood estimator. The results show that the one-step estimator saves computational cost without deteriorating performance, both asymptotically and empirically, and that the weighted average optimal estimator is more efficient than the maximum local pseudo-partial likelihood estimator. A data set from the Busselton Population Health Surveys is analyzed to illustrate our proposed methodology.
Key words and phrases. Local pseudo-partial likelihood, marginal hazard model, martingale, multivariate
failure time, one-step estimator, varying coefficients.
(This research was partially supported by NIH grant R01 HL69720. Fan's research was also partially supported by NSF grant DMS-0355174 and by RGC grant CUHK4262/01P of HKSAR. Y. Zhou's research was done at the University of North Carolina at Chapel Hill while on leave from the Chinese Academy of Sciences, Beijing, and was also partially supported by the Fund of National Natural Science of China (No. 10171103). The authors thank Dr. Matthew Knuiman and the Busselton Population Medical Research Foundation in Western Australia for providing the data used in the illustration.)
Using SPSS as a Powerful Tool to Teach Probability Better
Yu Zhu
Department of Statistics, Xi’an University of Finance & Economics, China
This paper aims to convey, through workable examples, a practical and important idea for teaching probability better. Other statistical software may work as well as SPSS, a widespread statistical package normally used for data editing and data analysis. Because of its nice graph-drawing features, however, SPSS can also be used to facilitate the teaching of a probability course, especially when teaching probability to students at a relatively elementary level, such as in professional training. For such teaching, the core of the probability course is not the theoretical aspect; rather, it is the understanding of the basic concepts that matters. In general, a visual display is much more impressive and understandable than a conceptual explanation, and years of experience tell me that this is also true of probability teaching. For example, a pdf cannot be easily understood from its definition alone, but can be easily understood via a graphical or visual display. A rough pdf graph can be drawn by teachers on the board with a piece of chalk, or on paper with a pencil, but it is always imprecise and can sometimes even be wrong. To avoid this nuisance, one can use the computer to draw precise pdfs to facilitate the teaching. More importantly, if we teach this drawing approach to the students, their homework assignments can be done more effectively, and the relevant concepts of probability theory will stay longer in their minds.
Key words and phrases. Probability teaching, graph, pdf.
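As an illustrative sketch of the same graphical idea (shown in Python purely for concreteness; the talk itself uses SPSS), precise pdf curves can be drawn in a few lines:

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw exact normal pdfs for two parameter settings, so that students can
# see how the mean shifts the curve and the standard deviation flattens it.
def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-6, 6, 400)
plt.plot(x, normal_pdf(x, 0, 1), label="N(0, 1)")
plt.plot(x, normal_pdf(x, 1, 2), label="N(1, 4)")
plt.xlabel("x")
plt.ylabel("density")
plt.title("Exact normal pdfs")
plt.legend()
plt.show()
```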