To save our environment, you are advised not to print these abstracts out, because every participant will get a hard copy of the abstracts in the conference bag.


Plenary Talks

Testing Multinormality in Two-level Structural Equation Models

Peter M. Bentler

Department of Psychology & Statistics, University of California, Los Angeles, USA

[email protected]

Jiajuan Liang

Department of Quantitative Analysis, University of New Haven, USA

[email protected]

Multinormality is a common assumption in the maximum likelihood analysis of two-level structural equation models (Bentler, Liang & Yuan, 2005). In these models, the independence condition on level-1 observations is no longer satisfied. As a result, existing statistics for testing multinormality based on independent observations cannot be directly used in two-level structural equation models. In this talk we will tackle this problem by constructing three types of necessary tests: 1) tests based on the theory of spherical and spherical matrix distributions (Fang, Kotz & Ng, 1990; Fang & Zhang, 1990), where a series of necessary tests and a graphical method with a balanced design are constructed by using the same techniques as in Fang, Li and Liang (1998) and Liang, Li, Fang and Fang (2000); 2) tests based on extended Mardia's skewness and kurtosis statistics with missing data (Yuan, Lambert & Fouladi, 2004); and 3) tests based on imputation of independent missing data (Tan, Fang, Tian & Wei, 2004). These necessary tests can be applied without requiring a large level-1 or level-2 sample size. Monte Carlo studies are carried out to demonstrate the performance of the proposed tests in terms of controlling type I error rates, the power against a departure from normality for level-1 variables, and the power against a departure from normality for level-2 variables. An application of these tests to a practical data set is illustrated.
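
For readers who want a concrete point of reference, the sketch below computes the classical Mardia skewness and kurtosis statistics for i.i.d. multivariate data in Python; it is only the standard single-level, complete-data version, not the two-level extension with missing data developed in the talk.

    import numpy as np
    from scipy import stats

    def mardia_tests(X):
        """Classical Mardia multivariate skewness/kurtosis tests for i.i.d. rows of X."""
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        S = Xc.T @ Xc / n                       # (biased) sample covariance
        Sinv = np.linalg.inv(S)
        G = Xc @ Sinv @ Xc.T                    # matrix of Mahalanobis cross-products
        b1 = (G ** 3).sum() / n ** 2            # multivariate skewness
        b2 = (np.diag(G) ** 2).mean()           # multivariate kurtosis
        skew_stat = n * b1 / 6                  # ~ chi^2 with p(p+1)(p+2)/6 df under normality
        df = p * (p + 1) * (p + 2) / 6
        kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # ~ N(0,1)
        return {"skewness": (skew_stat, 1 - stats.chi2.cdf(skew_stat, df)),
                "kurtosis": (kurt_stat, 2 * (1 - stats.norm.cdf(abs(kurt_stat))))}

    rng = np.random.default_rng(0)
    print(mardia_tests(rng.standard_normal((200, 4))))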

Projection Properties of Factorial Designs for Factor Screening

Ching-Shui Cheng

Academia Sinica, Taiwan and University of California, Berkeley, USA

[email protected]

The projection of a factorial design onto a subset of factors is the subdesign consisting of the given subset of factors (or, equivalently, the subdesign obtained by deleting the complementary set of factors). A factor-screening design with good projections onto small subsets of factors can provide useful information when a small number of active factors have been identified. I shall give a review of projection properties of factorial designs, in particular those of nonregular designs with complex aliasing.
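
As an elementary illustration of the projection idea (not taken from the talk), the following sketch builds a 2^(4-1) fractional factorial with generator D = ABC and projects it onto three of the factors; the projection turns out to be a complete 2^3 factorial.

    import itertools
    import numpy as np

    # 8-run 2^(4-1) fractional factorial in factors A, B, C with generator D = A*B*C
    base = np.array(list(itertools.product([-1, 1], repeat=3)))   # full 2^3 in A, B, C
    design = np.column_stack([base, base[:, 0] * base[:, 1] * base[:, 2]])  # append D

    # Projection onto the subset {A, B, D}: keep only those columns
    projection = design[:, [0, 1, 3]]

    # Every one of the 2^3 level combinations appears exactly once,
    # so the projection is a full factorial in the three chosen factors.
    print(np.unique(projection, axis=0).shape[0])   # -> 8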


Bayesian Networks for Forensic DNA Identification

Philip Dawid

Department of Statistical Science, University College London, United Kingdom

[email protected]

Problems of forensic identification from DNA profile evidence can become extremely challenging, both logically and computationally, in the presence of such complicating features as missing data on individuals, mixed trace evidence, mutation, silent alleles, laboratory and handling errors, and so on. In recent years it has been shown how Bayesian networks can be used to represent and solve such problems.

"Object-oriented" Bayesian network systems, such as Hugin version 6, allow a network to contain repeated instances of other networks. This architecture proves particularly natural and useful for genetic problems, where there is repetition of such basic structures as Mendelian inheritance or mutation processes.

I will describe a "construction set" of fundamental networks that can be pieced together, as required, to represent and solve a wide variety of problems arising in forensic genetics. Some examples of their use will be provided.

Joint work with Julia Mortera and Paola Vicard.

Statistical Foundation: Then and Now

Jianqing Fan

Department of Operations Research and Financial Engineering, Princeton University, USA

[email protected]

The theory on the distribution of maximum likelihood ratios is fundamental and indispensable to classical parametric inference. Despite their success in parametric inference, maximum likelihood ratio statistics might not exist in the nonparametric function estimation setting. Even if they exist, they may be hard to find and may not be optimal. Generalized likelihood statistics will be introduced to overcome these drawbacks. A new Wilks phenomenon is unveiled in infinite-dimensional parameter spaces. We demonstrate that the generalized likelihood ratio (GLR) statistics are asymptotically distribution free and follow χ^2-distributions for a number of testing problems and a variety of useful semiparametric and nonparametric models. These include the Gaussian white noise model, nonparametric regression models, varying coefficient models, additive models and partially linear varying-coefficient models. We further demonstrate that generalized likelihood ratio statistics are asymptotically optimal in the sense that they achieve the optimal rates of convergence given by Ingster (1993). They can even be adaptively optimal in the sense of Spokoiny (1996). Issues of bias reduction will be addressed. The talk is based on a series of recent papers by my collaborators, Chunming Zhang, Jian Zhang, Jiangcheng Jiang and Tao Huang.


PCA for FDA

Peter Hall

Centre for Mathematics and its Applications, Australian National University, Australia

[email protected]

Principal components analysis (PCA) is arguably a more important tool for functional data analysis (FDA) than it is when one is analysing vector-valued data, since the distribution of functional data cannot readily be treated in its full, infinite-dimensional context. For example, even simple linear regression in functional data analysis requires dimension reduction, and there, principal components analysis can help determine both the "angles" of projection and the number of projections used. In this talk we shall discuss some of the theoretical issues that arise in principal components analysis for functional data, and some of the methodology to which PCA for FDA leads.
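
To make the dimension-reduction role of PCA in FDA concrete, here is a minimal sketch, not drawn from the talk, of functional PCA for curves observed on a common grid: the discretized sample covariance is eigendecomposed and each curve is summarized by a few principal component scores.

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.linspace(0, 1, 101)                       # common observation grid
    n = 50
    # Simulated functional data: two random harmonics plus noise, one curve per row
    curves = (rng.standard_normal((n, 1)) * np.sin(2 * np.pi * t)
              + 0.5 * rng.standard_normal((n, 1)) * np.cos(2 * np.pi * t)
              + 0.05 * rng.standard_normal((n, t.size)))

    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    cov = centered.T @ centered / n                  # discretized covariance operator
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    k = 2                                            # retain two principal directions
    scores = centered @ eigvecs[:, :k]               # PC scores: low-dimensional summary
    explained = eigvals[:k].sum() / eigvals.sum()
    print(f"variance explained by {k} components: {explained:.3f}")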

Kai-Tai Fang’s Contributions to Quasi-Monte Carlo Methods

Fred J. Hickernell

Hong Kong Baptist University, Hong Kong and Illinois Institute of Technology, USA

[email protected]

Prof. Kai-Tai Fang’s contributions to quasi-Monte Carlo (or number-theoretic) methods have

been groundbreaking, opening new areas of fruitful research. Quasi-Monte Carlo methods use

evenly distributed, often deterministic, points instead of simple random points. Such points are

called low discrepancy points. In the 1980s Prof. Fang together with Prof. Yuan Wang proposed

using low discrepancy points for experimental design. Previously such points had been used pri-

marily for evaluating high dimensional integrals. Prof. Fang went on to explore various uses of

low discrepancy points for solving statistical problems. These were explained in the monograph

he co-authored with Prof. Wang. Prof. Fang and his collaborators developed many methods for

constructing low discrepancy sets of small size. These included both computational methods and

lower bounds that could be used to verify when those methods obtained the actual optima. This

talk provides an overview of Prof. Fang’s work on quasi-Monte Carlo methods.

Fang Kai-Tai: A Life of a Statistician

Dennis Lin

Department of Statistics, Pennsylvania State University, USA

[email protected]

Kai-Tai Fang was born into a poor family and received very little statistical education in his early career, and yet he became one of the most internationally influential statisticians. Academia typically evaluates its faculty on three components: teaching, research and service. From my personal view, KT Fang is simply the best with these three components combined. This talk attempts to review the history of Fang's life before his first retirement in 2005. Hopefully, we will be able to generate some general observations on the making of a successful statistician to be shared with young statisticians. I thank the program organizer for providing such a great opportunity for presenting my very primitive findings.


Sometimes it is Possible to Reduce Both Variance and Bias: The Multi-process Parallel Antithetic Coupling For Backward and Forward Markov Chain Monte Carlo

Xiao-Li Meng

Department of Statistics, Harvard University, USA

[email protected]

This talk is based on Craiu and Meng (2005, The Annals of Statistics), which has the following abstract:

"Antithetic coupling is a general stratification strategy for reducing Monte Carlo variance without increasing the simulation size. The use of the antithetic principle in the Monte Carlo literature typically employs two strata via antithetic quantile coupling. We demonstrate here that further stratification, obtained by using k > 2 (e.g., k = 3-10) antithetically coupled variates, can offer substantial additional gain in Monte Carlo efficiency, in terms of both variance and bias. The reason for reduced bias is that antithetically coupled chains can provide a more dispersed search of the state space than multiple independent chains. The emerging area of perfect simulation provides a perfect setting for implementing the k-process parallel antithetic coupling for MCMC because, without antithetic coupling, this class of methods delivers genuine independent draws. Furthermore, antithetic backward coupling provides a very convenient theoretical tool for investigating antithetic forward coupling. However, the generation of k > 2 antithetic variates that are negatively associated, i.e., they preserve negative correlation under monotone transformations, and extremely antithetic, i.e., they are as negatively correlated as possible, is more complicated compared to the case with k = 2. In this paper, we establish a theoretical framework for investigating such issues. Among the generating methods that we compare, Latin hypercube sampling and its iterative extension appear to be general-purpose choices, making another direct link between Monte Carlo and Quasi Monte Carlo."
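
As a minimal numerical illustration of antithetic quantile coupling (the k = 2 case only, with plain i.i.d. draws rather than the coupled Markov chains of the paper), the sketch below estimates E[g(X)] for a monotone g and compares the variance of the antithetic estimator with ordinary Monte Carlo at the same simulation size.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    g = np.exp                      # a monotone functional; E[g(X)] = exp(0.5) for X ~ N(0,1)

    def plain_mc(n):
        u = rng.random(n)
        return g(stats.norm.ppf(u)).mean()

    def antithetic_mc(n):
        # k = 2 antithetic quantile coupling: pair each U with 1 - U
        u = rng.random(n // 2)
        x, x_anti = stats.norm.ppf(u), stats.norm.ppf(1 - u)
        return 0.5 * (g(x) + g(x_anti)).mean()

    reps = 500
    plain = np.array([plain_mc(1000) for _ in range(reps)])
    anti = np.array([antithetic_mc(1000) for _ in range(reps)])
    print("true value     :", np.exp(0.5))
    print("plain MC var   :", plain.var())
    print("antithetic var :", anti.var())   # smaller, at the same total simulation size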

Optimal Factorial Designs for cDNA Microarray Experiments

Tathagata Banerjee

University of Calcutta, India

Rahul Mukerjee

Indian Institute of Management Calcutta, India

[email protected]

Optimal design of cDNA microarray experiments is an area of enormous potential that has started opening up only very recently. Although these experiments are structurally similar to classical paired comparison experiments, the two can have quite different objects of principal interest. Hence optimality results from the latter do not routinely carry forward to the former. The present paper addresses the design problem for microarrays when the treatments have a factorial structure, via deployment of tools like approximate theory, Kronecker representation and unimodularity.

We begin by presenting an outline of cDNA microarray experiments. Next, analytical results for the 2^2 factorial are obtained using the approximate theory. Thereafter, we consider general factorials, obtain an exact result on optimal saturated designs and also study nearly saturated cases. The situation where the underlying model includes dye-color effects is also considered and the role of dye-swapping is rigorously investigated. The paper ends with a discussion of some open issues.


Uniform Point Sets and Their Applications

Harald Niederreiter

Department of Mathematics, National University of Singapore, Singapore

[email protected]

Uniform point sets in probability spaces were introduced by the speaker in 2003. They are finite point sets in a given probability space (X, B, µ) with a uniformity property relative to a family of µ-measurable subsets of X. Uniform point sets have since found applications in areas such as numerical integration, computer graphics, and computational finance. The talk will present this work in a broad framework that will link it with related themes.

Probability Models in Reliability and Survival Analysis

Ingram Olkin

Department of Statistics, Stanford University, USA

[email protected]

Nonnegative random variables arise naturally in a wide variety of applications, in particular as life-lengths of devices or of biological organisms. By contrast, the normal distribution, which has long played a central role in statistics, allows the corresponding random variables to take on all real values. For nonnegative random variables, as arise in reliability and survival analysis, there is no distribution as pervasive as the normal distribution, with its foundation in the central limit theorem.

Several approaches are in common use for the analysis of data: distribution-free methods, qualitatively conditioned methods, semiparametric methods, and parametric methods.

Historically, methods of fitting were confined to distribution functions and density functions under the rubric of "curve fitting". Here the names of Pearson, Charlier, Edgeworth, Kapteyn, Thiele, and others come to mind.

In this survey we describe characteristics of distributions that may serve in deciding on a model to be used in data analysis. In particular we discuss the behavior of other definitions of distributions: the hazard rate, hazard function, residual life distribution, mean residual life, odds ratio, and inverse distribution function, among others.

An important aspect of distributions is whether some ordering holds, and here we review several orderings. The discussion of nonparametric families includes descriptions and implications of increasing failure rate families and new-better-than-used families, among others. Semiparametric families include those for which a parameter is included, and here we discuss how such families are ordered.

This is joint work with Albert W. Marshall.


Professor Fang’s Contribution to Multivariate Analysis

Dietrich von Rosen

Department of Biometry and Engineering, Swedish University of Agricultural Sciences, Sweden

[email protected]

The talk will briefly consider Professor Fang's very broad contribution to multivariate statistics. In particular we focus on multivariate distribution theory (elliptical distribution theory, copulas, among others) and multivariate linear models (growth curve models). Some connections to high dimensional multivariate analysis will be shown. Moreover, we also present a new application of elliptical distribution theory to the approximation of maximum likelihood estimators in the classical Growth Curve model.

Key words and phrases. Multivariate analysis, elliptical distributions, growth curve model, distribution approximation.

Statistical Methods for the Design and Analysis of Xenograft Experiments: Uniform Design and Constrained Parameter Models

Ming T. Tan, Hong-Bin Fang and Guo-Liang Tian

Division of Biostatistics, University of Maryland Greenebaum Cancer Center, USA

[email protected]

A xenograft model typically refers to a mouse model bearing human cancer, in which human tumor tissues (e.g., sliced tissue blocks or tumor cells) are grown in mice. In cancer drug development, demonstrated anti-tumor activity in this model is an important step in bringing a promising compound to humans. The key outcome variable is tumor volume measured over a period of time. Since cancer therapy typically involves combinations of several drugs with the goal of achieving greater efficacy with less toxicity, combination studies need to be optimally designed, so that with a moderate sample size the joint action of two drugs can be estimated and the best combinations identified at reasonable cost. Since we typically do not have enough information on the joint action of the two compounds before the experiment, we propose a novel nonparametric model that does not impose strong assumptions on the joint action. We then propose an experimental design for testing joint action using a uniform measure. Statistical analysis of these experiments typically involves incomplete data, since a mouse may die during the experiment or may be sacrificed when its tumor becomes unbearable, or the tumor volume may fall below the detectable limit. In addition, if no treatment were given to the tumor-bearing mice, the tumors would keep growing until the mice die or are sacrificed, thus resulting in some regression coefficients being constrained. We develop a maximum likelihood method based on an EM-type algorithm to estimate the dose-response relationship while accounting for the informative censoring and the constraints on the model parameters. We illustrate the current methods of experimental design and data analysis with a study on two new drugs.


Optimal Designs for Fourier Regression: Some Dynamical System Constructions

Henry Wynn

Department of Statistics, London School of Economics, United Kingdom

[email protected]

The work on optimal design for Fourier regression in the joint paper (with R. Schwabe and E. Riccomagno, Annals of Statistics, 1997), and other papers, is revisited. This is a kind of multidimensional Nyquist sampling theory in which there is a trade-off between model complexity and a generalised type of sampling frequency. As the model complexity increases, more complex sampling patterns are required. The basic type of design is an integer lattice of the kind pioneered by Professor Fang. The problem lies in the special choice of generators. At a certain point in the previous work a pattern similar to the Cantor set construction was discovered. This is the starting point for the present work, in an attempt to link the design problem to certain problems in the construction of optimal designs using dynamical systems. New tools are used, in particular methods of symbolic computation, to obtain solutions. Ad hoc methods can also be used, which can then be seen to give rise to special sequences.

Censoring and Truncation in Neutron Lifetime Estimation

Grace L. Yang

Department of Mathematics, University of Maryland, USA

[email protected]

In an international effort, a team of researchers demonstrated at the NIST Cold Neutron Research Facility for the first time in 1999 that they could confine ultra cold neutrons in a three-dimensional magnetic trap filled with liquid helium. This technical breakthrough gives a new way to make more accurate measurements of the neutron lifetime and to answer other questions fundamental to physics and astrophysics. Since the demonstration, an experimental protocol for data collection has been under development. The experiment consists of two stages. In the first stage, ultra cold neutrons are generated and confined by the magnetic trap. In the second stage, the decays of the trapped neutrons are recorded for analysis. Sophisticated as the experiment is, the data it collects are nevertheless incomplete. We do not know: (a) how many neutrons are captured in the magnetic trap, (b) the birth time of each trapped neutron, (c) how many of them have decayed during the trapping period, and (d) how many have not yet decayed so that their decays can be detected during the observation period. Furthermore, a recorded signal can be either a true decay time or false background noise. As a result, an observation can be censored, truncated or a false signal. Under these conditions, we use a birth and death process to model the lifetime of a neutron. From the model, likelihoods are constructed for estimation. Our unified approach is applicable to other similar two-stage experiments for studying radioactive decay processes. Comparison is made with the usual censoring model and cross-sectional sampling in biostatistics. Some open problems are introduced.


Invited Talks

Mining Course Scheduling Data with Markov Chain

Fengshan Bai

Department of Mathematical Sciences, Tsinghua University, China

[email protected]

Course scheduling is one of the most classical timetabling problems in optimization. It is, however, NP-complete, and hence it is hard to obtain the globally optimal solution.

An introduction to PageRank, Google's way of deciding a page's importance, is given first in this talk. Google figures that when one page links to another page, it is effectively casting a vote for the other page. This matters because it is one of the factors that determine a page's ranking in the search results. CourseRank, which is similar to PageRank, is then introduced by using a Markov chain. Computational results show that CourseRank yields a ranking of courses and is useful in mining course scheduling data.
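
For readers unfamiliar with PageRank, here is a minimal sketch of the standard power-iteration computation on a toy link graph; the CourseRank adaptation described in the talk is not reproduced.

    import numpy as np

    # Link graph: links[i] lists the pages that page i links to (a toy example)
    links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    n = len(links)
    damping = 0.85

    # Column-stochastic transition matrix of the random-surfer Markov chain
    M = np.zeros((n, n))
    for i, outs in links.items():
        for j in outs:
            M[j, i] = 1.0 / len(outs)

    rank = np.full(n, 1.0 / n)
    for _ in range(100):                       # power iteration
        rank = (1 - damping) / n + damping * M @ rank

    print(np.argsort(rank)[::-1])              # pages ordered by importance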

Development of the Pearl River Delta Based on Statistical Data: Its Past, Present and Future

Xinmin Bu

Guangdong Provincial Bureau of Statistics, China

Over the 26 years of reform and opening-up, the Pearl River Delta in Guangdong has gone through a hard-pioneering and brilliantly developing period. At the beginning of the reform and opening-up, the Pearl River Delta did not play a prominent role in Guangdong's economy. Over the 26-year reform and opening-up, the Pearl River Delta has realized its economic take-off, with industrialization playing the dominant role, the export-oriented economy performing strongly, urbanization improving rapidly and informatization off to a good start. Its industrial competitiveness is growing. The Pearl River Delta is not only the front-runner and backbone of Guangdong but also one of the most dynamic powerhouses in China. In 2004, the GDP, foreign direct investment actually utilized and total exports reached 1,357.5 billion yuan, 9.02 billion US dollars and 182.43 billion US dollars, accounting for 75.0 percent, 90.0 percent and 95.2 percent of the provincial total, or 9.9 percent, 14.9 percent and 30.7 percent of the national total.

At present, economic development in the Pearl River Delta is dynamic and continues to take the leading role in China. In 2004, its GDP growth rate was 16.0 percent. The main features of development include rapid economic growth, fast development of high-tech industry, the preliminary taking shape of a base of IT and household electrical appliance enterprises, a fairly high level of economic globalization, a front-runner position in market-oriented reform, tremendous development of urbanization in step with the fast-growing economy, and outstanding industrial competitiveness.

Looking forward to the future, Guangdong, Hong Kong and Macao will stand closer together. With their differentiated and complementary economic developments, wide cooperation is expected in the days to come. It is believed that our future will be better with the common efforts of the three sides.


Nonparametric Modeling for Conditional Capital Asset Pricing Models

Zongwu Cai

Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA

[email protected]

In this paper, we study two classes of nonparametric capital asset pricing models. First, we consider a unified nonparametric econometric model for the time-varying betas to capture the time variation in the market betas, by allowing the betas to change over time or to be a function of some state variables. A local linear approach is developed to estimate the functional betas, and the asymptotic properties of the proposed estimators are established without specifying the error distribution. Also, a simple nonparametric version of the bootstrap test is adopted for testing misspecification. Secondly, we investigate a general nonparametric asset pricing model to avoid functional form misspecification of betas, risk premia, and the stochastic discount factor. To estimate the nonparametric functionals, we propose a new nonparametric estimation procedure, termed the nonparametric generalized method of moments (NPGMM), which combines local linear fitting and the generalized method of moments, and we establish the asymptotic properties of the resulting estimators. An efficient and feasible estimation procedure is suggested and its asymptotic behavior is studied. Finally, finite sample properties of the proposed estimators are investigated by Monte Carlo simulations and empirical examples.
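
To give a flavour of the local linear approach to time-varying betas (a generic sketch under simplifying assumptions, not the authors' exact estimator or test), the code below estimates beta(t) in r_t = beta(t) m_t + e_t by kernel-weighted least squares with a local linear expansion of beta around each target time point.

    import numpy as np

    rng = np.random.default_rng(3)
    T = 400
    time = np.arange(T) / T
    market = rng.standard_normal(T)                       # market excess returns
    beta_true = 0.5 + np.sin(2 * np.pi * time)            # slowly varying true beta
    asset = beta_true * market + 0.3 * rng.standard_normal(T)

    def local_linear_beta(t0, h=0.1):
        """Estimate beta(t0) by kernel-weighted LS with beta(t) ~ b0 + b1*(t - t0) locally."""
        u = (time - t0) / h
        w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov kernel weights
        X = np.column_stack([market, market * (time - t0)])     # regressors for (b0, b1)
        W = np.diag(w)
        coef = np.linalg.solve(X.T @ W @ X, X.T @ W @ asset)
        return coef[0]                                          # local level b0 = beta(t0)

    grid = np.linspace(0.1, 0.9, 9)
    print(np.round([local_linear_beta(t) for t in grid], 2))
    print(np.round(0.5 + np.sin(2 * np.pi * grid), 2))          # compare with the true beta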

Orthogonal Arrays of 2- and 3-levels for Lean Designs

Ling-Yau Chan

Department of Industrial and Manufacturing Systems Engineering, University of Hong Kong, Hong Kong

[email protected]

Chang-Xing Ma

Department of Statistics, University of Florida, USA

[email protected]

When an orthogonal array (OA) of n rows is used as the design matrix for an experiment, n is the number of runs for the experiment. In an OA of q levels, n is an integer multiple of q^2. In an experiment, if the number of runs cannot be set exactly equal to the number of rows of an OA because of constraints on resources or other reasons, the experimenter may use a design matrix formed by omitting some rows of an OA. If such a design matrix is used, the number of observed responses obtained may not be enough for estimation of all the effects corresponding to columns of the orthogonal array. A lean design is a design matrix formed by deleting some rows and columns of an OA which still allows efficient estimation of the effects of the factors corresponding to the remaining columns of the OA. The authors shall discuss lean designs of 2 and 3 levels, and provide D-optimal OAs from which lean designs can be formed.


Asymptotics for Estimation in a Partly Linear Errors-in-Variables Regression Model with Time Series Errors

Min Chen and Dong Li

Academy of Mathematics and Systems Science, CAS, China

[email protected]

In this talk, we study the asymptotics of some estimators for a partly linear errors-in-variables regression model with time series errors. Estimators of the model parameters, of the autocovariance and autocorrelation functions, and of the smooth function are derived by using the nearest-neighbor generalized least squares method. Under a set of weaker conditions, the strong consistency and asymptotic normality of these estimators are obtained. It is shown that the estimator of the smooth function achieves the optimal rate of convergence.

Obtaining O(N^{-2+ε}) Convergence for Quadrature Rules Based on Digital Nets

Josef Dick

School of Mathematics, University of New South Wales, Australia

[email protected]

This is joint work with Friedrich Pillichshammer and Ligia-Loretta Cristea.

We are interested in approximating a high dimensional integral over the unit cube, ∫_{[0,1]^s} f(x) dx, by a quadrature rule of the form (1/N) ∑_{k=1}^{N} f(x_k), where x_1, ..., x_N ∈ [0,1]^s are the quadrature points.

It has been shown that for functions lying in a Sobolev space of absolutely continuous, once differentiable functions we obtain a convergence rate of O(N^{-1+ε}) using, for example, randomly shifted lattice rules or randomly digitally shifted digital nets.

If we impose stronger smoothness conditions on the functions, Hickernell showed that, using the baker's transformation in conjunction with randomly shifted lattice rules, we can obtain a convergence rate of O(N^{-2+ε}). No such results have until now been known for digital nets. On the other hand, Dick and Pillichshammer showed that digital nets and lattice rules have many essential properties in common, and many of the results previously only known to hold for lattice rules have now been shown to also hold for digital nets in a similar fashion.

In this talk we show that the analogy also carries over to the use of the baker's transformation in conjunction with digital nets and a digital shift; that is, we also obtain a convergence rate of O(N^{-2+ε}) for digital nets which are randomly digitally shifted and then folded by the baker's transformation. Though the analysis is somewhat more involved, a computer implementation of this method is not. Furthermore, by using digital nets instead of lattice rules, one is not confined to using search algorithms for finding good underlying deterministic point sets.
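
The construction is easy to experiment with numerically. The sketch below is only a hedged illustration: it uses a randomly shifted rank-1 lattice rule folded by the baker's transformation (the setting of Hickernell's earlier result mentioned above) rather than the digitally shifted digital nets of the talk, and the generating vector is an arbitrary illustrative choice, not an optimized one.

    import numpy as np

    rng = np.random.default_rng(4)
    s, N = 2, 2**10
    z = np.array([1, 433])                  # illustrative generating vector for a rank-1 lattice

    def f(x):
        # smooth test integrand on [0,1]^s with known integral equal to 1
        return np.prod(np.pi / 2 * np.sin(np.pi * x), axis=1)

    k = np.arange(N)[:, None]
    lattice = (k * z[None, :] / N) % 1.0    # deterministic rank-1 lattice points in [0,1)^s

    estimates = []
    for _ in range(20):                     # independent random shifts for error estimation
        shift = rng.random(s)
        pts = (lattice + shift) % 1.0       # randomly shifted lattice
        folded = 1.0 - np.abs(2.0 * pts - 1.0)   # baker's (tent) transformation
        estimates.append(f(folded).mean())

    print(np.mean(estimates), np.std(estimates))   # close to the true integral 1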


Balanced Factorial Designs for cDNA Microarray Experiments

Sudhir Gupta

Division of Statistics, Northern Illinois University, USA

[email protected]

Balanced factorial designs are introduced for cDNA microarray experiments. Single-replicate designs obtained using the classical method of confounding are shown to be particularly useful for deriving suitable balanced designs for cDNA microarrays. Classical factorial designs obtained using methods other than the method of confounding are also shown to be useful. The paper provides a systematic method of deriving designs for microarray experiments, as opposed to the algorithmic and ad hoc methods given recently in the literature.

The Role of Official Statistics in the Development of the Pearl River Delta Region

Frederick W H Ho

Census and Statistics Department, HKSAR

[email protected]

The process of globalization, China's accession to the WTO, the implementation of the Closer Economic Partnership Arrangement (CEPA) and the ever-developing regional cooperation of territories in the Pan-Pearl River Delta (Pan-PRD) Region provide a platform for further strengthening the economic collaboration and development among Hong Kong, Macao and the Mainland. These new circumstances bring about both challenges and opportunities to statistics practitioners in measuring and analyzing social and economic phenomena, in order to meet an expected increase in demand for statistical information, in terms of both quantity and variety. They also entail the compilation of statistical information in a more timely manner, so as to capture as well as discern the rapidly changing and evolving economic conditions.

Owing to their close proximity, Guangdong, Hong Kong and Macao have long maintained very close contacts in the arena of statistical matters. For years, they have been making concerted efforts to broaden the scope and raise the level of cooperation. The regional cooperation of the Pan-PRD stimulates a freer flow of people, goods and capital, thus injecting new impetus into the closer partnership in statistical work of the Guangdong-Hong Kong-Macao Region. Building upon the existing foundation of cooperation, the statistical authorities of the Region can foster a more synergic development by leveraging their unique strengths, complementing each other's statistical endeavours while staggering individual focuses.

To consolidate cooperation among the statistical authorities of the Guangdong-Hong Kong-Macao Region, it is imperative that communication and coordination be deepened and broadened. Through strengthening the sharing of experiences gained and lessons learned, all can certainly enhance their knowledge and understanding of such aspects as the statistical systems, statistical techniques and statistical methods adopted by each other, which will in turn help further improve the availability and comparability of statistical data and raise the quality of statistical services of the respective authorities.


Affine α-resolvability of Group Divisible Designs

Sanpei Kageyama

Department of Mathematics Education, Hiroshima University, Japan

[email protected]

The concept of affine α-resolvability has been discussed for block designs in the literature since 1942 for α = 1 and, in particular, since 1963 for α ≥ 2. Among group divisible (GD) designs, affine α-resolvable designs are known for both classes of singular GD and semi-regular GD designs. However, no example has been found in the literature of an affine α-resolvable regular GD design. In this talk, the validity of such a concept will be disproved for regular GD designs in general. Thus regular GD designs do not possess the property of affine α-resolvability.

Large p Small n Asymptotics for Significance Analysis in High Throughput Screening

Michael R. Kosorok

Department of Statistics, University of Wisconsin-Madison, USA

[email protected]

We develop large p small n asymptotics suitable for significance analysis after normalization in microarray gene expression studies and other high throughput screening settings. We consider one-sample and two-sample comparisons where the number n of replications (arrays) per group is extremely small relative to the number p of items (genes). Provided the log of p squared divided by n goes to zero, we show under very general dependency structures that p-values based on a variety of marginal test statistics are uniformly valid in a manner which allows accurate control of the false discovery rate. We demonstrate the results with simulation studies and several real microarray studies. This is joint work with Shuangge Ma.
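
Since the abstract emphasizes that marginal p-values remain valid enough to control the false discovery rate, the following generic sketch applies the standard Benjamini-Hochberg step-up procedure (not the authors' own method) to marginal two-sample t-test p-values from simulated "large p, small n" data.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    p_genes, n_per_group = 2000, 4                  # "large p, small n"
    x = rng.standard_normal((p_genes, n_per_group))
    y = rng.standard_normal((p_genes, n_per_group))
    y[:100] += 2.0                                  # first 100 genes are truly differential

    # Marginal two-sample t-tests, one per gene
    pvals = stats.ttest_ind(x, y, axis=1).pvalue

    def benjamini_hochberg(p, q=0.05):
        """Return a boolean rejection mask controlling FDR at level q (BH step-up)."""
        m = len(p)
        order = np.argsort(p)
        thresholds = q * np.arange(1, m + 1) / m
        below = p[order] <= thresholds
        reject = np.zeros(m, dtype=bool)
        if below.any():
            k = np.max(np.nonzero(below)[0])        # largest index meeting its threshold
            reject[order[: k + 1]] = True
        return reject

    rejected = benjamini_hochberg(pvals, q=0.05)
    print("rejections:", rejected.sum(), "false rejections:", rejected[100:].sum())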


Search for Relevant Sets of Variables in a High-Dimensional Setup

Jurgen Lauter

Otto von Guericke University Magdeburg and Interdisciplinary Centre of Bioinformatics, University of Leipzig, Germany

[email protected]

Kai-Tai Fang has deserved well of multivariate analysis and, in particular, the theory of spherical distributions. The strategy of spherical distributions is a generalization of the classical multivariate approach. However, this spherical concept is also an effective tool for solving difficult problems of classical inference starting from the normality assumption.

In 1996, we developed the principle of spherical tests. We utilized the fact that linear combinations of the given normally distributed variables are left-spherically distributed under the null hypothesis, provided that the coefficients are defined as a function of the total sums-of-products matrix of the sample. This principle allows the testing of hypotheses even if the dimension p is huge and the sample size n is small. Such applications have arisen in recent years, for example, in gene expression analysis.

In the framework of spherical tests, multiple procedures for the recognition of relevant variables and relevant sets of variables can also be constructed. Kropf (2000), Hommel (2004) and Westfall, Kropf and Finos (2004) have proposed procedures to find significant single variables. The lecture to be presented will contain some extensions to sets of variables. The multivariate spherical tests are combined with the search for biologically interpretable structures in the data. Thus, groups of highly correlated variables allowing the rejection of the null hypothesis are detected. In the procedures, the emphasis lies on managing the large number of possible subsets of variables. Besides this parametric procedure, a corresponding non-parametric strategy based on the principles of Westfall and Young (1993) is presented.

Key words and phrases. Multiple procedure, model choice, spherical test, gene expression analysis.


Reduced Kernel on Support Vector Machines

Yuh-Jye Lee

Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taiwan

[email protected]

The reduced support vector machine (RSVM) was proposed with the practical objective of overcoming the computational difficulties as well as reducing the model complexity in generating a nonlinear separating surface for a massive data set. It has also been successfully applied to other kernel-based learning algorithms. In addition, experimental studies on RSVM have shown its efficiency. In this talk, we first present a study of the RSVM from the viewpoint of robust design in model building and consider the nonlinear separating surface as a mixture of kernels. The RSVM uses a compressed model representation instead of a saturated full model. Our main result shows that the uniform random selection of a reduced set to form the compressed model in RSVM is the optimal robust selection scheme in terms of the following criteria: (1) it minimizes an intrinsic model variation measure; (2) it minimizes the maximal model bias between the compressed model and the full model; (3) it maximizes the minimal test power in distinguishing the compressed model from the full model. In the second part of the talk, we propose a new algorithm, the Incremental Reduced Support Vector Machine (IRSVM). In contrast to the uniform random selection of a reduced set used in RSVM, IRSVM begins with an extremely small reduced set and incrementally expands the reduced set according to an information criterion. This information-criterion-based incremental selection can be achieved by solving a series of small least squares problems. In our approach, the size of the reduced set is determined automatically and dynamically rather than pre-specified. Experimental tests on four publicly available datasets from the University of California (UC) Irvine repository show that IRSVM uses a smaller reduced set than RSVM without sacrificing classification accuracy.
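
The core computational idea of RSVM, replacing the full n x n kernel matrix by a rectangular kernel evaluated against a small randomly chosen reduced set, can be sketched as follows; this is a generic RBF-kernel illustration, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(6)
    n, d, m = 5000, 10, 100                     # n training points, reduced set of size m << n
    X = rng.standard_normal((n, d))

    def rbf_kernel(A, B, gamma=0.1):
        """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    # The full kernel would be n x n (25,000,000 entries); the reduced kernel is only n x m.
    reduced_idx = rng.choice(n, size=m, replace=False)   # uniform random reduced set
    K_reduced = rbf_kernel(X, X[reduced_idx])
    print(K_reduced.shape)                                # (5000, 100)

    # A separating surface is then sought in the span of the m reduced-set kernel columns,
    # e.g. f(x) = sum_j alpha_j * k(x, x_j) + b, with x_j ranging over the reduced set only.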


Data Mining in Chemistry

Yizeng Liang

Research Center of Modernization of Traditional Chinese Medicines, Central South University, China

yizeng [email protected]

Huge amounts of chemical and biological data are being accumulated nowadays. How to mine out the rules and knowledge hidden in them is really an important task for chemists, especially for chemometricians. Here we report some results obtained in our research group in recent years, which show that data mining in chemistry will have a prosperous future. Seeking quantitative structure-activity relationships (QSAR) and quantitative structure-property relationships (QSPR) has long been a dream in chemistry. In this work, the projection pursuit technique proposed in statistics was utilized to do data mining on data of retention indices and structural descriptors. In order to find valuable information about the relationship, an algorithm based on projection pursuit was developed to search for a reasonable projection direction that reduces the dimension of the employed high-dimensional data and reveals the structure of the data. Samples of alkanes, alkenes and cycloalkanes have been studied, and it is found that when a good projection direction is used, compounds can be separated into different classes based on specific chemical structures, such as different numbers of carbon atoms in the molecules, different numbers of branches, the number and position of double bonds, the presence or absence of conjugated double bonds, and the number of rings. In order to build accurate regression models, the classification information obtained by projection pursuit was utilized to establish different models for different classes. With the help of the class distance, an excellent regression model was then obtained. Its estimation errors and prediction errors are all very small and within the measurement error level, which really gives some quite useful insight into why and how the chemical structure influences the retention behavior of the different molecules.

References

[1] Du YP, Liang YZ, Yun D, Data mining for seeking an accurate quantitative relationship between molecular structure and GC retention indices of alkenes by projection pursuit, Journal of Chemical Information and Computer Sciences 42 (6) (2002) 1283-1292.

[2] Du YP, Liang YZ, Wang WM, Studies on QSPR between GC retention indices and topological indices of cycloalkanes by using projection pursuit method, Chemical Journal of Chinese Universities-Chinese 24 (10) (2003) 1795-1797.

[3] Du YP, Liang YZ, Data mining for seeking accurate quantitative relationship between molecular structure and GC retention indices of alkanes by projection pursuit, Computational Biology and Chemistry 27 (3) (2003) 339-353.

[4] Hu QN, Liang YZ, Yin H, et al., Structural interpretation of the topological index. 2. The molecular connectivity index, the Kappa index, and the atom-type E-State index, Journal of Chemical Information and Computer Sciences 44 (4) (2004) 1193-1201.

[5] Hu QN, Liang YZ, Peng XL, et al., Structural interpretation of a topological index. 1. External factor variable connectivity index (EFVCI), Journal of Chemical Information and Computer Sciences 44 (2) (2004) 437-446.


Self-weighted LAD Estimation for Infinite Variance Autoregressive Models

Shiqing Ling

Department of Mathematics, Hong Kong University of Science and Technology, Hong

Kong

[email protected]

How to undertake statistical inference for infinite variance autoregressive models has been a long-standing open problem. In order to solve this problem, we propose a self-weighted LAD estimator and show that this estimator is asymptotically normal if the density of the errors and its derivative are uniformly bounded. Furthermore, a Wald test statistic is developed for the linear restriction on the parameters, and it is shown to have non-trivial local power.

Simulation experiments are carried out to assess the performance of the theory and method in finite samples and a real data example is given. The results in this paper are entirely different from results in the literature and should provide new insights for future research on heavy-tailed time series.
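
As a rough illustration of the self-weighting idea only (the weight function below is an ad hoc choice for an AR(1) model and is not claimed to be the weighting analyzed in the talk), a self-weighted LAD estimate minimizes a weighted sum of absolute residuals, down-weighting observations with extreme lagged values.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(7)
    T, phi_true = 2000, 0.5
    eps = rng.standard_t(df=1.5, size=T)          # heavy-tailed (infinite-variance) innovations
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi_true * y[t - 1] + eps[t]

    y_lag, y_now = y[:-1], y[1:]
    w = 1.0 / (1.0 + np.abs(y_lag)) ** 2          # illustrative self-weights: shrink extreme lags

    def weighted_lad_loss(phi):
        return np.sum(w * np.abs(y_now - phi * y_lag))

    phi_hat = minimize_scalar(weighted_lad_loss, bounds=(-0.99, 0.99), method="bounded").x
    print(round(phi_hat, 3))                       # close to 0.5 despite the infinite variance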

Toward the Mapping of Complex Human Disorders: Statistical Methods and Applications

Shaw-Hwa Lo

Department of Statistics, Columbia University, USA

[email protected]

The mapping of complex traits is one of the central areas of human genetics. Many common human disorders are believed to be "complex" or multifactorial, meaning that they cannot be attributed to alleles of a single gene or to one risk factor. Many genes and environmental factors contribute modest effects to a combined action in determining these traits. Despite a number of novel methods proposed during the past 20 years to detect the responsible genes, success has been largely restricted to simple Mendelian diseases. For common/complex human disorders, progress has been slow and results are limited. This is perhaps due in part to the need for capable statistical methods that accommodate large amounts of correlated genotypic and phenotypic data. Most current methods make use of marginal information only and fail to include information on the interaction among the disease loci. It is thus less likely for these methods to have adequate power to detect the responsible loci. Since interactive information among markers reflects the joint information on the traits due to multiple genes (and perhaps other risk factors), we believe that mapping methodologies able to simultaneously inspect disjoint marker loci (possibly on different chromosomes) are crucial for the success of future gene mapping. I shall introduce an alternative approach to address these difficulties. I will first review the methods using family-trio data and several disease models, in particular the backward haplotype transmission association (BHTA) algorithm proposed in Lo and Zheng (2002, 2004). Applications and findings using this approach from recent projects on large-scale datasets will be presented. Methods that are applicable to other types of data and designs will also be discussed. Time permitting, the issues of multiple comparisons and statistical significance for a large number of tests will be addressed.


Return Distribution and Risk Estimation: Some Empirical Evidence

Dietmar Maringer

Faculty of Economics, Politics, and Social Sciences, University of Erfurt, Germany

[email protected]

Value at risk (VaR) has become a standard measure of portfolio risk over the last decade. It even became one of the cornerstones of the Basel II accord on banks' equity requirements. Nevertheless, the practical application of the VaR concept suffers from two problems: how to estimate VaR, and how to optimize a portfolio for a given level of VaR? The optimization problem can be tackled using recent advances in heuristic optimization algorithms. For the estimation problem, several approaches have been suggested, including the use of parametric and empirical distributions; the former are computationally less demanding, whereas the latter are often seen to be more reliable. However, our application to bond portfolios shows that a solution to the two aforementioned problems gives rise to a third one: the actual VaR of bond portfolios optimized under a VaR constraint might exceed its nominal level to a large extent. Thus, optimizing bond portfolios under a VaR constraint might increase risk. This finding is of relevance not only for investors, but even more so for bank regulation authorities.
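
To fix ideas on the estimation side (a generic sketch, not the paper's bond-portfolio application), the code below computes a one-period 99% VaR for a simulated return series both empirically, as a historical quantile, and parametrically, under a normal approximation.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    # Simulated daily portfolio returns with fat tails (Student t), standing in for real data
    returns = 0.0002 + 0.01 * rng.standard_t(df=4, size=2500)

    alpha = 0.01                                    # 99% VaR

    # Empirical (historical) VaR: the loss level exceeded with probability alpha
    var_empirical = -np.quantile(returns, alpha)

    # Parametric VaR under a normal approximation to the return distribution
    mu, sigma = returns.mean(), returns.std(ddof=1)
    var_normal = -(mu + sigma * stats.norm.ppf(alpha))

    print(f"empirical 99% VaR: {var_empirical:.4f}")
    print(f"normal    99% VaR: {var_normal:.4f}")   # typically understates risk for fat tails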

Role of Official Statistics Under the Regional Cooperation Framework

Iun Lei Mok

Statistics and Census Service, Macao SAR

[email protected]

The regional economy, a widely discussed issue that has been attracting much attention in recent years, is indeed a leading trend of economic development worldwide. Around the globe, various forms of regional cooperation are meant to sharpen overall competitiveness. The basis of regional cooperation lies in the synergy of complementing one another to pursue economic growth. The formations of the Greater Pearl River Delta region, the Yangtze Delta region and the Pan-Pearl River Delta region, epitomes of the need to coordinate development between different parts of China, show a conducive move toward the promotion of regional cooperation.

To spur sustainable economic growth in Hong Kong and Macao, the Central Government has established the Closer Economic Partnership Arrangement (CEPA) with its two Special Administrative Regions. Furthermore, the signing of the "Pan-Pearl River Delta (9+2) Regional Cooperation Framework Agreement" will accelerate economic cooperation within the region.

Economic integration of the Pearl River Delta region has intensified closer economic ties between Guangdong, Hong Kong and Macao. In light of these new circumstances, official statistics compilers have to fine-tune the scope of data collection and other statistical endeavours, so as to provide timely, relevant and accurate information that serves as a reference for data users. This presentation will focus on the challenges and opportunities faced by the statistical offices and the directions of future cooperation.


Linear Regression in Case-cohort Studies: Theory and Numerical Aspects

Bin Nan

Department of Biostatistics, University of Michigan, USA

[email protected]

Menggang Yu and John D. Kalbfleisch

Right censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design, the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design, the subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analyzed. These methods are derived by modifying the linear rank tests and estimating equations that arise from full-cohort data, using methods that are similar to the "pseudo-likelihood" estimating equation that has been used in relative risk regression for these models. The estimates so obtained are shown to be consistent and asymptotically normal. When generalized Gehan-type weights are used, the estimating functions are shown to be monotone and a Newton-type iterative method is proposed to solve the estimating equations. Variance estimation and numerical illustrations are also provided.

Joint Modelling of Mean-Covariance Structures in Longitudinal Studies

Jianxin Pan

School of Mathematics, University of Manchester, United Kingdom

[email protected]

In the literature on longitudinal data analysis (LDA), modelling of the mean structure has been considered under a specification of the within-subject covariance structure. For example, compound symmetry and AR(1) are common choices among others. Alternatively, the covariance structure may be selected from a class of candidates according to certain information criteria like AIC or BIC. If, unfortunately, the true covariance structure is not contained in the class, the selected structure may not be close to the truth. Accordingly, misspecification of the covariance structure may severely compromise statistical inferences. Some approaches have been proposed to model the covariance structure, e.g., Chiu et al. (1996), but they may suffer from either no clear statistical interpretation or no guarantee of positive definiteness for the resulting covariance matrices.

In this talk I will give a brief review of recent developments that do not have the above problems. These include a) using an iteratively re-weighted least squares algorithm to update the parameter estimates in the new models, b) selecting a reasonable model when polynomials of time are used to model the mean-covariance structures (Pan and MacKenzie, 2003), c) modelling heterogeneity arising often in LDA, d) modelling conditional covariance structures in linear mixed models, and e) modelling the covariance structure in generalized estimating equations.

These approaches will be illustrated through analyses of real data and simulation studies. It is concluded that the covariance structures should be modelled together with the means, simultaneously.

References

[1] Chiu, T. Y. M., Leonard, T. & Tsui, K. W. (1996). The matrix-logarithm covariance model. Journal of the American Statistical Association, 91, 198-210.

[2] Pan, J. X. & MacKenzie, G. (2003). On modelling mean-covariance structures in longitudinal studies. Biometrika, 90, 239-244.

Key words and phrases. Covariance modelling, longitudinal data.


Comparison of Discrimination Methods for High Dimensional Data

M.S. Srivastava

University of Toronto, Canada

[email protected]

T. Kubokawa

University of Tokyo, Japan

Dudoit, Fridlyand, and Speed (2002) compare several discrimination methods for the classification of tumors using gene expression data. The comparison includes Fisher's (1936) linear discriminant analysis method (FLDA), the classification and regression tree (CART) method of Breiman et al. (1984), and the aggregating classifiers of Breiman (1996), which include the "bagging" methods of Friedman (1998) and the "boosting" method of Freund and Schapire (1997). The comparison also included two more methods, called the DQDA method and the DLDA method respectively. In the DLDA method, it is assumed that the population covariance matrices are not only diagonal but also all equal. However, among all the preceding methods considered by Dudoit, Fridlyand, and Speed (2002), only DLDA did well. While it is not possible to give reasons as to why the other methods did not perform well, the poor performance of the FLDA method may be due to the large dimension p of the data, even when the degrees of freedom associated with the sample covariance satisfy n > p. In large dimensions, the sample covariance matrix may become nearly singular with very small eigenvalues. For this reason, it may be reasonable to consider a version of the principal component method which is applicable even when p > n. Using the Moore-Penrose inverse, a general method based on a minimum distance rule is proposed. Another method, which uses an empirical Bayes estimate of the inverse of the covariance matrix, along with a variation of this method, is also proposed. We compare these three new methods with the DLDA method of Dudoit, Fridlyand, and Speed (2002).

Key words and phrases. Classification, discriminant analysis, minimum distance, Moore-Penrose inverse.
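
A bare-bones version of a minimum-distance rule that substitutes the Moore-Penrose inverse for the singular inverse sample covariance (a schematic reconstruction from the abstract, not the authors' exact classifier) can be written as follows for p > n data.

    import numpy as np

    rng = np.random.default_rng(9)
    p, n_per_class = 200, 15                         # p >> n: the sample covariance is singular
    mean_shift = np.zeros(p)
    mean_shift[:10] = 1.5                            # the two classes differ in 10 coordinates

    X0 = rng.standard_normal((n_per_class, p))
    X1 = rng.standard_normal((n_per_class, p)) + mean_shift
    X = np.vstack([X0, X1])
    labels = np.r_[np.zeros(n_per_class), np.ones(n_per_class)]

    means = np.array([X[labels == g].mean(axis=0) for g in (0, 1)])
    pooled = sum((X[labels == g] - means[g]).T @ (X[labels == g] - means[g])
                 for g in (0, 1)) / (len(X) - 2)
    S_pinv = np.linalg.pinv(pooled)                  # Moore-Penrose inverse replaces S^{-1}

    def classify(x):
        # assign x to the class with the smallest generalized (pseudo-)Mahalanobis distance
        d = [(x - m) @ S_pinv @ (x - m) for m in means]
        return int(np.argmin(d))

    test = rng.standard_normal((50, p)) + mean_shift  # new observations drawn from class 1
    print(np.mean([classify(x) for x in test]))       # fraction assigned to class 1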

Optimal and Efficient Crossover Designs when Subject Effects are Random

John Stufken

Department of Statistics, University of Georgia, USA

[email protected]

Crossover designs are often evaluated under the assumption that subject effects are fixed. One justification for this is that most information about treatment comparisons is based on within-subject information. But how efficient are designs that are optimal for fixed subject effects when the subject effects are really random? Which designs are optimal when subject effects are really random, and how does this change with the size of the subject-effects variance relative to the size of the random-error variance? We investigate these questions in the presence of carry-over effects for the situation where the number of periods is at most equal to the number of treatments.


A Semiparametric Partly Linear Model for Censored Survival Data

Gang Li

Department of Biostatistics, University of California, Los Angeles, USA

[email protected]

Qihua Wang

Academy of Mathematics and Systems Science, Chinese Academy of Science, China

[email protected]

This article studies a semiparametric partly linear model for regression analysis of right censored data. The model postulates that the mean regression function is the sum of a linear part and a completely unspecified nonlinear component, and that the error distribution is unknown. An iterative estimation procedure is proposed to estimate the regression coefficients of the linear part and the nonlinear function. The partly linear model allows one to study the effects of certain covariates that are of primary interest, while imposing minimal assumptions on other independent variables. The nonlinear component can also be used to check the linearity assumption for a covariate and further suggest more parsimonious parametric models. Some numerical studies are conducted to evaluate the performance of the proposed estimators. We also illustrate our methods using two real data sets.

Polynomial-Time Algorithms for Multivariate Linear Problems with Small Effective Dimension; Average Case Setting

G. W. Wasilkowski

Department of Computer Science, University of Kentucky, USA

[email protected]

There is a host of practical problems that deal with functions of very many variables. As observed in a number of papers, some of those problems (e.g., in mathematical finance or physics) deal with functions of so-called small effective dimension, i.e., functions which essentially depend only on groups of few variables. For some applications, the effective dimension q∗ is fairly small, e.g., q∗ = 1 or 2. In the average case setting, such problems can be modeled by stochastic processes that are special weighted tensor products of Gaussian processes with finite-order weights. In this talk we present recent results on the tractability of such problems. More specifically, assuming that the univariate problem admits algorithms reducing the initial error by a factor of ε at cost proportional to ε^{-p}, we provide a construction of algorithms A_{d,ε} for the general d-variate problem that reduce the initial error by a factor of ε at cost essentially bounded by ε^{-p} d^q for an exponent q independent of d and ε. The exponent q depends on q∗ and often q = q∗, i.e., it is small. For some problems q = 0, i.e., such problems are essentially no more difficult than the corresponding scalar problem.


The Transformation of the Measures of Underemployment: When a Developing Country Transforms into a Developed One

Duan Wei

Tamkang University, Taiwan

[email protected]

Hsien-Tang Tsai

National Sun Yat-Sen University, Taiwan

Chung-Han Lu

Cheng Shiu University, Taiwan

Meng-Hsun Shih

National Sun Yat-Sen University, Taiwan

Most developing countries lack unemployment relief programs, and many unemployed workers have no choice but to engage in marginal economic activities to survive. On the other hand, employed persons in

developed countries are experiencing lack of adequate employment opportunities, with persons who have

jobs often being compelled to use their skills less fully or to earn lower hourly wages or work fewer

hours than they are willing and able to. But the traditional unemployment figure does not account for

these workers. Statistics on underemployment, therefore, should be used to complement statistics on

both employment and unemployment. During its transition from a developing country to a developed one, Taiwan has enacted and implemented measures of underemployment since 1977. But fundamental changes in Taiwan's labor market and employment law, together with the expansion of higher education, have taken place over the past decades. Overall refinements of the measures of underemployment are therefore urgently needed, because the current measurement methodology has not been modified since 1993 and hardly reflects the reality of today's labor force. The other aim of this study is to redesign the data collection,

processing procedures and methods of reporting on the issue of underemployment in accordance with the

principles set by the International Labour Organization (ILO) to enhance international comparability.

This study first decomposed the underemployment issue into three dimensions (inadequate working hours, low income, and occupational mismatch) by reviewing the literature on this topic and analyzing, with the Analytic Hierarchy Process (AHP) method, the responses to a questionnaire survey of senior academics and government officers who have deep insight into underemployment problems. Results recommended

that the measurement of low-hourly-income underemployment should take precedence over other forms of underemployment. This study also applied the proposed new measures retrospectively to historical labor force survey data and found that more workers were affected by inadequate employment situations than by

time-related underemployment.

Key words and phrases. underemployment, low hourly income, low working hours, occupational mismatch,

AHP.

Confidence Band and Hypothesis Testing of Leaf Area Index Trend

Lijian Yang

Department of Statistics and Probability, Michigan State University, USA

[email protected]

Asymptotically exact/conservative confidence bands are obtained for nonparametric regression func-

tion, based on piecewise constant/linear polynomial spline estimation respectively. Compared to the

pointwise nonparametric confidence interval of Huang (2003), the confidence bands are inflated only by a

factor of the square root of log(n). Simulation experiments have provided strong evidence that corroborates the asymptotic theory. The method has been applied to testing the trigonometric trend of Leaf Area Index, prescribed by the popular RAMS model, on data collected from several land types in East

Africa. The spline confidence band provides strong evidence against the RAMS trend model.


Strong Tractability of Quasi-Monte Carlo Quadrature Using Nets for Certain Banach Spaces1

Rong-Xian Yue

E-Institute of Shanghai Universities and Shanghai Normal University, China

[email protected]

Fred J. Hickernell

Department of Mathematics, Hong Kong Baptist University, Hong Kong

[email protected]

We consider the problem of approximating the weighted multivariate integral

Iρ(f) = ∫_D f(x) ρ(x) dx,

where D is a bounded or unbounded subset of the Euclidean space Rs, the dimension s can be large, and the

weight function ρ(x) is nonnegative. Two quasi-Monte Carlo rules are considered. One uses deterministic

Niederreiter (T, s)-sequences, and another uses randomly scrambled Niederreiter digital (T, m, s)-nets. For

deterministic Niederreiter sequence rules we assume that the integrands f lie in a weighted Banach space, F^(1)_{p,q,γ,s}, of functions whose mixed anchored first derivatives, ∂^|u| f(x_u, c_u)/∂x_u, are bounded in L_p norms with anchor c fixed in the domain D, and the weighted coefficients, γ = {γ_k}, are introduced via ℓ_q norms over the index u, where p, q ∈ [1, ∞]. For the randomly scrambled Niederreiter net rules, the class

of integrands is a weighted Banach space, F^(2)_{p,q,γ,s}, of functions whose unanchored mixed first derivatives, ∂^|u| f(x)/∂x_u, are bounded in L_p norms and the weighted coefficients, γ = {γ_k}, are introduced via ℓ_q norms, where p, q ∈ [1, ∞]. The worst-case error and randomized error are investigated for quasi-Monte

Carlo quadrature rules. For the worst-case setting the quadrature rule uses deterministic Niederreiter

sequences, and for the randomized setting the quadrature rule uses randomly scrambled Niederreiter digital

nets. Sufficient conditions are found under which multivariate integration is strongly tractable in the worst-

case and randomized settings, respectively. Results presented in this article extend and improve upon those

found previously.

Key words and phrases. Multivariate integration, quasi-Monte Carlo quadrature rules, tractability.

1This work was partially supported by Hong Kong Research Grants Council grant RGC/HKBU/2020/02P, and

by NSFC grant 10271078, E-Institutes of Shanghai Municipal Education Commission (Project Number E03004) and

the Special Funds for Major Specialties of Shanghai Education Committee.
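As a concrete illustration of the setting only (not the authors' construction or error analysis), the sketch below approximates a weighted integral over R^s with a standard normal weight ρ by mapping randomly scrambled quasi-Monte Carlo points through the normal quantile function; SciPy's scrambled Sobol generator (a digital net construction) stands in for the Niederreiter nets of the talk, and the integrand f is an arbitrary placeholder whose exact weighted integral is 1.

import numpy as np
from scipy.stats import norm, qmc

def qmc_normal_weighted_integral(f, s, m=12, seed=0):
    # Approximate I_rho(f) = integral of f(x) rho(x) dx over R^s, where rho is
    # the standard normal density, using 2^m randomly scrambled Sobol points
    # mapped through the normal inverse CDF.
    u = qmc.Sobol(d=s, scramble=True, seed=seed).random_base2(m)
    u = np.clip(u, 1e-12, 1 - 1e-12)   # guard against ppf(0) = -inf
    return np.mean(f(norm.ppf(u)))

# Placeholder integrand: f(x) = prod_j (1 + x_j / (j + 1)); its weighted
# integral equals 1, which the estimate should reproduce closely.
f = lambda x: np.prod(1.0 + x / (np.arange(1, x.shape[1] + 1) + 1.0), axis=1)
print(qmc_normal_weighted_integral(f, s=10))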


Statistical Methods to Integrate Different Data Sources in Genomics Studies

Hongyu Zhao

Department of Epidemiology and Public Health, Yale University, USA

[email protected]

Recent advances in large-scale RNA expression measurements, DNA-protein interactions, protein-

protein interactions and the availability of genome sequences from many organisms have opened the op-

portunity for massively parallel biological data acquisition and integrated understanding of the genetic

networks underlying complex biological phenotypes. Many established statistical procedures have been

proposed to analyze a single data type, e.g. clustering algorithms for microarray data and motif find-

ing methods for sequence data. However, different data sources offer different perspectives on the same

underlying system, and they can be combined to increase our chance of uncovering underlying biological

mechanisms. In this talk, we will describe various statistical methods that have been developed (as well as

need to be developed) to integrate diverse genomics and proteomics information to dissect transcriptional

regulatory networks and infer protein-protein interaction networks. Some of these methods will be illus-

trated through their applications to understand transcription regulation and protein-protein interaction in

yeast.

A Nonparametric Multipoint Screening Method for QTL Mapping

Tian Zheng, Hui Wang and Shaw-Hwa Lo

Department of Statistics, Columbia University, USA

[email protected]

It is believed that most human disorders are polygenic, which means that the variation in the quantitative traits of such disorders cannot be attributed to a single gene. Rather, multiple genes, with complicated

interactions, may contribute to the spectrum of variation of such traits. To study the genetics of such

traits, one should inspect multiple loci simultaneously. In this talk, we present an efficient and robust

statistical screening algorithm for the mapping of quantitative trait loci (QTL). The algorithm is based on

a measure of association between the trait and the genotypes on multiple marker loci under investigation.

Through the use of multi-loci genotypes, one can take into consideration both the marginal and joint

association information with respect to the trait. The algorithm evaluates the genes in an iterative fashion

and screens out those marker loci that do not contain much information w.r.t. the trait. We will show the

advantages of this method through theoretical justification and simulation studies.

Key words and phrases. QTL mapping, nonparametric algorithm, rank-based, association mapping.


Reform and Improvement of China’s National Account System

Xiangdong Zhu

National Bureau of Statistics of China, China

I. Brief Review of Economic Accounting System in China

• Early 1950s : Set up economic accounting system and a series of balance sheets under the Material

Production System (MPS), with total product of society and national income as core indicators.

• 1985 : Compiled Gross Domestic Product (GDP).

• 1992 : Formulated the National Economic Accounting System of China (Pilot Programme), cap-

turing the international standards as stipulated in the 1968 version of System of National Accounts

(SNA) of the United Nations and some MPS modules.

• 1993 : Removed national income aggregates from the accounting system.

• 1999 : Started the revision of the 1992 Pilot Programme, bringing about the prevailing National

Economic Accounting System of China 2002.

II. Current Status of Economic Accounting System in China

• National Economic Accounting System of China 2002 consists of basic accounting tables, economic

accounts and satellite tables.

• Schedules

– Quarterly : GDP by production approach at both current and constant prices

– Annual : GDP by production and expenditure approaches at both current and constant

prices, flow-of-fund tables, balance of payments tables, assets and liability tables and eco-

nomic accounts

– Every 5 years : Input-output tables

III. Reform and Improvement of China’s National Accounts System

• To put in place a sample survey system for the service sector and to improve the mechanism for

data reporting/sharing with other ministries.

• To revamp historical GDP time series based on economic census results.

• To improve quarterly statistical systems for better compilation of production and use accounts of

quarterly GDP.

• To compile producer price indices for selected service sectors and price indices of trade in services

to cater for the estimation of constant price GDP.

• To enhance data collection methods to facilitate further breakdowns of source data for national

accounts statistics.

• To standardize the GDP compilation methodology at regional level for better management of

key series and to establish a joint assessment mechanism for GDP figures to improve consistency

between national and regional data.

• To implement studies and pilot projects on resource and environment accounting in collaboration

with viable agencies.

• To study non-observed economic activities in China for future integration of such activities into

the GDP estimation.


Contributed Talks

Generalized Liu Type Estimators Under the Balanced Loss Function

Fikri Akdeniz

Department of Management, University of Cukurova, Turkey

[email protected]

Alan T. K. Wan

Department of Statistics, City University of Hong Kong, Hong Kong

Esra Akdeniz

Department of Statistics, Pennsylvania State University, USA

[email protected]

In regression analysis, the ridge regression estimator and the Liu-type estimator (an alternative biased estimator with 0 < d < 1) [Liu, K., Commun. Statist. Theory Methods 22 (1993): 393-402] are often used to overcome the problem of multicollinearity. These estimators have been evaluated

using the risk under quadratic loss criterion, which places sole emphasis on estimators’ precision.

The traditional mean square error (MSE) as the measure of efficiency of an estimator only takes

the error of estimation into account. In 1994, Zellner [see, Statistical Decision Theory and Related

Topics: 377-390, S.S. Gupta and J.O. Berger (Eds.)] proposed a balanced loss function. Here,

we consider the balanced loss function which incorporates a measure for the goodness of fit of the

model as well as estimation precision.

Key words and phrases. Balanced loss, collinearity, Liu type estimator, ridge regression, risk.
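For orientation (notation mine, not necessarily the authors'), with the linear model y = Xβ + ε the two biased estimators mentioned above and one common form of the balanced loss function can be written as

β̂(k) = (X'X + kI)⁻¹ X'y   (ridge regression, k > 0),
β̂(d) = (X'X + I)⁻¹ (X'y + d β̂_LS)   (Liu-type, 0 < d < 1, β̂_LS the least squares estimator),
L_w(β̂; β) = w (y − Xβ̂)'(y − Xβ̂) + (1 − w) (β̂ − β)'X'X(β̂ − β),   0 ≤ w ≤ 1,

where the first term of L_w measures goodness of fit and the second measures precision of estimation; w = 0 recovers the usual quadratic risk criterion.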

Estimation of a Multivariate Normal Mean Under Balanced Loss Function

Akbar Asgharzadeh

Department of Statistics, University of Mazandaran, Iran

[email protected] or [email protected]

This paper considers the Bayesian analysis of the multivariate normal distribution using a loss

function that reflects both goodness of fit and precision of estimation. The Bayes estimators of the

mean vector are obtained and the admissibility of cX + d for the mean vector is also studied.

Key words and phrases. Admissibility, balanced loss function, Bayes estimator, inadmissibility, multivari-

ate normal distribution.


Mixture of Generalized Pareto Distributions

Abdurrazagh M. Baeshu

Department of Statistics, Al-Fatah University, Libya

[email protected]

We introduce a two-component mixture of generalized Pareto distributions (GPDs) together with some special cases. The parameters of the mixture are estimated by the maximum likelihood method: a quasi-Newton algorithm for finding the minimum (maximum) of a function of several variables is employed to maximize the log-likelihood of the mixture. As an application to modeling failure data, two kinds of mixture distributions, both special cases of the two-component GPD mixture, are fitted to these data.

Key words and phrases. Mixture distributions, maximum likelihood, EM algorithm, quasi-Newton algorithm, simulated annealing, goodness of fit, P-P plots, Q-Q plots, confidence envelopes.
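For reference (notation mine), the density of the two-component generalized Pareto mixture is

f(x) = π g(x; σ1, ξ1) + (1 − π) g(x; σ2, ξ2),   0 < π < 1,

where g(x; σ, ξ) = (1/σ)(1 + ξx/σ)^(−1/ξ − 1) is the generalized Pareto density with scale σ > 0 and shape ξ (reducing to the exponential density (1/σ)exp(−x/σ) as ξ → 0); special cases can be obtained by restricting the component parameters.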

Modelling of Mean-covariance Structures in Linear Mixed Models for Censored Data

Yanchun Bao and Jianxin Pan

Department of Mathematics, Manchester University, United Kingdom

[email protected], [email protected]

Linear mixed models (LMMs) have been widely used in multivariate survival studies, which

incorporate random effects into the model to account for possible correlation among data. By

integrating out random effects from the model Hughes et al (1999) and Klein et al (1999) proposed

to use the marginalized likelihood-based procedure to estimate the parameters. In order to avoid

the high-dimensional integrals, Ha et al (2002) suggested maximizing the hierarchical likelihood

(Lee and Nelder 1996) to obtain the parameter estimates. Their approach, however, assumes that

data are conditionally independent given random effects. This unfortunately may not be true in

practice.

In this paper we propose a new procedure to jointly model the mean and covariance structures

for multivariate survival data within the framework of LMMs. We remove Ha et al’s (2002)

conditional independence assumption and develop a data-driven approach to model the covariance

structures. The main idea is as follows. First, we propose a new model for the extended pseudo-

response variables in the sense of Buckley and James (1979). Accordingly, modelling of multivariate

survival data reduces to modelling of ordinary longitudinal data. Second, we propose to use the

unconstrained parameterizations of Pourahmadi (1999) and Pan and MacKenzie (2003) to jointly

model the mean-covariance structures in the new model. The parameters are then estimated within

the framework of the h-likelihood (Lee and Nelder, 1996).

For illustration the proposed approach is used to analyze the famous Litter Rat data (Mantel

et al, 1977) and is also compared to Ha et al’s (2002) and Klein et al’s (1999) results. We find that

the proposed approach improves the estimate efficiencies significantly. Simulation studies confirm

this finding and further show the new procedure produces rather accurate parameter estimates for

both mean and covariance parameters even if the censoring rate is high.
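For readers unfamiliar with the parameterization of Pourahmadi (1999) cited above, the idea, in outline, is the modified Cholesky decomposition of the within-subject covariance matrix Σ,

T Σ T' = D,

where T is unit lower triangular with the negatives of generalized autoregressive parameters φ_jk below the diagonal and D is diagonal with innovation variances d_j; since the φ_jk and log d_j are unconstrained, both can be modelled by linear regressions, e.g. φ_jk = z'_jk γ and log d_j = w'_j λ, which is the kind of joint mean-covariance modelling the authors build on (details of their extension to censored data are in the paper).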


2005 International Comparison Program

Marion Shui-yu Chan

Price Statistics Branch, Census and Statistics Department, Hong Kong

[email protected]

Inter-country comparison of Gross Domestic Product (GDP) based on exchange rate conversion

is subject to considerable limitations as it does not take into account differences in price levels across

countries. The International Comparison Program (ICP) aims at collecting and comparing price

data for a basket of comparable items across countries and producing a set of Purchasing Power

Parities (PPPs) which would enable meaningful volume comparisons of GDP and other expenditure

aggregates among different countries. A PPP is the rate of currency conversion at which a given

amount of currency will purchase the same volume of goods and services in two countries.

The 2005 ICP is a global statistical initiative led by the World Bank and covers around 150

countries/territories worldwide. It is organized on a regional basis and the regional results will

be linked through a "ring comparison" to obtain the global results. The ring comparison involves

choosing a subset of countries from each region to price a common product list in addition to their

regional lists. Hong Kong is currently participating in both the comparison for the Asia Pacific

region and the ring comparison.

In this paper, the methodological framework of the 2005 ICP is presented. The proposed method

for calculating the PPPs, known as the CPRD (which stands for country, product, representativity

and dummy) method, is explained and illustrated with examples. The CPRD method adopts a

multilateral approach to estimate a set of transitive parities simultaneously for a group of countries

using data for all countries in the group. The pricing survey being carried out in Hong Kong is

also introduced.

Key words and phrases. Purchasing power parity (PPP), ring comparison, CPRD method, representativ-

ity, transitivity.
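As a rough illustration of the multilateral idea behind CPRD (not an official implementation: the representativity dummy is omitted, so this is the simpler country-product-dummy, or CPD, regression, and the prices are invented), one regresses log prices on country and product dummies and reads PPPs off the country coefficients.

import numpy as np

countries = ["HK", "JP", "TH"]            # hypothetical country codes
items = ["rice", "bus fare", "haircut"]   # hypothetical product list
prices = np.array([[10.0,  8.0, 50.0],    # price of each item in local currency
                   [15.0, 12.0, 90.0],
                   [ 6.0,  5.0, 30.0]])
logp = np.log(prices)

# CPD model: log p_ci = alpha_c + beta_i + error, with alpha fixed at 0 for
# the base country (the first one). Build the dummy design matrix by hand.
rows, y = [], []
for c in range(len(countries)):
    for i in range(len(items)):
        x = np.zeros(len(countries) - 1 + len(items))
        if c > 0:
            x[c - 1] = 1.0                       # country dummy
        x[len(countries) - 1 + i] = 1.0          # product dummy
        rows.append(x)
        y.append(logp[c, i])
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)

# PPP of each country relative to the base country is exp(alpha_c).
ppp = np.exp(np.concatenate(([0.0], coef[:len(countries) - 1])))
for c, v in zip(countries, ppp):
    print(c, round(v, 3))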


Statistical Assessment for Dynamic Multimedia Transport and Transformation Model

Chu-Chih Chen

Department of Mathematics, Tamkang University, Taiwan

[email protected]

Kuen-Yuh Wu

Division of Environmental Health and Occupational Medicine, National Health Research

Institutes, Taiwan

A multimedia model is a dynamic model that can be used to assess time-varying concentrations of contaminants introduced initially to soil layers or of contaminants released continuously to air or water. A typical multimedia model consists of the following "compartments": air, plants, ground-surface soil, root soil, vadose soil, surface water, and sediments (McKone and Enoch, 2002). In this paper, we apply a state-space model approach to estimate the dynamic multimedia model parameters, such as the transfer and contamination input rates among different compartments. Missing values for certain compartments are estimated by their theoretical expectations given the measurements of the other compartments. The total amount of contaminant within a compartment is obtained by integrating the contaminant's spatial distribution using the kriging method. The unobserved "true" contaminant levels are then simulated from their posterior distributions following the Kalman filter procedure. The model parameters and the true contaminant levels are simulated iteratively using Markov chain Monte Carlo.

Key words and phrases. Kalman filter, kriging, Markov chain Monte Carlo, missing values, state-space

model.
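The state-space machinery mentioned above can be illustrated with a generic scalar linear-Gaussian Kalman filter step (a textbook sketch, not the multi-compartment model of the paper; all parameter names are placeholders).

import numpy as np

def kalman_filter(y, a, q, h, r, m0, p0):
    # Scalar linear-Gaussian state-space model:
    #   state:        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    #   observation:  y_t = h * x_t + v_t,      v_t ~ N(0, r)
    # Returns the filtered means and variances of x_t given y_1, ..., y_t;
    # missing observations (NaN) trigger a predict-only step.
    m, p = m0, p0
    means, variances = [], []
    for yt in y:
        m_pred, p_pred = a * m, a * a * p + q          # predict
        if np.isnan(yt):
            m, p = m_pred, p_pred                      # no update possible
        else:
            k = p_pred * h / (h * h * p_pred + r)      # Kalman gain
            m = m_pred + k * (yt - h * m_pred)         # update mean
            p = (1.0 - k * h) * p_pred                 # update variance
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

print(kalman_filter(np.array([1.1, 0.9, np.nan, 1.4, 1.3]),
                    a=0.8, q=0.1, h=1.0, r=0.2, m0=0.0, p0=1.0))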

A Kind of Urn Model in Clinical Trials

Guijing Chen

Department of Mathematics, Anhui University, China

[email protected]

Chunhua Zhu

Department of Finance and Statistics, USTC, China

Yao Hung Wang

Department of Statistics, Tunghai University, Taiwan

In this paper we study urn models. Using some available estimates of the success probabilities and adding a particle parameter, we establish adaptive models. We obtain some strong convergence theorems, rates of convergence, asymptotic normality of the components in the urn, and estimates. With these asymptotic results, we show that the adaptive designs given in this paper are asymptotically optimal designs.

Key words and phrases. Urn model, sequential design, strong convergence, asymptotic normality, optimal design.

Mathematics Subject Classification. Primary 60F15; Secondary 62L05, 60G42


Asymptotic Distributions of Error Density Estimators in First-order Autoregressive Models

Fuxia Cheng

Department of Mathematics, Illinois State University, USA

[email protected]

This paper considers the asymptotic distributions of the error density estimators in first-order

autoregressive models. At a fixed point, the distribution of the error density estimator is shown

to be normal. Globally, the asymptotic distribution of the maximum of a suitably normalized

deviation of the density estimator from the expectation of the kernel error density (based on the

true error) is the same as in the case of the one sample set up, which is given in Bickel and

Rosenblatt (1973).

Key words and phrases. AR(1) process, residuals, kernel density estimation.

Power Analysis and the Cochran-Mantel-Haenszel Test

Philip E. Cheng, Michelle Liou and John D. Aston

Institute of Statistical Science, Academia Sinica, Taiwan

[email protected], [email protected]

Fisher's exact test for testing independence in a 2x2 contingency table has been criticized as conservative due to the discrete nature of the null distribution. A calibration study, however, establishes

that the conditional distributions of the Pearson chi-square, the Fisher exact and the likelihood

ratio are closely comparable, and are also invariant under the alternative hypotheses based upon

the mutual information identity. Power analysis for the conditional tests is thereby validated and the delusive nature of the conservative test is remedied. As an application of the information identity, tests for homogeneity of odds ratios and for conditional independence across strata in a series of 2x2 tables are analyzed. The Cochran-Mantel-Haenszel test is examined for evidence of invalid conclusions. A simple algorithm for computing the maximum likelihood estimate of the common odds ratio is developed using the relative entropy.

Key words and phrases. Cochran-Mantel-Haenszel test, conditional maximum likelihood estimate, entropy,

Fisher’s exact test, Kullback-Leibler divergence, likelihood ratio test, mutual information, odds ratio, Pear-

son’s Chi-square test, three-way effect.


Deseasonalisation of Official Statistical Series

Thomas C K Cheung

Census and Statistics Department, Hong Kong

[email protected]

As the seasonal component is usually one of the predominant components in a time series, its

presence often complicates the interpretation of the time series since it is difficult to discern whether

changes in data for a given period really reflect the trend-cycle movement or are merely due to

seasonal variations. Seasonal adjustment is a statistical technique commonly used to estimate and

remove the seasonal variations from a time series so as to make the underlying trend-cycle of the

series being analysed more discernible.

In Hong Kong, the Census and Statistics Department has adopted the X-11-ARIMA method

for deseasonalising official statistical series. The X-11-ARIMA method, developed by Statistics

Canada, is an internationally well-recognised method which is widely used for compiling season-

ally adjusted series by official statistical authorities. The presentation will give a brief account of the basic principle of the X-11-ARIMA method and its application to time series of official

statistics in Hong Kong. Some practical considerations about seasonal adjustment, which are cru-

cial to the proper use of the technique, will also be discussed by using seasonal adjustment of the

unemployment rate as an illustration.

Key words and phrases. Seasonal adjustment, X-11-ARIMA method.
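As a much simplified stand-in for X-11-ARIMA (which is not reproduced here), the sketch below removes an estimated seasonal component from a simulated monthly series with a classical multiplicative decomposition; it only conveys the basic idea of seasonal adjustment, not the actual method used by the Department.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Simulated monthly series: trend x seasonal pattern x noise.
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01", periods=96, freq="MS")
trend = np.linspace(100, 160, 96)
seasonal = np.tile(1 + 0.1 * np.sin(2 * np.pi * np.arange(12) / 12), 8)
series = pd.Series(trend * seasonal * rng.normal(1, 0.01, 96), index=idx)

# Classical multiplicative decomposition (a crude proxy for X-11-ARIMA).
result = seasonal_decompose(series, model="multiplicative", period=12)
seasonally_adjusted = series / result.seasonal
print(seasonally_adjusted.head())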


Optimal Policy for Hiring of Expertise Service in Manpower Planning

Ramaiyan Elangovan

Department of Statistics, Annamalai University, India

[email protected]

Manpower planning is an interdisciplinary activity. It requires combined technical skill of statis-

ticians, economists and behavioral scientists together with the practical knowledge of managers and

planners. Manpower planning techniques have become an essential tool for the modern managers,

especially in a climate of economic recession and government cutbacks. In manpower planning the

completed length of service until leaving is of great interest, since it enables us to predict staff turnover. For a detailed account of work in this direction, refer to McClean et al. (1991). At the

level of the firm, the availability of appropriate manpower is an important factor that contributes

to the task force available for the completion of planned jobs and projects including industrial

production. With the growing need for highly specialized and technical manpower to deal with

very complex real life situations, suitable policies regarding recruitment, training and other aspects

should be evolved taking into consideration all the existing ground realities. This reiterates the

need for utilizing the expertise of specialists in manpower planning. Such service may be on an hourly contract basis and is also likely to be fairly costly. If the number of man hours on contract is in

excess of the requirements there is wastage on one hand and on the other if the requirement is

larger than the hired man hours, there is a shortage loss arising due to non-completion of the work.

In this paper an optimal policy for hiring of expertise service in manpower planning is discussed,

and the optimal number of man hours to be hired is determined. Numerical examples are provided

for exponential, truncated normal and Pearsonian type XI distributions.

Key words and phrases. Manpower planning, completed length of service, wastage, optimal policy.


Optimization of Risk and Dividend for a Firm with the Presence of Liability and Transaction Costs

Rosana Fok

Department of Mathematical and Statistical Science, University of Alberta, Canada

[email protected]

A model of a financial corporation is considered. This corporation controls its business activities, which include the timing and amount of dividends paid out to the shareholders, in

order to reduce its risk. We consider cases when the risk process has different bounds. This model

also includes a constant liability factor, such as bond liability or loan amortization. The general

objective of this model is to find the optimal policy that maximizes the expected total discounted

dividends paid out until the time of bankruptcy. Due to the presence of a fixed transaction cost, a mixed classical-impulse stochastic control problem results. The value function with the optimal

policy is found to be a solution to this problem and can be used to justify the proposed approach. In the end, a numerical analysis is presented to support the results of this model.

This talk is based on a joint work with Tahir Choulli ([email protected]) and Yau Shu

Wong ([email protected]) (my supervisors).

References

[1] Cadenillas, A., Choulli, T., Taksar, M., Zhang, L. (2005) Classical and impulse control for the

dividend optimization and risk for an insurance firm. To appear in Mathematical Finance.

[2] Choulli, T., Taksar, M., and Zhou, X.Y.: A diffusion model for optimal dividend distribution for a

company with constraints on risk control. SIAM Journal on Control and Optimization, forthcoming

(2003).

Key words and phrases. Dividends, risk, liability.

Study on the Techniques of Financial Risk Measurement in the Situation of Small Samples

He Sihui

Department of Actuarial Science, Xi‘an University of Finance and Economics, China

[email protected]

Wang Mao

Rejoy Group Ltd. Cor., China

In the study of financial risk management, how to analyze the qualitative and quantitative attributes of a population by means of a small sample is very important in both theory and practice. A small-sample fitting technique based on the Weibull distribution is discussed in the paper, and a risk measurement method that can be applied in financial market risk management is also set up. Finally, real data are used to demonstrate its characteristics in practical application.

Key words and phrases. Small sample database, financial risk measurement techniques, Weibull distribution, Kaplan-Meier estimator, extreme value theory.


Comparison between 6 Sigma and 3 Sigma

Xiaoqun He

Department of Statistics, Renmin University of China, China

[email protected]

In the process of popularizing 6σ in enterprises, some students often cannot understand why 6σ is better than 3σ, and even think that the specification limits of 6σ are wider than those of 3σ, so that a 6σ process has a higher pass rate. This paper argues that the comparison table shown in references [1] and [2] leads to such misunderstandings, and therefore gives another comparison and interpretation.

References

[1] Pyzdek, T., 2001. The Six Sigma Handbook: A Complete Guide for Greenbelts, Blackbelts and

Managers at All Levels. McGraw-Hill Companies, Inc.

[2] ,2003.

Key words and phrases. 6σ, 3σ, comparison.
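One concrete way to see the difference (my own illustration, not the comparison table discussed in the paper) is to compute the proportion of output falling outside fixed specification limits placed at 3σ versus 6σ from the target, with and without the 1.5σ mean shift conventionally assumed in Six Sigma practice.

from scipy.stats import norm

def defects_per_million(spec_sigma, mean_shift=0.0):
    # Output ~ N(mean_shift, 1); specification limits at +/- spec_sigma.
    p_out = norm.sf(spec_sigma - mean_shift) + norm.cdf(-spec_sigma - mean_shift)
    return 1e6 * p_out

print(defects_per_million(3))        # about 2700 per million (centred 3-sigma process)
print(defects_per_million(6))        # about 0.002 per million (centred 6-sigma process)
print(defects_per_million(6, 1.5))   # about 3.4 per million (6-sigma with 1.5-sigma shift)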

Spatially Weighted Finite Population Sampling Design

Sami Helle

PvTT, Finland

[email protected]

Erkki Liski

Department of Mathematics, Statistics and Philosophy, University of Tampere, Finland

[email protected]

Often in finite population sampling we have some prior information about the subject of study, e.g. some prior knowledge of the spatial height distribution of the trees in a forest. By using prior information on the study variable it is possible to spatially adjust the net of sampling design points so that more design points fall in areas which contribute more to the variance of the estimator and fewer sampling points in areas which have less effect on the variance of the estimator. By weighting the net of sampling design points using representative points of the spatial prior distribution, it is possible to increase the efficiency of the estimator. In this paper we present a method for using the representative points of the prior distribution as the sampling design points in spatial finite population sampling; the effectiveness of the method is compared with random plot sampling and uniform sampling designs in a simulated forestry sampling case.

Key words and phrases. Finite population sampling, spatial sampling, representative points.

Information Theoretic Models for Dependence Analysis and Missing Data Estimation

D. S. Hooda

Jaypee Institute of Engineering and Technology, India

ds [email protected]

In the present paper we derive a new information theoretic model for testing and measurement of de-

pendence among attributes in a contingency table. A relationship between the information theoretic measure and the chi-square statistic is established and discussed with numerical problems. A new generalized information theoretic measure is defined and studied in detail. A maximum entropy model for the estimation of missing data in the design of experiments is also explained.


Factors Associated with Haemoglobin Level in 43-month-old Children

S. M. Hosseini, J. H. McColl

Department of Statistics, University of Glasgow, United Kingdom

[email protected]

A. Sherriff

Department of Child Health, University of Bristol, United Kingdom

Aims: To investigate the relation of haemoglobin levels with some of the nutrient covariates and with sex, age, education level, ethnicity, birth weight, current weight, infection, and whether the child was a twin or a singleton, at 43 months.

Methods: Normal values for haemoglobin were obtained from a representative cohort of children (the Children in Focus sample) at 43 months old, who were randomly selected from children taking part in the Avon

Longitudinal Study of Parents and Children (ALSPAC).

Results: Haemoglobin data were symmetrically, but not normally, distributed. The non-haem iron, vitamin C, iron, calcium and NSP intakes were skewed and were transformed to approximate normality

using the natural logarithm. Haemoglobin concentration at 43 months was negatively associated with birth

weight, positively associated with energy-adjusted vitamin C intake, and was higher in children from singleton pregnancies.

Conclusion: The prevalence of anaemia varies strongly with singleton/multiple pregnancy, which suggests that singleton pregnancy may be most closely associated with a higher haemoglobin level in children at 43 months. Higher vitamin C intake is associated with a higher haemoglobin level in children of this age, and the inclusion of vitamin C in the diet of children is advisable.

Infants born with low weight were at increased risk of developing iron deficiency; advice to mothers

should focus on the importance of introducing nutrient dense complementary foods during pregnancy.

References

[1] Sherriff A., Emond A., Hawkins N. and Golding J., Haemoglobin and ferritin concentration in

children age 12 and 18 months, Archives of Disease in Childhood, 80 (1999), 0-5.

[2] Cowin I., Emond A., Emmett P. and ALSPAC study team, Association between composition of

the diet and haemoglobin and ferritin level in 18 month old children, European Journal of Clinical

Nutrition, 55 (2001), 278-286.

On Markov Processes in Space-Time Random Environments

Dihe Hu

School of Mathematics and Statistics, Wuhan University, China

[email protected]

Markov chains in time random environments have been studied for some time. Nawrotzki (1981) established a general theory of this topic. Cogburn (1980-1991) developed this theory in a wide context by making use of more powerful tools such as the theory of Hopf Markov chains. Orey (1991) reviewed the work in this field. Hu (2004) introduced the concept of Markov processes (continuous time) in time random environments and obtained the construction theorem and several equivalence theorems. Hu (2004), in another paper, proved the existence and uniqueness of the q-process in a time random environment. Berard (2004) introduced the concept of random walks in a space-time random environment and proved that the CLT is a.s. true. In this paper we introduce general Markov chains in space-time random environments, prove the existence theorem and the equivalence theorem, and give some properties of the skew product Markov chain generated by the Markov chain in a space-time random environment.


The Sinh-Arcsinh Transformation

M. C. Jones

Department of Statistics, Open University, United Kingdom

[email protected]

Well, perhaps not the sinh(arcsinh) – or identity – function per se (!), but simple one- and two-parameter

variations thereon which, when their inverses are applied to normal random variables, yield a family of

distributions on the real line that:

(a) include symmetric distributions with both heavier and lighter tails than the normal;

(b) essentially incorporate the best of both Johnson's S_U distributions and the sinh-normal distribution but not the worst of the latter, i.e. they are all unimodal;

(c) are reasonably tractable;

(d) control tailweights separately, so that distributions with one heavier and one lighter tail than

normal are available;

(e) afford skewness;

(f) always have their median at zero;

(g) are, as four-parameter families with the addition of location and scale parameters in the usual

way, readily fitted to data; and

(h) are readily extended to the multivariate case.

I might also say a few general words about the interplay between scale and tailweight in the symmetric

case.
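A minimal sketch of one plausible reading of the simplest (one-parameter, symmetric) variation described above, used here only to show the tailweight behaviour of item (a); the skewness parameter of the two-parameter version would enter as a shift inside the arcsinh, and the exact parameterization of the talk may differ.

import numpy as np

def sinh_arcsinh_tailweight(n, delta=1.0, rng=None):
    # X = sinh(arcsinh(Z) / delta) with Z ~ N(0, 1): delta = 1 recovers the
    # normal, delta < 1 gives heavier tails, delta > 1 lighter tails, and the
    # distribution stays symmetric about its median 0.
    rng = np.random.default_rng() if rng is None else rng
    return np.sinh(np.arcsinh(rng.standard_normal(n)) / delta)

rng = np.random.default_rng(1)
for d in (0.5, 1.0, 2.0):
    x = sinh_arcsinh_tailweight(200000, delta=d, rng=rng)
    print(d, np.mean(np.abs(x) > 3))   # tail mass beyond 3 shrinks as delta grows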

The Identification of Outliers in Time Series Using Autocorrelation Function and Partial Autocorrelation Function

Rida M. Khaga and Maryouma Elakder Enaami

Department of Statistics, Alfateh-University, Libya

[email protected]

Outliers have recently been studied more and more in the statistical time series literature. This work

considers the problem of detecting outliers in time series data and proposes general detection methods based on the autocorrelation and partial autocorrelation functions. We present a simple diagnostic test for detecting an additive single outlier (AO) and an additive step outlier (AS) using the influence function, i.e. based on the influences of observations on the autocorrelation and partial autocorrelation functions. In this paper, the work of Chernick et al. (1982) is extended to form quantitative outlier detection statistics. The statistics are formed from the absolute elements of the influence function matrix, where each element gives the influence of a pair of observations at time lag k on the estimated autocorrelation function r_k or partial autocorrelation function Φ_kk. Their behaviors are different for a single outlier and a step outlier. Hence, they may be used not only to detect an outlier, but also to distinguish a single outlier from a step outlier. The proposed methods are compared with other existing outlier detection procedures. Comparisons based on various models, sample sizes, parameter values, and real and simulated data are used to illustrate the effectiveness of the proposed methods in identifying outlier types and distinguishing between them, in particular the additive single outlier (AO) and the additive step outlier (AS). The methods are evaluated using both simulated time series from autoregressive and moving average models and a set of observed time series.

Key words and phrases. Outliers, influence function, acf, pacf.


Profit Analysis of a Two-unit Hot Standby Programmable Logic Controller (PLC)

S. M. Rizwan

Department of Mathematics and Science, Sultanate of Oman, India

Vipin Khurana

Department of Mathematics, JSS Academy of Technical Education, India

Gulshan Taneja

Department of Statistics, M D University, India

A two-unit hot standby PLC system is described where different types of failures and repairs are noted.

It is observed that the failure of a unit is due to various reasons. The concept of inspection for detecting

the failure of a unit is also introduced. The repairman comes immediately on failure detection. System

is analyzed and expressions for various reliability measures such as mean time to system failure (MTSF),

steady-state availability, expected number of repairs, expected number of replacements (Type I and Type

II), expected number of reinstallations and inspections carried out, busy period of the repairman

(Repair time, Type I replacement time, Type II replacement time, reinstallation time) are obtained by

using semi-Markov process and regenerative processes. Profit incurred to the system is evaluated.

Official Tourism Statistics of Macao

Pek Fong Kong

Statistics and Census Service, Macao SAR

[email protected]

The tourism sector has always been an important economic growth engine of Macao over the years; in light of its rapid development, particularly after the Handover, tourism statistics have become one of the more popular subjects sought by data users.

In Macao, official tourism statistics comprise a number of indicators, viz. visitor arrivals, package tour &

hotel occupancy rate, visitor expenditure and tourist price index. Compilation of Macao's official tourism statistics demonstrates close collaboration among different government departments and the business

sector; for instance, Immigration Department, Macao Tourism Office, Macao International Airport, hotels

and travel agencies, etc.

This presentation will briefly outline some of the special features of the indicators encompassed in Macao's tourism statistics and, more importantly, the challenges that lie ahead.


Optimization of Surface Finish Parameters on Ground Face of Bevel Gear Using Parameter Design: DOE Way of Improvement in the Process

C. S. Pathak, A. K. Bewoor

Department of Mechanical Engineering, Sinhgad College of Engineering, India

V. A. Kulkarni

Department of Production Engineering, D. Y. Patil College of Engineering, India

[email protected]

S. G. Tillu

D. Y. Patil College of Engineering, India

The Problem: The differential gearbox, consisting of four bevel gears mounted at 90 degrees to each other inside the casing, was facing the serious problem of frequent failure of bevel gears. The Kaizen team

identified that the failure of the back face of the bevel gear is due to poor surface finish, which results in

faster crack initiation and propagation. Due to this, pre-tension in the gears was reduced, leading to failure

of the gearbox. The gear face was ground using CNC machines with a recommended Ra value of 0.8µ. In

real practice, the Kaizen team found that the parameters were not in control and the Ra values obtained were consistently around 1-1.2µ, which is unacceptable.

The Method: An attempt has been made in this paper to recondition the grinding process by using

Taguchi's parameter design methodology. The parameters affecting surface finish on the ground face of the bevel gear were identified, and a set of experiments known as an orthogonal array was formed to conduct the statistical experiments.

The Result: Based on the experiments, ANOM is performed to determine the relative magnitude of the effect of each factor and to estimate the error variance. ANOVA showed the adequacy of the model and a 23.5% improvement in the process.

The complete methodology is presented in the paper.

How Neyman-Pearson would Advise a Clearinghouse on its Margin Setting Methodology

K. Lam and C. Y. Sin

Hong Kong Baptist University, Hong Kong

[email protected]

The margin system is the first line of defense against the default risk of a clearinghouse. From the

perspectives of a clearinghouse, the utmost concern is to have a prudential system to control the default

exposure. Once the level of prudentiality is set, the next concern will be the opportunity cost of the

investors. It is because high opportunity cost discourages people from hedging through futures trading

and thus defeats the function of a futures market. In this paper, we first develop different measures of

prudentiality and opportunity cost. We then borrow from Neyman-Pearson's idea to formulate a statistical

framework to evaluate different margin setting methodologies. Five margin-setting methodologies, namely,

(1) using simple moving averages of historical volatility; (2) using exponentially weighted moving averages

of historical volatility; (3) using a GARCH approach on historical volatility; (4) using implied volatility and

(5) using realized volatility are applied to the Hang Seng Index futures. Keeping the same prudentiality

level, it is shown that the implied volatility approach by and large gives the lowest average

overcharge.


Geometric Process

Yeh Lam

Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong

[email protected]

Lam [1988a, b] introduced the following geometric process.

Definition. A stochastic process {Xn, n = 1, 2, . . .} is a geometric process (GP) if there exists some a > 0 such that {a^(n−1) Xn, n = 1, 2, . . .} forms a renewal process. The number a is called the ratio of the geometric process.

Clearly, a GP is stochastically increasing if the ratio 0 < a ≤ 1; it is stochastically decreasing if the

ratio a ≥ 1. A GP will become a renewal process if the ratio a = 1. Thus, the GP is a simple monotone

process and is a generalization of renewal process.

Let E(X1) = λ and Var(X1) = σ², then

E(Xn) = λ / a^(n−1),   Var(Xn) = σ² / a^(2(n−1)).

Therefore, a, λ and σ² are three important parameters in the GP.

In this talk, we shall introduce the fundamental probability theory of GP, then study the statistical

inference of the GP. Furthermore, we shall consider the application of GP to reliability, especially to the

maintenance problem. We shall also investigate the application of GP to the analysis of data. From

real data analysis, it has been shown that on average, the GP model is the best one among four models

including a Poisson process model and two nonhomogeneous Poisson process models. See Lam (2005) for

a brief review and more references.

References

[1] Lam, Y. (1988a). A note on the optimal replacement problem. Adv. Appl. Prob., 20, 479-482.

[2] Lam, Y. (1988b). Geometric processes and replacement problem. Acta Math. Appl. Sinica, 4,

366-377.

[3] Lam, Y. (2005) Geometric process. In The Encyclopedia of Statistical Sciences, 2nd edition, N.

Balakrishnan, C. Read, S. Kotz and B. Vidakovic ed., John Wiley & Sons, Inc., New York. To

appear.

Key words and phrases. Geometric process, renewal process, reliability, data analysis.
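A minimal simulation sketch of the definition (the exponential underlying renewal distribution is chosen only for illustration): generate a renewal sequence Y_k and set X_k = Y_k / a^(k−1), so that a^(k−1) X_k is again the renewal sequence and E(X_k) = λ / a^(k−1).

import numpy as np

def simulate_gp(n, a=1.05, lam=2.0, size=100000, rng=None):
    # Geometric process X_1, ..., X_n with ratio a, built from exponential
    # renewal interarrival times Y_k with mean lam: X_k = Y_k / a**(k - 1).
    rng = np.random.default_rng() if rng is None else rng
    y = rng.exponential(lam, size=(size, n))
    return y / a ** np.arange(n)

x = simulate_gp(5, a=1.05, lam=2.0, rng=np.random.default_rng(0))
print(x.mean(axis=0))                  # empirical E(X_k), decreasing in k
print(2.0 / 1.05 ** np.arange(5))      # theoretical lam / a**(k-1)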

Modeling for Analyzing and Evaluating the Stability of Supply Chain Systems

Jiukun Li and Philip L.Y. Chan

Department of Industrial and Manufacturing Systems Engineering, The University of

Hong Kong, Hong Kong

[email protected], [email protected]

In this paper, we examine the stability of supply chain systems under different demand situations, such as stochastic or deterministic, dynamic or static, by developing a theoretical model that combines statistical methods and traffic network theory. A numerical example is used to demonstrate the

applicability of the model.

Key words and phrases. Stability analysis, supply chain, statistical methods, traffic network theory.


Variable Selection via MM Algorithms

Runze Li

Department of Statistics, Penn State University, USA

[email protected]

In this talk, I give a brief review of variable selection via nonconcave penalized likelihood. An algorithm is proposed for maximizing the penalized likelihood via MM algorithms, extensions of the well-known EM algorithm. I will demonstrate the convergence of the proposed algorithm using techniques related to the EM algorithm (Wu, 1983). The proposed algorithm allows us to use numerical algorithms (such as the Newton-Raphson algorithm) to deal with the combinatorial optimization in variable selection, which is an NP-hard problem. Numerical comparisons will also be presented.
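To make the MM idea concrete, here is a minimal sketch of the local quadratic approximation for an L1 penalty (my own illustration; the talk concerns nonconcave penalties such as SCAD, but the mechanics are similar): at each iteration the penalty is majorized by a quadratic, so the update reduces to a ridge-type solve.

import numpy as np

def mm_lqa_penalized_ls(X, y, lam=1.0, n_iter=50, eps=1e-6):
    # Minimize 0.5 * ||y - X b||^2 + lam * sum_j |b_j| by MM: majorize
    # |b_j| by b_j**2 / (2 |b_j_old|) + |b_j_old| / 2, so every iteration
    # is a weighted ridge regression (eps keeps the weights finite).
    b = np.linalg.lstsq(X, y, rcond=None)[0]          # least squares start
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        d = lam / (np.abs(b) + eps)                   # majorizer weights
        b = np.linalg.solve(XtX + np.diag(d), Xty)
    return b

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
beta = np.array([3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0, -2.0])
y = X @ beta + rng.standard_normal(200)
print(np.round(mm_lqa_penalized_ls(X, y, lam=5.0), 3))   # inactive coefficients shrink near 0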

The Use of the Lee-Carter Model to Project Mortality Trends of Hong Kong

Billy Y G Li

Census and Statistics Department, Hong Kong

[email protected]

The Lee-Carter model was developed in the early 1990s to model and project mortality trend of

a country/territory. The Census and Statistics Department of the Hong Kong Special Administrative

Region Government has been using this model since 2001. The Lee-Carter model decomposes age-sex

specific mortality rates into three components: a general shape of the age-sex mortality profile, the rate of

increase or decrease from the general age-sex profile and an index on the mortality level. Modelling and

projecting the index on the mortality level provide the projected age-sex specific mortality rates.

The presentation will provide a background on mortality projections together with the technical details

of the Lee-Carter model. Estimation procedures and limitations of the Lee-Carter model will also be

discussed.

Key words and phrases. Lee-Carter, mortality, projection.
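For reference, the decomposition described above is the standard Lee-Carter form (generic notation, not necessarily the exact notation used by the Department): for each sex,

ln m(x, t) = a(x) + b(x) k(t) + e(x, t),

with identification constraints such as Σx b(x) = 1 and Σt k(t) = 0, where a(x) is the general age profile of mortality, b(x) the age-specific sensitivity to the overall level, and k(t) the mortality index; projecting k(t), typically as a random walk with drift, yields the projected age-specific mortality rates.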

A New Parametric Test for AR and Bilinear Time Series with Graphical Models

Wai-Cheung Ip, Heung Wong

Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong

Yuan Li

School of Mathematics and Information Science, Guangzhou University, China

[email protected]

The classic AR and Bilinear time series models are expressed as directed graphical models. Based on

the directed graphical models, it is shown that a coefficient of AR and Bilinear models is the conditional

correlation coefficient conditioned on the other components of the time series. Then a new procedure

is proposed to test the coefficients of AR and Bilinear time series models. Simulations show that our procedure is better than the classic tests in both size and power.

Key words and phrases. AR model, bilinear model, graphical models.


T3-Plot for Testing Spherical Symmetry for High-Dimensional Data with a Small Sample Size

Jiajuan Liang

Department of Quantitative Analysis, University of New Haven, USA

[email protected]

High-dimensional data with a small sample size, such as microarray data and image data, are commonly

encountered in some practical problems for which many variables have to be measured but it is too costly

or time-consuming to repeat the measurements many times. Analysis of this kind of data poses a great

challenge for statisticians. In this paper, we develop a new graphical method for testing spherical symmetry

that is especially suitable for high-dimensional data with a small sample size. The new graphical method

associated with the local acceptance regions can provide a quick visual perception on the assumption of

spherical symmetry. The performance of the new graphical method is demonstrated by a Monte Carlo

study and illustrated by a real data set.

Model-Based Approaches for Simultaneous Dimension Reduction and Clustering

Xiaodong Lin

University of Cincinnati and National Institute of Statistical Science, USA

[email protected]

Recently, very high dimensional data have been generated from a variety of sources. In many of

these high dimensional problems, it is often the case that the data-generating process is nonhomogeneous.

Traditional data analysis usually deals with such heterogeneity by performing dimension reduction and

clustering separately. In this talk, we present a constrained mixture of factor analyzers model to address

the problem of simultaneous clustering and dimension reduction. We propose a constraint on the factor

loading matrix so that the total variation explained by the noise component is restricted by a threshold.

Two approaches, namely, the mixture likelihood approach and the classification likelihood approach are

used to fit the model. A constrained EM algorithm is developed for parameter estimation. In our model,

the number of factors is allowed to differ across different mixture components. This flexibility generates

model selection challenges. To overcome the difficulty, we propose a two-step model selection procedure

during which parameter estimations and model specifications are altered dynamically. We show that this

procedure converges under different model selection criteria. Finally, we demonstrate the performance of

our methods on an image dataset and a metabolomic dataset.

Key words and phrases. Dimension reduction, clustering, EM algorithm, mixture models, model selection.
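For orientation (generic notation, not the authors' constrained formulation), a mixture of factor analyzers models a p-dimensional observation x as

x | component k  ~  N(μ_k, Λ_k Λ_k' + Ψ_k),   f(x) = Σ_k π_k N(x; μ_k, Λ_k Λ_k' + Ψ_k),

where Λ_k is the p × q_k loading matrix of component k, Ψ_k is a diagonal noise covariance and π_k are mixing proportions; in the authors' formulation a constraint on the loading matrix restricts the total variation explained by the noise component, and allowing q_k to differ across components is what creates the model selection problem discussed above.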


Strong Near-epoch Dependent Random Variables

Zhengyan Lin

Department of Mathematics, Zhejiang University, China

[email protected]

We introduce a new class of dependent sequences of random variables, which is a subclass of near-epoch

dependent (NED) sequences, but can also be approximated by mixing sequences. We call them strong near-epoch dependent sequences. Many important econometric models, such as linear processes, a class of popular nonlinear difference equations, the ARMA model, the GARCH model, etc., are strong NED under usual conditions. Under dependence conditions substantially weaker than for NED sequences, we show a p-order,

p > 2, (maximum) moment inequality for strong NED sequences. Then, using this inequality, we derive a

central limit theorem and a functional central limit theorem and based on these results, we can also obtain

limit distributions of many important processes with strong NED innovations, such as linear processes with

strong NED innovations. Moreover, we show a result on the variances of partial sums of a strong NED

sequence, which is usually taken as a prior assumption in discussing the large sample behavior of an

NED sequence.

Normalized Maximum Likelihood and MDL in Model Selection

Erkki P. Liski

Department of Mathematics, Statistics and Philosophy, University of Tampere, Finland

[email protected]

By viewing models as a means of providing statistical descriptions of observed data, the comparison

between competing models is based on the stochastic complexity (SC) of each description. The Normal-

ized Maximum Likelihood (NML) form of the stochastic complexity (SC) (Rissanen 1996) contains a

component that may be interpreted as the parametric complexity of the model class. The SC for the

data, relative to a class of suggested models, serves as a criterion for selecting the optimal model with the

smallest SC.

We calculate the SC for the Gaussian linear regression by using the NML density and consider it as a

criterion for model selection. The final form of the selection criterion depends on the method for bounding

the parametric complexity. As opposed to traditional fixed penalty criteria, this technique yields adaptive

criteria that have demonstrated success in certain applications.

Reference

[1] Rissanen, J. (1996). Fisher information and stochastic complexity. IEEE Transactions on Infor-

mation Theory 42(1), 40–47.

Key words and phrases. Minimum description length, Stochastic complexity, normalized maximum likeli-

hood, parametric complexity, adaptive selection criteria.
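For reference, the NML form of the stochastic complexity cited above (Rissanen 1996) can be written, in simplified notation, as

p̂(x) = f(x; θ̂(x)) / C,   C = ∫ f(y; θ̂(y)) dy,   SC(x) = −log f(x; θ̂(x)) + log C,

where θ̂(x) denotes the maximum likelihood estimate computed from the data x and log C is the parametric complexity of the model class; the selected model class is the one with the smallest SC. In the Gaussian regression case the normalizing integral is unbounded unless it is restricted, and the way this restriction is imposed is what produces the different, adaptive forms of the criterion mentioned above.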


Connections Among Different Criteria for Asymmetrical Fractional Factorial Designs

Min-Qian Liu

Department of Statistics, Nankai University, China

[email protected]

Kai-Tai Fang

Department of Mathematics, Hong Kong Baptist University, Hong Kong

[email protected]

Fred J. Hickernell

Department of Applied Mathematics, Illinois Institute of Technology, USA

[email protected]

In recent years, there has been increasing interest in the study of asymmetrical fractional fac-

torial designs. Various new optimality criteria have been proposed from different principles for design

construction and comparison, such as the generalized minimum aberration, minimum moment aberration,

minimum projection uniformity and the χ2(D) (for design D) criterion. In this paper, those criteria are

reviewed, the χ2(D) criterion is generalized to the so-called minimum χ2 criterion. Connections among

these different criteria are investigated, which show that there exist some equivalencies among them. These

connections provide strong statistical justification for each of them from other viewpoints. Some general

optimality results are developed, which not only unify several results (including results for symmetrical

case), but also are useful for constructing asymmetrical supersaturated designs.

Key words and phrases. Discrepancy, fractional factorial design, generalized minimum aberration, mini-

mum moment aberration, orthogonal array, supersaturated design, uniformity.

Two-stage Response Surface Designs

Xuan Lu and Xi Wang

Department of Mathematical Science, Tsinghua University, China

[email protected]

The construction of two-stage response surface designs with high estimation efficiency is motivated

from a real case and studied. In the first stage, two-level points and central point replicates are used for

screening significant variables and testing the possible curvature of the response surface, and then, in the

second stage, additional three-level points are used together with the first stage points to identify a second

order model. Besides the well-known D criterion, a new criterion, C, is proposed to find points in the second stage, given the points of the first stage. The logarithm of C is a weighted sum of log efficiency measures

for four subsets of parameters. By selecting suitable weights, one can construct two-stage response surface

designs with high estimation efficiency. A construction algorithm is introduced. The superiority of the new designs is demonstrated by comparing them with existing response surface designs. An answer is given

to the motivating case.
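
The precise C criterion is defined in the talk; purely to illustrate the generic idea of a weighted sum of log efficiency measures over parameter subsets, a minimal sketch (with arbitrary subsets and weights of our own choosing) could look like this:

    import numpy as np

    def log_d_efficiency(X, cols):
        # log-determinant of the normalized information matrix for a subset
        # of model terms, scaled by subset size: a D-efficiency-type measure
        M = X[:, cols].T @ X[:, cols] / X.shape[0]
        sign, logdet = np.linalg.slogdet(M)
        return logdet / len(cols) if sign > 0 else -np.inf

    def weighted_log_criterion(X, subsets, weights):
        # weighted sum of log efficiency measures over parameter subsets
        return sum(w * log_d_efficiency(X, list(s))
                   for w, s in zip(weights, subsets))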


Marginal Permutations in Linear Models

Tatjana Nahtman

Institute of Mathematical Statistics, University of Tartu, Estonia

[email protected]

Dietrich von Rosen

Department of Biometry and Engineering, Swedish University of Agricultural Sciences,

Sweden

[email protected]

The purpose of the project is to study covariance structures in balanced linear models that are invariant with respect to marginal permutations (including shift-permutations). We shall focus on model formulation and the interpretation of variance components rather than their prediction. Marginal

permutation invariance implies a Kronecker product structure with specific patterns on the covariance

matrices. In particular, under shift-invariant permutations Kronecker products of Toeplitz matrices appear.

Useful results are obtained for the spectrum of these covariance matrices.

Via the spectrum of the covariance matrices, the reparameterization of factor levels, i.e. imposing certain constraints on the parameters, is studied. In particular, we focus on the most commonly used constraints, the so-called "sum-to-zero" constraints. The constraints imposed on the spectrum lead to singular covariance matrices, and one of the main results of the project is that only some constraints provide useful reparameterizations.
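
A minimal numerical check of the structural fact exploited here (the matrices below are toy examples, not taken from the paper): a shift-invariant covariance factor is circulant, hence Toeplitz, and the spectrum of a Kronecker product consists of all pairwise products of the factors' eigenvalues.

    import numpy as np

    def circulant(first_row):
        # row i is the first row cyclically shifted by i positions
        r = np.asarray(first_row, dtype=float)
        return np.array([np.roll(r, i) for i in range(len(r))])

    C1 = circulant([2.0, 0.5, 0.1, 0.5])   # symmetric toy covariance factors
    C2 = circulant([1.0, 0.3, 0.3])
    kron_eigs = np.sort(np.linalg.eigvalsh(np.kron(C1, C2)))
    prod_eigs = np.sort(np.outer(np.linalg.eigvalsh(C1),
                                 np.linalg.eigvalsh(C2)).ravel())
    print(np.allclose(kron_eigs, prod_eigs))   # True: the spectrum factorizes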

Key words and phrases. Covariance structures, eigenspace, invariance, marginal permutations, reparame-

terization, spectrum, Toeplitz matrix.

Bayesian Semiparametric Structural Equation Models with Latent Variables

D. Dunson and J. Palomo

National Institute of Environmental Health Sciences, Duke University, USA

[email protected], [email protected]

Structural equation models (SEMs) provide a general framework for modeling of multivariate data,

particularly in settings in which measured variables are designed to measure one or more latent variables.

In implementing SEM analysis, it is typically assumed that the model structure is known and that the latent

variables have normal distributions. To relax these assumptions, this article proposes a semiparametric

Bayesian approach. Categorical latent variables with an unknown number of classes are accommodated

using Dirichlet process (DP) priors, while DP mixtures of normals allow continuous latent variables to have

unknown distributions. Robustness to the assumed SEM structure is accommodated by choosing mixture

priors, which allow uncertainty in the occurrence of certain links within the path diagram. A Gibbs

sampling algorithm is developed for posterior computation. The methods are illustrated using biomedical

and social science examples.
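
For readers unfamiliar with DP mixtures of normals, a truncated stick-breaking sketch of how such a prior generates an unknown continuous distribution is given below; the truncation level, concentration parameter, base measure and kernel scale are arbitrary illustrative choices, not the specifications used by the authors.

    import numpy as np

    def sample_dp_mixture(n, alpha=1.0, trunc=50, rng=None):
        # truncated stick-breaking weights, normal atoms from the base measure,
        # then observations drawn from the selected normal component
        rng = np.random.default_rng(rng)
        v = rng.beta(1.0, alpha, size=trunc)
        w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
        w /= w.sum()                            # renormalize after truncation
        atoms = rng.normal(0.0, 2.0, size=trunc)
        comp = rng.choice(trunc, size=n, p=w)
        return rng.normal(atoms[comp], 0.5)

    print(sample_dp_mixture(5, rng=0))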

Key words and phrases. Covariance structural model, Dirichlet process, graphical model, latent class,

latent trait, MCMC algorithm, measurement error, mixture model, variable selection.


Stochastic Optimization Method to Estimate the Parameters of the Two-parameter Pareto Distribution - A Short Communication

Wan-Kai Pang

Department of Applied Mathematics, Hong Kong Polytechnic University, Hong Kong

[email protected]

The Pareto distribution plays an important role in modelling income distributions in economic models. Parameter estimation of the two-parameter Pareto distribution has been studied by others in the past, and a number of optimization methods have been proposed. In this paper, we use Markov Chain Monte Carlo (MCMC) techniques to estimate the Pareto parameters. The method is quite successful and performs well in estimating the threshold parameter of the Pareto distribution.
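
For context, the classical maximum likelihood estimates for the two-parameter Pareto model have a simple closed form (threshold = sample minimum, shape from the log ratios); the sketch below shows these textbook estimates, not the MCMC procedure proposed in the paper.

    import numpy as np

    def pareto_mle(x):
        # density f(x) = a * xm**a / x**(a + 1), x >= xm:
        # xm_hat = min(x), a_hat = n / sum(log(x / xm_hat))
        x = np.asarray(x, dtype=float)
        xm_hat = x.min()
        a_hat = len(x) / np.sum(np.log(x / xm_hat))
        return xm_hat, a_hat

    rng = np.random.default_rng(1)
    sample = 2.0 * (1.0 + rng.pareto(3.0, size=1000))   # Pareto(xm=2, a=3)
    print(pareto_mle(sample))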

Key words and phrases. Pareto probability distribution, Markov Chain Monte Carlo, maximum likelihood

estimation, Bayesian estimation

Generalized Gamma Frailty Model

Yingwei Peng

Department of Mathematics and Statistics, Memorial University of Newfoundland, Canada

[email protected]

N. Balakrishnan

McMaster University, Canada

In this article, we present a frailty model using the generalized gamma distribution as the frailty

distribution. It is a generalization of the popular gamma frailty model. It also includes other frailty

models such as the lognormal and Weibull frailty models as special cases. The flexibility of this frailty

distribution makes it possible to detect a complex frailty distribution structure which may otherwise be

missed. Due to the intractable integrals in the likelihood function and its derivatives, we propose to approximate the integrals either by Monte Carlo simulation or by a quadrature method and then

find the maximum likelihood estimates of the parameters in the model. We explore the properties of the

proposed frailty model and the computation method through a simulation study. The study shows that the

proposed model can potentially reduce errors in the estimation, and that it provides a viable alternative

for correlated data. We also illustrate the model with a real-life data set.
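
The intractable integral referred to above is the marginal likelihood obtained by integrating the conditional likelihood over the unobserved frailty; a crude Monte Carlo version of that approximation is sketched below, with a gamma frailty and an exponential conditional likelihood standing in for the generalized gamma setting of the paper.

    import numpy as np

    def mc_marginal_loglik(cond_lik, frailty_sampler, n_draws=2000, rng=None):
        # E_u[ L(data | u) ] approximated by an average over simulated frailties
        rng = np.random.default_rng(rng)
        u = frailty_sampler(n_draws, rng)
        return float(np.log(np.mean(cond_lik(u))))

    # toy cluster: one exponential lifetime t = 1.3 with conditional hazard u
    print(mc_marginal_loglik(cond_lik=lambda u: u * np.exp(-u * 1.3),
                             frailty_sampler=lambda n, r: r.gamma(2.0, 0.5, n),
                             rng=0))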

Key words and phrases. Censoring, clustered data, Generalized gamma distribution, lognormal distribu-

tion, Monte Carlo approximation, piecewise constant hazards.


Probabilistic Model of Learning

Jonny B. Pornel

Iloilo National High School, USA

[email protected]

Leonardo Sotaridona

CTB McGraw-Hill, USA

Leonardo [email protected]

This paper proposes an algorithm to simulate learning. It postulates that learning is divided into three important phases: knowledge acquisition, recall, and connection. By using probability to quantify

the three processes of learning, the model offers incisive explanations of many phenomena in learning and cognitive science. In the knowledge acquisition algorithm, each piece of incoming information (impulse) augments the probability of the stored knowledge to which it is attached. Each knowledge item is then sorted by decreasing probability. These probabilities will then be the factor that affects the recall

and connection functions. The model unifies the three currently dominant branches of learning theory: behavioral, cognitive and constructivist. It is more flexible than the Self-Organizing Map (Kohonen, 2001) and other neural network structures. The algorithm can have important applications in the fields of artificial intelligence and information technology.

Key words and phrases. Probability, theory of learning, self organizing map.

Connection Between Uniformity and Orthogonality

Hong Qin

Faculty of Mathematics and Statistics, Central China Normal University, China

[email protected]

Considerable effort has been devoted to studying the usefulness of uniformity in fractional factorial designs.

Uniformity is a geometric concept, which is related to computer experiment design and quasi-Monte Carlo

methods. Orthogonality is an important criterion for comparing factorial designs, and at first sight it seems unrelated to the uniformity of factorial designs. In this talk, we will give a justifiable interpretation of orthogonality through

the consideration of uniformity. Two orthogonality measure criteria, proposed by Fang, Ma and Mukerjee

(2002) and Fang, Lu and Winker (2003) respectively, are employed, which can be viewed as extensions of the

concept of strength in orthogonal arrays and have been utilized to evaluate the orthogonality of factorials.

We will give an analytic link between uniformity measured by the discrete discrepancy (Hickernell and

Liu, 2002; Fang, Lin and Liu, 2003; Qin and Fang, 2004) and two orthogonality criteria mentioned above,

and show that comparing the orthogonality of two symmetrical factorials is equivalent to comparing their

uniformity.

Key words and phrases. Discrete discrepancy, factorial design, orthogonality, uniformity, uniform design.


Probabilistic Analysis of a Two Unit Cold Standby System with Random Checks and Various Repair Policies

Syed Mohd Rizwan

Department of Mathematics and Science, Sultanate of Oman, Oman

[email protected]

A two-unit cold standby system with two repairmen is analyzed. The failed unit is attended by a repairman who is only rarely available. After the repair, his job is to check randomly whether the system is working; if it is not found satisfactory, an efficient repairman, who is 100% efficient, is called. Once the efficient repairman completes the repair, the unit becomes as good as new. Various reliability measures have been calculated using semi-Markov processes and the regenerative point technique.

Key words and phrases. Cold standby system, semi-Markov process, regenerative point technique.

Single Server Queue with Batch Arrival and α-Poisson Distribution

V. R. Saji Kumar

Kerala University Library, University of Kerala, India

saji [email protected]

We consider a generalization of the M^X/G/1 queue with α-Poisson arrivals (equivalently, a Mittag-Leffler inter-arrival time distribution). When α = 1, it is the usual classical M^X/G/1 model with batch arrivals. We analyze the single-server retrial queue and the queue with a waiting server and vacations.
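
For orientation (standard conventions, stated here only as background): the Mittag-Leffler function is E_α(z) = Σ_{k≥0} z^k / Γ(αk + 1), and a Mittag-Leffler inter-arrival time T has survival function P(T > t) = E_α(−t^α), 0 < α ≤ 1, with infinite mean when α < 1. Since E_1(−t) = e^{−t}, the case α = 1 gives exponential inter-arrival times, which is why the classical M^X/G/1 model with batch Poisson arrivals is recovered.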

Key words and phrases. Batch arrival, infinite mean waiting time, Mittag-Leffler function, retrial queue,

vacations, waiting server.

Robust R-estimation of a Consensus Value in Multi-Center Studies

Inkyung Jung

Harvard Medical Center, USA

Pranab K. Sen

University of North Carolina, Chapel Hill, USA

[email protected]

There are various situations where a multi-center experiment is conducted under possibly heterogeneous

set-ups (for determining a consensus value). This results in unbalanced heteroscedastic one-way random

effects models. Standard parametric approaches based on the stringent assumption of normality of both

the error and the random effect components may not perform well when either of these conditions is

vitiated. Moreover, for heteroscedastic errors (across the centers or blocks), parametric estimators may lack

robustness properties to a greater extent. Two robust R-estimators for the common location parameter

(the consensus value) based on Wilcoxon signed-rank statistics are proposed and their properties studied.

When the extent of heteroscedasticity is large, or the distributions of the random effect and/or errors

deviate from normality, the proposed estimators perform better than the classical weighted least squares

and some other parametric estimators. Along with the supporting methodology, this robust-efficiency

perspective is illustrated with an 'Arsenic in oyster tissue' problem. Some further simulation studies are also carried out in this connection.
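
As background, the one-sample R-estimator of location associated with the Wilcoxon signed-rank statistic is the Hodges-Lehmann estimator, the median of the pairwise Walsh averages; the sketch below computes it per center and then combines centers with simple illustrative weights, which is only a caricature of the two estimators actually proposed.

    import numpy as np

    def hodges_lehmann(x):
        # median of the Walsh averages (x_i + x_j) / 2, i <= j
        x = np.asarray(x, dtype=float)
        i, j = np.triu_indices(len(x))
        return float(np.median((x[i] + x[j]) / 2.0))

    def combined_location(centers, weights=None):
        # weighted combination of per-center estimates (illustrative weights)
        est = np.array([hodges_lehmann(c) for c in centers])
        w = np.ones_like(est) if weights is None else np.asarray(weights, float)
        return float(np.sum(w * est) / np.sum(w))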


Robust Estimation for PAR Time Series

Qin Shao

Department of Mathematics, University of Toledo, USA

[email protected]

Considerable attention has been given to the periodically stationary time series due to their wide ap-

plications in climatology, hydrology, electrical engineering, economics, etc. Periodic autoregressive models

with order p (PAR(p)) are commonly applied when modelling periodically stationary time series. The

discussion on the parameter estimation was focused on the moment estimation, the least squares estima-

tion, and the maximum likelihood estimation. As is well known, the weakness of these methods is that their corresponding estimates are sensitive to outliers and to small changes in the distributions. Robust estimation procedures, which resist the adverse effects of abnormal random disturbances and should be applied in such scenarios, have not been considered for PAR time series. The primary interest of this talk is to propose a robust estimation method for the PAR(p) model parameters. The proposed method not only robustifies the residuals and their weights in the estimating equations with odd, bounded and differentiable functions, but also utilizes the systematic structure of the time series.
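
For concreteness, a PAR model lets the autoregressive coefficients vary with the season of the index; the sketch below simulates a PAR(1) series and shows a smooth, odd, bounded psi function of the general kind used to downweight large residuals in robust estimating equations (the specific functions and weights of the proposed method are defined in the talk).

    import numpy as np

    def simulate_par1(phi_by_season, n, sigma=1.0, rng=None):
        # X_t = phi_{s(t)} * X_{t-1} + e_t, where s(t) cycles through seasons
        rng = np.random.default_rng(rng)
        S = len(phi_by_season)
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi_by_season[t % S] * x[t - 1] + sigma * rng.normal()
        return x

    def bounded_psi(r, c=1.345):
        # smooth, odd, bounded: large residuals are clipped toward +/- c
        return c * np.tanh(np.asarray(r, dtype=float) / c)

    x = simulate_par1([0.6, -0.3, 0.8, 0.1], n=400, rng=0)
    print(bounded_psi([-3.0, -0.4, 0.2, 2.5]))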

Key words and phrases. Periodically stationary time series, periodic autoregressive models, robust estima-

tors, estimation equations, asymptotic relative efficiency

Empirical Bayes Estimation for the Reliability Indexes of a Cold Standby Series System Based on Type-II Censored Samples

Yimin Shi

Department of Applied Mathematics, Northwestern Polytechnical University, China

[email protected]

Based on Type-II censored samples, we investigate the empirical Bayes estimation and approximate

confidence limits of the reliability indexes (such as failure rate, reliability function and average life) for a

cold standby series system.

Suppose that a standby system consists of k series working units and n′s independent cold standby units (a unit is said to be in cold standby if it does not fail during the standby period). The

switching is perfect, i.e., it never fails and it is instantaneous. When a working unit fails, a cold standby

unit replaces it immediately and the system can work as before. If the standby units are used up and one of the k series units becomes unusable, then the system is said to have failed. Let the lifetimes of all units

in the system be independent and have identical exponential distribution exp(λ), where λ is the failure

rate of unit.

In this paper, based on Type-II censored samples, the empirical Bayes estimation (EBE) and an approximate upper confidence limit of the failure rate are obtained first, and then Bayesian approximate lower

confidence limits for reliability function and average life are presented. The expressions for calculating

Bayesian lower confidence limits of the reliability function and average life are also obtained. Furthermore,

the maximum likelihood estimate (MLE) of the failure rate is also obtained, and an illustrative example is examined numerically by means of Monte Carlo simulation. Finally, the accuracy of the confidence limits is discussed. The numerical results show that the EBE is more accurate than the MLE and that our confidence limits are also efficient.
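
For orientation, under exponential lifetimes and Type-II censoring at the r-th observed failure, the classical MLE of the failure rate is the number of observed failures divided by the total time on test; the sketch below computes that textbook quantity, not the empirical Bayes procedure of the paper.

    import numpy as np

    def exp_failure_rate_mle(failure_times, n_on_test):
        # observation stops at the r-th failure; the remaining n - r units are
        # censored at that time; lambda_hat = r / (total time on test)
        t = np.sort(np.asarray(failure_times, dtype=float))
        r = len(t)
        total_time_on_test = t.sum() + (n_on_test - r) * t[-1]
        return r / total_time_on_test

    print(exp_failure_rate_mle([0.2, 0.5, 0.9, 1.4], n_on_test=10))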

Mathematics Subject Classification. 62F25, 62N05.


A Lagrange Multiplier Test for GARCH in Partially Nonstationary and Stationary Multivariate Autoregressive Models: with Applications to

Economic Data

Chor-Yiu Sin

Department of Economics, Hong Kong Baptist University, Hong Kong

[email protected]

Economic time-series data are often modelled with partially nonstationary multivariate autoregression

and GARCH (Generalized Auto-Regressive Conditional Heteroskedasticity). Noticeable examples include

those studies of price discovery, in which stock prices of the same underlying asset are cointegrated and they

exhibit multivariate GARCH. It was not until recently that Li, Ling and Wong (2001) formally derived the asymptotic distribution of the estimators of the parameters in this model. The efficiency gain in some of the parameters is huge even when the deflated error is symmetrically distributed (the symmetry assumption).

Taking into consideration the different rates of convergence, we derive the asymptotic distribution of the

usual LM (Lagrange multiplier) test in partially nonstationary multivariate autoregression. Under the

symmetry assumption, the distribution is the same as that in stationary multivariate autoregression. The

null can be no multivariate GARCH, or it can be a specific multivariate GARCH. We then apply our test

to the monthly or quarterly Nelson-Plosser data, embedded with some prototype multivariate models. We

also apply the tests to the intra-daily stock price indices and their derivatives. Comparisons are made

with the portmanteau test suggested in Ling and Li (1997) and the residual-based test suggested in Tse

(2002), both of which do not specify a definite alternative hypothesis.
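
As a simplified univariate stand-in for the multivariate LM statistic derived in the paper, Engle's classical LM test for ARCH can be sketched as follows (the partially nonstationary, cointegrated multivariate case treated in the talk is substantially more involved):

    import numpy as np

    def arch_lm_stat(residuals, q=4):
        # regress e_t^2 on a constant and q of its own lags;
        # n * R^2 is asymptotically chi-square(q) under the null of no ARCH
        e2 = np.asarray(residuals, dtype=float) ** 2
        y = e2[q:]
        X = np.column_stack([np.ones(len(y))] +
                            [e2[q - k:-k] for k in range(1, q + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1.0 - (y - X @ beta).var() / y.var()
        return len(y) * r2   # compare with chi-square(q) critical values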

Key words and phrases. Cointegration, efficiency gain, LM test, multivariate GARCH, Portmanteau test,

residual-based test.

On Some Aspects of Data Integration Techniques with Environmental Applications

Bimal Sinha

Department of Mathematics University of Maryland, Baltimore County, USA

[email protected]

In a canonical form, the problem is to meaningfully combine the columns of an N x K matrix whose

elements represent some 'pollution' emissions, columns correspond to different kinds of pollution, and rows represent various sources of pollution, in order to define an overall or combined index of pollution.

A Multiple Criteria Decision Making (MCDM) method will be described and used to accomplish this goal. Various modifications of the MCDM method will also be discussed.


An M/M/1 Retrial Queue with Two Orbits

R Sivasamy

Department of Statistics, Annamalai University, India

cdl [email protected]

This paper deals with a retrial queue in which customers joining an M/M/1-type system are admitted into two orbits. An arriving customer who finds the server busy enters a group called Orbit-1; otherwise he occupies the server for his first service. The server applies a set Ω of specifications and declares each customer as the

’Satisfied Type (ST)’ if the customer satisfies all the specifications of Ω; otherwise he declares him as the

’dissatisfied type (DST)’ at the time of completing first service. For the present study such events of ST

and DST are assumed to occur with probabilities p and q respectively, where p + q = 1. Each ST leaves

the service area while each DST joins the other group called ’Orbit-2’ of unsatisfied customers. Each

customer can reapply for service from either orbit after the passage of a random amount of time, and can get in for service only when the server is free. The server does not apply the verification process to customers coming from Orbit-2; that is, a customer receiving service for the second time is served to his satisfaction without any verification. Steady-state conditions and the joint distribution of X1 = the number of busy servers and X2 = the number of customers present in the system at a random epoch are derived. Some

measures of X2 are then obtained.

Key words and phrases. Satisfied type of customer, unsatisfied type of customer, steady state condition,

and joint distribution.

Simultaneous Comparison of Several Population Dispersions with an Application to Livestock Bases

Yeong-Tzay Su

Department of Mathematics, National Kaohsiung Normal University, Taiwan

[email protected]

We present a robust testing procedure for comparing the variability levels of an arbitrary number of populations. No specific distribution or moment condition for the populations is required. We also prove some asymptotic properties of the test and demonstrate the testing procedure using the livestock bases data from three regional markets in the U.S.

Key words and phrases. Rank sums, homogeneity of variance, power of the test, small sample distribution,

asymptotic behaviors.


Applicability of the Vitis GeneChip for Transcriptome Analysis in Heterologous Grapevine Species

Wenping Qiu, Laszlo G. Kovacs

Department of Fruit Science, Southwest Missouri State University, USA

Yingcai Su

Department of Mathematics, Southwest Missouri State University, USA

[email protected]

The Vitis GeneChip (Affymetrix) contains probe sets for 16,437 grape genes, most of which (> 14,000)

are derived from Vitis vinifera, the cultivated grape. Transcriptome analysis, however, is also important in

other Vitis species, because the agronomically most valuable genetic resources are represented by the wild,

non-cultivated members of the genus. The purpose of this work is to determine if the Vitis GeneChip can be utilized for transcriptome analysis in another grapevine species, namely Vitis aestivalis.

The Additive Hazards Model for Recurrent Gap Times

Liuquan Sun, Do-Hwan Park and Jianguo Sun

Chinese Academy of Sciences, China and University of Missouri, USA

[email protected]

Recurrent event data and gap times between recurrent events are often the targets in the analysis of

longitudinal follow-up or epidemiological studies. To analyze the gap times, among others, Huang and

Chen (2003) proposed to fit them to the proportional hazards model. It is well-known, however, that the

proportional hazards model may not fit the data well sometimes. To address this, this paper investigates

the fit of the additive hazards model to gap time data and an estimating equation approach is presented

for inference about regression parameters. Both asymptotic and finite sample properties of the proposed

parameter estimates are established. One major advantage of the use of the additive hazards model over

the proportional hazards model is that the resulting parameter estimator has a closed form and thus can

be easily obtained. The method is applied to a cancer study.
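
For readers wondering why the additive hazards estimator is available in closed form: in the additive model λ(t | Z) = λ0(t) + β'Z(t), the usual estimating function is linear in β, so that (in one common Lin-Ying-type form, quoted here from memory and therefore only indicative)

    β̂ = { Σ_i ∫ Y_i(t) [Z_i(t) − Z̄(t)]^{⊗2} dt }^{−1} { Σ_i ∫ [Z_i(t) − Z̄(t)] dN_i(t) },

with Y_i the at-risk indicator, N_i the counting process and Z̄ the at-risk average of the covariates; no iterative maximization is required, unlike the partial likelihood for the proportional hazards model.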

Key words and phrases. Additive hazards model, estimating equations, gap time, recurrent event data,

regression analysis.

Stirling Numbers and the Variance of Chi-square Statistic

Ping Sun

Department of Mathematics, Northeastern University, China

[email protected]

The Stirling number of the second kind, S(n, k), counts the ways of distributing n distinct elements x1, x2, ..., xn into k indistinguishable non-empty sets. Noting that, in statistics, every state of a discrete population must be represented by observed samples, Stirling numbers can help to derive the exact distribution of some statistics. This paper gives an exact formula for the variance of the Chi-square statistic by applying the theory of Stirling numbers of the second kind in combinatorics.
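
For reference, S(n, k) satisfies the standard recurrence S(n, k) = k S(n−1, k) + S(n−1, k−1), which gives a direct way to compute the numbers used here:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n, k):
        # S(n, k) = k*S(n-1, k) + S(n-1, k-1), with S(n, n) = 1 and S(n, 0) = 0 for n > 0
        if n == k:
            return 1
        if k == 0 or k > n:
            return 0
        return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

    print([stirling2(5, k) for k in range(6)])   # [0, 1, 15, 25, 10, 1]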

Key words and phrases. Stirling numbers of the second kind, Chi-square statistics, variance.


Robust Designs for GLMM Models when Covariates are Involved

Frans E. S. Tan

Department of Methodology and Statistics, University of Maastricht, Netherlands

[email protected]

When planning an experimental design, it is often advocated to randomly allocate an equal number

of subjects to each of the groups of the treatment factor. In general, the primary virtue of this random

allocation is to balance the treatment groups with respect to several covariates.

From an optimal design perspective, it is known that the equal assignment strategy (balanced design)

is not always Dk-optimal. A low efficiency for the balanced design may be encountered if the absolute

values of the model parameters are large, or if the distribution of the covariates is much skewed.

In the presentation, we will review our main findings on Dk-optimal allocation problems for generalized

linear mixed models and discuss the optimal allocation problem when several important covariates are

present. Some general guidelines will be presented that characterize the Dk-optimal distribution of the

treatment factor or of a collection of independent variables.

Under certain conditions the minimum relative efficiency of the balanced design for all possible distri-

butions of covariates is maximized.

Sufficient conditions, under which the balanced design is maximin optimal, will be given for models

with an arbitrary number of continuous and discrete independent variables and an arbitrary number of

random effects. Still, the efficiency of balanced designs may be low. Additional information about (the

sign of) the parameter values, obtained from the literature or from a pilot study, and applying the maximin

procedure may lead to designs with a higher efficiency than the balanced design.

Key words and phrases. GLMM, Dk-optimality, maximin, balanced design, robust designs, relative effi-

ciency.

Empirical Likelihood Method for a Common Mean Problem

Min Tsao

Department of Mathematics and Statistics, University of Victoria, Canada

[email protected]

Changbao Wu

University of Waterloo, Canada

[email protected]

We discuss empirical likelihood (EL) based methods of inference for a common mean using data from

several independent but nonhomogeneous samples. For point estimation, we propose a maximum empirical

likelihood (MEL) estimator and show that it is root-n consistent and is asymptotically optimal. For

constructing confidence intervals, we discuss the weighted EL method and the naive application of the

EL method. Finite sample performances of the MEL estimator and the EL based confidence intervals are

evaluated through a simulation study. Numerical results indicate that overall the MEL estimator and the

weighted EL confidence interval are superior alternatives to the existing methods.
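
For a single sample, Owen's empirical log-likelihood ratio for a mean has the well-known profile form used as a building block here; a minimal numerical sketch (single-sample only, so not the weighted multi-sample version of the paper) is:

    import numpy as np
    from scipy.optimize import brentq

    def el_log_ratio(x, mu):
        # log R(mu) = -sum log(1 + lam*(x_i - mu)), with lam solving
        # sum (x_i - mu) / (1 + lam*(x_i - mu)) = 0
        z = np.asarray(x, dtype=float) - mu
        if z.min() >= 0 or z.max() <= 0:
            return -np.inf                      # mu outside the convex hull
        eps = 1e-10
        lo, hi = (-1.0 + eps) / z.max(), (-1.0 + eps) / z.min()
        lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
        return -float(np.sum(np.log1p(lam * z)))

    x = np.random.default_rng(0).normal(1.0, 1.0, 50)
    print(-2 * el_log_ratio(x, mu=1.0))   # approximately chi-square(1) under H0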


Regression Coefficient and Autoregressive Order Shrinkage and Selection via the RA-lasso

Hansheng Wang

Guanghua School of Management, Peking University, China

[email protected]

Chin-Ling Tsai

Graduate School of Management, University of California, Davis, USA

[email protected]

The least absolute shrinkage and selection operator (lasso) has been widely used in regression shrinkage

and selection. However, the lasso is not designed to take into account the autoregressive process in a nested

fashion. In this article, we propose the regression and autocorrelated lasso (RA-lasso) to jointly shrink

the regression and the nested autocorrelated coefficients in the REGression model with AutoRegressive

errors (REGAR). We show that the RA-lasso estimator performs as well as the oracle estimator (i.e., it

works as well as if the correct submodel were known). Our extensive simulation studies demonstrate that

the RA-lasso outperforms the lasso. An empirical example is also presented to illustrate the usefulness

of RA-lasso. Finally, the extension of RA-lasso to the autoregression with exogenous variables (ARX) is

discussed.

Key words and phrases. ARX, Lasso, RA-lasso, REGAR.

Generalized Sphericity Test

Jing-Long Wang

Department of Statistics, East China Normal University, China

[email protected]

The generalized sphericity test and an asymptotic expansion of the generalized sphericity test are studied. Sphericity tests in the nested repeated measures model and in the one-way multivariate repeated measurements analysis of variance model are also given as applications of the generalized sphericity test.


Evaluating the Power of Minitab's Data Subsetting Lack of Fit Test in Multiple Linear Regression

Daniel Xiaohong Wang

Central Michigan University, USA

[email protected]

Michael D. Conerly

University of Alabama, USA

Minitab’s data sub-setting lack of fit test (denoted XLOF) is a combination of Burn & Ryan’s test

and Utts’ test for testing lack of fit in linear regression models. As an alternative to the classical or pure

error lack of fit test, it does not require replicates of predictor variables. However, due to the uncertainty

about its performance, XLOF still remains unfamiliar to regression users while the well-known classical

lack of fit test is not applicable to regression data without replicates. So far this procedure has not been

mentioned in any textbooks and has not been included in any other software packages. This study assesses

the performance of XLOF in detecting lack of fit in linear regressions without replicates by comparing the

power with the classic test. The power of XLOF is simulated using Minitab macros for variables with

several forms of curvature. These comparisons lead to pragmatic suggestions on the use of XLOF. The

performance of XLOF was shown, based on the results, to be superior to that of the classical test. It should be noted that the requirement of replicates makes the classical test unavailable for most regression data, while XLOF can still be as powerful as the classical test even without replicates.

Key words and phrases. Minitab XLOF, lack of fit test, linear regression, diagnosis, power, simulation.


Local Equilibrium Ruin Probability and Ruin Probability for the Renewal Risk Model without the Lundberg Exponent

Yuebao Wang and Dongya Cheng

Department of Mathematics, Soochow University, China

[email protected]

It is well known that the ruin probability for the classical renewal risk model was extensively studied during the 1970s and 1980s; see Teugels (1975), Veraverbeke (1977), Embrechts, Goldie and Veraverbeke (1979), Embrechts and Veraverbeke (1982), Embrechts and Goldie (1982), among others. But when the Lundberg exponent does not exist, there remain some interesting problems to be solved. This paper further discusses these problems. Theorems 1 and 2 of Veraverbeke (1977) gave an asymptotic estimate of the ruin probability for the renewal risk model, but it was pointed out in Embrechts and Goldie (1982) that the proof for the case γ > 0, when the Lundberg exponent does not exist, is incomplete. This paper attempts to improve the proof of Theorems 1 and 2 of Veraverbeke (1977) by a different method so that all their conclusions remain valid.

Theorem 1. Let B < ∞ and let −γ be the left abscissa of convergence of fK(iλ) = ∫_{−∞}^{∞} e^{−λx} dK(x). If γ > 0 and fK(−iγ) < 1, then

ψ(x) = o(e^{−γx}),

but

K ∈ S(γ) ⇔ K+ ∈ S(γ) ⇔ W ∈ S(γ),

and each of the above implies

W(x) ∼ C1 K(x),

where C1 = (eB(1 − g+(−γ))(1 − fK(−iγ)))^{−1} = gW(−γ)(1 − fK(−iγ))^{−1}.

To prove Theorem 1, we first discuss local properties of the exponential equilibrium distribution, which include asymptotic estimates for the local exponential equilibrium ruin probability.

Key words and phrases. Lundberg exponent, exponential equilibrium distribution, ruin probability, asymp-

totics.

This work was supported by the National Science Foundation of China (No. 10271087).


Selection of Integrated Statistical Evaluation Indices

Jianwu Wen

Research Institute of Statistical Sciences, National Bureau of Statistics of China, China

I. Background

A feature of integrated statistical evaluation is that it expresses evaluative conclusions numerically. Integrated statistical evaluation can also help to examine various types of issues comprehensively and systematically. The methods used in integrated statistical evaluation are becoming more and more diversified, and the areas of application more and more extensive, a trend that is likely to continue.

II. Issues to be Discussed and the Solutions

The following ten points should be considered when selecting integrated statistical evaluation indices:

1. Objectives of evaluation: pay special attention to how well the objectives of the evaluation are understood. The choice of objectives depends on the selection of definitions, which is the basic problem to be resolved first.

2. Examination evaluation and achievement evaluation: make sure to differentiate properly between evaluating for examination and evaluating achievements, a distinction commonly neglected in academic research. Much of that work lacks persuasiveness precisely because this difference is ignored.

3. Scientific and operational: deal carefully with the relationship between a scientifically sound index system and an operational index system.

4. Independence and correlation: deal properly with the relationship between independence and correlation.

5. Stability and sensitivity: if extremely stagnant indices have to be used, take measures to make them sensitive, while still maintaining the necessary stability of such indices.

6. Unification and diversity: when comparing two different regions, deal properly with the relationship between unification and diversity.

7. Complication and simplicity: try to make the integrated evaluation not only comprehensive but also simple. It is right to attach importance to how well the major indices reflect the circumstances, yet secondary indices can still provide reference value.

8. Subjectivity and objectivity: when subjective indices have to be used, deal properly with the relationship between subjectivity and objectivity.

9. Value indices and volume indices: if the same index system contains both value indices and volume indices, make sure to deal with the relationship between these two types of indices.

10. Limitation and infinity: the methods of integrated statistical evaluation are applied from a statistical perspective. Interpretations of the evaluation results that go beyond the statistical field are evidently implausible.

III. Conclusion

When carrying out integrated statistical evaluation, the idea of "to what extent" is extremely important. The critical factor in the change from quantity into quality is to grasp the "extent" accurately.


The Ethical Experiment Model in Clinical Trials and its Statistical Inference

Minyu Xie, Bo Li and Jianhui Ning

Faculty of Statistics, Central China Normal University, China

[email protected]

In order to determine whether a new treatment or medicine for a disease is effective, or to compare it with an old one, modern clinical trials gain valid statistical information by designing an experiment and carrying it out. Because all the attention is focused on collecting information, the design may pay less attention to whether the current patient receives the best treatment. Designers typically separate the pool of research subjects into distinct groups at the outset and then gather information about the responses of these groups to their assigned treatments. Such trials arguably contravene the obligation, stated in the Declaration of Helsinki [1], to apply knowledge for the best possible treatment of each individual patient, and they are criticized by more and more people, especially when used to treat desperate diseases. Hence, finding a trial design that can not only gather valid information but also care about the current patient's well-being has attracted the interest of many researchers in this field. It has also received much attention as essentially a bandit problem [2]. In this paper, by drawing on the sequential statistical principle, we establish an experimental model that gives each patient in the trial a much higher chance of being treated with the best treatment, at least given the current state of knowledge. It is a pity that no valid statistical method has so far been tailored to this design, and that is exactly what we discuss in this paper. We obtain the maximum likelihood estimators of the parameters in this medical statistical model, and we also find an effective statistical way to compare the efficiency of the different treatments involved in the model.
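
The model and estimators of the paper are its own; purely as a familiar illustration of a response-adaptive rule that tilts allocation toward the treatment currently believed to be best, the sketch below runs Thompson sampling for two Bernoulli treatments and returns the maximum likelihood estimates of the two success probabilities.

    import numpy as np

    def adaptive_trial(true_p, n_patients=100, rng=None):
        # each patient is assigned to the arm whose Beta-posterior draw of the
        # success probability is larger, so the better arm is used more often
        rng = np.random.default_rng(rng)
        succ = np.ones(2)          # Beta(1, 1) priors on both arms
        fail = np.ones(2)
        for _ in range(n_patients):
            arm = int(np.argmax(rng.beta(succ, fail)))
            outcome = rng.random() < true_p[arm]
            succ[arm] += outcome
            fail[arm] += 1 - outcome
        trials = np.maximum(succ + fail - 2.0, 1.0)
        return (succ - 1.0) / trials    # MLEs of the two success probabilities

    print(adaptive_trial([0.35, 0.60], rng=0))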

References

[1] World Medical Association. Declaration of Helsinki (2000). Reprinted in JAMA 2000, 284:1043–1045.

[2] Daryl Pullman, Xikui Wang (2001). Adaptive Design, Informed Consent, and the Ethics of Research. Controlled Clinical Trials, 22:203–210.

Key words and phrases. Clinical trial, maximum likelihood estimator.

Mathematics Subject Classification. 62K05, 62G10.


Robust Designs and Weights for Biased Regression Models with Possible Heteroscedasticity in Accelerated Life Testing

Xiaojian Xu and Douglas P. Wiens

Department of Mathematical and Statistical Sciences, University of Alberta, Canada

[email protected], [email protected]

We consider the construction of designs for accelerated life testing, allowing both for possible het-

eroscedasticity and for imprecision in the specification of the response function. We find minimax designs

and corresponding optimal estimation weights in the context of the following problems: (1) For ordinary

least squares estimation with homoscedasticity, determine a design to minimize the maximum value of

the mean squared prediction error (MSPE), with the maximum being evaluated over the departure of the

response function; (2) For ordinary least squares estimation with heteroscedasticity, determine a design to

minimize the maximum value of MSPE, with the maximum being evaluated over both types of departure;

(3) for weighted least squares estimation, determine both weights and a design to minimize the maximum

MSPE; (4) Choose weights and design points to minimize the maximum MSPE, subject to a side con-

dition of unbiasedness. All solutions to (1)–(4) are given in complete generality. Applications to several

life-stress relationship models in accelerated life testing are discussed. Numerical comparisons indicate

that our designs and weights perform well in compromising robustness and efficiency.

Key words and phrases. Regression design, accelerated life testing, least squares estimates, generalized

linear response model, extrapolation, heteroscedasticity, mean squared prediction error.

Application of Design of Experiments in Computer Simulation Study

Shu Yamada and Hiroe Tsubaki

Graduate School of Business Sciences, University of Tsukuba, Japan

[email protected], [email protected]

This paper presents an approach of design of experiments in computer simulation with some case studies

in automobile industry.

In recent days, computer simulation has been applied in many fields, such as Computer Aided En-

gineering in manufacturing industry and so forth. In order to apply computer simulation effectively, we

need to consider the following two points: (1) Exploring a model for computer simulation, (2) Effective

application of simulation based on the explored model. As regards (1), once a tentative model is derived based on knowledge in the field, it is necessary to examine the validity of the model. In this examination,

design of experiments plays an important role. After exploring a computer model, the next stage is (2),

such as optimization of the response by utilizing computer simulation. This paper presents an approach of

design of experiments in computer simulation in terms of (1) and (2) with some case studies in automobile

industry. For example, in order to optimize a response by many factors, the first step may be screening

active factors from many candidate factors. Designs of experiments, such as supersaturated designs, help with this screening problem. After finding some active factors, the next step may be approximation of the response by an appropriate function. Composite designs and uniform designs are helpful for fitting a second-order model as an approximation.

Key words and phrases. Model fitting, supersaturated design, uniform design, screening factors, validation

& verification.


Probability Distributions in Infectious Disease Transmission Models and Risks of Major Outbreaks

Ping Yan

Surveillance and Risk Assessment Division, Centre for Infectious Disease Prevention and

Control, Public Health Agency of Canada, Canada

Ping [email protected]

When an infectious agent enters a susceptible population of size n, then with probability π the outbreak terminates with only a few cases, whose average number remains constant as n → ∞, and the outbreak size as a proportion, f, concentrates at zero. This quantifies a "minor outbreak". With

probability 1 − π, the initial growth of infected individuals over time t may be approximated by an exponential function Ce^{rt} with rate r, and the outbreak size as a number, nf, scales linearly with n, while

f > 0 is a proportionality constant. This quantifies a “major outbreak”. N is the random number of

infections produced by an infective individual throughout its infectious period. Pr{N = n} depends on the point process for contacts, the transmission probability and the infectious period distribution. It determines the risk of major outbreaks, 1 − π. The intrinsic growth rate r during the early phase of an outbreak is also determined by the same set of assumptions that affect Pr{N = n}. In much of the classical infectious disease modelling literature, under the assumptions that the contacts are homogeneous (Poisson), the probability per contact is homogeneous (Bernoulli), and the infectious period is exponentially distributed, r = (E[N] − 1)/µ, where µ is the mean infectious period, and π = E[N]^{−1}. These assumptions, far from being realistic, give a geometric distribution for N, which is only a special case of a family of models for Pr{N = n}. A general model will be presented using probability generating functions, various forms of age-dependent and continuous-time branching processes, and random-effect models. This presentation will show how transmission heterogeneity, such as the "super-spreading events" observed in the outbreaks of the Severe Acute Respiratory Syndrome (SARS) in 2003, affects π and r. A further extension of this topic is to examine the relationships between this family of probability models for Pr{N = n} and a family of distributions recently developed in random graph theory, with focus on the contact network structure in the transmission of infectious diseases.
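
To make π concrete: in the branching-process approximation the minor-outbreak (extinction) probability is the smallest fixed point of the offspring probability generating function, and in the geometric-offspring case mentioned above it equals 1/E[N]. A small numerical sketch (the offspring law below is an arbitrary illustration):

    import numpy as np

    def extinction_prob(pmf, tol=1e-12, max_iter=100000):
        # smallest root of g(s) = s, g the pgf of N with pmf[n] = Pr{N = n},
        # obtained by iterating s <- g(s) starting from s = 0
        pmf = np.asarray(pmf, dtype=float)
        powers = np.arange(len(pmf))
        s = 0.0
        for _ in range(max_iter):
            s_new = float(np.sum(pmf * s ** powers))
            if abs(s_new - s) < tol:
                break
            s = s_new
        return s_new

    R0 = 2.0   # geometric offspring, mean R0: Pr{N = n} = (1/(1+R0)) (R0/(1+R0))**n
    geom = (1.0 / (1.0 + R0)) * (R0 / (1.0 + R0)) ** np.arange(200)
    print(extinction_prob(geom), 1.0 / R0)   # both approximately 0.5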

Cure Rate Models: A Unified Approach

Guosheng Yin

Department of Biostatistics, University of Texas, USA

[email protected]

This is a joint work with Joseph G. Ibrahim.

The authors propose a novel class of cure rate models for right-censored failure time data. The class is

formulated through a transformation on the unknown population survival function. It includes the mixture

cure model and the promotion time cure model as two special cases. The authors propose a general form

of the covariate structure which automatically satisfies an inherent parameter constraint and includes the

corresponding binomial and exponential covariate structures in the two main formulations of cure models.

The proposed class provides a natural link between the mixture and promotion time cure models, and it

offers a wide variety of new modelling structures as well. Within the Bayesian paradigm, a Markov chain

Monte Carlo computational scheme is implemented for sampling from the full conditional distributions of

the parameters. Model selection is based on the conditional predictive ordinate criterion. The class of

models is illustrated with a real dataset involving a melanoma clinical trial.
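
For orientation (standard formulations, not a claim about the authors' exact parameterization): the mixture cure model writes the population survival function as S_pop(t) = π + (1 − π) S(t), with cured fraction π, while the promotion time cure model writes S_pop(t) = exp{−θ F(t)}, whose cure fraction is exp(−θ). A Box-Cox-type transformation applied to the population survival function is one natural way to interpolate between these two forms, which is the kind of unification the abstract describes.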

Key words and phrases. Bayesian inference, Box-Cox transformation, cure fraction, Gibbs sampling, mix-

ture cure model, promotion time cure model.


Estimating Secondary Parameters After Termination of a Multivariate Group Sequential Test

C. Wu, A. Liu and Kai Fun Yu

National Institute of Child Health and Human Development, USA

[email protected]

We consider estimation of secondary parameters following a group sequential test, with stopping re-

gions determined by testing hypotheses concerning a set of primary parameters. We derive statistics that

are jointly sufficient for the primary and secondary parameters and show that the maximum likelihood

estimators remain unchanged but no longer possess unbiasedness and minimum variance. We construct

bias-reduced and unbiased estimators for the vector of secondary parameters and show them to substan-

tially reduce the bias and improve the precision of estimation.

Key words and phrases. Bias and bias-reduction, correlated endpoints, medical trials, minimum variance, primary and secondary endpoints, restricted completeness, truncation adaptation.

Normal Theory Based Missing Data Procedure with Violation of Distribution Assumption

Ke-Hai Yuan

Department of Psychology, University of Notre Dame, USA

[email protected]

Missing data exist in almost all areas of empirical research. Many statistical developments have been

made towards the analysis of missing data. When missing data are either missing completely at random

(MCAR) or missing at random (MAR), the maximum likelihood (ML) estimation procedure preserves

many of its properties. However, in any statistical modeling, the distribution specification is at best only

an approximation to the real world, especially for higher-dimensional data. We study the properties of

the ML procedure based on the normal distribution assumption. Specifically, we study the consistency

and asymptotic normality of the MLE when data are not normally distributed and missing data are MAR

or MCAR. When data are not missing at random, factors that affect the asymptotic biases of the MLE will be discussed. Consistent estimates of standard errors using the sandwich-type covariance matrix will be obtained. Our results indicate that the formulas and conclusions in the existing literature are not all correct.
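
The sandwich-type covariance mentioned here has the generic form A^{-1} B A^{-1}, with A the negative average Hessian of the log-likelihood and B the average outer product of the score contributions; a generic numerical sketch (the per-observation interface below is our own illustrative choice):

    import numpy as np

    def sandwich_cov(scores, hessians):
        # scores:   (n, p) per-observation score vectors at the estimate
        # hessians: (n, p, p) per-observation Hessians of the log-likelihood
        scores = np.asarray(scores, dtype=float)
        hessians = np.asarray(hessians, dtype=float)
        n = scores.shape[0]
        A = -hessians.mean(axis=0)          # "bread"
        B = scores.T @ scores / n           # "meat"
        A_inv = np.linalg.inv(A)
        return A_inv @ B @ A_inv / n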


A Weighted Bivariate Density Estimation Method

W. K. Yuen and M. L. Huang

Department of Mathematics, Brock University, Canada

[email protected], [email protected]

A method of nonparametric bivariate density estimation based on a bivariate sample level crossing

function is introduced in this paper, which leads to the construction of a weighted bivariate kernel density

estimator (WBKDE). A mean square integrated error (MSIE) function and an efficiency function for this WBKDE relative to the classical bivariate kernel density estimator are derived. The WBKDE gives more efficient estimates and a better convergence rate than the classical method in the tails of any underlying continuous distribution, for both small and large sample sizes. We run simulations on various distributions, and the results confirm the theoretical findings.
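
For reference, the classical bivariate kernel density estimator that the WBKDE is compared against has the product-kernel form f̂(x, y) = (1 / (n h_x h_y)) Σ_i K((x − X_i)/h_x) K((y − Y_i)/h_y); a minimal Gaussian-kernel version (with an illustrative rule-of-thumb bandwidth, not the paper's choice) is:

    import numpy as np

    def bivariate_kde(data, grid_x, grid_y, hx=None, hy=None):
        # classical product-Gaussian bivariate KDE evaluated on a grid
        data = np.asarray(data, dtype=float)          # shape (n, 2)
        n = data.shape[0]
        hx = hx or data[:, 0].std() * n ** (-1 / 6)   # rule-of-thumb bandwidths
        hy = hy or data[:, 1].std() * n ** (-1 / 6)
        gx, gy = np.meshgrid(grid_x, grid_y, indexing="ij")
        kx = np.exp(-0.5 * ((gx[..., None] - data[:, 0]) / hx) ** 2)
        ky = np.exp(-0.5 * ((gy[..., None] - data[:, 1]) / hy) ** 2)
        return (kx * ky).sum(axis=-1) / (n * 2 * np.pi * hx * hy)

    pts = np.random.default_rng(0).normal(size=(500, 2))
    grid = np.linspace(-3.0, 3.0, 61)
    print(bivariate_kde(pts, grid, grid).max())       # near 1 / (2*pi)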

References

[1] Huang, M. L. and Brill, P. H. (2004). ”A distribution estimation method based on level crossing”,

Journal of Statistical Planning and Inference, 124, 45–62.

[2] Silverman, B. W. (1996). Density Estimation for Statistics and Data Analysis, Chapman and

Hall, New York.

Key words and phrases. Mean square integrated error, efficiency, multivariate level crossing sample func-

tion, nonparametric kernel density estimator, order statistics.

Mathematics Subject Classification. 62G07, 62E20

Schur-power Discrimination Theory for Orthogonal Designs with Generalized Minimum Aberration

Aijun Zhang

Department of Statistics, University of Michigan, USA

[email protected]

The majorization framework for fractional factorial designs is a two-stage investigation scheme based

upon pairwise coincidences among experimental runs and their Schur-convex functions. The weak equidis-

tant designs are argued by Zhang et al. (2005; Ann. Statist., to appear) to be universally optimum in the

sense of majorization. In this talk we concentrate on orthogonal designs, introduce the mean and variance

of pairwise coincidences to analyze the existence of weak equidistant benchmark, and develop a Schur-power

discrimination theory. The equivalence is established between Schur-power criteria and the state-of-the-art

criterion of generalized minimum aberration. For both criteria, we derive their lower and upper bounds

via a constrained optimization approach, which extends Zhang et al. (2005) from resolution-II designs to resolution-III designs and also lays down a foundation for further study of orthogonal designs with higher resolution.

Both authors are supported by NSERC Canada grants.


Some Notes on the Business Survey

Yongguang Zhang

The Institute of Systems Science, Academia Sinica, China

[email protected]

In recent years the Chinese government has issued the business survey index every quarter. It is an important index for monitoring the running of the macro economy and is already used by many countries in the world. The construction of this index is not very difficult, but there are still some problems in using and understanding it properly: for example, the design of the sampling questionnaires, how to weight the returned questionnaires, and how to interpret the index with models such as the logistic model or the ordered probit model. In addition, we try to fit it more precisely by using an arctangent model, and the result is satisfactory.

Bayesian Inference for the Two-parameter Exponential Distribution under Type-II Doubly Censored Samples

Xuanmin Zhao and Yanlin Li

Department of Applied Mathematics, Northwestern Polytechnical University, China

In this paper, Bayesian estimates of the parameters and of a reliability index for the two-parameter exponential distribution under Type-II doubly censored samples are given, and prediction bounds for future observations are obtained using the Bayesian approach. Prediction intervals are derived for unobserved lifetimes in one-sample and two-sample prediction based on Type-II doubly censored samples.

Key words and phrases. Type-II doubly censored, two-parameter exponential distribution, Bayesian esti-

mation, Bayesian prediction.

Empirical Likelihood for a Class of Functionals of the Survival Distribution with Censored Data

Ming Zheng, Sihua Li and Yi Yang

Department of Statistics, Fudan University, China

[email protected]

The empirical likelihood was first introduced by Owen in 1990 in the complete-data case. Wang applied this method to a class of functionals of the survival function in the presence of censoring. In this paper, a generally adjusted empirical likelihood is defined. It is shown that the adjusted empirical likelihood also asymptotically follows a chi-square distribution. Some simulation studies indicate that better results may be obtained than those of Wang.

Key words and phrases. Empirical likelihood, Kaplan-Meier estimate, censoring.


Marginal Hazard Models with Varying-coefficients for Multivariate Failure Time Data

Jianwen Cai

Department of Biostatistics, University of North Carolina, USA

Jianqing Fan

Department of Operations Research and Financial Engineering, Princeton University, USA

Haibo Zhou

Department of Biostatistics, University of North Carolina, USA

Yong Zhou

University of North Carolina, USA and Chinese Academy of Science, China

[email protected]

Statistical inference for the marginal hazard models with varying-coefficients for multivariate failure

time data is studied in this paper. A local pseudo-partial likelihood procedure is proposed for estimating

the unknown coefficients and the intercept function. A weighted average estimator is also proposed in

an attempt to improve the efficiency of the estimator. The consistency and the asymptotic normality

of the proposed estimators are established and the standard error formulas for the estimated coefficients

are derived and empirically tested. To reduce the computational burden of the maximum local pseudo-

partial likelihood estimator, a simple and useful one-step estimator is proposed. Statistical properties of

the weighted avarage optimal estimator and one-step estimator are established and simulation studies are

conducted to compare the performance of the one-step estimator to the maximum local pseudo-partial

likelihood estimator. The results show that the one-step estimator can save computational cost without

deteriorating its performance both asymptotically and empirically and the weight average optimal average

estimator are more efficient than the maximum local pseudo-partial likelihood estimator. A data set from

the Busselton Population Health Surveys is analyzed to illustrate our proposed methodology.

Key words and phrases. Local pseudo-partial likelihood, marginal hazard model, martingale, multivariate

failure time, one-step estimator, varying coefficients.

This research was partially supported by NIH grant RO1 HL69720. Fan's research was also partially supported by NSF grant DMS-0355174 and a RGC grant CUHK4262/01P of HKSAR. Y. Zhou's research was done at the University of North Carolina at Chapel Hill while on leave from the Chinese Academy of Sciences, Beijing, and was also partially supported by the Fund of National Natural Science (No. 10171103) of China. The authors thank Dr. Matthew Knuiman and the Busselton Population Medical Research Foundation in Western Australia for providing the data used in the illustration.


Using SPSS as a Powerful Tool to Teach Probability Better

Yu Zhu

Department of Statistics, Xi’an University of Finance & Economics, China

[email protected]

This paper aims at conveying a practical and important idea for conducting probability teaching in a better way through operable examples. Other statistical software may work as well as SPSS. SPSS is a widespread statistical software package. Normally, it is used for data editing and data analysis. But because of its nice graph-drawing features, it can also be used to facilitate the teaching of a probability course, especially the teaching of probability to relatively low-level students, such as in professional training. For low-level teaching, the core of the probability course is not the theoretical aspect; rather, it is the understanding of the basic concepts that matters. In general, a visual display is much more impressive and understandable than a conceptual explanation. Years of experience tell me that this is also true of probability teaching. For example, a pdf cannot easily be understood literally, but can be easily understood via a graphical or visual display. A fictitious pdf graph can easily be drawn by teachers on the board using a piece of chalk, or on a piece of paper using a pencil, but it is always imprecise and could sometimes even be wrong. To get rid of this nuisance, one can use the computer to draw precise pdfs to facilitate the teaching. More importantly, if we teach this drawing approach to the students, their homework assignments can be done more effectively, and the relevant concepts of probability theory will stay longer in their minds.

Key words and phrases. Probability teaching, graph, pdf.
