
WOMAT

Workshop On Multivariate Analysis Today

Programme and Book of abstracts

Scientific organisers: Frank Critchley (OU), Bing Li (Penn State), Hannu Oja (Turku)

Local organisers: Sara Griffin, Tracy Johns, Radka Sabolova, Germain Van Bever


Contents

Programme

Talk abstracts

Yanyuan Ma: A Validated Information Criterion (VIC) to Find the Structural Dimension
Joao Branco: High dimensionality: the trouble with Mahalanobis distance
Tim Cannings: Random projection ensemble classification
Kjersti Aas: Pair-copula constructions – even more flexible than copulas
Sara Fontanella: A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory
Shahin Tavakoli: Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series
Lutz Duembgen: New Algorithms for M-Estimation of Multivariate Scatter and Location
Jim Smith: Chain event graphs for discrete multivariate processes
John Kent: Some new perspectives on partial least squares

Poster abstracts

Comparison of statistical methods for multivariate outlier detection
On point estimation of the abnormality of a Mahalanobis distance
Sparse Linear Discriminant Analysis with Common Principal Components
Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection
Hilbertian Fourth Order Blind Identification



Programme

9:00 Yanyuan Ma (University of South Carolina): A Validated Information Criterion to Find the Structural Dimension

9:30 Joao Branco (CEMAT, IST, Lisbon): High dimensionality: the trouble with Mahalanobis distance

10:00 Tim Cannings (Cambridge): Random projection ensemble classification

10:30 Coffee & Poster Session

11:00 Kjersti Aas (Norwegian Computing Centre): Pair-copula constructions – even more flexible than copulas

11:30 Sara Fontanella (The Open University): A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory

12:00 Shahin Tavakoli (Cambridge): Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series

12:30 Lutz Duembgen (Bern): New Algorithms for M-Estimation of Multivariate Scatter and Location

13:00 Lunch and Poster Session

14:30 Jim Smith (Warwick): Chain event graphs for discrete multivariate processes

15:00 John Kent (Leeds): Some new perspectives on partial least squares

15:30 Roundtable Discussion

16:00 Tea and Departures



Talk abstracts


A Validated Information Criterion to Find the Structural Dimension

Yanyuan Ma, University of South Carolina. E-mail: [email protected]

A crucial component in performing sufficient dimension reduction is to determine the structural dimension of the reduction model. We propose a novel information-criterion-based method to achieve this purpose, whose special feature is that, when examining the goodness-of-fit of the current model, we need to obtain model evaluation by using an enlarged candidate model. Although the procedure does not require estimation under the enlarged model with dimension k + 1, the decision on how well the current model with dimension k fits relies on the validation provided by the enlarged model. This leads to the name validated information criterion, calculated as VIC(k). The method is different from existing information-criterion-based model selection methods. It breaks free from the dependence on the connection between dimension reduction models and their corresponding matrix eigenstructures, which heavily relies on a linearity condition that we no longer assume. Its consistency is proved and its finite sample performance is demonstrated numerically. (Joint work with Xinyu Zhang.)

High dimensionality: the trouble with Mahalanobis distance

Joao Branco, Ana M. Pires, CEMAT, Instituto Superior Tecnico, Lisbon

The recent massive production of high-dimensional data has brought great difficulties and concomitant challenges to statistics, since its usual methods were not designed to cope with such data. High dimensionality triggers the curse of dimensionality, and unexpected behaviour of some statistical tools may surprise even those aware of the intricacies of multidimensional spaces with a large number of dimensions.

We look at the Mahalanobis distance, a tool that is crucial to the functioning of the traditional multivariate statistical methods, and see how it behaves as p approaches n and when it is greater than n. Can the Mahalanobis distance keep in high-dimensional spaces the fundamental role it plays in low-dimensional spaces (p ≪ n)? And if it does not, what are the consequences? We will attempt to answer these questions.
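As a rough illustration of the trouble alluded to above (not part of the abstract), the following sketch, assuming Gaussian data and using only numpy, shows why sample Mahalanobis distances become uninformative as p approaches n:

# Illustrative sketch only: how sample Mahalanobis distances behave as the
# dimension p approaches the sample size n. Assumes multivariate normal data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
for p in (2, 10, 30, 45, 49):
    X = rng.standard_normal((n, p))
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # unbiased sample covariance
    diff = X - mean
    # Squared Mahalanobis distance of each observation to the sample mean.
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    print(f"p={p:2d}: mean d^2 = {d2.mean():6.2f}, max d^2 = {d2.max():6.2f}")
# With the sample mean and unbiased sample covariance plugged in, the squared
# distances sum to (n-1)p and each is at most (n-1)^2/n, so for p close to n
# every observation looks equally "outlying".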

Random projection ensemble classification

Timothy I. Cannings and Richard J. Samworth, Statistical Laboratory, University of Cambridge

We introduce a very general method for high-dimensional classification, based on careful combination of the results of applying an arbitrary base classifier to random projections of the feature vectors into a lower-dimensional space. In one special case presented here, the random projections are divided into non-overlapping blocks, and within each block we select the projection yielding the smallest estimate of the test error. Our random projection ensemble classifier then aggregates the results of applying the base classifier on the selected projections, with a data-driven voting threshold to determine the final assignment. We provide theoretical understanding to justify the methodology, and a simulation comparison with several other popular high-dimensional classifiers reveals its excellent finite-sample performance.
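A minimal sketch of this recipe (an illustration only, not the authors' implementation: the nearest-neighbour base classifier, the single validation split and the fixed 0.5 voting threshold are assumptions, whereas the paper uses a data-driven threshold):

# Random projection ensemble classification, sketched for binary 0/1 labels.
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def rp_ensemble_predict(X, y, X_new, d=2, n_blocks=20, block_size=5,
                        vote_threshold=0.5, random_state=0):
    rng = np.random.default_rng(random_state)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=random_state)
    votes = np.zeros(len(X_new))
    for _ in range(n_blocks):
        best_err, best_proj = np.inf, None
        for _ in range(block_size):
            # Random Gaussian projection from p dimensions down to d.
            A = rng.standard_normal((X.shape[1], d)) / np.sqrt(d)
            clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr @ A, y_tr)
            err = np.mean(clf.predict(X_val @ A) != y_val)   # test error estimate
            if err < best_err:
                best_err, best_proj = err, A
        # Within each block, keep only the projection with the smallest error.
        clf = KNeighborsClassifier(n_neighbors=5).fit(X @ best_proj, y)
        votes += clf.predict(X_new @ best_proj)              # labels assumed 0/1
    # Aggregate: assign class 1 when the fraction of votes exceeds the threshold.
    return (votes / n_blocks > vote_threshold).astype(int)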


Pair-copula constructions – even more flexible than copulas

Kjersti Aas, Norwegian Computing Centre

A copula is a multivariate distribution with standard uniform marginal distributions. While the literature on copulas is substantial, most of the research is still limited to the bivariate case. However, some years ago hierarchical copula-based structures were proposed as an alternative to the standard copula methodology. One of the most promising of these structures is the pair-copula construction (PCC). The PCC modeling scheme is based on a decomposition of a multivariate density into a cascade of pair copulae, applied on original variables and on their conditional and unconditional distribution functions. Each pair copula can be chosen arbitrarily, and the full model can exhibit complex dependence patterns such as asymmetry and tail dependence. In this talk I will give an introduction to pair-copula constructions and apply the methodology to a 19-dimensional financial data set.
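To make the "cascade of pair copulae" concrete, a standard three-dimensional decomposition (a D-vine, written here purely for illustration and not tied to the data set of the talk) reads
\[
f(x_1,x_2,x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
c_{12}\bigl(F_1(x_1), F_2(x_2)\bigr)\;
c_{23}\bigl(F_2(x_2), F_3(x_3)\bigr)\;
c_{13|2}\bigl(F_{1|2}(x_1 \mid x_2),\, F_{3|2}(x_3 \mid x_2)\bigr),
\]
where each bivariate pair-copula density may be chosen from a different parametric family, which is the source of the extra flexibility.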

A Bayesian approach to sparse latent variables modelling: Factor Analysis and Multidimensional Item Response Theory

Sara Fontanella, N. Trendafilov, P. Valentini, L. Fontanella

In the last decades, sparse modeling has inspired many studies in different research fields, such as statistics, machine learning and bioinformatics. Its importance is due to the following main advantages: first, it enhances the interpretability of the results; second, it reflects reality, as many real-world systems are sparse; and third, predictive performance is improved, since sparsity helps prevent overfitting.

In this work, we consider sparse modeling in the context of two multivariate statistical techniques: Factor Analysis (FA) and Multidimensional Item Response Theory (MIRT). They are strongly related to each other in terms of modeling despite the different types of data they are applied to.

FA is a well-known model-based multivariate technique used to describe observed continuous variables by means of a smaller set of latent factors. Item response theory (IRT) models the probability of a correct response (to a test, questionnaire, etc.) as a function of disjoint sets of parameters, related respectively to the person and the item. MIRT is its multidimensional extension.

Both FA and MIRT suffer from solution/factor indeterminacy. In particular, the main issue to be addressed is the rotational invariance of the final solution: for a given set of data, any orthogonal transformation of the matrix of parameters would produce the same covariance structure. In this context, we show that sparsity plays a double role: on one side it improves the interpretability of the results, while on the other side it makes it possible to overcome the rotational indeterminacy.

To this end, we follow a Bayesian approach to sparse modeling. The prior belief in sparsity is modeled by a sparsity-inducing prior distribution on the parameters. In this context, a popular choice is to apply spike and slab priors, which present several computational advantages. A spike and slab prior assumes that the parameters of interest are mutually independent, with a two-point mixture distribution made up of a degenerate distribution at zero (the spike), to provide strong shrinkage near zero, and a uniform flat distribution (the slab), to allow signals to escape strong shrinkage. The performance of the considered methods is evaluated through simulation studies.
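In symbols, one common parametrisation of the prior just described (stated here as an assumption for illustration; the abstract does not give the exact form used) is, for a generic parameter $\theta_j$,
\[
\theta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathrm{Unif}(-a, a),
\qquad \gamma_j \sim \mathrm{Bernoulli}(w),
\]
where $\delta_0$ is a point mass at zero (the spike), the uniform component is the flat slab, and $w$ controls the expected degree of sparsity.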



Dynamics of DNA Minicircles in Motion via Fourier Analysis of Functional Time Series

Shahin Tavakoli, Statslab, University of Cambridge

We consider the problem of studying the dynamics of DNA minicircles that are vibrating in solution. At a large scale, DNA minicircles are modelled as elastic rods, and the problem of understanding their dynamics can be recast into the problem of estimating the second order structure of a stationary functional time series (FTS). We tackle this problem by a frequency domain approach, where we estimate the spectral density operators (or spectra) of the DNA minicircle. We then carry out hypothesis tests to compare the spectra of two specific DNA minicircles. The comparison is broken down into a hierarchy of stages: at a global level, we compare the spectral density operators of the two DNA minicircles, across frequencies and curvelength, based on a Hilbert-Schmidt criterion; then, we localize any differences to specific frequencies; and, finally, we further localize any differences along the length of the DNA minicircles, i.e. in physical space. A hierarchical multiple testing approach guarantees control of the averaged false discovery rate over the selected frequencies. In this sense, we are able to attribute any differences to distinct dynamic (frequency) and spatial (curvelength) contributions.

Keywords. Functional Data Analysis; Spectral Analysis; DNA Minicircle; Molecular Dynamics; Multiple Testing.
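For orientation, the second order structure referred to above is usually summarised by the lag-t autocovariance operators and their Fourier transform; a standard definition (stated here for context, with notation not taken from the talk) is
\[
\mathcal{F}_\omega \;=\; \frac{1}{2\pi} \sum_{t \in \mathbb{Z}} e^{-\mathrm{i}\omega t}\, R_t,
\qquad \omega \in (-\pi, \pi],
\]
where $R_t$ denotes the lag-$t$ autocovariance operator of the functional time series. The operators $\mathcal{F}_\omega$ are the spectral density operators whose estimates are compared across the two minicircles.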



New Algorithms for M-Estimation of Multivariate Scatter and Location

Lutz Duembgen, Bern

We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and other algorithms. The main idea is to utilize local parametrizations of scatter matrices via matrix exponentials, with a corresponding second order Taylor expansion of the target functional, and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially.

This talk is based on joint work with Klaus Nordhausen (Turku) and Heike Schuhmacher (Bern).
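For context, a minimal sketch of the kind of fixed-point iteration that the new algorithms are designed to outperform (here Tyler's M-estimator of scatter for centred data; an illustration of the classical baseline, not the talk's method):

# Classical fixed-point iteration for Tyler's M-estimator of scatter, shown
# only as the baseline that faster Newton-type algorithms improve on.
# Assumes the data are already centred; requires numpy.
import numpy as np

def tyler_scatter(X, n_iter=200, tol=1e-8):
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        inv_S = np.linalg.inv(S)
        # Squared Mahalanobis-type distances under the current scatter estimate.
        d2 = np.einsum("ij,jk,ik->i", X, inv_S, X)
        S_new = (p / n) * (X / d2[:, None]).T @ X
        S_new /= np.trace(S_new) / p          # fix the scale (trace = p)
        if np.linalg.norm(S_new - S) < tol:
            return S_new
        S = S_new
    return S

# Example: scatter estimation for heavy-tailed data.
rng = np.random.default_rng(1)
X = rng.standard_t(df=2, size=(500, 4))
print(np.round(tyler_scatter(X), 2))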

Chain event graphs for discrete multivariate processes

Jim Smith, Warwick

Statistical models of multivariate discrete processes often need to express various hypotheses about how events might unfold and associated hypotheses about the symmetries within these unfoldings. A natural way to express such hypotheses is via a statistical model on a finite set of atoms structured around collections of different probability trees with different symmetries. One such family is the class of Chain Event Graphs. This family contains the class of discrete Bayes Nets as a very special case. It can be shown that most inferential techniques used for Bayesian Networks readily translate to this new family because of their modular form. Furthermore, because different models in the class can be associated with families of polynomials, the inferential implications of one hypothesis against another can be elegantly analysed. In this talk I will present some recent results associated with CEGs and the challenges they bring to effective model choice. This is joint work with two PhD students, Christiane Gorgen and Rodrigo Collazo.

Some new perspectives on partial least squares

John Kent, Department of Statistics, University of Leeds

Partial least squares (PLS) is a regularization technique in high-dimensional multiple regression analysis. It has sometimes had a somewhat dubious reputation in mainstream statistics. Part of the reason seems to be that the methodology was originally proposed in terms of an algorithm, and only later was it noticed that it can be viewed as an attempt to fit a particular statistical model, the Krylov model.

In this talk we describe how the Krylov model can be formulated most simply in the setting of inverse regression and how the PLS estimator can be viewed as an approximate MLE for this model. We then describe some comparisons with the exact MLE under this model.
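One standard way to see the Krylov connection (a generic illustration, not the talk's formulation) is that the k-component PLS coefficient vector for a univariate response lies in the Krylov subspace spanned by {s, Ss, ..., S^(k-1)s}, where S = X'X and s = X'y, and is the least-squares solution restricted to that subspace. A small sketch:

# Krylov-subspace characterisation of univariate-response PLS.
# Illustration only; assumes centred X and y. Requires numpy.
import numpy as np

def pls_via_krylov(X, y, k):
    S = X.T @ X
    s = X.T @ y
    # Build the Krylov basis K = [s, Ss, ..., S^(k-1)s].
    cols, v = [], s
    for _ in range(k):
        cols.append(v)
        v = S @ v
    K = np.column_stack(cols)
    # Least squares of y on the columns of XK, mapped back to the coefficients.
    gamma, *_ = np.linalg.lstsq(X @ K, y, rcond=None)
    return K @ gamma

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
X -= X.mean(axis=0)
y = X[:, 0] - 2 * X[:, 3] + rng.standard_normal(100)
y -= y.mean()
print(np.round(pls_via_krylov(X, y, k=3), 2))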



Poster abstracts


Comparison of statistical methods for multivariate outlier detection

Aurore Archimbaud1, Klaus Nordhausen2 & Anne Ruiz-Gazen1

1 Gremaq (TSE), Universite Toulouse 1 Capitole. E-mail: [email protected], [email protected]

2 Department of Mathematics and Statistics, University of Turku. E-mail: [email protected]

In this poster, we are interested in detecting outliers, like for example manufacturing defects, in multivariate numerical data sets. Several unsupervised methods that are based on robust and non-robust covariance matrix estimators exist in the statistical literature. Our first aim is to exhibit the links between three outlier detection methods: the Invariant Coordinate Selection method as proposed by Caussinus and Ruiz-Gazen (1993) and generalized by Tyler et al. (2009), the method based on the Mahalanobis distance as detailed in Rousseeuw and Van Zomeren (1990), and the robust Principal Component Analysis (PCA) method with its diagnostic plot as proposed by Hubert et al. (2005).

Caussinus and Ruiz-Gazen (1993) proposed a Generalized PCA which diagonalizes one scatter matrix relative to another, V1V2⁻¹, where V2 is a more robust covariance estimator than V1, the usual empirical covariance estimator. These authors compute scores by projecting all the observations V2⁻¹-orthogonally on some of the components, and high scores are associated with potential outliers. We note that computing Euclidean distances between observations using all the components is equivalent to the computation of robust Mahalanobis distances according to the matrix V2 using the initial data. Tyler et al. (2009) generalized this method and called it Invariant Coordinate Selection (ICS). Contrary to Caussinus and Ruiz-Gazen (1993), they diagonalize V1⁻¹V2, which leads to the same eigenelements but to different scores that are proportional to each other. As explained in Tyler et al. (2009), the method is equivalent to a robust PCA with a scatter matrix V2 after making the data spherical using V1. However, the Euclidean distances between observations based on all the components of ICS correspond now to Mahalanobis distances according to V1 and not to V2.

Note that each of the three methods leads to a score for each observation, and high scores are associated with potential outliers. We compare the three methods on some simulated and real data sets and show in particular that the ICS method is the only method that permits a selection of the relevant components for detecting outliers.

Keywords. Invariant Coordinate Selection; Mahalanobis distance; robust PCA.
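A minimal sketch of ICS-based outlier scores in the spirit of the comparison above (an illustration under simplifying assumptions, not the poster's code: here V1 is the usual covariance and V2 a fourth-moment scatter, i.e. the FOBI pair):

# Invariant Coordinate Selection with the scatter pair (covariance,
# fourth-moment scatter). Large scores on a few components flag potential
# outliers. Illustration only; requires numpy.
import numpy as np

def ics_scores(X):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    V1 = np.cov(Xc, rowvar=False)
    # Fourth-moment scatter: weight each observation by its squared
    # Mahalanobis distance with respect to V1.
    d2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(V1), Xc)
    V2 = (Xc * d2[:, None]).T @ Xc / (n * (p + 2))
    # Joint diagonalisation: eigenvectors of V1^{-1} V2 give the ICS directions.
    eigval, B = np.linalg.eig(np.linalg.solve(V1, V2))
    order = np.argsort(eigval.real)[::-1]
    return Xc @ np.real(B[:, order])       # component scores, extreme directions first

# Toy data: 3 clear outliers in 5 dimensions.
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5))
X[:3] += 6
scores = ics_scores(X)
print(np.argsort(-np.abs(scores[:, 0]))[:5])   # observations most extreme on the first component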

Bibliography

[1] Caussinus, H. and Ruiz-Gazen, A. (1993), Projection pursuit and generalized principal component analysis, in New Directions in Statistical Data Analysis and Robustness (eds S. Morgenthaler, E. Ronchetti and W. A. Stahel), 35–46, Basel: Birkhauser.

[2] Hubert, M., Rousseeuw, P. J. and Vanden Branden, K. (2005), ROBPCA: a new approach to robust principal component analysis, Technometrics, 47(1), 64–79.

[3] Rousseeuw, P. J. and Van Zomeren, B. C. (1990), Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85(411), 633–639.

[4] Tyler, D. E., Critchley, F., Dumbgen, L. and Oja, H. (2009), Invariant coordinate selection, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 549–592.



On point estimation of the abnormality of a Mahalanobis distance

Fadlalla G. Elfadaly1, Paul H. Garthwaite1 & John R. Crawford2

1 The Open University, 2 University of Aberdeen

Email: [email protected]

When a patient appears to have unusual symptoms, measurements or test scores, the degree to which this patient is unusual becomes of interest. For example, clinical neuropsychologists sometimes need to assess how a patient with some brain disorder or a head injury differs from the general population or some particular subpopulation. This is usually based on the patient's scores in a set of tests that measure different abilities. Then, the question is "What proportion of the population would give a set of test scores as extreme as that of the patient?" The abnormality of the patient's profile of scores is expressed in terms of the Mahalanobis distance between his profile and the average profile of the normative population. The degree to which the patient's profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual. This presentation will focus on forming an estimator of this proportion using a normative sample. The estimators that are examined include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method. Simulations show that some estimators, including the commonly-used plug-in maximum likelihood estimators, can have substantial bias for small or moderate sample sizes. The polynomial approximations yield estimators that have low bias, with the quadrature method marginally preferred over Bernstein polynomials. Moreover, in simulations the median estimator has nearly zero median error. The latter estimator has much to recommend it when unbiasedness is not of paramount importance, while the quadrature method is recommended when bias is the dominant issue.

Keywords. Bernstein polynomials; Mahalanobis distance; median estimator; quadrature approximation; unbiased estimation.
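As a hedged sketch of the simplest estimator mentioned above, the plug-in approach (under an assumed multivariate normal normative population, so that squared population Mahalanobis distances follow a chi-squared distribution with p degrees of freedom) estimates the abnormality as follows:

# Plug-in estimate of the "abnormality" of a profile: the estimated proportion
# of the normative population with a larger Mahalanobis distance than the
# patient. Assumes multivariate normality. Requires numpy and scipy.
import numpy as np
from scipy import stats

def plugin_abnormality(normative_sample, patient_profile):
    mean = normative_sample.mean(axis=0)
    cov = np.cov(normative_sample, rowvar=False)
    diff = patient_profile - mean
    d2 = diff @ np.linalg.solve(cov, diff)        # squared Mahalanobis distance
    p = normative_sample.shape[1]
    return stats.chi2.sf(d2, df=p)                # P(chi-squared_p > d2)

rng = np.random.default_rng(4)
normative = rng.standard_normal((60, 4))          # small normative sample
patient = np.array([2.5, -2.0, 1.8, -2.2])        # an unusual profile
print(round(plugin_abnormality(normative, patient), 4))

As the abstract notes, this plug-in estimator can be substantially biased for small or moderate normative samples, which motivates the alternative estimators studied.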



Sparse Linear Discriminant Analysis with Common Principal Components

Tsegay G. Gebru & Nickolay T. Trendafilov

Department of Mathematics and Statistics, The Open University, UK

Linear discriminant analysis (LDA) is a commonly used method for classifying a new observation into one of g populations. However, in high-dimensional classification problems the classical LDA has poor performance. When the number of variables is much larger than the number of observations, the within-group covariance matrix is singular, which leads to unstable results. In addition, the large number of input variables needs considerable reduction, which nowadays is addressed by producing sparse discriminant functions.

Here, we propose a method to tackle the (low-sample) high-dimensional discrimination problem by using common principal components (CPC). LDA based on CPC is a general approach to the problem because it does not need the assumption of an equal covariance matrix in each group. We find sparse CPCs by modifying the stepwise estimation method proposed by Trendafilov (2010). Our aim is to find a few important sparse discriminant vectors which are easily interpretable. For numerical illustration, the method is applied to some known real data sets and compared to other methods for sparse LDA.
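For reference, the common principal components hypothesis underlying this approach (Flury's CPC model, stated here in its standard form; the sparse modification is the poster's contribution) assumes that the group covariance matrices share a common set of eigenvectors:
\[
\Sigma_j \;=\; B\,\Lambda_j\,B^{\top}, \qquad j = 1, \dots, g,
\]
where $B$ is an orthogonal matrix of common principal axes and the diagonal matrices $\Lambda_j$ of eigenvalues may differ from group to group, so equal covariance matrices are not required.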

Bibliography

[1] Trendafilov, N. T. (2010), Stepwise estimation of common principal components, Computational Statistics and Data Analysis, 54, 3446–3457.



Recovering Fisher linear discriminant subspace by Invariant Coordinate Selection

Radka Sabolova1,2, H. Oja3, G. Van Bever1 & F. Critchley1.

1 MCT Faculty, The Open University, Milton Keynes. 2 Email: [email protected]

3 Turku University

It is a remarkable fact that, using any pair of scatter matrices, invariant coordinate selection (ICS) can recover the Fisher linear discriminant subspace without knowing group membership, see [5]. The subspace is found by using two different scatter matrices S1 and S2 and a joint eigendecomposition of one scatter matrix relative to the other.

In this poster, we focus on the two-group normal subpopulation problem and discuss the optimal choice of such a pair of scatter matrices in terms of asymptotic accuracy of recovery. The first matrix is fixed as the covariance matrix, while the second one is chosen within a one-parameter family based on powers of the squared Mahalanobis distance, indexed by α ∈ R. Special cases of this approach include Fourth Order Blind Identification (FOBI, see [1]) and Principal Axis Analysis (PAA, see [4]).

The use of two scatter matrices in discrimination was studied by [2] and later elaborated in [3], who proposed generalised PCA (GPCA) based on a family of scatter matrices with decreasing weight functions of a single real parameter β > 0. They then discussed appropriate choice of β, while concentrating on outlier detection.

Their form of weight function and the consequent restriction to β > 0 imply downweighting outliers. In our approach, on the other hand, considering any α ∈ R also allows us to upweight outliers. Further, we may, in addition to the outlier case, study mixtures of subpopulations.

Theoretical results are underpinned by an extensive numerical study.

The UK-based authors thank the EPSRC for their support under grant EP/L010429/1.
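A plausible way to write the one-parameter family mentioned above (an assumed form, given purely for illustration; the poster defines the exact family) is, with d_i the Mahalanobis distance of observation x_i from the sample mean with respect to the covariance matrix,
\[
S_2(\alpha) \;\propto\; \frac{1}{n}\sum_{i=1}^{n} d_i^{\,2\alpha}\,
(x_i-\bar{x})(x_i-\bar{x})^{\top}, \qquad \alpha \in \mathbb{R},
\]
so that α = 0 recovers the covariance matrix, α = 1 gives a FOBI-type fourth-moment scatter, and negative values of α downweight outlying observations in the spirit of [2] and [3], while positive values upweight them.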

Bibliography

[1] Cardoso, J.-F. (1989), Source Separation Using Higher Moments, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2109–2112.

[2] Caussinus, H. and Ruiz-Gazen, A. (1993), Projection pursuit and generalized principal component analyses, in New Directions in Statistical Data Analysis and Robustness, 35–46.

[3] Caussinus, H., Fekri, M., Hakam, S. and Ruiz-Gazen, A. (2003), A monitoring display of multivariate outliers, Computational Statistics & Data Analysis, 44, 237–252.

[4] Critchley, F., Pires, A. and Amado, C. (2006), Principal Axis Analysis, technical report, The Open University.

[5] Tyler, D., Critchley, F., Dumbgen, L. and Oja, H. (2009), Invariant Co-ordinate Selection, Journal of the Royal Statistical Society, Series B, 71, 549–592.



Hilbertian Fourth Order Blind Identification

Germain Van Bever1,2, B. Li3, H. Oja4, R. Sabolova1 & F. Critchley1.

1 MCT Faculty, The Open University, Milton Keynes. 2 Email: [email protected]

3 Penn State University. 4 Turku University

In the classical Independent Component (IC) model, the observations X1, ..., Xn are assumed to satisfy Xi = ΩZi, i = 1, ..., n, where the Zi's are i.i.d. random vectors with independent marginals and Ω is the mixing matrix. Independent component analysis (ICA) encompasses the set of all methods aiming at unmixing X = (X1, ..., Xn), that is, estimating a (non-unique) unmixing matrix Γ such that ΓXi, i = 1, ..., n, has independent components. Cardoso ([1]) introduced the celebrated Fourth Order Blind Identification (FOBI) procedure, in which an estimate of Γ is provided, based on the regular covariance matrix and a scatter matrix based on fourth moments. Building on robustness considerations and generalizing FOBI, Invariant Coordinate Selection (ICS, [2]) was originally introduced as an exploratory tool generating an affine invariant coordinate system. The obtained coordinates, however, are proved to be independent in most IC models.

Nowadays, functional data (FD) occur more and more often in practice, and relatively few statistical techniques have been developed to analyze this type of data (see, for example, [3]). Functional PCA is one such technique, which focuses on dimension reduction with very little theoretical consideration.

We propose an extension of the FOBI methodology to the case of Hilbertian data, FD being the go-to example used throughout. When dealing with distributions on Hilbert spaces, two major problems arise: (i) the scatter operator is, in general, non-invertible and (ii) there may not exist two different affine equivariant scatter functionals. Projections on finite-dimensional subspaces and Karhunen-Loeve expansions are used to overcome these issues and provide an alternative to FPCA. More importantly, we show that the proposed construction is Fisher consistent for the independent components of an appropriate Hilbertian IC model and enjoys the affine invariance property.

This work is supported by the EPSRC grant EP/L010429/1.

Keywords. Invariant Coordinate Selection; Functional Data; Symmetric Component Analysis; Independent Component Analysis.
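For readers less familiar with FOBI, the finite-dimensional construction that the poster extends can be summarised as follows (a standard formulation, given only for background and not specific to the Hilbertian setting): with Σ = Cov(X) and
\[
\mathrm{Cov}_4(X) \;=\; \frac{1}{p+2}\,
\mathrm{E}\!\left[(X-\mu)^{\top}\Sigma^{-1}(X-\mu)\,(X-\mu)(X-\mu)^{\top}\right],
\]
the FOBI unmixing matrix Γ is obtained from the eigendecomposition of Σ⁻¹Cov4(X), equivalently by first whitening the data with the inverse square root of Σ and then diagonalising the fourth-moment scatter of the whitened data. It is precisely the non-invertibility of the covariance operator in infinite dimensions that makes the Hilbertian extension non-trivial.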

Bibliography

[1] Cardoso, J.-F. (1989), Source Separation Using Higher Moments, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2109–2112.

[2] Tyler, D., Critchley, F., Dumbgen, L. and Oja, H. (2009), Invariant Co-ordinate Selection, Journal of the Royal Statistical Society, Series B, 71, 549–592.

[3] Ramsay, J. and Silverman, B. W. (2006), Functional Data Analysis, 2nd edn, Springer, New York.
