
Cramér-Karhunen-Loève Representation and Harmonic Principal Component Analysis of Functional Time Series1

Victor M. Panaretos & Shahin Tavakoli

École Polytechnique Fédérale de Lausanne

Abstract

We develop a doubly spectral representation of a stationary functional time series, and study the properties of its empirical version. The representation decomposes the time series into an integral of uncorrelated frequency components (Cramér representation), each of which is in turn expanded in a Karhunen-Loève series. The construction is based on the spectral density operator, the functional analogue of the spectral density matrix, whose eigenvalues and eigenfunctions at different frequencies provide the building blocks of the representation. By truncating the representation at a finite level, we obtain a harmonic principal component analysis of the time series, an optimal finite dimensional reduction of the time series that captures both the temporal dynamics of the process, as well as the within-curve dynamics. Empirical versions of the decompositions are introduced, and a rigorous analysis of their large-sample behaviour is provided that does not require any prior structural assumptions such as linearity or Gaussianity of the functional time series, but rather hinges on Brillinger-type mixing conditions involving cumulants.

Keywords: spectral representation, spectral density operator, functional data analysis, functional principal components, discrete Fourier transform, cumulants, mixing

2008 MSC: 60G10, 62M15, 62M10

Introduction

Though spectral decompositions can play an important role in the statistical analysis of many classes of stochastic processes, it may not be an exaggeration to claim that in functional data analysis in particular, they are not simply important, but essential. Functional data analysis consists in drawing inferences pertaining to the law of a continuous time stochastic process {X(τ); τ ∈ [0, 1]} with mean function and covariance operator

m(τ) = E[X(τ)] and R0 := E[(X −m)⊗ (X −m)],

respectively, on the basis of a collection of T (independent identically distributed) realisations of this stochastic process, {X_t(τ)}_{t=0}^{T−1}. The process {X(τ); τ ∈ [0, 1]} is typically modeled as a random element of a separable Hilbert space of functions, often that of

1Research Supported by an ERC Starting Grant Award.

Preprint submitted to Stochastic Processes and their Applications March 20, 2013


square integrable complex functions defined on [0, 1], say L²([0,1],C). As such, it admits a Karhunen-Loève decomposition, a spectral representation of the form

X(τ) = m(τ) + ∑_{n=1}^{∞} ξ_n ϕ_n(τ),    (1)

where {ϕ_n}_{n=1}^{∞} are the orthonormal eigenfunctions of the operator R_0, and {ξ_n}_{n=1}^{∞} are the corresponding uncorrelated Fourier coefficients, ξ_n = ∫_0^1 ϕ_n(τ)[X(τ) − m(τ)] dτ, with

variance equal to the respective eigenvalue of R_0, say λ_n. Convergence is in mean square, and can in fact be seen to be uniform over τ, provided X is continuous in mean square. The decomposition is essentially unique (assuming no multiplicities in the eigenvalues), and characterizes the law of X.

The functional principal component decomposition (1) is fundamental for a number of reasons. First and foremost, it yields a separation of variables: the stochastic part of X, represented by the countable collection {ξ_n}, is separated from the functional part, represented by the deterministic functions {ϕ_n}. Furthermore, it provides insight into the smoothness properties of the random function, which are encapsulated in the smoothness of the functions ϕ_n, each "relatively contributing" according to the ratio λ_n / ∑_{n≥1} λ_n. Finally, it allows for an optimal finite-dimensional approximation of the random function X, a functional Principal Component Analysis, in that the projection of X onto the space spanned by the first K eigenfunctions {ϕ_n}_{n=1}^{K} provides the best K-dimensional approximation of X in mean square. As a consequence, the Karhunen-Loève representation has become both the object of and the means for much of the statistical methodology developed for functional data. It has defined what today is accepted as the canonical framework for functional data analysis and has provided a bridge allowing for a technology transfer of tools from multivariate statistics to problems of functional statistics.
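As a purely illustrative aside (not part of the paper), the empirical version of this functional PCA is easy to sketch numerically: discretize the curves on a grid, eigendecompose the sample covariance kernel with a quadrature weight, and project onto the first K eigenfunctions. The simulated two-component model and all names below are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = np.linspace(0, 1, 101)           # grid on [0, 1]
T = 500                                # number of iid curves

# Simulate curves with a known two-component Karhunen-Loeve structure.
phi1 = np.sqrt(2) * np.sin(np.pi * tau)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * tau)
X = rng.normal(0, 2.0, (T, 1)) * phi1 + rng.normal(0, 0.5, (T, 1)) * phi2

Xc = X - X.mean(axis=0)                # center: subtract the empirical mean function
R0 = Xc.T @ Xc / T                     # discretized covariance kernel r0(tau, sigma)

# Eigendecomposition of the covariance operator (quadrature weight d).
d = tau[1] - tau[0]
lam, V = np.linalg.eigh(R0 * d)        # operator eigenvalues, ascending
lam, V = lam[::-1], V[:, ::-1] / np.sqrt(d)   # descending; L2-normalized eigenfunctions

K = 2
scores = Xc @ V[:, :K] * d             # xi_n = int phi_n (X - m)
X_K = scores @ V[:, :K].T              # rank-K approximation of each curve
err = np.mean(np.sum((Xc - X_K) ** 2, axis=1) * d)
print(lam[:3], err)
```

With these choices, lam[:2] should come out close to the true component variances (4 and 0.25), and the rank-2 reconstruction error is numerically zero because the simulated process itself has only two components.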

As the name suggests, the Karhunen-Loève expansion can be traced back to the work of Loève [27] and Karhunen [24], the former in the context of series representations of Wiener measures and the latter in the context of linear filtering of stochastic processes. From the statistical perspective, Grenander [16] used the countable representation afforded by the expansion as a coordinate system to construct inferential procedures for random functions (perhaps marking the birth of functional data analysis; see also Grenander [17]). Large sample asymptotic properties of the empirical functional principal components, constructed on the basis of an iid sample {X_t(τ); τ ∈ [0, 1]}_{t=0}^{T−1}, were considered by Kleffe [25], who proved their consistency for the true functional principal components, and Dauxois et al. [13], who determined their asymptotic distributions. The empirical functional principal components were subsequently put to use to generalize finite-dimensional methods to the functional case, notably by Besse and Ramsay [2] and Rice and Silverman [35], leading on the one hand to a surge in methodological work on functional principal components: smooth components (e.g. Silverman [36]), higher order theory (Hall and Hosseini-Nasab [18, 19]), nonparametric and conditional components (e.g. Cardot [10, 11]) and components for irregularly sampled functional data (e.g. Yao et al. [38], Hall et al. [20], Amini and Wainwright [1]); and on the other hand, to a surge in methodology for functional data hinging on Karhunen-Loève representations and the corresponding functional principal component decompositions: functional regression, functional classification, functional testing, and functional robustness, to name a few (see


Ramsay and Silverman [34], Ferraty and Romain [15], and Horváth and Kokoszka [22] for detailed overviews). It is interesting to note that, in this body of work, functional principal components arose both as a basis for motivating methodology, but also as a tool to apply regularization (via spectral truncation) to problems such as prediction, testing and regression which are ill-posed in the infinite dimensional case.

Parallel to the development of statistical work for iid functional samples, an important body of literature on dependent but stationary sequences of random functions {X_t(τ); τ ∈ [0, 1]}_{t=0}^{T−1} was developed. In the dependent case, one needs to consider both the within curve dynamics, described by the covariance operator R_0 and its spectral decomposition, as well as the between curve dynamics, captured (up to second order) by autocovariance operators

Rt := E[(Xt −m)⊗ (X0 −m)] (2)

and their respective singular value decompositions. Work on this front was pioneered by Bosq [3, 4], who focussed on the estimation of spectral decompositions of autocovariance operators in the special case of stationary AR(1) functional processes, work later extended to more general linear stationary functional processes (Mas [28, 29]; Bosq [6]). As in the iid case, these decompositions are interesting in themselves, as they provide variable separation, smoothness information and dimension reduction for the different orders of dependence of the functional process; but they are also fundamental as a stepping stone for the development of time series methodology used in prediction, filtering, order estimation and change detection, to name only a few (see Bosq [5] and Bosq and Blanke [7] for an overview, and the recent review by Mas and Pumo [31]). However, many of these decompositions concern isolated aspects of the functional process: no single autocovariance operator captures the global dynamics of the functional process, and the "sum" of all separate spectral decompositions of each autocovariance operator does not provide a coherent simultaneous spectral decomposition of the entire second-order dynamics of the process; for example, there is no clear representation of the original series in terms of these separate decompositions. Furthermore, most of the work carried out thus far has assumed the linearity of the underlying process, an assumption that is often reasonable, but by no means a weak one. More recent work has attempted to move away from the linear process model. Hörmann and Kokoszka [21] consider the estimation of decompositions of R_0 under general mixing conditions, without assuming linearity. Under similar assumptions, Horváth et al. [23] estimate the long-run covariance operator, an average of all covariance operators. Panaretos and Tavakoli [33] introduce a frequency-domain approach and estimate the complete second order structure of the process, by means of estimators of the spectral density operator. See Kokoszka [26] for a review of recent developments in dependent functional data.

The purpose of this paper is to develop spectral representations for stationary sequences that simultaneously capture both the within curve dynamics (the dynamics of the curve {X_0(τ) : τ ∈ [0, 1]}) as well as the between curve dynamics (the dynamics of the sequence {X_t : t ∈ Z}), and to estimate them without assuming any prior structural properties for the stationary sequence (e.g. linearity or Gaussianity) except some weak dependence conditions based on cumulants. Our approach is a frequency domain one, exploiting the notion of a spectral density operator

F_ω := (1/2π) ∑_t R_t e^{−iωt},


the Fourier transform of the autocovariance operators (with respect to the lag argument) introduced and investigated in Panaretos and Tavakoli [33].
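To make this object concrete, here is a small numerical sketch (ours, not the paper's) of the spectral density operator for a toy functional MA(1), in which the sum over lags is finite and F_ω can be evaluated exactly on a grid; the model and all names are assumptions of the example.

```python
import numpy as np

tau = np.linspace(0, 1, 51)
d = tau[1] - tau[0]
phi = np.sqrt(2) * np.sin(np.pi * tau)

# Toy functional MA(1): X_t = (z_t + 0.5 z_{t-1}) phi, z_t iid N(0,1). Then
#   R_0 = 1.25 phi (x) phi,  R_{+-1} = 0.5 phi (x) phi,  R_t = 0 otherwise.
P = np.outer(phi, phi)
R = {0: 1.25 * P, 1: 0.5 * P, -1: 0.5 * P}

def spectral_density_operator(omega):
    """F_omega = (1/(2 pi)) sum_t R_t exp(-i omega t) (a finite sum here)."""
    F = sum(Rt * np.exp(-1j * omega * t) for t, Rt in R.items())
    return F / (2 * np.pi)

F0 = spectral_density_operator(0.0)
Fpi = spectral_density_operator(np.pi)
# The trace of F_omega is int_0^1 f_omega(tau, tau) dtau, here by quadrature:
print(np.trace(F0).real * d, np.trace(Fpi).real * d)
```

For this model the trace at ω = 0 should equal 2.25/(2π) and at ω = π equal 0.25/(2π), mirroring the concentration of the MA(1) power at low frequencies.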

Assuming without loss of generality that m = 0, we develop and estimate a doubly spectral representation of the functional process X_t(τ) of the form

X_t(τ) = [ ∫_{−π}^{π} e^{iωt} ( ∑_{n=1}^{∞} ϕ_n^ω ⊗ ϕ_n^ω ) dZ_ω ](τ),

which can be formally represented as

X_t(τ) = ∫_{−π}^{π} e^{iωt} ∑_{n=1}^{∞} 〈ϕ_n^ω, dZ_ω〉 ϕ_n^ω(τ),

and which we call a Cramér-Karhunen-Loève representation; here, for each frequency ω, {ϕ_n^ω(τ)}_{n=1}^{∞} is an orthonormal basis of L²([0,1],C) comprised of eigenfunctions of the spectral density operator F_ω, and the Fourier coefficients {〈ϕ_n^ω, dZ_ω〉}_{n=1}^{∞} are uncorrelated (both with respect to n and ω) random variables with variance equal to the nth eigenvalue of F_ω, say μ_n(ω).

Similarly to the Karhunen-Loève decomposition, such a representation yields a separation of variables (separating the functional from the stochastic component); it provides insight into the smoothness properties of the random functions, and how these interact with the dependence structure of the sequence: not only does it decompose the process into uncorrelated functional frequency components, but it reveals which basis is optimal for representing each frequency component (by looking at the corresponding eigenfunctions ϕ_n^ω), and what the effective dimensionality of each frequency component is (by looking at the relative decay of the corresponding eigenvalues μ_n(ω)); finally, it serves to yield an optimal reduction of the process X_t to a process with only K degrees of freedom which nevertheless captures its temporal dynamics, by truncation of the series inside the integral at a finite level K,

∫_{−π}^{π} e^{iωt} ∑_{n=1}^{K} 〈ϕ_n^ω, dZ_ω〉 ϕ_n^ω(τ),

providing a Harmonic Functional Principal Component Analysis of the process X_t.

Further to developing these formal representations, we consider the explicit construction of their empirical counterparts, on the basis of a finite stretch of length T of the time series X_t. We derive the asymptotic distributions of the different elements of the empirical representation, obtaining analogues of the results obtained by Dauxois et al. [13] for iid functional data. The paper is organized in two parts. The first part (Part (I)) provides an accessible heuristic motivation and overview of the main results. The second part (Part (II)) provides the corresponding rigorous statements and their formal derivations.

(I) Heuristic Outline of the Main Results

The autocovariance operators {R_t}_{t∈Z} of a second-order stationary time series X_t ∈ L²([0,1],R) (defined in (2)) encode the complete second-order structure of the time series {X_t}_{t∈Z}, assumed to have mean zero. Corresponding to this sequence of operators


is a collection of spectral density operators {F_ω}_{ω∈[−π,π]}, defined as their discrete-time Fourier transform, and yielding the Fourier pair

F_ω = (1/2π) ∑_{t∈Z} e^{−iωt} R_t,    R_t = ∫_{−π}^{π} F_ω e^{itω} dω,    (3)

provided that the R_t are summable in an appropriate sense (a rigorous definition, and sufficient conditions for its validity, will be given in Section 1.2). Now, assume that we can approximate the integral in the second equation in (3) by a Riemann sum, to get

R_s = ∫_{−π}^{π} e^{isω} F_ω dω ≈ ∑_{j=1}^{J} F_{ω_j} e^{isω_j} (ω_{j+1} − ω_j),

where −π = ω_1 < · · · < ω_{J+1} = π is a partition. Then, we are naturally tempted to conjecture that X_t ought to be decomposable into a sum of distinct and uncorrelated frequency components,

X_t ≈ ∑_{j=1}^{J} e^{iω_j t} X_t(ω_j),    (4)

where each X_t(ω_j) would be a mean-zero functional time series in L²([0,1],C) with covariance operator close to F_{ω_j}(ω_{j+1} − ω_j), since, in this case, X_t would indeed have covariance R_t = ∑_{j=1}^{J} F_{ω_j} e^{itω_j} (ω_{j+1} − ω_j). We pursue such a decomposition in Section 2, where we formalize it as the functional Cramér representation (Theorem 2.1),

X_t = ∫_{−π}^{π} e^{iωt} dZ_ω,   a.s. a.e.,

for a functional orthogonal increment process Z (independent of t), thus extending the classical Cramér representation of multivariate stationary processes (e.g. Brillinger [8]).
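The Fourier pair (3) underlying this heuristic can be checked numerically on a toy model with finitely many nonzero autocovariances (a functional MA(1) of our own making; all names are illustrative): a Riemann sum over an equispaced frequency grid recovers each R_s essentially exactly.

```python
import numpy as np

tau = np.linspace(0, 1, 51)
phi = np.sqrt(2) * np.sin(np.pi * tau)
P = np.outer(phi, phi)
# Autocovariance operators of a toy functional MA(1); all other lags vanish.
R = {0: 1.25 * P, 1: 0.5 * P, -1: 0.5 * P}

def F(omega):
    """Spectral density operator F_omega = (1/2pi) sum_t R_t e^{-i omega t}."""
    return sum(Rt * np.exp(-1j * omega * t) for t, Rt in R.items()) / (2 * np.pi)

# Riemann sum approximating R_s = int_{-pi}^{pi} e^{i s omega} F_omega d omega.
J = 512
omegas = -np.pi + 2 * np.pi * np.arange(J) / J
dw = 2 * np.pi / J
recovered = {}
for s in (0, 1, 2):
    recovered[s] = sum(np.exp(1j * s * w) * F(w) for w in omegas).real * dw

print([np.max(np.abs(recovered[s] - R.get(s, 0 * P))) for s in (0, 1, 2)])
```

Because the integrand is a trigonometric polynomial of low degree, the equispaced Riemann sum is exact here up to floating-point error, including the vanishing of R_2.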

The Cramér representation provides a spectral decomposition with respect to frequency. Nevertheless, we may pursue a second "layer" of spectral decomposition, one in terms of dimension. Going back to the heuristic form (4), we notice that, for each j = 1, . . . , J, X_t(ω_j) is a random element of L²([0,1],C). We may thus represent it through its Karhunen-Loève (KL) expansion (1), leading to the heuristic representation

X_t ≈ ∑_{j=1}^{J} e^{iω_j t} ∑_{i=1}^{∞} ξ_{i,j} ϕ_{i,j}(τ),    (5)

with {ϕ_{i,j}}_{i≥1} being the eigenfunctions of the covariance operator of X_t(ω_j) and {ξ_{i,j}}_{i≥1} the corresponding Fourier coefficients. Truncating the second series at some K < ∞ will yield a decomposition into distinct frequency elements that are uncorrelated, and finite dimensional,

X_t ≈ ∑_{j=1}^{J} e^{iω_j t} ∑_{i=1}^{K} ξ_{i,j} ϕ_{i,j}(τ).    (6)

The finite dimensional subspace in which each frequency component takes its values need not be the same for distinct j's, even though each of them is of dimension K. In fact,


it will turn out that this truncated representation only possesses K degrees of freedom. One would then hope that this reduced version of X_t(ω_j) would retain the property of being the optimal (in the L² sense) K-dimensional reduction of the process X_t. Rigorous versions of decomposition (5), and its truncated version (6), are formally carried out in Section 2. Specifically, we derive the Cramér-Karhunen-Loève decomposition

X_t = ∫_{−π}^{π} e^{iωt} ( ∑_{n=1}^{∞} ϕ_n^ω ⊗ ϕ_n^ω ) dZ_ω = ∫_{−π}^{π} e^{iωt} ∑_{n=1}^{∞} 〈ϕ_n^ω, dZ_ω〉 ϕ_n^ω,    (7)

where the last equality is understood formally (Remarks 2.4 and 3.10). This is a Cramér representation with respect to frequency, but also a Karhunen-Loève expansion in terms of dimension, since it can be seen that {ϕ_n^ω}_{n≥1} is the basis of eigenfunctions of F_ω (the covariance operator of dZ_ω). Furthermore, by considering the bounded operator-valued function ω ↦ ∑_{n=1}^{K} ϕ_n^ω ⊗ ϕ_n^ω over [−π, π], and defining the notion of its stochastic integral (Section 3.1), we show that the truncated representation

X_t* = ∫_{−π}^{π} e^{iωt} ( ∑_{n=1}^{K} ϕ_n^ω ⊗ ϕ_n^ω ) dZ_ω    (8)

is well defined, possesses K degrees of freedom, and converges to X_t in mean square as K → ∞. More importantly, by considering the process X_t* for different values of K, we obtain a harmonic principal component analysis of X_t (Section 3.2). That is, we prove (Theorem 3.7 and Remark 3.10) that, among all linear reductions of X_t to a process W_t with only K degrees of freedom, we have

E‖X_t − X_t*‖² ≤ E‖X_t − W_t‖².

Section 3.2 explains how the process X_t* can be constructed explicitly, when the spectral density operator F_ω is known, and how it can be represented as a stationary vector-valued process with uncorrelated coordinates in R^K.
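The gain over a static (frequency-independent) principal component analysis can be quantified in a simple two-component example of our own construction: when the spectral density operator is diagonal in a fixed basis, the rank-1 harmonic PCA discards, at each frequency, only the pointwise-smaller of the two eigenvalue curves, whereas static FPCA discards one component wholesale. All model choices below are assumptions of the sketch.

```python
import numpy as np

# Two uncorrelated component score processes sharing fixed orthonormal
# eigenfunctions: a_t is AR(1) (low-frequency spectrum f_a), b_t is white
# noise (flat spectrum f_b). The spectral density operator is then diagonal
# in that basis, with eigenvalue curves f_a(omega) and f_b(omega).
w = np.linspace(-np.pi, np.pi, 200001)
dw = w[1] - w[0]
sigma2, rho = 0.19, 0.9
f_a = sigma2 / (2 * np.pi * (1 - 2 * rho * np.cos(w) + rho ** 2))  # AR(1) spectrum
f_b = np.full_like(w, 0.5 / (2 * np.pi))                            # white noise, var 0.5

var_a = np.sum(f_a) * dw        # total variance of a_t: sigma2 / (1 - rho^2) = 1
var_b = np.sum(f_b) * dw        # total variance of b_t: 0.5

# Static rank-1 FPCA keeps the larger-variance component and discards the other:
static_err = min(var_a, var_b)
# Harmonic rank-1 PCA keeps, at each omega, the larger eigenvalue curve:
harmonic_err = np.sum(np.minimum(f_a, f_b)) * dw
print(static_err, harmonic_err)
```

Here the harmonic reduction integrates min(f_a, f_b), which comes out well below the static error of 0.5, illustrating the optimality claim in this diagonal setting.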

Parallel to the rank-K reduction, one may want to have a better finite dimensional approximation of X_t(ω_j) for some j's, and a cruder one for other j's, depending on how much each ω_j contributes to the power of the signal and/or the effective dimension of each X_t(ω_j). This can be done by letting the dimension K vary with j, leading to the heuristic approximation

X_t ≈ ∑_{j=1}^{J} e^{iω_j t} X_t^{K_j}(ω_j) = ∑_{j=1}^{J} e^{iω_j t} ∑_{i=1}^{K_j} ξ_{i,j} ϕ_{i,j},    (9)

where X_t^{K_j}(ω_j) is K_j-dimensional. It will turn out that such a representation is also rigorously valid (Section 3.1), and of the form

X_t** = ∫_{−π}^{π} e^{iωt} ( ∑_{n=1}^{K(ω)} ϕ_n^ω ⊗ ϕ_n^ω ) dZ_ω,

provided that the function K : [−π, π] → {0, 1, . . .} yielding the desired finite rank of each frequency component is càdlàg. In fact, it will be shown that, among all linear transformations of the process X_t having finite rank K(ω) at each frequency component, this is the optimal one in the L² sense (Theorem 3.7).


The final part of the paper (Section 4) is devoted to the construction of empirical versions of the representations presented, on the basis of a finite stretch of the time series, say {X_t}_{t=0}^{T}, T < ∞. These require estimates of the eigenfunctions and eigenvalues of the spectral density operator. We estimate these by the eigenfunctions and eigenvalues of the estimate F_ω^{(T)} of the spectral density operator F_ω introduced by Panaretos and Tavakoli [33], constructed via the discrete Fourier transform of the observed time series. We derive the asymptotic distributions of the estimated eigenfunctions and eigenvalues, showing that they are jointly asymptotically normal for a finite number of distinct frequencies ω ∈ [0, π]. Moreover, the estimators are independent for distinct frequencies and, in fact, the estimators of the eigenvalues are independent between different orders at the same frequency ω, and independent of the eigenfunctions (see Theorem 4.3). Some technical material is contained in Section 5 (and in Panaretos and Tavakoli [32]).
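The construction can be mimicked in a few lines. This is a simplified sketch of ours, not the paper's estimator: the estimator in Panaretos and Tavakoli [33] is a weighted average of periodogram operators with a bandwidth tending to zero, whereas the model, sample size and flat smoothing window below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
tau = np.linspace(0, 1, 41)
d = tau[1] - tau[0]
phi = np.sqrt(2) * np.sin(np.pi * tau)

# Toy functional MA(1): X_t = (z_t + 0.5 z_{t-1}) phi, whose true spectral
# density operator is F_omega = (1.25 + cos(omega)) / (2 pi) * phi (x) phi.
T = 4096
z = rng.normal(size=T + 1)
X = (z[1:] + 0.5 * z[:-1])[:, None] * phi[None, :]

Xf = np.fft.fft(X, axis=0)                   # functional DFT: sum_t X_t e^{-i omega_k t}
freqs = 2 * np.pi * np.fft.fftfreq(T)        # Fourier frequencies omega_k

def F_hat(k, m=40):
    """Average 2m+1 periodogram operators (2 pi T)^{-1} Xf[j] (x) conj(Xf[j])
    around Fourier frequency index k (a crude kernel-smoothed estimate)."""
    ks = np.arange(k - m, k + m + 1) % T
    ks = ks[ks != 0]                         # guard: drop frequency zero (sample mean)
    acc = np.zeros((len(tau), len(tau)), dtype=complex)
    for j in ks:
        acc += np.outer(Xf[j], Xf[j].conj())
    return acc / (2 * np.pi * T * len(ks))

k = 100                                      # omega_k = 2 pi * 100 / 4096 ~ 0.153
Fk = F_hat(k)
estimate = np.trace(Fk).real * d             # estimated trace of F at omega_k
truth = (1.25 + np.cos(freqs[k])) / (2 * np.pi)
print(estimate, truth)
```

The estimated trace should land within sampling error of the true value; the eigendecomposition of Fk then yields the empirical eigenvalues and eigenfunctions whose asymptotics Section 4 studies.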

(II) Formal Statements and their Proofs

1. Background Material

1.1. Notation

We shall denote by H the Hilbert space L²([0,1],C), and by ℍ the Hilbert space L²(Ω, H, P) of H-valued random variables with finite second moment, where P is the underlying probability measure. Their norms and inner products will be denoted by 〈·,·〉, ‖·‖ and 〈·,·〉_ℍ, ‖·‖_ℍ, respectively. We denote by S_p(H) the Schatten p-class of operators on H, and denote its norm by |||·|||_p. The cases p = 1, p = 2 and p = ∞ correspond to the spaces of nuclear (or trace-class), Hilbert-Schmidt, and bounded operators on H, respectively. If T ∈ S_1(H), we will denote its trace by trace(T). The space S_2(H) being a Hilbert space, we will denote its inner product by 〈·,·〉_{S_2}. We will denote the identity operator on H by I, and, for operators A and B, the term AB will denote the composition of A and B, and A† will denote the adjoint of A. If A is an integral operator, we will sometimes abuse notation and denote its kernel by A(τ, σ). Hence if A, B are integral operators on H, then the kernel of AB is

AB(τ, σ) = ∫_0^1 A(τ, x) B(x, σ) dx.
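The kernel of a composition can be approximated by quadrature, which gives a quick sanity check of the formula above (an illustrative sketch; the rank-one kernels and the grid are our own choices):

```python
import numpy as np

tau = np.linspace(0, 1, 201)
d = tau[1] - tau[0]
u, v = np.sin(np.pi * tau), np.cos(np.pi * tau)
w, x = tau, tau ** 2

A = np.outer(u, v)          # kernel A(tau, s) = u(tau) v(s)
B = np.outer(w, x)          # kernel B(tau, s) = w(tau) x(s)

# Kernel of the composition: AB(tau, sigma) = int_0^1 A(tau, r) B(r, sigma) dr,
# approximated by the quadrature AB = A @ B * d.
AB = A @ B * d

# For rank-one kernels the integral factorizes: AB = <v, w> u (x) x.
inner_vw = np.sum(v * w) * d            # int_0^1 cos(pi r) r dr = -2 / pi^2
AB_exact = inner_vw * np.outer(u, x)
print(np.max(np.abs(AB - AB_exact)))
```

The matrix product with the grid spacing as quadrature weight reproduces the factorized kernel exactly (both sides use the same Riemann sum), while inner_vw approximates the analytic value −2/π².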

We will denote the imaginary number by i ∈ C. For a set S ⊂ [−π, π], we define the indicator function of S by 1_S. For u, v ∈ H, we define the tensor product between u and v, u ⊗ v ∈ S_2(H), by u ⊗ v(f) = 〈f, v〉u, for f ∈ H. Notice that for A, B ∈ S_2(H), we can also define a tensor product A ⊗ B ∈ S_2(S_2(H)) in the same way; to avoid confusion, we will denote the latter by A ⊗̃ B. We also define the Kronecker product A ⊗ B ∈ S_2(S_2(H)) by A ⊗ B(C) = A C B†, for C ∈ S_2(H). If H_R is a real Hilbert space, then for two operators A, B ∈ S_2(H_R), we also define their transpose Kronecker product A ⊗_T B ∈ S_2(S_2(H_R)) by A ⊗_T B(C) = (A ⊗ B)(C^T) = A C^T B^T, for C ∈ S_2(H_R). Here, A^T also denotes the adjoint of A, but stresses that it operates on a real Hilbert space. Useful properties of the tensor, Kronecker and transpose Kronecker products are given in Section 5.1.
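In a finite-dimensional discretization these products reduce to familiar matrix operations, which makes the definitions easy to check numerically (a sketch under the assumption of the real Hilbert space R^n; the identity linking C ↦ A C B^T to the matrix Kronecker product np.kron holds for row-major vectorization):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
u, v, f = rng.normal(size=(3, n))

# Tensor product: (u (x) v) f = <f, v> u, the rank-one operator with matrix u v^T.
tensor_uv = np.outer(u, v)
assert np.allclose(tensor_uv @ f, np.dot(f, v) * u)

A, B, C = rng.normal(size=(3, n, n))
# Kronecker product of operators: (A (x) B)(C) = A C B^dagger (B^T in the real case).
kron_C = A @ C @ B.T
# Transpose Kronecker product: (A (x)_T B)(C) = (A (x) B)(C^T) = A C^T B^T.
tkron_C = A @ C.T @ B.T

# Sanity check: under row-major vectorization, C -> A C B^T is the matrix
# np.kron(A, B) acting on vec(C).
assert np.allclose(np.kron(A, B) @ C.ravel(), kron_C.ravel())
print(kron_C.shape, tkron_C.shape)
```

The last check is the standard vec/Kronecker identity, which is what makes these operator products convenient when deriving covariance structures of operator-valued estimators.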


1.2. Basic Definitions and Assumptions

Our basic assumptions concerning the smoothness of the curves X_t(τ) and the strength of dependence between the elements of the sequence {X_t} follow those in Panaretos and Tavakoli [33]. In particular, we will assume that the following conditions hold:

Conditions 1.1 (and Definitions). {X_t} is a second-order stationary time series in L²([0,1],R) (translation invariance in time being required only of the second-order characteristics of its law), with mean zero and E‖X_0‖² < ∞. Its autocovariance kernel at lag t will be defined as

r_t(τ, σ) = E[X_t(τ) X_0(σ)],   τ, σ ∈ [0, 1] & t ∈ Z,

inducing a corresponding operator R_t : L²([0,1],R) → L²([0,1],R) by right integration, the autocovariance operator at lag t,

R_t h = E[X_t 〈h, X_0〉],   h ∈ L²([0,1],R).

Furthermore, we assume the following conditions hold:

(i) ∑_{t∈Z} |||R_t|||_1 < ∞.

(ii) (τ, σ) ↦ r_t(τ, σ) is continuous, and ∑_{t∈Z} ‖r_t‖_∞ < ∞.

Panaretos and Tavakoli [33] discuss the role of these conditions, and at what cost they may be weakened. Under these conditions, we define the spectral density kernel,

f_ω(·, ·) = (1/2π) ∑_{t∈Z} e^{−iωt} r_t(·, ·),

where the convergence is uniform, and we denote by F_ω the operator on L²([0,1],C) induced by f_ω. The autocovariance kernel can be recovered from the spectral density kernels by means of the following inversion formula:

∫_{−π}^{π} f_ω(τ, σ) e^{iωt} dω = r_t(τ, σ),   ∀ t ∈ Z and τ, σ ∈ [0, 1].

Each F_ω is trace-class, ω ↦ |||F_ω|||_1 is uniformly continuous, and f_ω(τ, σ) is continuous in ω, τ, σ; F_ω is non-negative definite, and thus

∫_0^1 f_ω(τ, τ) dτ = |||F_ω|||_1.

Furthermore, F_ω is self-adjoint, 2π-periodic with respect to ω, and f_{−ω}(τ, σ) equals the complex conjugate of f_ω(τ, σ), which in turn equals f_ω(σ, τ).

The reader is referred to Panaretos and Tavakoli [33] for proofs of these assertions. The spectral density operator thus admits a singular value decomposition of the form

F_ω = ∑_{j=1}^{∞} μ_j(ω) ϕ_j^ω ⊗ ϕ_j^ω,    (10)

where {μ_j(ω)}_{j≥1} is a non-increasing sequence of positive real numbers tending to zero, and {ϕ_n^ω}_{n≥1} is an orthonormal system in L²([0,1],C). When F_ω is strictly positive-definite, the orthonormal system {ϕ_n^ω}_{n≥1} is, in fact, complete for L²([0,1],C).
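Numerically, the decomposition (10) at a fixed frequency is just an eigendecomposition of a discretized self-adjoint kernel; a sketch with a rank-two operator of our own construction (all names and values are illustrative):

```python
import numpy as np

tau = np.linspace(0, 1, 81)
d = tau[1] - tau[0]
phi1 = np.sqrt(2) * np.sin(np.pi * tau)
phi2 = np.sqrt(2) * np.sin(2 * np.pi * tau)

# A toy spectral density operator at a fixed omega, diagonal in {phi1, phi2}:
# F = mu1 phi1 (x) phi1 + mu2 phi2 (x) phi2, with mu1 > mu2 > 0.
mu1, mu2 = 0.8, 0.3
F = mu1 * np.outer(phi1, phi1) + mu2 * np.outer(phi2, phi2)

# Eigendecomposition of the discretized self-adjoint operator (weight d):
vals, vecs = np.linalg.eigh(F * d)
vals, vecs = vals[::-1], vecs[:, ::-1] / np.sqrt(d)   # descending; L2-normalized

print(vals[:3])
# The leading eigenfunction should match phi1 up to sign:
align = abs(np.sum(vecs[:, 0] * phi1) * d)
print(align)
```

The recovered eigenvalues reproduce (mu1, mu2, 0, …) and the leading eigenfunction aligns with phi1, mirroring how the empirical eigenstructure is extracted from an estimated F_ω.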


2. Towards a Cramér-Karhunen-Loève Representation

We begin by deriving a functional version of the classical spectral representation of a stationary time series, thus extending Cramér's representation [12] to the infinite dimensional case.

Theorem 2.1 (Functional Cramér Representation). Under Conditions 1.1, X_t admits the representation

X_t = ∫_{−π}^{π} e^{iωt} dZ_ω,   a.s. a.e.,    (11)

where, for fixed ω, Z_ω is a random element of L²([0,1],C) with E‖Z_ω‖² = ∫_{−π}^{ω} |||F_α|||_1 dα, and the process ω ↦ Z_ω has orthogonal increments:

E〈Z_{ω_1} − Z_{ω_2}, Z_{ω_3} − Z_{ω_4}〉 = 0,   if ω_1 > ω_2 ≥ ω_3 > ω_4.    (12)

The representation (11) is called the Cramér representation of X_t, and the stochastic integral involved can be understood as a Riemann–Stieltjes limit, in the sense that

E ‖ X_t − ∑_{j=1}^{J} e^{iω_j t} (Z_{ω_{j+1}} − Z_{ω_j}) ‖² → 0,   as J → ∞,    (13)

where −π = ω_1 < · · · < ω_{J+1} = π and max_{j=1,…,J} |ω_{j+1} − ω_j| → 0 as J → ∞.

Remark 2.2. Consider the particular case where X_t is a linear process, i.e. X_t = ∑_{s∈Z} A_{t−s} ε_s, with {ε_t}_{t∈Z} an iid sequence of random elements of L²([0,1],R) and {A_t}_{t∈Z} a sequence of linear operators in S_2(H); then Conditions 1.1 will be satisfied if: (1) the {ε_t}_{t∈Z} are mean square continuous and E‖ε_t‖² < ∞ for all t ∈ Z; (2) the {A_t}_{t∈Z} admit continuous kernels and satisfy ∑_{t∈Z} |||A_t|||_2 < ∞.

Proof of Theorem 2.1. Let M_0 be the complex linear space spanned by all finite linear combinations of the X_t's,

M_0 := { ∑_{j=1}^{n} a_j X_{t_j} : n = 1, 2, . . . ; a_j ∈ C, t_j ∈ Z } ⊂ ℍ,

and let M be the closure of M_0 in ℍ. Let H′ be the space of all measurable complex functions g : [−π, π] → C such that

∫_{−π}^{π} |g(α)|² |||F_α|||_1 dα < ∞,

where |||F_α|||_1 is the nuclear norm of the spectral density operator F_α. We endow H′ with the inner product

〈g, h〉_{H′} = ∫_{−π}^{π} g(α) \overline{h(α)} |||F_α|||_1 dα,   g, h ∈ H′,


which makes H′ a Hilbert space. Now, let e_t ∈ H′ denote the function α ↦ e^{itα}. We define an operator T : M_0 → H′ by linear extension of the mapping X_t ↦ e_t, that is,

T( ∑_{j=1}^{n} a_j X_{t_j} ) = ∑_{j=1}^{n} a_j e_{t_j},    (14)

for any a_j ∈ C and t_j ∈ Z. Using the inversion formula, 〈X_t, X_s〉_ℍ = 〈e_t, e_s〉_{H′} for all t, s ∈ Z, and hence for any A, B ∈ M_0, 〈T(A), T(B)〉_{H′} = 〈A, B〉_ℍ. In particular, T is well defined and is a linear isometry. We extend its domain to M in the following way: for any A ∈ M, let T(A) be the limit in H′ of T(A_n), where {A_n}_{n≥1} ⊂ M_0 is a sequence converging to A. If {A_n′}_{n≥1} ⊂ M_0 is another sequence converging to A, then

‖T(A_n) − T(A_n′)‖_{H′} = ‖A_n − A_n′‖_ℍ → 0 as n → ∞,

and hence the extension of T is well defined. It is also linear, maintains the isometry property on the entire M, and satisfies T(M) = H′, since the linear span of the functions e_t is dense in H′. Hence T admits a well-defined inverse, say T^{−1} : H′ → M.

For any ω ∈ (−π, π], we define Z_ω = T^{−1}(1_{[−π,ω)}) ∈ M, and also Z_{−π} ≡ 0 ∈ M. By the isometry property of T,

〈Z_ω, Z_β〉_ℍ = 〈 T^{−1} 1_{[−π,ω)}, T^{−1} 1_{[−π,β)} 〉_ℍ = ∫_{−π}^{min(ω,β)} |||F_α|||_1 dα.    (15)

It follows that ω ↦ Z_ω is an orthogonal increment process. We shall now define the integral with respect to Z_ω. Let D ⊂ H′ be the space of simple functions of the form

g = ∑_{j=1}^{n} g_j 1_{[ω_j, ω_{j+1})},

where g_j ∈ C and −π = ω_1 < ω_2 < · · · < ω_{n+1} = π. We equip D with the scalar product of H′, and define φ : D → ℍ by

φ( ∑_{j=1}^{n} g_j 1_{[ω_j, ω_{j+1})} ) = ∑_{j=1}^{n} g_j (Z_{ω_{j+1}} − Z_{ω_j}).

By (15), φ is an isometry whose domain D is dense in H′, so that φ extends to all of H′. Moreover, it is straightforward to see that φ is equal to T^{−1} on D, hence φ = T^{−1}. This in turn implies X_t = T^{−1}(e_t) = φ(e_t). If g is càdlàg with a finite number of jumps, then φ(g) is in fact the Riemann–Stieltjes integral (in the mean square sense) with respect to the orthogonal increment process Z_ω:

φ(g) = ∫_{−π}^{π} g(α) dZ_α,   g ∈ H′.    (16)

In other words, we have shown that X_t = ∫_{−π}^{π} e^{itα} dZ_α, as claimed.

As expected from the finite-dimensional version of the Cramér representation, the second-order properties of the orthogonal increment process Z are inextricably linked with the spectral density operator of X_t. In fact, we may prove that these properties extend to the functional case not only in an L² sense, but in a stronger sense:


Proposition 2.3 (Second-Order Properties of the Orthogonal Increment Process). Let α ↦ Z_α be the process defined in Theorem 2.1, and assume Conditions 1.1 hold. If π ≥ ω_1 > ω_2 ≥ ω_3 > ω_4 ≥ −π, then

E[ (Z_{ω_1}(τ) − Z_{ω_2}(τ)) \overline{(Z_{ω_3}(σ) − Z_{ω_4}(σ))} ] = 0   a.e.,

and for π ≥ ω > β > −π, we have

E[ (Z_ω(τ) − Z_β(τ)) \overline{(Z_ω(σ) − Z_β(σ))} ] = ∫_{β}^{ω} f_α(τ, σ) dα,   a.e.

Proof. It suffices to show that

E[ Z_ω(τ) \overline{Z_β(σ)} ] = ∫_{−π}^{min(ω,β)} f_α(τ, σ) dα,   a.e.

Let B′ = L¹([0,1]², C), with norm

‖g‖_{B′} = ∫∫_{[0,1]²} |g(τ, σ)| dτ dσ,   g ∈ B′.

We will show that, for any A_1, A_2 ∈ M,

‖ E[A_1 ⊗ A_2] − ∫_{−π}^{π} T(A_1)(α) \overline{T(A_2)(α)} f_α dα ‖_{B′} = 0,    (17)

where

E[A_1 ⊗ A_2](τ, σ) = E[A_1 ⊗ A_2 (τ, σ)] = E[ A_1(τ) \overline{A_2(σ)} ].

First, let us mention two properties that will simplify the proof:

(i) ‖E[A_1 ⊗ A_2]‖_{B′} ≤ ‖A_1‖_ℍ ‖A_2‖_ℍ;

(ii) ‖ ∫_{−π}^{π} T(A_1)(α) \overline{T(A_2)(α)} f_α dα ‖_{B′} ≤ ‖A_1‖_ℍ ‖A_2‖_ℍ.

Property (i) is a consequence of the Cauchy–Schwarz inequality. Property (ii) follows from

‖ ∫_{−π}^{π} T(A_1)(α) \overline{T(A_2)(α)} f_α dα ‖_{B′}
  = ∫∫_{[0,1]²} | ∫_{−π}^{π} T(A_1)(α) \overline{T(A_2)(α)} f_α(τ, σ) dα | dτ dσ
  ≤ ∫_{−π}^{π} ( ∫∫_{[0,1]²} |f_α(τ, σ)| dτ dσ ) | T(A_1)(α) \overline{T(A_2)(α)} | dα
  ≤ ∫_{−π}^{π} |||F_α|||_1 | T(A_1)(α) \overline{T(A_2)(α)} | dα
  ≤ ‖T(A_1)‖_{H′} ‖T(A_2)‖_{H′}
  = ‖A_1‖_ℍ ‖A_2‖_ℍ,

where we have used the Cauchy–Schwarz inequality and the fact that the Hilbert–Schmidt norm is dominated by the nuclear norm. The operator T and the Hilbert space H′ = L²([−π, π], C, |||F_α|||_1 dα) are defined in the proof of Theorem 2.1. We can now show (17). Let {A_{1,n}}, {A_{2,n}} ⊂ M_0 be sequences converging to A_1, respectively A_2, in the norm of ℍ. Using the triangle inequality, (17) is bounded above by the sum of

\[
\|\mathbb{E}[A_1 \otimes A_2] - \mathbb{E}[A_{1,n} \otimes A_{2,n}]\|_{B'} \tag{18}
\]
and
\[
\left\| \mathbb{E}[A_{1,n} \otimes A_{2,n}] - \int_{-\pi}^{\pi} T(A_1)(\alpha)\,\overline{T(A_2)(\alpha)}\, f_\alpha\, d\alpha \right\|_{B'}. \tag{19}
\]
By (i), we may make (18) arbitrarily small by choosing $n$ sufficiently large. Indeed,
\[
\|\mathbb{E}[A_1 \otimes A_2] - \mathbb{E}[A_{1,n} \otimes A_{2,n}]\|_{B'} \le \|A_1 - A_{1,n}\|_H \|A_2\|_H + \|A_{1,n}\|_H \|A_2 - A_{2,n}\|_H < \varepsilon/2
\]
for $n$ large enough. Let us now show that (19) is bounded by $\varepsilon/2$ for large $n$. Notice that
\[
\mathbb{E}[X_t(\tau)X_s(\sigma)] = \int_{-\pi}^{\pi} T(X_t)(\alpha)\,\overline{T(X_s)(\alpha)}\, f_\alpha(\tau,\sigma)\, d\alpha,
\]
hence, by linearity, we have
\[
\mathbb{E}[A_{1,n}(\tau)\overline{A_{2,n}(\sigma)}] = \int_{-\pi}^{\pi} T(A_{1,n})(\alpha)\,\overline{T(A_{2,n})(\alpha)}\, f_\alpha(\tau,\sigma)\, d\alpha.
\]
Thus (19) is bounded by
\[
\left\| \int_{-\pi}^{\pi} T(A_{1,n} - A_1)(\alpha)\,\overline{T(A_{2,n})(\alpha)}\, f_\alpha\, d\alpha \right\|_{B'} + \left\| \int_{-\pi}^{\pi} T(A_1)(\alpha)\,\overline{T(A_{2,n} - A_2)(\alpha)}\, f_\alpha\, d\alpha \right\|_{B'}.
\]
Using (ii), this is in turn bounded by
\[
\|A_{1,n} - A_1\|_H \|A_{2,n}\|_H + \|A_1\|_H \|A_{2,n} - A_2\|_H < \varepsilon/2
\]
for $n$ large enough.

Remark 2.4 (A Cram\'er--Karhunen--Lo\`eve Representation). Assuming Conditions 1.1, and that the spectral density operator $\mathcal{F}_\omega$ is strictly positive-definite for all $\omega \in [-\pi,\pi]$, we may now straightforwardly write
\[
X_t = \int_{-\pi}^{\pi} e^{i\omega t} \left( \sum_{n=1}^{\infty} \varphi_n^\omega \otimes \varphi_n^\omega \right) dZ_\omega, \tag{20}
\]
where $Z_\omega$ is as in Theorem 2.1, and $\{\varphi_n^\omega\}_{n \ge 1}$ are the eigenfunctions of $\mathcal{F}_\omega$.

Though the representation in Remark 2.4 is a mere reformulation of Theorem 2.1, it sets the scene for a natural question: what approximation of $X_t$ would arise if we were able to truncate the identity operator $\sum_{n=1}^{\infty} \varphi_n^\omega \otimes \varphi_n^\omega$ to have finite rank,
\[
X_t^* := \int_{-\pi}^{\pi} e^{i\omega t} \left( \sum_{n=1}^{K} \varphi_n^\omega \otimes \varphi_n^\omega \right) dZ_\omega, \tag{21}
\]
i.e. to consider the limiting behaviour of $\mathbb{E}\|X_t - X_t^*\|^2$ as $K \to \infty$ (see Remark 3.10)? It is such truncations (and their approximation error) that are at the essence of representations of the Karhunen--Lo\`eve type. This is the subject of the next section, where we provide a formalism to make sense of an integral of the form (21) (it is not a priori clear that it is well defined, since the operator in the integrand now depends on $\omega$), and prove that it provides an optimal rank-$K$ approximation of the original process, yielding a harmonic principal component analysis of the process $X_t$ (in fact, we will not require that $\mathcal{F}_\omega$ be strictly positive).

Remark 2.5. Note that the action of the operator $\sum_{n=1}^{\infty} \varphi_n^\omega \otimes \varphi_n^\omega$ on an element $g \in L^2([0,1],\mathbb{C})$ is described by $\left[\sum_{n=1}^{\infty} \varphi_n^\omega \otimes \varphi_n^\omega\right] g = \sum_{n=1}^{\infty} \langle \varphi_n^\omega, g \rangle\, \varphi_n^\omega$. Therefore, we may formally interpret the Cram\'er--Karhunen--Lo\`eve representation as
\[
X_t = \int_{-\pi}^{\pi} e^{i\omega t} \sum_{n=1}^{\infty} \langle \varphi_n^\omega, dZ_\omega \rangle\, \varphi_n^\omega,
\]
a form which emphasizes the doubly spectral decomposition of $X_t$, as discussed in Part (I) and the Introduction.
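The frequency-wise operator in Remark 2.5 is, at each $\omega$, an orthogonal projection built from eigenfunctions of $\mathcal{F}_\omega$. The following is a minimal numerical sketch of that action on a discretized grid of $[0,1]$; the matrix standing in for $\mathcal{F}_\omega$ and all names are illustrative, not part of the paper.

```python
import numpy as np

# Sketch: the rank-K operator sum_{n<=K} phi_n (x) phi_n from Remark 2.5, with a
# Hermitian non-negative matrix F standing in for the spectral density operator.
rng = np.random.default_rng(0)
d = 40                                   # grid size for [0, 1]
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
F = M @ M.conj().T                       # Hermitian, non-negative definite
mu, phi = np.linalg.eigh(F)              # ascending eigenvalues, orthonormal columns
mu, phi = mu[::-1], phi[:, ::-1]         # sort descending, as in the paper

K = 5
P_K = phi[:, :K] @ phi[:, :K].conj().T   # sum_{n<=K} phi_n (x) phi_n
g = rng.standard_normal(d)
proj = P_K @ g                           # [sum phi_n (x) phi_n] g = sum <phi_n, g> phi_n
```

Truncating at $K$ gives an orthogonal projection; summing over all eigenfunctions recovers the identity, which is the sense in which (20) reproduces $X_t$ exactly.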

3. Harmonic Principal Component Analysis

3.1. Stochastic Integrals of Operators

Let $\omega \mapsto A_\omega$ be a mapping $[-\pi,\pi] \to S_\infty(H)$. In order to make sense of expressions such as (21), we wish to give a meaning to the stochastic integral $\int_{-\pi}^{\pi} A_\omega\, dZ_\omega$. This is done in a fashion similar to the It\^o integral (e.g. Steele [37]); the major differences here are that no filtration is involved, $Z_\omega$ is a random element of $L^2([0,1],\mathbb{C})$, and the integrands $A_\omega$ are operator-valued. Let
\[
B = \left\{ A : [-\pi,\pi] \to S_\infty(H) \text{ such that } \|A\|_B < \infty \right\},
\]
where
\[
\|A\|_B^2 = \int_{[-\pi,\pi]} |||A_\omega|||_\infty^2\, |||\mathcal{F}_\omega|||_1\, d\omega.
\]
The space $B$ is, in fact, the Bochner space $L^2([-\pi,\pi], S_\infty(H))$, equipped with the measure $\mu(E) = \int_E |||\mathcal{F}_\omega|||_1\, d\omega$; in particular, it is a Banach space (see e.g. Dinculeanu [14]). Let $M_0 \subset B$ be the subspace of step functions, spanned by the elements $A 1_{[\alpha,\beta)}$, where either $A \in S_2(H)$ or $A = I$, and $\alpha, \beta \in [-\pi,\pi]$, $\alpha < \beta$. We also define $M = \overline{M_0} \subset B$, where the closure is taken in $B$. We first define $I : M_0 \to H$ by linear extension of the mapping
\[
I(A 1_{[\alpha,\beta)}) = A(Z_\beta - Z_\alpha), \qquad A \in S_2(H) \text{ or } A = I.
\]

In order to show that the image of I is indeed in H, we need a Lemma:

Lemma 3.1. Assume Conditions 1.1 hold, and let $T_1 = A_1 + \gamma_1 I$, $T_2 = A_2 + \gamma_2 I$, where $A_i \in S_2(H)$, $\gamma_i \in \mathbb{C}$ for $i = 1,2$. Then
\[
\langle T_1 Z_\alpha, T_2 Z_\beta \rangle_H = \operatorname{trace}\left( T_1 \left[ \int_{-\pi}^{\min(\alpha,\beta)} \mathcal{F}_\omega\, d\omega \right] T_2^\dagger \right) \tag{22}
\]
and
\[
\|T_1 Z_\alpha\|_H^2 \le |||T_1|||_\infty^2 \int_{-\pi}^{\alpha} |||\mathcal{F}_\omega|||_1\, d\omega. \tag{23}
\]
Hence if $\alpha_1 < \alpha_2 \le \alpha_3 < \alpha_4$,
\[
\langle T_1(Z_{\alpha_2} - Z_{\alpha_1}), T_2(Z_{\alpha_4} - Z_{\alpha_3}) \rangle_H = 0 \tag{24}
\]
and
\[
\|T_1(Z_{\alpha_2} - Z_{\alpha_1})\|_H^2 \le |||T_1|||_\infty^2 \int_{\alpha_1}^{\alpha_2} |||\mathcal{F}_\omega|||_1\, d\omega. \tag{25}
\]

Proof. First notice that
\[
\langle T_1 Z_\alpha, T_2 Z_\beta \rangle_H = \langle A_1 Z_\alpha, A_2 Z_\beta \rangle_H + \gamma_1 \langle Z_\alpha, A_2 Z_\beta \rangle_H + \overline{\gamma_2} \langle A_1 Z_\alpha, Z_\beta \rangle_H + \gamma_1 \overline{\gamma_2} \langle Z_\alpha, Z_\beta \rangle_H.
\]
By linearity, it suffices to show that formula (22) holds for each of the terms on the right-hand side. We will only show that, if $\alpha \le \beta$,
\[
\langle A_1 Z_\alpha, A_2 Z_\beta \rangle_H = \operatorname{trace}\left( A_1 \left[ \int_{-\pi}^{\alpha} \mathcal{F}_\omega\, d\omega \right] A_2^\dagger \right),
\]
as the other equalities follow in a similar fashion. We have
\[
\begin{aligned}
\langle A_1 Z_\alpha, A_2 Z_\beta \rangle_H
&= \mathbb{E} \int_0^1 A_1 Z_\alpha(\tau)\, \overline{A_2 Z_\beta(\tau)}\, d\tau \\
&= \mathbb{E} \int_0^1 \iint_{[0,1]^2} A_1(\tau,\sigma_1)\, \overline{A_2(\tau,\sigma_2)}\, Z_\alpha(\sigma_1)\, \overline{Z_\beta(\sigma_2)}\, d\sigma_1\, d\sigma_2\, d\tau \\
&= \int_0^1 \iint_{[0,1]^2} A_1(\tau,\sigma_1)\, \overline{A_2(\tau,\sigma_2)}\, \mathbb{E}\Big[ Z_\alpha(\sigma_1)\, \overline{Z_\beta(\sigma_2)} \Big]\, d\sigma_1\, d\sigma_2\, d\tau \\
&= \int_0^1 A_1 \left[ \int_{-\pi}^{\alpha} \mathcal{F}_\omega\, d\omega \right] A_2^\dagger (\tau,\tau)\, d\tau
= \operatorname{trace}\left( A_1 \left[ \int_{-\pi}^{\alpha} \mathcal{F}_\omega\, d\omega \right] A_2^\dagger \right),
\end{aligned}
\]

where the last equality is justified by Brislawn [9, Proposition 3.3]. The permutation of integrals is justified by Fubini's theorem, since
\[
\begin{aligned}
&\int_0^1 \iint_{[0,1]^2} \mathbb{E}\big| Z_\alpha(\sigma_1) Z_\beta(\sigma_2) A_1(\tau,\sigma_1) A_2(\tau,\sigma_2) \big|\, d\sigma_1\, d\sigma_2\, d\tau \\
&\quad\le \int_0^1 \iint_{[0,1]^2} \sqrt{ \int_{-\pi}^{\alpha} f_\omega(\sigma_1,\sigma_1)\, d\omega \int_{-\pi}^{\beta} f_\omega(\sigma_2,\sigma_2)\, d\omega }\; |A_1(\tau,\sigma_1) A_2(\tau,\sigma_2)|\, d\sigma_1\, d\sigma_2\, d\tau \\
&\quad\le \int_0^1 \left( \iint_{[0,1]^2} \left[ \int_{-\pi}^{\alpha} f_\omega(\sigma_1,\sigma_1)\, d\omega \int_{-\pi}^{\beta} f_\omega(\sigma_2,\sigma_2)\, d\omega \right] d\sigma_1\, d\sigma_2 \cdot \iint_{[0,1]^2} |A_1(\tau,\sigma_1)|^2 |A_2(\tau,\sigma_2)|^2\, d\sigma_1\, d\sigma_2 \right)^{1/2} d\tau \\
&\quad= \sqrt{ \int_{-\pi}^{\alpha} |||\mathcal{F}_\omega|||_1\, d\omega \int_{-\pi}^{\beta} |||\mathcal{F}_\omega|||_1\, d\omega }\; \int_0^1 \left( \int_0^1 |A_1(\tau,\sigma_1)|^2\, d\sigma_1 \cdot \int_0^1 |A_2(\tau,\sigma_2)|^2\, d\sigma_2 \right)^{1/2} d\tau \\
&\quad\le \sqrt{ \int_{-\pi}^{\alpha} |||\mathcal{F}_\omega|||_1\, d\omega \int_{-\pi}^{\beta} |||\mathcal{F}_\omega|||_1\, d\omega }\; |||A_1|||_2\, |||A_2|||_2 < \infty,
\end{aligned}
\]
by the Cauchy--Schwarz inequality. The proof of (23) follows from (22), Lemma 5.4, and H\"older's inequality. Statements (24) and (25) then follow directly from (22) and (23).

We can now show that the image of $I$ is indeed in $H$. Since any $T \in M_0$ can be written $T = \sum_{j=1}^{n} T_j 1_{[\omega_j, \omega_{j+1})}$, where $T_j = A_j + \gamma_j I$ for some $A_j \in S_2(H)$, $\gamma_j \in \mathbb{C}$, and $-\pi \le \omega_1 < \omega_2 < \cdots < \omega_{n+1} \le \pi$, Lemma 3.1 yields
\[
\|I(T)\|_H^2 = \sum_{j,l=1}^{n} \left\langle T_j(Z_{\omega_{j+1}} - Z_{\omega_j}),\, T_l(Z_{\omega_{l+1}} - Z_{\omega_l}) \right\rangle_H = \sum_{j=1}^{n} \left\| T_j(Z_{\omega_{j+1}} - Z_{\omega_j}) \right\|_H^2 \le \sum_{j=1}^{n} |||T_j|||_\infty^2 \int_{\omega_j}^{\omega_{j+1}} |||\mathcal{F}_\omega|||_1\, d\omega = \|T\|_B^2.
\]
Hence $I : M_0 \to H$ is continuous. We can therefore extend its domain to $M = \overline{M_0}$ (with the closure taken in $B$) in the following way. Fix $T \in M$. For any sequence $\{T_n\}_{n \ge 1} \subset M_0$ converging to $T$, notice that $\{I(T_n)\}_{n \ge 1}$ is a Cauchy sequence in $H$, by the continuity of the operator $I$. We then define
\[
I(T) = \lim_{n \to \infty} I(T_n), \quad \text{in } H,
\]
where the definition does not depend on the choice of the sequence $\{T_n\}_{n \ge 1}$, because $I$ is linear and continuous. Moreover, $I : M \to H$ is linear, and
\[
\|I(T)\|_H \le \|T\|_B
\]
is valid for any $T \in M$. The precise characterization of the space $M$ on which the stochastic integral is defined is rather involved, and beyond the scope of this paper; however, $M$ does contain elements of the form
\[
\omega \in [-\pi,\pi] \mapsto g(\omega) I + A_\omega, \tag{26}
\]
where $g : [-\pi,\pi] \to \mathbb{C}$ is a c\`adl\`ag function with a finite number of jumps, and $A : [-\pi,\pi] \to S_2(H)$ is c\`adl\`ag (with continuity understood with respect to $|||\cdot|||_\infty$) with a finite number of jumps, such that $\int_{-\pi}^{\pi} |||A_\omega|||_2^2\, |||\mathcal{F}_\omega|||_1\, d\omega < \infty$. This is all that will be required for our results. A noteworthy property of the stochastic integral is that $I(T)$ can be seen as a Riemann--Stieltjes limit for elements $T$ of the form (26). The next proposition gives the pointwise covariance of the stochastic integral.

Proposition 3.2. Assume Conditions 1.1 hold, and let $T, S \in M$. Then
\[
\mathbb{E}\left[ \int_{-\pi}^{\pi} T_\omega\, dZ_\omega(\tau) \cdot \overline{\int_{-\pi}^{\pi} S_\omega\, dZ_\omega(\sigma)} \right] = \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger (\tau,\sigma)\, d\omega, \quad \text{a.e.}
\]

Proof. The proof is similar to the proofs of Proposition 2.3 and Lemma 3.1, so we only provide a sketch. Let $T_1 = A_1 + \gamma_1 I$, $T_2 = A_2 + \gamma_2 I$, where $A_i \in S_2(H)$, $\gamma_i \in \mathbb{C}$ for $i = 1,2$. We first show, similarly to Lemma 3.1, that
\[
\mathbb{E}\, \langle T_1 Z_\alpha \otimes T_2 Z_\beta,\, u \otimes v \rangle_{S_2} = \left\langle T_1 \left[ \int_{-\pi}^{\min(\alpha,\beta)} \mathcal{F}_\omega\, d\omega \right] T_2^\dagger,\, u \otimes v \right\rangle_{S_2}
\]
for all $u, v \in H$. Using this, we then extend to $M_0$ by linearity:
\[
\mathbb{E}\, \langle I(T) \otimes I(S),\, u \otimes v \rangle_{S_2} = \left\langle \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger\, d\omega,\, u \otimes v \right\rangle_{S_2}, \qquad T, S \in M_0.
\]
Hence
\[
\mathbb{E}\left[ I(T)(\tau)\, \overline{I(S)(\sigma)} \right] = \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger (\tau,\sigma)\, d\omega, \quad \text{a.e.}, \qquad T, S \in M_0. \tag{27}
\]

The rest is similar to Proposition 2.3: let $B' = L^1([0,1]^2, \mathbb{C})$, with norm
\[
\|g\|_{B'} = \iint_{[0,1]^2} |g(\tau,\sigma)|\, d\tau\, d\sigma, \qquad g \in B'.
\]
Notice that for $T, S \in M$, the Cauchy--Schwarz inequality gives

(i) $\mathbb{E}\|I(T) \otimes I(S)\|_{B'} \le \|T\|_B \|S\|_B$;
(ii) $\left\| \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger\, d\omega \right\|_{B'} \le \|T\|_B \|S\|_B$.

The proof is then completed by writing
\[
\left\| \mathbb{E}[I(T) \otimes I(S)] - \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger\, d\omega \right\|_{B'} \le \left\| \mathbb{E}[I(T) \otimes I(S)] - \mathbb{E}[I(T_n) \otimes I(S_n)] \right\|_{B'} + \left\| \mathbb{E}[I(T_n) \otimes I(S_n)] - \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger\, d\omega \right\|_{B'},
\]
where $\{T_n\}_{n \ge 1}$ and $\{S_n\}_{n \ge 1}$ are sequences in $M_0$ converging to $T$ and $S$, respectively. The terms on the right-hand side can be made arbitrarily small using (i), (ii), and (27).

As a direct corollary, we obtain a formula for the scalar product between two stochastic integrals:

Corollary 3.3. Assume Conditions 1.1 hold, and let $T, S \in M$. Then
\[
\left\langle \int_{-\pi}^{\pi} T_\omega\, dZ_\omega,\, \int_{-\pi}^{\pi} S_\omega\, dZ_\omega \right\rangle_H = \int_0^1 \int_{-\pi}^{\pi} T_\omega \mathcal{F}_\omega S_\omega^\dagger (\tau,\tau)\, d\omega\, d\tau.
\]

3.2. Optimal Dimension Reduction

It follows from the discussion in the previous section that the stochastic integral (21), defined by a truncation of the Cram\'er--Karhunen--Lo\`eve representation, is well defined. The purpose of this section is to prove that it provides a version of $X_t$ with only $K$ degrees of freedom, which optimally approximates $X_t$ in mean square among all linear transformations of $X_t$ with $K$ degrees of freedom. First we make sense of linear transformations of $X_t$, known in time series parlance as filtered versions of $X_t$.

Given the stationary time series $\{X_t\}_{t \in \mathbb{Z}}$ in $L^2([0,1],\mathbb{R})$ and a sequence $\{A_s\}_{s \in \mathbb{Z}}$ of Hilbert--Schmidt operators on $L^2([0,1],\mathbb{R})$, we can construct a new time series
\[
Y_t = \sum_{s \in \mathbb{Z}} A_{t-s} X_s,
\]
where $Y_t$ is a random element of $L^2([0,1],\mathbb{R})$; the series is said to be obtained by linear filtering of $X_t$. Notice that the rank-$K$ approximation of $X_t$ based on the PCA of $R_0$ can also be expressed as a filtered version of $X_t$. The following proposition formalizes the construction of the filtered series $Y_t$, and gives its Cram\'er representation:
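On a discretized grid, a linear filter with finitely many nonzero taps is just a sum of matrix products. The following minimal sketch (all names and the two-tap filter are illustrative assumptions, not the paper's construction) makes the definition $Y_t = \sum_s A_{t-s} X_s$ concrete:

```python
import numpy as np

# Sketch of linear filtering Y_t = sum_s A_s X_{t-s} on a discretized grid,
# for a filter with finitely many nonzero taps; boundary terms are truncated.
rng = np.random.default_rng(1)
d, T = 20, 50                                # grid size, series length
X = rng.standard_normal((T, d))              # X[t] ~ a curve sampled on the grid
A = {0: 0.5 * np.eye(d),                     # tap A_0
     1: 0.1 * rng.standard_normal((d, d))}   # tap A_1; A_s = 0 otherwise

def filter_series(X, A):
    """Return Y with Y[t] = sum_s A[s] @ X[t - s], dropping out-of-range terms."""
    T = X.shape[0]
    Y = np.zeros_like(X)
    for t in range(T):
        for s, As in A.items():
            if 0 <= t - s < T:
                Y[t] += As @ X[t - s]
    return Y

Y = filter_series(X, A)
```

With the identity filter $\{A_0 = I\}$ the series is returned unchanged, matching the intuition that filtering generalizes pointwise linear maps of the curves.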

Proposition 3.4. Assume Conditions 1.1 hold, and let $\{A_s\}_{s \in \mathbb{Z}} \subset S_2(H)$ be such that $\sum_{s \in \mathbb{Z}} |||A_s|||_2 < \infty$. Then $Y_t = \sum_{s \in \mathbb{Z}} A_s X_{t-s}$ converges in $H$, and for each $t \in \mathbb{Z}$,
\[
Y_t = \int_{-\pi}^{\pi} e^{it\omega} \mathscr{A}_\omega\, dZ_\omega, \quad \text{a.s., a.e.},
\]
where $\mathscr{A}_\omega = \sum_{s} e^{-is\omega} A_s$.

Proof. First notice that
\[
A_s X_{t-s} = \int_{-\pi}^{\pi} e^{i(t-s)\omega} A_s\, dZ_\omega = I(B_s),
\]
where $B_s$ denotes the mapping $\omega \mapsto e^{i(t-s)\omega} A_s$. Now $B_s \in B$, since
\[
\|B_s\|_B \le \sqrt{ \int_{-\pi}^{\pi} |||\mathcal{F}_\omega|||_1\, d\omega }\; |||A_s|||_2.
\]
Hence,
\[
\sum_s \|B_s\|_B \le \sqrt{ \int_{-\pi}^{\pi} |||\mathcal{F}_\omega|||_1\, d\omega }\, \sum_s |||A_s|||_2 < \infty,
\]
and the partial sums $B^{(N)} := \sum_{|s| \le N} B_s$ converge in $B$ to
\[
B = \sum_s B_s = e^{it\omega} \sum_s e^{-i\omega s} A_s = e^{it\omega} \mathscr{A}_\omega,
\]
where $\mathscr{A}_\omega = \sum_s e^{-i\omega s} A_s$. Hence, by continuity, we obtain
\[
\sum_s A_s X_{t-s} = \sum_s I(B_s) = I(B) = \int_{-\pi}^{\pi} e^{it\omega} \mathscr{A}_\omega\, dZ_\omega.
\]

Writing $Z^X$ for the orthogonal-increment process associated with $X_t$ (and similarly $Z^Y$ for that of $Y_t$), we see that the Cram\'er representation of $Y$ is related to that of $X$ through the formal relation $dZ^Y_\omega = \mathscr{A}_\omega\, dZ^X_\omega$. The spectral density operator of $Y_t$ is given by the following proposition:

Proposition 3.5. Under the assumptions of Proposition 3.4, the spectral density operator of $Y$ is given by
\[
\mathcal{F}^Y_\omega = \mathscr{A}_\omega \mathcal{F}^X_\omega \mathscr{A}_\omega^\dagger,
\]
where $\mathcal{F}^X$ denotes the spectral density operator of $X_t$.
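In a matrix discretization, Proposition 3.5 is the statement that the filtered spectral density at each frequency is a congruence of the original one by the transfer operator. A minimal sketch, in which the spectral density matrix and the two-tap filter are illustrative stand-ins:

```python
import numpy as np

# Sketch of Proposition 3.5 at a single frequency: for a two-tap filter the
# transfer operator is A(w) = sum_s e^{-iws} A_s, and the filtered spectral
# density is F^Y_w = A(w) F^X_w A(w)^dagger.
rng = np.random.default_rng(2)
d = 15
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
FX = M @ M.conj().T                      # stand-in for F^X_omega (Hermitian, >= 0)
A0 = rng.standard_normal((d, d))
A1 = rng.standard_normal((d, d))

omega = 0.7
A_w = A0 + np.exp(-1j * omega) * A1      # A(omega) = A_0 + e^{-i omega} A_1
FY = A_w @ FX @ A_w.conj().T             # F^Y_omega
```

The congruence preserves the two structural properties a spectral density operator must have at each frequency: self-adjointness and non-negativity.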

Proof. We first show that the spectral density operator of $Y$ is well defined. Using the same techniques as in the proof of Proposition 7.15 of Panaretos and Tavakoli [33], we get
\[
|||R^Y_t|||_2 \le \sum_{s,u \in \mathbb{Z}} |||A_s|||_2\, |||A_u|||_2\, |||R^X_{t+u-s}|||_2,
\]
where $R^X_t$ denotes the autocovariance operator of $X$ at lag $t$ (and similarly for $Y$). Hence
\[
\sum_t |||R^Y_t|||_2 \le \sum_{s,u \in \mathbb{Z}} |||A_s|||_2\, |||A_u|||_2 \sum_t |||R^X_{t+u-s}|||_2 \le \kappa \sum_{s,u \in \mathbb{Z}} |||A_s|||_2\, |||A_u|||_2 < \infty,
\]
where $\kappa = \sum_t |||R^X_t|||_2$. The spectral density operator of $Y$ is therefore well defined in $S_2(H)$. Using Proposition 3.2, we get
\[
R^Y_t = \int_{-\pi}^{\pi} e^{i\omega t} \mathscr{A}_\omega \mathcal{F}^X_\omega \mathscr{A}_\omega^\dagger\, d\omega, \quad \text{in } H,
\]
which corresponds to the inversion formula for the spectral density operator. Hence $\mathcal{F}^Y_\omega = \mathscr{A}_\omega \mathcal{F}^X_\omega \mathscr{A}_\omega^\dagger$.

Consider now the problem of reducing the functional time series $X_t$ to a finite-dimensional vector series (say of dimension $q$) by filtering $X_t$:
\[
Y_t = \sum_s A_s X_{t-s} \in \mathbb{R}^q, \qquad A_s \in S_2(H, \mathbb{R}^q), \tag{28}
\]
where $S_2(H, \mathbb{R}^q)$ denotes the space of Hilbert--Schmidt operators from $H$ to $\mathbb{R}^q$. Though the series $Y_t$ is no longer interpretable in a functional sense, it may be filtered anew to yield a functional process
\[
X_t^* = \sum_s B_s Y_{t-s}, \qquad B_s \in S_2(\mathbb{R}^q, H), \tag{29}
\]
which is interpretable in a functional sense, and is in fact a rank-$q$ approximation of $X_t$. The corresponding Cram\'er representation of $X_t^*$ is as follows:

Lemma 3.6. Let $\{A_s\}_{s \in \mathbb{Z}} \subset S_2(H, \mathbb{R}^q)$ and $\{B_s\}_{s \in \mathbb{Z}} \subset S_2(\mathbb{R}^q, H)$ be such that $\sum_{s \in \mathbb{Z}} |||A_s|||_2 + |||B_s|||_2 < \infty$. Then the Cram\'er representation of $X_t^*$, defined in (29), is
\[
X_t^* = \int_{-\pi}^{\pi} e^{i\omega t} \mathscr{B}_\omega \mathscr{A}_\omega\, dZ^X_\omega, \quad \text{a.s., a.e.},
\]
where $\mathscr{A}_\omega = \sum_{s \in \mathbb{Z}} e^{-i\omega s} A_s$ and $\mathscr{B}_\omega = \sum_{s \in \mathbb{Z}} e^{-i\omega s} B_s$.

Proof. The proof is similar to that of Proposition 3.4 and is thus omitted.

Therefore, the Cram\'er representation of $X_t^*$ is given by
\[
X_t^* = \int_{-\pi}^{\pi} e^{i\omega t} \mathscr{C}_\omega\, dZ^X_\omega,
\]
where $\mathscr{C}_\omega = \mathscr{B}_\omega \mathscr{A}_\omega$, and is hence of rank at most $q$. We are now in a position to show that the truncated Cram\'er--Karhunen--Lo\`eve expansion (21) provides a harmonic principal component analysis of $X_t$: under the mean-square approximation criterion
\[
\mathbb{E}\|X_t - X_t^*\|^2,
\]
which is independent of $t$ by stationarity, the optimal choice of $\mathscr{C}_\omega$ is given by $\sum_{n=1}^{q(\omega)} \varphi_n^\omega \otimes \varphi_n^\omega$, where we recall that $\varphi_n^\omega$ are the eigenfunctions of $\mathcal{F}_\omega$.

Theorem 3.7 (Harmonic Principal Component Analysis). Assume Conditions 1.1 hold. Let $X_t = \int_{-\pi}^{\pi} e^{i\omega t}\, dZ_\omega$ be a stationary time series in $L^2([0,1],\mathbb{R})$, and let $X_t^* = \int_{-\pi}^{\pi} e^{i\omega t} \mathscr{C}_\omega\, dZ_\omega$, with $\mathscr{C} \in M$. Let $q : [-\pi,\pi] \to \mathbb{N}$ be a c\`adl\`ag function. Then the solution to
\[
\min_{\mathscr{C} \in M} \mathbb{E}\|X_t - X_t^*\|^2 \quad \text{subject to} \quad \operatorname{rank}(\mathscr{C}_\omega) \le q(\omega)
\]
is given by
\[
\mathscr{C}_\omega = \sum_{j=1}^{q(\omega)} \varphi_j^\omega \otimes \varphi_j^\omega,
\]
where $\mathcal{F}_\omega = \sum_{j=1}^{\infty} \mu_j(\omega)\, \varphi_j^\omega \otimes \varphi_j^\omega$ is the spectral decomposition of $\mathcal{F}_\omega$. The approximation error is given by
\[
\mathbb{E}\|X_t - X_t^*\|^2 = \int_{-\pi}^{\pi} \left( \sum_{j > q(\omega)} \mu_j(\omega) \right) d\omega.
\]

Proof. The proof is an adaptation of Brillinger [8, Theorem 9.3.1] to our case. Since $X_t - X_t^* = \int_{-\pi}^{\pi} e^{i\omega t} (I - \mathscr{C}_\omega)\, dZ_\omega$, Corollary 3.3 yields
\[
\begin{aligned}
\mathbb{E}\|X_t - X_t^*\|^2
&= \int_0^1 \int_{-\pi}^{\pi} (I - \mathscr{C}_\omega) \mathcal{F}_\omega (I - \mathscr{C}_\omega)^\dagger (\tau,\tau)\, d\omega\, d\tau \\
&= \int_{-\pi}^{\pi} \int_0^1 (I - \mathscr{C}_\omega) \mathcal{F}_\omega^{1/2} \left[ (I - \mathscr{C}_\omega) \mathcal{F}_\omega^{1/2} \right]^\dagger (\tau,\tau)\, d\tau\, d\omega \\
&= \int_{-\pi}^{\pi} \Big|\Big|\Big| (I - \mathscr{C}_\omega) \mathcal{F}_\omega^{1/2} \Big|\Big|\Big|_2^2\, d\omega, \tag{30}
\end{aligned}
\]
where $\mathcal{F}_\omega^{1/2} = \sum_{j=1}^{\infty} \sqrt{\mu_j(\omega)}\, \varphi_j^\omega \otimes \varphi_j^\omega$, and the permutation of integrals is justified by Fubini's theorem. The term (30) is minimized by minimizing $|||(I - \mathscr{C}_\omega) \mathcal{F}_\omega^{1/2}|||_2$ for each $\omega$. This is achieved, under our constraints, by $\mathscr{C}_\omega = \sum_{j=1}^{q(\omega)} \varphi_j^\omega \otimes \varphi_j^\omega$. The expression for the error term follows directly.
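On a frequency grid, the approximation error above reduces to a Riemann sum of the discarded eigenvalues. A minimal sketch, in which the per-frequency spectral density matrices are random illustrative stand-ins:

```python
import numpy as np

# Sketch of the error formula of Theorem 3.7 on a frequency grid: the loss of a
# rank-K truncation is the integral over omega of the eigenvalues beyond rank K,
# while the integral of all eigenvalues gives the total variance E||X_t||^2.
rng = np.random.default_rng(3)
d, K = 12, 3
omegas = np.linspace(-np.pi, np.pi, 64, endpoint=False)
dw = omegas[1] - omegas[0]

error, total = 0.0, 0.0
for w in omegas:
    M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    F = M @ M.conj().T                   # stand-in for F_omega (Hermitian, >= 0)
    mu = np.sort(np.linalg.eigvalsh(F))[::-1]   # mu_1(w) >= mu_2(w) >= ...
    error += mu[K:].sum() * dw           # sum_{j > K} mu_j(omega) d(omega)
    total += mu.sum() * dw               # trace(F_omega) d(omega)
```

The error is strictly between zero and the total variance whenever some, but not all, spectral mass lies beyond the first $K$ eigenvalues.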

Remark 3.8. Contrary to the classical finite-dimensional results (e.g. Brillinger [8]), we do not restrict $q(\omega)$ to be constant over $\omega$. In this sense, when restricted to finite-dimensional Hilbert spaces, our results are more general than the analogous results for vector-valued time series.

Remark 3.9. Restricting $q(\omega) \equiv q \in \mathbb{N}$ yields a rank-$q$ version $X_t^*$ of $X_t$. This can be represented in a one-to-one fashion by the filtered vector series $Y_t$ (an element of $\mathbb{R}^q$) in (28), whose important characteristic is the lack of correlation between its coordinates, just as one expects of the scores obtained in a traditional principal component analysis. The $Y_t$ can therefore serve as the harmonic principal component scores.


Remark 3.10 (Cram\'er--Karhunen--Lo\`eve Decomposition). The theorem makes precise the sense in which a Cram\'er--Karhunen--Lo\`eve representation of the form (20) holds, in that we may now explicitly and rigorously state, as an immediate corollary and under the same conditions as in Theorem 3.7, that
\[
\mathbb{E}\left\| X_t - \int_{-\pi}^{\pi} e^{i\omega t} \left( \sum_{n=1}^{K} \varphi_n^\omega \otimes \varphi_n^\omega \right) dZ_\omega \right\|^2 = \int_{-\pi}^{\pi} \left( \sum_{j > K} \mu_j(\omega) \right) d\omega. \tag{31}
\]

We now provide the expressions of the filters $A, B$ involved in (28) and (29). Write $q(\omega) = \sum_{l=1}^{L} 1_{[\omega_l, \omega_{l+1})}(\omega)\, q_l$, where $-\pi = \omega_1 < \cdots < \omega_{L+1} = \pi$ and the $q_l$ are non-negative integers. Note that allowing $q_l = 0$ for some $l$ corresponds to filtering out the frequencies in the range $[\omega_l, \omega_{l+1})$. If we let $q = \max_\omega q(\omega)$, and $v_1, \ldots, v_q$ be an orthonormal basis of $\mathbb{R}^q$, then
\[
\mathscr{A}_\omega = \sum_{j=1}^{q(\omega)} v_j \otimes \varphi_j^\omega,
\]
and $\mathscr{B}_\omega = \mathscr{A}_\omega^\dagger$. The corresponding filters are given by
\[
A_s = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{i\alpha s} \mathscr{A}_\alpha\, d\alpha = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{i\alpha s} \left( \sum_{j=1}^{q(\alpha)} v_j \otimes \varphi_j^\alpha \right) d\alpha = (2\pi)^{-1} \sum_{l=1}^{L} \sum_{j=1}^{q_l} v_j \otimes \left[ \int_{\omega_l}^{\omega_{l+1}} e^{-i\alpha s}\, \varphi_j^\alpha\, d\alpha \right],
\]
and $B_s = A_{-s}^\dagger$.
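The inversion formula $A_s = (2\pi)^{-1} \int_{-\pi}^{\pi} e^{i\alpha s} \mathscr{A}_\alpha\, d\alpha$ can be checked numerically: on a uniform frequency grid the Riemann sum recovers the taps of a finite filter exactly, by discrete orthogonality of the exponentials. The two-tap filter below is an illustrative assumption:

```python
import numpy as np

# Sketch: recover filter coefficients A_s from the transfer function
# A(alpha) = sum_s e^{-i alpha s} A_s via a Riemann sum for
# A_s = (2 pi)^{-1} int_{-pi}^{pi} e^{i alpha s} A(alpha) d alpha.
rng = np.random.default_rng(4)
d, N = 6, 128
taps = {0: rng.standard_normal((d, d)), 2: rng.standard_normal((d, d))}
alphas = -np.pi + 2 * np.pi * np.arange(N) / N   # uniform grid on [-pi, pi)

def transfer(alpha):
    """A(alpha) = sum_s e^{-i alpha s} A_s for the finite tap set."""
    return sum(np.exp(-1j * alpha * s) * A for s, A in taps.items())

def coeff(s):
    """Riemann-sum approximation of (2 pi)^{-1} int e^{i alpha s} A(alpha) d alpha."""
    return sum(np.exp(1j * a * s) * transfer(a) for a in alphas) / N

A0_hat, A1_hat, A2_hat = coeff(0), coeff(1), coeff(2)
```

Taps present in the filter are recovered to machine precision; an absent tap ($s = 1$ here) comes back as zero.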

4. Asymptotics for the Empirical Harmonic PCA

It is clear from the results presented so far that, in order to carry out a harmonic principal component analysis in practice, when the spectral density operator is unknown, one needs estimators of the eigenstructure of $\mathcal{F}_\omega$. The purpose of this section is to provide such empirical versions of the eigenstructure of $\mathcal{F}_\omega$, and to describe their large-sample properties. Our approach will be to estimate the eigenstructure via the plug-in method: given a finite stretch $\{X_t\}_{t=0}^{T-1}$ of the stationary time series, we will use the estimator $\mathcal{F}^{(T)}_\omega$ of $\mathcal{F}_\omega$ proposed in Panaretos and Tavakoli [33], and use its eigenstructure as an estimator of the eigenstructure of $\mathcal{F}_\omega$. In brief, Panaretos and Tavakoli [33] define the estimator $\mathcal{F}^{(T)}_\omega$ as follows. Let $W(x)$ be a real function defined on $\mathbb{R}$ such that: (1) $W$ is positive, even, and of bounded variation; (2) $W(x) = 0$ if $|x| \ge 1$; (3) $\int_{-\infty}^{\infty} W(x)\, dx = 1$; (4) $\int_{-\infty}^{\infty} W(x)^2\, dx < \infty$. For a bandwidth $B_T > 0$, write
\[
W^{(T)}(x) = \sum_{j \in \mathbb{Z}} \frac{1}{B_T} W\left( \frac{x + 2\pi j}{B_T} \right). \tag{32}
\]


Define the estimator of the spectral density kernel by
\[
f^{(T)}_\omega(\tau,\sigma) = \frac{2\pi}{T} \sum_{s=1}^{T-1} W^{(T)}\left( \omega - \frac{2\pi s}{T} \right) \widetilde{X}^{(T)}_{2\pi s/T}(\tau)\, \widetilde{X}^{(T)}_{-2\pi s/T}(\sigma),
\]
where
\[
\widetilde{X}^{(T)}_\omega(\tau) = (2\pi T)^{-1/2} \sum_{t=0}^{T-1} X_t(\tau) \exp(-i\omega t)
\]
is the discrete Fourier transform of the stretch $\{X_t\}_{t=0}^{T-1}$. Denote by $\mathcal{F}^{(T)}_\omega$ the operator with kernel $f^{(T)}_\omega$.
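The estimator is a kernel-smoothed periodogram, which is straightforward to discretize. The sketch below is a simplified illustration, not the paper's implementation: the Epanechnikov-type weight is one admissible choice of $W$, and the $2\pi j$-periodization of $W^{(T)}$ is dropped, which is adequate for frequencies away from $0$ and $2\pi$:

```python
import numpy as np

# Sketch of the smoothed-periodogram estimator of the spectral density kernel:
# DFT of the observed stretch, then a weighted average of periodogram matrices
# over Fourier frequencies 2*pi*s/T.
rng = np.random.default_rng(5)
d, T, B_T = 10, 256, 0.3
X = rng.standard_normal((T, d))          # X[t] ~ a curve on a grid of size d

def W(x):
    """An illustrative weight satisfying (1)-(4): even, supported on [-1, 1], integral 1."""
    return np.where(np.abs(x) < 1, 0.75 * (1 - x ** 2), 0.0)

def dft(omega):
    """Discrete Fourier transform of the stretch at frequency omega."""
    t = np.arange(T)
    return (2 * np.pi * T) ** -0.5 * (X * np.exp(-1j * omega * t)[:, None]).sum(axis=0)

def f_hat(omega):
    """Smoothed periodogram estimate of the spectral density kernel at omega."""
    est = np.zeros((d, d), dtype=complex)
    for s in range(1, T):
        w_s = 2 * np.pi * s / T
        Xw = dft(w_s)
        est += W((omega - w_s) / B_T) / B_T * np.outer(Xw, Xw.conj())
    return (2 * np.pi / T) * est

F_est = f_hat(1.0)
```

Since each periodogram term is a non-negative rank-one matrix and the weights are non-negative, the estimate inherits self-adjointness and non-negativity from its summands.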

To quantify the strength of dependence among the observations $X_t$, we will use the cumulant kernels of the series, as introduced in Panaretos and Tavakoli [33]; the pointwise definition of a $k$-th order cumulant kernel is
\[
\operatorname{cum}\left( X_{t_1}(\tau_1), \ldots, X_{t_k}(\tau_k) \right) = \sum_{\nu = (\nu_1, \ldots, \nu_p)} (-1)^{p-1} (p-1)! \prod_{l=1}^{p} \mathbb{E}\left[ \prod_{j \in \nu_l} X_{t_j}(\tau_j) \right],
\]
where the sum extends over all unordered partitions of $\{1, \ldots, k\}$. In particular, the cumulant kernel of order 4 gives rise to a corresponding 4th-order cumulant operator $R_{t_1,t_2,t_3} : L^2([0,1]^2,\mathbb{R}) \to L^2([0,1]^2,\mathbb{R})$, defined by
\[
R_{t_1,t_2,t_3}(u \otimes v) = \operatorname{cum}\left( X_{t_1} \otimes X_{t_2} \langle u, X_{t_3} \rangle \langle v, X_0 \rangle \right), \qquad u, v \in L^2([0,1],\mathbb{R}).
\]

Weak dependence will be quantified in terms of the following mixing conditions, which are summability conditions on the cumulant kernels, extending the classical mixing conditions of Brillinger [8]. Throughout this section, we will assume that the following conditions hold:

Conditions 4.1. $\{X_t\}$ is a stationary time series in $L^2([0,1],\mathbb{R})$ satisfying:

(0) $\mathbb{E}\|X_0\|^k < \infty$ for all $k \ge 1$;
(i) $\sum_{t_1, \ldots, t_{k-1} = -\infty}^{\infty} \left\| \operatorname{cum}\left( X_{t_1}, \ldots, X_{t_{k-1}}, X_0 \right) \right\|_2 < \infty$, for all $k \ge 2$;
(i') $\sum_{t_1, \ldots, t_{k-1} = -\infty}^{\infty} (1 + |t_j|) \left\| \operatorname{cum}\left( X_{t_1}, \ldots, X_{t_{k-1}}, X_0 \right) \right\|_2 < \infty$, for $k \in \{2, 4\}$ and $j < k$;
(ii) $\sum_{t \in \mathbb{Z}} (1 + |t|)\, |||R_t|||_1 < \infty$;
(iii) $\sum_{t_1, t_2, t_3 \in \mathbb{Z}} |||R_{t_1,t_2,t_3}|||_1 < \infty$;
(iv) $(\tau,\sigma) \mapsto r_t(\tau,\sigma)$ is continuous, and $\sum_{t \in \mathbb{Z}} \|r_t\|_\infty < \infty$.

We note that these conditions are not the weakest possible for the results we will state; however, they considerably simplify the technical aspects of the proofs. For a detailed discussion of the interpretation and role of these conditions, and a comparative discussion in relation to their finite-dimensional versions (as given in Brillinger [8]), the reader is referred to Panaretos and Tavakoli [33].

The following theorem gives the asymptotic distribution of the spectral density estimators, and follows directly from results in Panaretos and Tavakoli [33]:

Theorem 4.2. Assume that Conditions 4.1 hold, and that $B_T \to 0$ in such a way that $T B_T \to \infty$ and $T B_T^3 \to 0$. Then, for any distinct frequencies $\omega_1, \ldots, \omega_J \in [0,\pi]$, with $J < \infty$,
\[
\sqrt{T B_T}\left( \mathcal{F}^{(T)}_{\omega_j} - \mathcal{F}_{\omega_j} \right) \xrightarrow{d} F_{\omega_j}, \qquad j = 1, \ldots, J,
\]
where $F_{\omega_j}$, $j = 1, \ldots, J$, are independent mean-zero complex Gaussian elements in $L^2([0,1]^2, \mathbb{C})$, with covariance operator
\[
\operatorname{cov}(F_\omega, F_\omega) = 2\pi \int_{\mathbb{R}} W(\alpha)^2\, d\alpha \cdot \mathcal{F}_\omega \tilde{\otimes} \mathcal{F}_\omega, \qquad \omega \in (0,\pi),
\]
and
\[
\operatorname{cov}(F_\omega, F_\omega) = 2\pi \int_{\mathbb{R}} W(\alpha)^2\, d\alpha \cdot \left[ \mathcal{F}_\omega \tilde{\otimes} \mathcal{F}_\omega + \mathcal{F}_\omega \tilde{\otimes}_\top \mathcal{F}_\omega \right], \qquad \omega \in \{0, \pi\}.
\]
In particular, $F_\omega$ is real Gaussian if $\omega \in \{0, \pi\}$.

Before stating our results concerning the asymptotic distribution of the estimators of the eigenvalues and eigenfunctions, we need to introduce some notation. For any $\omega \in [0,\pi]$, let
\[
\mathcal{F}^{(T)}_\omega = \sum_{i=1}^{\infty} \mu_{i,T}(\omega)\, \varphi^\omega_{i,T} \otimes \varphi^\omega_{i,T}
\]
be the spectral decomposition of $\mathcal{F}^{(T)}_\omega$, and recall that
\[
\mathcal{F}_\omega = \sum_{i=1}^{\infty} \mu_i(\omega)\, \varphi^\omega_i \otimes \varphi^\omega_i
\]
is the spectral decomposition of $\mathcal{F}_\omega$. For any fixed $\omega$, $\{\mu_{i,T}(\omega)\}_{i \ge 1}$ and $\{\mu_i(\omega)\}_{i \ge 1}$ are non-increasing positive sequences tending to zero. We denote by $\{\lambda_i(\omega)\}_{i \ge 1}$ the decreasing sequence of distinct elements of $\{\mu_i(\omega)\}_{i \ge 1}$, define the set $I_k(\omega) = \{i \ge 1 : \mu_i(\omega) = \lambda_k(\omega)\}$, and denote its cardinality by $m_k(\omega) = |I_k(\omega)|$. We can now define
\[
\Pi_k(\omega) = \sum_{i \in I_k(\omega)} \varphi^\omega_i \otimes \varphi^\omega_i,
\]
which is the projection onto the $k$-th eigenspace of $\mathcal{F}_\omega$. This way,
\[
\mathcal{F}_\omega = \sum_{i=1}^{\infty} \lambda_i(\omega)\, \Pi_i(\omega).
\]
The estimator of $\Pi_k(\omega)$ is defined by
\[
\Pi_{k,T}(\omega) = \sum_{i \in I_k(\omega)} \varphi^\omega_{i,T} \otimes \varphi^\omega_{i,T}.
\]
We also define $S_k(\omega) = \sum_{j \ne k} (\lambda_k(\omega) - \lambda_j(\omega))^{-1}\, \Pi_j(\omega)$, where the sum is over all $j \ne k$ such that $\lambda_j(\omega) \ne 0$. We define the operator
\[
\eta^\omega_k = S_k(\omega)\, \tilde{\otimes}\, \Pi_k(\omega) + \Pi_k(\omega)\, \tilde{\otimes}\, S_k(\omega) \in S_\infty(S_2(H)),
\]
and the bounded operator $p^\omega_k : S_2(H) \to \mathbb{C}$ by $p^\omega_k(A) = \langle A, \Pi_k(\omega) \rangle_{S_2}$.

The following theorem gives the asymptotic distribution of the estimators of the eigenvalues and eigenvectors of the spectral density operator.
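In a matrix discretization, the plug-in eigenprojections $\Pi_{k,T}(\omega)$ are built by grouping the eigenvectors that share an eigenvalue. A minimal sketch with an exactly degenerate illustrative example (all names are assumptions, not the paper's notation):

```python
import numpy as np

# Sketch: eigenprojection Pi_k from the eigendecomposition of a symmetric matrix
# standing in for F_omega, with a deliberately repeated eigenvalue lambda_2 = 2.
lams = np.array([3.0, 2.0, 2.0, 1.0])    # mu_1 >= mu_2 >= ..., lambda_2 repeated
rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # random orthonormal eigenvectors
F = Q @ np.diag(lams) @ Q.T

mu, phi = np.linalg.eigh(F)
mu, phi = mu[::-1], phi[:, ::-1]         # descending order

I2 = [i for i in range(4) if np.isclose(mu[i], 2.0)]   # index set I_2
Pi2 = sum(np.outer(phi[:, i], phi[:, i]) for i in I2)  # projection onto 2nd eigenspace
```

Although the individual eigenvectors within a degenerate eigenspace are not identifiable, the projection onto the eigenspace is, which is why the theory below is phrased in terms of $\Pi_k(\omega)$ rather than individual eigenfunctions.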

Theorem 4.3. Let $\omega_1, \ldots, \omega_K \in [0,\pi]$ be distinct, and let $L \subset \mathbb{N}^*$ be a set of cardinality $|L| < \infty$. Provided Conditions 4.1 hold, and $B_T \to 0$ in such a way that $T B_T \to \infty$ and $T B_T^3 \to 0$, then
\[
\sqrt{T B_T}\, \{\Pi_{j,T}(\omega_i) - \Pi_j(\omega_i) : j \in L\} \xrightarrow{d} \eta^{\omega_i}_L(F_{\omega_i}), \qquad i = 1, \ldots, K,
\]
and
\[
\sqrt{T B_T} \left\{ \sum_{s \in I_j(\omega_i)} \left[ \mu_{s,T}(\omega_i) - \lambda_j(\omega_i) \right] : j \in L \right\} \xrightarrow{d} p^{\omega_i}_L(F_{\omega_i}), \qquad i = 1, \ldots, K.
\]

The limiting random elements $\{\eta^{\omega_i}_L(F_{\omega_i})\}_{i=1,\ldots,K}$ and $\{p^{\omega_i}_L(F_{\omega_i})\}_{i=1,\ldots,K}$ are all independent complex Gaussian random elements. Their covariances are given by the following formulas (in which we write $\lambda_k$ instead of $\lambda_k(\omega)$ for clarity, and similarly for $\Pi_k$, $\varphi_k$):
\[
\mathbb{E}\left[ \eta^\omega_k(F_\omega) \otimes \eta^\omega_l(F_\omega) \right] =
\begin{cases}
-\kappa\, \lambda_k \lambda_l (\lambda_k - \lambda_l)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_l + \Pi_l \tilde{\otimes} \Pi_k + A^\omega_{kl} + (A^\omega_{kl})^\dagger \right] & \text{if } k \ne l, \\[4pt]
\kappa \sum_{s \ne k} \lambda_k \lambda_s (\lambda_k - \lambda_s)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_s + \Pi_s \tilde{\otimes} \Pi_k + A^\omega_{ks} + (A^\omega_{ks})^\dagger \right] & \text{if } k = l,
\end{cases}
\]
where $A^\omega_{ks} = 1_{\{0,\pi\}}(\omega)\, (\varphi^\omega_k \otimes \varphi^\omega_s)\, \tilde{\otimes}\, (\varphi^\omega_s \otimes \varphi^\omega_k)$, and
\[
\operatorname{cov}\left( p^\omega_l(F_\omega),\, p^\omega_k(F_\omega) \right) = \left( 1 + 1_{\{0,\pi\}}(\omega) \right) \kappa\, \delta_{lk}\, \lambda_l^2\, m_l,
\]
with $\kappa = 2\pi \int_{\mathbb{R}} W^2(x)\, dx$.

Proof. The proof rests on an adaptation of Theorem 1.3 of Mas and Menneteau [30] to our case, and we therefore give only a sketch. For $l \ge 1$, we denote by $\mathcal{S}^l$ the $l$-fold product space $S_2(H) \times \cdots \times S_2(H)$, equipped with the norm
\[
\|(A_1, \ldots, A_l)\|_{\mathcal{S}^l} = \max_{j=1,\ldots,l} |||A_j|||_2,
\]
and equip $\mathbb{C}^l$ with the norm
\[
|(\alpha_1, \ldots, \alpha_l)|_\infty = \max_{j=1,\ldots,l} |\alpha_j|.
\]
We endow the space $\mathcal{S}^l \times \mathbb{C}^l$ with the norm
\[
\|(A_1, \ldots, A_l, \alpha_1, \ldots, \alpha_l)\|_* = \max_{j=1,\ldots,l} \max\left\{ |||A_j|||_2,\, |\alpha_j| \right\}.
\]
Defining the bounded linear operator $J : \mathcal{S}^K \to \mathcal{S}^{K|L|} \times \mathbb{C}^{K|L|}$ by
\[
J(A_1, \ldots, A_K) = \left( \eta_L(A_1), \ldots, \eta_L(A_K),\, p_L(A_1), \ldots, p_L(A_K) \right),
\]
we show that
\[
\sqrt{T B_T} \left( \{\Pi_{j,T}(\omega_i) - \Pi_j(\omega_i)\}_{j \in L;\, i=1,\ldots,K},\; \Big\{ \sum_{s \in I_j(\omega_i)} [\mu_{s,T}(\omega_i) - \lambda_j(\omega_i)] \Big\}_{j \in L;\, i=1,\ldots,K} \right) = J\left( \sqrt{T B_T}\, \{\mathcal{F}^{(T)}_{\omega_i} - \mathcal{F}_{\omega_i}\}_{i=1,\ldots,K} \right) + \sqrt{T B_T}\, R,
\]
where $R = \left( \{R_{L,T}(\omega_i)\}_{i=1,\ldots,K},\, \{r_{L,T}(\omega_i)\}_{i=1,\ldots,K} \right)$, and $R_{L,T}(\omega_i)$, $r_{L,T}(\omega_i)$ are given by Mas and Menneteau [30, Proposition 2.3]. The proof is completed by showing that $\sqrt{T B_T}\, R \xrightarrow{p} 0$ and applying the continuous mapping theorem. The determination of the covariance structure of the limiting random elements is given separately in Section 4.1.

Notice that the estimators of the eigenspaces are not asymptotically independent,which is expected since they are constrained to be mutually orthogonal.

4.1. Computation of covariances

In this section, we determine the asymptotic covariances of the estimators of the eigenprojections and eigenvalues of $\mathcal{F}_\omega$, as stipulated in Theorem 4.3. Notice that the covariance operator of $F_\omega$ is given by either
\[
\mathcal{C} = \kappa \cdot C \tilde{\otimes} C, \qquad \omega \in (0,\pi), \tag{33}
\]
or
\[
\mathcal{C} = \kappa \cdot \left[ C \tilde{\otimes} C + C \tilde{\otimes}_\top C \right], \qquad \omega \in \{0,\pi\}, \tag{34}
\]
where $\kappa = 2\pi \int_{\mathbb{R}} W(\omega)^2\, d\omega$, and $C$ is a nuclear operator on $L^2([0,1],\mathbb{C})$ in the first case, and on $L^2([0,1],\mathbb{R})$ in the second case. We therefore restrict ourselves to the computation of the covariances between elements $\eta_k(Y)$ and $p_k(Y)$, where $Y$ is either a self-adjoint random element of $S_2(L^2([0,1],\mathbb{C}))$ with covariance operator given by (33), or a self-adjoint random element of $S_2(L^2([0,1],\mathbb{R}))$ with covariance operator given by (34). To simplify the presentation, we first state and prove some useful lemmas. In this context, we let $C = \sum_i \mu_i\, \varphi_i \otimes \varphi_i$ be the singular value decomposition of $C$.

Lemma 4.4. Let $H$ be a complex Hilbert space, and let $C$ be a nuclear and self-adjoint operator on $H$, with spectral decomposition $C = \sum_i \mu_i \varphi_{ii}$, where $\varphi_{ij} = \varphi_i \otimes \varphi_j$. If $Y$ is a complex Gaussian random element of $S_2(H)$ that takes self-adjoint values, with mean 0 and covariance operator
\[
\mathcal{C} = C \tilde{\otimes} C, \tag{35}
\]
then
\[
Y = \sum_i \xi_i\, \varphi_{ii} + \sum_{i<j} \xi_{ij}\, e_{ij} + i\, \zeta_{ij}\, \tilde{e}_{ij}, \tag{36}
\]
where the convergence holds in expected mean square (in $S_2(H)$), $e_{ij} = 2^{-1/2}(\varphi_{ij} + \varphi_{ji})$, $\tilde{e}_{ij} = 2^{-1/2}(\varphi_{ij} - \varphi_{ji})$, and $\{\xi_i\}_i \cup \{\xi_{ij}\}_{i<j} \cup \{\zeta_{ij}\}_{i<j}$ are independent real Gaussian random variables, defined by
\[
\xi_i = \langle Y, \varphi_{ii} \rangle_{S_2}, \qquad \xi_{ij} = \langle Y, e_{ij} \rangle_{S_2}, \qquad \zeta_{ij} = -i\, \langle Y, \tilde{e}_{ij} \rangle_{S_2}. \tag{37}
\]
They have mean zero and variances
\[
\operatorname{var}\, \xi_i = \mu_i^2, \qquad \operatorname{var}\, \xi_{ij} = \mu_i \mu_j, \qquad \operatorname{var}\, \zeta_{ij} = \mu_i \mu_j.
\]

Proof. First notice that, using Proposition 5.2,
\[
\mathcal{C} = \sum_i \mu_i^2\, \varphi_{ii}\, \tilde{\otimes}\, \varphi_{ii} + \sum_{i \ne j} \mu_i \mu_j\, \varphi_{ij}\, \tilde{\otimes}\, \varphi_{ij}. \tag{38}
\]
Using this, we see that
\[
\mathcal{C} = \sum_i \mu_i^2\, \varphi_{ii}\, \tilde{\otimes}\, \varphi_{ii} + \sum_{i<j} \mu_i \mu_j \left( e_{ij}\, \tilde{\otimes}\, e_{ij} + \tilde{e}_{ij}\, \tilde{\otimes}\, \tilde{e}_{ij} \right).
\]
The elements $\varphi_{ii}$, $e_{ij}$ ($i<j$), and $\tilde{e}_{ij}$ ($i<j$) are orthonormal, and are eigenvectors of $\mathcal{C}$. Thus (36) follows directly. Since $Y$ is Gaussian with mean zero, the variables defined in (37) are jointly Gaussian and have mean zero. Using the fact that $Y = Y^\dagger$ and Proposition 5.1, it is straightforward to see that they are real, uncorrelated (thus independent) random variables, with variances as given in the statement of the lemma.

Lemma 4.5. Let $H$ be a real Hilbert space, and let $C$ be a nuclear and self-adjoint operator on $H$, with spectral decomposition
\[
C = \sum_i \mu_i\, \varphi_{ii},
\]
where $\varphi_{ij} = \varphi_i \otimes \varphi_j$. If $Y$ is a Gaussian random element of $S_2(H)$ that takes self-adjoint values, with mean 0 and covariance operator
\[
\mathcal{C} = C \tilde{\otimes} C + C \tilde{\otimes}_\top C, \tag{39}
\]
then
\[
Y = \sum_i \xi_i\, \varphi_{ii} + \sum_{i<j} \xi_{ij}\, e_{ij}, \tag{40}
\]
where the convergence holds in expected mean square (in $S_2(H)$), $e_{ij} = 2^{-1/2}(\varphi_{ij} + \varphi_{ji})$, and $\{\xi_i\}_i \cup \{\xi_{ij}\}_{i<j}$ are independent real Gaussian random variables, defined by
\[
\xi_i = \langle Y, \varphi_{ii} \rangle_{S_2}, \qquad \xi_{ij} = \langle Y, e_{ij} \rangle_{S_2}. \tag{41}
\]
They have mean zero and variances
\[
\operatorname{var}\, \xi_i = 2\mu_i^2, \qquad \operatorname{var}\, \xi_{ij} = 2\mu_i \mu_j.
\]

Proof. First we notice that
\[
\mathcal{C} = \sum_i 2\mu_i^2\, \varphi_{ii}\, \tilde{\otimes}\, \varphi_{ii} + \sum_{i<j} 2\mu_i \mu_j\, e_{ij}\, \tilde{\otimes}\, e_{ij},
\]
by evaluating this expression and (39) on the elements $\varphi_{ij}$. The elements $\varphi_{ii}$ and $e_{ij}$ are orthonormal eigenelements of $\mathcal{C}$. The rest of the proof follows by arguments similar to those used to prove Lemma 4.4.


We will use the same notation as in Section 4, but will suppress the dependence on the frequency $\omega$, for tidiness. Simple calculations yield
\[
\eta_k(\varphi_i \otimes \varphi_j) = E_k(i,j)\, \varphi_i \otimes \varphi_j,
\]
where $E_k$ is defined by
\[
E_k(i,j) = (\lambda_k - \lambda_s)^{-1} \quad \text{if, for some } s \ne k,\; i \in I_k \text{ and } j \in I_s, \text{ or } j \in I_k \text{ and } i \in I_s,
\]
and $E_k(i,j) = 0$ otherwise. Notice that $E_k(i,i) = 0$ and $E_k(i,j) = E_k(j,i)$. Thus $\eta_k(e_{ij}) = E_k(i,j)\, e_{ij}$ and $\eta_k(\tilde{e}_{ij}) = E_k(i,j)\, \tilde{e}_{ij}$. If $Y$ is a random element of $S_2(H)$ of the form (36), we have
\[
\eta_k(Y) = \sum_{i<j} E_k(i,j) \left[ \xi_{ij}\, e_{ij} + i\, \zeta_{ij}\, \tilde{e}_{ij} \right],
\]
and thus
\[
\eta_k(Y) \otimes \eta_l(Y) = \sum_{i<j} \sum_{s<t} E_k(i,j) E_l(s,t) \left[ \xi_{ij}\xi_{st}\, e_{ij} \otimes e_{st} - i\, \xi_{ij}\zeta_{st}\, e_{ij} \otimes \tilde{e}_{st} + i\, \zeta_{ij}\xi_{st}\, \tilde{e}_{ij} \otimes e_{st} + \zeta_{ij}\zeta_{st}\, \tilde{e}_{ij} \otimes \tilde{e}_{st} \right].
\]
Therefore, if $Y$ is a random element of $S_2(H)$ of the form (36), the covariance operator between $\eta_k(Y)$ and $\eta_l(Y)$ is given by
\[
\begin{aligned}
\mathbb{E}\left[ \eta_k(Y) \otimes \eta_l(Y) \right]
&= \sum_{i<j} E_k(i,j) E_l(i,j) \left[ \operatorname{var}(\xi_{ij})\, e_{ij} \otimes e_{ij} + \operatorname{var}(\zeta_{ij})\, \tilde{e}_{ij} \otimes \tilde{e}_{ij} \right] \\
&= \sum_{i<j} E_k(i,j) E_l(i,j)\, \mu_i \mu_j \left[ \varphi_{ii}\, \tilde{\otimes}\, \varphi_{jj} + \varphi_{jj}\, \tilde{\otimes}\, \varphi_{ii} \right] \\
&= \begin{cases}
-\lambda_k \lambda_l (\lambda_k - \lambda_l)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_l + \Pi_l \tilde{\otimes} \Pi_k \right] & \text{if } k \ne l, \\[4pt]
\sum_{s \ne k} \lambda_k \lambda_s (\lambda_k - \lambda_s)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_s + \Pi_s \tilde{\otimes} \Pi_k \right] & \text{if } k = l.
\end{cases}
\end{aligned}
\]
If $Y$ is of the form (40), we obtain
\[
\begin{aligned}
\mathbb{E}\left[ \eta_k(Y) \otimes \eta_l(Y) \right]
&= \sum_{i<j} E_k(i,j) E_l(i,j) \left[ \operatorname{var}(\xi_{ij})\, e_{ij} \otimes e_{ij} \right] \\
&= \begin{cases}
-\lambda_k \lambda_l (\lambda_k - \lambda_l)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_l + \Pi_l \tilde{\otimes} \Pi_k + A_{kl} + A_{kl}^\dagger \right] & \text{if } k \ne l, \\[4pt]
\sum_{s \ne k} \lambda_k \lambda_s (\lambda_k - \lambda_s)^{-2} \left[ \Pi_k \tilde{\otimes} \Pi_s + \Pi_s \tilde{\otimes} \Pi_k + A_{ks} + A_{ks}^\dagger \right] & \text{if } k = l,
\end{cases}
\end{aligned}
\]
where $A_{kl} = \varphi_{kl}\, \tilde{\otimes}\, \varphi_{lk}$.

We now turn our attention to the covariance operator between $p_l(Y)$ and $\eta_k(Y)$,
\[
\mathbb{E}\left[ p_l(Y) \otimes \eta_k(Y) \right] : S_2(H) \to \mathbb{C}.
\]

For any $f, g \in H$, we have
\[
\begin{aligned}
\mathbb{E}\left[ p_l(Y) \otimes \eta_k(Y) \right](f \otimes g)
&= \mathbb{E}\, p_l(Y)\, \langle f \otimes g,\, \eta_k(Y) \rangle_{S_2}
= \mathbb{E}\, p_l(Y)\, \langle f,\, S_k Y \Pi_k g + \Pi_k Y S_k g \rangle \\
&= \left\langle S_k f,\; \mathbb{E}\big[ \overline{p_l(Y)}\, Y \big]\, \Pi_k g \right\rangle + \left\langle \Pi_k f,\; \mathbb{E}\big[ \overline{p_l(Y)}\, Y \big]\, S_k g \right\rangle,
\end{aligned}
\]
which, using the fact that $\mathbb{E}\big[ \overline{p_l(Y)}\, Y \big] = \mathbb{E}\left[ \langle \Pi_l, Y \rangle\, Y \right] = \mathcal{C}(\Pi_l)$, reduces to
\[
\langle f \otimes g,\, \eta_k(\mathcal{C}\,\Pi_l) \rangle_{S_2} = \left\langle f \otimes g,\, \eta_k(\lambda_l^2\, \Pi_l) \right\rangle_{S_2} = 0,
\]
for all $l, k$.

for all l, k.For the covariance of the estimated eigenvalues, we use the KL expansion of Y : in

both cases (36) and (40), it is given by

pl(Y ) =∑i∈Ik

ξi. (42)

This automatically gives us cov(pl(Y ), pk(Y )) = δlkλ2l trace (Πl), if Y is of the form (36),

and cov(pl(Y ), pk(Y )) = δlk2λ2l trace (Πl), if Y is of the form (40).

5. Technical Lemmas

For completeness, we present in this section some technical results used in the paper.

5.1. Tensor and Kronecker products

Proposition 5.1. For any $u, v, f, g \in H$ and $A, B \in S_2(H)$:

1. $\cdot \otimes \cdot$ is linear on the left, and sesquilinear on the right.
2. $\langle u \otimes v, f \otimes g \rangle_{S_2} = \langle u, f \rangle \langle g, v \rangle = \langle (u \otimes v) g, f \rangle$.
3. $\langle A, u \otimes v \rangle_{S_2} = \langle A v, u \rangle = \langle v \otimes u, A^\dagger \rangle_{S_2}$.
4. $|||u \otimes v|||_2 = \|u\| \|v\|$.
5. $(u \otimes v)(f \otimes g) = \langle f, v \rangle\, u \otimes g$.
6. $(u \otimes v)^\dagger = v \otimes u$; $(A \tilde{\otimes} B)^\dagger = B \tilde{\otimes} A$.

For two operatorsA,B ∈ S2(H), we define their Kronecker product A⊗B ∈ S2(S2(H)),by A⊗B(C) = ACB†, for C ∈ S2(H). It has the following properties:

Proposition 5.2. For any $A, B, C, D \in S_2(H)$ and $u, v, f, g \in H$:

1. $\cdot\,\tilde\otimes\,\cdot$ is linear on the left and sesquilinear on the right.
2. $(A\,\tilde\otimes\, B)(u\otimes v) = Au\otimes Bv$.
3. $\langle A\,\tilde\otimes\, B, C\,\tilde\otimes\, D\rangle_{S_2} = \langle A, C\rangle_{S_2}\langle D, B\rangle_{S_2}$.
4. $|||A\,\tilde\otimes\, B|||_2 = |||A|||_2\,|||B|||_2$.
5. $(A\,\tilde\otimes\, B)(C\,\tilde\otimes\, D) = AC\,\tilde\otimes\, BD$.
6. $(u\otimes v)\,\tilde\otimes\,(f\otimes g) = (u\otimes f)\otimes(v\otimes g)$.
7. $(A\,\tilde\otimes\, B)^\dagger = A^\dagger\,\tilde\otimes\, B^\dagger$.
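In finite dimensions, $A\,\tilde\otimes\,B$ has a concrete matrix representation: under row-major vectorisation, $\mathrm{vec}(ACB^\dagger) = (A \otimes \overline B)\,\mathrm{vec}(C)$, where the $\otimes$ on the right is the classical matrix Kronecker product. A hypothetical NumPy sketch (not from the paper) checking this representation together with properties 2 and 4 of Proposition 5.2:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

def kron_op(A, B):
    """The Kronecker product A ~(x) B as a map on S2(H): C -> A C B^dagger."""
    return lambda C: A @ C @ B.conj().T

def kron_matrix(A, B):
    """Matrix of A ~(x) B on row-major-vectorised C:
    vec(A C B^dagger) = (A kron conj(B)) vec(C)."""
    return np.kron(A, B.conj())

A, B = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        for _ in range(2))
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u, v = (rng.standard_normal(n) + 1j * rng.standard_normal(n) for _ in range(2))

# The defining action C -> A C B^dagger agrees with the vectorised matrix form
assert np.allclose(kron_op(A, B)(C).ravel(), kron_matrix(A, B) @ C.ravel())

# Property 2: (A ~(x) B)(u (x) v) = Au (x) Bv
assert np.allclose(kron_op(A, B)(np.outer(u, v.conj())),
                   np.outer(A @ u, (B @ v).conj()))

# Property 4: |||A ~(x) B|||_2 = |||A|||_2 |||B|||_2 (Frobenius norms)
assert np.isclose(np.linalg.norm(kron_matrix(A, B)),
                  np.linalg.norm(A) * np.linalg.norm(B))

print("Kronecker-product identities verified")
```

Property 4 is then just the multiplicativity of the Frobenius norm under the classical Kronecker product.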

In the case $H = L^2([0,1], \mathbb C)$, $A, B \in S_2(H)$ are Hilbert–Schmidt operators, hence they are also kernel operators, with kernels $a(\tau, \sigma)$ and $b(\tau, \sigma)$, respectively. The operator $A\,\tilde\otimes\, B$ is then also a Hilbert–Schmidt operator on $S_2(H)$, with kernel $k(\tau, \sigma, x, y) = a(\tau, x)\,\overline{b(\sigma, y)}$. If $H_{\mathbb R}$ is a real Hilbert space, then for two operators $A, B \in S_2(H_{\mathbb R})$, we also define their transpose Kronecker product $A\,\tilde\otimes_T\, B \in S_2(S_2(H_{\mathbb R}))$, by $A\,\tilde\otimes_T\, B(C) = (A\,\tilde\otimes\, B)(C^T) = AC^T B^T$, for $C \in S_2(H_{\mathbb R})$.


Proposition 5.3. For any $A, B, C, D \in S_2(H_{\mathbb R})$ and $u, v, f, g \in H_{\mathbb R}$:

1. $\cdot\,\tilde\otimes_T\,\cdot$ is bilinear.
2. $(A\,\tilde\otimes_T\, B)(u\otimes v) = Av\otimes Bu$.
3. $\langle A\,\tilde\otimes_T\, B, C\,\tilde\otimes_T\, D\rangle_{S_2} = \langle A, C\rangle_{S_2}\langle D, B\rangle_{S_2}$.
4. $|||A\,\tilde\otimes_T\, B|||_2 = |||A|||_2\,|||B|||_2$.

In the case $H_{\mathbb R} = L^2([0,1], \mathbb R)$, if $A, B \in S_2(H_{\mathbb R})$ are Hilbert–Schmidt operators, they are also kernel operators, with kernels $a(\tau, \sigma)$ and $b(\tau, \sigma)$, respectively. The operator $A\,\tilde\otimes_T\, B$ is then also a Hilbert–Schmidt operator on $S_2(H_{\mathbb R})$, with kernel
\[
k(\tau, \sigma, x, y) = a(\tau, y)\, b(\sigma, x).
\]
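The swap of $u$ and $v$ in property 2 of Proposition 5.3 can likewise be sanity-checked with real matrices (again a hypothetical NumPy illustration, not part of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

def kron_T(A, B):
    """Transpose Kronecker product A ~(x)_T B on S2(H_R): C -> A C^T B^T."""
    return lambda C: A @ C.T @ B.T

A, B = (rng.standard_normal((n, n)) for _ in range(2))
u, v = (rng.standard_normal(n) for _ in range(2))

# Property 2: (A ~(x)_T B)(u (x) v) = Av (x) Bu  (note the swap of u and v)
assert np.allclose(kron_T(A, B)(np.outer(u, v)), np.outer(A @ v, B @ u))

print("transpose-Kronecker identity verified")
```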

Lemma 5.4. If $\sum_t |||R_t|||_1 < \infty$ and $\alpha < \beta$, the operator $\int_\alpha^\beta \mathcal F_\omega\, d\omega$ is non-negative, and
\[
\left|\left|\left|\int_\alpha^\beta \mathcal F_\omega\, d\omega\right|\right|\right|_1 = \int_\alpha^\beta |||\mathcal F_\omega|||_1\, d\omega.
\]
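For the reader's convenience, the trace-norm identity in Lemma 5.4 can be seen as follows (a sketch, using the paper's standing fact that each spectral density operator $\mathcal F_\omega$ is non-negative and trace-class, so that its trace norm equals its trace):

```latex
\left|\left|\left|\int_\alpha^\beta \mathcal F_\omega\, d\omega\right|\right|\right|_1
= \operatorname{trace}\left(\int_\alpha^\beta \mathcal F_\omega\, d\omega\right)
= \int_\alpha^\beta \operatorname{trace}(\mathcal F_\omega)\, d\omega
= \int_\alpha^\beta |||\mathcal F_\omega|||_1\, d\omega,
```

where the first and last equalities use non-negativity, and the exchange of trace and integral is justified by Tonelli's theorem under the summability condition $\sum_t |||R_t|||_1 < \infty$.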

Lemma 5.5. For $A \in S_2(H)$, $H = L^2([0,1], \mathbb C)$, we have
\[
A\int_{-\pi}^{\pi} e^{it\omega}\, dZ_\omega = \int_{-\pi}^{\pi} e^{it\omega}\, A\, dZ_\omega.
\]

Acknowledgements

This research was supported by a European Research Council (ERC) Starting Grant Award. We wish to thank the editors and reviewers for their thoughtful and constructive comments.

References

[1] Amini, A. A. and M. J. Wainwright (2012). Sampled forms of functional PCA in reproducing kernel Hilbert spaces. Annals of Statistics 40(5), 2452–2482.

[2] Besse, P. and J. O. Ramsay (1986). Principal components analysis of sampled functions. Psychometrika 51, 285–311.

[3] Bosq, D. (1989). Propriétés des opérateurs de covariance empiriques d'un processus stationnaire hilbertien. Comptes Rendus de l'Académie des Sciences, Série I 309(14), 873–875.

[4] Bosq, D. (1990). Modèle autorégressif hilbertien. Application à la prévision du comportement d'un processus à temps continu sur un intervalle de temps donné. Comptes Rendus de l'Académie des Sciences, Série I 310(11), 787–790.

[5] Bosq, D. (2000). Linear Processes in Function Spaces. Springer.

[6] Bosq, D. (2002). Estimation of mean and covariance operator of autoregressive processes in Banach spaces. Statistical Inference for Stochastic Processes 5, 287–306.

[7] Bosq, D. and D. Blanke (2008). Inference and Prediction in Large Dimensions. Wiley Series in Probability and Statistics. John Wiley & Sons.

[8] Brillinger, D. R. (2001). Time Series: Data Analysis and Theory (classics ed.). Classics in Applied Mathematics. SIAM.

[9] Brislawn, C. (1988). Kernels of trace class operators. Proceedings of the American Mathematical Society 104, 1181–1190.

[10] Cardot, H. (2000). Nonparametric estimation of smoothed principal components analysis of sampled noisy functions. Journal of Nonparametric Statistics 12, 503–538.

[11] Cardot, H. (2007). Conditional functional principal components analysis. Scandinavian Journal of Statistics 34(2), 317–335.

[12] Cramér, H. (1942). On harmonic analysis in certain functional spaces. Arkiv för Matematik, Astronomi och Fysik 28B, 1–7.

[13] Dauxois, J., A. Pousse, and Y. Romain (1982). Asymptotic theory for the principal component analysis of a vector random function: Some applications to statistical inference. Journal of Multivariate Analysis 12, 136–154.

[14] Dinculeanu, N. (2000). Vector Integration and Stochastic Integration in Banach Spaces. Pure and Applied Mathematics. New York: Wiley.

[15] Ferraty, F. and Y. Romain (2011). The Oxford Handbook of Functional Data Analysis. Oxford University Press.

[16] Grenander, U. (1950). Stochastic processes and statistical inference. Arkiv för Matematik 1, 195–277.

[17] Grenander, U. (1981). Abstract Inference. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons.

[18] Hall, P. and M. Hosseini-Nasab (2006). On properties of functional principal components analysis. Journal of the Royal Statistical Society, Series B 68(1), 109–126.

[19] Hall, P. and M. Hosseini-Nasab (2009). Theory for high-order bounds in functional principal components analysis. Mathematical Proceedings of the Cambridge Philosophical Society 146(1), 225–256.

[20] Hall, P., H. G. Müller, and J. L. Wang (2006). Properties of principal component methods for functional and longitudinal data analysis. Annals of Statistics 34(3), 1493–1517.

[21] Hörmann, S. and P. Kokoszka (2010). Weakly dependent functional data. Annals of Statistics 38(3), 1845–1884.

[22] Horváth, L. and P. Kokoszka (2012). Inference for Functional Data with Applications. Springer Series in Statistics. New York: Springer.

[23] Horváth, L., P. Kokoszka, and R. Reeder (2012). Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society, Series B (Statistical Methodology).

[24] Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Annales Academiae Scientiarum Fennicae, Series A I, No. 37.

[25] Kleffe, J. (1973). Principal components of random variables with values in a separable Hilbert space. Mathematische Operationsforschung und Statistik 4(5), 391–406.

[26] Kokoszka, P. (2012). Dependent functional data. ISRN Probability and Statistics 2012.

[27] Lévy, P. (1948). Fonctions aléatoires du second ordre. In Processus stochastiques et mouvement brownien, suivi d'une note de M. Loève. Paris: Gauthier-Villars.

[28] Mas, A. (2000). Estimation d'opérateurs de corrélation de processus linéaires fonctionnels: lois limites, déviations modérées. Ph.D. thesis, Université Paris VI.

[29] Mas, A. (2002). Weak convergence for the covariance operators of a Hilbertian linear process. Stochastic Processes and their Applications 99, 117–135.

[30] Mas, A. and L. Menneteau (2003). Perturbation approach applied to the asymptotic study of random operators. In High Dimensional Probability III, Volume 55, pp. 127–133. Birkhäuser.

[31] Mas, A. and B. Pumo (2009). Linear processes for functional data. arXiv preprint arXiv:0901.2503.

[32] Panaretos, V. M. and S. Tavakoli (2012). Cramér–Karhunen–Loève representation and harmonic principal component analysis of functional time series. Technical Report #03-12 (November 2012), Chair of Mathematical Statistics, EPFL.

[33] Panaretos, V. M. and S. Tavakoli (2013). Fourier analysis of stationary processes in function space. Annals of Statistics, to appear.

[34] Ramsay, J. and B. Silverman (2005). Functional Data Analysis (2nd ed.). Springer.

[35] Rice, J. A. and B. Silverman (1991). Estimating the mean and covariance structure nonparametrically when the data are curves. Journal of the Royal Statistical Society, Series B 53(1), 233–243.

[36] Silverman, B. W. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics 24(1), 1–24.

[37] Steele, J. M. (2003). Stochastic Calculus and Financial Applications (corr. 3rd printing). Stochastic Modelling and Applied Probability, Volume 45. New York: Springer.

[38] Yao, F., H. G. Müller, and J. L. Wang (2005). Functional data analysis of sparse longitudinal data. Journal of the American Statistical Association 100, 577–590.
