Chapter 5: Exploratory Methods for Spatio-Temporal Data · Chapter 5: Exploratory Methods for...

Chapter 5: Exploratory Methods for

Spatio-Temporal Data

Jim Faulkner

Space-Time Reading GroupUniversity of Washington

Feb 5, 2018

Outline

1. Spectral Analysis

2. Empirical Covariance Functions

3. Empirical Orthogonal Function Analysis

4. Principal Oscillation Patterns

5. Canonical Correlation Analysis

1 / 31

Spectral Analysis

Why use spectral analysis?

I Partition overall variation into different scales, e.g., seasonal,annual, decadal

I Determine which components account for the most variation

I Simplifies correlation structure – spectral transformationdecorrelates

I Can allow for reduced-rank approximations

2 / 31

3.5.1 Spectral Representations via Orthogonal

Series

Consider a process that can be written as a function of time, f (t),over the interval (a, b). Define a sequence of spectral basis functionsφk(t) for k = 0, 1, . . . , to be orthogonal over the interval (a, b) if∫ b

a

φk(t)φl(t)dt =

{0, k 6= l ,

δ, k = l ,(3.118)

where δ > 0. These functions are orthonormal if δ = 1.

3 / 31

Now we expand f (t) in terms of these basis functions:

f (t) =∞∑k=0

αkφk(t), (3.119)

where αk are the weights or spectral coefficients of the expansion.

αk =

∫ b

a

f (t)φk(t)dt k = 1, 2, . . . , (3.120)

The spectral coefficients are the projection of f (t) onto theorthogonal basis functions.

4 / 31

Trigonometric Series ExpansionConsider f (t) defined on interval (−1/2, 1/2). Let φk(t) correspondto the orthonormal trigonometric basis functions sin(2πkt) andcos(2πkt). We can write

f (t) =a02

+∞∑k=0

{ak cos(2πkt) + bk sin(2πkt)} (3.122)

The Fourier coefficients are:

ak = 2

∫ 12

− 12

f (t) cos(2πkt)dt, k = 0, 1, . . . (3.123)

bk = 2

∫ 12

− 12

f (t) sin(2πkt)dt, k = 0, 1, . . . (3.124)

5 / 31

Using Euler’s relationship, e±i2πkt ≡ cos(2πkt)± i sin(2πkt), we cansimplify :

f (t) =∞∑k=0

αke±i2πkt − 1/2 ≤ t ≤ 1/2, (3.125)

where φk(t) ≡ e±i2πkt and αk ≡ ak + ibk are both complex functions.This result is the Fourier Transform or the spectral representationtheorem.

6 / 31

3.5.2 Discrete-Time Spectral Expansion

Let {Yt : t = 1, . . . ,T} be a times series and define{φk(t) : t = 1, . . . ,T ; k = 1, . . . , pα} to be a complete set of basisfunctions. Define Y ≡ (Y1, . . . ,YT )′, α ≡ (α1, . . . , αpα), andΦ ≡ (φ1, . . . ,φpα), where φk ≡ (φk(1), . . . , φk(T ). Then thespectral expansion of Y

Yt =

pα∑k=1

αkφk(t) (3.126)

can be written in matrix notation as

Y = Φα (3.127)

7 / 31

Multiply both sides of (3.127) by Φ to obtain the spectral coefficientvector,

α = (Φ′Φ)−1

Φ′Y (3.128)

which is in the form of a least-squares estimator. When the basisfunctions are orthonormal, Φ′Φ = I , then (3.128) becomes

α = Φ′Y (3.129)

Note that the operation Φ′Y can be carried out using the FastFourier Transform (FFT), and the operation Φα can be carried outusing the inverse FFT.

8 / 31

Figure 5.1. SST anomalies one location through time.

9 / 31

Figure 5.4. SST anomalies in tropical Pacific through time.

10 / 31

Empirical Covariance Functions

Assume we have observations Zt ≡ (Z (s1; t), . . . ,Z (sm; t)′) fort = 1, . . . ,T . An m ×m empirical (averaged over time) lag-τ spatialcovariance matrix is given by

C(τ)Z ≡

1

t − τ

T∑t=τ+1

(Zt − µZ )(Zt−τ − µZ )′, τ = 0, 1, . . . ,T (5.1)

where the empirical spatial mean is given by

µZ ≡1

T

T∑t=1

Zt

11 / 31

Figure 5.6. Lag-0 covariance and correlation along equator for SST.

12 / 31

Figure 5.7. Lag-6 covariance and correlation along equator for SST.

13 / 31

The estimated spatio-temporal covariance at spatial lag h and timelag τ is given by

CZ (h; τ) ≡ 1

|Ns(h)|1

|Nt(τ)|×∑

si ,sj∈Ns(h)

∑t,r∈Nt(τ)

(Z (si ; t)− µ(si))(Z (sj ; r)− µ(sj))

where Ns(h) refers to pairs of spatial locations with spatial lag withinsome tolerance of h, Nt(τ) refers to pairs of time points with timelag within some tolerance of τ , and |N(·)| refers to the cardinality(number of elements) in the set N(·). Also,

µZ (si) ≡1

T

T∑t=1

Z (si ; t)

14 / 31

Figure 5.9. SST spatio-temporal covariance.

15 / 31

5.3 Empirical Orthogonal Function (EOF) Analysis

I EOFs are eigenvectors from eigen (spectral) decomposition ofcovariance matrix

I In discrete formulation is PCA, in continuous is Karhunen-Loeveexpansion.

I Typically used

1. Diagnostically to find principal spatial structures and how thosevary

2. To reduce dimensionality in spatio-temporal data sets and toreduce noise

16 / 31

5.3.1 Spatially Continuous Formulation

Consider data {Zt(s) : s ∈ Dt , t = 1, 2, . . . } from a continuousspatial process measured at discrete time intervals. We want to findan optimal separable orthogonal decomposition:

Zt(s) =∞∑k=1

αt(k)φk(s), (5.17)

such that var(αt(1)) > var(αt(1)) > . . . , and cov(αt(i), αt(k)) = 0for all i 6= k .

17 / 31

The solution is the Karhunen-Loeve expansion, which allows thedecomposition:

C(0)Z (s, r) =

∞∑k=1

λkφk(s)φk(r), (5.18)

where {φk(·)} are the eigenfunctions and {λk} are the eigenvalues ofthe Fredholm integral equation:∫

Ds

C(0)Z (s, r)φk(s)ds = λkφk(r) (5.19)

This can be solved numerically, but is difficult and usually not done inpractice.The kth ”amplitude” times series {αt(k) : t = 1, 2, . . . } is

αt(k) =

∫Ds

Zt(s)φk(s)ds, t = 1, 2, . . . (5.22)

18 / 31

5.3.2 Spatially Discrete Formulation

I Let Zt ≡ (Z1(s1), . . . ,Zt(sm))′ and define the kth discrete EOFto be ψk ≡ (ψk(s1), . . . , ψk(sm)), where ψk is the vector in thelinear combination at(k) = ψ′kZt , for k = 1, . . . ,m.

I In general, ψk is the vector that maximizes var(at(k)) subject tothe constraints ψ′kψk = 1 and cov(at(k), at(j)) = 0 for allj 6= k . This is equivalent to solving the eigen decomposition ofC

(0)Z :

C(0)Z = ΨΛΨ′

where Ψ ≡ (ψ1, . . . ,ψm) is the m ×m orthonormal matrix ofeigenvectors (Ψ′Ψ = I ), Λ = diag(λ1, . . . , λm) is the m ×mdiagonal matrix of eigenvalues decreasing down the diagonal,var(at(k) = λk), and Zt is centered with mean 0.

19 / 31

Figure 5.17. SST first and second EOFs.

20 / 31

Figure 5.18. SST third and fourth EOFs.

21 / 31

5.5 Principal Oscillation Patterns (POPs)

I Spectral decomposition of the propogator matrix for thefirst-order dynamical system provides information aboutunderlying dynamics

I POP analysis based on spectral decomposition of propogatormatrix of a spatio-temporal process expressed as a first-orderdynamical system

I Consider the first-order discrete linear system

Zt = MZt−1 (5.26)

where Zt ≡ (Zt(s1), . . . ,Zt(s1))′, and M is the m ×m, full-rankbut non-symmetric propogator matrix.

22 / 31

I In short, if we calculate the normalized SVD M = WΛV′ then

Zt = WV′Zt = Wat (5.28)

where at ≡ V′Zt are the POP coefficients.

I The columns of W, {wk} are the principal oscillation patternsand the columns of V, {vk} are called the adjoint bases.

I We can form the recursion at = Λat−1 or at(k) = λkat−1, whichresults in solutions at(k) = (λk)t .

I These coefficients evolve according to at(k) = λtkeiφk t , where

γk = |λk |.I Components of interest are at(k), γk , φk , a0(k)/e andτk = −1/ ln(γk).

23 / 31

5.5.1 Calculation of POPs

I To account for uncertainty in the data, we write the process as afirst-order vector-autoregressive process:

Zt = MZt−1 + ηt (5.33)

where ηt has mean 0 with uncorrelated elements.

I Under second-order stationarity, M = C(1)Z (C(0)

Z )−1

I A method-of-moments estimator is M = C(1)Z (C(0)

Z )−1

I Calculate eigenvalue decomposition of M, which results inM = WΛW−1

I Set V′ = W−1, then the POP-coefficient estimates can beobtained from at = V′Zt

24 / 31

Figure 5.23. SST POP for 10th eigenvalue. a) real part, b) negativeimaginary part, c) negative real part, c) imaginary part.

25 / 31

5.6 Spatio-Temporal Canonical Correlation

Analysis (CCA)

I CCA obtains linear combinations of two sets of random variableswhose correlations are maximal

I Can apply to two random variables indexed by space and time

I Suppose we have two data sets {Zt ≡ (Zt(s1), . . . ,Zt(sm))′}and {Xt ≡ (Xt(x1), . . . ,Xt(x`))′} with a possibly differentspatial domain but the same temporal domain (t = 1, . . . ,T )

I The kth canonical correlation is defined as

rk ≡ corr(ξ′kZt ,ψ′kXt) =

ξ′kC(0)Z ,Xψ

′k

(ξ′kC(0)Z ξk)1/2(ψ′kC(0)

X ψk)1/2(5.36)

26 / 31

I For k = 1 we can write:

r 21 =[ξ′1(C(0)

Z )−1/2C(0)Z ,X (C(0)

X )−1/2ψ1]2

(ξ′1ξ1)(ψ

′1ψ1)

where ξ1 ≡ (C(0)Z )1/2ξ1 and ψ1 = (C(0)

X )1/2ψ1.

I Turns out that r 21 is the largest singular value in the singularvalue decomposition of

(C(0)Z )−1/2C(0)

Z ,X (C(0)X )−1/2

where ξ1 and ψ1 are the left and right singular vectors.

I The vectors ψ1 and ξ1 can then be calculated and the timeseries of canonical variables at(1) ≡ ξ′1Zt and bt(1) ≡ ψ′1Xt canbe obtained.

27 / 31

Figure 5.26. First CCA patterns for SST and Mallard counts

28 / 31

Figure 5.27. First canonical variables for SST and Mallard counts

29 / 31

References

I Cressie, N., and C. K. Wikle. 2011. Statistics forspatio-temporal data. John Wiley & Sons, New Jersey.

30 / 31

Chapter 5: Exploratory Methods for Spatio-Temporal Data · Chapter 5: Exploratory Methods for...

Documents

Transcript of Chapter 5: Exploratory Methods for Spatio-Temporal Data · Chapter 5: Exploratory Methods for...