Spatio-temporal design · 1 Collecting spatio-temporal data 1 Jorge Mateu and Werner G. M¨uller...


Spatio-temporal design


STATISTICS IN PRACTICE

Series Advisory Editors

Marian Scott, University of Glasgow, UK

Stephen Senn, CRP-Sante, Luxembourg

Wolfgang Jank, University of Maryland, USA

Founding Editor

Vic Barnett, Nottingham Trent University, UK

Statistics in Practice is an important international series of texts which provide detailed coverage of statistical concepts, methods and worked case studies in specific fields of investigation and study.

With sound motivation and many worked practical examples, the books show in down-to-earth terms how to select and use an appropriate range of statistical techniques in a particular practical field within each title’s special topic area.

The books provide statistical support for professionals and research workers across a range of employment fields and research environments. Subject areas covered include medicine and pharmaceutics; industry, finance and commerce; public services; the earth and environmental sciences, and so on.

The books also provide support to students studying statistical courses applied to the above areas. The demand for graduates to be equipped for the work environment has led to such courses becoming increasingly prevalent at universities and colleges.

It is our aim to present judiciously chosen and well-written workbooks to meet everyday practical needs. Feedback of views from readers will be most valuable to monitor the success of this aim.

A complete list of titles in this series appears at the end of the volume.


Spatio-temporal design

Advances in efficient data acquisition

Edited by

Jorge Mateu

Department of Mathematics of the University Jaume I of Castellón, Spain

Werner G. Müller

Department of Applied Statistics, Johannes Kepler University Linz, Austria

A John Wiley & Sons, Ltd., Publication


This edition first published 2013
© 2013 John Wiley & Sons, Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Spatio-temporal design : advances in efficient data acquisition / edited by Jorge Mateu, Department of Mathematics of the University Jaume I of Castellón, Spain, Werner G. Müller, Department of Applied Statistics, Johannes Kepler University Linz, Austria.

pages cm. – (Statistics in practice)
ISBN 978-0-470-97429-2 (hardback)

1. Sampling (Statistics) 2. Spatial analysis (Statistics) I. Mateu, Jorge, editor of compilation. II. Müller, W. G. (Werner G.), editor of compilation.
QA276.6.S63 2013
001.4′33–dc23

2012027161

A catalogue record for this book is available from the British Library.

ISBN: 978-0-470-97429-2

Set in 10/12pt Times by Laserwords Private Limited, Chennai, India.


To Eva and Evelyn


Contents

Contributors

Foreword

1 Collecting spatio-temporal data
Jorge Mateu and Werner G. Müller
  1.1 Introduction
  1.2 Paradigms in spatio-temporal design
  1.3 Paradigms in spatio-temporal modeling
  1.4 Geostatistics and spatio-temporal random functions
    1.4.1 Relevant spatio-temporal concepts
    1.4.2 Properties of the spatio-temporal covariance and variogram functions
    1.4.3 Spatio-temporal kriging
    1.4.4 Spatio-temporal covariance models
    1.4.5 Parametric estimation of spatio-temporal covariograms
  1.5 Types of design criteria and numerical optimization
  1.6 The problem set: Upper Austria
    1.6.1 Climatic data
    1.6.2 Grassland usage
  1.7 The chapters
  Acknowledgments
  References

2 Model-based frequentist design for univariate and multivariate geostatistics
Dale L. Zimmerman and Jie Li
  2.1 Introduction
  2.2 Design for univariate geostatistics
    2.2.1 Data-model framework
    2.2.2 Design criteria
    2.2.3 Algorithms
    2.2.4 Toy example
  2.3 Design for multivariate geostatistics
    2.3.1 Data-model framework
    2.3.2 Design criteria
    2.3.3 Toy example
  2.4 Application: Austrian precipitation data network
  2.5 Conclusions
  References

3 Model-based criteria heuristics for second-phase spatial sampling
Eric M. Delmelle
  3.1 Introduction
  3.2 Geometric and geostatistical designs
    3.2.1 Efficiency of spatial sampling designs
    3.2.2 Sampling spatial variables in a geostatistical context
    3.2.3 Sampling designs minimizing the kriging variance
  3.3 Augmented designs: Second-phase sampling
    3.3.1 Additional sampling schemes to maximize change in the kriging variance
    3.3.2 A weighted kriging variance approach
  3.4 A simulated annealing approach
  3.5 Illustration
    3.5.1 Initial sampling designs
    3.5.2 Augmented designs
  3.6 Discussion
  References

4 Spatial sampling design by means of spectral approximations to the error process
Gunter Spöck and Jürgen Pilz
  4.1 Introduction
  4.2 A brief review on spatial sampling design
  4.3 The spatial mixed linear model
  4.4 Classical Bayesian experimental design problem
  4.5 The Smith and Zhu design criterion
  4.6 Spatial sampling design for trans-Gaussian kriging
  4.7 The spatDesign toolbox
    4.7.1 Covariance estimation and variography software
    4.7.2 Spatial interpolation and kriging software
    4.7.3 Spatial sampling design software
  4.8 An example session
    4.8.1 Preparatory calculations
    4.8.2 Optimal design for the BSLM
    4.8.3 Design for the trans-Gaussian kriging
  4.9 Conclusions
  References


5 Entropy-based network design using hierarchical Bayesian kriging
Baisuo Jin, Yuehua Wu and Baiqi Miao
  5.1 Introduction
  5.2 Entropy-based network design using hierarchical Bayesian kriging
  5.3 The data
  5.4 Spatio-temporal modeling
  5.5 Obtaining a staircase data structure
  5.6 Estimating the hyperparameters Hg and the spatial correlations between gauge stations
  5.7 Spatial predictive distribution over the 445 areas located in the 18 districts of Upper Austria
  5.8 Adding gauge stations over the 445 areas located in the 18 districts of Upper Austria
  5.9 Closing down an existing gauge station
  5.10 Model evaluation
  Appendix 5.1: Hierarchical Bayesian spatio-temporal modeling (or kriging)
  Appendix 5.2: Some estimated parameters
  Acknowledgments
  References

6 Accounting for design in the analysis of spatial data
Brian J. Reich and Montserrat Fuentes
  6.1 Introduction
  6.2 Modeling approaches
    6.2.1 Informative missingness
    6.2.2 Informative sampling
    6.2.3 A two-stage approach for informative sampling
  6.3 Analysis of the Austrian precipitation data
  6.4 Discussion
  References

7 Spatial design for knot selection in knot-based dimension reduction models
Alan E. Gelfand, Sudipto Banerjee and Andrew O. Finley
  7.1 Introduction
  7.2 Handling large spatial datasets
  7.3 Dimension reduction approaches
    7.3.1 Basic properties of low rank models
    7.3.2 Predictive process models: A brief review
  7.4 Some basic knot design ideas
    7.4.1 A brief review of spatial design
    7.4.2 A strategy for selecting knots
  7.5 Illustrations
    7.5.1 A simulation example
    7.5.2 A simulation example using the two-step analysis
    7.5.3 Tree height and diameter analysis
    7.5.4 Austria precipitation analysis
  7.6 Discussion and future work
  References

8 Exploratory designs for assessing spatial dependence
Agnes Fussl, Werner G. Müller and Juan Rodríguez-Díaz
  8.1 Introduction
    8.1.1 The dataset and its visualization
  8.2 Spatial links
    8.2.1 Spatial neighbors
    8.2.2 Spatial weights
  8.3 Measures of spatial dependence
  8.4 Models for areal data
    8.4.1 H0: A spaceless regression model
    8.4.2 H0: Spatial regression models
  8.5 Design considerations
    8.5.1 A design criterion
    8.5.2 Example
  8.6 Discussion
  Appendix 8.1: R code
  Acknowledgments
  References

9 Sampling design optimization for space-time kriging
Gerard B.M. Heuvelink, Daniel A. Griffith, Tomislav Hengl and Stephanie J. Melles
  9.1 Introduction
  9.2 Methodology
    9.2.1 Space-time universal kriging
    9.2.2 Sampling design optimization with spatial simulated annealing
  9.3 Upper Austria case study
    9.3.1 Descriptive statistics
    9.3.2 Estimation of the space-time model and universal kriging
    9.3.3 Optimal design scenario 1
    9.3.4 Optimal design scenario 2
    9.3.5 Optimal design scenario 3
  9.4 Discussion and conclusions
  Appendix 9.1: R code
  Acknowledgment
  References

10 Space-time adaptive sampling and data transformations
José M. Angulo, María C. Bueso and Francisco J. Alonso
  10.1 Introduction
  10.2 Adaptive sampling network design
    10.2.1 A simulated illustration
  10.3 Predictive information based on data transformations
  10.4 Application to Upper Austria temperature data
  10.5 Summary
  Acknowledgments
  References

11 Adaptive sampling design for spatio-temporal prediction
Thomas R. Fanshawe and Peter J. Diggle
  11.1 Introduction
  11.2 Review of spatial and spatio-temporal adaptive designs
  11.3 The stationary Gaussian model
    11.3.1 Model specification
    11.3.2 Theoretically optimal designs
    11.3.3 A comparison of design strategies
  11.4 The dynamic process convolution model
    11.4.1 Model specification
    11.4.2 A comparison of design strategies
  11.5 Upper Austria rainfall data example
  11.6 Discussion
  Appendix 11.1
  References

12 Semiparametric dynamic design of monitoring networks for non-Gaussian spatio-temporal data
Scott H. Holan and Christopher K. Wikle
  12.1 Introduction
  12.2 Semiparametric non-Gaussian space-time dynamic design
    12.2.1 Semiparametric spatio-temporal dynamic Gamma model
    12.2.2 Simulation-based dynamic design
    12.2.3 Extended Kalman filter for dynamic gamma models
    12.2.4 Extended Kalman filter design algorithm
  12.3 Application: Upper Austria precipitation
  12.4 Discussion
  Acknowledgments
  References


13 Active learning for monitoring network optimization
Devis Tuia, Alexei Pozdnoukhov, Loris Foresti and Mikhail Kanevski
  13.1 Introduction
  13.2 Statistical learning from data
    13.2.1 Algorithmic approaches to learning
    13.2.2 Over-fitting and model selection
  13.3 Support vector machines and kernel methods
    13.3.1 Classification: SVMs
    13.3.2 Density estimation: One-class SVM
    13.3.3 Regression: Kernel ridge regression
    13.3.4 Regression: SVR
  13.4 Active learning
    13.4.1 A general framework
    13.4.2 First steps in active learning: Reducing output variance
    13.4.3 Exploration–exploitation strategies: Towards mixed approaches
  13.5 Active learning with SVMs
    13.5.1 Margin sampling
    13.5.2 Diversity of batches of samples
    13.5.3 Committees of models
  13.6 Case studies
    13.6.1 Austrian climatological data
    13.6.2 Cesium-137 concentration after Chernobyl
    13.6.3 Wind power plants sites evaluation
  13.7 Conclusions
  Acknowledgments
  References

14 Stationary sampling designs based on plume simulations
Kristina B. Helle and Edzer Pebesma
  14.1 Introduction
  14.2 Plumes: From random fields to simulations
  14.3 Cost functions
    14.3.1 Detecting plumes
    14.3.2 Mapping and characterising plumes
    14.3.3 Combined cost functions
  14.4 Optimisation
    14.4.1 Greedy search
    14.4.2 Spatial simulated annealing
    14.4.3 Genetic algorithms
    14.4.4 Other methods
    14.4.5 Evaluation and sensitivity
    14.4.6 Use case: Combination and comparison of optimisation algorithms
  14.5 Results
    14.5.1 Simulations
    14.5.2 Greedy search
    14.5.3 Sensitivity of greedy search to the plume simulations
    14.5.4 Comparison of optimisation algorithms
  14.6 Discussion
  Acknowledgments
  References

Index


Contributors

Francisco J. Alonso, Department of Statistics, University of Granada, Spain

José M. Angulo, Department of Statistics, University of Granada, Spain

Sudipto Banerjee, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, USA

María C. Bueso, Department of Applied Mathematics and Statistics, Technical University of Cartagena, Murcia, Spain

Eric M. Delmelle, Geography and Earth Sciences, University of North Carolina at Charlotte, USA

Peter J. Diggle, Lancaster Medical School, Lancaster University, UK, and Institute of Infection and Global Health, University of Liverpool, UK

Thomas R. Fanshawe, Lancaster Medical School, Lancaster University, UK

Andrew O. Finley, Department of Geography and Department of Forestry, Michigan State University, East Lansing, USA

Loris Foresti, Institute of Geomatics and Analysis of Risk (IGAR), University of Lausanne, Switzerland

Montserrat Fuentes, Department of Statistics, North Carolina State University, USA

Agnes Fussl, Department of Applied Statistics, Johannes Kepler University Linz, Austria

Alan E. Gelfand, Department of Statistical Science, Duke University, Durham, USA


Daniel A. Griffith, School of Economic, Political and Policy Sciences, University of Texas at Dallas, USA

Kristina B. Helle, Institute for Geoinformatics (IFGI), University of Muenster, Germany

Tomislav Hengl, ISRIC – World Soil Information, Wageningen, The Netherlands

Gerard B.M. Heuvelink, Department of Environmental Sciences, Wageningen University, The Netherlands

Scott H. Holan, Department of Statistics, University of Missouri, Columbia, USA

Baisuo Jin, School of Management, University of Science and Technology of China, Hefei, People’s Republic of China

Mikhail Kanevski, Institute of Geomatics and Analysis of Risk (IGAR), University of Lausanne, Switzerland

Jie Li, Department of Statistics, Virginia Tech University, USA

Jorge Mateu, Department of Mathematics, University of Jaume I of Castellón, Spain

Stephanie J. Melles, Biology Department, Trent University, Ontario, Canada

Baiqi Miao, School of Management, University of Science and Technology of China, Hefei, People’s Republic of China

Werner G. Müller, Department of Applied Statistics, Johannes Kepler University Linz, Austria

Edzer Pebesma, Institute for Geoinformatics (IFGI), University of Muenster, Germany

Jürgen Pilz, Department of Statistics, University of Klagenfurt, Austria

Alexei Pozdnoukhov, National Centre for Geocomputation, National University of Ireland, Maynooth, Ireland

Brian J. Reich, Department of Statistics, North Carolina State University, USA

Juan Rodríguez-Díaz, Faculty of Science, University of Salamanca, Spain

Gunter Spöck, Department of Statistics, University of Klagenfurt, Austria


Devis Tuia, Image Processing Laboratory, University of Valencia, Spain, and Laboratory of Geographic Information Systems, Lausanne Institute of Technology, EPFL, Switzerland

Christopher K. Wikle, Department of Statistics, University of Missouri, Columbia, USA

Yuehua Wu, Department of Mathematics and Statistics, York University, Toronto, Canada

Dale L. Zimmerman, Department of Statistics and Actuarial Science, University of Iowa, USA


Foreword

Imagine driving a car that has not been built yet. Design the car: look at principles of combustion, stability, and comfort; consider the manufacturing tools available, including the individuals who will build the car; and keep it within budget. Spatio-temporal sampling design has some of the same features: the principles of stratification, replication, and randomization are important; measuring instruments have to be bought or built and possibly moved around the spatio-temporal domain by field teams; and there is still a bottom line to adhere to.

In this edited volume of chapters on spatio-temporal sampling design, by and large the long-term design focus has been on ‘driving the car,’ that is, data analysis and inference are very much on the minds of the designers. There, it is the spatio-temporal variability that is the coin of the realm. Controlling this variability allows for more precise inferences and a greater likelihood of detecting ‘signals’ in the data. Importantly, relating this benefit to a cost allows an efficient allocation of a study’s resources.

Readers of the book’s chapters will find a myriad of techniques for linking the design with the analysis, and by far the majority of the authors concentrate on model-based designs. The models are statistical and require some knowledge of the underlying spatio-temporal variability, presumably from a pilot study. (Build a prototype and test drive it!) Statistical analyses require assumptions, and an important one in spatio-temporal design is that the spatio-temporal sampling inadequacies not be confounded with sources of variability due to the process being studied. A small amount of randomization in the design can be very prudent, like putting a spare tire in the back of the car. So too can a sampling protocol that includes samples very close together in space and time to disentangle measurement error from microscale variation.

One of the strengths of the book is that the editors asked the authors to use a common dataset, namely rainfall, temperature, and grassland-usage measurements in Upper Austria. Readers can see how different chapters’ design criteria relate to this dataset. While it is not large in size, there are many scientific applications where sampling is limited (e.g., computer experiments, and wellington-boots-on-the-ground field studies).

To see which design approaches scale up to massive datasets, it is usually better to simulate first from a known process and determine what can be recovered from noisy, incompletely sampled, but massive data. In the geosciences, these are sometimes called Observing System Simulation Experiments (OSSEs), and they are used to mimic the data explosion coming from satellite remote-sensing instruments. This spatio-temporal data explosion is also coming from the mobile devices we carry around as we move through our space-time continuum; crowd-sourcing of this sort offers new statistical sampling challenges to build an accurate information base, and then a knowledge base for making important societal decisions.

The authors of the chapters in this book are eminent in their field, and the editors have meticulously framed a state-of-the-art snapshot for 2012. This is a fertile area for future research, but we should not forget the bottom line, that sampling is costly.

Noel Cressie
University of Wollongong, Australia
and The Ohio State University, USA


1 Collecting spatio-temporal data

Jorge Mateu (1) and Werner G. Müller (2)

(1) Department of Mathematics, University of Jaume I of Castellón, Spain
(2) Department of Applied Statistics, Johannes Kepler University Linz, Austria

1.1 Introduction

In this volume we intend to provide a comprehensive state-of-the-art presentation combining both classical and modern treatments of network design and planning for spatial and spatio-temporal data acquisition. A common problem set is interwoven throughout the chapters, providing various perspectives that together give a complete insight into the problem at hand. Motivated by the high demand for statistical analysis of data that takes spatial and spatio-temporal information into account, this book incorporates ideas from the areas of time series, spatial statistics and stochastic processes, and combines them to discuss optimum spatio-temporal sampling design.

The past has seen, perhaps initiated by Gribik et al. (1976), a great number of statistical papers devoted to the purely spatial aspect of sampling design, mainly in the context of monitoring networks. Other early papers include those by Caselton and Zidek (1984), Olea (1984) and Fedorov and Müller (1988); book-length treatments are given by Müller (1998, 2007) and de Gruijter et al. (2006). An excellent recent overview of this literature is provided by Zidek and Zimmerman (2010), and we will take the liberty in this introduction to draw heavily from their structure and exposition, albeit complementing it with aspects that enter due to the additional temporal component. Another excellent review paper is that by Dobbie et al. (2008), who provide some material on spatio-temporal aspects. As those two texts are comprehensive, we can restrict ourselves to a brief exposition with the goal and emphasis of leading the readers into the more substantial subsequent chapters.

Spatio-temporal design: Advances in efficient data acquisition, First Edition. Edited by Jorge Mateu and Werner G. Müller. © 2013 John Wiley & Sons, Ltd. Published 2013 by John Wiley & Sons, Ltd.

1.2 Paradigms in spatio-temporal design

An important clarification that needs to be made before thinking about spati(o-tempor)al design is whether we assume the randomness of the observations to stem from stochastic disturbances or from the sampling process itself. This leads to the distinction between so-called model-based and design-based inferences and their respective design procedures (rather than the common but confusing expression design-based, we prefer to call the latter probability-based).

The probability-based paradigm is rooted in classical sampling theory and assumes the ability to define a population explicitly and a respective randomness in the design. These methods aim at recovering unobserved values and, more importantly, general attributes of the spatial population, such as total means (cf. de Gruijter and ter Braak 1990) and variances (cf. Fewster 2011). Probability-based inferences of these attributes are bias-free and allow uncertainty assessments under mild assumptions. The corresponding design techniques range from the benchmark random sampling to stratified, two-stage, cluster or sequential random sampling with a multitude of variants (Stehman 1999) that all lend themselves to straightforward extensions incorporating a temporal dimension (Brus and de Gruijter 2011). An excellent account of the latter can be found in Part IV of de Gruijter et al. (2006), which can in general be considered the most definitive text for the probability-based design paradigm.
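As a rough illustration of how the randomness sits in the design itself, the following Python sketch draws a simple random design and a stratified random design on the unit square, repeated independently at each time point. The unit-square domain, the equal-cell stratification and the function names are assumptions made for this example only; they are not taken from the references cited above.

```python
import random

def simple_random_design(n, times, rng):
    """Draw n locations uniformly in the unit square, independently per time point."""
    return {t: [(rng.random(), rng.random()) for _ in range(n)] for t in times}

def stratified_random_design(k, times, rng):
    """Split the unit square into k x k equal cells (hypothetical strata) and
    draw one uniformly placed location per cell and time point."""
    design = {}
    for t in times:
        pts = []
        for i in range(k):
            for j in range(k):
                pts.append(((i + rng.random()) / k, (j + rng.random()) / k))
        design[t] = pts
    return design

rng = random.Random(0)  # fixed seed so the designs are reproducible
srs = simple_random_design(16, times=[0, 1, 2], rng=rng)
strat = stratified_random_design(4, times=[0, 1, 2], rng=rng)
```

Both designs use 16 locations per time point; the stratified version guarantees one location in every cell, which is the usual motivation for stratifying.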

The model-based paradigm, on the other hand, requires a statistical model to describe the data-generating spatio-temporal process. Here we typically assume that observations stem from a random field generally given by

Z(x, s, t) = η(x(s, t), s, t, β) + ε(x, s, t), s ∈ D, t ∈ T , (1.1)

where s denotes a spatial location, t a time point, x some potentially space- and time-dependent regressors, and η a parametrized trend model (a nearly encyclopedic reference for this type of process is Cressie and Wikle 2011). Note that the random element here is the error ε rather than the design mechanism. This allows meaning to be assigned to purely geometric designs, such as regular grids or space-filling lattices, that are common in applications. Another advantage here is that by borrowing inference strength from the model we can make meaningful inferences from rather small samples and for very specific aspects derived from the model parameters, such as threshold exceedances, times of trend-reversals, local outliers, etc.
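To make the structure of the random field (1.1) concrete, here is a minimal Python sketch that simulates observations on a fixed (non-random) set of sites and time points. The linear form of η, the made-up regressor x(s, t) and the independent Gaussian errors are all simplifying assumptions for illustration; in the models discussed in this chapter ε would normally be spatio-temporally correlated.

```python
import random

def eta(x, s, t, beta):
    """Hypothetical linear trend: intercept + regressor effect + linear time drift."""
    return beta[0] + beta[1] * x + beta[2] * t

def simulate_field(sites, times, beta, sigma, rng):
    """Return {(s, t): Z} with Z = eta + epsilon, epsilon ~ N(0, sigma^2) i.i.d.
    (independence is an assumption of this sketch, not of model (1.1))."""
    z = {}
    for s in sites:
        for t in times:
            x = s[0] + s[1]  # made-up regressor x(s, t); here it depends on s only
            z[(s, t)] = eta(x, s, t, beta) + rng.gauss(0.0, sigma)
    return z

# A purely geometric design: a regular 3 x 3 grid observed at two time points.
grid = [(i / 2, j / 2) for i in range(3) for j in range(3)]
obs = simulate_field(grid, times=[0, 1], beta=(1.0, 2.0, 0.5), sigma=0.1,
                     rng=random.Random(1))
```

Note that the design (the grid and the time points) is fixed here; all randomness enters through the error term, as the text describes.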

We believe that unless a reasonable modeling is out of reach the model-based approach offers more flexibility and statistical power, which is why most of the contributions in this volume will fall into this category. However, this issue has been the subject of considerable debate in the literature and further details are provided in Papritz and Webster (1995), Brus and de Gruijter (1997), Stevens (2006), and an overview is given in Table 1.1 of Dobbie et al. (2008). A general discussion that goes beyond the spatio-temporal realm can be found in Thompson (2002). Recently, there have also been attempts to fuse the two paradigms; Brus and de Gruijter (2012), for instance, employ probability-based sampling for the spatial coordinates, whereas they build a time series model and use a respective design for the temporal trend. How to include probability-based design in a hierarchical statistical modeling framework is surveyed in Cressie et al. (2009).

1.3 Paradigms in spatio-temporal modeling

Another dichotomy clearly shows when one examines the spati(o-tempor)al modeling literature. Historically, two schools have developed somewhat independently, one based on discrete time series model analogies and the other one derived from generalizations of stochastic process methodologies. The former is much used by geographers and economists, particularly with the advent of what was termed ‘new economic geography’, and was consequently referred to as ‘spatial econometrics’ (for perhaps the earliest full exposition see Anselin 1988; a recent account of the history of the field thereafter by the same author can be found in Anselin 2010). The latter school stems from the theory of regionalized variables developed among mining scientists and geologists and has consequently been named ‘geostatistics’ (a book-length treatment is provided by Chiles and Delfiner 1999). Comparative discussions on these two paradigms can be found in the encyclopedic Cressie (1993) and more recently in Griffith and Paelinck (2007), Hae-Ryoung et al. (2008) and Haining et al. (2010).

Both of these modeling views are encompassed by the random field (1.1) and can be distinguished solely by the nature of the indexing variables s and t. In spatial econometrics we usually assume the s’s to form a discrete geographic lattice, and their relationships are usually described in the form of a so-called spatial weight or link matrix W. Various types of dependence structures can be modeled by assigning particular forms of η and covariances of ε employing W, such as the common simultaneous and conditionally autoregressive regression models (SAR and CAR), the latter being spatial manifestations of Gaussian Markov random fields (GMRF; see Rue and Held 2005 for a definitive text).
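The role of the weight matrix W can be sketched in a few lines of Python. The example below builds a row-standardized W for a rectangular lattice and simulates a simultaneous autoregressive process y = ρWy + ε. The rook-contiguity neighborhood, the row standardization and the fixed-point solver are choices made for this illustration (a direct linear solve of (I − ρW)y = ε would be the more usual route); none of them is prescribed by the text.

```python
import random

def rook_weights(nrow, ncol):
    """Row-standardized rook-contiguity weight matrix W for an nrow x ncol lattice."""
    n = nrow * ncol
    W = [[0.0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            i = r * ncol + c
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            for rr, cc in nbrs:
                W[i][rr * ncol + cc] = 1.0 / len(nbrs)
    return W

def simulate_sar(W, rho, eps, n_iter=500):
    """Solve y = rho * W y + eps by fixed-point iteration.
    With row-standardized W the iteration converges whenever |rho| < 1."""
    y = list(eps)
    for _ in range(n_iter):
        y = [rho * sum(w * v for w, v in zip(row, y)) + e
             for row, e in zip(W, eps)]
    return y

rng = random.Random(2)
W = rook_weights(3, 3)
eps = [rng.gauss(0.0, 1.0) for _ in range(9)]
y = simulate_sar(W, rho=0.5, eps=eps)
```

Each value of y depends on the average of its lattice neighbors, so larger |ρ| produces stronger spatial dependence among the simulated values.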

In geostatistics the locations $s$ are assumed to vary continuously in $D$, and again the implied models differ by the choice of $\eta$ and the error dependence, usually determined by the so-called variogram $\gamma(s, s') = E(|Z(x, s, t) - Z(x, s', t)|^2)$. Under normality assumptions for the errors these processes also come under the notion of Gaussian random fields (GRF), and they are also much in use in other contexts such as machine learning and computer simulation experiments (cf. Rasmussen and Williams 2005). Though more flexible, the corresponding models are usually much more estimation intensive than those for GMRF, which can typically be employed for much larger spatial datasets.

Despite this divide there have lately been successful attempts to merge those two spatio-temporal modeling paradigms. While previously only results for regular sampling schemes were available (cf. Griffith and Csillag 1993), Lindgren et al. (2011) provide an explicit link for arbitrary lattices, thus opening the issue to the question of sampling design. In a discussion of this article, Müller and Waldl (2011) indeed uncover relationships between the respective designs that allow properties from both paradigms to be exploited.

A great number of spatio-temporal extensions of these models exist, particularly for GRF; see Cressie and Wikle (2011) for an extensive review and Baxevani et al. (2011) for a particular representation using velocity fields. GMRF are usually extended by modeling them in discrete time, giving so-called spatial panel models (see e.g., Elhorst 2012 for a recent survey); a continuous time extension of spatial panels is given in Oud et al. (2012).

1.4 Geostatistics and spatio-temporal random functions

Geostatistical research has typically analyzed random fields in which every spatio-temporal location can be seen as a point in $\mathbb{R}^d \times \mathbb{R}$. While from a mathematical point of view $\mathbb{R}^d \times \mathbb{R} = \mathbb{R}^{d+1}$, from a physical perspective it would make no sense to consider spatial and temporal aspects in the same way, due to the significant differences between the two axes of coordinates. In particular, while the time axis is intrinsically ordered (into past, present and future), the spatial coordinates have no such ordering.

Recalling (1.1), assume that observations stem from a random field (r.f.) given by $Z(x, s, t) = \eta(x(s, t), s, t, \beta) + \varepsilon(x, s, t)$, $s \in D$, $t \in T$, where $s$ denotes a spatial location, $t$ a time point, $x$ some potentially space and time dependent regressors, $\eta$ a parametrized trend model, $D \subset \mathbb{R}^d$ (very often $d = 2$), and $T \subset \mathbb{R}$. For ease of notation, we drop the covariates $x$ and write $Z(s, t)$, assuming whenever necessary that any trend coming from a set of covariates has already been removed.

1.4.1 Relevant spatio-temporal concepts

A spatio-temporal r.f. $Z(s, t)$ is said to be Gaussian if the random vector $\mathbf{Z} = (Z(s_1, t_1), \ldots, Z(s_n, t_n))'$ follows a multivariate normal distribution for any set of spatio-temporal locations $\{(s_1, t_1), \ldots, (s_n, t_n)\}$. When not stated explicitly, the indexes $i$ and $j$ run from 1 to $n$.

The spatio-temporal r.f. $Z(s, t)$ is said to have a spatially stationary covariance function if, for any two pairs $(s_i, t_i)$ and $(s_j, t_j)$ in $\mathbb{R}^d \times \mathbb{R}$, the covariance $C((s_i, t_i), (s_j, t_j))$ depends only on the spatial separation between the locations $s_i$ and $s_j$ and on the times $t_i$ and $t_j$. Similarly, $Z(s, t)$ is said to have a temporally stationary covariance function if, for any two such pairs, the covariance $C((s_i, t_i), (s_j, t_j))$ depends only on the separation between the times $t_i$ and $t_j$ and on the spatial locations $s_i$ and $s_j$. If the spatio-temporal r.f. $Z(s, t)$ has a stationary covariance function in both spatial and temporal terms, then it is said to have a stationary covariance function. In this case, the covariance function can be expressed as

$$C((s_i, t_i), (s_j, t_j)) = C(h, u) \tag{1.2}$$

with $h = s_i - s_j$ and $u = t_i - t_j$ the lags in space and time, respectively.

A spatio-temporal r.f. $Z(s, t)$ has a separable covariance function if there is a purely spatial covariance function $C_s(s_i, s_j)$ and a purely temporal covariance function $C_t(t_i, t_j)$ such that

$$C((s_i, t_i), (s_j, t_j)) = C_s(s_i, s_j)\, C_t(t_i, t_j) \tag{1.3}$$

for any pair of spatio-temporal locations $(s_i, t_i)$ and $(s_j, t_j) \in \mathbb{R}^d \times \mathbb{R}$.
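One practical consequence of separability can be sketched numerically: when every site is observed at every time point, the covariance matrix of the stacked observations is a Kronecker product of the temporal and spatial covariance matrices. The snippet below is an illustration only; the exponential covariance model, the coordinates and the scale parameters are hypothetical choices.

```python
import numpy as np

def exp_cov(x, y, scale):
    """Exponential covariance exp(-|x - y| / scale) between two 1-D coordinate arrays."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / scale)

# Hypothetical design: 4 sites on a line (d = 1), each observed at 3 time points.
sites = np.array([0.0, 1.0, 2.5, 4.0])
times = np.array([0.0, 1.0, 2.0])

C_s = exp_cov(sites, sites, scale=2.0)   # purely spatial covariance
C_t = exp_cov(times, times, scale=1.5)   # purely temporal covariance

# Observations ordered time-major: (s1,t1), ..., (s4,t1), (s1,t2), ...
C_sep = np.kron(C_t, C_s)

# Direct evaluation of the product C_s * C_t for every pair of space-time points.
S = np.tile(sites, len(times))
T = np.repeat(times, len(sites))
C_direct = exp_cov(S, S, 2.0) * exp_cov(T, T, 1.5)

# The inverse factorizes as well, so only the small factors need to be inverted.
C_inv = np.kron(np.linalg.inv(C_t), np.linalg.inv(C_s))
```

Here `C_sep` and `C_direct` agree entry by entry, and `C_inv` is the inverse of the full 12 x 12 matrix even though only a 3 x 3 and a 4 x 4 matrix were inverted.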

A spatio-temporal r.f. $Z(s, t)$ has a fully symmetrical covariance function if

$$C((s_i, t_i), (s_j, t_j)) = C((s_i, t_j), (s_j, t_i)) \tag{1.4}$$

for any pair of spatio-temporal locations $(s_i, t_i)$ and $(s_j, t_j) \in \mathbb{R}^d \times \mathbb{R}$.

Separability is a particular case of full symmetry and, as such, any test of full symmetry can be used to reject separability: if full symmetry fails, so does separability. In the case of stationary spatio-temporal covariance functions, the condition of full symmetry reduces to

$$C(h, u) = C(h, -u) = C(-h, u) = C(-h, -u), \quad \forall (h, u) \in \mathbb{R}^d \times \mathbb{R}. \tag{1.5}$$
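To see why separability is a particular case of full symmetry, take full symmetry to mean $C((s_i, t_i), (s_j, t_j)) = C((s_i, t_j), (s_j, t_i))$ and use only the symmetry of the temporal factor $C_t$. The following short derivation is added as an illustrative sketch:

```latex
\begin{align*}
C((s_i, t_i), (s_j, t_j))
  &= C_s(s_i, s_j)\, C_t(t_i, t_j) && \text{(separability)} \\
  &= C_s(s_i, s_j)\, C_t(t_j, t_i) && \text{(symmetry of } C_t\text{)} \\
  &= C((s_i, t_j), (s_j, t_i))     && \text{(separability again).}
\end{align*}
```

The converse does not hold in general: a fully symmetrical covariance function need not factorize into a spatial and a temporal part.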

A spatio-temporal r.f. has a compactly supported covariance function if, for any pair of spatio-temporal locations $(s_i, t_i)$ and $(s_j, t_j) \in \mathbb{R}^d \times \mathbb{R}$, the covariance function $C((s_i, t_i), (s_j, t_j))$ vanishes whenever the spatial or temporal distance is sufficiently large.

If $C(s_i - s_j, t_i - t_j)$ depends only on the distances between positions, that is, on $(\|s_i - s_j\|, |t_i - t_j|)$, the r.f., apart from being stationary, is also isotropic in space and time. Note that if the covariance function of a stationary r.f. is isotropic in space and time, then it is fully symmetrical.

The spatio-temporal variogram is defined as the function

$$2\gamma((s_i, t_i), (s_j, t_j)) = V(Z(s_i, t_i) - Z(s_j, t_j)), \tag{1.6}$$

where $V$ is the variance; half this quantity, $\gamma$, is called the semivariogram.


In the case of a r.f. with a constant mean (in particular, a zero mean),

$$2\gamma((s_i, t_i), (s_j, t_j)) = E[(Z(s_i, t_i) - Z(s_j, t_j))^2]. \tag{1.7}$$

Whenever it is possible to define both the covariance function and the variogram, they are related by the following expression:

$$2\gamma((s_i, t_i), (s_j, t_j)) = V(Z(s_i, t_i)) + V(Z(s_j, t_j)) - 2C((s_i, t_i), (s_j, t_j)). \tag{1.8}$$
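The relation between variogram and covariance follows directly from the variance of a difference. As a brief worked sketch (the stationary special case in the second display assumes a covariance function of the form $C(h, u)$):

```latex
\begin{align*}
2\gamma((s_i, t_i), (s_j, t_j))
  &= V\bigl(Z(s_i, t_i) - Z(s_j, t_j)\bigr) \\
  &= V(Z(s_i, t_i)) + V(Z(s_j, t_j)) - 2\,C((s_i, t_i), (s_j, t_j)).
\end{align*}
% Stationary case: V(Z(s, t)) = C(0, 0) at every location, hence
\begin{equation*}
\gamma(h, u) = C(0, 0) - C(h, u).
\end{equation*}
```

In the stationary case the semivariogram is therefore bounded by $2C(0, 0)$, and it approaches $C(0, 0)$ wherever the covariance decays to zero.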

If the spatio-temporal r.f. $Z(s, t)$ has an intrinsically stationary variogram in both space and time, then it is said to have an intrinsically stationary variogram. In this case, the variogram can be expressed as

$$2\gamma((s_i, t_i), (s_j, t_j)) = 2\gamma(h, u). \tag{1.9}$$

The marginals $2\gamma(\cdot, 0)$ and $2\gamma(0, \cdot)$ are called the purely spatial and purely temporal variograms, respectively.

A r.f. $Z(s, t)$ is strictly stationary if its probability distribution is translation invariant. Second-order stationarity is a less demanding condition than strict stationarity. A spatio-temporal r.f. $Z(s, t)$ is second-order stationary in the broad sense, or weakly stationary, if it has a constant mean and its covariance function depends only on the lags $h$ and $u$.

A spatio-temporal r.f. $Z(s, t)$ is said to be intrinsically stationary if it has a constant mean and an intrinsically stationary variogram. Intrinsic stationarity is less restrictive than second-order stationarity. Another widely used function when modeling implicit spatio-temporal dependence in a stationary r.f. is the correlation function. Let $Z(s, t)$ be a second-order stationary r.f. with a priori variance $\sigma^2 = C(0, 0) > 0$. The autocorrelation function of this r.f. is defined as

$$\rho(h, u) = \frac{C(h, u)}{C(0, 0)}. \tag{1.10}$$

If $\rho(h, u)$ is a correlation function on $\mathbb{R}^d \times \mathbb{R}$, then its marginal functions $\rho(h, 0)$ and $\rho(0, u)$ are, respectively, the spatial correlation function on $\mathbb{R}^d$ and the temporal correlation function on $\mathbb{R}$.

1.4.2 Properties of the spatio-temporal covariance and variogram functions

A real-valued function $C((s_i, t_i), (s_j, t_j))$ defined on $\mathbb{R}^d \times \mathbb{R}$ is a covariance function if it is symmetrical, $C((s_i, t_i), (s_j, t_j)) = C((s_j, t_j), (s_i, t_i))$, and positive-definite, that is,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, C((s_i, t_i), (s_j, t_j)) \geq 0 \tag{1.11}$$


for any $n \in \mathbb{N}$, $(s_i, t_i) \in \mathbb{R}^d \times \mathbb{R}$, and $a_i \in \mathbb{R}$, $i = 1, \ldots, n$. Condition (1.11) alone is sufficient if the covariance function can take complex values. Similarly, a necessary and sufficient condition for a non-negative real-valued function $\gamma((s_i, t_i), (s_j, t_j))$ defined on $\mathbb{R}^d \times \mathbb{R}$ to be a semivariogram is that it is symmetrical and conditionally negative-definite, that is,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\, \gamma((s_i, t_i), (s_j, t_j)) \leq 0 \tag{1.12}$$

with $\sum_{i=1}^{n} a_i = 0$.
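Both conditions can be checked numerically on a finite set of points. The sketch below is illustrative only: the random point set, the separable exponential covariance and its range parameters are assumptions, and the semivariogram is obtained from the stationary relation $\gamma(h, u) = C(0, 0) - C(h, u)$. Positive-definiteness on the chosen points is equivalent to the covariance matrix having no negative eigenvalue, while conditional negative-definiteness is probed with one random weight vector that sums to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical space-time points with d = 2: columns are (x, y, t).
pts = rng.uniform(0.0, 10.0, size=(30, 3))
h = np.linalg.norm(pts[:, None, :2] - pts[None, :, :2], axis=-1)  # spatial distances
u = np.abs(pts[:, None, 2] - pts[None, :, 2])                     # temporal distances

# Assumed model: separable exponential covariance with unit variance.
C = np.exp(-h / 3.0) * np.exp(-u / 2.0)
gamma = 1.0 - C   # stationary semivariogram: C(0, 0) - C(h, u)

# Positive-definiteness: a'Ca >= 0 for all real a  <=>  smallest eigenvalue >= 0.
min_eig = np.linalg.eigvalsh(C).min()

# Conditional negative-definiteness: a'(gamma)a <= 0 when the weights sum to zero.
a = rng.normal(size=len(pts))
a -= a.mean()                 # enforce sum(a) = 0
quad_cov = a @ C @ a
quad_gamma = a @ gamma @ a
```

With zero-sum weights the constant part of `gamma` drops out, so `quad_gamma` equals `-quad_cov` up to rounding, which is why the two conditions go hand in hand in the stationary case.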

Schoenberg (1938) proved the following theorem characterizing the spatio-temporal semivariogram. Let $\gamma((s_i, t_i), (s_j, t_j))$ be a function defined on $\mathbb{R}^d \times \mathbb{R}$, with $\gamma((s, t), (s, t)) = 0$, $\forall (s, t) \in \mathbb{R}^d \times \mathbb{R}$. Then the following statements are equivalent:

• $\gamma((s_i, t_i), (s_j, t_j))$ is a semivariogram on $\mathbb{R}^d \times \mathbb{R}$.

• $\exp(-\theta\, \gamma((s_i, t_i), (s_j, t_j)))$ is a covariance function on $\mathbb{R}^d \times \mathbb{R}$, for any $\theta > 0$.

• $C((s_i, t_i), (s_j, t_j)) = \gamma((s_i, t_i), (0, 0)) + \gamma((s_j, t_j), (0, 0)) - \gamma((s_i, t_i), (s_j, t_j))$ is a covariance function on $\mathbb{R}^d \times \mathbb{R}$.
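The equivalences can be illustrated numerically for one particular semivariogram. In the sketch below the model $\gamma = \|h\|^{1.5} + |u|$ (a power model in space plus a linear model in time), the random points and the values of $\theta$ are assumptions made for the example; the code checks, on that finite point set, that the matrices produced by the second and third statements have no negative eigenvalues beyond rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(0.0, 10.0, size=(25, 3))   # hypothetical (x, y, t) points, d = 2

def gamma(p, q):
    """Assumed semivariogram: power model in space plus linear model in time."""
    h = np.linalg.norm(p[..., :2] - q[..., :2], axis=-1)
    u = np.abs(p[..., 2] - q[..., 2])
    return h ** 1.5 + u

G = gamma(pts[:, None, :], pts[None, :, :])   # gamma evaluated at all pairs
g0 = gamma(pts, np.zeros(3))                  # gamma((s_i, t_i), (0, 0))

# Second statement: exp(-theta * gamma) is a covariance function for any theta > 0.
min_eigs = [np.linalg.eigvalsh(np.exp(-theta * G)).min()
            for theta in (0.05, 0.5, 5.0)]

# Third statement: gamma_i0 + gamma_j0 - gamma_ij is a covariance function.
C3 = g0[:, None] + g0[None, :] - G
min_eig_c3 = np.linalg.eigvalsh(C3).min()
```

A finite check of this kind can only fail to contradict the theorem; it does not prove validity of a candidate semivariogram, which requires the conditions to hold for every point configuration.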

In the case of stationarity, the above results reduce to functions depending on spatial and temporal lags. Another seminal result that characterizes covariance functions is that given in Bochner (1933). A continuous function $C(h, u)$ defined on $\mathbb{R}^d \times \mathbb{R}$ is a stationary covariance function if, and only if, it has the following form

$$C(h, u) = \int\!\!\int e^{i(\omega' h + \tau u)}\, dF(\omega, \tau), \quad (h, u) \in \mathbb{R}^d \times \mathbb{R} \tag{1.13}$$

where $F$ is the distribution function of a non-negative finite measure defined on $\mathbb{R}^d \times \mathbb{R}$, which is known as a spectral distribution function. Therefore, the class of (continuous) stationary spatio-temporal covariance functions on $\mathbb{R}^d \times \mathbb{R}$ is identical to the class of Fourier transforms of non-negative finite measures on that domain. If the function $C$ is also integrable, then the spectral distribution function $F$ is absolutely continuous and the representation (1.13) simplifies to

$$C(h, u) = \int\!\!\int e^{i(\omega' h + \tau u)} f(\omega, \tau)\, d\omega\, d\tau, \quad (h, u) \in \mathbb{R}^d \times \mathbb{R} \tag{1.14}$$

where $f$ is a non-negative, continuous and integrable function that is known as a spectral density function. The covariance function $C$ and the spectral density function $f$ then form a pair of Fourier transforms, and

$$f(\omega, \tau) = (2\pi)^{-d-1} \int\!\!\int e^{-i(\omega' h + \tau u)}\, C(h, u)\, dh\, du. \tag{1.15}$$
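The inversion formula can be checked numerically in a simple case. The sketch below assumes $d = 1$ and a separable exponential covariance $C(h, u) = e^{-a|h|} e^{-b|u|}$, for which the double integral factorizes into two one-dimensional transforms with the known closed form $\text{rate} / (\pi(\text{rate}^2 + \omega^2))$. The rates, the evaluation frequencies, the integration window and the step size are illustrative assumptions.

```python
import numpy as np

a, b = 1.0, 0.5   # assumed decay rates of the spatial and temporal factors (d = 1)

def spectral_density_1d(omega, rate, half_width=60.0, step=1e-3):
    """Approximate (2*pi)^-1 * integral of exp(-i*omega*x) * exp(-rate*|x|) dx."""
    x = np.arange(-half_width, half_width + step, step)
    integrand = np.exp(-1j * omega * x) * np.exp(-rate * np.abs(x))
    # Plain Riemann sum; the integrand is negligible at the window edges.
    return (integrand.sum() * step).real / (2.0 * np.pi)

def closed_form_1d(omega, rate):
    """Known transform of exp(-rate*|x|): rate / (pi * (rate^2 + omega^2))."""
    return rate / (np.pi * (rate ** 2 + omega ** 2))

omega, tau = 0.7, -1.3
# For a separable C the double integral factorizes into two 1-D integrals.
f_numeric = spectral_density_1d(omega, a) * spectral_density_1d(tau, b)
f_exact = closed_form_1d(omega, a) * closed_form_1d(tau, b)
```

The numerical and closed-form values agree closely, and both are positive, as required of a spectral density.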


The decomposition (1.13) can be specialized for fully symmetrical covariance functions. Let $C(\cdot, \cdot)$ be a continuous function defined on $\mathbb{R}^d \times \mathbb{R}$; then $C(\cdot, \cdot)$ is a fully symmetrical stationary covariance function if, and only if, the following decomposition is possible

$$C(h, u) = \int\!\!\int \cos(\omega' h) \cos(\tau u)\, dF(\omega, \tau), \quad (h, u) \in \mathbb{R}^d \times \mathbb{R} \tag{1.16}$$

where $F$ is the non-negative and symmetrical spectral distribution function defined on $\mathbb{R}^d \times \mathbb{R}$.

Cressie and Huang (1999) provide a theorem for characterizing the class of stationary spatio-temporal covariance functions under the additional hypothesis of integrability. Let $C(\cdot, \cdot)$ be a continuous, bounded, symmetrical and integrable function defined on $\mathbb{R}^d \times \mathbb{R}$; then $C(\cdot, \cdot)$ is a stationary covariance function if, and only if, the function

$$C_\omega(u) = \int e^{-i \omega' h}\, C(h, u)\, dh, \quad u \in \mathbb{R}, \tag{1.17}$$

is a covariance function for every $\omega \in \mathbb{R}^d$ except, at most, on a set of Lebesgue measure zero. Gneiting (2002) generalizes this result to $C$ defined on $\mathbb{R}^d \times \mathbb{R}^l$, of which the previous statement is the particular case $l = 1$.

Both the covariance function and the spectral density function are important tools for characterizing stationary spatio-temporal random fields. Mathematically speaking, the two functions are closely related as a pair of Fourier transforms. Furthermore, the spectral density function is particularly useful in situations where there is no explicit expression for the covariance function. Stein (2005) shows the benefit of using covariance functions that are smooth away from the origin, which can be tested by verifying whether their spectral densities have derivatives of certain orders.

1.4.3 Spatio-temporal kriging

Kriging aims to predict an unknown value $Z(s_0, t_0)$ at a point $(s_0, t_0)$ that does not belong to the sample. To do so, all the information available about the regionalized variable is used, either at the points in the entire domain or in a subset of the domain called the neighborhood.

Assume that the value of the r.f. has been observed at a set of $n$ spatio-temporal locations, giving $\{Z(s_1, t_1), \ldots, Z(s_n, t_n)\}$. We now want to predict the value of the r.f. at a new spatio-temporal location $(s_0, t_0)$, for which we use the linear predictor

$$Z^*(s_0, t_0) = \sum_{i=1}^{n} \lambda_i Z(s_i, t_i) \tag{1.18}$$

constructed from the random variables $Z(s_i, t_i)$. As in the spatial case, spatio-temporal kriging equations will depend on the degree of stationarity attributed to