Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models
MSS/MBSS # 1
Using the Maryland Biological Stream Survey Data to Test Spatial Statistical Models
N. Scott Urquhart
Joint work with Erin P. Peterson, Andrew A. Merton, David M. Theobald, and Jennifer A. Hoeting
All of Colorado State University, Fort Collins, CO 80523-1877
MSS/MBSS # 2
This research is funded by
U.S. EPA – Science To Achieve Results (STAR) Program
Cooperative Agreement # CR-829095
The work reported here today was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.
FUNDING ACKNOWLEDGEMENT
MSS/MBSS # 3
[Figure: Maryland Biological Stream Survey (MBSS) Sample Site Locations. Legend: MBSS sample sites; 1:100,000 National Hydrography Dataset; Maryland. Scale bars: 0 to 5,000 meters and 0 to 30 kilometers.]
MSS/MBSS # 4
OUR PATH TODAY
• What are “Spatial Statistical Models”?
• Measuring Distance in Space
• The Maryland Biological Stream Survey
  • Outstanding data set to compare models
• A Few Results
• Work in Progress
MSS/MBSS # 5
GATHERING SOME INSIGHTS
Raise your hand if you
• Had a statistics course – even in the distant past
• Remember doing a t-test
• Did a simple linear regression (fitted a line)
• Did a multiple regression
• Examined model failures
• Did analyses accommodating “correlated errors”
• Have used spatial statistics, e.g., kriging
MSS/MBSS # 6
STATISTICS AND PREDICTION
OBJECTIVE: Measure relevant responses,
• Like dissolved organic carbon (DOC), and
• Related variables at suitable sites, then
• Develop formula to predict DOC at unvisited sites
Why?
• Clean Water Act (CWA) 303(d) requires states to identify “impacted” waters and plan to eliminate impact
• What state has the $ to evaluate every water?
• Predict, instead.
MSS/MBSS # 7
PREDICTIVE VARIABLES
Predict DOC from measures such as
• Area above the stream evaluation point
• % Barren
• % High Intensity Urban
• % Woody Wetland (*)
• % Conifer or Evergreen Forest Type (*)
• % Mixed Forest Type (*)
• % Low Intensity Urban (*)
• To accommodate year diff’s: 1996 & 1997 (*)
MSS/MBSS # 8
GIS TOOLS
These variables require
• Efficient delineation of watershed above any point
• STARMAP has developed such software
  • It is available
  • Documented in a poster
MSS/MBSS # 9
PREDICTIVE MODELS
Classical regression model would be:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$$
where the $\varepsilon_i$ have a constant variance and are UNCORRELATED.
BUT “Everything is related to everything else, but near things are more related than distant things” Tobler (1970).
• Thus the “uncorrelated” above is indefensible in many cases
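As a minimal sketch of the classical regression model on this slide, the following fits ordinary least squares to synthetic data. The sample size, covariates, and coefficients are invented for illustration and are not MBSS values; the function name `fit_ols` is hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; returns coefficients (intercept first) and residuals."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals

# Synthetic data (assumption): 200 sites, 3 covariates, uncorrelated errors.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
true_beta = np.array([0.3, 1.0, -0.5, 0.2])
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=n)

beta_hat, resid = fit_ols(X, y)
```

The fit treats every site as independent of every other, which is exactly the assumption the Tobler quotation calls into question for data collected in space.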
MSS/MBSS # 10
SO WHAT IS SPATIAL STATISTICS?
Spatial Statistics is a set of techniques which
• Allow correlated data
• Index the amount of correlation by distance the points are apart
• Incorporate this correlation into predictions
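To make "index the amount of correlation by distance" concrete, here is a small sketch that builds a straight-line distance matrix and turns it into correlations that decay with distance. The exponential decay and the range value of 10 are assumptions chosen for illustration; the slides do not specify a functional form at this point.

```python
import numpy as np

def euclidean_distance_matrix(coords):
    """Pairwise straight-line distances between rows of an (n, 2) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def exponential_correlation(dist, range_param):
    """Correlation that equals 1 at distance 0 and decays as distance grows."""
    return np.exp(-dist / range_param)

# Three hypothetical sites: two close together, one far away.
coords = np.array([[0.0, 0.0], [3.0, 4.0], [30.0, 40.0]])
D = euclidean_distance_matrix(coords)
R = exponential_correlation(D, range_param=10.0)
```

Sites 0 and 1 are 5 units apart and receive a higher correlation than sites 0 and 2, which are 50 units apart.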
MSS/MBSS # 11
SO WHAT IS SPATIAL STATISTICS II?
MSS/MBSS # 12
WHAT ARE “SPATIAL STATISTICAL MODELS”?
MSS/MBSS # 13
MEASURING DISTANCE IN SPACE
MSS/MBSS # 14
The Maryland Biological Stream Survey
Outstanding data set to compare models
MSS/MBSS # 15
A FEW RESULTS
MSS/MBSS # 16
WORK IN PROGRESS
MSS/MBSS # 17
MSS/MBSS # 18
The Clean Water Act (CWA) of 1972 requires states, tribes, & territories to
• Identify water quality (WQ) impaired stream segments
• Create a priority ranking of those segments
• Calculate the Total Maximum Daily Load (TMDL) for each impaired segment based upon chemical and physical WQ standards
• Produce a biennial inventory characterizing regional WQ

The Problem
• It is impossible to physically sample every stream within a large area
  • Too many stream segments
  • Limited personnel
  • Cost associated with sampling
• Probability-based inferences are used to generate regional estimates of WQ
  • In miles by stream order
  • Do not indicate where WQ impaired segments are located
• A rapid and cost-efficient method is needed to locate potentially impaired stream segments throughout large areas

Our Approach
• Develop a geostatistical model based on coarse-scale geographical information system (GIS) data
• Make predictions for every stream segment throughout a large area
  • Generate a regional estimate of stream condition
  • Identify potentially WQ impaired stream segments
MSS/MBSS # 19
Dissolved Organic Carbon (DOC) Example
Fit a geostatistical model to DOC data and coarse-scale watershed characteristics
• Maryland Biological Stream Survey data 1996
• 7 interbasins & 343 DOC survey sites
• GIS data:

GIS data, scale, and sources.
Dataset | Scale | Source
USGS National Hydrography Dataset (NHD) | 1:250,000 | http://nhd.usgs.gov/
USGS National Land Cover Dataset (NLCD) | 30 meter | http://landcover.usgs.gov/natllandcover.asp
National Elevation Dataset (NED) | 30 meter | http://ned.usgs.gov/
Omernik's Level III Ecoregion | 1:7,500,000 | http://www.epa.gov/wed/pages/ecoregions/level_iii.htm
USGS Lithology | 1:250,000 | USEPA Western Ecology Division, Corvallis, OR
PRISM (Parameter-elevation Regressions on Independent Slopes Model) temperature data | 4 kilometer | http://www.ocs.orst.edu/prism/faq.phtml
MSS/MBSS # 20
Methods
Pre-process GIS data
• “Snap” survey sites to streams
• Calculate watershed attributes using the Functional Linkage of Watersheds and Streams (FLoWS) tools (Theobald et al., 2005; Peterson et al., in review)

Calculate distance matrices for model selection
• R statistical software
• x,y coordinates for observed survey sites

Covariates selected using the Leaps and Bounds regression algorithm.
Covariate | Description
WATER | % Water
EMERGWET | % Emergent wetlands
WOODYWET | % Woody wetlands
FELPERC | % Felsic rock type in watershed
MINTEMP | Mean minimum temperature (°C) (January to April)
ER64 | Omernik's Level 3 Ecoregion 64
ER65 | Omernik's Level 3 Ecoregion 65
ER66 | Omernik's Level 3 Ecoregion 66
ER67 | Omernik's Level 3 Ecoregion 67
ER69 | Omernik's Level 3 Ecoregion 69

• Test all possible linear models using the 10 covariates
  • 1024 models (2^10 = 1024)
• Distance measure: Straight-line distance (aka Euclidean)
• Autocorrelation function: Mariah
• Estimate autocorrelation parameters: nugget, sill, and range
  • Profile log-likelihood function
• Model Selection
  • Spatial corrected Akaike Information Criterion (AICC)
    • (Hoeting et al., in press)
  • Mean square prediction error (MSPE)
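The model-selection steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: the data are synthetic, the covariance parameters are held fixed rather than estimated by maximizing the profile log-likelihood, an exponential function stands in for the Mariah autocorrelation function, and the generic small-sample AICc formula is used in place of the spatial AICC of Hoeting et al. All function names are hypothetical.

```python
import itertools
import numpy as np

COVARIATES = ["WATER", "EMERGWET", "WOODYWET", "FELPERC", "MINTEMP",
              "ER64", "ER65", "ER66", "ER67", "ER69"]

def all_subsets(names):
    """Yield every subset of names, including the empty (intercept-only) model."""
    for k in range(len(names) + 1):
        yield from itertools.combinations(names, k)

def covariance_matrix(dist, nugget, partial_sill, range_param):
    """Exponential stand-in for the autocorrelation function (not the Mariah form)."""
    return partial_sill * np.exp(-dist / range_param) + nugget * np.eye(dist.shape[0])

def profile_loglik(y, X, cov):
    """Gaussian log-likelihood with regression coefficients profiled out by GLS."""
    n = len(y)
    chol = np.linalg.cholesky(cov)
    Xw = np.linalg.solve(chol, X)   # whitened design matrix
    yw = np.linalg.solve(chol, y)   # whitened response
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    r = yw - Xw @ beta
    logdet = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + r @ r)

def aicc(loglik, n, n_params):
    """Generic small-sample corrected AIC with n_params estimated parameters."""
    return -2.0 * loglik + 2.0 * n_params * n / (n - n_params - 1)

models = list(all_subsets(COVARIATES))   # 2**10 = 1024 candidate models

# Synthetic sites and covariates (assumption): only WOODYWET truly matters.
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 100, size=(n, 2))
dist = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
cov = covariance_matrix(dist, nugget=0.05, partial_sill=0.2, range_param=21.0)
data = {name: rng.normal(size=n) for name in COVARIATES}
y = 0.3 + 0.5 * data["WOODYWET"] + np.linalg.cholesky(cov) @ rng.normal(size=n)

def score(subset):
    X = np.column_stack([np.ones(n)] + [data[name] for name in subset])
    n_params = X.shape[1] + 3   # regression coefficients + nugget, sill, range
    return aicc(profile_loglik(y, X, cov), n, n_params)

best = min(models, key=score)
```

In a full analysis the nugget, sill, and range would be re-estimated for each candidate model before scoring it; fixing them here keeps the sketch short.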
MSS/MBSS # 21
Model Results
• Range of spatial autocorrelation: 21.09 kilometers
• Significant watershed attributes = WATER, EMERGWET, WOODYWET, FELPERC, and MINTEMP

Summary statistics for log10 DOC and model covariates.
Variable | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max | σ²
log10 DOC (mg/l) | -0.22 | 0.08 | 0.24 | 0.28 | 0.43 | 1.20 | 0.25
WATER (%) | 0 | 0 | 0.16 | 0.25 | 0.28 | 4.64 | 0.44
EMERGWET (%) | 0 | 0 | 0.13 | 0.26 | 0.35 | 4.85 | 0.44
WOODYWET (%) | 0 | 0 | 0.27 | 1.24 | 1.15 | 22.01 | 3.28
FELPERC (%) | 0 | 0 | 0.31 | 26.81 | 55.26 | 100 | 36.14
MINTEMP (°C) | -5.88 | -3.06 | -2.39 | -2.49 | -1.4 | 0.03 | 1.47

Model fit
• Leave-one-out cross validation method and Universal kriging
• Overall MSPE = 0.93, R² = 0.72
  • One strongly influential site
  • R² without the influential site = 0.66
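A hedged sketch of leave-one-out cross validation with universal kriging follows. It assumes the covariance matrix is already known, uses synthetic data, and takes R² to be the squared correlation between observed and predicted values; the poster does not say how its R² was defined, so that choice is an assumption. Function names are hypothetical.

```python
import numpy as np

def uk_predict(y, X, cov, x0, c0, var0):
    """Universal kriging prediction and prediction variance at one new site.

    y, X, cov : observed responses, design matrix, covariance among observed sites
    x0        : covariate vector (with intercept) at the new site
    c0        : covariances between the observed sites and the new site
    var0      : variance at the new site
    """
    Si_X = np.linalg.solve(cov, X)
    Si_y = np.linalg.solve(cov, y)
    Si_c = np.linalg.solve(cov, c0)
    XtSiX = X.T @ Si_X
    beta = np.linalg.solve(XtSiX, X.T @ Si_y)          # GLS coefficients
    pred = x0 @ beta + Si_c @ (y - X @ beta)           # trend + kriged residual
    d = x0 - X.T @ Si_c
    var = var0 - c0 @ Si_c + d @ np.linalg.solve(XtSiX, d)
    return pred, var

def loocv(y, X, cov):
    """Predict each site from all of the others; returns the vector of predictions."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        preds[i], _ = uk_predict(y[keep], X[keep], cov[np.ix_(keep, keep)],
                                 X[i], cov[keep, i], cov[i, i])
    return preds

# Synthetic example (assumption): 40 sites, one covariate, known covariance.
rng = np.random.default_rng(42)
n = 40
coords = rng.uniform(0, 100, size=(n, 2))
dist = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
cov = 0.1 * np.eye(n) + 0.9 * np.exp(-dist / 20.0)     # nugget + partial sill
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.3, 0.5]) + np.linalg.cholesky(cov) @ rng.normal(size=n)

preds = loocv(y, X, cov)
mspe = np.mean((y - preds) ** 2)
r2 = np.corrcoef(y, preds)[0, 1] ** 2
```

When the covariance matrix is the identity, the kriged residual term vanishes and the procedure reduces to ordinary leave-one-out regression prediction, which is a convenient sanity check.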
MSS/MBSS # 22
• East-West trend in model fit
• Conservative model fit: tends to underestimate DOC
• 35 MSPE values > 1.5
  • These sites have similar covariate values to nearby sites, but considerably different DOC values than nearby sites
MSS/MBSS # 23
Model Predictions
Create prediction sites
• 1st, 2nd, and 3rd order non-tidal stream segments
• 3083 prediction sites = downstream node of each GIS stream segment
  • Downstream node ensures that entire segment is located in same watershed
  • More than one prediction location at stream confluences
  • Covariates for prediction sites represent the conditions upstream from the segment, not the stream confluence

Calculate distance matrices for model predictions
• Include observed and predicted survey sites

Generate predictions and prediction variances
• Assign values back to stream segments in GIS
• Universal kriging algorithm

Prediction statistics
Summary statistics for DOC predictions and prediction variances.
Variable | Min | 1st Qu. | Median | Mean | 3rd Qu. | Max
Predictions | 0.8 | 1.5 | 1.9 | 2.7 | 3.0 | 40.4
Prediction Variances | 0.049 | 0.095 | 0.122 | 0.171 | 0.193 | 2.597
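For reference, the standard universal kriging equations behind "predictions and prediction variances" can be written as below. The notation is an assumption introduced here, not taken from the poster: $Y$ is the vector of observed responses, $X$ the design matrix of covariates, $\Sigma$ the covariance matrix among observed sites, $x_0$ the covariate vector at a prediction site $s_0$, $c_0$ the vector of covariances between the observed sites and $s_0$, and $\sigma_0^2$ the variance at $s_0$.

```latex
\begin{align}
\hat{\beta} &= (X^\top \Sigma^{-1} X)^{-1} X^\top \Sigma^{-1} Y \\
\hat{Y}(s_0) &= x_0^\top \hat{\beta} + c_0^\top \Sigma^{-1} (Y - X \hat{\beta}) \\
\operatorname{Var}\!\left[\hat{Y}(s_0) - Y(s_0)\right]
  &= \sigma_0^2 - c_0^\top \Sigma^{-1} c_0
   + d^\top (X^\top \Sigma^{-1} X)^{-1} d,
  \qquad d = x_0 - X^\top \Sigma^{-1} c_0
\end{align}
```

The last term of the variance grows when $x_0$ lies far from the observed covariate values, so prediction sites whose covariates fall outside the sampled range tend to receive large prediction variances.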
MSS/MBSS # 24
• 18 prediction values > 15.9 mg/l
  • Also possessed 18 largest prediction variances
  • Located in watersheds with large WATER, EMERGWET, or WOODYWET values
  • Large covariate values are not represented in the observed covariate data
• Predictions represent 5973.03 kilometers of stream

Stream habitat characterization estimated as a percentage of stream miles in DOC (mg/l) during 1996.
Thresholds | Miles | Kilometers | Percent
DOC < 5 | 3347.74 | 5387.67 | 90.2
5 ≤ DOC ≤ 8 | 248.67 | 400.19 | 6.7
DOC > 8 | 115.06 | 185.16 | 3.1
Total | 3711.46 | 5973.03 | 100
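A table like the one above can be produced by summing segment lengths within each DOC class. The sketch below uses the thresholds 5 and 8 mg/l from the table; the four segments, their lengths, and the function name are made up for illustration.

```python
import numpy as np

KM_PER_MILE = 1.609344

def summarize_by_threshold(doc_mg_l, length_km, low=5.0, high=8.0):
    """Total stream length (miles, km) and percent of length in each DOC class."""
    doc = np.asarray(doc_mg_l, dtype=float)
    km = np.asarray(length_km, dtype=float)
    classes = {
        f"DOC < {low:g}": doc < low,
        f"{low:g} <= DOC <= {high:g}": (doc >= low) & (doc <= high),
        f"DOC > {high:g}": doc > high,
    }
    total_km = km.sum()
    return {
        name: {
            "miles": km[mask].sum() / KM_PER_MILE,
            "kilometers": km[mask].sum(),
            "percent": 100.0 * km[mask].sum() / total_km,
        }
        for name, mask in classes.items()
    }

# Hypothetical segments: predicted DOC (mg/l) and segment length (km).
summary = summarize_by_threshold([1.9, 2.7, 6.0, 40.4], [2.0, 3.0, 4.0, 1.0])
```

Weighting by segment length rather than counting segments matters because stream segments in a GIS layer vary widely in length.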
MSS/MBSS # 25
Products
• Geostatistical model used to predict segment-scale WQ conditions at unobserved locations
• Map of the study area that shows the likelihood of WQ impairment for each segment
  • Can be tied to threshold values or WQ standards
  • Technical and Regulatory Services Administration within the Maryland Department of the Environment
    • Modifying the USGS NHD to include:
      • watershed impairments & stream-use designations by NHD segment
      • Frank Siano, personal communication
• A methodology that illustrates how agencies can accomplish spatial analysis using GIS data, MBSS data, and geostatistics

The Advantages
• Additional sampling is not necessary
• Complements existing methodologies
  • Derive a regional estimate of stream condition in two ways:
    • Probability-based inferences about stream miles by stream order
    • Sum prediction values in miles by stream order
  • Identify potentially WQ impaired stream segments
• Methodology can be used for regulated constituents as well
  • Nitrate, acid neutralizing capacity, pH, and conductivity can be accurately predicted using geostatistical models (Peterson et al., in review2)
• Identify spatial patterns of WQ throughout a large area
• Identify areas where additional samples would provide the most information
• Model results can be displayed visually
  • Allows professionals to communicate results with a wide variety of audiences easily
MSS/MBSS # 26
References
Hoeting J.A., Davis R.A., Merton A.A., & Thompson S.E. (in press) Model Selection for Geostatistical Models. Ecological Applications. http://www.stat.colostate.edu/%7Ejah/papers/index.html
Peterson E.E., Theobald D.M., & Ver Hoef J.M. (in review1) Support for geostatistical modeling on stream networks: Developing valid covariance matrices based on hydrologic distance and stream flow. Freshwater Biology.
Peterson E.E., Merton A.A., Theobald D.M., & Urquhart N.S. (in review2) Patterns of Spatial Autocorrelation in Stream Water Chemistry. Environmental Monitoring.
Theobald D.M., Norman J., Peterson E.E., & Ferraz S. (2005) Functional Linkage of Watersheds and Streams (FLoWS): Network-based ArcGIS tools to analyze freshwater ecosystems. Proceedings of the ESRI User Conference 2005. July 26, 2005, San Diego, CA, USA.
Acknowledgements
The work reported here was developed under STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency to the Space Time Aquatic Resource Modeling and Analysis Program (STARMAP) at Colorado State University. This poster has not been formally reviewed by the EPA. The views expressed here are solely those of the authors. The EPA does not endorse any products or commercial services presented in this poster.