Exploring California Central Valley Groundwater Quality with Python

Post on 09-Jan-2017


Walt McNab, Ph.D.

September 2016

Explore portions of California Groundwater Ambient Monitoring and Assessment (GAMA) data set for Central Valley.

Demonstration of how easy it is to explore a moderately large data set and gain insights with Python scripting (pandas, scikit-learn, and scipy; see links on last slide to view script).

QGIS used to view results.

Analytes:

◦ Arsenic

◦ Barium

◦ Bicarbonate alkalinity

◦ Boron

◦ Calcium

◦ Chloride

◦ Chromium

◦ Copper

◦ Magnesium

◦ Manganese

◦ Nitrate

◦ Potassium

◦ Sodium

◦ Sulfate

◦ Zinc

Central Valley alluvium boundary

• Download all as text files from GAMA website.

• Use pandas to filter and create pivot tables for analytes.
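The filter-and-pivot step might look roughly like the sketch below. The column names (WELL_ID, CHEMICAL, DATE, RESULT), the tab delimiter, and the sample values are hypothetical stand-ins; the actual GAMA download layout and the author's script may differ.

```python
import io
import pandas as pd

# Hypothetical tab-delimited extract standing in for a GAMA text file;
# the real column names and codes may differ.
raw = io.StringIO(
    "WELL_ID\tCHEMICAL\tDATE\tRESULT\n"
    "W1\tAS\t2001-05-01\t2.0\n"
    "W1\tAS\t2005-06-01\t4.0\n"
    "W1\tNO3\t2001-05-01\t10.0\n"
    "W2\tAS\t2003-01-15\t8.0\n"
    "W2\tNO3\t2003-01-15\t30.0\n"
    "W2\tNO3\t2009-02-20\t50.0\n"
    "W2\tTCE\t2009-02-20\t0.5\n"
)
samples = pd.read_csv(raw, sep="\t", parse_dates=["DATE"])

# Keep only the analytes of interest (two shown here for brevity).
analytes = ["AS", "NO3"]
subset = samples[samples["CHEMICAL"].isin(analytes)]

# One row per well, one column per analyte, cell = historic median result.
medians = subset.pivot_table(
    index="WELL_ID", columns="CHEMICAL", values="RESULT", aggfunc="median"
)
print(medians)
```

A wide table of this shape (wells by analytes) is convenient both for export to QGIS and as input to multivariate methods.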

Historic median concentrations scaled across 10 quantiles (yellow = lowest; red = highest) using QGIS.

Ayotte et al., 2016. Predicting Arsenic in Drinking Water Wells of the Central Valley, California, Environ. Sci. Tech., 50(14), 7555-7563.

Where data are spotty, historic medians provide a useful comparative metric for the potential to encounter a given concentration.

Median As in GAMA data set, by quantile, for water supply wells through 2016.
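The slides perform the 10-quantile classification in QGIS. As an illustrative alternative, the same binning could be computed in pandas before export. The sketch below uses synthetic median values and hypothetical column names (WELL_ID, AS_MEDIAN, DECILE).

```python
import numpy as np
import pandas as pd

# Synthetic per-well median arsenic values (hypothetical, for illustration only).
rng = np.random.default_rng(0)
medians = pd.DataFrame({
    "WELL_ID": [f"W{i}" for i in range(200)],
    "AS_MEDIAN": rng.lognormal(mean=1.0, sigma=1.0, size=200),
})

# Assign each well to one of 10 equal-count quantile classes (1 = lowest, 10 = highest).
medians["DECILE"] = pd.qcut(medians["AS_MEDIAN"], q=10, labels=False) + 1

print(medians.groupby("DECILE")["AS_MEDIAN"].agg(["min", "max", "count"]))
```

With real data, many identical values (for example, results reported at a detection limit) can produce duplicate bin edges; `pd.qcut` accepts `duplicates="drop"` for that case, at the cost of fewer than 10 classes.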

Identify factors in the multi-parameter data set that contribute to variance.

Plot 1st, 2nd, 3rd, etc. principal component values by location to indicate spatial distribution.

Plot loadings of principal components with respect to individual analytes.

Calculations conducted with scikit-learn package.
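A minimal sketch of such a principal component analysis with scikit-learn follows. The data are synthetic, and the log-transform and standardization steps are assumptions for illustration; the slides do not state how the concentrations were preprocessed.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic wells-by-analytes table driven by two hypothetical latent factors.
rng = np.random.default_rng(1)
n = 300
salinity = rng.normal(size=n)
redox = rng.normal(size=n)
data = pd.DataFrame({
    "B": np.exp(salinity + 0.2 * rng.normal(size=n)),
    "Cl": np.exp(salinity + 0.2 * rng.normal(size=n)),
    "Na": np.exp(salinity + 0.2 * rng.normal(size=n)),
    "NO3": np.exp(-redox + 0.2 * rng.normal(size=n)),
    "Mn": np.exp(redox + 0.2 * rng.normal(size=n)),
})

# Assumed preprocessing: log10 transform, then zero mean and unit variance.
X = StandardScaler().fit_transform(np.log10(data))

pca = PCA(n_components=3)
scores = pca.fit_transform(X)  # one row per well; plot each column by location

# Loadings: contribution of each analyte to each principal component.
loadings = pd.DataFrame(
    pca.components_.T, index=data.columns, columns=["PC1", "PC2", "PC3"]
)
print(pca.explained_variance_ratio_)
print(loadings.round(2))
```

The `scores` columns can be joined back to well coordinates for mapping, while `loadings` supports the analyte-by-component plot.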

Figure: Principal Components #1, #2, and #3 mapped by location (map annotations: valley interior, southern end of valley, west side of valley).

Loadings by Principal Component (axes, colors)

NO3 and Mn reflect 2nd principal component (opposite effect)

Boron strongly associated with 1st principal component

Delineate multi-dimensional parameter space into clusters.

Plot locations of clusters to see if cluster characteristics exhibit spatial distributions.

Calculations conducted with scikit-learn package.
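The clustering step could be sketched with scikit-learn's `KMeans` as below. The two synthetic water types, the log-transform, the standardization, and the number of clusters are all assumptions made for illustration; the slides do not state the settings used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Two hypothetical water types (synthetic concentrations).
rng = np.random.default_rng(2)
oxic = pd.DataFrame({
    "NO3": rng.lognormal(3.0, 0.3, 100),
    "Mn": rng.lognormal(-5.0, 0.3, 100),
    "As": rng.lognormal(-7.0, 0.3, 100),
})
reduced = pd.DataFrame({
    "NO3": rng.lognormal(0.0, 0.3, 100),
    "Mn": rng.lognormal(-1.0, 0.3, 100),
    "As": rng.lognormal(-4.0, 0.3, 100),
})
data = pd.concat([oxic, reduced], ignore_index=True)
analytes = ["NO3", "Mn", "As"]

# Assumed preprocessing: log10 transform and standardization.
scaler = StandardScaler()
X = scaler.fit_transform(np.log10(data[analytes]))

# n_clusters=2 suits this toy data; a real data set would need a larger, tuned value.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
data["CLUSTER"] = km.fit_predict(X)

# Convert centroids back to concentration units for interpretation.
centroids = pd.DataFrame(
    10 ** scaler.inverse_transform(km.cluster_centers_), columns=analytes
)
print(centroids)
```

Joining `data["CLUSTER"]` to well coordinates gives the cluster-membership layer for mapping.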

Selected Cluster Centroids

Figure: centroid concentrations (Conc, mg/L; log axis from 0.001 to 1000) of As, Mn, SO4, Cl, Na, Mg, Ca, and NO3 for clusters 1, 5, 7, and 8. Figure panels: Cluster 1, Cluster 5, Cluster 7, Cluster 8.

Cluster 1: (1) low Mn and As, (2) high NO3

Cluster 5: (1) high concentrations of most ions, (2) high Mn and low NO3

Cluster 8: (1) high Mn and As, (2) low NO3

Cluster 7: relatively elevated Na-Cl-SO4 compared to Ca-Mg-HCO3

Selected clusters appear differentiated on Piper diagram.

Note that other analytes not considered in Piper diagram (trace metals) are also used to delineate K-means clusters.

Figure: Piper diagram (labels: Cluster 1, Cluster 5, Cluster 7, Cluster 8).

Theil-Sen slope calculated for all analytes in all wells meeting certain criteria (at least 10 sample events since 1980).

Temporal trend set is much smaller than median historic concentration data set.

Calculations conducted with scipy.
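A minimal sketch of the per-well trend calculation with `scipy.stats.theilslopes` follows. The helper name `well_trend` and the synthetic time series are hypothetical; the screening criteria (at least 10 sample events since 1980) follow the slide.

```python
import numpy as np
import pandas as pd
from scipy.stats import theilslopes

MIN_EVENTS = 10
START = pd.Timestamp("1980-01-01")

def well_trend(dates, conc):
    """Return the Theil-Sen slope (concentration units per day) for one well
    and one analyte, or None if the screening criteria are not met."""
    s = pd.Series(np.asarray(conc, dtype=float), index=pd.to_datetime(dates)).dropna()
    s = s[s.index >= START].sort_index()
    if len(s) < MIN_EVENTS:
        return None
    days = (s.index - s.index[0]).days.to_numpy()
    slope, intercept, low, high = theilslopes(s.to_numpy(), days)
    return slope

# Synthetic example: a steady rise of 0.002 mg/L per day with one outlier.
dates = pd.date_range("2000-01-01", periods=12, freq="365D")
conc = 5.0 + 0.002 * np.arange(12) * 365
conc[3] += 50.0  # a single anomalous result barely affects the median slope

slope = well_trend(dates, conc)
sparse = well_trend(dates[:5], conc[:5])  # too few events, so no trend is reported
print(slope, sparse)
```

Because the Theil-Sen estimator is the median of all pairwise slopes, it is far less sensitive to occasional outliers than a least-squares fit, which suits irregular monitoring records.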

Map labels: Fresno, Visalia, Bakersfield.

Unlike median concentrations, Theil-Sen slopes do not exhibit regional-scale spatial patterns.

Spatial patterns evident on smaller scales.

Variability at small scale likely reflects local sources (for nitrate, etc.) or local hydrologic constraints.

Warm colors indicate increasing trends while cool colors denote decreases.

Positive Temporal Trends Subset

Figure: Mg concentration trend (mg L-1 day-1) versus Ca concentration trend (mg L-1 day-1); both axes logarithmic, 0.00001 to 1.

Figure: NO3 concentration trend (mg L-1 day-1 as NO3) versus Ca concentration trend (mg L-1 day-1); both axes logarithmic, 0.00001 to 1.

Temporal trend relationships are stronger between some analytes than others.

◦ Different source terms

◦ Evaporative effects

◦ Geochemical effects (e.g., carbonate mineral equilibration for Ca and Mg)

Good correlation.

Poor correlation.

Conclusion: separate mechanisms explain rise in nitrate and rise in salt concentrations.
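One way to quantify how well two analytes' trends track each other is a correlation on the log-transformed slopes of the positive-trend subset, mirroring the log-log plots. The slides do not say how correlation was judged, so the Pearson statistic, the helper name `positive_trend_correlation`, and the synthetic slopes below are all assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic Theil-Sen slopes (mg/L per day), one row per well; values are hypothetical.
rng = np.random.default_rng(3)
n = 150
ca = rng.lognormal(-7.0, 1.0, n) * rng.choice([1.0, -1.0], n, p=[0.8, 0.2])
trends = pd.DataFrame({
    "CA": ca,
    # Mg constructed to track Ca closely (e.g., a shared salinity source).
    "MG": ca * 0.4 * rng.lognormal(0.0, 0.2, n),
    # NO3 constructed independently of Ca (e.g., a separate source).
    "NO3": rng.lognormal(-6.0, 1.0, n) * rng.choice([1.0, -1.0], n, p=[0.8, 0.2]),
})

def positive_trend_correlation(trends, a, b):
    """Pearson correlation of log10 slopes for wells where both trends are positive."""
    both = trends[(trends[a] > 0) & (trends[b] > 0)]
    r, p_value = pearsonr(np.log10(both[a]), np.log10(both[b]))
    return r, len(both)

r_mg, n_mg = positive_trend_correlation(trends, "CA", "MG")
r_no3, n_no3 = positive_trend_correlation(trends, "CA", "NO3")
print(r_mg, n_mg, r_no3, n_no3)
```

Restricting to positive slopes is needed for the log transform, which is why only the positive-trend subset appears on log-log axes.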

Loadings by Principal Component (axes, colors) for Temporal Trends

For additional background, see Exploring a Large Groundwater Quality Data Set.

To view the Python script used to conduct the analyses, see script.

For questions or comments, please visit https://numericalenvironmental.wordpress.com/contact/