Understanding Dominant Factors for Precipitation in Great Lakes...

Understanding Dominant Factors for Precipitation in Great Lakes Region Soumyadeep Chatterjee Dept. of Computer Science & Engineering University of Minnesota, Twin Cities Fifth Workshop on Understanding Climate Change from Data August 4, 2015

Page 1: Understanding Dominant Factors for Precipitation in Great Lakes …climatechange.cs.umn.edu/docs/ws15_SoChatterjee.pdf · 2015-08-05 · Understanding Dominant Factors for Precipitation

Understanding Dominant Factors for Precipitation in Great Lakes Region

Soumyadeep Chatterjee

Dept. of Computer Science & Engineering
University of Minnesota, Twin Cities

Fifth Workshop on Understanding Climate Change from Data
August 4, 2015

Page 2

Today

• Sparse models for feature selection
• Problem: Factors affecting precipitation
• Finding dominant factors

Page 3

Statistical Dependencies

Key Requirements:
• Understanding of statistical dependence
• Feature selection from many possible predictors
  – E.g. given by a domain expert, derived variables, etc.

[Diagram: Large-Scale Effects (e.g. ocean circulations) and Local Predictands (e.g. station precipitation), linked by a regression model]

Page 4

Feature Selection

• Regression in HIGH dimensions (p > n)
• Challenges
  – Overfitting
  – Dependent covariates
  – Dependent samples
• Sparse Regression
  – Select dominant features
• Testing robustness of selected features
• Hypothesis generation for dominant factors

[Figure: data matrix with dimensions n and p]

Page 5

LASSO (Sparse Regression)

• Estimation under “structural constraints”, e.g. sparsity
• LASSO: ℓ1-norm regularization encouraging sparsity, i.e. most elements of β to be zero

[Figure: Sparse regression coefficients β1, β2, …, βp, most of them equal to 0]
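For reference, the LASSO estimate is commonly written as the ℓ1-penalized least-squares problem below. The 1/(2n) scaling and the symbol λ for the penalty weight are conventional choices assumed here; they are not taken from the slide.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \; \frac{1}{2n} \lVert y - X\beta \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1,
\qquad
\lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert
```

A larger λ drives more coefficients exactly to zero, which is what makes the estimate usable for feature selection.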

Presenter notes: Implemented in SLEP [2], a MATLAB-based interface

Page 6

Understanding Factors affecting Precipitation over Great Lakes

Page 7

Great Lakes States

Page 8

Response: Precipitation

• Station data (USHCN*) over 8 states (~200 stations)
• Near lakes (< 200 km)
• Seasonal (DJF) precipitation
• Transformed (z-score) to be approximately Gaussian
• Years: 1979–2011
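The z-score step can be sketched as follows. This is a minimal illustration that assumes the standardization is done separately for each station across years (the slide does not say); the function name `zscore_by_station` and the synthetic data are hypothetical stand-ins, not USHCN values.

```python
import numpy as np

def zscore_by_station(precip):
    """Standardize seasonal precipitation separately for each station.

    precip: array of shape (n_years, n_stations), one seasonal total per
    year and station. Per-station standardization is an assumption.
    """
    precip = np.asarray(precip, dtype=float)
    mean = precip.mean(axis=0)           # per-station mean over years
    std = precip.std(axis=0, ddof=1)     # per-station sample std over years
    return (precip - mean) / std

# Synthetic stand-in: 33 winters (1979-2011) at 5 stations, not real data.
rng = np.random.default_rng(0)
djf_precip = rng.gamma(shape=4.0, scale=30.0, size=(33, 5))
z = zscore_by_station(djf_precip)
print(z.mean(axis=0).round(6), z.std(axis=0, ddof=1).round(6))
```

Each column of `z` then has zero mean and unit sample variance, so stations with very different precipitation totals become comparable.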

Page 9

Regions

Presenter notes: Consider 3 regions

Page 10

Seasonal Covariates

[Diagram: PREDICTORS, divided into ATMOSPHERIC variables (Station Level, Regional Averages) and LARGE-SCALE CLIMATE INDICES]

Page 11

Covariates: Atmospheric Variables

Regional Averages:
• Regional Minimum Temperature (TRegmin)
• Regional Maximum Temperature (TRegmax)
• Regional Mean Air Temperature at 500 mb (Reg_AIR_500)

Station Level:
• Winter Minimum Temperature (Tmin)
• Mean Winter Maximum Temperature (Tmax)
• Mean Air Temperature at 500 mb (AIR_500)

Page 12

Large-Scale Climate Indices*

All 12 months as covariates:
1. North Atlantic Oscillation (NAO)
2. East Atlantic Pattern (EA)
3. West Pacific Pattern (WP)
4. East Pacific/North Pacific Pattern (EPNP)
5. Pacific/North American Pattern (PNA)
6. East Atlantic/West Russia Pattern (EAWR)
7. Scandinavia Pattern (SCA)
8. Tropical/Northern Hemisphere Pattern (TNH)
9. Polar/Eurasia Pattern (POL)
10. Pacific Transition Pattern (PT)
11. Nino 1+2
12. Nino 3
13. Nino 3.4
14. Nino 4
15. Southern Oscillation Index (SOI)
16. Pacific Decadal Oscillation (PDO)
17. Northern Pacific Oscillation (NP)
18. Tropical/Northern Atlantic Index (TNA)
19. Tropical/Southern Atlantic Index (TSA)
20. Western Hemisphere Warm Pool (WHWP)

*CPC Data: http://www.cpc.ncep.noaa.gov/products/MD_index.shtml

Page 13

LASSO with Random Permutation Test

• y = Seasonal (DJF/JJA) precipitation at a station in a region
• Covariates (X):
  – Ocean indices (12 values for each index)
  – Seasonal regional temperature
  – Seasonal station temperature
  – TOTAL: 232 features
• LASSO with penalty chosen using a validation set
• Permutation test by randomizing Y [Pendse et al., SDM 2012]
• Dominant feature:
  – Non-zero LASSO weight
  – Less than 5% chance of a weight this large arising under random permutation

[Figure: example features labeled STABLE and UNSTABLE]
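The selection rule on this slide can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the talk's SLEP implementation or the exact procedure of the cited paper: the coordinate-descent solver `lasso_cd`, the helper `permutation_test`, the fixed penalty `lam=0.2` (the slide selects it on a validation set), and the synthetic data are all hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)         # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]       # remove feature j's contribution
            rho = X[:, j] @ resid / n
            # soft-thresholding update for coordinate j
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]       # restore it with the new weight
    return beta

def permutation_test(X, y, lam, n_perm=100, alpha=0.05, seed=0):
    """Flag features with a non-zero weight that is rarely matched when y is shuffled."""
    rng = np.random.default_rng(seed)
    beta = lasso_cd(X, y, lam)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        beta_perm = lasso_cd(X, rng.permutation(y), lam)
        exceed += np.abs(beta_perm) >= np.abs(beta)
    p_values = (exceed + 1) / (n_perm + 1)   # empirical p-value per feature
    dominant = np.flatnonzero((beta != 0) & (p_values < alpha))
    return dominant, beta, p_values

# Synthetic stand-in: 33 samples, 20 features, two of them truly informative.
rng = np.random.default_rng(1)
X = rng.standard_normal((33, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(33)
dominant, beta, p_values = permutation_test(X, y, lam=0.2)
print("dominant feature indices:", dominant)
```

Features with a zero weight always receive a p-value of 1 here, so only non-zero weights can pass the 5% threshold, matching the two conditions listed on the slide.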

Page 14

Dominant Features

• Predictive power?
• Split data 70/30
• Permutation test on 70% of the data
  – Obtain dominant features
• Leave-one-out CV on the remaining 30% with OLS, comparing:
  – Climatological mean
  – ALL 232 features (Regional + Ocean Indices + Local)
  – Only dominant features
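The comparison on the held-out portion can be sketched as below. The helper names `loo_mse_ols` and `loo_mse_mean`, the synthetic held-out set, and the choice of features 0 and 3 as "dominant" are assumptions for illustration only; the slide does not specify how the 30% split is formed.

```python
import numpy as np

def loo_mse_ols(X, y):
    """Leave-one-out MSE of an OLS fit with intercept."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        A = np.column_stack([np.ones(train.sum()), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.concatenate([[1.0], X[i]]) @ coef
        errors[i] = (y[i] - pred) ** 2
    return errors.mean()

def loo_mse_mean(y):
    """Leave-one-out MSE of the climatological-mean baseline."""
    n = len(y)
    preds = (y.sum() - y) / (n - 1)      # mean of all other samples
    return np.mean((y - preds) ** 2)

# Synthetic stand-in for the held-out 30%: 60 samples, 40 features.
rng = np.random.default_rng(3)
X_test = rng.standard_normal((60, 40))
y_test = 2.0 * X_test[:, 0] - 1.5 * X_test[:, 3] + 0.5 * rng.standard_normal(60)
dominant_idx = [0, 3]                     # hypothetical output of the permutation test

mse_mean = loo_mse_mean(y_test)
mse_all = loo_mse_ols(X_test, y_test)
mse_dom = loo_mse_ols(X_test[:, dominant_idx], y_test)
print(f"mean: {mse_mean:.3f}  all: {mse_all:.3f}  dominant: {mse_dom:.3f}")
```

With many features and few samples, OLS on all features pays a variance penalty that the small dominant set avoids, which is the effect this comparison is meant to expose.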

Page 15

DJF (WINTER) PRECIPITATION

Page 16

[Figure: dominant features, with groups labeled Local, Atlantic, and Pacific]

Presenter notes:
• Very thin error bars
• Wind speed for Northeast

Page 17

Stable Sets

“Pruning” with larger penalty

Presenter notes: Independent permutation tests for separate lambda values

Page 18

DJF (Winter) Precip: MSE

All = ALL 232 features
Dominant = Dominant features ONLY

p-value = 0.021

p-value = 0.032

p-value = 0.013

Presenter notes:
• Information content in dominant predictors
• Eastern Great Lakes: more variable, additional local/regional factors

Page 19

Data Subspace

• # of Dominant Features = 17

[Figure: Cross-correlation between dominant and discarded features; axes: dominant features, discarded features]

Presenter notes:
• High positive or negative correlation: similar information content
• Uncorrelated features: uninformative and discarded
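A cross-correlation matrix of this kind can be computed as in the sketch below. The function name `cross_correlation`, the synthetic feature matrix, and the index choices are hypothetical; column 5 is constructed to be nearly the negative of column 0 to show how a discarded feature can duplicate a dominant one.

```python
import numpy as np

def cross_correlation(X, dominant_idx):
    """Pearson correlations between dominant and discarded columns of X.

    Returns (corr, discarded_idx), where corr[i, j] is the correlation between
    dominant feature i and discarded feature j.
    """
    dominant_idx = np.asarray(dominant_idx)
    discarded_idx = np.setdiff1d(np.arange(X.shape[1]), dominant_idx)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
    corr = Z[:, dominant_idx].T @ Z[:, discarded_idx] / X.shape[0]
    return corr, discarded_idx

# Synthetic stand-in: 33 samples, 6 features.
rng = np.random.default_rng(2)
X = rng.standard_normal((33, 6))
X[:, 5] = -X[:, 0] + 0.1 * rng.standard_normal(33)   # near-duplicate, sign flipped
corr, discarded = cross_correlation(X, dominant_idx=[0, 1])
print(np.round(corr, 2))
```

A strongly negative entry, such as the one between features 0 and 5 here, signals that the discarded feature carries much the same information as a dominant one.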
Page 20

Summary

• “Stable” covariates → dominant features
• Hypothesis testing
  – Permutation test
• Predictive power
• Future directions
  1. Non-Gaussianity: GLMs
  2. Spatio-temporal dependencies
  3. Understanding processes: representation

Page 21

Acknowledgements

Collaborators
• Stefan Liess (UMN)
• Arindam Banerjee (UMN)
• Debasish Das (Verizon)
• Auroop Ganguly (NEU)

Funding

Page 22

Thanks
