The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for...

15
The Performance Impacts of Machine Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning Configurations for Gridded Solar Irradiance Forecasting”, Solar Energy, Under Review. David John Gagne II, NCAR Sue Ellen Haupt, NCAR Amy McGovern, University of Oklahoma John Williams, The Weather Company, an IBM Business Seth Linden, NCAR Doug Nychka, NCAR 1

Transcript of The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for...

Page 1: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

ThePerformanceImpactsofMachineLearningDesignChoicesforGridded

SolarIrradianceForecastingFeaturesworkfrom“EvaluatingStatisticalLearningConfigurationsforGridded

SolarIrradianceForecasting”,SolarEnergy,UnderReview.

DavidJohnGagneII,NCARSueEllenHaupt,NCARAmyMcGovern,UniversityofOklahomaJohnWilliams,TheWeatherCompany,anIBMBusinessSethLinden,NCARDougNychka,NCAR

1

Page 2: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

Motivation:SolarIrradiance

Solarirradiancepredictionsareneededforsiteswithouthistoricaldata(Source:http://www.adventurecats.org/cat-tales/maine-coon-deaf-sailors-ears-sea/)

• Solarelectricitygenerationcontinuestogrowrapidlyanddecreaseincost• Accuratesolarirradiancepredictionsneededbyelectricutilitiestobalancesupplywithexpecteddemand• Solarpowerisbeinggeneratedmoreatsitesthatdonothaveobservationsorhistoricalrecordsofirradiance• Contributions

• DevelopedaGriddedAtmosphericForecastingSystem(GRAFS)forsolarirradiance

• Evaluateddifferentmachinelearningmodelconfigurationsforpredictiveaccuracyatunobservedsitesfordayaheadsolarirradianceforecasts

2

Page 3: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

SolarForecastingIngredients

• Positionofsuninsky• Scatteringbyatmosphere&aerosols• Cloudcovereffects• Precipitation• Non-meteorologicalobstructions

Sun Position

Panel Orientation

Cloud PropertiesHeight

CoverageTransparency

Aerosols and Water Vapor(Turbidity)

Panel Obstructions

Panel Temperature

ShadingPrecipitation

SolarfactorsdiagramfromGagne(2014)

3

Page 4: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

SolarData• NOAAGlobalForecastSystem(GFS)

• Interpolatedto4kmgrid• 3hourlyoutputinterpolatedintimetohourlyoutput

• Variables:Solarirradiance,temperature,cloudcover,sunangles,spatialstatistics

• EvaluationPeriod:June-August2015• OklahomaMesonet (McPhersonetal.2007)

• Sitesrecordsolarirradianceevery5minuteswithaLi-Cor pyranometer

• Hourly-averagedirradianceandclearnessindexcomputedfromrawobservations

• Clearnessindex:ratioofobservedirradiancetotop-of-atmosphereirradiance

4

Page 5: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

MachineLearningConfigurations:Solar

• Mesonet stationsrandomlysplitinto“training”and“testing”sites

• Evaluationperiodsplitintotrainingandtestingdays:every3rd dayusedfortesting

• Models:RandomForest,GradientBoosting,LassoLinearRegression

• MultiSiteTraining• Onemachinelearningmodelfittedwithalltrainingsites’data• Appliedattestingsitesusinginputdatacollocatedwithsite

• SingleSiteTraining• Separatemachinelearningmodelsfittedateachtrainingsite• Predictionsmadeattrainingsitesandinterpolatedtotesting

siteswithCressman interpolation(Cressman 1959)• SimilartoapproachusedbyGriddedMOS(Glahn etal.2009)

NWP Model Output

Oklahoma Mesonet

5-minute solar irradiance

Calculate solar position and clear sky irradiance at

each site and time

Extract input variables at each

site

Calculate clearness index and hourly

means

Calculate neighborhood statistics for each variable

Match model and observation

data

Split sites into training and testing sets

Train machine learning models to predict clearness

index

Training site data

Testing site data

Apply machine learning models at testing sites

5

Page 6: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

GradientBoostingRegression

• Stagewise,additivedecisiontreeensemble• Initialtreepredictsexactvalue,subsequenttreespredictresidualsoftotalpredictionsfromallprevioustrees• Usedbytop4finishersofAMSSolarEnergyPredictionContest

Irradiance>500?

0.1 0.8

Temperature>30?

-0.1 0.3

Dewpoint>2?

0.05 -0.03

+0.1*+0.1*

6

Page 7: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

DetailedConfiguration

• RandomForest• Default:500trees,minsamplessplit10,features=sqrt• ShortTrees:maxdepth3• AllFeatures:features=all

• GradientBoosting• Default:loss=“lad”,500trees,maxdepth5,features=sqrt,learningrate=0.1• LeastSquares:loss=“ls”• BigTrees:minsamplesplit=10• AllFeatures:features=“all”• SlowLearningRate:learnrate=0.01

• LassoLinearRegression• Top16variablesbyF-Score,Alpha=0.5

7

Page 8: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

Solar:GFSClearnessIndexError

8

GradientBoosting:OptimizeswithMAE,TreeDepthof5,SamplessubsetoffeaturesGradientBoostingLeastSquares:UsesMSEinsteadofMAEGradientBoostingAllFeatures:EvaluatesallinputfeaturesGradientBoostingSlowLearningRate:Usesalearningrateof0.01insteadof0.1GradientBoostingBigTrees:AllowstreestogrowtominimizetrainingsamplesineachbranchRandomForest:fullygrowntrees,evaluatessubsetoffeaturesRandomForestAllFeatures:evaluatesallfeaturesRandomForestShortTrees:treedepthof3LinearRegression:Lassowithtop16variablesRawGFS:DownwardshortwaveirradiancePersistence:Interpolatedirradianceattestsitesbasedonobservationsfrom24hoursbefore

Page 9: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

GFSSolarDistributions

9

Page 10: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

GFSForecastDistributions

10

Page 11: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

GFSSolarStationErrors

11

Page 12: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

NextSteps:DeepLearning

• Investigatingtheuseofdeeplearningmodelsforweatherfeatureandregimeidentification• Goal:TrainmodelstorecognizemultiscalefeaturesinNWPoutput• Potentialapplicationforimprovedsolarirradiancemeanandvariabilityforecastsbasedonweatherregime• Manyotherweatherandclimateapplications

12

DeepConvolutionalGenerativeAdversarialNetworkarchitecturefromRadfordetal.(2016)

Page 13: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

GenerativeAdversarialNetworks

13

UnsupervisedmethodoflearningcomplexfeaturerepresentationsfromdataRequires2deepneuralnetworks

Discriminator:determineswhichsamplesarefromthetrainingsetandwhicharenot

Generator:Createssyntheticexamplessimilartotrainingdatatofooldiscriminator

Bothnetworkshavea“battleofwits”eithertothedeathor

untilthediscriminatorisfooledoftenenough

Advantages• Unsupervisedpre-training:learnfeatureswithoutneedingalargelabeleddataset• Dimensionalityreduction:reduceimagetosmallervector• Learnssharper,moredetailedfeatures thanautoencoder models• Donotneedtospecifyacomplexlossfunction

Page 14: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

PreliminaryResults:MeanSeaLevelPressure

14

• Trainedon4096GEFSpressureforecasts• Produces”realistic”pressurefieldsafter

100epochsoftraining

• Generatoruses100-valuevectorasinput• Eachinputadjustdifferentpartsoffield

Page 15: The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for Gridded Solar Irradiance Forecasting Features work from “Evaluating Statistical Learning

Summary• Developedgriddedstatisticalforecastingsystemforsolarirradiance• Evaluateddifferentmachinelearningmodelsandconfigurationsontheirabilitytopredictirradianceatmultiplesites• GradientBoostingconsistentlyshowedlowesterrors• Allmachinelearningmodelsunderestimatedcloudcoverfrequency• MLmodelshadlowererrorsatsiteswithfewerclouds• GenerativeAdversarialNetworksshowpotentialforextractinginformationfromweatherdata

15

Acknowledgements• RichLoft• TomHamill• TheOklahomaMesonet

ContactMe• Email:[email protected]• Twitter:@DJGagneDos• Github:github.com/djgagne