The Performance Impacts of Machine Learning Design Choices ... · Learning Design Choices for...

Post on 20-Jul-2020


The Performance Impacts of Machine Learning Design Choices for Gridded Solar Irradiance Forecasting

Features work from “Evaluating Statistical Learning Configurations for Gridded Solar Irradiance Forecasting”, Solar Energy, Under Review.

David John Gagne II, NCAR
Sue Ellen Haupt, NCAR
Amy McGovern, University of Oklahoma
John Williams, The Weather Company, an IBM Business
Seth Linden, NCAR
Doug Nychka, NCAR


Motivation: Solar Irradiance

Solar irradiance predictions are needed for sites without historical data (Source: http://www.adventurecats.org/cat-tales/maine-coon-deaf-sailors-ears-sea/)

• Solar electricity generation continues to grow rapidly and decrease in cost
• Accurate solar irradiance predictions needed by electric utilities to balance supply with expected demand
• Solar power is being generated more at sites that do not have observations or historical records of irradiance
• Contributions
  • Developed a Gridded Atmospheric Forecasting System (GRAFS) for solar irradiance
  • Evaluated different machine learning model configurations for predictive accuracy at unobserved sites for day-ahead solar irradiance forecasts


Solar Forecasting Ingredients

• Position of sun in sky
• Scattering by atmosphere & aerosols
• Cloud cover effects
• Precipitation
• Non-meteorological obstructions

[Figure: solar factors diagram from Gagne (2014). Labels: Sun Position; Panel Orientation; Cloud Properties (Height, Coverage, Transparency); Aerosols and Water Vapor (Turbidity); Panel Obstructions; Panel Temperature; Shading; Precipitation.]


Solar Data

• NOAA Global Forecast System (GFS)
  • Interpolated to 4 km grid
  • 3-hourly output interpolated in time to hourly output
  • Variables: solar irradiance, temperature, cloud cover, sun angles, spatial statistics
• Evaluation Period: June-August 2015
• Oklahoma Mesonet (McPherson et al. 2007)
  • Sites record solar irradiance every 5 minutes with a Li-Cor pyranometer
  • Hourly-averaged irradiance and clearness index computed from raw observations
  • Clearness index: ratio of observed irradiance to top-of-atmosphere irradiance
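The clearness index calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the solar constant value, the simple eccentricity correction, and the function names `toa_irradiance`, `hourly_means`, and `clearness_index` are all assumptions made for the example.

```python
import numpy as np

SOLAR_CONSTANT = 1361.0  # W m^-2; assumed value, not given in the slides


def toa_irradiance(day_of_year, cos_zenith):
    """Top-of-atmosphere irradiance on a horizontal surface (W m^-2).

    Uses a simple eccentricity correction (an assumption for this sketch).
    Negative cosines (sun below the horizon) are clipped to zero.
    """
    day_of_year = np.asarray(day_of_year, dtype=float)
    cos_zenith = np.asarray(cos_zenith, dtype=float)
    eccentricity = 1.0 + 0.033 * np.cos(2.0 * np.pi * day_of_year / 365.0)
    return SOLAR_CONSTANT * eccentricity * np.clip(cos_zenith, 0.0, None)


def hourly_means(five_minute_values):
    """Average 5-minute observations into hourly values (12 samples per hour)."""
    values = np.asarray(five_minute_values, dtype=float)
    return values.reshape(-1, 12).mean(axis=1)


def clearness_index(observed, toa):
    """Ratio of observed to top-of-atmosphere irradiance; NaN where toa is 0."""
    observed = np.asarray(observed, dtype=float)
    toa = np.asarray(toa, dtype=float)
    kt = np.full(observed.shape, np.nan)
    np.divide(observed, toa, out=kt, where=toa > 0)
    return kt
```

Night-time hours, where the top-of-atmosphere irradiance is zero, come back as NaN so they can be dropped before training.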


Machine Learning Configurations: Solar

• Mesonet stations randomly split into “training” and “testing” sites
• Evaluation period split into training and testing days: every 3rd day used for testing
• Models: Random Forest, Gradient Boosting, Lasso Linear Regression
• Multi Site Training
  • One machine learning model fitted with all training sites’ data
  • Applied at testing sites using input data collocated with site
• Single Site Training
  • Separate machine learning models fitted at each training site
  • Predictions made at training sites and interpolated to testing sites with Cressman interpolation (Cressman 1959)
  • Similar to approach used by Gridded MOS (Glahn et al. 2009)
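For the Single Site Training approach, the interpolation step might look like the sketch below. It uses the standard Cressman weight, w = (R² − d²) / (R² + d²) for stations within the radius of influence R and zero otherwise. The single-pass form, the planar coordinates, and the function name `cressman_interpolate` are assumptions for illustration; the slides do not give the radius or number of passes used.

```python
import numpy as np


def cressman_interpolate(train_xy, train_values, target_xy, radius):
    """Single-pass Cressman interpolation from training sites to target sites.

    train_xy:     (n_train, 2) planar coordinates, e.g. km (assumed units)
    train_values: (n_train,) predictions made at the training sites
    target_xy:    (n_target, 2) coordinates of the testing sites
    radius:       radius of influence R, same units as the coordinates
    Returns NaN for targets with no training site inside the radius.
    """
    train_xy = np.asarray(train_xy, dtype=float)
    train_values = np.asarray(train_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)

    # Squared distances between every target and every training site.
    diff = target_xy[:, None, :] - train_xy[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    r2 = radius ** 2

    # Cressman weights: positive inside the radius, zero outside.
    weights = np.where(d2 < r2, (r2 - d2) / (r2 + d2), 0.0)
    weight_sum = weights.sum(axis=1)

    result = np.full(target_xy.shape[0], np.nan)
    np.divide(weights @ train_values, weight_sum, out=result, where=weight_sum > 0)
    return result
```

A target midway between two training sites receives equal weights and therefore the average of their two values.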

[Figure: processing flowchart. Boxes: NWP Model Output; Oklahoma Mesonet 5-minute solar irradiance; Calculate solar position and clear sky irradiance at each site and time; Extract input variables at each site; Calculate clearness index and hourly means; Calculate neighborhood statistics for each variable; Match model and observation data; Split sites into training and testing sets; Training site data; Testing site data; Train machine learning models to predict clearness index; Apply machine learning models at testing sites.]
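The splitting step in this pipeline, random assignment of stations to training and testing sets plus every 3rd day held out for testing, could be sketched as below. The test fraction, the random seed, and the function names are hypothetical; the slides state only that the site split was random and that every 3rd day was used for testing.

```python
import numpy as np


def split_sites(site_ids, test_fraction=0.25, seed=0):
    """Randomly split station IDs into (training, testing) arrays.

    test_fraction and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(site_ids))
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]


def split_days(days):
    """Hold out every 3rd day (3rd, 6th, 9th, ...) for testing."""
    days = np.asarray(days)
    is_test = (np.arange(len(days)) % 3) == 2
    return days[~is_test], days[is_test]
```

Splitting by both site and day means a model can be scored at locations and on dates it never saw during training.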


Gradient Boosting Regression

• Stagewise, additive decision tree ensemble
• Initial tree predicts exact value, subsequent trees predict residuals of total predictions from all previous trees
• Used by top 4 finishers of AMS Solar Energy Prediction Contest

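[Figure: three example decision trees combined additively. Tree 1 splits on Irradiance > 500 (leaf values 0.1, 0.8); tree 2 splits on Temperature > 30 (leaf values -0.1, 0.3); tree 3 splits on Dewpoint > 2 (leaf values 0.05, -0.03). The second and third trees are each added with a multiplier of 0.1.]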
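The stagewise, additive idea can be illustrated with a small from-scratch sketch. It makes several simplifying assumptions that differ from the configurations in this talk: each tree is a depth-1 stump, the loss is least squares (the talk's default is least absolute deviation), and the ensemble starts from the training mean rather than from an initial tree. All function names are hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(x.shape[1]):
        # Skip the largest value so both sides of the split are non-empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            left_val = residual[left].mean()
            right_val = residual[~left].mean()
            err = ((residual - np.where(left, left_val, right_val)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, left_val, right_val)
    return best[1:]


def predict_stump(stump, x):
    j, t, left_val, right_val = stump
    return np.where(x[:, j] <= t, left_val, right_val)


def fit_boosting(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    pred = np.full(x.shape[0], base)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, x)
    return pred
```

With a learning rate of 0.1, each stump removes only a tenth of the residual it fits, which is why many trees are needed and why a slower learning rate calls for a larger ensemble.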


Detailed Configuration

• Random Forest
  • Default: 500 trees, min samples split 10, features = sqrt
  • Short Trees: max depth 3
  • All Features: features = all
• Gradient Boosting
  • Default: loss = "lad", 500 trees, max depth 5, features = sqrt, learning rate = 0.1
  • Least Squares: loss = "ls"
  • Big Trees: min samples split = 10
  • All Features: features = "all"
  • Slow Learning Rate: learning rate = 0.01
• Lasso Linear Regression
  • Top 16 variables by F-Score, Alpha = 0.5
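One way to keep these variants organized is as keyword dictionaries, each derived from its default. The sketch below assumes scikit-learn-style parameter names (`n_estimators`, `min_samples_split`, `max_features`, and so on), which the slides do not spell out, and assumes each named variant changes only the one setting listed. `None` is used here to mean "all features".

```python
# Assumed scikit-learn-style keyword names; values are taken from the slide.
RANDOM_FOREST_DEFAULT = {
    "n_estimators": 500,
    "min_samples_split": 10,
    "max_features": "sqrt",
}
RANDOM_FOREST_CONFIGS = {
    "Default": dict(RANDOM_FOREST_DEFAULT),
    "Short Trees": {**RANDOM_FOREST_DEFAULT, "max_depth": 3},
    "All Features": {**RANDOM_FOREST_DEFAULT, "max_features": None},
}

GRADIENT_BOOSTING_DEFAULT = {
    "loss": "lad",  # least absolute deviation, as written on the slide
    "n_estimators": 500,
    "max_depth": 5,
    "max_features": "sqrt",
    "learning_rate": 0.1,
}
GRADIENT_BOOSTING_CONFIGS = {
    "Default": dict(GRADIENT_BOOSTING_DEFAULT),
    "Least Squares": {**GRADIENT_BOOSTING_DEFAULT, "loss": "ls"},
    "Big Trees": {**GRADIENT_BOOSTING_DEFAULT, "min_samples_split": 10},
    "All Features": {**GRADIENT_BOOSTING_DEFAULT, "max_features": None},
    "Slow Learning Rate": {**GRADIENT_BOOSTING_DEFAULT, "learning_rate": 0.01},
}

# Lasso: keep the 16 variables with the highest F-score, then fit with alpha 0.5.
LASSO_CONFIG = {"n_selected_features": 16, "alpha": 0.5}
```

If these were passed to scikit-learn estimators, note that recent releases name the two gradient boosting losses `"absolute_error"` and `"squared_error"` rather than `"lad"` and `"ls"`.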


Solar: GFS Clearness Index Error

Gradient Boosting: Optimizes with MAE, tree depth of 5, samples subset of features
Gradient Boosting Least Squares: Uses MSE instead of MAE
Gradient Boosting All Features: Evaluates all input features
Gradient Boosting Slow Learning Rate: Uses a learning rate of 0.01 instead of 0.1
Gradient Boosting Big Trees: Allows trees to grow to minimize training samples in each branch
Random Forest: fully grown trees, evaluates subset of features
Random Forest All Features: evaluates all features
Random Forest Short Trees: tree depth of 3
Linear Regression: Lasso with top 16 variables
Raw GFS: Downward shortwave irradiance
Persistence: Interpolated irradiance at test sites based on observations from 24 hours before
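The persistence baseline and an error score can be sketched as follows. This is a simplified illustration for a single hourly series: the spatial interpolation to test sites mentioned above is left out, and using mean absolute error as the score is an assumption, since the slide title does not name the metric. Function names are hypothetical.

```python
import numpy as np


def persistence_forecast(hourly_values, lag_hours=24):
    """Forecast each hour with the observation from lag_hours earlier.

    lag_hours must be a positive integer; the first lag_hours entries
    have no earlier observation and are returned as NaN.
    """
    values = np.asarray(hourly_values, dtype=float)
    forecast = np.full(values.shape, np.nan)
    forecast[lag_hours:] = values[:-lag_hours]
    return forecast


def mean_absolute_error(observed, forecast):
    """Mean absolute error over hours where both values are available."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    valid = ~np.isnan(observed) & ~np.isnan(forecast)
    return np.abs(observed[valid] - forecast[valid]).mean()
```

Scoring every model and both baselines with the same function over the same valid hours keeps the comparison consistent.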

GFS Solar Distributions

GFS Forecast Distributions

GFS Solar Station Errors

Next Steps: Deep Learning

• Investigating the use of deep learning models for weather feature and regime identification
• Goal: Train models to recognize multiscale features in NWP output
• Potential application for improved solar irradiance mean and variability forecasts based on weather regime
• Many other weather and climate applications

Deep Convolutional Generative Adversarial Network architecture from Radford et al. (2016)

Generative Adversarial Networks

Unsupervised method of learning complex feature representations from data

Requires 2 deep neural networks

Discriminator: determines which samples are from the training set and which are not

Generator: creates synthetic examples similar to training data to fool discriminator

Both networks have a “battle of wits” either to the death or until the discriminator is fooled often enough

Advantages
• Unsupervised pre-training: learn features without needing a large labeled dataset
• Dimensionality reduction: reduce image to smaller vector
• Learns sharper, more detailed features than autoencoder models
• Do not need to specify a complex loss function
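The simplicity of the loss can be seen in a short sketch of the commonly used binary cross-entropy GAN objective, with the non-saturating generator loss. This is an assumption for illustration; the slides do not state which loss formulation was used. The inputs are the discriminator's output probabilities, and the function names are hypothetical.

```python
import numpy as np


def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss for the discriminator.

    d_real: discriminator probabilities for training-set samples
    d_fake: discriminator probabilities for generator samples
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())


def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: reward fooling the discriminator."""
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return -np.log(d_fake).mean()
```

When the discriminator can no longer tell real from synthetic samples it outputs 0.5 everywhere, giving a discriminator loss of 2 ln 2 and a generator loss of ln 2.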

Preliminary Results: Mean Sea Level Pressure

• Trained on 4096 GEFS pressure forecasts
• Produces “realistic” pressure fields after 100 epochs of training
• Generator uses 100-value vector as input
• Each input adjusts different parts of the field

Summary
• Developed gridded statistical forecasting system for solar irradiance
• Evaluated different machine learning models and configurations on their ability to predict irradiance at multiple sites
• Gradient Boosting consistently showed lowest errors
• All machine learning models underestimated cloud cover frequency
• ML models had lower errors at sites with fewer clouds
• Generative Adversarial Networks show potential for extracting information from weather data

Acknowledgements
• Rich Loft
• Tom Hamill
• The Oklahoma Mesonet

Contact Me
• Email: dgagne@ucar.edu
• Twitter: @DJGagneDos
• Github: github.com/djgagne