NASA Big Data Task Force February 16 minutes · 2020-04-28 · NASA Advisory Council Ad Hoc Big...

NASA Advisory Council Ad Hoc Big Data Task Force, February 16, 2016 1 Ad Hoc Big Data Task Force of the NASA Advisory Council Science Committee Meeting Minutes Inaugural Meeting February 16, 2016 NASA Headquarters Glennan Conference Room, 1Q39 _____ ________________________________________________________ Charles P. Holmes, Chair ____________________________________________________________ Erin C. Smith, Executive Secretary



Ad Hoc Big Data Task Force of the

NASA Advisory Council Science Committee

Meeting Minutes

Inaugural Meeting February 16, 2016

NASA Headquarters Glennan Conference Room, 1Q39

_____________________________________________________________ Charles P. Holmes, Chair

____________________________________________________________ Erin C. Smith, Executive Secretary


Report prepared by Joan M. Zimmermann, Ingenicomm, Inc.


Table of Contents

Introduction
Charter/Science Committee and Subcommittee Feedback
Legacy from NAC ITIC
Discussion
HPD Big Data
Science Committee Greetings
Big Data and Earth Science
Supercomputing and Big Data
APD and Big Data
Public comment
Other Federal Big Data Initiatives
Planetary Science Big Data
Discussion/wrap-up

Appendix A - Attendees
Appendix B - Membership roster
Appendix C - Presentations
Appendix D - Agenda


Introduction

Dr. Erin Smith, Executive Secretary of the NASA Advisory Council (NAC) Ad Hoc Big Data Task Force (BDTF), called the membership to order and made some administrative announcements. Dr. Charles Holmes, Chair of the BDTF, opened the inaugural meeting of the BDTF. Introductions were made around the table.

Charter/Subcommittee Feedback

Dr. Smith presented an overview of the Task Force, which was created in response to a number of White House directives on the Big Data concept. These directives related to the purviews of NASA’s Heliophysics and Earth Science divisions (HPD and ESD), which engage in the study of solar activity and solar storms, and weather forecasting. The administration also expressed a great deal of interest in the interoperability of datasets, and related uses of Big Data. Successful applications of science in these areas will require the breakdown of subdiscipline stovepipes, and the interoperability of NASA datasets with those of the National Oceanic and Atmospheric Administration (NOAA) and the US Geological Survey (USGS), making data available to numerous end users such as emergency response and disaster relief agencies. Big Data may also enable the identification of actionable science information, making data useful for unforeseen applications.

Big Data also means different things to different users, and has different implications for specific data-handling tools, data formats, and the creation of data standards. Applications vary for the Astrophysics (supernova models), Planetary (identifying exoplanets, galaxy formation), and Heliophysics divisions (one target/many missions, coronal mass ejections, radiation environment for human exploration). NASA’s Earth Science Division has been managing and exploiting Big Data for many years in creating climate models, and for societal applications such as drought forecasting and disaster response. Many NASA spaceborne measurements are currently being used to improve air quality decision support systems in Texas, and in producing accurate cloud formation models. HPD data and engineering data are being fed into an Integrated Radiation Protection System, to help determine how to get to acceptable risk figures for radiation exposure in human exploration.

The terms of reference (TOR) for the BDTF form a broad charter, which can be described as examining what the community as a whole is doing in Big Data, as well as what other agencies are doing, and identifying what can be done better. The intent is to catalogue best practices in NASA and other federal agencies, as well as in private industry, research institutions, and academia. One of the final products may be a white paper reporting out findings and recommendations. A major challenge for the Task Force will be to define what the term ‘big data’ means to the various communities: to an astronomer it is an archive issue; to HPD and ESD, it is a matter of interoperability and engineering. Other challenges will be to determine the most useful and efficient architectures, storage modes, data accessibility, data rates, data security, and intellectual property requirements. How do we communicate what datasets are saying, and how do we train people in the use of datasets? It is a dynamic area. To date, the BDTF has completed its ethics training and is in the process of signing on its last two members to round out the committee.

The NAC Science Committee has provided feedback to the BDTF, namely to acquire more representation from commercial entities and other non-NASA sciences, as well as to consider ground-based sciences that may have produced scientific data. The feedback was also to look at data visualization, data permanence, and data usage. The Science Committee has asked that the BDTF act as a go-between for the community, and find links and leverage points with existing efforts on big data. The Science Committee also recommended that the BDTF invite people from the NASA archives, NASA Ames Research Center, simulation experts, modelers, and industry partners. Within disciplines, practitioners should be able to understand themselves within their subfields, and to allow for cross-pollination between subfields. The BDTF has also been asked to find the best way to gather feedback so that the Science Committee and its subcommittees can benefit from this effort (e.g., a survey of industry members, town halls).

The NAC Science subcommittees would like the BDTF to address data usability, management and access, utilization (including real-time), analysis and data mining of large datasets, algorithm and statistics development, data curation, archiving tools and technology, visualization (such as hyperwall), and using state-of-the-art information technology (IT) systems and tools. Other questions to address: What opportunities are there in big data? Which subject matter experts (SMEs) should be consulted? What kind of products are desirable?

Dr. Holmes noted that given the extensive shopping list, he wished to devise a work plan to use the limited time available, in order to distill the Task Force output into something valuable. As to the term “interoperability,” he challenged Dr. Smith to fine-tune this definition, as it is a wide-open topic. He believed that innovation comes from the bottom up, and worried that “interoperable” raises some red flags for the creation of top-down management. Dr. Clayton Tino worried about “needs for future use,” which would require a fundamental understanding of data formats; it is nearly a non-solvable problem to make data understandable to all communities. Dr. James Kinter commented that interoperability tends to become a catchall phrase for simulation and modeling, best practices, and interoperability between discipline scientists (including metadata and documentation). Dr. Reta Beebe noted that “data mining” connotes something magical and is a major question. Externally, people think that data mining is magically done. Datasets are so different, particularly in Planetary Science, that data mining becomes a major problem. Dr. Holmes reiterated his belief in the bottom-up approach, and in allowing successes from this approach to replicate through other scientific areas.

Legacy from NAC IT Infrastructure Committee

Dr. Holmes, who had served as vice chair of the NAC Information Technology Infrastructure Committee (ITIC), gave an overview of the BDTF’s history. The ITIC stood from 2010 to 2013. Its main affiliation was with the NASA Chief Information Officer (CIO), but it had ties across NASA as well, in areas such as cybersecurity. The NAC recommended that both the ITIC and the Science Committee explore an approach to improve access to NASA science data repositories, with that exploration to include best practices, etc., that have been translated to the present TOR for the BDTF. In Fall 2013, the NAC advisory committee structure was revamped, cybersecurity was put under the aegis of a new committee, and the work of the former ITIC now continues with the current Big Data Task Force, reporting to the Science Committee.

One of the first recommendations of the former ITIC was that NASA should take advantage of assets in the Federal government, such as GPU clusters, cloud computing under the National Science Foundation (NSF), and other sponsorship. ITIC also recommended that NASA improve the cyberinfrastructure that supports Agency science. One of the findings of the ITIC notes that NASA science data does not sit in one place but is distributed across NASA centers, USGS, industry, and universities. NASA data centers are discipline-focused, and are managed in this way. The number of science publications coming out of these centers is growing dramatically. Education and Public Outreach continues to tap into these data stores, sometimes directly, and sometimes through a group that processes the data for the general public. The Department of Energy (DOE) has set up a backbone throughout the country with many nodes not far from the NASA centers; it would be good to leverage this pipeline, as well as a 10-Gbps research network that links research innovation laboratories.

Use of NASA supercomputers at both Goddard Space Flight Center (GSFC) and Ames Research Center (ARC) is growing. The Earth Observing System Data and Information System (EOSDIS) is also growing in its data product distribution. Web services to support disaster applications, such as the Short-term Prediction Research and Transition (SPoRT) Center at Marshall, are transitioning research data to the operational weather community. The Solar Dynamics Observatory (SDO) is revolutionizing the way we understand the sun, and is collecting roughly a petabyte of data per year, with 5 petabytes per year worth of processing. There has been a two-order-of-magnitude jump in what solar physics had been ingesting previously from older missions such as Hinode. NASA’s Multimission Archive at Space Telescope (MAST) is showing almost exponential growth, and will grow even more when future telescope missions come on-line. There are 200-plus apps in the Apple iStore that will return from a search on NASA; many of these apps are in high demand from the public, and pull processed results out of NASA’s data stores. More than 250,000 people have taken part in NASA’s Galaxy Zoo program.

In 2012, the Office of Science and Technology Policy (OSTP) sent out a memo to the public announcing a Big Data Initiative, earmarking $200M to be spent on improving access to the government’s big data stores. In 2013, there were more memos and Executive Orders coming out on this issue, but NASA was missing from the list of recipients (DOE, Department of Defense, and others); so it must be asked: where did NASA miss the boat? Dr. Holmes noted an ITIC finding in November 2012, that NASA acquire fiber-optic pathways to support current and future data, and a recommendation that they buy rather than own these pathways.

Discussion

The committee discussed a draft work plan to determine how the BDTF would move forward. Dr. Holmes felt that the BDTF shouldn’t address the areas of data searchability and availability, proprietary periods, long-term archiving, and other frequent requests that are made of NASA’s data stores, feeling that processes are already in place for this at NASA. The BDTF should break new ground instead, and should survey the community, choose 3 to 4 topics, and produce products. The BDTF should form a concise problem statement, research, organize and develop positions, form a consensus, and draft and present results in a white paper (4-6 pp) accompanied by a slide presentation. Because the BDTF expires in December 2017, there are only 4-5 more face-to-face meetings, held in advance of each of the future Science Committee meetings, in which to develop findings and recommendations to take to the Science Committee. To this end, the Task Force should also hold teleconferences as appropriate. Dr. Holmes reviewed his duties as Chair as primarily being the representative to the Science Committee, and closed with the thought: “Do good, work hard, NASA needs us.”

Dr. Ray Walker agreed that data availability/searchability did not require a hard look, but noted that as data volumes get larger, it will be necessary to figure out the pieces we want to use; in this sense the issue is still important to consider. Dr. Holmes invited Dr. Walker to write up an actionable recommendation on the issue and send it to Dr. Smith. Dr. Tino commented that there are model-level, internal, and external use domains; what is it that we are actually trying to do? He agreed to write up an item on this question. Dr. Kinter said that it seems that by definition, Big Data means the biggest and baddest datasets; in that respect, we typically see accessibility as a way to aggregate and analyze data from an entire dataset (petabytes); very few users will have the resources to operate on datasets of such magnitude. The Task Force should also think about facilitating the analysis of datasets that are too big to move and too big to analyze in-situ. Dr. Holmes agreed to revise the work plan with the additions of the written contributions, and to look at areas that can be extended beyond the state of work; the BDTF needs to look at benchmarks regarding this issue.

HPD Big Data

Dr. Jeffrey Hayes presented areas of concern for the Heliophysics Division (HPD) in terms of Big Data needs. HPD studies the sun’s variance, the response of geospace, and the Sun-Earth system’s impacts on humanity. To do this, HPD engages in the science of space weather, tries to understand the interconnections between the Sun and Earth, and develops knowledge to improve the prediction of extreme events such as major coronal mass ejections (CMEs). The mission portfolio includes a research and analysis (R&A) line, an Explorers mission line, along with Living with a Star, Solar Terrestrial Probes, and the sounding rockets program. Mission investment is guided by the Decadal Surveys and NASA’s advisory bodies. The HP System Observatory includes numerous satellites such as IRIS, Wind, STEREO, the Van Allen probes, and the Interstellar Boundary Explorer (IBEX).

Within the current missions and the operations budgets, there is a certain amount of funding for data archiving, and the creation of standards and accessibility. Dr. Hayes felt that most missions were able to respond quickly to decisions on data archiving and curation. Senior Reviews address the scientific merits of HPD missions every two years, and take into account the accessibility, usability and utility of data (including archiving after the mission is complete). As a result, the data pipeline is doing very well.


About 70-80% of HPD data come from extended mission phases. The sun varies in a roughly 22-year cycle; all of these HPD missions operating simultaneously are beginning to enable the understanding of a very complex system. The average cost of a Heliophysics satellite operation is $2.9M annually. The Solar Data Analysis Center (SDAC) and Space Physics Data Facility (SPDF) are the active archives for HPD and run at about $3.3M per year. There is also a ROSES element amounting to about $1M a year. Thus, the total to curate the data is about $4.5M per year, plus some money in the mission lines themselves.

Dr. Hayes noted that “Scientists want all the data all the time, forever.” In the early 2000s, the Decadal Survey came out with a priority for a Virtual Observatory, in which the idea was to collect all the data (both Astrophysics and Heliophysics) and make it universally accessible through common standards. At the time, Astrophysics had one standard, and Heliophysics had multiple standards. Over the last 20 years, NASA has been trying to get these standards in line, and Dr. Hayes felt that good progress was occurring in this area. Heliophysics has an explicit policy that established standards, which are FITS, CDF, and NetCDF. NASA is in a much better place than it was 10 years ago in terms of standardization. HPD has also restored a large fraction of data from its older missions, and has been systematically examining old archives and restoring data archives and datasets of scientific interest.

For any metadata, it is necessary to get everyone to agree on keywords. HPD has gotten good buy-in, and users can now use the Space Physics Archive Search and Extract (SPASE) metadata wrappers to do an inventory, search by date or event, etc., to help do system science. The process has gotten a lot better, and appears to be going faster. HPD’s three most recent missions are successfully using the SPASE metadata wrappers. The first data from Magnetospheric Multiscale (MMS), for example, will be available on SPDF on March 1.

HPD is starting to get terabytes of data; this is a new experience. There are 800 TB from SDO to date, and the volume is growing. HPD is now looking at storing 1 PB in the SDAC; this data volume will probably triple or quadruple as future missions come online. Stanford University will not always support SDAC; at some point the data will have to be brought back to NASA. Dr. Hayes felt that putting data on the cloud was still an iffy prospect, and cited a recent accidental deletion of stored data as one of its potential drawbacks. Solar project data volumes, in terms of both lifetime data volume and data rate, will continue to grow. The question is where the data will be stored, who will store it, and how it will be moved around. HPD can’t throw data away because Heliophysics science needs the context.

Data policy is working well. HPD has a registry and inventory of the data, and is constantly updating it. Legacy datasets have pretty much completed their extractions. Now HPD is concentrating on standards. A future challenge is how to use the SPASE metadata, how to use the data, and how to make it accessible to the non-expert user. Remote sensing and in-situ measurements are very different and these differences must be taken into account. For modeling, how do we archive useful, powerful comparisons? At this point, models do not have a standard; we are working toward it. As we move away from the Virtual Observatory concept to a more consolidated way of getting data out, we must focus on metadata and links to generic access methods, and avoid stovepiping. The interdisciplinary aspects of data will be addressed by a larger group. Dr. Hayes noted that the Virtual Observatory concept did not fail, but the technology has since moved on.

Dr. Holmes asked Dr. Hayes to identify HPD needs from the BDTF standpoint. Dr. Hayes replied that one useful finding would be one acknowledging the value of standards. The other issue of concern for him was the unfunded mandate about keeping versions of data in perpetuity. There is a NASA policy in response to the OSTP about public accessibility and publications; however, the worrisome issue is whether the reference data in a paper has a certain pedigree that may or may not be preserved in the archive. Who owns the final data? Which version of the software? There is never enough disk space. Another useful finding would be a statement that having data active, on-line, is a good thing. Data, especially taxpayer-funded data, shouldn’t be buried in someone’s desk drawer. NASA tends to get pushback from principal investigators on this issue; they feel their data is proprietary. Dr. Hayes agreed to write up an item for Dr. Smith.

Dr. Kinter commented that there is no data standard for models, and that this is a challenge for the future; he wondered how much interaction there is between the Heliophysics community and the tropospheric and weather communities. Dr. Hayes felt there was not much interaction, certainly not at the tropospheric level. There are meetings ongoing, however, and HPD would be open to anything the other communities have that can be used. The variables may be different, but it is something that could be explored. Dr. Walker mentioned that the National Science Foundation (NSF) is looking into data assimilation. Dr. Holmes noted that the community had looked at compatibility between Earth Science and Heliophysics data ten years ago, and stopped because of data sparseness. Dr. Neal Hurlburt agreed that the effort was still at the case-study level. IRIS is a good example of where we were forced to use models. Dr. Kinter noted that there are also ocean data assimilations that have a similar problem with data sparseness. The tropospheric problem has progressed well during the last decade, and can accommodate data sparseness a little better. GSFC has some expertise here. Dr. Holmes asked Dr. Kinter to provide POCs at Goddard. Dr. Walker mentioned that the Planetary Data System (PDS) has begun a study of archiving models, as has the Community Coordinated Modeling Center (CCMC), and that there is European work in both Heliophysics and Planetary at the University of Paris; these can provide useful Lessons Learned.

Science Committee Greetings

Science Committee Chair, Dr. Bradley Peterson, addressed the committee, thanking members for their important contributions. He noted that time was a pressing issue, and urged the BDTF to focus on finding commonalities and best practices across the subdisciplines, and building on the existing infrastructure only if it is useful. He asked the membership to regard the NASA budget as a zero-sum game, as NASA will buy into recommendations only if they are affordable, or if they are worth giving up something for. Eating into the budget for missions and research would be an undesirable outcome. Dr. Peterson suggested that the BDTF consult with subcommittee chairs when useful, in order to iterate ideas across the Science Committee, subcommittees, and BDTF.

Big Data and Earth Science

Dr. Kevin Murphy presented an overview of the Earth Science Data Systems program, and stated that regardless of varying definitions of big data, Earth Science has it, as well as a large user base. Objective 2.2 of the 2014 NASA Strategic Plan informs the usage of Earth Science data to form a view of Earth that can be used across disciplines: ocean, atmosphere, cryosphere, etc., and their interactions. The Earth Observing System Data and Information System (EOSDIS) is the largest component of the Earth Science data system, and is associated with the competitively selected programs, Making Earth System data records for Use in Research Environments (MEaSUREs) and Advancing Collaborative Connections for Earth System Science (ACCESS). EOSDIS works internationally and among the federal agencies to get data to the public, and processes data from level 0 to higher products to make them available to users.

EOSDIS was initiated in 1990, incorporating heritage datasets in 1994 from satellites, aircraft and in-situ sensors (e.g. flux towers), and was designed to handle a terabyte of data per day. EOSDIS reprocesses data quite often as instruments deteriorate or as better signal processing methods become available. There are about 15 petabytes (PB) of data currently available, all of which interoperate with other agencies and archives through established standards. EOSDIS has a distributed framework, and has had an open data policy since 1997. The system generates biophysical products, geolocates them, and distributes them to the end users. EOSDIS has an extensive volume of data represented in over 9200 data types, which range over human dimensions, land, atmosphere, ocean dynamics and the cryosphere. The system works closely with missions in formulation and development in order to prepare data plans.

EOSDIS is spread out over the US. Mission data are processed by Science Investigator-led Processing Systems (SIPS), and are then passed along to the Distributed Active Archive Centers (DAACs) to support the user base. DAACs are located at host organizations that are widely recognized by the community, and each DAAC has a working group that helps to direct how the DAACs work. There is also a Program Scientist within each DAAC that roughly aligns with each subdiscipline. The two components overseeing the DAACs are primarily Headquarters for management and the Goddard Space Flight Center (GSFC) for implementation. The Earth Science Data and Information System (ESDIS) manages the coordination of EOSDIS activities to avoid duplication of efforts. ESDIS holds annual meetings and continually takes input through weekly teleconferences and annual meetings with DAAC managers and DAAC systems engineers. Roughly 160-180 people go to the annual meetings. The EOSDIS infrastructure also ties together users and DAACs through earthdata.nasa.gov, a common metadata repository (CMR), Global Imagery Browse Services (GIBS), EOSDIS Metrics System (EMS), and various user support tools.

EOSDIS performs an annual customer satisfaction survey, and also has DAAC User Working Groups, which receive regular feedback. EOSDIS metrics from 2015 show 9462 unique data products, and 2.6M distinct users of EOSDIS data and services. EOSDIS distributes about twice as much data as it ingests. In 2015, the system received an ACSI score of 77 (considered very good). The trend for product delivery is increasing. EOSDIS converts high-value products into imagery, such as the NASA Worldview website, which uses data from the Aqua/Terra/Moderate Resolution Imaging Spectroradiometer (MODIS) satellites, and NOAA’s Visible Infrared Imaging Radiometer Suite (VIIRS). Worldview works much like Google Earth; users can zoom in and go back in time. Users can also overlay data, such as the SO2 cloud over an erupting volcano, and find specific data such as fire hot spots. EOSDIS holds Senior Reviews to evaluate the performance and scientific merit of the various subsystems.

Dr. Walker noted the many highly derived data products, and asked how EOSDIS kept up with evolving algorithms. Dr. Murphy explained that standard products are produced in collections, and EOSDIS is currently going from MODIS collection 5 to collection 6, reprocessing data. Collection 5 will be maintained until collection 6 is complete. Science teams will determine when the new collection is done. Dr. Holmes asked what the BDTF could do for Earth Science. Dr. Murphy felt that NASA received little recognition for this important work, as it is generally not well understood. The data product ramp is currently limited by adapting to input from new instruments. EOSDIS has to put algorithms closer to the data in a way that allows unimpeded access to products; how to do this is still an open question. NASA may also need to learn how to work with commercial high-performance computing groups.

Dr. Hurlburt asked how many of the 2.9M distinct users were part of the active (science) community. Dr. Murphy replied that people who use a lot of the data will frequently use all of it (operational users who use Level 1 data). The numbers of graduate students, etc., are hard to estimate. Dr. Kinter asked how EOSDIS dealt with the budget realities. Dr. Murphy noted that EOSDIS recognizes the need to develop or adopt standardized-enough components to allow people to develop their own tools, a strategy that saves both time and effort. NASA doesn’t want to be the first adopter or the last. The strategy depends on the community. EOSDIS keeps the principle of open application programming interfaces (APIs), and open access. The community is well aware of the data policy. Dr. Walker asked about the extent to which NASA provides interoperability in its joint work with NOAA. Dr. Murphy explained that NASA operates with NOAA on a catalogue level, uses open software sourcing, shares observations, and works closely with NOAA on the Climate Initiative and in the airborne program.

Supercomputing and Big Data

Dr. Tsengdar Lee, Program Manager of the Earth Science Division Supercomputing Program, presented an overview of the program, and the NASA vision for future computing services. NASA has two supercomputing centers, one at Ames Research Center (ARC), which serves the entire agency, and one at GSFC, which serves primarily Earth Science. ARC supports agency-wide activities, from launch vehicles to general relativity.


InAugust2015,theNASAFlagshipcomputer,Pleiades,reachedahalfbillionSBUs(computingcycles)deliveredaccumulativelyfrom2008,translatingtonearly$300Mofservices,atacostofroughly26centsperSBUin2015.NASAcontinuestogrowthesystem,relyingonMoore’slawtogoforward(Dr.Leenotingthatsomearguethatthelawhascometoitsend).Scientificandengineeringeffortswillgrow,thusNASAwillhavetocomeupwithauserpolicybecausethesystemhasbecomeoversubscribed.TheROSESselectionprocessisnowbeingtightlycoupledtotheavailabilityofcomputingtime.ForEarthScienceimagingandmodeling,thesystemcanpushtheresolutiondownto1.5kmcurrently;theholygrailofatmosphericscienceis0.5km.Theworkloadischanging,shiftingintodataprocessing.Asanexample,theKeplermissionisusingPleiadestosupportvalidationfornewexoplanets.Thishasbecometheprimaryavenueforproducingdiscoveriesinthatarea.Dataassimilationsystemsarebeingusedtocreatephysicallyconsistentlong-termdatasets,from1979tothepresent,andarealsodownscalingtohigherresolutiondataforclimatestudies.TheOrbitingCarbonObservatory(OCO-2)ispresentingdataprocessingchallenges.NASAisdoingadatare-processingcampaignwithnewalgorithms,withabout60%ofthisworkbeingdoneonthesupercomputerand40%ontheAmazoncloud.HighEndCapabilityComputing(HECC)isbeingusedtoclear5yearsofanunmannedaerialvehiclesyntheticapertureradar(UAVSAR)dataprocessingbacklog,toreducelatency.Processingismovingintothebigdataarea,pitchinghigh-performancecomputingagainstLargeScaleInternet.Canhigh-performancecomputing(HPC)beusedasaprivatecloud?Howdoweputtogetheranarchitecturetoprocess,analyzeandminedata?Currently,datastorageanddatamanagementisthecoreofthebusiness,withdatainthemiddle,andalltheserviceandprocessingsurroundingthedataset.AScienceCloudarchitectureideallyprovidesanagile,highlevelofsupport,withthesystemowningthedata,usingadatamanagementsystem,dataanalyticsservice,openstack,etc.NASAisconstantlylookingatnewtechnologies:cloudandvirtualization,high-performanceobjectstore,andSciDB(thelatterheavilysupportedbyDARPA).Thesciencebenefitofasciencecloudhashel
pedtovalidatemanytypesofmeasurements,suchasglobalfires.CouplingHPCandcloudcomputingcancreateabest-of-breedcomputingserviceenvironment.HECC’spathtogrowthisconstrainedatpresent;NASAhasmaxedouttheinfrastructureintermsoffacilities,building,water,andelectricity,andisengagedinastudyonhowtobuildnext-generationdatacenters.Drs.Holmes,Walker,andHurlburtexpressedconcernsaboutuserconstraints,giventhat70-80%oftheprogram’sworkloadrequiresatightlycoupledprocess.Dr.LeeagreedtowriteastatementonthisstateofbeingforusebytheBDTF.Headdedthatcertaintypesofworkloadscouldbecloud-computed,andNASAisexploringthoseoptionsaswell.Dr.ClaytonTinoaskedifDr.Leehadanysenseofthecapacitytheprogramwaslosingduetomixedmodeservices.Dr.LeerepliedthatNASAwasdoingthemixedworkloadbecauseofthedemand.Someoftheprojectsdidn’tplanfortheirHPCuse,andneedtodoabetterjobofsuchplanninginthefuture.AstrophysicsandBigDataDr.PaulHertz,DirectoroftheAstrophysicsDivision(APD)presentedBigDataneedsasviewedbytheAstrophysicscommunity.Astrophysicsaddressestheevolutionofthe

Page 13: NASA Big Data Task Force February 16 minutes · 2020-04-28 · NASA Advisory Council Ad Hoc Big Data Task Force, February 16, 2016 6 NASA science data repositories, with that exploration

NASAAdvisoryCouncilAdHocBigDataTaskForce,February16,2016

13

universe, the origin of galaxies and stars, and the question of whether we are alone in the universe. The APD is driven by the Decadal Surveys, science roadmaps, and implementation plans to support its ability to handle large data questions. Sixty percent of the budget supports developing space missions, 20% operations, and another 5-10% is dedicated to research and development. Data archives are funded as an infrastructure investment. APD’s current suite of missions ranges from many small missions, such as the Neutron star Interior Composition Explorer (NICER), to the large space telescopes, Hubble and the future James Webb Space Telescope (JWST). The next large flagship after JWST is the Wide-Field Infrared Survey Telescope (WFIRST), whose prime science is to understand dark energy and dark matter, which can only be done by measuring the small impact these forces have had in the history of the universe, by looking at large swaths of the universe; i.e., looking at large amounts of data to see small perturbations. Thus WFIRST will be computationally intensive. WFIRST will be looking at millions of galaxies, searching for evidence of microlensing, which is also computationally intensive. Euclid, a European mission with similarities to WFIRST, will also create large data sets. Another future project is the ground-based Large Synoptic Survey Telescope (LSST). All three of these projects will be combining their data in pixel-by-pixel analysis. The various agencies are studying the best way of carrying out this data processing, a decade in advance of the need. A white paper on this topic can be found at arxiv.org/abs/1501.07897 (Jain et al., “The Whole is Greater Than the Sum of the Parts”).

All NASA Astrophysics science data are open to the community, and all data centers go through the Senior Review process every two years. All astrophysics archives share a set of common protocols and standards, allowing the user community to combine data from multiple ground and space observatories. The NASA Astrophysics Virtual Observatory (NAVO) manages the protocols, while NSF funds the tools. The three Astrophysics archives manage the NAVO backbone. APD recently held a Senior Review of the archives, and recommended that they become more proactive and aggressive about evolving into the future (increasing bandwidth, keeping up with technological advances, preparing for large volumes of data). Some types of computing might be more expensive in the cloud, and it must be determined which are which. NASA and NSF are currently funding Theoretical and Computational Astrophysics Networks (TCAN). Dr. Hertz was not aware of any issues thus far on getting time on NSF supercomputers. (Dr. Lee noted that NASA civil servants can’t typically get on NSF supercomputers, but university Principal Investigators can.) Another computationally intensive area is laboratory Astrophysics: interpreting x-rays from Chandra, far infrared data from Herschel, and visible-to-ultraviolet Hubble spectral lines. These atomic line calculations are needed for creating line catalogues.

Dr. Tino asked if underestimation of computing time were a theme in APD. Dr. Hertz explained that processing Kepler data has been more computationally intensive than was appreciated at the beginning of the mission, but that a new mission, the Transiting Exoplanet Survey Satellite (TESS), which has a similar data product to Kepler, had planned according to lessons learned on the need for anticipating computing time. Dr. Lee noted that NASA is also making tighter connections between HPC and the budget-planning process. In terms of
recommendations, Dr. Hertz noted that Astrophysics was a minority user of HPC, and was interested in areas where it could leverage existing assets, or in commercial or other research that can improve Astrophysics science. APD has partnered with DOE in the past, when they are interested in the science problem. DOE is not interested in exoplanets, but it is interested in dark energy and dark matter; therefore APD will be working with them on joint WFIRST-Euclid-LSST analysis.

Public comment period

No comments were noted from the online audience. At NASA Headquarters, Tripp Corbett made some comments from the vendor perspective, saying that he was noting a bit of disconnect, as tools are available at NSSC that should be more widely circulated. At a recent NASA meeting, he had heard a briefing on working with the cloud-computing community in a budget-conscious way, and agreed to send more specific information to the BDTF.

Other Federal Big Data Initiatives (NSF)

The NSF Big Data Hubs Program director, Dr. Fen Zhao, briefed the BDTF by phone on her program, which is funded at about $20M a year. There are related programs at NSF that look at Big Data infrastructure, pilot and implementation efforts, and Education-related activities such as the Big Data Work Force ($30M a year looking at traineeships). The Big Data Hubs program looks at the complex relationships between data projects, end users, and commercial entities, and involves cross-disciplinary efforts and data sharing across the research ecosystem. The inspiration for BD Hubs came from OSTP’s 2012 Big Data Initiative, in which a Big Data Partnerships Workshop initiative resulted in 29 new partnerships, with 90 organizations participating, representing areas such as energy, healthcare, and finance. The initiative chose various issues such as climate change and personalized healthcare, and NSF initiated the BD Hubs effort to allow these partnerships to gel. BD Hubs was launched in March 2015, with four hubs in four regions of the US, and made awards in September 2015 (Columbia University in the Northeast, Georgia Tech and x in the South, UIUC in the Midwest, and University of SD, UC Berkeley, and the University of Washington in the West). Hubs are differently constructed consortia; the current phase is allowing hubs to start up their activities. The projects are called BD Spokes, which represent specific activity within each topical area, such as a platform for sharing neuroscience data. The spokes are funded at $1M over three years, and are meant to leverage existing efforts. The Hubs are currently organizing drafts for each spoke, and full proposals are due this month. A large number of ideas came in on smart cities and the Internet of Things; the food/energy/water nexus; and human healthcare. NSF intends to fund these proposals this fiscal year, and there are latent projects waiting in the wings that can help transition some of these ideas to practice. NSF hopes to do this again next year.

Dr. Holmes offered kudos to NSF for setting up this open-ended effort. Dr. Zhao noted that there is an end goal of sorts, as each Hub is responsible for generating 29 projects at the end of three years. This idea is not completely novel at NSF. The Foundation hopes to fund each spoke for a second three years, to have them become self-sustaining. A similar effort was undertaken under US-Ignite, to support networking. The
idea is to look for the unknowns, as interesting things can happen in these large, multiple collaborations. Everyone brings their own physical infrastructure, and also tries to identify service providers. Dr. Holmes noted that most of the Hubs were geographically close to NASA PIs. Drs. Holmes and Zhao agreed that a closer collaboration would be ideal.

Planetary Science Big Data

Dr. Michael New, Program Scientist for the Planetary Data System (PDS), presented the needs of Big Data from the planetary perspective. Most planetary data work is based at GSFC. Planetary Science Division (PSD) data policies state that all science data returned from planetary missions belong to the public domain. Any exclusive data access cannot exceed six months. In funded science research, any data necessary to replicate published research results, that are also the product of a NASA award, must be made immediately available to the public. The planetary data environment includes PDS, the Planetary Cartography Program (PCP; USGS), the Minor Planets Center (MPC; Harvard), and the Astromaterials Curation Facility (ACF; Johnson Space Center). Data come from sources ranging from ground-based assets, individual investigators, mapping, data analysis (e.g., trajectories), sample returns, and ANSMET (Antarctic meteorites) to atmospheric dust. The output of the PDS goes primarily to taxpayers, educators, and talented amateurs. At the ACF, NASA stores space-exposed hardware, lunar samples, cosmic dust samples, and Hayabusa (asteroid) samples. NASA is currently re-engineering its sample catalogue to make these samples available online. The MPC is responsible for small bodies, and the orbits of minor planets and comets. The PCP maintains the cartographic capability for mapping the planets and the Moon, and develops and maintains the Integrated System for Imagers and Spectrometers (ISIS), which enables things like spectrographic maps of Io. ISIS is preparing to incorporate an open-source visualization tool, the SPICE-based Cosmographia. (“SPICE” is a NASA information system whose use extends from mission concept through post-mission data analysis; it helps to correlate individual instrument data sets with those from other instruments on the same or on other spacecraft.)

PDS is a federated archive, with data distributed across the country; its discipline nodes were recently re-competed. Management of the system as a whole is also based on a federated model. Planetary data are managed by planetary SMEs. Data are physically stored at the nodes, and the deep archive is maintained at the NASA Space Science Data Coordinated Archive (NSSDCA). The Navigation and Ancillary Information Facility (NAIF) implements standards and tools that are needed to understand the motion of celestial objects. In planetary data sets, everything is moving relative to everything else: spacecraft, instrument, Earth, and Sun, all of which need time conversion standards. The collection of these variables is called Observation Geometry (OG). The current PDS is distributed across six nodes, which after a recent competition are now in their first year of a 5-year Cooperative Agreement. The PIs at each node collectively form a management council, and provide input about standards and decision-making. PDS-4 has just recently been rolled out. It is an XML-based, model-driven, service-oriented architecture, and a modern technical foundation for planetary science data. Existing PDS-3
products will be converted to PDS-4 when practical and sensible. The planetary data systems of the European Space Agency and JAXA are both adopting PDS-4 standards. The total volume of PDS is about 1 PB. Almost all computations are performed on individual workstations. PDS has just started its next 10-year roadmap, and will be announcing an opportunity to self-nominate in early March. Areas of improvement to be addressed in the roadmap include: simplifying and improving the pipeline; improving search capability; developing more useful metrics; improving tools for archiving small data sets; and improving archive preparation and documentation, especially for non-mission data providers. Relevant websites are naif.jpl.nasa.gov and pds.nasa.gov.

Dr. Hurlburt asked about PDS metrics. Dr. New admitted to having poor metrics of usage and users, and noted that the roadmap effort would help to identify the metrics PDS wants, and to adapt the system to provide them. Dr. Beebe commented that the International Planetary Data Alliance accepted SPICE as their data tool at their last meeting, a favorable indicator. Dr. New, when asked about Big Data needs, allowed that there were not many specific areas in planetary science, with the exception of magnetospheric and plasma data, or when generating very high-fidelity gravity models. The lunar gravitational mapping mission, GRAIL, is currently working on a gravity field model on the HPC. He hadn’t heard about any issues with the pipeline associated with the GRAIL work.

Dr. New felt the BDTF could direct a question to the Agency as to how it would like to handle the storage of grant data. PSD needs a clear, direct statement on this issue, which needs to be informed at the Agency level because it will be a response to an OSTP directive. There are 1500 grantees in PSD; it would take a labor-intensive effort to store all their data. Another question is what kind of data PDS is expected to archive. Dr. Holmes noted that the directive applies to the other disciplines as well, and instructed Dr. Smith to note this as an issue. A meeting participant noted that the grant disposition question was being addressed in the roadmapping task, entailing a community-based reappraisal of the subject over the next 6-9 months.

Discussion

Dr. Holmes followed up briefly with Dr. Lee on HPC, and asked what visibility existed for the program, and what the chances for collaboration with DOE Exascale might be. Dr. Lee identified himself as Chair of the High-End Computing Interagency Working Group (HECIWG), but noted that the Exascale computing facility is under the National Strategic Computing Initiative, which has a different governance. The HECIWG is meeting monthly at the moment, and Dr. Lee felt he could start vectoring the discussion in their direction. He noted that DOE sets up a process for eligibility; a task needs to have a certain profile, and x number of cores. The gate for eligibility to get on the DOE’s leadership computing systems, however, is higher than NASA’s entire system. NASA is far behind NSF and DOE in the supercomputing arena. NASA’s leading system is less than 5 Tflops. Dr. Holmes considered having the BDTF make a finding on the matter, as NASA is working on projects of national significance. Dr. Tino asked if Exascale was specifically designed to solve DOE problems, with specifically implemented architecture. Dr. Lee reported that DOE has a co-design concept, and they bring in an application that works on the exascale system.
They are considering climate change as a co-designed system. DOE doesn’t have the interoperability requirement. Dr. Walker commented that DOE has specific problems, while NASA is more broad. Dr. Holmes noted that DOE is addressing both astronomy and climate, and that while some of the scales are different, the physics are similar. Dr. Tino felt that NASA should either focus on products and services, or accept generality. Dr. Holmes suggested NASA managers address utilization models at future meetings. Dr. Kinter asked what HPC would use Big Iron for after its nominal 3 years of operation. Dr. Lee said that NASA plans to repurpose Big Iron after 3 years, back into a generalized cluster. NASA is still limited by facilities with respect to power and cooling. Dr. Holmes asked Drs. Tino and Kinter to write a talking point on the facilities issue.

BDTF members raised some general topics for further exploration. Dr. Tino noted that each of the presenters had adopted some form of standard, illustrating that people recognize that standards do matter. From a management standpoint, however, the subdisciplines had inconsistent metrics on users, and he questioned why archives had to be maintained in the absence of usage. Dr. Walker explained that some data have extremely long lives; every time we get a new mission to Jupiter, for instance, Voyager and Pioneer data sets are in demand again. It’s critical that some of these data sets be safeguarded. Dr. Holmes noted that the Senior Review might be a vehicle for determining which data should be kept. Dr. Hurlburt suggested user metrics inform these sorts of judgments. Dr. Tino felt user surveys were not always effective, and that metrics on actual use would be more useful in getting smart on what data to store. Dr. Holmes asked Dr. Tino et al. to flesh out this thought and do more research in advance of the next meeting. Dr. Beebe added that one also needs to consider the intrinsic sizes of communities and their stability; they also tend to move around when major missions arise. Dr. Holmes was surprised at the lack of a clear vision for the future and asked Dr. Hurlburt to write a finding on this topic.

Dr. Holmes asked Dr. Smith to sound out the Science Mission Directorate to determine the level of concern over grant data storage. Dr. Beebe reported that it was a major concern that has already reached the top level of the administration, which had established workshops for people preparing for federal grants. Dr. Holmes gave an action to Dr. Smith to clarify Dr. Murphy’s statement on the use of open source software, and asked BDTF members to examine the NSF nodes of the BD Hub effort, to determine how close they are to co-located NASA PIs. Dr. Holmes asked that the next BDTF meeting take place at GSFC for 2.5 days in the April-May time period, and suggested considering a site visit to ARC in the future, to include some interaction with Silicon Valley. Dr. Smith reported that she would be working on an extension of the TOR, off-line. Dr. Holmes adjourned the meeting at 4:59 pm.

Appendix A Attendees

Ad Hoc Big Data Task Force Members
Charles P. Holmes, Chair, Big Data Task Force
Reta Beebe, New Mexico State University (via telecon/Webex)
Neal Hurlburt, Lockheed Martin
James L. Kinter, George Mason University (via telecon/Webex)
Clayton Tino, Virtustream, Inc.
Ray Walker, University of California at Los Angeles
Erin Smith, Executive Secretary, NASA HQ

NASA Attendees
Louis Barbieri, NASA
Dan Crichton, NASA JPL
Elaine Denning, NASA HQ
Deborah Diaz, OCIO NASA
John Evans, NASA
T. Jens Feeley, NASA HQ
Navid Golpayegani, NASA
Jeffrey Hayes, NASA HQ
Paul Hertz, NASA HQ
Tsengdar Lee, NASA HQ
Edward Masuoka, NASA
Duane McMahon, NASA
Tom Morgan, NASA HQ
Kevin Murphy, NASA HQ
Michael New, NASA HQ
Herbert Schilling, NASA
Grif Schilly, NASA
John Sprague, NASA OCIO
Elizabeth Yoseph, NASA

Non-NASA Attendees
Joseph Bredenkamp, NASA retired
Terry Blankenship, Booz Allen Hamilton
Jung Byun, Booz Allen Hamilton
Chiehsan Cheng, Global Science and Technology
Tripp Corbett, ESRI
Joseph Dohry, Booz Allen Hamilton
Alex Duner, Medill News, Inc.
Grace Hu, OMB
Eric Feigelson, Penn State University
Robert Kohon, Novetta
Bradley Peterson, OSU, Chair, NAC Science Committee
Amy Reis, Ingenicomm, Inc.
Alyssa Retski, Lobbyit.com
Marcia Smith, SpacePolicyOnline
Connie Spittler, Global Science and Technology
Geordan Tilley, Medill News, Inc.
Joan Zimmermann, Ingenicomm, Inc.

Appendix B Membership

Dr. Charles P. Holmes, Chair, NASA HQ (Retired)
Dr. Reta F. Beebe, New Mexico State University
Dr. Neal E. Hurlburt, Lockheed Martin Space Systems Company
Dr. James L. Kinter, George Mason University
Dr. Clayton P. Tino, Virtustream Incorporated
Dr. Raymond J. Walker, University of California, Los Angeles
Dr. Erin Smith, Executive Secretary, NASA Headquarters

Appendix C Presentations

1. Big Data Task Force Charter/Subcommittee Feedback; Erin Smith
2. Legacy for the NAC Information Technology Infrastructure Committee; Charles Holmes
3. Heliophysics Division Big Data Needs; Jeffrey Hayes
4. Big Data and Earth Science; Kevin Murphy
5. Supercomputing and Big Data at NASA; Tsengdar Lee
6. Astrophysics Division Big Data Needs; Paul Hertz
7. Other Federal Big Data Initiatives (NSF); Fen Zhao
8. Planetary Science Big Data Needs; Michael New

Appendix D Agenda

Ad Hoc Big Data Task Force

of the NASA Advisory Council Science Committee

Inaugural Meeting February 16, 2016

NASA Headquarters

Glennan Conference Room, 1Q39

Agenda (Eastern Standard Time)

Tuesday, February 16

8:00 – 8:30     Opening Remarks / Introduction of Members     Dr. Erin Smith, Dr. Charles Holmes
8:30 – 9:15     Big Data Task Force Charter / Subcommittee Feedback     Dr. Erin Smith
9:15 – 9:30     BREAK
9:30 – 10:15    Legacy from NAC IT Infrastructure Committee     Dr. Charles Holmes
10:15 – 10:30   Discussion
10:30 – 10:45   BREAK
10:45 – 11:15   Planetary Science Big Data     Dr. Michael New
11:15 – 11:45   Heliophysics Big Data     Dr. Jeffrey Hayes
11:45 – 12:45   LUNCH
12:45 – 1:00    Greetings from the Science Committee     Dr. Bradley Peterson
1:00 – 1:30     Earth Science Big Data     Dr. Kevin Murphy
1:30 – 2:00     Supercomputing Big Data     Dr. Tsengdar Lee
2:00 – 2:30     Astrophysics Big Data     Dr. Paul Hertz
2:30 – 2:45     Public Comment
2:45 – 3:00     Other Federal Big Data Initiatives (NSF)     Dr. Fen Zhao
3:00 – 3:10     BREAK
3:10 – 3:30     Work Plan and Future Meetings
3:30 – 5:00     Discussion / Findings / Recommendations
5:00            ADJOURN

Dial-In and WebEx Information

For entire meeting February 16, 2016

Dial-In (audio): Dial the USA toll-free conference call number 1-800-988-9663 or toll number 1-517-308-9427 and then enter the numeric participant passcode: 4718658. You must use a touch-tone phone to participate in this meeting.

WebEx (view presentations online): The web link is https://nasa.webex.com, the meeting number is 999765122, and the password is BigD@T@16.

* All times are Eastern Standard Time *