Centres of Excellence view on: “What are the important ... - Peter... · ETP4HPC SRA-3 Kick-off...
ETP4HPC SRA-3 Kick-off meeting, IBM IOT, Munich, March 20th 2017 · Peter Bauer & Erwin Laure 4 CoE
Centres of Excellence view on:
“What are the important research goals for the next 5 years”
1st Generation of CoE
EoCoE - Energy oriented Centre of Excellence for computer
BioExcel - Centre of Excellence for Biomolecular Research
NoMaD - The Novel Materials Discovery Laboratory
MaX - Materials design at the eXascale
ESiWACE - Excellence in SImulation of Weather and Climate in Europe
E-CAM - An e-infrastructure for software, training and consultancy in simulation and modelling
POP - Performance Optimisation and Productivity
COEGSS - Center of Excellence for Global Systems Science
CompBioMed - A Centre of Excellence in Computational Biomedicine
Ecosystem: cPPP
• Collaborate with PRACE on:
  ◦ best practices
  ◦ software environment
  ◦ organizational requirements for science domains
  ◦ training
• PRACE HLST
• Use PRACE resources for benchmarking etc.
• Use center competence for co-design and exascaling (all tiers)
• Help communities access PRACE resources
• Collaborate with FET research projects to:
  ◦ harden and integrate results to production quality
  ◦ provide real-world test cases
• Provide for ESDs:
  ◦ provide requirements
  ◦ optimize software for targeted ESDs (will also be important for smaller scales)
  ◦ co-design
Goals of CoE
Ensure European competitiveness in the application of HPC to address scientific, industrial or societal challenges:
• Excellence in research, development and service;
• User-driven, with the application users and owners playing a decisive role in governance;
• Integrated: encompassing not only HPC software but also relevant aspects of hardware, algorithms, data, workflows, connectivity, security, etc., covering the full scientific/industrial workflow and addressing full-scale, widely used applications with proven impact, not just application kernels or mini-apps;
• Multidisciplinary: with teams that combine application domain and user expertise together with HPC system, software and algorithm expertise;
• Distributed with a possible central hub, federating capabilities around Europe, exploiting available competences, and ensuring synergies with national/local programmes;
• Showing a clear path towards Exascale covering both computing and extreme data;
• Committed to a culture of collaboration and sharing of best practice among CoEs on cross-cutting topics, such as co-design and Exascale technologies, software sustainability, and potential business models;
• Committed to collaborate with the upcoming Extreme Scale Demonstrators, providing them with a high performance and scalable application base;
• Actively contributing to economic and societal benefit, by facilitating high-impact research and collaborating with industry and SMEs.
Present CoE
Vertical (domain) CoEs, which provide services to well defined communities:
• Software development, support & consultancy, training
• Frontier developments towards exascale
→ requirements in Material Sciences, Weather & Climate or Bio may be very different
Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:
• Software development, support & consultancy, training
• Performance analysis
→ potentially complex interaction matrix
Future CoE
Vertical (domain) CoEs, which provide services to well defined communities:
• Current themes
  + Engineering?
  + Fundamental physics?
  + Environmental sciences (adaptation to environmental change)?
→ Cover entire value chain (from science to industry)
→ Compute and data (from exaflop to exabyte)
Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:
• Current themes
  + High-performance data analytics?
  + Software sustainability?
  + Mathematics and algorithms?
→ Enhance culture of collaboration (CoE matrix)
Future CoE: Considerations
• Potential merging of existing CoEs (cf. Flagships):
  ◦ smaller number of domains that have shown a clear path to exascale/co-design, or larger variety?
  ◦ may provide more efficient cross-benefit for several communities towards exascale, but effectiveness may suffer (weak links affect wider area)
  ◦ providing support for merged communities will be challenging
• Importance of Exascale:
  ◦ addressing key science challenges is most important (≠ Exaflop)
  ◦ technological developments (e.g. co-design → EsD → top 3 in 2022) run away from applications
  ◦ only few CoEs have really big data tasks (e.g. ESiWACE)
• Stimulation of SME:
  ◦ smaller dedicated efforts may be more effective (e.g. SHAPE, Fortissimo)
  ◦ unified software development vs specialized SME niches
• Sustainability:
  ◦ assimilating cutting-edge FET developments
  ◦ business model needs simple/realistic approach (and public funding for some time to come)
→ Manage expectations of what CoEs will be able to achieve & sustain, given the available funding
CoE in SRA-2
• Section 8, EsD introduction, page 67: “These EsDs should provide platforms deployed by HPC centres and used by CoEs for their production of new and relevant applications.”
• Section 8.2, Proposal of ETP4HPC for EsD Calls: “This and other hardware characteristics (energy efficiency, I/O bandwidth, resiliency, etc.) will be detailed in the 2017 release of the SRA, also taking into account results from the FETHPC projects and requirements from the CoEs.”
• Section 9.3, Centres of Excellence for Computing Applications: “The Centres of Excellence in Computing Applications (CoEs) form one of the three pillars of the European HPC Ecosystem and represent the European Application expertise […]. ETP4HPC will be working to include the CoEs in the processes of the contractual Public-Private Partnership for HPC and synchronise their efforts with those of the other two pillars of the European HPC Ecosystem (i.e. ETP4HPC and the FETHPC projects, PRACE).”
Example: ESiWACE Science Challenge
Target for addressing key science challenges in weather & climate prediction: Global 1-km Earth system simulations @ ~1 year/day rate
[Figure: two panels labelled 10 km and 1 km]
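As a rough, back-of-the-envelope illustration of what the step from 10 km to 1 km implies, the sketch below estimates the number of horizontal grid columns needed to cover the globe at each spacing, plus a crude cost ratio. The function names, the uniform-grid approximation, and the assumption that the time step shrinks in proportion to the grid spacing (with the vertical levels unchanged) are illustrative assumptions, not figures from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
SIMULATED_DAYS_PER_WALLCLOCK_DAY = 365  # the "~1 year/day" target rate


def horizontal_columns(grid_spacing_km: float) -> float:
    """Approximate number of grid columns on a uniform global grid (assumption)."""
    surface_area_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return surface_area_km2 / grid_spacing_km ** 2


def relative_cost(coarse_km: float, fine_km: float) -> float:
    """Crude cost ratio between two grid spacings.

    Assumption: cost scales with the number of columns (refinement squared)
    times the number of time steps (refinement to the first power).
    """
    refinement = coarse_km / fine_km
    return refinement ** 2 * refinement


if __name__ == "__main__":
    for spacing_km in (10.0, 1.0):
        print(f"{spacing_km:>4} km grid: ~{horizontal_columns(spacing_km):.2e} columns")
    print(f"cost ratio 10 km -> 1 km: ~{relative_cost(10.0, 1.0):.0f}x")
    print(f"target speed: {SIMULATED_DAYS_PER_WALLCLOCK_DAY} simulated days per wall-clock day")
```

Under these assumptions the 1 km grid has on the order of 5×10^8 columns, a hundred times more than at 10 km, and the same simulated period costs roughly a thousand times more before any change in vertical resolution or model complexity is considered.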
CoE and SRA-3
Domain CoE: What are the key scientific challenges and what does it take to achieve them?
CoEs (table columns): BioExcel, CompBioMed, MaX, ESiWACE, E-CAM, POP, CoEGSS
HPC System Architecture and Components
• BioExcel: Large width vector units, low-latency networks, high-bandwidth and large memory; fast CPU<->accelerator transfer rates, heterogeneous acceleration, floating-point
• CompBioMed: Hybrid systems, GPUs, high-bandwidth memory
• MaX: Scientific challenges: accurate computations requiring more powerful HPC systems, heterogeneous systems designed to optimize end-to-end workflows
• ESiWACE: High-bandwidth memory, networks, NVRAM
• E-CAM: High memory bandwidth, large RAM
• POP: Throughput oriented devices (vectors), memory architectures and how to use them, architectural support for runtimes, mechanisms to monitor progress and notify runtimes in cases of resource preemptions
• CoEGSS: Convergence between HPC & HPDA, NVRAM, fast networks
System Software and Management
• BioExcel: Dynamic (task) scheduling, support for workflows
• CompBioMed: Dynamic scheduling, urgent computing
• MaX: Virtual Machine model supported
• ESiWACE: Dynamic scheduling, compilers
• E-CAM: Cross compiling, archiving tools
• POP: Dynamic, interactive use of available resources, tight and bidirectional communication/cooperation between job schedulers and runtimes
• CoEGSS: Dynamically scaling jobs, integration of HPC & HPDA, visualization (in situ), data analytics (in situ)
Programming Environment
• BioExcel: Standardization, portability, task parallelism, fast code driven by Python interfaces
• CompBioMed: Portability, ease of scale out, MPI extensions
• MaX: Support new MPI and OpenMP standards + new development tools like Python
• ESiWACE: Fast standardization, DSL
• E-CAM: Interactive testing, OpenCL support, sustainable support of standards
• POP: Programming model and runtime support for malleability, asynchrony/out of order task execution, hide heterogeneity and tolerate latency and variability, powerful performance analytics in tools, tools for task dependencies and memory access patterns, programmers' mindset from bottom-up latency dominated to throughput oriented mentality
• CoEGSS: Well-defined standards (and tools that implement the standards) for agent-based modeling, compilers, debuggers
Energy and Resiliency
• BioExcel: Distributed computing techniques to handle resiliency/fault tolerance
• CompBioMed: Reduced cost, improved fault tolerance
• MaX: Energy optimized workflows: HPC systems including energy monitoring and profiling.
• ESiWACE: Fault handling, less precision
• E-CAM: Energy aware algorithms
• POP: Better integration between algorithmic based fault detection techniques and mechanism in the infrastructure from detected errors
• CoEGSS: Not important for CoeGSS
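The POP entry above mentions algorithm-based fault detection. One classic instance of that idea is checksum-protected matrix multiplication, where extra checksum rows and columns let the result be validated after the fact. The sketch below is a minimal pure-Python illustration under stated assumptions: the helper names and the absolute tolerance are hypothetical, and it only detects an inconsistency; it does not locate or correct it, and it is not tied to any CoE code.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def with_column_checksum(a):
    """Return a copy of `a` with one extra row holding the column sums."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Return a copy of `b` with one extra column holding the row sums."""
    return [row + [sum(row)] for row in b]


def checksums_consistent(c_full, tol=1e-9):
    """Check the checksum row and column of a protected product.

    `c_full` is the product of a column-checksummed matrix and a
    row-checksummed matrix. `tol` is an assumed absolute tolerance.
    """
    data_rows = c_full[:-1]
    # Every row (including the checksum row) must sum to its last entry.
    for row in c_full:
        if abs(sum(row[:-1]) - row[-1]) > tol:
            return False
    # Every column (including the checksum column) must sum to the last row.
    for j in range(len(c_full[0])):
        if abs(sum(r[j] for r in data_rows) - c_full[-1][j]) > tol:
            return False
    return True


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c_full = matmul(with_column_checksum(a), with_row_checksum(b))
    print("clean result consistent:", checksums_consistent(c_full))
    c_full[0][0] += 1.0  # simulate a silent data corruption
    print("corrupted result consistent:", checksums_consistent(c_full))
```

The detection step is cheap compared with the multiplication itself, which is why this kind of check is a natural candidate for the tighter coupling with infrastructure-level mechanisms that the entry asks for.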
CoE and SRA-3 (cont’d)
Domain CoE: What are the key scientific challenges and what does it take to achieve them?
CoEs (table columns): BioExcel, CompBioMed, MaX, ESiWACE, E-CAM, POP, CoEGSS
Balance Compute, I/O and Storage Performance
• BioExcel: Post-processing on the fly, data-focused workflows, handling lots of small files in bioinformatics
• CompBioMed: Post-processing on the fly, easy transfer between storage tiers
• MaX: HT material science workload becomes quickly memory and I/O bound: systems with high IOPS and post-POSIX data objects are required.
• ESiWACE: Post-processing on the fly, multi-tier software
• E-CAM: Fast access of archive data, multi-platform workflows, multi-threaded applications for hybrid production/analysis applications
• POP: Integration of asynchronous I/O interface in programming model, better integration of programming models/languages and persistent storage
• CoEGSS: Converged systems, live data analytics, strong data movement capabilities
Big Data and HPC usage Models
• BioExcel: Proximity of data generation and analysis/visualization resources, workflows, machine learning for analyzing simulation data, high-throughput sampling
• CompBioMed: Analytics of simulation outputs, visualisation
• MaX: Workflows, intelligent data analytics
• ESiWACE: Recomputing, data analytics
• E-CAM: Fast access to data bases, data mining
• POP: Better integration between programming model and storage interface, more dynamic, interactive supercomputing practices, make users/programmers aware of cost/benefit of each individual data and computation for better resource/storage scheduling
• CoEGSS: HPDA platform support, algorithms/models for efficient HPDA
Mathematics and algorithms for extreme scale HPC systems
• BioExcel: Multi-scale algorithms, task-parallel algorithms, electrostatics solvers, ensemble sampling & clustering theory, ensemble simulations
• CompBioMed: Novel time stepping algorithms, automated implementation of multiscale computing patterns
• MaX: New algorithms avoiding synchronous (unnecessary) data dependency and exploiting unreducible data dependency tree (nesting), to improve concurrency and locality
• ESiWACE: Disruptive numerical methods (discretization), data placement
• E-CAM: Memory/cache aware algorithms, asynchronous algorithms, efficient handling of long range/collective correlations
• POP: Algorithm complexity (computation/communication), asynchrony and variability tolerance, algorithm based fault detection
• CoEGSS: Algorithms for efficient data analytics
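Several CoEs above ask for post-processing on the fly, meaning that diagnostics are computed while the simulation runs instead of writing every time step to storage first. A minimal sketch of that pattern is a streaming (Welford-style) mean and variance that keeps only three numbers in memory. The `RunningStats` class and the `fake_simulation` generator are hypothetical stand-ins for illustration, not part of any CoE code.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Streaming mean/variance (Welford's method); no samples are stored."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; defined as 0.0 for fewer than two samples."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def fake_simulation(steps: int):
    """Hypothetical stand-in for a model time loop yielding one diagnostic per step."""
    for step in range(steps):
        yield float(step % 5)


if __name__ == "__main__":
    stats = RunningStats()
    for value in fake_simulation(1000):
        stats.update(value)  # analysis happens in the loop, nothing is written out
    print(f"steps={stats.count} mean={stats.mean:.3f} variance={stats.variance:.3f}")
```

The same structure extends to per-grid-point fields or histograms; the point is only that the analysis consumes each step as it is produced, which reduces the pressure on I/O and storage tiers that the entries above describe.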
CoE and SRA-3/EsD
Assumption: EsD will be precursors of European exascale HPC facilities hosting novel technologies and software stack
→ EsD are FLOP focused, CoE application areas may not be!
If the key role of CoE is to provide the wider (domain) user community with expertise, software etc.:
1. How will the transition from ‘novel’ to ‘applicable’ be managed? What is the role of FET projects in this context? (there is also no certainty for sustainability of FET projects per domain)
2. How are key applications, to be run on EsD in operational mode, adapted to novel architecture, stack, programming models etc.? By whom? With what funding?
3. If future CoE become wider (less domain specific):
   a. how can the above be realized?
   b. will the horizontal CoE do the job?