Centres of Excellence view on: “What are the important research goals for the next 5 years”
Peter Bauer & Erwin Laure, 4CoE
ETP4HPC SRA-3 Kick-off meeting, IBM IOT, Munich, March 20th 2017

Page 2

1st Generation of CoE

EoCoE - Energy oriented Centre of Excellence for computer
BioExcel - Centre of Excellence for Biomolecular Research
NoMaD - The Novel Materials Discovery Laboratory
MaX - Materials design at the eXascale
ESiWACE - Excellence in Simulation of Weather and Climate in Europe
E-CAM - An e-infrastructure for software, training and consultancy in simulation and modelling
POP - Performance Optimisation and Productivity
CoeGSS - Center of Excellence for Global Systems Science
CompBioMed - A Centre of Excellence in Computational Biomedicine

Page 3

Ecosystem: cPPP

• Collaborate with PRACE on:
  § best practices
  § software environment
  § organizational requirements for science domains
  § training
• PRACE HLST
• Use PRACE resources for benchmarking etc.
• Use center competence for co-design and exascaling (all tiers)
• Help communities accessing PRACE resources
• Collaborate with FET research projects to:
  § harden and integrate results to production quality
  § provide real-world test cases
• Provide for ESDs:
  § provide requirements
  § optimize software for targeted ESDs (will also be important for smaller scales)
  § co-design

Page 4

Goals of CoE

Ensure European competitiveness in the application of HPC to address scientific, industrial or societal challenges:

• Excellence in research, development and service;
• User-driven, with the application users and owners playing a decisive role in governance;
• Integrated: encompassing not only HPC software but also relevant aspects of hardware, algorithms, data, workflows, connectivity, security, etc., covering the full scientific/industrial workflow and addressing full-scale, widely used applications with proven impact, not just application kernels or mini-apps;
• Multidisciplinary: with teams that combine application domain and user expertise together with HPC system, software and algorithm expertise;
• Distributed with a possible central hub, federating capabilities around Europe, exploiting available competences, and ensuring synergies with national/local programmes;
• Showing a clear path towards Exascale covering both computing and extreme data;
• Committed to a culture of collaboration and sharing of best practice among CoEs on cross-cutting topics, such as co-design and Exascale technologies, software sustainability, and potential business models;
• Committed to collaborate with the upcoming Extreme Scale Demonstrators, providing them with a high performance and scalable application base;
• Actively contributing to economic and societal benefit, by facilitating high-impact research and collaborating with industry and SMEs.

Page 5

Present CoE

Vertical (domain) CoEs, which provide services to well defined communities:
• Software development, support & consultancy, training
• Frontier developments towards exascale
→ requirements in Material Sciences, Weather & Climate or Bio may be very different

Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:
• Software development, support & consultancy, training
• Performance analysis
→ potentially complex interaction matrix

Page 6

Future CoE

Vertical (domain) CoEs, which provide services to well defined communities:
• Current themes
  + Engineering?
  + Fundamental physics?
  + Environmental sciences (adaptation to environmental change)?
→ Cover entire value chain (from science to industry)
→ Compute and data (from exaflop to exabyte)

Horizontal (transversal) CoEs, which cover topics of importance to several vertical CoEs:
• Current themes
  + High-performance data analytics?
  + Software sustainability?
  + Mathematics and algorithms?
→ Enhance culture of collaboration (CoE matrix)

Page 7

Future CoE: Considerations

• Potential merging of existing CoE (cf. Flagships):
  • smaller number of domains that have shown a clear path to exascale/co-design, or larger variety?
  • may provide more efficient x-benefit for several communities towards exascale, but effectiveness may suffer (weak links affect wider area)
  • providing support for merged communities will be challenging
• Importance of Exascale:
  • addressing key science challenges is most important (≠ Exaflop)
  • technological developments (e.g. co-design → EsD → top 3 in 2022) run away from applications
  • only few CoE have really big data tasks (e.g. ESiWACE)
• Stimulation of SME:
  • smaller dedicated efforts may be more effective (e.g. SHAPE, Fortissimo)
  • unified software development vs specialized SME niches
• Sustainability:
  • assimilating cutting-edge FET developments
  • business model needs simple/realistic approach (and public funding for some time to come)

→ Manage expectations of what CoE will be able to achieve & sustain, given the available funding

Page 8

CoE in SRA-2

• Section 8, EsD introduction, page 67: “These EsDs should provide platforms deployed by HPC centres and used by CoEs for their production of new and relevant applications.”

• Section 8.2, Proposal of ETP4HPC for EsD Calls: “This and other hardware characteristics (energy efficiency, I/O bandwidth, resiliency, etc.) will be detailed in the 2017 release of the SRA, also taking into account results from the FETHPC projects and requirements from the CoEs.”

• Section 9.3, Centres of Excellence for Computing Applications: “The Centres of Excellence in Computing Applications (CoEs) form one of the three pillars of the European HPC Ecosystem and represent the European Application expertise […]. ETP4HPC will be working to include the CoEs in the processes of the contractual Public-Private Partnership for HPC and synchronise their efforts with those of the other two pillars of the European HPC Ecosystem (i.e. ETP4HPC and the FETHPC projects, PRACE).”

Page 9

Example: ESiWACE Science Challenge

Target for addressing key science challenges in weather & climate prediction: Global 1-km Earth system simulations @ ~1 year/day rate

[Figure panels: 10 km, 1 km]
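
As a rough illustration of what moving from 10 km to 1 km implies, the sketch below estimates the number of horizontal grid columns at each spacing and the relative cost of the refinement. It is a back-of-the-envelope calculation, not ESiWACE's own estimate. It assumes a spherical Earth of radius 6371 km, uniform square cells, and a time step that shrinks in proportion to the grid spacing. Vertical levels, I/O and model physics are ignored, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def horizontal_columns(dx_km: float) -> float:
    """Approximate number of grid columns for uniform square cells of side dx_km."""
    surface_km2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return surface_km2 / dx_km ** 2


def relative_cost(dx_coarse_km: float, dx_fine_km: float) -> float:
    """Cost ratio fine/coarse: more columns times more time steps.

    Assumes the time step scales linearly with the grid spacing.
    """
    refinement = dx_coarse_km / dx_fine_km
    return refinement ** 2 * refinement


def simulated_days_per_wallclock_day(years_per_day: float) -> float:
    """Convert a 'simulated years per day' rate into a speed-up over real time."""
    return years_per_day * 365.0


if __name__ == "__main__":
    for dx in (10.0, 1.0):
        print(f"{dx:>4.0f} km: ~{horizontal_columns(dx):.2e} columns")
    print(f"cost factor 10 km -> 1 km: ~{relative_cost(10.0, 1.0):.0f}x")
    print(f"1 year/day = {simulated_days_per_wallclock_day(1.0):.0f}x real time")
```

Under these assumptions the 1-km grid has about 100 times more columns than the 10-km grid and needs about 10 times more time steps, so the same simulated period costs roughly 1000 times more compute, while the ~1 year/day target requires running about 365 times faster than real time.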

Page 10

CoE and SRA-3

Domain CoE: What are the key scientific challenges and what does it take to achieve them?

HPC System Architecture and Components
• BioExcel: Large width vector units, low-latency networks, high-bandwidth and large memory; fast CPU<->accelerator transfer rates, heterogeneous acceleration, floating-point
• CompBioMed: Hybrid systems, GPUs, high-bandwidth memory
• MaX: Scientific challenges: accurate computations requiring more powerful HPC systems, heterogeneous systems designed to optimize end-to-end workflows
• ESiWACE: High-bandwidth memory, networks, NVRAM
• E-CAM: High memory bandwidth, large RAM
• POP: Throughput oriented devices (vectors), memory architectures and how to use them, architectural support for runtimes, mechanisms to monitor progress and notify runtimes in cases of resource preemptions
• CoeGSS: Convergence between HPC & HPDA, NVRAM, fast networks

System Software and Management
• BioExcel: Dynamic (task) scheduling, support for workflows
• CompBioMed: Dynamic scheduling, urgent computing
• MaX: Virtual Machine model supported
• ESiWACE: Dynamic scheduling, compilers
• E-CAM: Cross compiling, archiving tools
• POP: Dynamic, interactive use of available resources, tight and bidirectional communication/cooperation between job schedulers and runtimes
• CoeGSS: Dynamically scaling jobs, integration of HPC & HPDA, visualization (in situ), data analytics (in situ)

Programming Environment
• BioExcel: Standardization, portability, task parallelism, fast code driven by Python interfaces
• CompBioMed: Portability, ease of scale out, MPI extensions
• MaX: Support new MPI and OpenMP standards + new development tools like Python
• ESiWACE: Fast standardization, DSL
• E-CAM: Interactive testing, OpenCL support, sustainable support of standards
• POP: Programming model and runtime support for malleability, asynchrony/out of order task execution, hide heterogeneity and tolerate latency and variability, powerful performance analytics in tools, tools for task dependencies and memory access patterns, programmers' mindset from bottom-up latency dominated to throughput oriented mentality
• CoeGSS: Well-defined standards (and tools that implement the standards) for agent-based modeling, compilers, debuggers

Energy and Resiliency
• BioExcel: Distributed computing techniques to handle resiliency/fault tolerance
• CompBioMed: Reduced cost, improved fault tolerance
• MaX: Energy optimized workflows: HPC systems including energy monitoring and profiling
• ESiWACE: Fault handling, less precision
• E-CAM: Energy aware algorithms
• POP: Better integration between algorithmic based fault detection techniques and mechanism in the infrastructure from detected errors
• CoeGSS: Not important for CoeGSS

Page 11

CoE and SRA-3 (cont’d)

Domain CoE: What are the key scientific challenges and what does it take to achieve them?

Balance Compute, I/O and Storage Performance
• BioExcel: Post-processing on the fly, data-focused workflows, handling lots of small files in bioinformatics
• CompBioMed: Post-processing on the fly, easy transfer between storage tiers
• MaX: HT material science workload becomes quickly memory and I/O bound: systems with high IOPS and post-POSIX data objects are required
• ESiWACE: Post-processing on the fly, multi-tier software
• E-CAM: Fast access of archive data, multi-platform workflows, multi-threaded applications for hybrid production/analysis applications
• POP: Integration of asynchronous I/O interface in programming model, better integration of programming models/languages and persistent storage
• CoeGSS: Converged systems, live data analytics, strong data movement capabilities

Big Data and HPC usage Models
• BioExcel: Proximity of data generation and analysis/visualization resources, workflows, machine learning for analyzing simulation data, high-throughput sampling
• CompBioMed: Analytics of simulation outputs, visualisation
• MaX: Workflows, intelligent data analytics
• ESiWACE: Recomputing, data analytics
• E-CAM: Fast access to data bases, data mining
• POP: Better integration between programming model and storage interface, more dynamic, interactive supercomputing practices, make users/programmers aware of cost/benefit of each individual data and computation for better resource/storage scheduling
• CoeGSS: HPDA platform support, algorithms/models for efficient HPDA

Mathematics and algorithms for extreme scale HPC systems
• BioExcel: Multi-scale algorithms, task-parallel algorithms, electrostatics solvers, ensemble sampling & clustering theory, ensemble simulations
• CompBioMed: Novel time stepping algorithms, automated implementation of multiscale computing patterns
• MaX: New algorithms avoiding synchronous (unnecessary) data dependency and exploiting unreducible data dependency tree (nesting), to improve concurrency and locality
• ESiWACE: Disruptive numerical methods (discretization), data placement
• E-CAM: Memory/cache aware algorithms, asynchronous algorithms, efficient handling of long range/collective correlations
• POP: Algorithm complexity (computation/communication), asynchrony and variability tolerance, algorithm based fault detection
• CoeGSS: Algorithms for efficient data analytics

Page 12

CoE and SRA-3/EsD

Assumption: EsD will be precursors of European exascale HPC facilities hosting novel technologies and software stack
→ EsD are FLOP focused, CoE application areas may not be!

If the key role of CoE is to provide wider (domain) user community with expertise, software etc.:
1. How will the transition from ‘novel’ to ‘applicable’ be managed? What is the role of FET projects in this context? (there is also no certainty for sustainability of FET projects per domain)
2. How are key applications, to be run on EsD in operational mode, adapted to novel architecture, stack, programming models etc.? By whom? With what funding?
3. If future CoE become wider (less domain specific):
   a. how can the above be realized?
   b. will the horizontal CoE do the job?