BigStorage: MSCA-ITN-2014-ETN-642963 Deliverable number...

Page 1: BigStorage: MSCA-ITN-2014-ETN-642963 Deliverable number …bigstorage-project.eu/images/Deliverables/BigStorage-D1... · 2016-10-24 · BigStorage: MSCA-ITN-2014-ETN-642963 Storage-based

BigStorage: MSCA-ITN-2014-ETN-642963

Storage-based convergence between HPC and Cloud to handle Big Data

Deliverable number D1.1

Deliverable title Requirement description

Main Authors Toni Cortés, all ESRs

Grant Agreement number 642963

Project ref. no MSCA-ITN-2014-ETN-642963

Project acronym BigStorage

Project full name BigStorage: Storage-based convergence between HPC and Cloud to handle Big Data

Starting date (dur.) 1/1/2015 (48 months)

Ending date 31/12/2018

Project website http://www.bigstorage-project.eu

MSCA-ITN-2014-ETN-642963

D1.1: Requirement description Page 2 of 49

Coordinator María S. Pérez

Address Campus de Montegancedo sn. 28660 Boadilla del Monte, Madrid, Spain

Reply to [email protected]

Phone +34-91-336-7380

Document Identifier D1.1

Class Deliverable Document

Version 1.0

Document due date M22

Submitted 24/10/2016

Responsible Toni Cortes, BSC

Reply to [email protected]

Document status final

Nature R (Report)

Dissemination level (Public)

WP/Task responsible(s) Toni Cortes - BSC

Contributors All ESRs

Distribution List Consortium Partners

Reviewers All advisors

Document Location http://bigstorage-project.eu/index.php/deliverables

EXECUTIVE SUMMARY

This document provides an overview of the progress of the work done during the first 22-month period of the BigStorage project (from 01-01-2015 until 30-09-2016) with respect to WP1. During this period, ESRs have organized themselves into four working groups, each of which analysed the requirements of one of the four main use cases of the project: the Human Brain Project, the Square Kilometre Array, Climate modelling, and Smart Cities.

In this deliverable we briefly describe each of the projects and list the requirements each of them has in the areas covered by the ETN: storage, I/O, analysis, etc. For each requirement we give information about the requirement itself, its potential evolution, and sources of information where the reader can gain deeper insight into it. Finally, for each project and requirement, we also list which ESRs are working on solutions that may fully or partially cover the listed requirements.

DOCUMENT INFORMATION

IST Project Number MSCA-ITN-2014-ETN-642963

Acronym BigStorage

Full Title BigStorage: Storage-based convergence between HPC and Cloud to handle Big Data

Project URL http://www.bigstorage-project.eu

Document URL http://bigstorage-project.eu/index.php/deliverables

EU Project Officer Szymon Sroda

Deliverable Number D1.1 Title Requirement description

Work package Number WP1 Title Use Case Analysis and Evaluation

Date of Delivery Contractual M22 Actual 24/10/2016

Status version 3 final [x]

Nature prototype [ ] report [x] dissemination [ ]

Dissemination level public [x] consortium [ ]

Authors (Partner) <name and institution>

Name Toni Cortes E-mail [email protected] Partner BSC Phone +34934137966

Abstract

(for dissemination)

In this deliverable we briefly describe each of the projects and list the requirements each of them has in the areas covered by the ETN: storage, I/O, analysis, etc. For each requirement we give information about the requirement itself, its potential evolution, and sources of information where the reader can gain deeper insight into it. Finally, for each project and requirement, we also list which ESRs are working on solutions that may fully or partially cover the listed requirements.

Keywords Requirements, Use cases, HBP, SKA, Climate Modelling, Smart Cities, Storage, IO, Analysis.

Version | Modification(s) | Date | Author(s)
01 | The 4 WGs have added the initial contents | 15/09/2016 | All ESRs
02 | Put the document together, homogenization, and first review | 20/09/2016 | Toni Cortes
03 | Added comments by reviewers | 28/09/2016 | Toni Cortes

PROJECT CONSORTIUM INFORMATION

Participants Contact

Universidad Politécnica de Madrid (UPM), Spain

María S. Pérez

Email: [email protected]

Barcelona Supercomputing Center (BSC), Spain

Toni Cortes

Email: [email protected]

Johannes Gutenberg University (JGU) Mainz, Germany

André Brinkmann

Email: [email protected]

Inria, France

Gabriel Antoniu

Email: [email protected]

Adrien Lebre

Email: [email protected]

Foundation for Research and Technology - Hellas (FORTH), Greece

Angelos Bilas

Email: [email protected]

Seagate, UK

Malcolm Muggeridge

Email: [email protected]

DKRZ, Germany

Thomas Ludwig

Email: [email protected]

CA Technologies Development Spain (CA), Spain

Victor Muntes

Email: [email protected]

CEA, France

Jacques-Charles Lafoucriere

Email: [email protected]

Fujitsu Technology Solutions GmbH, Germany

Sepp Stieger

Email: [email protected]

TABLE OF CONTENTS

EXECUTIVE SUMMARY 3
DOCUMENT INFORMATION 4
PROJECT CONSORTIUM INFORMATION 6
TABLE OF CONTENTS 8
INTRODUCTION 10
USE CASES 11
  HUMAN BRAIN PROJECT 11
    Description 11
    Challenges 12
    Requirements 14
    Linking Requirements to ESRs 21
    Further reading 21
  THE SQUARE KILOMETRE ARRAY 22
    Description 22
    The Science Data Processor 22
    Requirements 23
    Linking Requirements to ESRs 27
  CLIMATE SCIENCE 28
    Description 28
    Requirements 30
    Linking Requirements to ESRs 36
  SMART CITIES 37
    Description 37
    Requirements 38
    Linking Requirements to ESRs 42
CONCLUSION 43
REFERENCES 44

INTRODUCTION

One of the main tasks of a researcher is first to understand, in enough detail, the problem that he or she is trying to solve, in order to avoid inventing problems only to solve them later. To understand some of the problems that current research is facing, ESRs have analysed four key projects in the European research agenda: the Human Brain Project, the Square Kilometre Array, Climate modelling, and Smart Cities. All four projects have challenging requirements in both storage and analysis technologies, the main topics of this ETN.

To understand the requirements, the ESRs have read all available documentation (which is quite heterogeneous) and have discussed the topics with relevant researchers in each community. In addition, to ease the task of understanding the needs of each project, some key researchers were invited to deliver talks at the BigStorage initial school held in Barcelona.

This analysis work has been performed in a distributed way. We built 4 working groups, each composed of 3 to 5 ESRs and 1 to 3 advisors. Each working group was assigned to one project and delivered its list of requirements. This organisation reduced the effort of requirement gathering and helped ESRs learn to work in groups with distributed tasks.

The result of this work is a list of requirements, with a number of references to further details that do not fit in this deliverable but that may be important for ESRs while performing their research. It is important to understand that, given the heterogeneity of the sources and the distributed manner of the work, the level of detail does not match perfectly across the different projects. Nevertheless, we believe it is a great tool for ESRs to understand current problems and how they have been approached by the researchers in each of the projects.

In addition to the requirement list, ESRs have also tried to identify which of the described requirements may be affected by the work they are performing. This table will be the seed for follow-up work, where some of the listed requirements will be converted into benchmarks to test the progress of the research done by ESRs with respect to the four analysed projects.

This deliverable is organized as follows: Section 2 presents the different use cases, their requirements, and the potential involvement of the different ESRs in each requirement, and Section 3 concludes the analysis.

USE CASES

HUMAN BRAIN PROJECT

DESCRIPTION

The Human Brain Project (HBP) is a European Commission Future and Emerging Technologies Flagship Project. It aims to put in place a cutting-edge, ICT-based scientific research infrastructure that will allow scientific and industrial researchers to advance knowledge in the fields of neuroscience, computing and brain-related medicine. The Project promotes collaboration across the globe and is committed to driving forward European industry.

The Neuroscience Subprojects will extend their research in brain organisation and theory in order to support the building of increasingly sophisticated models and simulations. In addition, related work will be done in brain-like computing and robotics, working up to the replication of the whole mouse brain, while also laying the foundations for simulation of the much larger and more complex human brain. The resulting knowledge on the structure and connectivity of the brain will open up new perspectives for the development of “neuromorphic” computing systems incorporating unique characteristics of the brain such as energy efficiency, fault tolerance and the ability to learn.

The HPC Platform Subproject (SP7) is one of the 12 operational subprojects of the HBP. Its mission is to build and manage the hardware and software for the supercomputing and data infrastructure required to run cellular brain model simulations, up to the size of a full human brain. SP7 will make this infrastructure available to the consortium and the scientific community worldwide. Full brain simulations are expected to require exascale capabilities, which, according to most potential suppliers’ roadmaps, are likely to be available in approximately 2021-22. As well as providing sufficient computing performance, the HBP supercomputer will also need to support data-intensive interactive supercomputing and large memory footprints. This includes tightly integrated visualization, analytics and simulation capabilities, efficient European-wide data management, dynamic resource management providing co-scheduling of heterogeneous resources, and a significant enlargement of memory capacity based on power-efficient memory technologies.

Analysis/computation requirements

The main computational aim for the future is to develop ICT tools to generate high-fidelity digital reconstructions and simulations of the mouse brain and ultimately the human brain (“Simulate the Brain”). Other aims of the project include:

• Develop hardware architectures and software systems for visually interactive, multi-scale supercomputing moving towards the exascale (“Develop Interactive Supercomputing”).

• Develop and operate six specialized platforms dedicated respectively to neuroinformatics, brain simulation, high performance computing, medical informatics, neuromorphic computing and neurorobotics, and a unified portal providing a single point of access to the platforms (“Develop and Operate six ICT Platforms, Making HBP Tools, Methods and Data Available to the Scientific Community”).

• Develop ICT tools supporting the re-implementation of bottom-up and top-down models of the brain in neuromorphic computing and neurorobotic systems. Machine learning algorithms could also be developed based on the acquired HB spatial data (“Develop Brain-Inspired Computing and Robotics”).

• Develop ICT tools to federate and cluster anonymized patient data (“Map Brain Diseases”).

• Implement a program of trans-disciplinary education to train young scientists to exploit the convergence between ICT and neuroscience, and to create new capabilities for European academia and industry (“Education and Knowledge Management”).

CHALLENGES

- Computation Power -

The IBM roadmap predicts the production of an exascale computer (10^18 flop/s) around 2018. Extrapolating today’s Blue Brain Project numbers, exascale is probably the minimum required to simulate the entire brain. This level of performance is just sufficient for the simultaneous computation of the present estimate of the number of equations needed to provide a first holistic version of a brain model, one that instantiates the nonlinear interactions that give rise to the emergent properties of living brains. Data storage is a practical problem that has effectively been solved by cloud computing and distributed storage with appropriate addressing; at this scale, the challenges are data analysis and aggregation with efficient database queries.

- Data Gathering -

Clinical scientists are used to dealing with highly controlled, “clean” data sets, despite the messy nature of their observational constructs. Hence their data sets are often small, precious and closely guarded, being a critical part of the discovery process. This mindset is invalidated by advances in data mining algorithms that have become commonplace in industry. Such algorithms identify patterns in big data that are characterized by invariable clusters of (mathematical) rules. These powerful, computer-sensitive, data-hungry algorithms often use novel mathematics. They deal with multivariate and “dirty” data, missing data, textual or semantic data, and data from different sources or with different ranges.

There are many basic science laboratory databases, often publicly funded, held in universities and research laboratories around the world. These data have often been used once and exist for archival reasons alone. This mass of legacy data represents an enormous, untapped research resource. How can such heterogeneous data be usefully exploited?

Following the CERN model, asking scientists for their data in return for giving them access to many other databases should be a huge incentive, especially since it will accelerate the process of scientific discovery by increasing the efficiency of data usage.

- Computing Infrastructure -

The proposed infrastructure for the HBP is shown in Figure 1, and some details of the sites are presented in Table 1. This kind of infrastructure should be taken into account when proposing solutions that address the HBP requirements.

Figure 1: Infrastructure diagram used by the HBP

Name | Location | Type | Nodes | Cores | Petaflops
Juqueen | Germany | IBM BlueGene/Q | 28,672 | 458,752 | 5.9
Jureca | Germany | T-Platforms V-class architecture | 1,872 | 45,216 | 2.44
Piz Daint | Switzerland | Cray XC30 | 5,272 | 42,176 | 7.787
MareNostrum III | Spain | IBM iDataPlex | 3,056 | 387.136 | 1.1
Fermi | Italy | IBM BlueGene/Q | 10,240 | 163,840 | 2.1
Pico | Italy | Custom | 74 | 1,080 | -

Table 1: Site information for the HBP Infrastructure

- Data Management Platforms -

- T-Storm -

T-Storm is a platform for supporting scalable real-time analytics of massive sets of voluminous time series. The platform is built over the Apache Storm parallel dataflow engine and supports both vertical scalability (fully utilizing high-end servers and multi-core systems) and horizontal scalability (scaling across a cluster of physical machines or even incorporating virtual cloud resources).
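The description above does not give T-Storm's programming interface, so the following is only a minimal Python sketch, under our own assumptions, of the kind of operator such a platform parallelises: a per-series sliding-window mean over a time-series stream. The class and method names are hypothetical and are not part of T-Storm or Apache Storm.

```python
from collections import defaultdict, deque


class SlidingWindowMean:
    """Per-series mean over the last `window` time units (hypothetical operator)."""

    def __init__(self, window):
        self.window = window
        # series id -> deque of (timestamp, value), oldest first
        self.buffers = defaultdict(deque)
        self.sums = defaultdict(float)

    def update(self, series_id, timestamp, value):
        """Add one sample and return the mean of the window (timestamp - window, timestamp]."""
        buf = self.buffers[series_id]
        buf.append((timestamp, value))
        self.sums[series_id] += value
        # Evict samples that fell out of the window (assumes in-order timestamps per series).
        while buf and buf[0][0] <= timestamp - self.window:
            _, old = buf.popleft()
            self.sums[series_id] -= old
        return self.sums[series_id] / len(buf)
```

Because each series keeps its own window, the keys can be partitioned across workers, which is the property a Storm-style dataflow engine can exploit for horizontal scaling.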

- MonetDB -

MonetDB has pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal through innovations at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and adaptive indices, run-time query optimization and a modular software architecture.
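To make “vertical fragmentation” concrete, the sketch below stores a table as one typed array per column, so an aggregate only scans the columns it references. It is a simplified illustration written for this document, assuming a small in-memory table; it does not reproduce MonetDB's internal structures or API, and the column names in the usage are hypothetical.

```python
from array import array


def to_columns(rows, schema):
    """Vertically fragment row tuples into one typed array per column.

    `schema` maps column name -> array typecode ('q' = signed long long, 'd' = double),
    in the same order as the fields of each row.
    """
    columns = {name: array(code) for name, code in schema.items()}
    names = list(schema)
    for row in rows:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


def sum_where(columns, agg_col, filter_col, predicate):
    """SELECT SUM(agg_col) WHERE predicate(filter_col): only two columns are read."""
    # Scan the filter column once and keep the positions of qualifying rows.
    positions = [i for i, v in enumerate(columns[filter_col]) if predicate(v)]
    agg = columns[agg_col]
    return sum(agg[i] for i in positions)
```

A row store would have to read every field of every row for the same query; here the `id` column is never touched.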

REQUIREMENTS

Data Storage, Security and Management

Identifier HBP1

Title Spatial Data Management Technique

Description Data generated by the brain simulator is huge and complex to analyse, as every neuron has different characteristics and connections. To analyse the data at the subcellular level, neuroscientists require more efficient models to fetch data.

For a similar problem, a technique for prefetching spatial data with high accuracy based on previous queries has been introduced. It combines three approaches: SCOUT for efficient prefetching, FLAT for efficient range queries and TOUCH for efficient and scalable in-memory joins.

Prevision of change in time Prefetching more initial spatial data, eventually speeding up the whole process.

References [Stougiannis2013][Kozloski2008]
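The following Python sketch illustrates the general idea of query-driven prefetching: after each query, the next query region is predicted and fetched before the scientist asks for it. As an assumption for illustration, it predicts by linearly extrapolating the last two query boxes; SCOUT itself uses the structures found in previous query results, so this is a simplified stand-in, and `fetch` is a hypothetical storage callback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D query box given by its centre and half-width."""
    cx: float
    cy: float
    cz: float
    half: float


def predict_next(history):
    """Predict the next box by linear extrapolation of the last two (simplified stand-in)."""
    if len(history) < 2:
        return None  # not enough history to extrapolate
    a, b = history[-2], history[-1]
    return Box(2 * b.cx - a.cx, 2 * b.cy - a.cy, 2 * b.cz - a.cz, b.half)


class PrefetchingReader:
    def __init__(self, fetch):
        self.fetch = fetch    # callable Box -> data (hypothetical storage backend)
        self.history = []
        self.cache = {}       # unbounded here; a real system would bound it
        self.hits = 0

    def query(self, box):
        if box in self.cache:
            self.hits += 1
            result = self.cache.pop(box)
        else:
            result = self.fetch(box)
        self.history.append(box)
        guess = predict_next(self.history)
        if guess is not None and guess not in self.cache:
            # A real system would issue this asynchronously while the user
            # inspects the current result; here it is done synchronously.
            self.cache[guess] = self.fetch(guess)
        return result
```

For a scientist stepping along a structure at constant pace, every query after the second is served from the prefetch cache.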

Identifier HBP2

Title Spatial Join based on non-uniform dataset densities

Description Efficiently joining two spatial datasets with different densities, taking their data distribution into account, can improve the performance of applications such as locating synapses in the human brain.

Current strategies such as TRANSFORMERS use an adaptive strategy for the data distribution. During indexing, TRANSFORMERS partitions the data, organizes it and computes connectivity using a spatial self-join. During the join, it uses an adaptive walk, randomly selecting the role of each dataset as guide or follower, and then swaps the roles of the datasets at runtime based on their density.

It is an efficient scheme for datasets with non-uniform density and outperforms other schemes that only focus on dataset densities. It performs better due to its adaptive nature.

Prevision of change in time The skewness of the datasets is a characteristic of their application. This is the first approach that applies the concept of adaptability to two datasets. In the Human Brain Project, where each spike invokes tens of thousands of synapses, the datasets would become denser and better adaptive techniques can be used.

References [Pavlovic2016][Markram2011]
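The sketch below illustrates density-adaptive role selection in a spatial distance join (pairs of points from two datasets closer than `eps`, as in the synapse-location example). It is not the TRANSFORMERS algorithm: as a simplifying assumption, both datasets are hashed into a uniform grid of cell size `eps`, and for each pair of neighbouring cells the sparser side acts as the guide while the denser side is sorted on x and probed by binary search. The grid, the role rule and the function names are ours.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict
from itertools import product


def grid(points, eps):
    """Hash 3D points (tuples) into cubic cells of side eps; keep their indices."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(c / eps) for c in p)].append(i)
    return cells


def adaptive_distance_join(A, B, eps):
    """Return sorted index pairs (i, j) with dist(A[i], B[j]) <= eps."""
    ga, gb = grid(A, eps), grid(B, eps)
    result = []
    for cell, a_idx in ga.items():
        # Matching points can only lie in the 27 neighbouring cells.
        for off in product((-1, 0, 1), repeat=3):
            b_idx = gb.get(tuple(c + o for c, o in zip(cell, off)))
            if not b_idx:
                continue
            # Role choice from local density: the sparser cell guides.
            if len(a_idx) <= len(b_idx):
                guide, gpts, follow, fpts, swap = a_idx, A, b_idx, B, False
            else:
                guide, gpts, follow, fpts, swap = b_idx, B, a_idx, A, True
            follow = sorted(follow, key=lambda k: fpts[k][0])
            xs = [fpts[k][0] for k in follow]
            for g in guide:
                p = gpts[g]
                lo = bisect_left(xs, p[0] - eps)
                hi = bisect_right(xs, p[0] + eps)
                for f in follow[lo:hi]:
                    if math.dist(p, fpts[f]) <= eps:
                        result.append((f, g) if swap else (g, f))
    return sorted(result)
```

Each qualifying pair is produced exactly once, from the block formed by the cell of its A point and the cell of its B point, so the result matches a brute-force join.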

Identifier HBP3

Title Multi-format ingestion

Description The HBP heavily relies on external data, which differ in physical location, storage medium and logical format. These data must be sanitized, unified and federated under a common, simplified platform.

Prevision of change in time Data formats could vary. The management of object, file or block storage is a challenge for the future.

References [Frackowiak2016]
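A minimal sketch of multi-format ingestion, assuming two hypothetical sources (a CSV export and a JSON feed with different field names) that must be mapped onto one common schema; records that cannot be sanitized are dropped rather than guessed. All field names are invented for illustration.

```python
import csv
import io
import json


def normalize(subject, region, value):
    """Map raw fields onto a common schema; return None if the record is unusable."""
    try:
        value = float(value)
    except (TypeError, ValueError):
        return None  # missing or non-numeric measurement
    if subject in (None, "") or not region:
        return None
    return {"subject_id": str(subject), "region": region.strip().lower(), "value": value}


def from_csv(text):
    """CSV source with a header row: subject,region,value."""
    return [normalize(r.get("subject"), r.get("region"), r.get("value"))
            for r in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """JSON source: a list of objects using other key names (id, area, measurement)."""
    return [normalize(o.get("id"), o.get("area"), o.get("measurement"))
            for o in json.loads(text)]


PARSERS = {"csv": from_csv, "json": from_json}


def ingest(sources):
    """Unify (format, payload) pairs into one list of common-schema records."""
    return [rec for fmt, payload in sources for rec in PARSERS[fmt](payload)
            if rec is not None]
```

Adding a new source format then only requires a new parser entry, which keeps the rest of the platform independent of the original formats.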

Identifier HBP4

Title Anonymizing

Description HBP’s data is partially derived from clinical trials. To ensure patients’ privacy, the data must be sanitized (private information wiped out) and unified before being ingested into the platform. However, this must happen without tampering with the original data, given that they are the property of the issuing clinics.

Prevision of change in time The privacy of patients’ records is a very sensitive matter, which will become even more important in the near future. The security and privacy of each patient will be an important social matter.

References [Frackowiak2016]
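The sketch below shows one way the two constraints of this requirement can coexist: sanitized copies are produced for ingestion while the clinics' originals are never modified. It assumes hypothetical field names and uses a keyed hash (HMAC-SHA256) as a pseudonym, so records of the same patient remain linkable. Note that pseudonymization alone is weaker than anonymization: remaining quasi-identifiers such as age or a rare diagnosis may still allow re-identification, so further techniques would be needed in practice.

```python
import copy
import hashlib
import hmac

# Hypothetical names of fields that directly identify a patient.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "birth_date"}


def pseudonymize(records, secret_key):
    """Return sanitized copies of clinical records; the originals are left untouched.

    Direct identifiers are dropped and the patient id is replaced by a keyed hash,
    so records of one patient stay linkable without exposing the original id to
    anyone who does not hold `secret_key` (bytes).
    """
    sanitized = []
    for rec in records:
        clean = {k: copy.deepcopy(v) for k, v in rec.items()
                 if k not in DIRECT_IDENTIFIERS and k != "patient_id"}
        digest = hmac.new(secret_key, str(rec["patient_id"]).encode(), hashlib.sha256)
        clean["pseudonym"] = digest.hexdigest()[:16]
        sanitized.append(clean)
    return sanitized
```

The key would stay with the data provider, so the platform can link records without being able to reverse the pseudonyms.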

Identifier HBP5

Title Sharing

Description Compared to other use cases, the HBP is unique in terms of data collection. While others rely on in-house data, the HBP exploits third-party data gathered over the last decades. To make the collaboration viable, each partner must not only share but also receive data. That makes the HBP not only a processing platform but also a sharing platform. The platforms should be open source and share not just data but also methods, results, and associated publications, in order to accelerate the dissemination of both methods and results and to allow scientists to access the latest tools, insights, and results.

Prevision of change in time The volume of data to be shared will increase, making the Cloud a possible solution for collaboration.

References [Frackowiak2016]

Identifier HBP6

Title Performance impact of Global View Resilience (GVR) with integrated non-volatile memories

Description Checkpointing is a standard technique to overcome failures, and increasing checkpointing time limits a cluster’s performance. This bottleneck can be nearly removed by NVRAM, as it provides high I/O bandwidth.

Global View Resilience (GVR) handles more classes of errors. Recovery tests of its multi-versioning show efficient and flexible rollback performance on an HPC system, namely Blue Gene Active Storage (BGAS) with integrated NVM.

This technique is efficient compared to other conventional checkpointing techniques and improves HPC systems such as Blue Gene for brain simulation.

Prevision of change in time Performance can be further improved by using better hybrid mapping schemes to better tier the links between storage and compute nodes.

References [Dun][Vetter2015]
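To illustrate why multi-versioning helps with errors that are detected late, the following sketch keeps several published versions of an array and rolls back to the newest one that passes a validity check, instead of only to the latest checkpoint. It is written in the spirit of GVR's versioned arrays but is not the GVR API; the class name, the `is_valid` check and the retention limit are assumptions.

```python
class VersionedArray:
    """Minimal multi-version array (illustrative; not the GVR interface)."""

    def __init__(self, data, max_versions=4):
        self.data = list(data)
        self.max_versions = max_versions
        self.versions = []      # list of (version number, snapshot), oldest first
        self.next_version = 0

    def publish(self):
        """Snapshot the current contents; with NVM such copies are cheap to keep."""
        self.versions.append((self.next_version, list(self.data)))
        self.next_version += 1
        if len(self.versions) > self.max_versions:
            self.versions.pop(0)  # drop the oldest retained version
        return self.next_version - 1

    def restore(self, is_valid):
        """Roll back to the newest retained version that passes `is_valid`."""
        for number, snapshot in reversed(self.versions):
            if is_valid(snapshot):
                self.data = list(snapshot)
                return number
        raise RuntimeError("no valid version retained")
```

A latent corruption that was already captured in the latest version is skipped, and the application resumes from the newest clean one.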

Identifier HBP7

Title Storage Class Memory (SCM) in Blue Gene Supercomputer to accelerate IO

Description Modelling, simulating and analysing data generated by large-scale complex brain models requires high I/O performance, which sometimes calls for big caches as well as improvements of the underlying SSD arrays.

The use of Direct Storage Access (DSA) supporting GPFS exploits the maximum bandwidth between an application and the storage nodes and shows better scalability.

Storage Class Memory (SCM) in the IBM Blue Gene supercomputer can further improve read/write performance and scalability.

Prevision of change in time As the volume of data increases with the computational environment, better performance is required both from SSDs and from the application layer.

References [Schürmann2014][Strande2012]
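As a rough illustration of the caching side of this requirement, the sketch below puts a least-recently-used (LRU) block cache in front of a slower tier. The backend callable, the block granularity and the capacity are hypothetical; the cache memory could be DRAM or SCM, and a production cache would also have to handle writes and concurrency.

```python
from collections import OrderedDict


class BlockCache:
    """LRU read cache of blocks in front of a slower storage tier."""

    def __init__(self, backend_read, capacity_blocks):
        self.backend_read = backend_read  # hypothetical callable: block_id -> bytes
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()       # block_id -> bytes, least recently used first
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.backend_read(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data
```

The larger the cache relative to the working set of an analysis, the more reads are absorbed before they reach the SSD arrays.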

Identifier HBP8

Title Evaluating the performance and scalability of the Ceph distributed storage system

Description The volume of data generated by the HBP will increase, making the management and storage of this data a very important topic for the future. The use of (Cloud-based) commodity hardware along with Ceph could help researchers perform experiments without using supercomputers. Whether Ceph, a scalable software-defined storage system, meets the needs of the HBP is a matter that should be examined.

Prevision of change in time Storing this data for further use is important, as the data could be used more than once for experiments and analysis. Its preservation is therefore critical, in order to avoid re-mapping the brain and the re-calculations that might otherwise occur.

References [Gudu2014][Weil2006]
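One reason Ceph is a candidate for scaling on commodity hardware is that clients compute data placement with a pseudo-random function (CRUSH) instead of querying a central table. The following sketch illustrates that idea with rendezvous (highest-random-weight) hashing, a simpler technique that we substitute for CRUSH here; it ignores failure domains and device weights, and the object and OSD names are hypothetical.

```python
import hashlib


def place(object_name, osds, replicas=3):
    """Choose `replicas` distinct storage daemons for an object by rendezvous hashing.

    Every client computes the same answer from the object name and the OSD list,
    so no central lookup table is needed (simplified stand-in for CRUSH).
    """
    def score(osd):
        digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(osds, key=score, reverse=True)[:replicas]
```

A useful property for an evaluation is that removing one daemon only moves the objects that were placed on it; all other placements stay unchanged.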

Data Analysis Requirements

Identifier HBP9

Title Multi-aspect queries

Description The brain’s activity is affected by a large number of parameters, each of which has both temporal and spatial impact. Hence, a multi-dimensional query engine is essential to boost the analysis. This engine must satisfy criteria such as processing huge volumes and handling both structured and unstructured data, along with an easily understandable interface, given that the majority of the involved scientists are not (and should not need to be) data scientists.

Prevision of change in time Research on the human brain could bring different requirements, changing the way the query engine and interactive analytics work.

References [Alagiannis2012]
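A minimal sketch of the kind of interface this requirement asks for: a single call that combines spatial, temporal and attribute constraints, with keyword arguments that a non-specialist can read. The record layout and attribute names are assumptions, and the linear scan stands in for the indexed access (e.g. spatial and temporal indices) that a real engine at HBP scale would need.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One simulation output sample (hypothetical layout)."""
    neuron_id: int
    pos: tuple   # (x, y, z) position
    t: float     # simulation time
    attrs: dict = field(default_factory=dict)  # e.g. {"type": "pyramidal"}


def query(events, box=None, time=None, **where):
    """Filter events on spatial, temporal and attribute aspects at once.

    box   -- ((xmin, ymin, zmin), (xmax, ymax, zmax)), inclusive bounds
    time  -- (t_start, t_end), inclusive bounds
    where -- attribute equality constraints, e.g. type="pyramidal"
    """
    out = []
    for e in events:
        if box and not all(lo <= c <= hi for lo, c, hi in zip(box[0], e.pos, box[1])):
            continue
        if time and not (time[0] <= e.t <= time[1]):
            continue
        if any(e.attrs.get(k) != v for k, v in where.items()):
            continue
        out.append(e)
    return out
```

For example, `query(events, box=((0, 0, 0), (2, 2, 2)), time=(0.0, 5.0), type="pyramidal")` reads almost like the scientific question it answers.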

Identifier HBP10

Title Machine Learning (ML) Models based on the Human Brain

Description Machine learning for neuromorphic computing systems could be based on the results of the analysis of the human brain. The algorithm derived from the analysis of the human brain could be adjusted and fitted into a machine-learning algorithm and eventually deployed on a robot.

Prevision of change in time Many robots will be based on machine learning techniques in the near future, in order to adapt to each situation. The results arising from the HBP analysis could help the development of better ML algorithms. As robots reach the limits of their physical capabilities, the development of algorithms for cognitive robotics will be of vital importance for the further development of these machines.

References [Carbajal2015][Mahadevan1996][Jung2007]

LINKING REQUIREMENTS TO ESRS

HBP1 HBP2 HBP3 HBP4 HBP5 HBP6 HBP7 HBP8 HBP9 HBP10

ESR1 ╳ ╳

ESR2 ╳

ESR3

ESR4 ╳ ╳ ╳

ESR5

ESR6

ESR7

ESR8 ╳ ╳ ╳

ESR9 ╳ ╳ ╳ ╳

ESR10 ╳ ╳

ESR11 ╳ ╳ ╳ ╳

ESR12 ╳ ╳ ╳

ESR13 ╳ ╳

ESR14 ╳ ╳ ╳

ESR15 ╳ ╳

FURTHER READING

More general information on this project can be found in the following documents: [Golland2014] [Tauheed2013] [Lippert2013] [Brömmel2014] [Huerta1993].

THE SQUARE KILOMETRE ARRAY

DESCRIPTION

The Square Kilometre Array (SKA) is an international scientific mega-project aiming to build and operate the world's largest radio telescope. SKA antennas will be deployed at two locations, in South Africa and in Australia. With a total receiving area of over one square kilometre, the goal is to surpass the capabilities of current instruments by an order of magnitude or more [SKA]. The continuous stream of data generated by the instrument will be processed and stored by specially designed compute centres located near the antenna sites.

THE SCIENCE DATA PROCESSOR

The Science Data Processor (SDP) is one of the 10 main work packages of the SKA project. The SDP encapsulates all the tasks required to design, provision and implement the computing hardware, software packages and algorithms needed to process the observation data into science products ready to be used by scientists. Long-term storage as well as efficient distribution of these science products are also the responsibility of the SDP.

The major challenge of the SDP is handling the vast amount of data that will be generated continuously. While hardware advances are expected to catch up by the time the telescope goes live, software performance is currently at 1/1000th of the required scaling. Therefore, significant research and innovation are required to design a truly exascale-capable system.

Understanding the type of data to be processed by the SDP is important in order to design a suitable storage architecture. SKA data is highly noisy, large in volume and comes from multiple observations that contain incomplete samples of the target visibility. This means that the same target will be processed through multiple iterations, creating the need for temporary storage of intermediate science products. On the other hand, the data is inherently parallelizable, allowing independent processing of partial data. This leads to the idea of "compute islands", formed by partially independent compute clusters, each responsible for processing a subset of the incoming data stream.

It is almost certain that a multi-tiered approach will be required, combining high- and low-performance storage elements. Based on the expected data and computation characteristics, we identify four main functions that the storage stack of the SDP will have to fulfil:

1. Temporary storage of raw data
2. Temporary storage of intermediate data products
3. Temporary persistence of key data/program state


4. Archiving of science data

Based on the above, we express a set of requirements for managing the efficient operation of the multiple storage tiers. Maximizing the efficiency of data movements and optimizing for every function is the main objective.

REQUIREMENTS

Data management

Identifier SKA1

Title Temporary Storage of Raw Data

Description The SDP will ingest raw data at a rate of 1 terabyte/s, which requires storing the raw data temporarily to perform near-real-time processing.

Prevision No changes

References [Lee1996]
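To make the scale of SKA1 concrete, the following minimal sketch estimates how much temporary capacity a raw-data buffer needs at the stated ingest rate of 1 TB/s. The buffering window and the safety margin are hypothetical parameters chosen for illustration; they are not figures from the SDP design.

```python
# Hypothetical sizing helper for the SKA1 raw-data buffer.
# Assumptions: decimal units (1 TB = 10**12 bytes) and a buffer that must
# hold `window_s` seconds of ingest plus a fractional safety margin.

TB = 10**12  # bytes in one (decimal) terabyte
PB = 10**15  # bytes in one (decimal) petabyte


def raw_buffer_bytes(ingest_bytes_per_s: float, window_s: float,
                     margin: float = 0.0) -> float:
    """Capacity needed to hold `window_s` seconds of raw ingest, plus margin."""
    if ingest_bytes_per_s < 0 or window_s < 0 or margin < 0:
        raise ValueError("arguments must be non-negative")
    return ingest_bytes_per_s * window_s * (1.0 + margin)


if __name__ == "__main__":
    rate = 1 * TB  # 1 TB/s, the ingest rate stated in SKA1
    for window in (60, 600, 3600):  # illustrative windows: 1 min, 10 min, 1 h
        print(f"{window:>5} s window -> {raw_buffer_bytes(rate, window) / PB:.2f} PB")
```

At 1 TB/s, one minute of raw data already occupies 60 TB and one hour occupies 3.6 PB, before any margin is added.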

Identifier SKA2

Title Temporary Storage of Intermediate Data Products

Description Data products in this tier will be written once and read many times, either for further processing or to be moved to lower tiers for permanent storage.

Prevision No changes

References [SPD2013][SPD2015]
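The write-once/read-many pattern of SKA2 can be enforced at the interface level. The sketch below is a hypothetical in-memory illustration (the class name `WormStore` and the dict back end are assumptions, not part of the SDP design): a second write to the same key is rejected, while reads are unlimited and counted.

```python
from __future__ import annotations


class WormStore:
    """Hypothetical write-once/read-many store for intermediate data products.
    A dict stands in for the real storage back end."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.read_counts: dict[str, int] = {}

    def write(self, key: str, data: bytes) -> None:
        """Store `data` once; any later write to the same key is rejected."""
        if key in self._objects:
            raise PermissionError(f"{key!r} is immutable once written")
        self._objects[key] = bytes(data)
        self.read_counts[key] = 0

    def read(self, key: str) -> bytes:
        """Unlimited reads; raises KeyError if the key was never written."""
        data = self._objects[key]
        self.read_counts[key] += 1
        return data
```

Because objects never change after the first write, readers on different nodes can cache them freely without any invalidation protocol.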

Identifier SKA3


Title Archiving of Science Data

Description The challenge of the archiving tier is the very long expected lifetime of the project (>50 years).

Prevision No changes

References [SPD2013][SPD2015]

Identifier SKA4

Title Hierarchical Storage Management

Description Hierarchical Storage Management solutions are currently focused mainly on archiving. A multi-tiered solution is required to efficiently manage data movement between high- and low-performance storage hardware.

Prevision Tiers are expected to change as new technologies become available

References [SPD2013][SPD2015]
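SKA4 asks for data movement between high- and low-performance tiers driven by access patterns rather than by archiving alone. A minimal sketch of one common policy, least-recently-used (LRU) demotion with promotion on access, is shown below; the two-tier layout, the object-count capacity and the class name `TwoTierStore` are illustrative assumptions, not the policy chosen for the SDP.

```python
from __future__ import annotations

from collections import OrderedDict


class TwoTierStore:
    """Hypothetical two-tier HSM: a small fast tier in LRU order in front of
    an unbounded slow tier. Capacity is counted in objects for simplicity."""

    def __init__(self, fast_capacity: int) -> None:
        if fast_capacity < 1:
            raise ValueError("fast tier must hold at least one object")
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, bytes] = OrderedDict()  # oldest first
        self.slow: dict[str, bytes] = {}

    def _admit(self, key: str, data: bytes) -> None:
        """Place an object in the fast tier, demoting LRU objects if full."""
        self.fast[key] = data
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_data = self.fast.popitem(last=False)
            self.slow[old_key] = old_data

    def put(self, key: str, data: bytes) -> None:
        self.slow.pop(key, None)  # the newest copy always lands in the fast tier
        self._admit(key, data)

    def get(self, key: str) -> bytes:
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh recency
            return self.fast[key]
        data = self.slow.pop(key)  # KeyError if unknown; promote on access
        self._admit(key, data)
        return data

    def tier_of(self, key: str):
        if key in self.fast:
            return "fast"
        return "slow" if key in self.slow else None
```

Replacing LRU with a policy driven by a known processing schedule would follow the same structure; only the choice of which object to demote changes.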

Data placement

Identifier SKA5

Title Storing Raw Data in the Processing Nodes (Data Locality)

Description Read and write patterns as well as locality are fairly predictable. It is therefore possible to move the storage elements very close to the actual processing, greatly reducing costly data movements.


Prevision No changes

References [SPD2013][SPD2015]
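Because SKA5 states that read/write patterns and locality are predictable, the same deterministic rule can decide both where a chunk is stored and which compute island processes it. The sketch below uses simple modulo placement over hypothetical integer chunk ids; the real partitioning key and the number of islands are assumptions.

```python
def island_of(chunk_id: int, n_islands: int) -> int:
    """Deterministic owner island for a chunk (simple modulo placement)."""
    return chunk_id % n_islands


def place_chunks(chunk_ids, n_islands):
    """Group chunk ids by the island whose local storage will hold them."""
    placement = {island: [] for island in range(n_islands)}
    for cid in chunk_ids:
        placement[island_of(cid, n_islands)].append(cid)
    return placement


def remote_reads(storage_island, compute_island):
    """Count chunks processed on a different island from where they are stored."""
    return sum(1 for cid, isl in compute_island.items()
               if storage_island[cid] != isl)
```

When storage and compute share the placement rule, no chunk has to cross the network before processing; any mismatch between the two maps shows up directly as remote reads.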

Fault tolerance

Identifier SKA6

Title Temporary Storage of Key Data/Program State

Description • This data will be used for failure recovery; hence, it will exhibit a write-often/read-once pattern.

• Data will be duplicated via a double-buffer system that allows for persistence.

Prevision No changes

References [SPD2013][SPD2015]
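The double-buffer idea in SKA6 can be sketched with two alternating checkpoint slots: the frequent write always overwrites the older slot, so a failure in the middle of a write never destroys the last complete state, and the rare recovery read picks the newest slot that is still readable. File names, the JSON encoding and the class name are hypothetical choices for illustration.

```python
import json
import os
import tempfile


class DoubleBufferCheckpoint:
    """Two alternating checkpoint slots: a crash while writing one slot
    leaves the other slot's last complete state intact."""

    def __init__(self, directory: str) -> None:
        self.paths = [os.path.join(directory, "ckpt_a.json"),
                      os.path.join(directory, "ckpt_b.json")]
        self.next_slot = 0

    def save(self, step: int, state: dict) -> None:
        """Frequent write path: overwrite the older of the two slots."""
        with open(self.paths[self.next_slot], "w") as f:
            json.dump({"step": step, "state": state}, f)
            f.flush()
            os.fsync(f.fileno())  # ensure the slot is on disk before switching
        self.next_slot = 1 - self.next_slot

    def load_latest(self):
        """Rare read path (recovery): newest slot that parses, or None."""
        best = None
        for path in self.paths:
            try:
                with open(path) as f:
                    ckpt = json.load(f)
            except (OSError, ValueError):  # missing or half-written slot
                continue
            if best is None or ckpt["step"] > best["step"]:
                best = ckpt
        return best


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = DoubleBufferCheckpoint(d)
        ckpt.save(1, {"iteration": 1})
        ckpt.save(2, {"iteration": 2})
        # Simulate a crash half-way through writing step 3.
        with open(ckpt.paths[ckpt.next_slot], "w") as f:
            f.write('{"step": 3, "sta')
        print(ckpt.load_latest())  # falls back to the step-2 slot
```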

Identifier SKA7

Title Durability of Data

Description Replication should be avoided at this point. An alternative could be abstractions like Resilient Distributed Datasets (RDDs), which recover lost data by recomputing it from its lineage.

Prevision No changes

References [Zaharia2012]
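SKA7 points to RDD-like abstractions [Zaharia2012] as an alternative to replication: instead of keeping copies, each derived dataset records how it was computed and rebuilds lost partitions from that lineage. The following is a toy, single-process illustration of the idea under that assumption; it is not Spark's API, and the class name `LineageDataset` is hypothetical.

```python
class LineageDataset:
    """A derived dataset keeps its lineage (parent + function) instead of
    replicas, and recomputes a lost partition when it is next needed."""

    def __init__(self, partitions=None, parent=None, fn=None):
        if (partitions is None) == (parent is None):
            raise ValueError("give either source partitions or a parent")
        self.parent, self.fn = parent, fn
        self._cache = dict(enumerate(partitions)) if parent is None else {}
        self.n = len(partitions) if parent is None else parent.n
        self.computed = 0  # how many partitions were (re)computed

    def map(self, fn):
        """Derive a new dataset lazily; nothing is computed yet."""
        return LineageDataset(parent=self, fn=fn)

    def partition(self, i):
        if i not in self._cache:
            if self.parent is None:
                # Source data must itself live on durable storage.
                raise LookupError(f"source partition {i} is lost")
            self._cache[i] = [self.fn(x) for x in self.parent.partition(i)]
            self.computed += 1
        return self._cache[i]

    def lose(self, i):
        """Simulate losing the in-memory copy of partition i."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.n) for x in self.partition(i)]
```

Recovery costs recomputation time instead of storage space, which is the trade-off SKA7 favours; the source partitions themselves still need durable storage.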


Performance optimizations

Identifier SKA8

Title Read Latency Management

Description Explicit prefetching is needed. The prefetching specification can be either provided manually or generated automatically using static code analysis.

Prevision No changes

References [Ibrahim2006][Han2005]
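Explicit prefetching as described in SKA8 can be sketched as follows: the prefetch specification is modelled as the ordered list of keys the computation will read, and reads are issued a fixed number of items ahead on background threads so that I/O latency overlaps with computation. The `fetch` and `compute` callables and the `depth` parameter are placeholders for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_prefetch(keys, fetch, compute, depth=2):
    """Process `keys` in the order given by an explicit prefetch specification,
    issuing reads up to `depth` items ahead on background threads so that
    read latency overlaps with computation on the current item."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    results = []
    with ThreadPoolExecutor(max_workers=depth) as pool:
        in_flight = {}  # position in `keys` -> Future holding the fetched data
        for i in range(len(keys)):
            # Keep the current item and the next `depth` items in flight.
            for j in range(i, min(i + depth + 1, len(keys))):
                if j not in in_flight:
                    in_flight[j] = pool.submit(fetch, keys[j])
            data = in_flight.pop(i).result()  # waits only if the read is late
            results.append(compute(data))
    return results
```

Whether the key list is written by hand or derived by static code analysis does not change this loop; only the source of the specification differs.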

Identifier SKA9

Title Specific Data Structures for Massively Parallel Accesses

Description Today's data structures do not fully exploit the parallelism of multi-core machines, so suitable solutions must be identified, e.g., lock-free stores for high read throughput.

Prevision No changes

References [Fan2013]


LINKING REQUIREMENTS TO ESRS

SKA1 SKA2 SKA3 SKA4 SKA5 SKA6 SKA7 SKA8 SKA9

ESR1 ╳ ╳ ╳

ESR2 ╳ ╳

ESR3

ESR4

ESR5 ╳

ESR6 ╳

ESR7

ESR8

ESR9 ╳ ╳ ╳ ╳ ╳

ESR10

ESR11 ╳

ESR12 ╳

ESR13 ╳ ╳

ESR14 ╳

ESR15 ╳


CLIMATE SCIENCE

DESCRIPTION

Climate science is the study of climate and is part of the Earth system sciences. Related topics include weather forecasting, disaster prediction and climate model research. It studies the origin, development and changes of weather patterns over long periods of time. Its aims are to investigate past climate, forecast future climate conditions and determine what influence climate has on various aspects of life on Earth.

Climate change has a known impact on different components of the Earth system and a substantial effect on human health in particular [CCS]. It drastically affects water resources, ecosystems, food production and agriculture, industry, settlements and society, urban life, energy consumption and many other important aspects of our habitat and of the planet as a whole [SGC]. The availability of daily information about weather and climate conditions (temperature, winds, rain, humidity, snow, thunderstorms, etc.) has become very significant for analysis, planning and decision making.

To predict future changes and their consequences, Earth system scientists develop and maintain climate models that run high-end simulations over long periods of time. These research tools help to better understand present and past observed climates and support experiments under given boundary conditions of anthropogenic and natural forcing. Related numerical weather prediction models, in contrast, focus on current meteorological information about atmospheric conditions and forecast the future state of the weather on a day-to-day basis through simulations over short periods of time.

Standard Earth system models have a comprehensive structure consisting of many components responsible for simulating major parts of the Earth's system. Among them are usually models of physical components (atmosphere, oceans, land surface) and biogeochemical subsystems (aerosols, atmospheric chemistry, vegetation) [Heavens 2013]. Executing such complex applications with numerical models of the climate system demands a powerful computing infrastructure. Thus, climate science applications are key users of HPC and continue to drive scientific progress.

Unfortunately, progress is limited by computer capabilities. Even though the computing power of today's supercomputers is very advanced and still increasing, the complexity of climate models also increases, requiring more computer resources than before, sometimes more than is affordable. For example, large-scale experiments with climate models require longer simulation runs to forecast climate change over a century, which means using more processor hours. Another main problem in HPC when working with climate models is handling the huge amounts of data produced by simulations. High-resolution climate models cannot produce valuable results when they are constrained by the available resources: storage, proper data management and post-processing.

Currently, more and more climate models are being developed and improved by Earth system scientists and computational experts together with large computing centres. It is important to work closely with climate scientists to improve scientific applications and find the best computational methods, algorithms, programming languages and techniques. All this will foster the improvement of current supercomputer systems and enhance the development of Earth system models.

There are a number of centres and research institutes dedicated to climate research. Among them are the German Climate Computing Center (DKRZ) and the Max Planck Institute for Meteorology (MPI-M) in Hamburg (Germany), the European Centre for Medium-Range Weather Forecasts (ECMWF) in Reading (United Kingdom), and the Barcelona Supercomputing Center (BSC) in Barcelona (Spain), which investigate questions related to climate change and physical processes in the atmosphere and perform significant work in climate modelling. Their requirements differ due to the variety of analysis techniques and objectives. For example, ECMWF uses a significant historical archive to replay and update its models, while having tight deadlines to process forecasts. The next section lists requirements regarding the computing infrastructure, captured from various models and centres. They were discussed with scientists from the climate modelling community and are outlined here as the most important ones for obtaining meaningful, credible and useful scientific results.

We identified multiple enabling requirements for climate science data storage and processing, which we divide into seven categories:

• I/O requirements enabling high-performance, parallel access to the data [CLM1, CLM2]

• Data management requirements necessary to ensure that both the large volumes of source and result data remain accessible indefinitely for reprocessing or further processing [CLM3, CLM5]

• Data processing requirements necessary to meet the need for high spatial and temporal resolution in order to improve the fidelity of climate models and support the anticipated new era of climate forecasting [CLM4]

• Data standards and interoperability requirements guaranteeing that the stored data remains usable and easily linkable to new experiments if necessary, thanks to well-defined data/metadata standards and data quality assessment [CLM6, CLM7]

• Climate data acquisition-to-access requirements ensuring that meaningful data can be used and shared efficiently among scientists and institutions around the world [CLM8]


• Software infrastructure requirements allowing the source code used to perform experiments to remain accessible, usable and maintainable over time [CLM9]

• Social requirements to foster trans-domain collaboration between climate scientists and software engineers [CLM10]

REQUIREMENTS

I/O requirements

Identifier CLM1

Title Parallel I/O

Description Serial reading and writing of the enormous amounts of data handled by most of today's climate science applications is a serious bottleneck for simulation performance on supercomputers. Sequential I/O has a great effect on their overall performance. Adopting parallel I/O is the solution to reduce or remove this bottleneck and its computational cost. Parallelization in this case gives each process of the application simultaneous access either to one shared file or to many files.

Prevision I/O requirements will increase (both for throughput and latency)

References [Dennis2011][Kuhn2013][Henderson1994][Fu2010]
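The shared-file variant of parallel I/O in CLM1 relies on each process owning a disjoint byte range of the file. The sketch below emulates it with threads standing in for MPI ranks; it is a simplified illustration of the offset arithmetic, not a substitute for parallel I/O libraries such as the one described in [Dennis2011], and the function names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_range(rank: int, nprocs: int, total: int):
    """Contiguous [start, stop) byte range owned by `rank` in a balanced split."""
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)


def parallel_write(path: str, payload: bytes, nprocs: int) -> None:
    """Every worker writes its own disjoint slice of `payload` into one shared
    file at the right offset, so no worker waits for another's data."""
    with open(path, "wb") as f:
        f.truncate(len(payload))  # pre-size the shared file once

    def worker(rank: int) -> None:
        start, stop = block_range(rank, nprocs, len(payload))
        with open(path, "r+b") as f:  # each worker uses its own file handle
            f.seek(start)
            f.write(payload[start:stop])

    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        list(pool.map(worker, range(nprocs)))  # re-raises any worker error


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    with tempfile.TemporaryDirectory() as d:
        out = os.path.join(d, "field.bin")
        parallel_write(out, data, nprocs=4)
        with open(out, "rb") as f:
            assert f.read() == data
```

The balanced split gives the first `total % nprocs` ranks one extra byte, so the ranges tile the file exactly with no gaps or overlaps.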

Identifier CLM2

Title Parallel Distributed File Systems Exploitation

Description Parallel I/O is an essential requirement to leverage the heavily parallel computation capabilities of modern computing platforms, both on HPC platforms and on modern clouds. In contrast, sequential I/O would lack the scalability required by most of today's climate applications on these platforms.

Prevision Exploitation of parallel DFSs is inevitable in order to use parallel I/O efficiently


References [Dennis2011][Huang2014][Kuhn2013]

Data management requirements

Identifier CLM3

Title Long-Term Archiving of Climate Data

Description Collecting as much data as possible is key to improving the likelihood of scientific discoveries about climate change. Long-term archiving entails easy, affordable and timely access for a large number of scientists spanning many different fields, while additionally enabling the reprocessing of large data sets as understanding of sensor performance, algorithms and Earth science improves. Examples of new information that would warrant reprocessing archived data are the detection of sensor calibration drift, the availability of ancillary data sets, better climate models, or simply errors in previous processing. Additionally, the data must stay readable indefinitely, specifically by ensuring that it does not get corrupted or lost over time.

Prevision Volume (number of archives) will increase

References [Luthardt2015][Cash2015]
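The integrity part of CLM3 (data must not get corrupted or lost over time) is commonly addressed with fixity checks: a checksum is recorded for every file at ingest and recomputed periodically. The sketch below uses SHA-256 from Python's standard `hashlib`; the manifest layout and the function names are assumptions for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large archives need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths):
    """Record a checksum for every file at ingest time."""
    return {p: sha256_of(p) for p in paths}


def audit(manifest):
    """Periodic fixity check: files that are missing or whose content changed."""
    return sorted(p for p, digest in manifest.items()
                  if not os.path.exists(p) or sha256_of(p) != digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a, b = os.path.join(d, "run1.dat"), os.path.join(d, "run2.dat")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(os.urandom(4096))
        manifest = build_manifest([a, b])
        with open(b, "ab") as f:  # simulate corruption of one archived file
            f.write(b"\x00")
        print(audit(manifest))
```

Damaged files reported by the audit can then be restored from another copy before the corruption spreads to every replica.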

Identifier CLM4

Title Exabyte-Scale Data Storage Capacity

Description In climatology, there are huge volumes of data from observations and climate model simulations. These data are critical for understanding climate mechanisms and predicting future climate change. Such data-intensive experiments are expected to generate exabytes of data over the next 5-10 years, which must be transferred, visualized and analysed by geographically distributed teams of researchers. The amount of data collected and produced is expanding at a staggering rate and is projected to exceed hundreds of exabytes by 2020.

Prevision Data volume will inevitably increase rapidly, and climate applications will demand more storage space for it.

References [Dart2007][Fernández-Quiruelas2011][André2012]

Identifier CLM5

Title Capacity over Performance

Description Such archival is necessary for producing high-quality output and fosters scientific advances in the field of climatology. Using cost-effective, highly reliable and resilient storage systems is mandatory to achieve such long-term archival, using a wide variety of media such as spinning disks or magnetic tapes. In comparison, high-performance storage solutions based on flash storage trade price, volume and resilience for performance. As data access time is less important than the available volume of data for long-term archives, high-capacity archives must be used, and systems must prioritize high capacity over high performance.

Prevision Storage capacity will be in high demand

References [Luthardt2015][Jewell2014]

Data standards and interoperability

Identifier CLM6

Title Data and Metadata Standardization

Description In order to easily link and compare current data with archived data when running new experiments, the data must be structured according to well-defined standards used in climate science and, more generally, in current HPC applications. Examples of such standards are NetCDF and HDF. Such standardization must be applied to both data and metadata.

Prevision All produced data and metadata sets will fall under a general standard; otherwise, different formats will appear

References [CCDF][FDSD][Luthardt2015]

Identifier CLM7

Title Data Quality Assessment and Curation

Description Data storage is confined to simply keeping data in existence and ensuring that it can be accessed when needed. Data curation is therefore also necessary: it entails practices of refreshment or format migration (essential to maintaining the data in a usable form for re-running experiments or linking new ones with archived data) and calls for higher-level curatorial practices such as enhancement of the data through added metadata, or migration from one representational standard to another.

Prevision Strengthened data quality requirements

References [Luthardt2015]

Climate data access requirements

Identifier CLM8

Title Scientific Data Accessibility and Dissemination

Description The volumes of climate data produced and collected are very useful and important for the scientific community in further research projects. Restricting access to data that has not been evaluated through a quality control procedure will only reduce the pace of progress and the benefits of climate change research. Thus, valuable climate datasets must have a digital object identifier (DOI) and should be stored in and disseminated through domain-specific repositories from which researchers can download them via the Internet free of charge. Every repository must be assessed and approved to assure digital data producers that their published information is safe and properly managed. Although some repositories might charge for scientific data hosting, all materials have to be freely available for search and retrieval for non-commercial, research and educational purposes.

However, users who want to exploit climate datasets in research work might be asked to apply (optionally and at no cost) for access permission. This helps to validate the purposes of data usage and to gather statistics on how the datasets serve the scientific community. Users (for example, researchers) are in turn obliged to cite the data properly and include acknowledgements in their papers, articles, etc.

Prevision Elimination of scientific data restrictions

References [Luthardt2015][RDR][DSA][IS-ENESD6.1][Overpeck2011]

Software infrastructure requirements

Identifier CLM9

Title Model Code Compatibility

Description In order to cope with ever-evolving tools, libraries, architectures and hardware, the code used for scientific experiments in climate science must be archived with the data and kept backwards compatible to guarantee that even old code will remain runnable on new machines. Consequently, the code should rely on standard tools rather than platform-specific libraries (such as CUDA). This enables data reprocessing at a later stage regardless of the platform changes that may have occurred since the code was first developed.


Prevision Existing code should run on new hardware with relatively little effort; otherwise, code refactoring will be needed

References [Intel][Kindratenko2009][SED]

Social requirements

Identifier CLM10

Title Scientific Communication and Collaboration with the Climate Community

Description Leveraging the latest advances in computer science is not an easy task for climate scientists, as the developed solutions are rarely usable out of the box for climate science, leading to slow or limited adoption of HPC innovations by the climate community. Similarly, computer scientists' limited knowledge of climate science significantly limits their understanding of the requirements and constraints of climate research.

In order to efficiently employ all advances in computer science for climate research, and to enable purpose-designed applications to evolve along with the latest model findings from climate scientists, it is of crucial importance to build bridges between these two communities. This can take the form of training, active and continuous collaboration, as well as federated access to data and models across the two communities.

Prevision Collaboration between computer and climate scientists will be necessary in order to make the next generation of HPC useful for Earth science modelling

References [Mitchell2012][Washington2009]


LINKING REQUIREMENTS TO ESRS

CLM1 CLM2 CLM3 CLM4 CLM5 CLM6 CLM7 CLM8 CLM9 CLM10

ESR1 ╳ ╳ ╳ ╳ ╳

ESR2

ESR3 ╳ ╳ ╳ ╳

ESR4 ╳

ESR5

ESR6

ESR7 ╳ ╳ ╳ ╳

ESR8 ╳

ESR9 ╳ ╳ ╳ ╳

ESR10

ESR11

ESR12 ╳ ╳

ESR13

ESR14 ╳ ╳ ╳ ╳ ╳

ESR15


SMART CITIES

DESCRIPTION

The proliferation of small sensors and devices around us capable of generating valuable information has created a new computing paradigm known as the Internet of Things (IoT). One of the most important concepts of this paradigm is the possibility of integrating and analysing all of this data to make smart decisions in fields such as healthcare, traffic management, water quality, air pollution and many more [Gubbi2013].

Smart Cities are able to take advantage of these IoT networks to improve citizens' lives. They can bring benefits to public transportation, garbage management, parking or education, to name some examples [Zanella 2014]. The objective is to gather data from different sensors located around the Smart City infrastructure and analyse it either in real time or offline. The information gathered can be used to improve services, take decisions or be published as open data for the citizens.

However, building this kind of infrastructure poses new challenges at many different levels. First, the data carries important contextual information, such as spatial or temporal attributes, that has to be considered when storing it. There are also privacy and security concerns related to protecting citizens' privacy. The speed and volume of the generated data also have to be considered. Most important, however, is the heterogeneity of the sensors and systems involved in this kind of infrastructure. For example, the communication protocol or the architecture of the sensors/devices can differ depending on their type, creating the need to integrate all of the different sources and store the data in our front end.

Measurements and data coming from IoT devices are not only processed in the cloud, since the cloud infrastructure and processing capabilities can be insufficient. Needs such as geographical distribution of resources, real-time communication and integration with large networks are handled by fog computing. In this paradigm, part of the processing is done by edge devices or by clouds closer to the data sources, resulting in lower latency and bandwidth usage [Vaquero2014].

Lastly, the huge number of sensors and devices involved makes it difficult to detect failures and problems in the network, such as disconnected wires, abnormally high energy consumption or wrongly configured devices. The detection and root-cause analysis of these events have to be considered to operate a Smart City effectively.


The Internet of Things and Smart Cities form a novel area that is attracting a lot of interest and creating new research challenges. Here we identify the requirements of this kind of network and link them to the different BigStorage objectives.

REQUIREMENTS

Analysis/computation requirements

Identifier SCT1

Title Stream Analysis

Description Data could be analysed in real time to monitor different aspects of the city (environment, traffic, etc.)

Prevision Streams of data are closely related to the use case

References [Kitchin2014]
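A typical building block for the real-time monitoring in SCT1 is a sliding-window aggregate over each sensor stream. The sketch below keeps a running mean over the last `size` readings and reports when it exceeds a limit; the window size and the limit are hypothetical parameters (for example, an air-quality threshold).

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the last `size` readings of one sensor stream, updated in O(1)
    per reading by adding the newest value and subtracting the evicted one."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("window size must be at least 1")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


def monitor(stream, size, limit):
    """Yield (index, windowed mean) whenever the windowed mean exceeds `limit`."""
    w = SlidingWindowMean(size)
    for i, value in enumerate(stream):
        m = w.update(value)
        if m > limit:
            yield i, m
```

For very long streams the running total can accumulate floating-point error; periodically recomputing it from the window contents avoids that.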

Identifier SCT2

Title Spatial and Temporal Data

Description Data generated by sensors inherently carries spatial and temporal information (e.g., when and where was the measurement taken?)

Prevision No changes

References [Gubbi2013]
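SCT2 implies that time and location should be first-class fields of every stored record rather than free-form metadata. A minimal sketch of such a record and a combined time-range and bounding-box query follows; the field names are assumptions, and the linear scan only stands in for a proper spatio-temporal index.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    """One sensor measurement carrying its embedded time and place."""
    sensor_id: str
    timestamp: datetime
    lat: float
    lon: float
    value: float


def query(readings, t_start, t_end, bbox):
    """Readings taken in [t_start, t_end) inside bbox = (lat_min, lat_max,
    lon_min, lon_max). A linear scan; a production store would index on
    both dimensions instead."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return [r for r in readings
            if t_start <= r.timestamp < t_end
            and lat_min <= r.lat <= lat_max
            and lon_min <= r.lon <= lon_max]
```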

Identifier SCT3

Title Open and Accessible Data


Description These huge amounts of data have to be open and/or accessible for use. This also brings privacy and security challenges

Prevision No changes

References [ODBDW]

Identifier SCT4

Title Batch Processing and Learning from Data

Description In addition to real-time data processing, huge amounts of data can also be analysed offline (e.g., optimising public transport routes).

Prevision No changes expected

References [Zanella2014]

Storage requirements

Identifier SCT5

Title Storage in Real Time

Description Multiple sensors generate data with high velocity that has to be stored almost in real time

Prevision No changes expected

References [Kitchin2014]


Identifier SCT6

Title Replicated Storage System

Description Dependability via provision of replicated storage

Prevision No changes expected

References [Nastic2014]

Infrastructure requirements

Identifier SCT7

Title Heterogeneous Environment

Description The architecture of a Smart City involves connecting heterogeneous environments with different protocols and technologies (sensors, storage system, back end, front end, etc.)

Prevision No changes expected

References [Zanella2014]

Identifier SCT8

Title Data Locality

Description It is not necessary to send all data around the world; rather, data can be processed locally and only aggregates sent.


Prevision No changes expected

References [Vaquero2014]
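SCT8 can be illustrated with edge-side aggregation: raw readings are grouped per sensor and per fixed time window, and only a compact summary is forwarded. The tuple layout `(sensor_id, unix_ts, value)` and the chosen statistics are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_locally(readings, window_s):
    """Collapse raw (sensor_id, unix_ts, value) readings into one summary per
    sensor and time window, so that only these aggregates leave the edge node."""
    groups = defaultdict(list)
    for sensor_id, ts, value in readings:
        groups[(sensor_id, int(ts // window_s))].append(value)
    return [
        {"sensor": sensor, "window_start": w * window_s, "count": len(vals),
         "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for (sensor, w), vals in sorted(groups.items())
    ]
```

With one summary per sensor and window, upstream traffic grows with the number of sensors and windows rather than with the raw sampling rate.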

Identifier SCT9

Title Fault Detection System for IoT Systems

Description Detect wrongly configured devices and disconnected wires, accurately explain occurrences of combined faults, and detect and explain high energy consumption.

Prevision No changes expected

References [Niggemann2016][Lazarova-Molnar2016]
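As a simple statistical baseline for the high-energy-consumption part of SCT9, each device's current reading can be compared with its own history using a z-score. This is an illustrative assumption, not the diagnosis method of [Niggemann2016] or [Lazarova-Molnar2016]; the threshold of three standard deviations is a conventional but arbitrary choice.

```python
from statistics import mean, stdev


def flag_high_consumption(history, current, threshold=3.0):
    """Device ids whose current energy reading lies more than `threshold`
    standard deviations above the mean of that device's own history."""
    flagged = []
    for device, past in history.items():
        if device not in current or len(past) < 2:
            continue  # stdev needs at least two historical readings
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            if current[device] > mu:  # any rise above a perfectly flat history
                flagged.append(device)
        elif (current[device] - mu) / sigma > threshold:
            flagged.append(device)
    return sorted(flagged)
```

A baseline like this only flags symptoms; explaining combined faults, as SCT9 also requires, needs a model of how devices depend on each other.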

Identifier SCT10

Title Scalable System

Description It has to be scalable (able to add new sensors and input sources), including the ability to ingest new data with a structure that is not known in advance.

Prevision No changes expected

References [Zanella2014]


LINKING REQUIREMENTS TO ESRS

SCT1 SCT2 SCT3 SCT4 SCT5 SCT6 SCT7 SCT8 SCT9 SCT10

ESR1 ╳ ╳ ╳ ╳ ╳ ╳ ╳ ╳

ESR2 ╳ ╳ ╳ ╳

ESR3

ESR4 ╳

ESR5

ESR6

ESR7

ESR8 ╳

ESR9 ╳ ╳ ╳ ╳ ╳ ╳

ESR10

ESR11

ESR12 ╳ ╳ ╳ ╳

ESR13 ╳ ╳

ESR14 ╳

ESR15 ╳ ╳ ╳


CONCLUSION

In this deliverable, we have presented a list of requirements from four key research projects and application domains in Europe: the Human Brain Project, the Square Kilometre Array, climate modelling and Smart Cities. We have also identified which ESRs' research has the most potential to affect each of the requirements. This information will help ESRs find good mechanisms to test the progress of their work on a real problem.


REFERENCES

[Alagiannis2012] Alagiannis, I., Borovica, R., Branco, M., Idreos, S., & Ailamaki, A. (2012, May). NoDB: efficient query execution on raw data files. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data (pp. 241-252). ACM. https://www.researchgate.net/publication/254006772_NoDB_Efficient_Query_Execution_on_Raw_Data_Files

[André2012] Jean-Claude André, Two remarks on future developments of climate simulation, with strong impact on computing and data processing, http://www.exascale.org/bdec/sites/www.exascale.org.bdec/files/whitepapers/BDEC-Climate.Forecasting-JCA-15.01.12.pdf

[Brömmel2014] Brömmel, D., Suarez, E., Orth, B., Graf, S., Detert, U., Pleiter, D., ... & Lippert, T. (2014). Paving the road towards pre-exascale supercomputing. In NIC Symposium 2014 (No. FZJ-2014-01327). Jülich Supercomputing Center.

[Carbajal2015] Carbajal, J. P., Dambre, J., Hermans, M., & Schrauwen, B. (2015). Memristor models for machine learning. Neural Computation. http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00694#.V99_3Ft9672

[Cash2015] Ben Cash, Climate Modeling and Big Data: Current Challenges and Prospects for the Future, http://cra.org/wp-content/uploads/2015/08/Cash.pdf

[CCDF] Common climate data formats: overview, https://climatedataguide.ucar.edu/climate-data-tools-and-analysis/common-climate-data-formats-overview

[CCS] Fact sheet: Climate change science - the status of climate change science today, https://unfccc.int/files/press/backgrounders/application/pdf/press_factsh_science.pdf

[Dart2007] E. Dart and B. Tierney (editors), "BES Science Network Requirements", Report of the Basic Energy Sciences Network Requirements Workshop sponsored by the Basic Energy Sciences Program Office, DOE Office of Science and the Energy Sciences Network, 2007.

[Dennis2011] Dennis, John M., et al. "An application-level parallel I/O library for Earth system models." International Journal of High Performance Computing Applications (2011): 1094342011428143.

[DSA] The Data Seal of Approval, http://www.datasealofapproval.org/en/information/about/

[Dun] Nan Dun, Dirk Pleiter, Aiman Fang, Nicolas, Andrew, "Multi-Versioning Performance Opportunities in BGAS System for Resilience", University of Chicago - JSC, Jülich, Germany, https://sites.google.com/site/uchicagolssg/lssg/research/gvr

[Fan 2013] Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. MemC3: compact and concurrent MemCache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX Conference on Networked Systems Design and Implementation (NSDI'13).

[FDSD] Formats for delivery of scientific/environmental data, https://geo-ide.noaa.gov/wiki/index.php?title=Formats_for_delivery_of_scientific/environmental_data

[Fernández-Quiruelas 2011] Fernández-Quiruelas, V., Fernández, J., Cofiño, A. S., Fita, L., & Gutiérrez, J. M. (2011). Benefits and requirements of grid computing for climate applications. An example with the community atmospheric model. Environmental Modelling & Software, 26(9), 1057-1069.

[Frackowiak 2016] Frackowiak, R., Ailamaki, A., & Kherif, F. (2016). Federating and Integrating What We Know About the Brain at All Scales: Computer Science Meets the Clinical Neurosciences. In Micro-, Meso- and Macro-Dynamics of the Brain (pp. 157-170). Springer International Publishing. link.springer.com/content/pdf/10.1007%2F978-3-319-28802-4_10.pdf

[Fu 2010] Fu, J., Liu, N., Sahni, O., Jansen, K. E., Shephard, M. S., and Carothers, C. D.: Scalable parallel I/O alternatives for massively parallel partitioned solver systems, in: Proceedings of the Workshop on Large-Scale Parallel Processing in conjunction with the IEEE International Parallel and Distributed Processing Symposium, 1–8, Atlanta, Georgia, USA, 2010.

[Golland 2014] Golland, P., Gallant, J., Hager, G., Pfister, H., Papadimitriou, C., Schaal, S., & Vogelstein, J. T. (2014). A New Age of Computing and the Brain: Report of the CCC Brain Workshop.

[Gubbi 2013] Gubbi, Jayavardhana, et al. "Internet of Things (IoT): A vision, architectural elements, and future directions." Future Generation Computer Systems 29.7 (2013): 1645-1660.

[Gudu 2014] Gudu, D., Hardt, M., & Streit, A. (2014, October). Evaluating the performance and scalability of the Ceph distributed storage system. In Big Data (Big Data), 2014 IEEE International Conference on (pp. 177-182). IEEE. http://ieeexplore.ieee.org/document/7004229/

[Han 2005] W. Han, K. Whang, and Y. Moon, "A formal framework for prefetching based on the type-level access pattern in object-relational DBMSs," IEEE Trans. Knowledge Data Eng., vol. 17, no. 10, pp. 1436–1448, 2005.

[Heavens 2013] Heavens, N. G., Ward, D. S. & Mahowald, N. M. (2013). Studying and Projecting Climate Change with Earth System Models. Nature Education Knowledge 4(5):4.

[Henderson 1994] Henderson, M., Nickless, B., & Stevens, R. (1994, May). A scalable high-performance I/O system. In Scalable High-Performance Computing Conference, 1994., Proceedings of the (pp. 79-86). IEEE.

[Huang 2014] Huang, X. M., Wang, W. C., Fu, H. H., Yang, G. W., Wang, B., & Zhang, C. (2014). A fast input/output library for high-resolution climate models. Geoscientific Model Development, 7(1), 93-103.

[Huerta 1993] Huerta, M. F., Koslow, S. H., & Leshner, A. I. (1993). The human brain project: an international resource. Trends in Neurosciences, 16(11), 436-438.

[Ibrahim 2006] A. Ibrahim and W. Cook, "Automatic prefetching by traversal profiling in object persistence architectures," in Proceedings of the 20th European Conference on Object-Oriented Programming, ser. ECOOP 2006. Springer-Verlag, 2006, pp. 50–73.

[Intel] Intel® Compiler for Linux*: Compatibility with GNU compilers, https://software.intel.com/sites/products/collateral/hpc/compilers/intel_linux_compiler_compatibility_with_gnu_compilers.pdf

[IS-ENES D6.1] IS-ENES2 deliverable (D-N: 6.1) Report on access rights for CMIP5 and CORDEX for commercial use, https://verc.enes.org/ISENES2/documents/deliverables/is-enes2_d6-1_report-on-access-rights-for-cmip5-and-cordex-for-commercial-use

[Jewell 2014] D. Jewell, R. D. Barros, S. Diederichs, L. M. Duijvestijn, M. Hammersley, A. Hazra, C. Holban, Y. Li, O. Osaigbovo, A. Plach, I. Portilla, M. Saptarshi, H. P. Seera, E. Stahl, and C. Zolotow. (2014, Jan.). Performance and capacity implications for big data. Int. Bus. Mach. [Online]. Available: http://www.redbooks.ibm.com/redpapers/pdfs/redp5070.pdf

[Jung 2007] Jung, Y., Choi, Y., Park, H., Shin, W., & Myaeng, S. H. (2007, August). Integrating robot task scripts with a cognitive architecture for cognitive human-robot interactions. In 2007 IEEE International Conference on Information Reuse and Integration (pp. 152-157). IEEE. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.700.9117&rep=rep1&type=pdf

[Kindratenko 2009] GPU Clusters for High-Performance Computing, http://www.ncsa.illinois.edu/People/kindr/papers/ppac09_paper.pdf

[Kitchin 2014] Kitchin, Rob. "The real-time city? Big data and smart urbanism." GeoJournal 79.1 (2014): 1-14.

[Kozloski 2008] Kozloski, James, et al. "Identifying, tabulating, and analyzing contacts between branched neuron morphologies." IBM Journal of Research and Development 52.1.2 (2008): 43-55.

[Kuhn 2013] A Semantics-Aware I/O Interface for High Performance Computing (Michael Kuhn), In Supercomputing, Lecture Notes in Computer Science (7905), pp. 408–421, (Editors: Julian Martin Kunkel, Thomas Ludwig, Hans Werner Meuer), Springer (Berlin, Heidelberg), ISC 2013, Leipzig, Germany, ISBN: 978-3-642-38749-4, ISSN: 0302-9743, 2013-06.

[Lazarova-Molnar 2016] Lazarova-Molnar, S., Shaker, H. R., & Mohamed, N. (2016, March). Fault detection and diagnosis for smart buildings: State of the art, trends and challenges. In 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC) (pp. 1-7). IEEE.

[Lee 1996] C. Lee, C. Kesselman and S. Schwab, "Near-real-time satellite image processing: Metacomputing in CC++", IEEE Comput. Graph. Appl., vol. 16, no. 4, pp. 79-84, 1996.

[Lippert 2013] Lippert, T., & Orth, B. (2013, July). Supercomputing Infrastructure for Simulations of the Human Brain. In International Workshop on Brain-Inspired Computing (pp. 198-212). Springer International Publishing.

[Luthardt 2015] Hans Luthardt, Hans Ramthun, Frank Toussaint, Long Term Archive Handbook, DKRZ – Department Data Management, Version 1.1, 2015, https://www.dkrz.de/daten-en/data-services/long_term_archiving/DKRZ-lta-handbook_rev_1.1.pdf?lang=en

[Mahadevan 1996] Mahadevan, S. (1996). Machine learning for robots: A comparison of different paradigms. In Proceedings of the Workshop on Towards Real Autonomy, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'96). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.5162&rep=rep1&type=pdf

[Markram 2011] H. Markram, K. Meier, S. Grillner, R. Frackowiak, S. Dehaene, A. Knoll, H. Sompolinsky, K. Verstreken, J. DeFelipe, S. Grant, and J.-P. Changeux, "Introducing the Human Brain Project," Procedia Computer Science, vol. 7, 2011, FET'11. http://www.sciencedirect.com/science/article/pii/S1877050911006806

[Mitchell 2012] Mitchell, J., Budich, R., Joussaume, S., Lawrence, B., & Marotzke, J. (2012). Infrastructure strategy for the European Earth system modelling community 2012–2022.

[Nastic 2014] Nastic, S., Sehic, S., Le, D. H., Truong, H. L., & Dustdar, S. (2014, August). Provisioning software-defined IoT cloud systems. In Future Internet of Things and Cloud (FiCloud), 2014 International Conference on (pp. 288-295). IEEE.

[Niggemann 2016] Niggemann, Oliver, et al. "Data-Driven Monitoring of Cyber-Physical Systems Leveraging on Big Data and the Internet-of-Things for Diagnosis and Control."

[ODBDW] http://www.icsu.org/science-international/accord/open-data-in-a-big-data-world-short

[Overpeck 2011] Overpeck, J. T., Meehl, G. A., Bony, S., & Easterling, D. R. (2011). Climate data challenges in the 21st century. Science, 331(6018), 700-702.

[Pavlovic 2016] Pavlovic, M., Heinis, T., Tauheed, F., Karras, P., & Ailamaki, A. (2016, May). TRANSFORMERS: Robust spatial joins on non-uniform data distributions. In 2016 IEEE 32nd International Conference on Data Engineering (ICDE) (pp. 673-684). IEEE. https://spiral.imperial.ac.uk/bitstream/10044/1/30595/2/transformers_compressed.pdf

[RDR] Recommended Data Repositories, http://www.nature.com/sdata/policies/repositories

[Schürmann 2014] Schürmann, Felix, et al. "Rebasing I/O for scientific computing: Leveraging storage class memory in an IBM BlueGene/Q supercomputer." International Supercomputing Conference. Springer International Publishing, 2014.

[SED] Software Engineering and Code Development for HPC Applications, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.126.8459&rep=rep1&type=pdf

[SGC] Statement of Guidance for Climate (other aspects - CCl) - An analysis of current and emerging capacity gaps in surface and upper air observations to support climate activities. http://www.wmo.int/pages/prog/www/OSY/SOG/SoG-Climate-CCl.doc

[SKA] http://www.skatelescope.org

[SDP 2013] Paul Alexander, Chris Broekema, Simon Ratcliffe, Rosie Bolton, Bojan Nikolic. "SDP Element Concept", SKA Project Document number SDP-PROP-DR-001-1, https://www.skatelescope.org/wp-content/uploads/2013/09/SDP-PROP-DR-001-1_ElemConc.pdf

[SDP 2015] P. C. Broekema, R. V. van Nieuwpoort and H. E. Bal. "The Square Kilometre Array Science Data Processor. Preliminary compute platform design". Journal of Instrumentation, Volume 10, Issue 07, article id. C07004 (2015). http://dx.doi.org/10.1088/1748-0221/10/07/C07004

[Stougiannis 2013] Stougiannis, M. Pavlovic, et al. 2013. Data-driven neuroscience: enabling breakthroughs via innovative data management. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD'13). ACM.

[Strande 2012] Strande, Shawn M., et al. "Gordon: design, performance, and experiences deploying and supporting a data intensive supercomputer." Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment: Bridging from the eXtreme to the campus and beyond. ACM, 2012.

[Tauheed 2013] Tauheed, F., Nobari, S., Biveinis, L., Heinis, T., & Ailamaki, A. (2013, September). Computational neuroscience breakthroughs through innovative data management. In East European Conference on Advances in Databases and Information Systems (pp. 14-27). Springer Berlin Heidelberg.

[Vaquero 2014] L. M. Vaquero and L. Rodero-Merino, "Finding your Way in the Fog: Towards a Comprehensive Definition of Fog Computing," ACM SIGCOMM Computer Communication Review, vol. 44, no. 5, pp. 27-32, October 2014.

[Vetter 2015] Vetter, Jeffrey S., and Sparsh Mittal. "Opportunities for nonvolatile memory systems in extreme-scale high-performance computing." Computing in Science & Engineering 17.2 (2015): 73-82.

[Washington 2009] Washington, W. M., Buja, L., & Craig, A. (2009). The computational future for climate and Earth system models: on the path to petaflop and beyond. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 367(1890), 833-846.

[Weil 2006] Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D., & Maltzahn, C. (2006, November). Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (pp. 307-320). USENIX Association.

[Zaharia 2012] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI'12).

[Zanella 2014] Zanella, Andrea, et al. "Internet of things for smart cities." IEEE Internet of Things Journal 1.1 (2014): 22-32.