How the INDIGO-DataCloud computing platform aims at helping … · 2016-07-03 · CLOUDs for...
How the INDIGO-DataCloud computing platform aims at helping scientific communities
RIA-653549
Giacinto DONVITO
INDIGO Technical Director, INFN Bari
INDIGO-DataCloud
• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
  • 11.1 M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
  • Coordination by the Italian National Institute for Nuclear Physics (INFN)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science.
• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology
• Where: deployable on hybrid (public or private) Cloud infrastructures
  • INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation
• Why: answer to the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.
From the Paper “Advances in Cloud”
• EC Expert Group Report on Cloud Computing, http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf
To reach the full promises of CLOUD computing, major aspects have not yet been developed and realised and in some cases not even researched. Prominent among these are open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaS levels. A second issue is managing multitenancy at large scale and in heterogeneous environments. A third is dynamic and seamless elasticity from in-house CLOUD to public CLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is data management in a CLOUD environment: bandwidth may not permit shipping data to the CLOUD environment and there are many associated legal problems concerning security and privacy. All these challenges are opportunities towards a more powerful CLOUD ecosystem. […] A major opportunity for Europe involves finding a SaaS interoperable solution across multiple CLOUD platforms. Another lies in migrating legacy applications without losing the benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.
INDIGO Addresses Cloud Gaps
• INDIGO focuses on use cases presented by its scientific communities to address the gaps identified by the previously mentioned EC Report, with regard to:
  • Redundancy / reliability
  • Scalability (elasticity)
  • Resource utilization
  • Multi-tenancy issues
  • Lock-in
  • Moving to the Cloud
  • Data challenges: streaming, multimedia, big data
  • Performance
• Reusing existing open source components wherever possible and contributing to upstream projects (such as OpenStack, OpenNebula, Galaxy, etc.) for sustainability.
Integrating distributed data infrastructures with INDIGO-DataCloud
INDIGO and other European Projects
• The INDIGO services are being developed according to the requirements collected within many multidisciplinary scientific communities, such as ELIXIR, WeNMR, INSTRUCT, EGI-FedCloud, DARIAH, INAF-LBT, CMCC-ENES, INAF-CTA, LifeWatch-Algae-Bloom, EMSO-MOIST, EuroBioImaging. However, they are implemented so that they can be easily reused by other user communities.
• INDIGO has strong relationships with complementary initiatives, such as EGI-Engage on the operational side and AARC with respect to AuthN/AuthZ policies. Users of EC-funded initiatives such as PRACE and EUDAT are also expected to benefit from the deployment of INDIGO components in such infrastructures.
• Several National/Regional infrastructures are covered by the 26 INDIGO partners, located in 11 European countries.
• INDIGO is mentioned in the recent Important Project of Common European Interest (IPCEI) for the exploitation of HPC and HTC resources at national, regional and European levels.
Work Packages
INDIGO-DataCloud General Architecture
[Figure: INDIGO-DataCloud general architecture diagram. Labels recoverable from the slide, in the order they appear:
• Front end: JSAGA / JSAGA Adaptors, FutureGateway Engine, FutureGateway REST API, Other Science Gateways, Mobile Apps, Open Mobile Toolkit, Ophidia plugin, LONI plugin, Taverna and Kepler plugins, Admin Portlets, User Portlets, Data Analytics, Workflow Portlets, SGMon, GUI Clients, FutureGateway Portal, Workflows, Mobile clients, Support services, WP6 Services.
• Middle: Kubernetes Cluster, IAM Service, PaaS Orchestrator, QoS/SLA, Cloud Provider Ranker, Monitoring, Infrastructure Manager, TOSCA, WP5 Services, Onedata, Dynafed, FTS, Data Services, REST/CDMI/WebDAV/POSIX/GridFTP, OIDC, Accounting.
• Bottom: Non-INDIGO IaaS, Native IaaS API, Heat/IM, TOSCA, WP4 Services, Mesos Cluster (twice), Automatic Scaling Service, Storage Service, S3/CDMI/POSIX/WebDAV/GridFTP, Smart Scheduling, Spot Instances, Native Docker, QoS Support, Identity Harmonization, Local Repository.]
IaaS Features (1)
• Improved scheduling for allocation of resources by popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will address both better scheduling algorithms and support for spot instances. The latter are in particular needed to support allocation mechanisms similar to those available on public clouds such as Amazon and Google.
  • We will also support dynamic partitioning of resources among “traditional batch systems” and Cloud infrastructures (for some LRMS).
• Support for standards in IaaS resource orchestration engines through the use of the TOSCA standard.
  • This overcomes the portability and usability problem that the ways of orchestrating resources in Cloud computing frameworks differ widely from one another.
• Improved IaaS orchestration capabilities for popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will include the development of custom TOSCA templates to facilitate resource orchestration for end users, increased scalability of deployed resources and support of orchestration capabilities for OpenNebula.
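The spot-instance idea mentioned above can be illustrated with a minimal sketch. It is not the INDIGO scheduler and not OpenStack or OpenNebula code: the `Instance` and `Host` classes and the "evict the smallest spot instance first" rule are assumptions made only for illustration. It shows the core behaviour: a regular request may preempt spot instances, while a spot request may only use free capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    cores: int
    spot: bool  # True for a preemptible (spot) instance


@dataclass
class Host:
    total_cores: int
    running: list = field(default_factory=list)

    def free_cores(self) -> int:
        return self.total_cores - sum(i.cores for i in self.running)

    def schedule(self, new: Instance) -> list:
        """Place `new` on this host and return the spot instances evicted for it.

        Raises RuntimeError if the request cannot be satisfied.
        """
        evicted = []
        if new.cores > self.free_cores() and not new.spot:
            # Only regular requests may preempt; smallest spot instances go first
            # (an illustrative policy, not one taken from the slides).
            victims = sorted((i for i in self.running if i.spot), key=lambda i: i.cores)
            for victim in victims:
                if new.cores <= self.free_cores():
                    break
                self.running.remove(victim)
                evicted.append(victim)
        if new.cores > self.free_cores():
            # Undo any eviction: the request does not fit even without spot instances.
            self.running.extend(evicted)
            raise RuntimeError(f"not enough capacity for {new.name}")
        self.running.append(new)
        return evicted
```

For example, on an 8-core host already running two 4-core spot instances, scheduling a regular 4-core instance evicts one spot instance, whereas a further spot request is rejected.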
IaaS Features (2)
• Improved QoS capabilities of storage resources.
  • Better support of high-level storage requirements such as flexible allocation of disk or tape storage space and support for data life cycle. This is an enhancement also with respect to what is currently available in public clouds, such as Amazon Glacier and Google Cloud Storage.
• Improved capabilities for networking support.
  • Enhancements will include flexible networking support in OpenNebula and handling of network configurations through developments of the OCCI standard for both OpenNebula and OpenStack.
• Improved and transparent support for Docker containers.
  • Introduction of native container support in OpenNebula, development of standard interfaces using the OCCI protocol to drive container support in both OpenNebula and OpenStack.
PaaS Features (1)
• Improved capabilities in the geographical exploitation of Cloud resources.
  • End users need not know where resources are located, because the INDIGO PaaS layer hides the complexity of both scheduling and brokering.
• Standard interface to access PaaS services.
  • Currently, each PaaS solution available on the market uses a different set of APIs, languages, etc. INDIGO will use the TOSCA standard to hide these differences.
• Support for data requirements in Cloud resource allocations.
  • Resources can be allocated where data is stored.
• Integrated use of resources coming from both public and private Cloud infrastructures.
  • The INDIGO resource orchestrator is capable of addressing both types of Cloud infrastructures through TOSCA templates handled at either the PaaS or IaaS level.
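To make the role of a TOSCA template more concrete, here is a small sketch that holds a template as a Python dictionary (TOSCA documents are normally written in YAML) and inspects it. The keys follow the TOSCA Simple Profile in YAML; the node names `portal` and `worker`, their sizes and the helper `total_cpus` are invented for illustration and are not taken from any INDIGO template.

```python
# Hypothetical template: node names and sizes are made up for this example.
template = {
    "tosca_definitions_version": "tosca_simple_yaml_1_0",
    "topology_template": {
        "node_templates": {
            "portal": {
                "type": "tosca.nodes.Compute",
                "capabilities": {
                    "host": {"properties": {"num_cpus": 2, "mem_size": "4 GB"}}
                },
            },
            "worker": {
                "type": "tosca.nodes.Compute",
                "capabilities": {
                    "host": {"properties": {"num_cpus": 4, "mem_size": "8 GB"}}
                },
            },
        }
    },
}


def total_cpus(tpl: dict) -> int:
    """Sum num_cpus over all tosca.nodes.Compute node templates."""
    nodes = tpl["topology_template"]["node_templates"].values()
    return sum(
        node["capabilities"]["host"]["properties"]["num_cpus"]
        for node in nodes
        if node["type"] == "tosca.nodes.Compute"
    )
```

Because the description is declarative and independent of any one cloud API, the same document could in principle be handed to an orchestrator at the PaaS level or to an IaaS-level engine.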
PaaS Features (2)
• Distributed data federations supporting legacy applications as well as high-level capabilities for distributed QoS and Data Lifecycle Management.
  • This includes, for example, remote POSIX access to data.
• Integrated IaaS and PaaS support in resource allocations.
  • For example, storage provided at the IaaS layer is automatically made available to higher-level resource allocations performed at the PaaS layer.
• Transparent client-side import/export of distributed Cloud data.
  • This supports dropbox-like mechanisms for importing and exporting data from/to the Cloud. That data can then be easily ingested by Cloud applications through the INDIGO unified data tools.
• Support for distributed data caching mechanisms and integration with existing storage infrastructures.
  • INDIGO storage solutions are capable of providing efficient access to data and of transparently connecting to POSIX filesystems already available in data centers.
PaaS Features (3)
• Deployment, monitoring and automatic scalability of existing applications.
  • For example, existing applications such as web front-ends or R-Studio servers can be automatically and dynamically deployed in highly-available and scalable configurations.
• Integrated support for high-performance Big Data analytics.
  • This includes custom frameworks such as Ophidia (providing a high performance workflow execution environment for Big Data Analytics on large volumes of scientific data) as well as general purpose engines for large-scale data processing such as Spark, all integrated to make use of the INDIGO PaaS features.
• Support for dynamic and elastic clusters of resources.
  • Resources and applications can be clustered through the INDIGO APIs. This includes, for example, batch systems on demand (such as HTCondor or Torque) and extensible application platforms (such as Apache Mesos) capable of supporting both application execution and instantiation of long-running services.
AAI Features
• Provide an advanced set of features that includes:
  • User authentication (supporting SAML, OIDC, X.509)
  • Identity harmonization (link heterogeneous AuthN mechanisms to a single VO identity)
  • Management of VO membership (i.e., groups and other attributes)
  • Management of registration and enrolment flows
  • Provisioning of VO structure and membership information to services
  • Management, distribution and enforcement of authorization policies
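Identity harmonization can be pictured as a lookup from many external credentials to one VO account. The sketch below is an assumption-laden illustration, not the INDIGO IAM implementation: the class `IdentityRegistry`, its methods and the sample subjects are all hypothetical.

```python
class IdentityRegistry:
    """Maps (mechanism, external subject) pairs to one VO-level account."""

    def __init__(self):
        self._links = {}   # (mechanism, subject) -> VO user id
        self._groups = {}  # VO user id -> set of group names

    def register(self, vo_user, groups=()):
        """Create a VO account with its group memberships."""
        self._groups[vo_user] = set(groups)

    def link(self, vo_user, mechanism, subject):
        """Attach an external credential (e.g. OIDC, SAML, X.509) to a VO account."""
        if vo_user not in self._groups:
            raise KeyError(f"unknown VO user {vo_user}")
        self._links[(mechanism, subject)] = vo_user

    def resolve(self, mechanism, subject):
        """Return (vo_user, sorted groups) for a credential, or None if unlinked."""
        vo_user = self._links.get((mechanism, subject))
        if vo_user is None:
            return None
        return vo_user, sorted(self._groups[vo_user])
```

Whichever credential a user presents, services downstream see the same VO identity and the same group attributes, which is what lets authorization policies be written once.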
Storage Quality of Service and the Cloud
• Amazon: S3, Glacier
• Google: Standard, Durable Reduced Availability, Nearline
• HPSS/GPFS: corresponds to the HPSS classes (customizable)
• dCache: Resilient, TAPE, disk+tape
Next step: Data Life Cycle
• Data Life Cycle is just the time-dependent change of
  • Storage Quality of Service
  • Ownership and Access Control (PI owned, no access, site owned, public access)
  • Payment model: pay as you go; pay in advance for rest of lifetime.
  • Maybe other things
[Figure: timeline with marks at 6 months, 1 year and 10 years]
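A data life cycle in this sense can be sketched as a table of stages keyed by the age of the data. The thresholds below follow the 6 month, 1 year and 10 year marks on the timeline, but which QoS class, owner and payment model applies at each stage is an assumption made purely for illustration, as are the names `Stage`, `POLICY` and `stage_for`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    starts_after_days: int
    qos: str
    access: str
    payment: str


# Hypothetical policy, sorted by starting age; the values are illustrative only.
POLICY = [
    Stage(0, "disk", "PI owned", "pay as you go"),
    Stage(182, "disk+tape", "PI owned", "pay as you go"),
    Stage(365, "tape", "site owned", "paid in advance"),
    Stage(3650, "tape", "public access", "paid in advance"),
]


def stage_for(age_days, policy=POLICY):
    """Return the last stage whose start age has been reached."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    current = policy[0]
    for stage in policy:
        if age_days >= stage.starts_after_days:
            current = stage
    return current
```

The point of the sketch is that storage QoS, ownership and payment all change together as a function of time, so a single policy object can describe them.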
Data Federation
[Figure: Onedata-based data federation demo. Sites and hosts named in the diagram: Amazon S3 (DNS: p-aws-useast), AWS USA, INFN Italy, UPV Spain (VM: demo-onedata-upv-provider), an NFS server, a laptop running OS X with boot2docker. Components named in the diagram: Docker Onezone, VM onezone, VM oneprovider, VM nfs, VM oneclient, several Docker Oneclient containers, a POSIX volume, a SAMBA export.]
Frontend Services / Toolkit
Integration schemas
• We provide graphical user interfaces in the form of scientific gateways and workflows, together with the means to access the INDIGO PaaS services and software stack, and allow users to define and set up the on-demand infrastructure for the WP2 use cases.
• Setting up a whole use case infrastructure: the administrator will be provided with ready-to-use recipes that can be customized. The final users will be provided with the service end-points and will not be aware of the backend.
• Use the INDIGO features from their own portals: user communities that have their own Scientific Gateway set up can exploit the FutureGateway REST API to deal with the whole INDIGO software stack.
• Use of the INDIGO tools and portals, including the FutureGateway, Scientific Workflow Systems, Big Data Analytics Frameworks (like Ophidia), Mobile Applications. In this scenario the final users as well as domain administrators will use the GUI tools. The administrator will use them as described in the first case. In addition, domain-specific users will be provided with specific portlets/workflows/apps that allow graphical interaction with their applications run via the INDIGO software stack.
From CSGF to FutureGateway
[Figure: two stacks compared, "Classic CSGF (before INDIGO)" and "FutureGateway Approach (INDIGO)". Labels in the diagram: Grid Engine, JSAGA, Portlets, Liferay/Glassfish, Liferay/Tomcat, API Server, REST APIs, Web/Mobile Apps.]
• Communication Portlet - Grid Engine - JSAGA is only possible with Java libraries.
• Communication Portlet - API Server goes via REST APIs, which allows external applications to be served. The API Server interacts with JSAGA via Java libraries.
• The same REST APIs could be used by Mobile Apps
• Those APIs make the interaction with the PaaS layer easier
• Those REST APIs provide an easy exploitation of INDIGO capabilities to non-INDIGO applications
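To show what "external applications talking to an API Server over REST" could look like from the client side, here is a sketch that only builds an HTTP request with the Python standard library and does not send it. Everything specific in it is an assumption: the `/tasks` path, the payload fields, the bearer-token header and the example host are placeholders, not the documented FutureGateway API.

```python
import json
import urllib.request


def build_task_request(base_url, token, application, arguments):
    """Build (but do not send) a POST request that would submit a task.

    The "/tasks" path and the payload fields are hypothetical placeholders.
    """
    body = json.dumps({"application": application, "arguments": arguments})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumes token-based AuthN
        },
    )


# Example with placeholder values; nothing is sent over the network here.
request = build_task_request(
    "https://gateway.example.org/api/", "TOKEN", "demo-app", ["input.txt"]
)
```

Since the client needs nothing beyond HTTP and JSON, the same call could be issued from a portlet, a mobile app or a third-party portal, which is the design point the bullets above make.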
Ophidia framework
• Ophidia is a big data analytics framework for eScience
  • Primarily used for the analysis of climate data, exploitable in multiple domains
  • “Datacube” abstraction and OLAP-based approach for big data
  • Support for array-based data analysis and scientific data formats
  • Parallel computing techniques and smart data distribution methods
  • ~100 array-based primitives and ~50 datacube operators
    • e.g. data sub-setting, data aggregation, array-based transformations, datacube roll-up/drill-down, datacube import, etc.
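The datacube operations named above can be illustrated on a toy cube held in nested Python lists. This is not the Ophidia API: the function names, the `cube[lat][lon]` layout and the sample values are assumptions chosen to keep the example small.

```python
def subset(cube, lat_idx, lon_idx):
    """Data sub-setting: keep only the selected latitude and longitude indices."""
    return [[cube[i][j] for j in lon_idx] for i in lat_idx]


def reduce_arrays(cube, primitive):
    """Array-based transformation: apply a primitive to each cell's time series."""
    return [[primitive(series) for series in row] for row in cube]


def roll_up_lon(cube, aggregate):
    """Roll-up: aggregate away the longitude dimension (cells hold scalars)."""
    return [aggregate(row) for row in cube]


def mean(values):
    return sum(values) / len(values)


# Toy cube: cube[lat][lon] is a short time series (values are made up).
cube = [
    [[1.0, 3.0], [2.0, 4.0]],
    [[5.0, 7.0], [6.0, 8.0]],
]

maxima = reduce_arrays(cube, max)        # per-cell maximum over time
lat_means = roll_up_lon(maxima, mean)    # mean of those maxima per latitude
```

A real framework would run such primitives in parallel over distributed fragments of the cube; the sketch only shows what each operator does to the data.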
INDIGO module for Kepler
• The Kepler scientific workflow system is an open source tool that enables creation, execution and sharing of workflows across a broad range of scientific and engineering disciplines.
• The first version of the INDIGO module has been delivered; new functionalities are gradually being added for users.
• The INDIGO module is based on the FutureGateway API.
• At the moment, it is possible to build workflows that define a task, prepare inputs and trigger execution. While a task is executed within INDIGO's infrastructure, it is possible to check its status.
• FutureGateway API client: https://github.com/indigo-dc/indigoclient
• Kepler-based actors: https://github.com/indigo-dc/indigokepler
Use case examples
UC: A web portal that exploits a batch system to run applications
• A user community maintains a “vanilla” version of the portal and computing image, plus some specific recipes to customize software tools and data
  • Portal and computing are part of the same image, which can take different roles.
  • Customization may include creating special users, copying (and registering in the portal) reference data, installing (and again registering) processing tools.
  • Typically the web portal image also has a batch queue server installed.
• All the running instances share a common directory.
• Different credentials: end-user and application deployment.
UC Inspiration: Galaxy on the cloud
• Galaxy can be installed on a dedicated machine or as a front-end to a batch queue.
• Galaxy exposes a web interface and executes all the interactions (including data uploading) as jobs in a batch queue.
• Requires a shared directory among the working nodes and the front-end.
• It supports a separate storage area for different users, managing them through the portal.
UC: A web portal that exploits a batch system to run applications
1) The web portal is instantiated, installed and configured automatically, exploiting Ansible recipes and TOSCA Templates.
2) A remote POSIX share is automatically mounted on the web portal using Onedata.
3) The same POSIX share is also automatically mounted on the worker nodes using Onedata.
4) End users can see and access the same files via simple web browsers or similar.
5) A batch system is dynamically and automatically configured via TOSCA Templates.
6) The portal is automatically configured in order to execute jobs on the batch cluster.
7) The batch cluster is automatically scaled up and down according to the job load on the batch system.
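The last step, scaling the batch cluster with the job load, can be sketched as a small decision function. This is a hedged illustration, not the logic of any INDIGO component: the function names, the "one job per slot" model and the default limits are assumptions, with `max_nodes` standing in for whatever upper bound the deployment declares.

```python
import math


def desired_workers(queued_jobs, running_jobs, slots_per_node, min_nodes, max_nodes):
    """Number of worker nodes needed to hold all queued and running jobs,
    clamped to the allowed range."""
    needed = math.ceil((queued_jobs + running_jobs) / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes, queued_jobs, running_jobs,
                   slots_per_node=4, min_nodes=0, max_nodes=10):
    """Return how many nodes to add (positive) or remove (negative)."""
    target = desired_workers(
        queued_jobs, running_jobs, slots_per_node, min_nodes, max_nodes
    )
    return target - current_nodes
```

With the defaults, a 2-node cluster with 10 queued and 8 running jobs would grow by 3 nodes, while a 5-node cluster with only 3 running jobs would shrink by 4. Looking at the queue rather than at CPU load alone lets the cluster shrink as soon as the work is done.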
UC: Use Case Lifecycle
• Preliminary
  • The use case administrator creates the “vanilla” images of the portal + computing image.
  • The use case administrator, with the support of INDIGO experts, writes the TOSCA specification of the portal, queue, computing configuration.
• Group-specific
  • The use case administrator, with the support of INDIGO experts, writes specific modules for portal-specific configurations.
  • The use case administrator deploys the virtual appliance.
• Daily work
  • Users access the portal as if it were locally deployed and submit jobs to the system as if it had been provisioned statically.
UC: A Graphic Overview
[Figure: deployment flow for the web portal use case. Step labels recovered from the diagram: 1) Stage Data; 2) Deploy TOSCA with Vanilla VM/Container; 4) Install / Configure; 5) Mount; 6) Access Web Portal. Component labels: User, FutureGateway API Server, WP6, WP5, Orchestrator, IM, Other PaaS Core Services, WP4, Cloud Site, OpenNebula, OpenStack, Heat, Clues, IM, Provider, Front-End, Public IP, Galaxy, WN (worker nodes), Virtual Elastic Cluster. Image build labels: TOSCA Documents and Dockerfiles per Use Case; INDIGO-DataCloud Docker Hub Organization; Champion + JRA; 1.a.1) build, push; 1.a.2) Dockerfile (commit); 1.b) Automated Build.]
A possible Phenomenal-INDIGO integration scenario
• Phenomenal already relies on a very rich set-up exploiting Mesos
• INDIGO is able to provide a customizable environment where an a priori complex cluster could be deployed in an automatic way:
  • Using a specific TOSCA Template built with the expertise of the INDIGO PaaS developers
• INDIGO could provide to Phenomenal:
  • (Automatic) resource provisioning exploiting any kind of cloud environment (private or public)
    • Reacting to the monitored status of the instantiated services
  • Advanced and flexible AAI solution
  • Advanced and flexible data management solution
  • Advanced scheduling across many cloud providers based on:
    • SLA/QoS, data location, availability monitoring, ranked with highly flexible rules
  • Easy-to-use web interface both for the end users and for the service admins/developers
Phenomenal exploiting INDIGO
[Figure: deployment flow for a Mesos cluster. Step labels recovered from the diagram: 1) Stage Data; 2) Deploy TOSCA with Vanilla VM/Container; 4) Install / Configure; 5) Mount; 6) Access Mesos Services. Component labels: User, FutureGateway API Server, WP6, WP5, Orchestrator, IM, Other PaaS Core Services, WP4, Cloud Site, OpenNebula, OpenStack, Heat, Clues, IM, Provider, Mesos Masters, Public IP, Chronos/Marathon, Workers, Virtual Elastic Mesos Cluster. Image build labels: TOSCA Documents and Dockerfiles per Use Case; INDIGO-DataCloud Docker Hub Organization; Champion + JRA; 1.a.1) build, push; 1.a.2) Dockerfile (commit); 1.b) Automated Build.]
INDIGO FAQ
• How does INDIGO achieve resource redundancy and high availability?
  • This is achieved at multiple levels:
    • at the data level, redundancy can be implemented by exploiting the capability of INDIGO's Onedata to replicate data across different data centers.
    • at the site level, it is possible to ask for copies of data to be kept, for example, on both disk and tape using the INDIGO QoS storage features.
    • for services, the INDIGO architecture uses Mesos and Marathon to provide automatic service high availability and load balancing. This automation is easily obtainable for stateless services; for stateful services it is application-dependent, but it can normally be integrated into Mesos through, for example, a custom framework (examples of which are provided by INDIGO).
• How does INDIGO achieve resource scalability?
  • First of all, we can distinguish between vertical (scale up) and horizontal (scale out) scalability. INDIGO provides both:
    • Mesos and Marathon handle vertical scalability by deploying Docker containers with an increasing amount of resources.
    • The INDIGO PaaS Orchestrator handles horizontal scalability through requests made at the IaaS level to add resources when needed.
INDIGO FAQ
• How does INDIGO achieve resource scalability?
  • The INDIGO software does this in a smart way; for example, it does not look at CPU load only:
    • In the case of a dynamically instantiated LRMS, it checks the status of jobs and queues and adds or removes computing nodes accordingly.
    • In the case of a Mesos cluster, if there are applications to start and there are no free resources, INDIGO starts up more nodes. This happens within the limits of the submitted TOSCA templates. In other words, any given user stays within the limits of the TOSCA template they have submitted; this also holds for accounting purposes.
• How do you know when and where resources are available?
  • We are extending the Information System available in the European Grid Infrastructure (EGI) to inform the INDIGO PaaS orchestrator about the available IaaS infrastructures and about the services they provide. It is therefore possible for the INDIGO orchestrator to optimally choose a certain IaaS infrastructure given, for example, the location of a certain dataset.
Conclusions
• First official release will be: end of July
• The first prototype is already available:
  • Not all the services and features are available
  • This is for internal evaluation, but some services can already be tested
• A lot of important development is being carried out together with the original developer communities, so that code maintenance is not (only) in our hands
Thank you
https://www.indigo-datacloud.eu
Better Software for Better Science.