How the INDIGO-DataCloud computing platform aims at helping … · 2016-07-03 · CLOUDs for...

How the INDIGO-DataCloud computing platform aims at helping scientific communities
RIA-653549
Giacinto DONVITO, INDIGO Technical Director, INFN Bari
[email protected]



INDIGO-DataCloud

• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
  • 11.1 M€, 30 months (from April 2015 to September 2017)

• Who: 26 European partners in 11 European countries
  • Coordination by the Italian National Institute for Nuclear Physics (INFN)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures

• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science.

• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology

• Where: deployable on hybrid (public or private) Cloud infrastructures
  • INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation

• Why: to answer the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.


From the paper “Advances in Cloud”

• EC Expert Group Report on Cloud Computing, http://cordis.europa.eu/fp7/ict/ssai/docs/future-cc-2may-finalreport-experts.pdf

To reach the full promises of CLOUD computing, major aspects have not yet been developed and realised and in some cases not even researched. Prominent among these are open interoperation across (proprietary) CLOUD solutions at IaaS, PaaS and SaaS levels. A second issue is managing multitenancy at large scale and in heterogeneous environments. A third is dynamic and seamless elasticity from in-house CLOUD to public CLOUDs for unusual (scale, complexity) and/or infrequent requirements. A fourth is data management in a CLOUD environment: bandwidth may not permit shipping data to the CLOUD environment and there are many associated legal problems concerning security and privacy. All these challenges are opportunities towards a more powerful CLOUD ecosystem. […] A major opportunity for Europe involves finding a SaaS interoperable solution across multiple CLOUD platforms. Another lies in migrating legacy applications without losing the benefits of the CLOUD, i.e. exploiting the main characteristics, such as elasticity etc.


INDIGO Addresses Cloud Gaps

• INDIGO focuses on use cases presented by its scientific communities to address the gaps identified by the previously mentioned EC Report, with regard to:
  • Redundancy/reliability
  • Scalability (elasticity)
  • Resource utilization
  • Multi-tenancy issues
  • Lock-in
  • Moving to the Cloud
  • Data challenges: streaming, multimedia, big data
  • Performance

• Reusing existing open source components wherever possible and contributing to upstream projects (such as OpenStack, OpenNebula, Galaxy, etc.) for sustainability.

Integrating distributed data infrastructures with INDIGO-DataCloud


INDIGO and other European Projects

• The INDIGO services are being developed according to the requirements collected within many multidisciplinary scientific communities, such as ELIXIR, WeNMR, INSTRUCT, EGI-FedCloud, DARIAH, INAF-LBT, CMCC-ENES, INAF-CTA, LifeWatch-Algae-Bloom, EMSO-MOIST and EuroBioImaging. However, they are implemented so that they can be easily reused by other user communities.
• INDIGO has strong relationships with complementary initiatives, such as EGI-Engage on the operational side and AARC with respect to AuthN/AuthZ policies. Users of EC-funded initiatives such as PRACE and EUDAT are also expected to benefit from the deployment of INDIGO components in such infrastructures.
• Several national/regional infrastructures are covered by the 26 INDIGO partners, located in 11 European countries.
• INDIGO is mentioned in the recent Important Project of Common European Interest (IPCEI) for the exploitation of HPC and HTC resources at national, regional and European levels.


Work Packages


INDIGO-DataCloud General Architecture

[Figure: INDIGO-DataCloud general architecture. WP6 services (front end): FutureGateway Portal, workflows, mobile clients and support services, including the FutureGateway Engine and REST API with JSAGA/JSAGA adaptors, other science gateways, mobile apps and the Open Mobile Toolkit, Ophidia, LONI and Taverna/Kepler plugins, admin, user and workflow portlets, data analytics, and SG monitoring GUI clients. WP5 services (PaaS): Kubernetes cluster, IAM service, PaaS Orchestrator, QoS/SLA, Cloud Provider Ranker, Monitoring, Infrastructure Manager and Accounting, with TOSCA linking the layers. Data services: Onedata, Dynafed and FTS, exposed via REST/CDMI/WebDAV/POSIX/GridFTP with OIDC. Non-INDIGO IaaS is reached through the native IaaS API and Heat/IM. WP4 services (IaaS): Mesos clusters, automatic scaling service, storage service (S3/CDMI/POSIX/WebDAV/GridFTP), smart scheduling, spot instances, native Docker, QoS support, identity harmonization and local repository.]


IaaS Features (1)

• Improved scheduling for allocation of resources by popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will address both better scheduling algorithms and support for spot instances. The latter are needed in particular to support allocation mechanisms similar to those available on public clouds such as Amazon and Google.
  • We will also support dynamic partitioning of resources between “traditional batch systems” and Cloud infrastructures (for some LRMS).

• Support for standards in IaaS resource orchestration engines through the use of the TOSCA standard.
  • This overcomes the portability and usability problems caused by the fact that Cloud computing frameworks orchestrate resources in widely different ways.

• Improved IaaS orchestration capabilities for popular open source Cloud platforms, i.e. OpenStack and OpenNebula.
  • Enhancements will include the development of custom TOSCA templates to facilitate resource orchestration for end users, increased scalability of deployed resources and support for orchestration capabilities in OpenNebula.
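A TOSCA template describes a deployment as a topology of node templates linked by requirements. This is what lets an orchestrator work out the order in which resources must be created, whatever Cloud framework sits underneath. The following minimal sketch assumes the topology has already been parsed from YAML into a Python dict. Only `tosca.nodes.Compute` and `tosca.nodes.BlockStorage` are TOSCA normative types. The `example.nodes.*` types and all node names are hypothetical. The sketch uses the standard library's `graphlib` (Python 3.9+) to derive one valid creation order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical topology, shaped like the node_templates section of a parsed
# TOSCA template. The example.nodes.* types and node names are made up.
TOPOLOGY = {
    "node_templates": {
        "portal_server": {"type": "tosca.nodes.Compute"},
        "shared_volume": {"type": "tosca.nodes.BlockStorage"},
        "batch_server": {
            "type": "example.nodes.BatchServer",
            "requirements": [{"host": "portal_server"}, {"storage": "shared_volume"}],
        },
        "web_portal": {
            "type": "example.nodes.WebPortal",
            "requirements": [{"host": "portal_server"}, {"batch": "batch_server"}],
        },
    }
}


def deployment_order(topology):
    """Return node names so that every node comes after the nodes it requires."""
    graph = {}
    for name, node in topology["node_templates"].items():
        # Each requirement is a one-entry mapping {requirement_name: target_node}.
        graph[name] = {target for req in node.get("requirements", [])
                       for target in req.values()}
    # static_order() raises graphlib.CycleError if requirements form a cycle.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(deployment_order(TOPOLOGY), start=1):
        print(step, name)
```

A real orchestrator also handles relationship types, interfaces and inputs. The point here is only that the requirement graph, not the target platform, determines the ordering.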


IaaS Features (2)

• Improved QoS capabilities of storage resources.
  • Better support for high-level storage requirements such as flexible allocation of disk or tape storage space and support for the data life cycle. This is also an enhancement with respect to what is currently available in public clouds, such as Amazon Glacier and Google Cloud Storage.

• Improved capabilities for networking support.
  • Enhancements will include flexible networking support in OpenNebula and handling of network configurations through developments of the OCCI standard for both OpenNebula and OpenStack.

• Improved and transparent support for Docker containers.
  • Introduction of native container support in OpenNebula, and development of standard interfaces using the OCCI protocol to drive container support in both OpenNebula and OpenStack.


PaaS Features (1)

• Improved capabilities in the geographical exploitation of Cloud resources.
  • End users need not know where resources are located, because the INDIGO PaaS layer hides the complexity of both scheduling and brokering.

• Standard interface to access PaaS services.
  • Currently, each PaaS solution available on the market uses a different set of APIs, languages, etc. INDIGO will use the TOSCA standard to hide these differences.

• Support for data requirements in Cloud resource allocations.
  • Resources can be allocated where data is stored.

• Integrated use of resources coming from both public and private Cloud infrastructures.
  • The INDIGO resource orchestrator is capable of addressing both types of Cloud infrastructures through TOSCA templates handled at either the PaaS or IaaS level.
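Allocating resources where data is stored, across both public and private providers, comes down to ranking candidate sites on several criteria. The following minimal sketch assumes hypothetical provider records and hand-picked weights. It does not reproduce the actual INDIGO Cloud Provider Ranker rules. Providers whose SLA cannot cover the request are filtered out. The remaining providers are ordered so that data locality dominates and monitored availability breaks ties.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    sla_cores: int                 # cores available under the SLA with this provider
    availability: float            # monitored availability, 0.0-1.0
    datasets: set = field(default_factory=set)  # datasets stored at this provider


def rank_providers(providers, dataset, cores_needed, w_data=10.0, w_avail=5.0):
    """Return the providers that can satisfy the request, best first.

    The weights are illustrative assumptions: data locality dominates and
    monitored availability breaks ties between sites with equal locality.
    """
    eligible = [p for p in providers if p.sla_cores >= cores_needed]

    def score(p):
        locality = 1.0 if dataset in p.datasets else 0.0
        return w_data * locality + w_avail * p.availability

    return sorted(eligible, key=score, reverse=True)


if __name__ == "__main__":
    sites = [  # hypothetical sites
        Provider("site-a", sla_cores=64, availability=0.95, datasets={"run-2016"}),
        Provider("site-b", sla_cores=512, availability=0.99),
        Provider("site-c", sla_cores=16, availability=0.999, datasets={"run-2016"}),
    ]
    # site-c lacks cores; site-a wins on data locality despite lower availability.
    print([p.name for p in rank_providers(sites, "run-2016", cores_needed=32)])
    # ['site-a', 'site-b']
```

A weighted sum keeps the rule easy to tune. A production ranker would more likely read its criteria and weights from configuration rather than hard-code them.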


PaaS Features (2)

• Distributed data federations supporting legacy applications as well as high-level capabilities for distributed QoS and Data Lifecycle Management.
  • This includes, for example, remote POSIX access to data.

• Integrated IaaS and PaaS support in resource allocations.
  • For example, storage provided at the IaaS layer is automatically made available to higher-level resource allocations performed at the PaaS layer.

• Transparent client-side import/export of distributed Cloud data.
  • This supports Dropbox-like mechanisms for importing and exporting data from/to the Cloud. That data can then be easily ingested by Cloud applications through the INDIGO unified data tools.

• Support for distributed data caching mechanisms and integration with existing storage infrastructures.
  • INDIGO storage solutions are capable of providing efficient access to data and of transparently connecting to POSIX filesystems already available in data centers.


PaaS Features (3)

• Deployment, monitoring and automatic scalability of existing applications.
  • For example, existing applications such as web front-ends or RStudio servers can be automatically and dynamically deployed in highly available and scalable configurations.

• Integrated support for high-performance Big Data analytics.
  • This includes custom frameworks such as Ophidia (providing a high-performance workflow execution environment for Big Data analytics on large volumes of scientific data) as well as general-purpose engines for large-scale data processing such as Spark, all integrated to make use of the INDIGO PaaS features.

• Support for dynamic and elastic clusters of resources.
  • Resources and applications can be clustered through the INDIGO APIs. This includes, for example, on-demand batch systems (such as HTCondor or Torque) and extensible application platforms (such as Apache Mesos) capable of supporting both application execution and instantiation of long-running services.


AAI Features

• Provide an advanced set of features that includes:
  • User authentication (supporting SAML, OIDC, X.509)
  • Identity harmonization (link heterogeneous AuthN mechanisms to a single VO identity)
  • Management of VO membership (i.e., groups and other attributes)
  • Management of registration and enrolment flows
  • Provisioning of VO structure and membership information to services
  • Management, distribution and enforcement of authorization policies
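Identity harmonization means that several authentication mechanisms (SAML, OIDC, X.509) resolve to one VO identity. That identity carries the group memberships services use for authorization. The following toy in-memory model illustrates the idea. The class, its methods and the credential values are hypothetical and do not reflect the INDIGO IAM service API.

```python
from dataclasses import dataclass, field


@dataclass
class VOIdentity:
    username: str
    groups: set = field(default_factory=set)  # VO groups used for authorization


class IdentityRegistry:
    """Toy registry linking external credentials to a single VO identity."""

    def __init__(self):
        self._accounts = {}  # username -> VOIdentity
        self._links = {}     # (mechanism, external_id) -> username

    def register(self, username, groups=()):
        self._accounts[username] = VOIdentity(username, set(groups))

    def link(self, username, mechanism, external_id):
        """Attach an external credential (e.g. an X.509 DN) to a VO user."""
        if username not in self._accounts:
            raise KeyError(f"unknown VO user: {username}")
        key = (mechanism, external_id)
        owner = self._links.get(key)
        if owner is not None and owner != username:
            raise ValueError(f"{key} is already linked to {owner}")
        self._links[key] = username

    def resolve(self, mechanism, external_id):
        """Map an authenticated external identity to its VO identity, or None."""
        username = self._links.get((mechanism, external_id))
        return self._accounts.get(username) if username else None


if __name__ == "__main__":
    reg = IdentityRegistry()
    reg.register("alice", groups={"climate", "climate/admins"})
    reg.link("alice", "x509", "/DC=org/DC=example/CN=Alice")
    reg.link("alice", "oidc", "https://idp.example.org#1234")
    # Both credentials lead to the same VO identity and the same groups.
    print(reg.resolve("oidc", "https://idp.example.org#1234").groups)
```

Refusing to link a credential that already belongs to another user is what keeps the mapping unambiguous. Each external identity maps to at most one VO member.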


Storage Quality of Service and the Cloud

• Amazon: S3, Glacier
• Google: Standard, Durable Reduced Availability, Nearline
• HPSS/GPFS: corresponds to the HPSS classes (customizable)
• dCache: Resilient, TAPE, disk+tape


Next step: Data Life Cycle

• Data Life Cycle is just the time-dependent change of:
  • Storage Quality of Service
  • Ownership and Access Control (PI owned, no access, site owned, public access)
  • Payment model: pay as you go; pay in advance for the rest of the lifetime
  • Maybe other things

[Figure: timeline with marks at 6 months, 1 year and 10 years]
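Because the data life cycle is defined here as a time-dependent change of storage QoS, ownership/access and payment model, a life-cycle policy can be written as a list of changes keyed by data age. The following minimal sketch assumes transitions at the 6-month, 1-year and 10-year marks of the timeline. Which attribute changes at which mark is a hypothetical choice made for illustration; the slide does not specify it.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataState:
    qos: str      # storage class, e.g. "disk", "disk+tape", "tape"
    owner: str    # "PI" or "site"
    access: str   # "none" or "public"
    payment: str  # "pay-as-you-go" or "prepaid"


# Hypothetical policy: (minimum age in days, attributes that change at that age).
# Entries are sorted by age; the first one defines the initial state.
POLICY = [
    (0, {"qos": "disk", "owner": "PI", "access": "none", "payment": "pay-as-you-go"}),
    (182, {"qos": "disk+tape"}),                   # ~6 months
    (365, {"qos": "tape", "payment": "prepaid"}),  # 1 year
    (3652, {"owner": "site", "access": "public"}), # ~10 years
]


def state_at(created: date, today: date) -> DataState:
    """Apply every policy step whose age threshold has been reached."""
    age = (today - created).days
    if age < 0:
        raise ValueError("today is before the creation date")
    attrs = {}
    for min_age, changes in POLICY:
        if age >= min_age:
            attrs.update(changes)
    return DataState(**attrs)


if __name__ == "__main__":
    created = date(2016, 1, 1)
    for day in (date(2016, 3, 1), date(2016, 9, 1), date(2017, 6, 1), date(2027, 1, 1)):
        print(day, state_at(created, day))
```

Expressing the policy as cumulative changes, rather than as full states, keeps each step small. It also means that an attribute not mentioned at a given mark keeps its previous value.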


Data Federation

[Figure: Onedata data federation demo spanning Amazon S3/AWS USA (DNS: p-aws-useast), INFN Italy, UPV Spain (VM: demo-onedata-upv-provider) and an OS X laptop running boot2docker. Components shown: Onezone, Oneprovider and Oneclient running in VMs and Docker containers, an NFS server, a POSIX volume and a SAMBA export.]


Frontend Services/Toolkit


Integration schemas

• We provide graphical user interfaces in the form of scientific gateways and workflows, together with the means to access the INDIGO PaaS services and software stack, and allow users to define and set up the on-demand infrastructure for the WP2 use cases.
  • Setting up a whole use case infrastructure: the administrator will be provided with ready-to-use recipes that they will be able to customize. The final users will be provided with the service end-points and will not be aware of the back end.

• Use the INDIGO features from their own portals: user communities that have their own Scientific Gateway set up can exploit the FutureGateway REST API to deal with the whole INDIGO software stack.

• Use of the INDIGO tools and portals, including the FutureGateway, Scientific Workflow Systems, Big Data Analytics Frameworks (like Ophidia) and Mobile Applications. In this scenario, the final users as well as domain administrators will use the GUI tools. The administrator will use them as described in the first case. In addition, domain-specific users will be provided with specific portlets/workflows/apps that allow graphical interaction with their applications run via the INDIGO software stack.


From CSGF to FutureGateway

[Figure: Classic CSGF (before INDIGO): portlets running on Liferay/Glassfish use the Grid Engine and JSAGA; communication between portlet, Grid Engine and JSAGA is only possible with Java libraries. FutureGateway approach (INDIGO): portlets running on Liferay/Tomcat, as well as web/mobile apps, communicate with the API Server via REST APIs, which allows it to serve external applications; the API Server interacts with JSAGA via Java libraries.]

• The same REST APIs could be used by mobile apps

• These APIs make the interaction with the PaaS layer easier

• These REST APIs allow non-INDIGO applications to easily exploit INDIGO capabilities
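The advantage of the API Server is that any HTTP client, not only a Java portlet, can drive the INDIGO stack. The following sketch uses only the Python standard library to build a task-submission request, without sending it. The base URL, the `/v1.0/tasks` path, the JSON field names and the bearer-token header are assumptions for illustration. They should be checked against the FutureGateway API documentation before use.

```python
import json
import urllib.request

API_BASE = "https://fgapi.example.org/v1.0"  # hypothetical API Server endpoint


def build_task_request(application_id, arguments, token, description=""):
    """Build (but do not send) a POST request that submits a task.

    The JSON field names below are assumptions, not the documented schema.
    """
    body = {
        "application": application_id,
        "description": description,
        "arguments": list(arguments),
    }
    return urllib.request.Request(
        url=f"{API_BASE}/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # e.g. an OIDC access token
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_task_request("42", ["-n", "10"], token="<access-token>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually submit it to a live server.
```

The same request could equally be issued from a mobile app or a third-party portal. That is the decoupling the REST layer provides compared with the Java-only CSGF path.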


Ophidia framework

• Ophidia is a big data analytics framework for eScience
  • Primarily used for the analysis of climate data, but exploitable in multiple domains
  • “Datacube” abstraction and OLAP-based approach for big data
  • Support for array-based data analysis and scientific data formats
  • Parallel computing techniques and smart data distribution methods
  • ~100 array-based primitives and ~50 datacube operators
    • e.g. data sub-setting, data aggregation, array-based transformations, datacube roll-up/drill-down, datacube import, etc.
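The following sketch makes two of the listed datacube operators concrete. It models a tiny cube in plain Python as a tuple of dimension names plus a mapping from coordinates to values, then implements a sub-setting operator and an aggregation that collapses one dimension. It only illustrates what these operators compute. It is not Ophidia's API, which runs such operators in parallel over distributed data, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def subset(dims, cells, **ranges):
    """Data sub-setting: keep cells whose coordinates lie in inclusive (lo, hi) ranges."""
    idx = {d: dims.index(d) for d in ranges}
    kept = {
        c: v for c, v in cells.items()
        if all(lo <= c[idx[d]] <= hi for d, (lo, hi) in ranges.items())
    }
    return dims, kept


def reduce_dim(dims, cells, dim, op=mean):
    """Aggregation: collapse one dimension with `op` (mean by default)."""
    axis = dims.index(dim)
    groups = defaultdict(list)
    for c, v in cells.items():
        groups[c[:axis] + c[axis + 1:]].append(v)  # drop the collapsed coordinate
    new_dims = dims[:axis] + dims[axis + 1:]
    return new_dims, {c: op(vs) for c, vs in groups.items()}


if __name__ == "__main__":
    dims = ("time", "lat", "lon")
    # Synthetic values; a real cube would be imported from e.g. NetCDF files.
    cells = {(t, lat, lon): float(t + lat + lon)
             for t in range(4) for lat in (10, 20) for lon in (0, 5)}
    d, sub = subset(dims, cells, time=(1, 2), lat=(10, 10))
    d, avg = reduce_dim(d, sub, "time")
    print(d, avg)  # time-averaged values for each (lat, lon) point
```

Carrying the dimension names along with the cells lets operators be chained, with each output cube becoming the input of the next step, much as datacube operators are composed in an analytics workflow.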


INDIGO module for Kepler

• The Kepler scientific workflow system is an open source tool that enables creation, execution and sharing of workflows across a broad range of scientific and engineering disciplines.
• The first version of the INDIGO module has been delivered, and new functionalities are gradually being made available to users.
• The INDIGO module is based on the FutureGateway API.
• At the moment, it is possible to build workflows that define a task, prepare inputs and trigger execution. While a task is executed within INDIGO's infrastructure, it is possible to check its status.
• FutureGateway API client: https://github.com/indigo-dc/indigoclient
• Kepler-based actors: https://github.com/indigo-dc/indigokepler
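The last step of the workflow described above, checking a task's status while it runs, is essentially a polling loop. The following generic sketch assumes a caller-supplied `get_status` function standing in for a FutureGateway API call. The status names are hypothetical. The loop returns once the task reaches a terminal state and raises `TimeoutError` if a deadline passes first. `sleep` and `clock` are injectable so the loop can be tested without real waiting.

```python
import time

TERMINAL_STATES = {"DONE", "FAILED", "ABORTED"}  # hypothetical status names


def wait_for_task(task_id, get_status, interval=5.0, timeout=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(task_id) until a terminal state is reached.

    Returns the final status; raises TimeoutError if the deadline would be
    exceeded before the next poll.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status in TERMINAL_STATES:
            return status
        if clock() + interval > deadline:
            raise TimeoutError(f"task {task_id} still {status} after {timeout}s")
        sleep(interval)


if __name__ == "__main__":
    # Simulated task that is submitted, runs, then completes.
    states = iter(["SUBMITTED", "RUNNING", "DONE"])
    print(wait_for_task("7", lambda _: next(states), sleep=lambda s: None))
```

A workflow actor would typically call something like this after triggering execution, then fetch the outputs only when the returned status indicates success.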


Use case examples


UC: A web portal that exploits a batch system to run applications

• A user community maintains a “vanilla” version of the portal and computing image, plus some specific recipes to customize software tools and data.
  • Portal and computing are part of the same image, which can take different roles.
  • Customization may include creating special users, copying (and registering in the portal) reference data, and installing (and again registering) processing tools.
  • Typically, the web portal image also has a batch queue server installed.

• All the running instances share a common directory.
• Different credentials: end-user and application deployment.


UC Inspiration: Galaxy on the cloud

• Galaxy can be installed on a dedicated machine or as a front-end to a batch queue.
• Galaxy exposes a web interface and executes all the interactions (including data uploading) as jobs in a batch queue.
• It requires a shared directory among the worker nodes and the front-end.
• It supports a separate storage area for different users, managing them through the portal.


UC: A web portal that exploits a batch system to run applications

1) The web portal is instantiated, installed and configured automatically, exploiting Ansible recipes and TOSCA templates.
2) A remote POSIX share is automatically mounted on the web portal using Onedata.
3) The same POSIX share is also automatically mounted on the worker nodes using Onedata.
4) End users can see and access the same files via simple web browsers or similar tools.
5) A batch system is dynamically and automatically configured via TOSCA templates.
6) The portal is automatically configured to execute jobs on the batch cluster.
7) The batch cluster is automatically scaled up and down based on the job load on the batch system.


UC: Use Case Lifecycle

• Preliminary
  • The use case administrator creates the “vanilla” images of the portal + computing image.
  • The use case administrator, with the support of INDIGO experts, writes the TOSCA specification of the portal, queue and computing configuration.

• Group-specific
  • The use case administrator, with the support of INDIGO experts, writes specific modules for portal-specific configurations.
  • The use case administrator deploys the virtual appliance.

• Daily work
  • Users access the portal as if it were locally deployed and submit jobs to the system as if it had been provisioned statically.


UC: A Graphic Overview

[Figure: graphic overview of the use case. Labelled steps: 1) Stage Data; 2) Deploy TOSCA with vanilla VM/container; 4) Install/Configure; 5) Mount; 6) Access Web Portal. Components: the user; the FutureGateway API Server (WP6); the Orchestrator and other PaaS core services (WP5); IM, OpenNebula and OpenStack with Heat and CLUES at the cloud sites (WP4); a front-end with a public IP running Galaxy; a virtual elastic cluster of worker nodes (WN); the storage provider; TOSCA documents and Dockerfiles per use case, with images published to the INDIGO-DataCloud Docker Hub organization by the champion + JRA (1.a.1 build, push; 1.a.2 Dockerfile commit; 1.b automated build).]


A possible Phenomenal-INDIGO integration scenario

• Phenomenal already relies on a very rich set-up exploiting Mesos
• INDIGO is able to provide a customizable environment where a cluster that would otherwise be complex to set up can be deployed automatically:
  • Using a specific TOSCA template built with the expertise of the INDIGO PaaS developers

• INDIGO could provide to Phenomenal:
  • (Automatic) resource provisioning exploiting any kind of cloud environment (private or public)
  • Reaction to the monitored status of the instantiated services
  • An advanced and flexible AAI solution
  • An advanced and flexible data management solution
  • Advanced scheduling across many cloud providers based on:
    • SLA/QoS, data location and availability monitoring, ranked with highly flexible rules
  • An easy-to-use web interface both for the end users and for the service admins/developers


Phenomenal exploiting INDIGO

[Figure: the same deployment flow applied to Phenomenal. Labelled steps: 1) Stage Data; 2) Deploy TOSCA with vanilla VM/container; 4) Install/Configure; 5) Mount; 6) Access Mesos Services. Components: the user; the FutureGateway API Server (WP6); the Orchestrator and other PaaS core services (WP5); IM, OpenNebula and OpenStack with Heat and CLUES at the cloud sites (WP4); Mesos masters with a public IP running Chronos/Marathon; a virtual elastic Mesos cluster of workers; the storage provider; TOSCA documents and Dockerfiles per use case, with images published to the INDIGO-DataCloud Docker Hub organization by the champion + JRA (1.a.1 build, push; 1.a.2 Dockerfile commit; 1.b automated build).]


INDIGO FAQ

• How does INDIGO achieve resource redundancy and high availability?
  • This is achieved at multiple levels:
    • at the data level, redundancy can be implemented by exploiting the capability of INDIGO's Onedata to replicate data across different data centers;
    • at the site level, it is possible to ask for copies of data to be kept, for example, on both disk and tape using the INDIGO QoS storage features;
    • for services, the INDIGO architecture uses Mesos and Marathon to provide automatic service high availability and load balancing. This automation is easily obtainable for stateless services; for stateful services it is application-dependent, but it can normally be integrated into Mesos through, for example, a custom framework (examples of which are provided by INDIGO).

• How does INDIGO achieve resource scalability?
  • First of all, we can distinguish between vertical (scale up) and horizontal (scale out) scalability. INDIGO provides both:
    • Mesos and Marathon handle vertical scalability by deploying Docker containers with an increasing amount of resources.
    • The INDIGO PaaS Orchestrator handles horizontal scalability through requests made at the IaaS level to add resources when needed.


INDIGO FAQ

• How does INDIGO achieve resource scalability?
  • The INDIGO software does this in a smart way; for example, it does not look at CPU load only:
    • In the case of a dynamically instantiated LRMS, it checks the status of jobs and queues and accordingly adds or removes computing nodes.
    • In the case of a Mesos cluster, if there are applications to start and no free resources, INDIGO starts up more nodes. This happens within the limits of the submitted TOSCA templates. In other words, any given user stays within the limits of the TOSCA template they have submitted; this is also true for accounting purposes.

• How do you know when and where resources are available?
  • We are extending the Information System available in the European Grid Infrastructure (EGI) to inform the INDIGO PaaS Orchestrator about the available IaaS infrastructures and the services they provide. It is therefore possible for the INDIGO Orchestrator to optimally choose a certain IaaS infrastructure given, for example, the location of a certain dataset.
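The scaling behaviour described in the FAQ above reacts to queued jobs or pending applications rather than to CPU load alone, and never leaves the limits of the submitted TOSCA template. The following sketch expresses that behaviour as a pure decision function. The field names, the fixed number of job slots per node and the "release idle nodes when nothing is queued" rule are hypothetical simplifications. The INDIGO components that implement elasticity may apply different policies.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    nodes: int           # worker nodes currently running
    idle_nodes: int      # running nodes with nothing assigned
    queued_jobs: int     # jobs waiting in the LRMS queue (or pending Mesos apps)
    slots_per_node: int  # jobs/tasks a single node can run at once (assumed fixed)


def desired_nodes(state: ClusterState, min_nodes: int, max_nodes: int) -> int:
    """Return the target worker count, clamped to the template's [min, max]."""
    if state.queued_jobs > 0:
        # Scale out: add enough nodes to absorb the queue, counting idle nodes first.
        free_slots = state.idle_nodes * state.slots_per_node
        missing = max(0, state.queued_jobs - free_slots)
        target = state.nodes + math.ceil(missing / state.slots_per_node)
    else:
        # Scale in: release idle nodes when nothing is waiting.
        target = state.nodes - state.idle_nodes
    # The TOSCA template limits always win, which also bounds accounting.
    return max(min_nodes, min(max_nodes, target))


if __name__ == "__main__":
    busy = ClusterState(nodes=2, idle_nodes=0, queued_jobs=10, slots_per_node=4)
    print(desired_nodes(busy, min_nodes=1, max_nodes=4))  # capped by the template
```

Keeping the decision separate from the code that actually creates or deletes VMs makes the policy easy to test. It also makes explicit that a user's cluster can never grow beyond what their template allows.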


Conclusions

• The first official release will be at the end of July

• The first prototype is already available:
  • Not all the services and features are available yet
  • This is for internal evaluation, but some services can already be tested

• A lot of important development is being carried out together with the original developer communities, so that code maintenance is not (only) in our hands


Thank you

https://www.indigo-datacloud.eu
Better Software for Better Science.