Porting applications to EU-IndiaGrid: EGEE Marco Verlato (@pd.infn.it) EU-IndiaGrid Workshop 16-18...

Post on 20-Jan-2016

215 views 0 download

Tags:

Transcript of Porting applications to EU-IndiaGrid: EGEE Marco Verlato (@pd.infn.it) EU-IndiaGrid Workshop 16-18...

Porting applications to EU-IndiaGrid:

EGEE

Marco Verlato (@pd.infn.it)

EU-IndiaGrid Workshop 16-18 April 2007 Bangalore, India

2

Outline

• Concepts• Job categories• Levels of Gridification• Software remote installation• Applications invocation

3

Grid Applications Development

Where computer science meets the application communities!

VO-specific developments built on higher-level tools and core services

Makes Grid services useable by non-specialists

Grids provide the compute and data storage resources

Basic Grid services:AA, job submission, info, …

Higher-level grid services (brokering,…)

Application toolkits, …..

Application

Production grids provide these core services.

4

Job concept• gLite middleware follows the job submission

concept for application execution and resource sharing

• A job is a self contained entity that packages and conveys all the required information and artifacts for the successful remote execution of an application−Executable files−Input/Output data−Parameters−Environment−Infrastructure Requirements−Workflows

5

Type of applications

• The work done to port the application depends on its characteristics, complexity and the level of interactions with external grid services

• Size of input/output data• Metadata request• Access to external software

(database, libraries...)

6

Applications categories

• Simulation• Bulk Processing• Responsive Apps.• Workflow• Parallel Jobs

7

Simulation

• Examples– LHC Monte Carlo simulation– Fusion

• Characteristics– Jobs are CPU-intensive– Large number of independent jobs– Run by few (expert) users– Small input; large output

• Needs– Batch-system services– Minimal data management for storage of results

ATLAS

ITER

8

Bulk Processing

• Examples– HEP processing of raw data, analysis– Earth observation data processing

• Characteristics– Widely-distributed input data– Significant amount of input and output data

• Needs– Job management tools (workload

management)– Meta-data services– More sophisticated data management

9

Responsive Apps. (I)

• Examples–Prototyping new applications–Monitoring grid operations

• Characteristics–Small amounts of input and output data–Not CPU-intensive–Short response time (few minutes)

• Needs–Configuration which allows “immediate”

execution (QoS)–Services must treat jobs with minimum

latency

10

Responsive Apps. (II)

• Grid as a backend infrastructure:– gPTM3D: interactive analysis of medical images– GPS@: bioinformatics via web portal– GATE: radiotherapy planning– DILIGENT: digital libraries

• Characteristics– Rapid response: a human waiting for the result!– Many small but CPU-intensive tasks– User is not aware of “grid”!

• Needs– Interfacing (data & computing) with non-grid application

or portal– User and rights management between front-end and grid

11

Workflow

• Examples–“Bronze Standard”: medical images registration–Flood prediction

• Characteristics–Use of grid and non-grid services–Complex set of algorithms for the analysis–Complex dependencies between

individual tasks

• Needs–Tools for managing the workflow itself–Standard interfaces for services (I.e. web-

services)

12

Parallel Jobs

• Examples– Climate modeling– Earthquake analysis– Computational chemistry

• Characteristics– Many interdependent, communicating tasks– Many CPUs needed simultaneously– Use of MPI libraries

• Needs– Configuration of resources for flexible use of

MPI– Pre-installation of optimized MPI libraries

13

Universal Needs• Security

–Ability to control access to services and to data

•Fine-grained access control listsEncryption & logging for more demanding

disciplines•Access control consistently implemented over

all services

• VO Management–Management of users, groups, and roles–Changing the priority of jobs for different

users, groups, roles–Quota management for users, groups, roles–Definition and access to special resources

•Application-level services•Responsive queues (guaranteed, low-latency

execution)

14

Levels of “Gridification”• No development: wrap existing applications as

jobs. No source code modification is required

• Minor modifications: the application exposes minimal interaction with the grid services (e.g. Data Managements)

• Major modifications: a wide portion of the code is rewritten to adopt to the new environment (e.g. parallelization, metadata, information)

• Pure grid applications: developed from scratch. Extensively exploit existing grid services to provide new capabilities customized for a specific domain (e.g. metadata, job management, credential management)

15

No development

• It is the simplest case : the application has not external dependencies, just needs computational power

• Being sure that the application runs on the Worker Node, the executable, and the input files it may need are carried in the Input Sandbox and therefore executed

• Although the application can be directly run, it’s recommended to invoke it from a supplied script which prepare the environment before the execution

16

Minor modifications• The application has minimal

interactions with grid service : it is self consistent but...– e.g. takes input files from an SE– output files are stored on a SE– Perform I/O from a metadata catalog– the application itself can be

downloaded from a SE • User provides an adapter script

which executes needed tasks, set the environment and run the application

17

Major modifications

• A part of the application software is rewritten taking care of the new environment : e.g. because of parallelization, interaction at run time with grid services...– Use of API – Web Service invoked – CLI wrap

• Source code recompiled taking into account this

18

Pure grid application

• Entirely developed from scratch taking into account grid capabilities

• Better exploits grid facilities• Needs a deep knowledge of grid

environment and possibilities in order to engineer entirely the application logic over a grid environment

• Very rare to meet...

19

Additional software installed

• Independently from the integration level, applications may need external software (libraries) that, as application itself, can be provided in several ways– at run time

• carrying out it within InputSandbox• downloading it from a Storage Element

– permanent install on the CE• as tar.gz installed by the VO software

manager • as rpm being part of the gLite release

20

• The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install Application Software in the different sites.

• The ESM is generally acknowledged through a VOMS role (SoftwareManager)

• The ESM can manage (install, validate, remove...) Experiment Software on a site at any time through a normal Grid job, without previous communication to the site administrators.

• Such job has in general no scheduling priorities and will be treated as any other job of the same VO in the same queue.

• lcg-asis is a good generator for such installation jobs

Software installation on CE

21

Software installation on CE

• The site provides a dedicated space where each supported VO can install or remove software. The amount of available space must be negotiated between the VO and the site

• An environmental variable holds the path for the location of a such space. Its format is the following:

VO_<name_of_VO>_SW_DIR

22

Software validation

• Once the software is installed, the ESM can perform the validation

• The validation is a process verifying that the software was properly installed and works as expected

• It can be performed in the same job, after installation (validation on the fly) or later on, via a dedicated job.

23

Software validation• If the ESM judges that the software was

properly installed he publishes in the Information System (IS) the tag which identifies unequivocally the software,

e.g.: VO-cms-CMSSW_1_2_2

• Such a tag is added to the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS, using the GRIS running in the CE

• Jobs requesting for a particular piece of software can be directed to the appropriate CE just by setting special requirements on the JDL, e.g.: Requirements=Member("VO-cms-CMSSW_1_2_2",

other.GlueHostApplicationSoftwareRunTimeEnvironment);

24

Invoke Application

• From the UI−Command Line Interfaces / Scripts−APIs−Higher level tools

• From portals−Accessible from any browser−Tailored to applications

25

References (I)

https://grid.ct.infn.it/twiki/bin/view/GILDA/APIUsage

26

References (II)

https://glite-demo.ct.infn.it/

27

Acknowledgements

• Thanks to Mike Mineter (NeSC), Valeria Ardizzone and Emidio Giorgio of the INFN GILDA team