Dissertation defense

Post on 08-May-2015



Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and the related project was to integrate EGEE/WLCG data sources into the GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled "Integrating EGEE Storage Services with the Virtual Laboratory": http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at: http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab


EUROPEAN UNION

Data source registration in the Virtual Laboratory

Marek Pomocka
Major: applied computer science
Specialisation: computer techniques in science and technology
Faculty of Physics and Applied Computer Science, AGH University of Science and Technology

Supervisor: Marian Bubak, Ph.D.
Consultants: Piotr Nowakowski, M.Sc., Daniel Harężlak, M.Sc.

Master’s thesis defense, November 13, 2009

Outline

Introduction to Grid technologies and Virtual Laboratories
Motivation and Objectives
Conceptual view onto the solution
Challenges and solutions
Applications
Future work
Summary
References

GRID TECHNOLOGIES AND VIRTUAL LABORATORIES


Grid is a distributed computing architecture with cross-organizational access, providing nontrivial quality of service for participating actors.

Notable applications include:
High-energy physics (LHC)
Complex parameter studies in biomedicine and biochemistry
Weather forecasting
Natural disaster modelling
Digital image archives

Grid is a computer infrastructure dedicated to conducting in-silico research, created by many partners who share supercomputers, computer clusters, storage and research instruments to create a common space for e-Science.

[Figure labels: TASK, PCSS, ICM, CYFRONET, WCSS]

Grid users are organized into Virtual Organizations (VOs), which are dynamic by their nature. The VO approach simplifies access management.

[Figure labels: CYFRONET, PSNC]

Examples of Grids: TeraGrid, Open Science Grid, EGEE, DEISA

Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security etc. away from end-users.

[Figure: layers labelled Virtual Laboratory, Grid middleware, Grid infrastructure]

Many VLs endeavor to be general-purpose environments for designing and executing in-silico (or virtual) experiments, e.g. the GridSpace Virtual Laboratory.

Others are often designed for a specific purpose, such as:
Remote access to scientific instruments (e.g. VLAB)
Supporting research in meteorology (LEAD)
Research and decision support in virology (ViroLab)

Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise):

    if (condition) then … else … end

… or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid Taverna, Kepler, Triana, Pegasus)

VLs made Grids available to non-computer scientists.

[Figure labels: Virtual Laboratory, Grid Users]

MOTIVATION AND OBJECTIVES


"Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the filesystem is far too complex for me."

"... the security system is complicated too."

"Yes, I do agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments."

Objectives

The objective of the dissertation is to meet these needs by enabling access to LFC data sources from GridSpace scripts while concealing most interactions with the Grid Security Infrastructure (GSI).

This goal entails several other objectives:
Data Source Registry (DSR) reorganization
Extending the DSR EPE plug-in
Integration with GridSpace Engine

[Figure labels: GSEngine, DAC2, LFC DS]

Conceptual view onto the solution

CHALLENGES AND SOLUTIONS


Not to compromise GSEngine portability

Solution: isolation of platform-dependent code into a remote service.

[Figure: platform-independent side labelled GSEngine, GScript LFC integration, LFC connector, LFC client library (Linux, UNIX, Windows, Mac OS X); platform-dependent side labelled LFC DS Server (Scientific Linux 4, SL4)]

Serve multiple users utilizing inherently single-user gLite libraries.

Solution: ChemPo command wrappers. Each command is run in a new JVM with a prepared UNIX environment. Instead of a permanent place for credentials (e.g. ~/.globus/), use temporary files and specify their paths dynamically in the UNIX environment of the created JVM processes.

[Figure: LFC DS Server (Server JVM) with Worker 1 JVM (Cert1, Key1) and Worker 2 JVM (Cert2, Key2)]
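The temporary-credentials idea can be sketched in Ruby as follows. This is only an illustration under stated assumptions, not the thesis implementation: the helper `run_with_credentials` is hypothetical, a child Ruby process stands in for a worker JVM, and the variable names `X509_USER_CERT` and `X509_USER_KEY` are the ones conventionally read by Globus-based tools; whether the ChemPo wrappers use exactly these names is an assumption.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: runs `command` with per-user credentials that exist
# only in temporary files for the lifetime of the child process.
# Assumption: the child locates its credentials through X509_USER_CERT and
# X509_USER_KEY (the names conventionally used by Globus-based tools).
def run_with_credentials(cert_pem, key_pem, command)
  Tempfile.create("usercert") do |cert|
    Tempfile.create("userkey") do |key|
      cert.write(cert_pem)
      cert.flush
      key.write(key_pem)
      key.flush
      File.chmod(0o600, key.path) # private key readable by its owner only

      # The environment is handed to the child only; the server process is
      # left untouched, so requests of different users do not interfere.
      env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
      output, status = Open3.capture2(env, *command)
      raise "command failed with status #{status.exitstatus}" unless status.success?
      output
    end
  end
end

# Stand-in for a worker JVM: a child Ruby process that prints the
# certificate it was pointed at.
child = [RbConfig.ruby, "-e", 'print File.read(ENV["X509_USER_CERT"])']
puts run_with_credentials("CERT-OF-USER-1", "KEY-OF-USER-1", child)
```

Because the temporary files are removed as soon as the block finishes, no credential outlives the command it was prepared for.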

Enabling access to Grid files without downloading them to the GSEngine machine

Grid File Access Library (GFAL)

ChemPo command wrappers do not support such a mode of operation (streaming to client).

Solution: first, download the file to the LFC DS Server. Then, stream it to the client. Vice versa for sending a file to the Grid, i.e. stream the file to the LFC DS Server, then send it to the Grid.

Streaming representation in GridSpace scripts

Solution: the user receives a modified version of the Ruby IO object (sending a file to the Grid happens on the file close operation, while retrieving a file from the Grid happens during object initialization).

Reading a Grid file

    ds.open("mpomocka/test_file", "r") do |file|
      file.each {|line| puts line}
    end

    f = ds.open("mpomocka/test_file", :r)
    f.each {|line| puts line}
    f.close

Writing to a Grid file

    f = ds.open("mpomocka/test_file", :write)
    f.puts "First line of the file test_file"
    f.puts "Second line of the file test_file"
    f.close

Alternatively

    ds.open("mpomocka/test_file", :w) do |f|
      f.puts "Another way to write to a file"
      f.puts "Note that close is not necessary"
    end
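How such a modified IO object might behave can be sketched as below. This is a minimal illustration, not the DAC2 code: `FakeGridStore`, `GridFile` and `FakeDataSource` are hypothetical names, and an in-memory hash stands in for the LFC DS Server. Only the behaviour stated on the slide is modelled: content is fetched when the object is initialized for reading and sent when a file opened for writing is closed.

```ruby
require "stringio"

# Hypothetical stand-in for the LFC DS Server: an in-memory path => content map.
class FakeGridStore
  def initialize
    @files = {}
  end

  def download(path)
    @files.fetch(path) { raise Errno::ENOENT, path }
  end

  def upload(path, data)
    @files[path] = data
  end
end

# StringIO subclass that retrieves the content during initialization (read
# modes) and uploads its buffer on close (write modes).
class GridFile < StringIO
  def initialize(store, path, mode)
    @store = store
    @path = path
    @writing = %w[w write].include?(mode.to_s)
    super(@writing ? String.new : @store.download(path).dup)
  end

  def close
    return if closed?
    @store.upload(@path, string) if @writing
    super
  end
end

# Hypothetical data source offering the block and non-block forms of open.
class FakeDataSource
  def initialize(store)
    @store = store
  end

  def open(path, mode = :r)
    file = GridFile.new(@store, path, mode)
    return file unless block_given?
    begin
      yield file
    ensure
      file.close # with a block, closing (and thus uploading) is automatic
    end
  end
end

ds = FakeDataSource.new(FakeGridStore.new)
ds.open("mpomocka/test_file", :w) { |f| f.puts "Another way to write to a file" }
ds.open("mpomocka/test_file", "r") { |f| f.each { |line| puts line } }
```

Tying the transfer to `close` is what lets the block form guarantee the upload even if the block raises.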

Need for a descriptive and intuitive API (e.g. create_directory instead of mkdir) mimicking Ruby file operations, e.g. exist?, file?

DAC2 LFC DS methods

Method name                    | Aliases
createDirectory(parent,child)  | create_directory
createDirectory(path)          | create_directory
delete(path)                   | delete_file, deleteFile
deleteFile(filename)           |
directory?(filename)           | isDirectory, is_directory
exist?(path)                   | exist, exists, exist?
file?(path)                    | isFile, is_file
getFile(filename)              | get_file
getSize(path)                  | size, size?, get_size
listFiles(path)                | list_files
openFile(path, mode, &b)       | open, open_file
storeFile(payload, filename)   | store_file
zero?(path)                    |
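One way to offer several spellings of the same operation in Ruby is `alias_method`. The sketch below is an assumption about how such aliasing could look, not the DAC2 source: `AliasedDataSource` is a hypothetical in-memory class, only a few of the listed methods are shown, and the two `createDirectory` forms are merged into one method with an optional second argument.

```ruby
# Hypothetical in-memory data source used only to illustrate aliasing; the
# real DAC2 LFC DS methods talk to the LFC DS Server instead.
class AliasedDataSource
  def initialize
    @entries = { "/" => :directory } # path => :directory or file content
  end

  # Covers both createDirectory(parent, child) and createDirectory(path).
  def createDirectory(parent, child = nil)
    path = child ? File.join(parent, child) : parent
    @entries[path] = :directory
    path
  end

  def storeFile(payload, filename)
    @entries[filename] = payload.to_s
  end

  def exist?(path)
    @entries.key?(path)
  end

  def directory?(path)
    @entries[path] == :directory
  end

  def file?(path)
    exist?(path) && !directory?(path)
  end

  def getSize(path)
    file?(path) ? @entries[path].bytesize : 0
  end

  # Alias names taken from the method list in the slides.
  alias_method :create_directory, :createDirectory
  alias_method :store_file, :storeFile
  alias_method :exist, :exist?
  alias_method :exists, :exist?
  alias_method :isDirectory, :directory?
  alias_method :is_directory, :directory?
  alias_method :isFile, :file?
  alias_method :is_file, :file?
  alias_method :size, :getSize
  alias_method :get_size, :getSize
end

ds = AliasedDataSource.new
ds.create_directory("/mpomocka")
ds.store_file("hello", "/mpomocka/test_file")
puts ds.is_file("/mpomocka/test_file")
puts ds.size("/mpomocka/test_file")
```

Each alias is a second name for the same method body, so Java-style and Ruby-style callers get identical behaviour.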

Security

Secure communication
  Transport Layer Security
  Need to manage keystores
  Tunnelling is simpler

Credentials management
  Proxy certificate generation: Java CoG Kit
  Credentials are stored in the DSR (Data Source Registry)
  Credentials can be set static, i.e. shared with other authenticated users
  Proxy generated automatically during initialization

Information needs: the previous DSR structure did not enable storage of LFC data source information or gLite credentials.

Solution: reorganized DSR schema.

[Figure: schema diagram with tables labelled DataSources, RelationalDataSources, LFCDataSources, LFCCertData, LFCDSConnections]

Also changes to the DSR access modules of DAC2 and the DSR EPE Plug-in.

GUI for registering data source of new type

Created as a new form in EPE DSR Plug-in

In addition, some new DSR access methods were created in DSR EPE Plug-in.

Selection of distributed computing approach

Technology | Communication overhead | Development cost | Operation when endpoints are protected by firewall | Unnecessary features
Java RMI | Low | Low | Difficult | Few
SOAP | High | Moderate | Uncomplicated | Few
Heavy-weight distributed computing frameworks (e.g. CORBA, EJB) | ? | Moderate or high | ? | Many
Socket-based communication | Low | Very high | Uncomplicated | Few
Cajo | Low | Low | Uncomplicated | Few

Exchanging large files: how to avoid OutOfMemory errors?

Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes).

[Figure: downloading a file to the client]

[Figure: sending a file from the client to the server]

Additional benefits of using RMIIO:
Compressed socket-based communication
Automatic retry
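RMIIO is a Java library, so the Ruby sketch below does not use its API; it only illustrates the general idea behind avoiding out-of-memory errors, namely moving data in fixed-size chunks instead of loading a whole file at once. The helper `stream_in_chunks` and the 64 KiB chunk size are assumptions chosen for illustration.

```ruby
require "stringio"

CHUNK_SIZE = 64 * 1024 # bytes held in memory at any time (arbitrary choice)

# Copies `source` to `sink` one chunk at a time and returns the number of
# bytes transferred. Memory use is bounded by chunk_size, not by file size.
def stream_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  total = 0
  buffer = String.new
  # IO#read(length, buffer) refills the same buffer and returns nil at EOF.
  while source.read(chunk_size, buffer)
    sink.write(buffer)
    total += buffer.bytesize
  end
  total
end

# In-memory streams stand in for the remote input and output streams.
source = StringIO.new("x" * 200_000)
sink = StringIO.new
puts stream_in_chunks(source, sink)
```

Any objects responding to `read(length, buffer)` and `write` can be plugged in, for example a `File` on one side and a socket on the other.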

Solution scales linearly

[Figure: download and upload times for files up to 2 GB, tested locally on the ChemPo server]

Applications

PL-Grid: Polish Infrastructure for Information Science Support in the European Research Space
Chemistry Portal (ChemPo)

Future work

Finer-grained security
Pseudo memory-mapped file API (Pseudo MMAP)

SUMMARY


LFC DS Server
LFC DS client Java library
DAC2 LFC connector
New DAC2 API: DAC2 LFC DS methods (createDirectory, delete, deleteFile, directory?, …)
Automated and transparent handling of Grid credentials
Reorganized DSR schema
Extended EPE DSR Plug-in

References

[1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12-14 October 2009.

[2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC-Cyfronet AGH. To appear.

[3] Lana Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007.

[4] Marian Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.

[5] Matthias Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides et al. (Ed.), Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51-61.

[6] M. Malawski, T. Bartynski, and M. Bubak, Invocation of operations from script-based grid applications. Future Generation Computer Systems, in press, accepted manuscript, 2009.
