Dissertation defense
Data source registration in the Virtual Laboratory
Marek Pomocka
Major: applied computer science
Specialisation: computer techniques in science and technology
Faculty of Physics and Applied Computer Science,
AGH University of Science and Technology
Supervisor: Marian Bubak, Ph.D.
Consultants: Piotr Nowakowski, M.Sc.
Daniel Harężlak, M.Sc.
Master’s thesis defense
November 13, 2009
Outline
Introduction to Grid technologies and Virtual Laboratories
Motivation and Objectives
Conceptual view onto the solution
Challenges and solutions
Applications
Future work
Summary
References
GRID TECHNOLOGIES AND VIRTUAL LABORATORIES
Grid is a distributed computing architecture with cross-organizational access, providing nontrivial quality of service for participating actors.
Notable applications include:
High-energy physics (LHC)
Complex parameter studies in biomedicine and biochemistry
Weather forecasting
Natural disaster modelling
Digital image archives
Grid is a computer infrastructure dedicated to conducting in-silico research,
created by many partners who share supercomputers, computer clusters, storage and research instruments
to create a common space for e-Science.
Figure labels (partner sites): TASK, PCSS, ICM, CYFRONET, WCSS
Grid users are Virtual Organizations (VOs), which are dynamic by their nature.
Figure labels: CYFRONET, PSNC
The VO approach simplifies access management.
Examples of Grids
TeraGrid
Open Science Grid
EGEE, DEISA
Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security etc. away from end-users.
Figure labels: Virtual Laboratory, Grid middleware, Grid infrastructure
Many VLs endeavor to be general-purpose environments for designing and executing in-silico (or virtual) experiments, e.g. the GridSpace Virtual Laboratory.
Others are often designed for a specific purpose, such as:
remote access to scientific instruments (e.g. VLAB)
supporting research in meteorology (LEAD)
research and decision support in virology (ViroLab)
Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise)
    if (condition) then … else … end
… or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid Taverna, Kepler, Triana, Pegasus)
VLs made Grids available to non-computer scientists.
Figure labels: Grid Users, Virtual Laboratory
MOTIVATION AND OBJECTIVES
Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the filesystem is far too complex for me.
... the security system is complicated too.
Yes, I do agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments.
Objectives
The objective of the dissertation is to meet these needs by enabling access to LFC data sources from GridSpace scripts, concealing most of the interactions with Grid Security Infrastructure (GSI).
This goal entails several other objectives:
Data Source Registry reorganization
Extending the DSR EPE plug-in
Integration with GridSpace Engine
Conceptual view onto the solution
Figure labels: GSEngine, DAC2, LFC DS
CHALLENGES AND SOLUTIONS
Not to compromise GSEngine portability
Solution:
Isolation of platform-dependent code into a remote service
Platform independent (Linux, UNIX, Windows, Mac OS X): GSEngine, GScript LFC integration, LFC connector, LFC client library
Platform dependent (Scientific Linux 4 (SL4)): LFC DS Server
Serve multiple users utilizing inherently single-user gLite libraries.
Solution:
ChemPo command wrappers: each command is run in a new JVM with a prepared UNIX environment.
Instead of a permanent place for credentials (e.g. ~/.globus/), use temporary files and specify their paths dynamically in the UNIX environment of the created JVM processes.
Figure: LFC DS Server (Server JVM) with Worker 1 JVM (Cert1, Key1) and Worker 2 JVM (Cert2, Key2)
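The per-process credential idea can be sketched roughly as follows. This is an illustration in Ruby, not the server's Java code: the helper name run_with_credentials and the environment variable names X509_USER_CERT and X509_USER_KEY are assumptions (the slides do not name the variables the wrappers set), and a child Ruby process stands in for a worker JVM.

```ruby
require "open3"
require "tempfile"
require "rbconfig"

# Runs one command in a fresh child process whose environment points at
# temporary credential files that belong to a single user.
# The variable names below are assumed for illustration only.
def run_with_credentials(cert_pem, key_pem, command)
  cert = Tempfile.new("cert")
  key  = Tempfile.new("key")
  begin
    cert.write(cert_pem)
    cert.flush
    key.write(key_pem)
    key.flush
    File.chmod(0600, key.path) # keep the private key readable by the owner only

    # The environment is passed to the child only; the parent's ENV is untouched,
    # so concurrent calls for different users do not interfere.
    env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
    output, status = Open3.capture2(env, *command)
    raise "command failed" unless status.success?
    output
  ensure
    # Temporary credential files are removed once the child has finished.
    cert.close!
    key.close!
  end
end

# Stand-in for a wrapped command: prints the certificate the child process sees.
PRINT_CERT = [RbConfig.ruby, "-e", "print File.read(ENV.fetch('X509_USER_CERT'))"].freeze

puts run_with_credentials("CERT-A", "KEY-A", PRINT_CERT)
```

Because every call gets its own files and its own child environment, a library that assumes one user per process can still serve several users from one server.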
Enabling access to Grid files without downloading them to the GSEngine machine
Grid File Access Library (GFAL)
ChemPo command wrappers do not support such a mode of operation (streaming to client)
First, download the file to LFC DS Server. Then, stream it to the client.
Vice versa for sending a file to the Grid, i.e. stream the file to LFC DS Server, then send it to the Grid.
Streaming representation in GridSpace scripts
Solution: the user receives a modified version of the Ruby IO object (sending a file to the Grid happens on the file close operation, while retrieving a file from the Grid happens during object initialization)
Reading a Grid file
    ds.open("mpomocka/test_file", "r") do |file|
      file.each {|line| puts line}
    end

    f = ds.open("mpomocka/test_file", :r)
    f.each {|line| puts line}
    f.close
Writing to a Grid file
    f = ds.open("mpomocka/test_file", :write)
    f.puts "First line of the file test_file"
    f.puts "Second line of the file test_file"
    f.close
Alternatively
    ds.open("mpomocka/test_file", :w) do |f|
      f.puts "Another way to write to a file"
      f.puts "Note that close is not necessary"
    end
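One way such an IO-like object could behave is sketched below. This is not the DAC2 implementation: GridFile and FakeGrid are hypothetical names, and an in-memory hash stands in for the LFC DS Server. The sketch only illustrates the two rules stated above: content is fetched at initialization when reading, and sent on close when writing.

```ruby
require "stringio"

# Hypothetical in-memory stand-in for the remote Grid storage.
class FakeGrid
  def initialize
    @files = {}
  end

  def fetch(path)
    @files.fetch(path)
  end

  def store(path, data)
    @files[path] = data
  end
end

# IO-like object: downloads on initialization (read modes),
# uploads on close (write modes).
class GridFile < StringIO
  def initialize(grid, path, mode)
    @grid = grid
    @path = path
    @writing = mode.to_s.start_with?("w") # accepts "w", :w and :write
    initial = @writing ? "".dup : @grid.fetch(path).dup
    super(initial)
  end

  def close
    # Send the buffered content exactly once, when the file is closed.
    @grid.store(@path, string) if @writing && !closed?
    super
  end

  # Block form closes the file automatically, like File.open.
  def self.open(grid, path, mode = :r)
    file = new(grid, path, mode)
    return file unless block_given?
    begin
      yield file
    ensure
      file.close
    end
  end
end

grid = FakeGrid.new
GridFile.open(grid, "mpomocka/test_file", :w) { |f| f.puts "First line" }
GridFile.open(grid, "mpomocka/test_file", "r") { |f| f.each { |line| puts line } }
```

Subclassing StringIO keeps the familiar each, puts and read methods available without reimplementing them, at the cost of holding the whole file in memory.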
Need for a descriptive and intuitive API
descriptive, e.g. create_directory instead of mkdir
mimicking Ruby file operations, e.g. exist?, file?
DAC2 LFC DS methods
Method name, Aliases
createDirectory(parent,child), create_directory
createDirectory(path), create_directory
delete(path), delete_file, deleteFile
deleteFile(filename)
directory?(filename), isDirectory, is_directory
exist?(path), exist, exists, exist?
file?(path), isFile, is_file
getFile(filename), get_file
getSize(path), size, size?, get_size
listFiles(path), list_files
openFile(path, mode, &b), open, open_file
storeFile(payload, filename), store_file
zero?(path)
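Aliases of this kind are cheap to provide in Ruby. The following sketch shows one possible way, using alias_method; the class LfcDataSourceSketch and its hash-backed catalogue are hypothetical and only mirror a few method names from the table above.

```ruby
# Hypothetical data source backed by a hash: path => :file or :directory.
class LfcDataSourceSketch
  def initialize(entries)
    @entries = entries
  end

  def exist?(path)
    @entries.key?(path)
  end

  def directory?(path)
    @entries[path] == :directory
  end

  def file?(path)
    @entries[path] == :file
  end

  # Ruby-style, Java-style and snake_case names all reach the same method.
  alias_method :exist, :exist?
  alias_method :exists, :exist?
  alias_method :isDirectory, :directory?
  alias_method :is_directory, :directory?
  alias_method :isFile, :file?
  alias_method :is_file, :file?
end

ds = LfcDataSourceSketch.new("mpomocka" => :directory, "mpomocka/test_file" => :file)
puts ds.is_directory("mpomocka")
puts ds.isFile("mpomocka/test_file")
```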
Security
Secure communication
    Transport Layer Security: need to manage keystores
    Tunnelling is simpler
Credentials management
    Proxy certificate generation: Java CoG Kit
    Credentials are stored in DSR (Data Source Registry)
    Credentials can be set static, i.e. shared with other authenticated users
    Proxy generated automatically during initialization
Information needs: the previous DSR structure did not enable storage of LFC data source information or gLite credentials.
Solution:
Figure labels (DSR schema tables): DataSources, RelationalDataSources, LFCDataSources, LFCCertData, LFCDSConnections
Also changes to the DSR access modules of DAC2 and the DSR EPE Plug-in.
GUI for registering a data source of the new type
Created as a new form in the DSR EPE Plug-in
In addition, some new DSR access methods were created in the DSR EPE Plug-in.
Selection of distributed computing approach
Technology | Communication overhead | Development cost | Operation when endpoints are protected by firewall | Unnecessary features
Java RMI | Low | Low | Difficult | Few
SOAP | High | Moderate | Uncomplicated | Few
Heavy-weight distributed computing frameworks (e.g. CORBA, EJB) | ? | Moderate or high | ? | Many
Socket-based communication | Low | Very high | Uncomplicated | Few
Cajo | Low | Low | Uncomplicated | Few
Exchanging large files: how to avoid OutOfMemory errors?
Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes)
Figure: downloading a file to client
Figure: sending a file from client to server
Additional benefits of using RMIIO:
Compressed socket-based communication
Automatic retry
Solution scales linearly
Figure: download and upload times for files of up to 2 GB when tested locally on the ChemPo server
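The OutOfMemory concern arises when a whole file is loaded into memory before being sent. The general remedy is to move the data in fixed-size chunks, as in the Ruby sketch below. It does not use or reproduce the RMIIO API; the function name copy_in_chunks and the 64 KiB chunk size are assumptions chosen for illustration.

```ruby
require "stringio"

CHUNK_SIZE = 64 * 1024 # assumed chunk size, not taken from the slides

# Copies everything from source to sink while holding at most one chunk
# in memory at a time. Returns the number of bytes copied.
def copy_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  total = 0
  buffer = "".dup # reused for every read to avoid extra allocations
  while source.read(chunk_size, buffer) # read returns nil at end of stream
    sink.write(buffer)
    total += buffer.bytesize
  end
  total
end

source = StringIO.new("x" * 200_000)
sink = StringIO.new
puts copy_in_chunks(source, sink)
```

Memory use is bounded by the chunk size rather than the file size, which is consistent with transfer time growing linearly with file size.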
Applications
PL-Grid: Polish Infrastructure for Information Science Support in the European Research Space.
Chemistry Portal (ChemPo)
Future work
Finer-grained security
Pseudo memory-mapped file API (Pseudo MMAP)
SUMMARY
LFC DS Server
LFC DS client Java library
DAC2 LFC connector
New DAC2 API (DAC2 LFC DS methods)
    Method name, Aliases
    createDirectory(parent,child), create_directory
    createDirectory(path), create_directory
    delete(path), delete_file, deleteFile
    deleteFile(filename)
    directory?(filename), isDirectory, is_directory
    ….
Automated and transparent handling of Grid credentials
Reorganized DSR Schema
Extended DSR EPE Plug-in
References
[1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12-14 October 2009.
[2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC-Cyfronet AGH. To appear.
[3] Lana Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007.
[4] Marian Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
[5] Matthias Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides et al. (Ed.), Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51-61.
[6] M. Malawski, T. Bartynski, and M. Bubak, Invocation of operations from script-based grid applications. Future Generation Computer Systems, in press, accepted manuscript, 2009.