The GRelC Project: architecture, history and a use case in the environmental domain

G. Aloisio - S. Fiore

The Climate-G testbed is an interdisciplinary effort that involves partners from several institutions and combines expertise in climate change and computational science. Its main goal is to allow scientists to carry out geographically distributed, cross-institutional discovery, access, analysis, visualization and sharing of climate data.

The Climate-G partners involved in the testbed are: the Centro Euro-Mediterraneo per i Cambiamenti Climatici (CMCC, Italy), the Institut Pierre-Simon Laplace (IPSL/CNRS, France), the Fraunhofer Institut für Algorithmen und Wissenschaftliches Rechnen (SCAI, Germany), the National Center for Atmospheric Research (NCAR, USA) and Rensselaer Polytechnic Institute (RPI, USA), the University of Reading (Reading, UK), the University of Cantabria (UC, Spain) and the University of Salento (UniSalento, Italy).

The Climate-G Portal is the access point to the entire testbed infrastructure. It represents the scientific gateway of the testbed and is intended for scientists and researchers who want to manage the available climate change experiments and datasets easily and transparently.

Essentially all activities related to the testbed infrastructure must be carried out through the high-level web interfaces available in the Climate-G Portal.

This means that all the actors involved (system/portal administrators, scientists, guest users, data and metadata providers, etc.) perform most, if not all, of their activities through the portal.

Fig. 4 shows the central role of the portal in the Climate-G infrastructure.

The GRelC DAIS is a general-purpose data grid service for database access, query and management.

This service acts as a standard front-end for database access on the grid. It provides both basic and advanced primitives to transparently access, query, manage and interact with different data sources, concealing back-end heterogeneity, security details (Globus GSI and VOMS), connection handling and other low-level issues. It currently (i) exploits the Web Services paradigm (WS-I based), (ii) is compatible with the Globus and gLite grid middleware, and (iii) provides grid-enabled query mechanisms that leverage compression, chunking, pre-fetching and streaming to improve performance in wide area network and grid environments.
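To make the combination of compression, chunking, pre-fetching and streaming more concrete, the following is a minimal Python sketch of the general idea, not GRelC's implementation or wire protocol. It assumes SQLite as a stand-in back-end and JSON plus zlib as stand-ins for the real serialization and compression; the functions `stream_query` and `fetch_all` and the `chunk_size` parameter are hypothetical names introduced only for illustration.

```python
import json
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def stream_query(conn: sqlite3.Connection, sql: str, chunk_size: int = 1000) -> Iterator[bytes]:
    """Hypothetical server side: run the query and yield one compressed chunk per batch of rows."""
    cursor = conn.execute(sql)
    while True:
        rows = cursor.fetchmany(chunk_size)  # chunking: never materialize the full result set
        if not rows:
            break
        # Serialize the chunk (JSON as a stand-in) and compress it before it crosses the WAN.
        yield zlib.compress(json.dumps(rows).encode("utf-8"))


def fetch_all(chunks: Iterator[bytes]) -> List[list]:
    """Hypothetical client side: fetch the next chunk in a background thread while decoding the current one."""
    results: List[list] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(next, chunks, None)  # request the first chunk
        while True:
            blob = pending.result()
            if blob is None:  # stream exhausted
                break
            pending = pool.submit(next, chunks, None)  # pre-fetch the next chunk...
            results.extend(json.loads(zlib.decompress(blob)))  # ...while decoding this one
    return results
```

Chunking bounds memory on both sides, compression reduces the bytes sent per chunk, and the single-worker pre-fetch overlaps retrieval of the next chunk with decoding of the current one; against a remote service that overlap would hide part of the network latency. Because the generator runs in the worker thread, the SQLite connection in this sketch must be opened with `check_same_thread=False`.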

Moreover, it supports both global (by means of VOMS) and local (on the GRelC DAS side) authorization levels, increasing flexibility, manageability and scalability in role and policy management. The main goal of this data grid service is to manage databases on the grid efficiently, securely and transparently, across virtual organizations, in line with modern grid standards and specifications (OGSA and WSRF compliant) as well as with existing middleware such as Globus and gLite.
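The two authorization levels can be pictured as two successive checks. The Python sketch below is a simplified, hypothetical model, not GRelC's actual policy engine: the global level is reduced to verifying that the caller presents at least one VOMS attribute (FQAN) issued by a trusted VO, and the local level is a per-service table mapping user DNs or FQANs to allowed operations. In a real deployment the FQANs would be extracted from a validated VOMS proxy certificate; here they are plain strings in a simplified `/vo/Role=...` form, and the VO name `climate-g`, the DNs and the operation names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class LocalPolicy:
    """Hypothetical service-side ACL: principal (user DN or VOMS FQAN) -> allowed operations."""
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def allows(self, principal: str, operation: str) -> bool:
        return operation in self.grants.get(principal, set())


def vo_of(fqan: str) -> str:
    """Return the VO name from a simplified FQAN such as '/climate-g/Role=admin'."""
    parts = fqan.split("/")
    return parts[1] if fqan.startswith("/") and len(parts) > 1 else ""


def authorize(user_dn: str, fqans: Iterable[str], operation: str,
              trusted_vos: Set[str], policy: LocalPolicy) -> bool:
    # Global level: keep only the attributes issued by a VO this service trusts.
    vo_attrs = [f for f in fqans if vo_of(f) in trusted_vos]
    if not vo_attrs:
        return False
    # Local level: a grant to the individual user, or to any of the user's group/role attributes.
    if policy.allows(user_dn, operation):
        return True
    return any(policy.allows(f, operation) for f in vo_attrs)
```

This split mirrors the flexibility described above: group and role membership is managed once in the VO, while each database service decides independently what those groups and roles may do locally.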

The GRelC project started in 2001 as a research effort at the University of Salento, with a Ph.D. thesis. The initial goal was both simple and ambitious: to provide a set of data grid services to manage relational databases in a grid environment transparently, securely and efficiently. Until 2004, GRelC releases relied on a client-server architecture, a proprietary communication protocol and the Grid Security Infrastructure (GSI). In 2006 the GRelC service was completely re-engineered to address interoperability through a WS-based approach. Instead of moving towards OGSI (which seemed too heavy), the GRelC service was implemented as a lightweight, WS-I compliant, GSI-enabled web service.

The gLite-based release (2007) was a crucial step towards the EGEE community, its use cases and its needs. This community provided important new requirements, particularly in the Earth Science context (EGEE NA4). That release was also made available for training and dissemination purposes through the GILDA t-infrastructure. Since 2008, the GRelC software has been included in the (gLite-based) Italian grid release and distributed to the Worker Node and User Interface components across Italy. In this context, several performance tests based on the gLite middleware were also carried out to stress the system and prove its stability. In the same year, GRelC was included in the EGEE RESPECT Program, owing to its compatibility with the gLite middleware and to the added value of database-oriented functionalities that were not available in the gLite release at that time. From 2009 to 2010, new GRelC releases (server and portal) addressing stability, management and monitoring were made available to the user community.

In 2011, the GRelC team will face new challenges. The most relevant one concerns the EGI Database of Databases, a global registry hosting the list of database resources available in the EGI context. The registry will complement the EGI Application Database, allowing scientists to learn about existing databases, their location, main purpose, available data, etc. This will foster cooperation and interaction among research groups, promoting more effective publishing and sharing of grid-enabled data sources.
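As an illustration of the kind of information such a registry could hold, the Python sketch below models an entry with the fields mentioned above (location, main purpose, available data) and a simple keyword search over them. The text does not describe the registry's actual schema or interface, so the `DBEntry` record, its field names and the `search` function are hypothetical, and any example endpoints should be read as placeholders.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class DBEntry:
    """Hypothetical registry record with the fields the text mentions."""
    name: str
    location: str                    # e.g. the endpoint of the database service
    purpose: str                     # main purpose of the database
    available_data: Tuple[str, ...]  # short descriptions of the data it exposes


def search(registry: Sequence[DBEntry], keyword: str) -> List[DBEntry]:
    """Case-insensitive keyword match on the purpose and the available-data descriptions."""
    kw = keyword.lower()
    return [
        e for e in registry
        if kw in e.purpose.lower() or any(kw in d.lower() for d in e.available_data)
    ]
```

A real registry would presumably add richer metadata and a service interface; the point here is only that describing each database by location, purpose and content is enough to let scientists discover data sources by topic.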