SIMDAT TMB, 15 December 2004 AMD-1 SIMDAT V-GISC/SIMDAT project – a Virtual GISC Alfred Hofstadler, Matteo Dell’Acqua ECMWF

SIMDAT TMB, 15 December 2004 AMD-1 SIMDAT

V-GISC/SIMDAT project – a Virtual GISC

Alfred Hofstadler, Matteo Dell’Acqua

ECMWF

SIMDAT TMB, 15 December 2004 AMD-2 SIMDAT

Project History:

May 2002: Thirteenth session WMO Regional Association VI: “…agreed that the concept of a Virtual GISC had merit…”

June 2002: V-GISC in RA-VI Kick-off Meeting

Partners: DWD, Meteo-France, UK Met Office, EUMETSAT, ECMWF

Steering Group + 4 working groups: Policy, Data, Communications, Dissemination/Acquisition

2003: SIMDAT project proposal submitted to EU

1 September 2004: contract with EU is signed

October 2004: V-GISC steering group decides to move V-GISC development into the SIMDAT project

November 2004: SIMDAT Kick-off meeting

4 V-GISC working groups are mapped onto SIMDAT working groups: Virtual Organisation, Ontologies, GRID Infrastructure, Access to Distributed Data

February 2005: First (V-)GISC-demonstrator at CBS

SIMDAT TMB, 15 December 2004 AMD-3 SIMDAT

SIMDAT - Introduction

Data Grids for Process and Product Development using Numerical Simulation and Knowledge Discovery

4-year project funded by the EU

Contract with EU was signed on 1 September 2004

SIMDAT focuses on 4 applications

Product design in automotive and aerospace

Process design in pharmacology

Service provision in meteorology

Objective of SIMDAT is to use data grid technology to resolve a complex problem for each of the 4 applications

Budget of 11 M€, of which 10.5% is for the meteorological activity

320 man-months, taking into account EU funding and the contribution from the partners

SIMDAT TMB, 15 December 2004 AMD-4 SIMDAT

SIMDAT - Strategy

7 Grid-technology areas have been identified to achieve the SIMDAT objectives

Phase 1: Connectivity

Phase 2: Interoperability

Phase 3: Knowledge

Deployment of Grid infrastructure with particular attention to data transport and management; distributed DB access

Virtual Data Repository

Introduction of grid technologies research

Workflows for next-generation aggregated knowledge capture, discovery and mining

Ontologies

Integration of analysis services

Workflows

Knowledge Services

Integrated Grid infrastructure offering basic services to applications

Access to data distributed on Grid sites

Management of Virtual Organisation

SIMDAT TMB, 15 December 2004 AMD-5 SIMDAT

Meteorology application : Project Aims

5 partners: DWD, Meteo-France, UK Met Office, EUMETSAT and ECMWF

3 “potential” GISCs : DWD, Meteo-France, UK Met Office

2 DCPCs (Data Collection or Production Centres): ECMWF, EUMETSAT

Instead of each National Met Service having its own GISC (Global Information System Centre), the partners share one Virtual GISC (V-GISC)

The V-GISC will be seen as a normal GISC and will fulfil the WMO Information System technical requirements

The project will build the foundations of the V-GISC by developing an infrastructure that brings together the data of the partners and provides access to the distributed meteorological databases

SIMDAT TMB, 15 December 2004 AMD-6 SIMDAT

Meteorology application : Project Aims

A complex problem: To build a Virtual GISC, an integrated and scalable framework for the collection and sharing of distributed data that will offer:

A single view of meteorological information which is distributed amongst the 5 partners

Improved visibility of and access to meteorological data through a comprehensive discovery service based on metadata development

A variety of reliable delivery services (routine dissemination and collection of data)

A global access control policy managed by the partners and integrated into their existing security infrastructure

Quality of service, reliability and security

Processing services and shared data manipulation facilities

The software developed within the project will be made available to WMO

SIMDAT TMB, 15 December 2004 AMD-7 SIMDAT

GRID Technology

Grid technology will be used:

To connect the diverse data sources and create a Virtual Database

To enable flexible, secure collaboration through a virtual organisation

Data Grid technology presents an architectural framework that aims to provide access to distributed data in a simple, secure, reliable and scalable manner from a widely distributed set of computers and across various administrative boundaries

The essential characteristics of a Data Grid are:

Reference a dataset by a unique identifier

Discover datasets by attributes

Track multiple copies of a single file, and ultimately locate the "nearest" copy

Move files from one point on the grid to another point (push, pull and third party copy)

The domain of the V-GISC is an ideal candidate to exploit such a framework
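
The four Data Grid characteristics listed above can be illustrated with a small in-memory sketch. It is not taken from the SIMDAT software: the class names, the site names and the distance table are hypothetical, and "nearest" is simply assumed to mean the lowest entry in that table.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One logical dataset known to the grid (hypothetical model)."""
    dataset_id: str                                 # unique identifier
    attributes: dict                                # metadata used for discovery
    replicas: dict = field(default_factory=dict)    # site name -> file location


class ReplicaCatalogue:
    def __init__(self, distances):
        # distances[(a, b)] is an assumed cost of moving data between sites a and b
        self.distances = distances
        self.datasets = {}

    def register(self, dataset_id, attributes, site, location):
        """Reference a dataset by its identifier and record one copy of it."""
        ds = self.datasets.setdefault(dataset_id, Dataset(dataset_id, attributes))
        ds.replicas[site] = location

    def discover(self, **wanted):
        """Return the identifiers of datasets whose attributes match all criteria."""
        return sorted(
            ds.dataset_id
            for ds in self.datasets.values()
            if all(ds.attributes.get(k) == v for k, v in wanted.items())
        )

    def nearest_replica(self, dataset_id, client_site):
        """Pick the site holding a copy with the lowest assumed distance to the client."""
        replicas = self.datasets[dataset_id].replicas
        return min(replicas, key=lambda site: self._distance(site, client_site))

    def copy(self, dataset_id, target_site, target_location):
        """Record a new copy, as would happen after a push, pull or third-party copy."""
        self.datasets[dataset_id].replicas[target_site] = target_location

    def _distance(self, a, b):
        if a == b:
            return 0
        return self.distances.get((a, b), self.distances.get((b, a), float("inf")))
```

A real Data Grid would move the bytes as well; here copy() only updates the bookkeeping, which is enough to show how a later request is redirected to the closer copy.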

SIMDAT TMB, 15 December 2004 AMD-8 SIMDAT

V-GISC infrastructure

Interface to offer a single view of the data - Discovery facilities - Request/Subscription

Monitoring, Logging, Control

Error tracking

Security, Authentication, Authorization

Audit, Management, User registration

DB admin, Catalogue admin

Grid infrastructure for sharing data

Interoperability interfaces for data/metadata exchange

Mechanisms to synchronise metadata

Dissemination/acquisition mechanisms

SIMDAT TMB, 15 December 2004 AMD-9 SIMDAT

Meteo requirements

SIMDAT TMB, 15 December 2004 AMD-10 SIMDAT

V-GISC Conceptual view

Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization

The Virtual Database Service provides a single view of the partners' databases

SIMDAT TMB, 15 December 2004 AMD-11 SIMDAT

V-GISC Conceptual view

Virtual Database

Provide the unified view of all the shared datasets through a distributed catalogue

Maintain the distributed catalogue amongst the partners using synchronization mechanisms

Provide interfaces with the legacy databases

Implement data replication mechanisms

Preserve the integrity of the data

Access Facilities

Collection & Dissemination services that support secure, efficient and reliable transport mechanisms

Quality of Service (QoS): Traffic Prioritization, Queuing mechanisms, Scheduling

Discovery service by browsing the catalogue or using a keyword search engine

Interactive and batch interfaces

VO

Security Services (CA, AuthN, AuthZ, Audit, …)

User management

Data policy management

Monitoring and control
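
As an illustration of the QoS item above (traffic prioritization and queuing), a dissemination queue could be sketched as follows. This is only an assumption about how prioritization might work: the class name, the product names and the rule "lower number means more urgent" are hypothetical and do not come from the V-GISC design.

```python
import heapq
import itertools


class DisseminationQueue:
    """Priority queue for outgoing products (hypothetical sketch)."""

    def __init__(self):
        self._heap = []
        # The counter keeps submission order among products of equal priority.
        self._counter = itertools.count()

    def submit(self, product, priority):
        """Queue a product; a lower priority number is sent earlier."""
        heapq.heappush(self._heap, (priority, next(self._counter), product))

    def next_product(self):
        """Return the most urgent queued product, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With this ordering a warning submitted after routine traffic still leaves first, while products of the same priority are sent in the order they arrived.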

SIMDAT TMB, 15 December 2004 AMD-12 SIMDAT

V-GISC Distributed Architecture

A V-GISC node is installed on each partner site

All the nodes are interconnected through a dedicated secure communication channel: the Database Communication Layer (DCL)

All the nodes exchange messages through the DCL

The architecture is decentralized

No central point where all the nodes are declared

No single point of failure

The network of nodes is self-organized

The network dynamically accepts new nodes and is aware of node disconnections

The network organizes its topology and tells newly joining nodes their position within the network

No manual intervention is needed on the nodes to accept new peers
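
The decentralised, self-organised behaviour described above can be sketched with a toy membership model. The slides do not say how the DCL implements it, so everything here is an assumption: the Node class is hypothetical, the topology is a simple full mesh, and a disconnection is announced explicitly rather than detected.

```python
class Node:
    """One V-GISC node in a toy peer network (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.peers = {}   # name -> Node, the other nodes this node knows about

    def join(self, contact):
        """Join through any existing node; no central registry is consulted."""
        # Learn the current membership from the contact node ...
        known = dict(contact.peers)
        known[contact.name] = contact
        known.pop(self.name, None)
        # ... and announce ourselves to every member we learned about.
        for peer in known.values():
            peer.peers[self.name] = self
        self.peers = known

    def leave(self):
        """Disconnect and let the remaining peers forget this node."""
        for peer in self.peers.values():
            peer.peers.pop(self.name, None)
        self.peers = {}
```

Any node can act as the entry point for a newcomer, so the network has no central declaration point, and the existing nodes need no manual reconfiguration when a peer arrives or leaves.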

SIMDAT TMB, 15 December 2004 AMD-13 SIMDAT

V-GISC Distributed Architecture

SIMDAT TMB, 15 December 2004 AMD-14 SIMDAT

V-GISC Node

Each node maintains a copy of the global catalogue describing data available through the V-GISC

The catalogue synchronization is done using the DCL

Each node maintains a cache used to replicate data and to efficiently serve the users

A node is interfaced with the local legacy databases

A node has a Web Portal for interactive access

A node has a Grid/Web Service Portal for batch access and integration of the V-GISC in a bigger Grid

A node implements all services offered by the V-GISC
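
A minimal sketch of how each node's copy of the global catalogue might be kept in step is given below. The slides only state that synchronization is done over the DCL; the version-number scheme, the class and the function names are assumptions made for illustration, and conflicting concurrent edits of the same record are not handled.

```python
class CatalogueReplica:
    """A node's local copy of the global catalogue (hypothetical sketch)."""

    def __init__(self, node_name):
        self.node_name = node_name
        self.records = {}   # record id -> (version, metadata)

    def publish(self, record_id, metadata):
        """Create or update a record locally, bumping its version number."""
        version = self.records.get(record_id, (0, None))[0] + 1
        self.records[record_id] = (version, metadata)

    def versions(self):
        """Summarise what this replica holds: record id -> version."""
        return {rid: rec[0] for rid, rec in self.records.items()}

    def changes_since(self, known_versions):
        """Return the records that are newer than the versions the peer reports."""
        return {
            rid: rec
            for rid, rec in self.records.items()
            if rec[0] > known_versions.get(rid, 0)
        }

    def merge(self, changes):
        """Apply records received from a peer, keeping the highest version."""
        for rid, (version, metadata) in changes.items():
            if version > self.records.get(rid, (0, None))[0]:
                self.records[rid] = (version, metadata)


def synchronise(a, b):
    """Exchange only the missing or outdated records in both directions."""
    b.merge(a.changes_since(b.versions()))
    a.merge(b.changes_since(a.versions()))
```

Exchanging version summaries first means that only changed records cross the network, which matters when metadata records are large.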

SIMDAT TMB, 15 December 2004 AMD-15 SIMDAT

V-GISC Node - Functional Design

SIMDAT TMB, 15 December 2004 AMD-16 SIMDAT

Demonstrator – Functional View

To deploy a flexible infrastructure on top of which the Virtual Information Centre can be built

To use Grid technologies to federate databases located on the partners' sites

To show to the user a single view of data sets stored by at least 3 partners

To get a first implementation of the catalogue based on WMO core metadata

To offer first VO security services

SIMDAT TMB, 15 December 2004 AMD-17 SIMDAT

Demonstrator - Design

3 main components to build the virtual database: Data Repository, Catalogue Node and Portal

Installed on each partner site and interconnected through a dedicated secure connection channel

Data Repository

Interface to the partners' databases

Offers metadata information to describe, search, locate data

Offers interface to retrieve data from the associated local databases

Catalogue Node

Maintains the catalogue and ensures synchronisation

Harvests metadata and requests data from the Data Repository

Ingests data and maintains the cache of the V-GISC

Serves clients: Portal or other Nodes

Monitors the execution of the requests

Distributed Portal

Offers interface to search/browse the V-GISC catalogue
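
The division of labour between the three components can be sketched as follows. The class and method names are hypothetical and the "database" is a plain dictionary; the demonstrator itself is described as built on a J2EE framework, so this only shows the flow of metadata and data, not the actual implementation.

```python
class DataRepository:
    """Interface to one partner's local database (hypothetical sketch)."""

    def __init__(self, site, database):
        self.site = site
        self._database = database   # dataset id -> (metadata dict, payload)

    def list_metadata(self):
        """Expose the metadata that describes the locally held datasets."""
        return {ds_id: meta for ds_id, (meta, _) in self._database.items()}

    def retrieve(self, dataset_id):
        """Return the data itself from the local database."""
        return self._database[dataset_id][1]


class CatalogueNode:
    """Maintains the catalogue and a cache, and serves the portal."""

    def __init__(self, repository):
        self.repository = repository
        self.catalogue = {}   # dataset id -> metadata
        self.cache = {}       # dataset id -> payload

    def harvest(self):
        """Pull metadata from the Data Repository into the catalogue."""
        self.catalogue.update(self.repository.list_metadata())

    def search(self, keyword):
        """Keyword search over all metadata values, case-insensitive."""
        keyword = keyword.lower()
        return sorted(
            ds_id
            for ds_id, meta in self.catalogue.items()
            if any(keyword in str(value).lower() for value in meta.values())
        )

    def fetch(self, dataset_id):
        """Serve from the cache, asking the Data Repository only on a miss."""
        if dataset_id not in self.cache:
            self.cache[dataset_id] = self.repository.retrieve(dataset_id)
        return self.cache[dataset_id]


class Portal:
    """User-facing search and download interface on top of a Catalogue Node."""

    def __init__(self, node):
        self.node = node

    def search(self, keyword):
        return self.node.search(keyword)

    def download(self, dataset_id):
        return self.node.fetch(dataset_id)
```

The Portal never talks to a partner database directly; it only sees the catalogue and cache kept by the Catalogue Node, which is what gives users one view over several sites.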

SIMDAT TMB, 15 December 2004 AMD-18 SIMDAT

Demonstrator - Architectural Choices

Grid Architecture that can accept any kind of Grid Technology

Free to choose any grid middleware (OGSA-DAI, GRIA, gLite, GT4) and pick the best components of each middleware that meet the V-GISC requirements

Catalogue Node built on a J2EE component framework

Solid framework used in production environment

Includes different services such as persistence, monitoring, configuration, etc.

The framework can be seen as a kernel of components where it is easy to add services such as Grid services or Web services

Catalogue duplicated and synchronized on each site

To have a fast discovery phase (browse & search)

To have a reliable system (client redirection to another node in case of problems)
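
Because every site holds the same synchronized catalogue, a client can be redirected to another node when one fails. The sketch below shows that idea under stated assumptions: the exception type, the function name and the representation of a node as a (name, search function) pair are all hypothetical.

```python
class NodeUnavailable(Exception):
    """Raised when a node cannot serve a request (hypothetical error type)."""


def search_with_failover(nodes, keyword):
    """Try each (name, search_function) pair in turn until one answers.

    Returns the name of the node that answered together with its results.
    """
    failures = []
    for name, search in nodes:
        try:
            return name, search(keyword)
        except NodeUnavailable as exc:
            # Remember the failure and redirect the request to the next node.
            failures.append(f"{name}: {exc}")
    raise NodeUnavailable("no node could answer: " + "; ".join(failures))
```

The redirection is only safe because the catalogue is duplicated on each site; with a partitioned catalogue the second node might not hold the records the client is looking for.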

SIMDAT TMB, 15 December 2004 AMD-19 SIMDAT

Demonstrator - Architecture

SIMDAT TMB, 15 December 2004 AMD-20 SIMDAT

Demonstrator - Deployment

SIMDAT TMB, 15 December 2004 AMD-21 SIMDAT

Problems and lessons learned - 1

Grid Middleware

Technology not mature for production environment

Middleware still evolving toward standards (WSRF, WS-I, …)

Access to distributed data

No efficient and robust transport mechanism

No mechanism to duplicate and synchronize data

Difficult to ensure data integrity on huge data volumes

OGSA-DAI is promising, easy to understand and use

SIMDAT TMB, 15 December 2004 AMD-22 SIMDAT

Problems and lessons learned - 2

Ontology / Metadata

Meteorological metadata are described using the XML WMO-CORE metadata Profile

• Metadata description larger than the data

• Same information repeated in all metadata records. Unnecessary information is circulating over the network

• Large metadata records slowing down the database hosting the catalogue

Universal request language was not a solution to the virtual database problem

VO

No standard tools to manage users and data policies

No standard security policies

SIMDAT TMB, 15 December 2004 AMD-23 SIMDAT

What’s next

Finalise the Connectivity phase (by M18/Mar 2006)

Connect EUMETSAT to the Grid (M12-M15/Sep-Dec 2005)

Enhance the architecture (M13-M18/Oct 2005-Mar 2006)

Implement Registration Authority (M16-M17/Jan-Feb 2006)

Improve metadata model (M13-M16/Oct 2005-Jan 2006)

Enhance distributed portal (M14-M16/Nov 2005-Jan 2006)

Introduce acquisition of data (M18-M24/Mar-Sep 2006)

Develop subscription service (M20-M28/May 2006-Jan 2007)

Start developing the Virtual Organisation

Monitoring and management of the system (M18-M24/Mar-Sep 2006)

User management and data access control (M24-M30/Sep 2006-Mar 2007)

Develop the discovery mechanism (M20-M25/May-Oct 2006)

Start testing with other potential GISCs

Japan and Australia have expressed interest in joining the SIMDAT project

SIMDAT TMB, 15 December 2004 AMD-24 SIMDAT

Global View : Coordination Effort

Metadata

Request-reply mechanism

Exchange of catalogues

Definition of what data should be available and to whom

Virtual Organisation

Standardisation of services

Quality of Service

Security