An architecture for mutualization and sharing of ... · An architecture for mutualization and...

39
1 Territories, Environment, Remote Sensing & Spatial Information Joint Research Unit Cemagref - CIRAD - ENGREF An architecture for mutualization and sharing of environmental d An architecture for mutualization and sharing of environmental d ata: ata: the PADOUE project the PADOUE project Anne Doucet LIP6 (University Paris 6)

Transcript of An architecture for mutualization and sharing of ... · An architecture for mutualization and...

1

Territories, Environment, Remote Sensing & Spatial Information Joint Research Unit Cemagref - CIRAD - ENGREF

An architecture for mutualization and sharing of environmental dAn architecture for mutualization and sharing of environmental data:ata:

the PADOUE projectthe PADOUE project

Anne Doucet

LIP6 (University Paris 6)

2 / 39

The PADOUE ProjectThe PADOUE Project

The PADOUE projectThe PADOUE project

• Data sharing for environment-related uses

• ACI-GRID (2002-2005)

• Partners: – LIP6 (CNRS - University Paris 6)

– INRIA (Project Caravel, Rocquencourt)

– LIRMM (CNRS - University of Montpellier II)

– CEMAGREF (LISC and JRU-3S)

– IRD (Desertification unit, Montpellier)

– CDS (Strasbourg)

Introduction Architecture Metadata service Application Conclusion

3 / 39

The PADOUE ProjectThe PADOUE Project

ContextContext

• Environmental scientists– Acquire and stock raw data, images, maps, …

– Develop programmes to analyse and process this data and applications using this information

Introduction Architecture Metadata service Application Conclusion

4 / 39

The PADOUE ProjectThe PADOUE Project

Network

ContextContext

– Large amounts of heterogeneous data, stored in autonomous and distributed sources

– Computer programmes to analyse and process data developed in an autonomous manner

A formidable network of resources!

A

MD

B

MD

CMD

Introduction Architecture Metadata service Application Conclusion

5 / 39

The PADOUE ProjectThe PADOUE Project

GoalsGoals

• How to take advantage of this network?– Identify and locate relevant data

– Calculate the data (filter and combine information flow, deliver

usable results)

– Enhance the network with consolidated data

Introduction Architecture Metadata service Application Conclusion

6 / 39

The PADOUE ProjectThe PADOUE Project

GoalsGoals

• Documentation and archiving of data and processesStructuring and managing metadata

• Interoperability of data and processes Architecture for large-scale sharing of heterogeneous data

• Establishing sequences of scientific processingScientific dataflow language

• Meaningful localization of relevant dataUse of the semantics of the data sources’ contents, and experiences of users, communities, requests

Introduction Architecture Metadata service Application Conclusion

7 / 39

The PADOUE ProjectThe PADOUE Project

StructureStructure

• Architecture– Sharing of heterogeneous data on a large scale

• Integration of schemas

– Localization improvement• Semantic organization of the network

• Experiences of users, communities, queries

• Metadata service– Metadata structuring

– Automatization of the management (MDweb)

• Application– Management and monitoring of dikes

Introduction Architecture Metadata service Application Conclusion

8 / 39

The PADOUE ProjectThe PADOUE Project

LargeLarge--scale data mediationscale data mediation

• Sharing of heterogeneous data– Mediation tool

• Large-scale distribution– Peer-to-peer network

Introduction Architecture Metadata service Application Conclusion

9 / 39

The PADOUE ProjectThe PADOUE Project

Data mediationData mediation

• Processing data heterogeneity

• Properties– Integration of existing data

– 3-layer architecture

• Hypothesis– Knowledge about the sources

• Interest– Meaningful processing of requests

Application

Mediatorédiateur

Adapt. Adapt. Adapt.

DBMSRelationnel

R(A,B,C)

Fichier texteFichier

texteFichier texte

x ; y ; ,z

Sources

Documents XML

Documents XMLDocuments

XML

R’

X B Y

Introduction Architecture Metadata service Application Conclusion

10 / 39

The PADOUE ProjectThe PADOUE Project

PeerPeer--toto--peer (P2P) networkspeer (P2P) networks

• Permit scalability and dynamicity

• P2P architecture– Client and server simultaneously

– No centralized coordination

– No global knowledge of the network

– All data accessible from any machine

• Unstructured P2P

TTL=4

TTL=3

TTL=2

TTL=3

TTL=2

TTL=2

TTL=1

TTL=1

TTL=1

TTL=0

TTL=0

A

B

H

F

C DE

G

I J

K

L

: message

����

����

����

����

����

����

����

����

���� ����

����

���� : message processed

: message not processed

����

����

Introduction Architecture Metadata service Application Conclusion

11 / 39

The PADOUE ProjectThe PADOUE Project

Mediation and PeerMediation and Peer--toto--PeerPeer

• Take advantage of the complementarity:– Mediator:

+ High-level language (SQL, XQuery,…)

- Known sources (fixed)

- Limited number of sources

– Peer-to-peer

+ Very large number of sources

+ Facility to add new sources

- Poor query language (keywords)

Introduction Architecture Metadata service Application Conclusion

12 / 39

The PADOUE ProjectThe PADOUE Project

ArchitectureArchitecture

• Interoperability of mediators in a peer-to-peer network

M

M

M

P2P network

schema

schema

schema

Introduction Architecture Metadata service Application Conclusion

13 / 39

The PADOUE ProjectThe PADOUE Project

ChallengesChallenges

• How to know what exists?– Providing a schema fulfilling the users’ needs which is consistent with the reality

of the network (high dynamicity)

CUSTOMIZATION OF SCHEMAS

• How to locate efficiently the relevant data?

TAKE ADVANTAGE OF THE KNOWLEDGE

(Network, users, requests, ...)

Introduction Architecture Metadata service Application Conclusion

14 / 39

The PADOUE ProjectThe PADOUE Project

Customization of the schemaCustomization of the schema• Associate to each source a thematic vector describing the source’s

content• Ex: Th.1: 65%, Th. 2: 30%, Th. 3: 5%

• Define a taxonomy of themes• Provide a thematic vision of the network

• Retain, for each theme, the interesting sources• Enhance the schema with the information on the location of data.

General theme

Middle theme1 Middle theme2

Theme1 Theme2 Theme3 Theme4 Theme5

Music

Opera PopRock

Operetta BelCanto Comic opera Pop Rock

Environment

Hydrology Climatology

Oceanography Oceanology Hydrogeology Meteorology Paleoclimatology

Introduction Architecture Metadata service Application Conclusion

15 / 39

The PADOUE ProjectThe PADOUE Project

Enhancing the schemaEnhancing the schema

Defining a model of schemas that allows:– The addition of the semantic component to metadata

– The consideration of meta-information• Identification

– Data theme

• Localization– Mediator’s address

– Map it

• Description– Description of items

• Quality– Criteria for reuse and comparison of schemas

Introduction Architecture Metadata service Application Conclusion

16 / 39

The PADOUE ProjectThe PADOUE Project

Constructing the schemaConstructing the schema

• Two stages:– Publication of schemas

– Collecting the schemas for a given theme and integrating published

schemas

Introduction Architecture Metadata service Application Conclusion

17 / 39

The PADOUE ProjectThe PADOUE Project

Taking advantage of knowledgeTaking advantage of knowledge

• Indexing data for optimizing the data localization

– Knowledge of the network: semantic contents of the network nodes

– Knowledge of the community: collaborative filtering, experiences of members of the same community

– Knowledge of the user (profile): requests history

Introduction Architecture Metadata service Application Conclusion

18 / 39

The PADOUE ProjectThe PADOUE Project

Knowledge representationKnowledge representation

• Knowledge of the network– Thematic vector

• Knowledge of the users– Community vector

…00.400.250.35Proportion

…‘Oceanography’‘Oceanology’‘Hydrology’‘Climatology’Theme

…0101Presence

…‘Oceanography’‘Oceanology’‘Hydrology’‘Climatology’Theme

Introduction Architecture Metadata service Application Conclusion

19 / 39

The PADOUE ProjectThe PADOUE Project

Complementarity of approachesComplementarity of approaches

• Knowledge of the network (Thematic vector)

���� Organization of the network

• Knowledge of the community (Community vector)

���� Optimization of request routing

Introduction Architecture Metadata service Application Conclusion

20 / 39

The PADOUE ProjectThe PADOUE Project

Semantic organization of the networkSemantic organization of the network

• Organizing the network in terms of the data theme⇒ Reduce the logical distance between the nodes storing data of the same theme

• Construction of a logical neighbourhood for pairs connecting to the network by using a neural classification algorithm.

N10

N40N30

N21

N22

N41N13

N12

N11

N20

Initial

� �

Cluster 3

Cluster 1

Cluster 4

Cluster 2

N41

N40 N30

N20

N22

N21N13N12

N11

N10

Initial

��

��

Introduction Architecture Metadata service Application Conclusion

21 / 39

The PADOUE ProjectThe PADOUE Project

Optimizing the query routingOptimizing the query routing

• The knowledge of a community (community vector) and the experiences of the community (the relevant sites for this community) are used to preferably send these queries to these nodes.

• The requests history of a user points to the sites relevant for that user.

Introduction Architecture Metadata service Application Conclusion

22 / 39

The PADOUE ProjectThe PADOUE Project

StructureStructure

• Architecture– Sharing of heterogeneous data on a large scale

• Integration of schemas

– Localization help• Semantic organization of the network

• Experiences of users, communities, requests

• Metadata service– Metadata structuring

– Automatization of the management (MDweb)

• Application– Management and monitoring of dikes

Introduction Architecture Metadata service Application Conclusion

23 / 39

The PADOUE ProjectThe PADOUE Project

Context of the Padoue project:Context of the Padoue project:

environmental applicationsenvironmental applications

• Data exists in great numbers, it is very voluminous and very heterogeneous

• Data originates from different protocols, poorly documented• Information production requires the combination of base data

(spatial and temporal consistency of data)• Mixture of reference data (normalized) and specific,

undocumented data• Several actors, of heterogeneous profiles (different points of

view, professions)

Introduction Architecture Metadata service Application Conclusion

24 / 39

The PADOUE ProjectThe PADOUE Project

Specificities of environmental dataSpecificities of environmental data

• Necessity of using metadata– To avoid

• Erroneous interpretation

• Wrong use

• Faulty perception of the accuracy

• Existence of adapted standards – Dublin Core

– United States: FGDC & OGI Open GIS Consortium, ISO

– Europe: CEN/TC 287, Australia: ANZLIG

Introduction Architecture Metadata service Application Conclusion

25 / 39

The PADOUE ProjectThe PADOUE Project

Metadata serviceMetadata service

• General metadata model describing the resources– Compatible with the different standards (FGDC, ISO, CEN)

• Acquisition and maintenance of metadata– Automatic extraction of certain fields

• Display and Navigation

Introduction Architecture Metadata service Application Conclusion

26 / 39

The PADOUE ProjectThe PADOUE Project

Metadata structuringMetadata structuring

• Organization of metadata into sections• Identification

• Quality

• Spatial organization

• Protocol

• …

• Organization into three abstraction levels• Elementary

• Extended

• Complete

Introduction Architecture Metadata service Application Conclusion

27 / 39

The PADOUE ProjectThe PADOUE Project

TemplateTemplate

• Extraction of items of the standard (and extension if necessary)depending on the needs of a community, a project or an application

Core of the standard Template

standard

Introduction Architecture Metadata service Application Conclusion

28 / 39

The PADOUE ProjectThe PADOUE Project

Generic toolGeneric tool

MDweb tool:MDweb tool:•• Server application• Multi-platform (Unix, Linux, Windows)• Multi-RDBMS (Oracle, MySql, Postgres)• Structured around the ISO 19115 standard

Features:Features:• Semi-automatic entry and updating of metadata• Management of the publication and the confidentiality• Multi-criteria consultation and spatial requests• Global configuration of the interface

ROSELT application (Network of observatories for long-term ecological monitoring)

Introduction Architecture Metadata service Application Conclusion

29 / 39

The PADOUE ProjectThe PADOUE Project

MDWEBMDWEB

Introduction Architecture Metadata service Application Conclusion

30 / 39

The PADOUE ProjectThe PADOUE Project

Interfaces

AutomatizationCartographic interface

Data-entry

ImplementationIntroduction Architecture Metadata service Application

31 / 39

The PADOUE ProjectThe PADOUE Project

Implementation

Interfaces

Data-entry

AutomatizationThesaurus

Introduction Architecture Metadata service Application Conclusion

32 / 39

The PADOUE ProjectThe PADOUE Project

Implementation

Interfaces Local search

Introduction Architecture Metadata service Application Conclusion

33 / 39

The PADOUE ProjectThe PADOUE Project

StructureStructure

• Architecture– Sharing of heterogeneous data on a large scale

• Integration of schemas

– Localization help• Semantic organization of the network

• Experiences of users, communities, requests

• Metadata service– Metadata structuring

– Automatization of the management (MDweb)

• Application– Management and monitoring of dikes

Introduction Architecture Metadata service Application Conclusion

34 / 39

The PADOUE ProjectThe PADOUE Project

Dikes applicationDikes application• Dikes management (Diagnostic and maintenance of dikes,

Enumeration)

• Management of protected areas, crisis management

• Dikes in France– Several thousand km of dikes (~ 8000 km)

– Structures not well-known, old and fragile

– Considerable human, economic and environmental issues

• ~ 2 million people are sheltered by dikes

Introduction Architecture Metadata service Application Conclusion

35 / 39

The PADOUE ProjectThe PADOUE Project

Disturbances

Widespread impact Arles, December 2003: 1,000 K€

Breaks in dikes

36 / 39

The PADOUE ProjectThe PADOUE Project

‘Dikes management’ scenario‘Dikes management’ scenario• Dikes IS at two scales

(a) 1 x IS ‘national enumeration of dikes’ (Min. of Environment)

(b) n x IS local ‘Dike SRIS’ (available locally to dike managers): Camargue, Isère, Loire Moyenne, …

• Plan adoptedThe Padoue application:

1- Scans the network of nodes of the Dike SRIS community (b)

2- Enumerates the fragile dike sections (mathematical model included in the IS (b))

3- Combines with the IGN DB to identify the entities at risk near the fragile sections (topological requests)

4- Display the map with entities at risk by dike

5- Calculate and update the entities at risk field of the IS (a)

Introduction Architecture Metadata service Application Conclusion

37 / 39

The PADOUE ProjectThe PADOUE Project

ImplementationImplementation

Dike

SRIS

IGN DB

PostGIS

DB

Import

‘Sections at

risk’

Import

‘Buildings’

Java program…

SQL requests

create table buffer as (select

buffer(the_geom, 250),* from

dike_section)

Introduction Architecture Metadata service Application Conclusion

38 / 39

The PADOUE ProjectThe PADOUE Project

ResultsResults

Dikes

Sections at risk

Buildings

Zones potentially

affected by a break

Buildings at risk

Introduction Architecture Metadata service Application Conclusion

39 / 39

The PADOUE ProjectThe PADOUE Project

ConclusionConclusion

• Data mediation on a large scale– Customized schemas

– Semantic organization of the network

• Service for metadata management– Metadata structuring

– Help for automatizing acquisition and maintenance

– Search

• Applications – Inclusion of environmental data for help in decision making, crisis

management, etc.

Introduction Architecture Metadata service Application Conclusion