An architecture for mutualization and sharing of ... · An architecture for mutualization and...
-
Upload
truongkiet -
Category
Documents
-
view
220 -
download
0
Transcript of An architecture for mutualization and sharing of ... · An architecture for mutualization and...
1
Territories, Environment, Remote Sensing & Spatial Information Joint Research Unit Cemagref - CIRAD - ENGREF
An architecture for mutualization and sharing of environmental dAn architecture for mutualization and sharing of environmental data:ata:
the PADOUE projectthe PADOUE project
Anne Doucet
LIP6 (University Paris 6)
2 / 39
The PADOUE ProjectThe PADOUE Project
The PADOUE projectThe PADOUE project
• Data sharing for environment-related uses
• ACI-GRID (2002-2005)
• Partners: – LIP6 (CNRS - University Paris 6)
– INRIA (Project Caravel, Rocquencourt)
– LIRMM (CNRS - University of Montpellier II)
– CEMAGREF (LISC and JRU-3S)
– IRD (Desertification unit, Montpellier)
– CDS (Strasbourg)
Introduction Architecture Metadata service Application Conclusion
3 / 39
The PADOUE ProjectThe PADOUE Project
ContextContext
• Environmental scientists– Acquire and stock raw data, images, maps, …
– Develop programmes to analyse and process this data and applications using this information
Introduction Architecture Metadata service Application Conclusion
4 / 39
The PADOUE ProjectThe PADOUE Project
Network
ContextContext
– Large amounts of heterogeneous data, stored in autonomous and distributed sources
– Computer programmes to analyse and process data developed in an autonomous manner
A formidable network of resources!
A
MD
B
MD
CMD
Introduction Architecture Metadata service Application Conclusion
5 / 39
The PADOUE ProjectThe PADOUE Project
GoalsGoals
• How to take advantage of this network?– Identify and locate relevant data
– Calculate the data (filter and combine information flow, deliver
usable results)
– Enhance the network with consolidated data
Introduction Architecture Metadata service Application Conclusion
6 / 39
The PADOUE ProjectThe PADOUE Project
GoalsGoals
• Documentation and archiving of data and processesStructuring and managing metadata
• Interoperability of data and processes Architecture for large-scale sharing of heterogeneous data
• Establishing sequences of scientific processingScientific dataflow language
• Meaningful localization of relevant dataUse of the semantics of the data sources’ contents, and experiences of users, communities, requests
Introduction Architecture Metadata service Application Conclusion
7 / 39
The PADOUE ProjectThe PADOUE Project
StructureStructure
• Architecture– Sharing of heterogeneous data on a large scale
• Integration of schemas
– Localization improvement• Semantic organization of the network
• Experiences of users, communities, queries
• Metadata service– Metadata structuring
– Automatization of the management (MDweb)
• Application– Management and monitoring of dikes
Introduction Architecture Metadata service Application Conclusion
8 / 39
The PADOUE ProjectThe PADOUE Project
LargeLarge--scale data mediationscale data mediation
• Sharing of heterogeneous data– Mediation tool
• Large-scale distribution– Peer-to-peer network
Introduction Architecture Metadata service Application Conclusion
9 / 39
The PADOUE ProjectThe PADOUE Project
Data mediationData mediation
• Processing data heterogeneity
• Properties– Integration of existing data
– 3-layer architecture
• Hypothesis– Knowledge about the sources
• Interest– Meaningful processing of requests
Application
Mediatorédiateur
Adapt. Adapt. Adapt.
DBMSRelationnel
R(A,B,C)
Fichier texteFichier
texteFichier texte
x ; y ; ,z
Sources
Documents XML
Documents XMLDocuments
XML
R’
X B Y
Introduction Architecture Metadata service Application Conclusion
10 / 39
The PADOUE ProjectThe PADOUE Project
PeerPeer--toto--peer (P2P) networkspeer (P2P) networks
• Permit scalability and dynamicity
• P2P architecture– Client and server simultaneously
– No centralized coordination
– No global knowledge of the network
– All data accessible from any machine
• Unstructured P2P
TTL=4
TTL=3
TTL=2
TTL=3
TTL=2
TTL=2
TTL=1
TTL=1
TTL=1
TTL=0
TTL=0
A
B
H
F
C DE
G
I J
K
L
: message
����
����
����
����
����
����
����
����
���� ����
����
���� : message processed
: message not processed
����
����
Introduction Architecture Metadata service Application Conclusion
11 / 39
The PADOUE ProjectThe PADOUE Project
Mediation and PeerMediation and Peer--toto--PeerPeer
• Take advantage of the complementarity:– Mediator:
+ High-level language (SQL, XQuery,…)
- Known sources (fixed)
- Limited number of sources
– Peer-to-peer
+ Very large number of sources
+ Facility to add new sources
- Poor query language (keywords)
Introduction Architecture Metadata service Application Conclusion
12 / 39
The PADOUE ProjectThe PADOUE Project
ArchitectureArchitecture
• Interoperability of mediators in a peer-to-peer network
M
M
M
P2P network
schema
schema
schema
Introduction Architecture Metadata service Application Conclusion
13 / 39
The PADOUE ProjectThe PADOUE Project
ChallengesChallenges
• How to know what exists?– Providing a schema fulfilling the users’ needs which is consistent with the reality
of the network (high dynamicity)
CUSTOMIZATION OF SCHEMAS
• How to locate efficiently the relevant data?
TAKE ADVANTAGE OF THE KNOWLEDGE
(Network, users, requests, ...)
Introduction Architecture Metadata service Application Conclusion
14 / 39
The PADOUE ProjectThe PADOUE Project
Customization of the schemaCustomization of the schema• Associate to each source a thematic vector describing the source’s
content• Ex: Th.1: 65%, Th. 2: 30%, Th. 3: 5%
• Define a taxonomy of themes• Provide a thematic vision of the network
• Retain, for each theme, the interesting sources• Enhance the schema with the information on the location of data.
General theme
Middle theme1 Middle theme2
Theme1 Theme2 Theme3 Theme4 Theme5
…
Music
Opera PopRock
Operetta BelCanto Comic opera Pop Rock
…
Environment
Hydrology Climatology
Oceanography Oceanology Hydrogeology Meteorology Paleoclimatology
…
Introduction Architecture Metadata service Application Conclusion
15 / 39
The PADOUE ProjectThe PADOUE Project
Enhancing the schemaEnhancing the schema
Defining a model of schemas that allows:– The addition of the semantic component to metadata
– The consideration of meta-information• Identification
– Data theme
• Localization– Mediator’s address
– Map it
• Description– Description of items
• Quality– Criteria for reuse and comparison of schemas
Introduction Architecture Metadata service Application Conclusion
16 / 39
The PADOUE ProjectThe PADOUE Project
Constructing the schemaConstructing the schema
• Two stages:– Publication of schemas
– Collecting the schemas for a given theme and integrating published
schemas
Introduction Architecture Metadata service Application Conclusion
17 / 39
The PADOUE ProjectThe PADOUE Project
Taking advantage of knowledgeTaking advantage of knowledge
• Indexing data for optimizing the data localization
– Knowledge of the network: semantic contents of the network nodes
– Knowledge of the community: collaborative filtering, experiences of members of the same community
– Knowledge of the user (profile): requests history
Introduction Architecture Metadata service Application Conclusion
18 / 39
The PADOUE ProjectThe PADOUE Project
Knowledge representationKnowledge representation
• Knowledge of the network– Thematic vector
• Knowledge of the users– Community vector
…00.400.250.35Proportion
…‘Oceanography’‘Oceanology’‘Hydrology’‘Climatology’Theme
…0101Presence
…‘Oceanography’‘Oceanology’‘Hydrology’‘Climatology’Theme
Introduction Architecture Metadata service Application Conclusion
19 / 39
The PADOUE ProjectThe PADOUE Project
Complementarity of approachesComplementarity of approaches
• Knowledge of the network (Thematic vector)
���� Organization of the network
• Knowledge of the community (Community vector)
���� Optimization of request routing
Introduction Architecture Metadata service Application Conclusion
20 / 39
The PADOUE ProjectThe PADOUE Project
Semantic organization of the networkSemantic organization of the network
• Organizing the network in terms of the data theme⇒ Reduce the logical distance between the nodes storing data of the same theme
• Construction of a logical neighbourhood for pairs connecting to the network by using a neural classification algorithm.
N10
N40N30
N21
N22
N41N13
N12
N11
N20
Initial
�
�
�
�
�
�
�
� �
Cluster 3
Cluster 1
Cluster 4
Cluster 2
N41
N40 N30
N20
N22
N21N13N12
N11
N10
Initial
�
�
�
�
�
��
��
Introduction Architecture Metadata service Application Conclusion
21 / 39
The PADOUE ProjectThe PADOUE Project
Optimizing the query routingOptimizing the query routing
• The knowledge of a community (community vector) and the experiences of the community (the relevant sites for this community) are used to preferably send these queries to these nodes.
• The requests history of a user points to the sites relevant for that user.
Introduction Architecture Metadata service Application Conclusion
22 / 39
The PADOUE ProjectThe PADOUE Project
StructureStructure
• Architecture– Sharing of heterogeneous data on a large scale
• Integration of schemas
– Localization help• Semantic organization of the network
• Experiences of users, communities, requests
• Metadata service– Metadata structuring
– Automatization of the management (MDweb)
• Application– Management and monitoring of dikes
Introduction Architecture Metadata service Application Conclusion
23 / 39
The PADOUE ProjectThe PADOUE Project
Context of the Padoue project:Context of the Padoue project:
environmental applicationsenvironmental applications
• Data exists in great numbers, it is very voluminous and very heterogeneous
• Data originates from different protocols, poorly documented• Information production requires the combination of base data
(spatial and temporal consistency of data)• Mixture of reference data (normalized) and specific,
undocumented data• Several actors, of heterogeneous profiles (different points of
view, professions)
Introduction Architecture Metadata service Application Conclusion
24 / 39
The PADOUE ProjectThe PADOUE Project
Specificities of environmental dataSpecificities of environmental data
• Necessity of using metadata– To avoid
• Erroneous interpretation
• Wrong use
• Faulty perception of the accuracy
• Existence of adapted standards – Dublin Core
– United States: FGDC & OGI Open GIS Consortium, ISO
– Europe: CEN/TC 287, Australia: ANZLIG
Introduction Architecture Metadata service Application Conclusion
25 / 39
The PADOUE ProjectThe PADOUE Project
Metadata serviceMetadata service
• General metadata model describing the resources– Compatible with the different standards (FGDC, ISO, CEN)
• Acquisition and maintenance of metadata– Automatic extraction of certain fields
• Display and Navigation
Introduction Architecture Metadata service Application Conclusion
26 / 39
The PADOUE ProjectThe PADOUE Project
Metadata structuringMetadata structuring
• Organization of metadata into sections• Identification
• Quality
• Spatial organization
• Protocol
• …
• Organization into three abstraction levels• Elementary
• Extended
• Complete
Introduction Architecture Metadata service Application Conclusion
27 / 39
The PADOUE ProjectThe PADOUE Project
TemplateTemplate
• Extraction of items of the standard (and extension if necessary)depending on the needs of a community, a project or an application
Core of the standard Template
standard
Introduction Architecture Metadata service Application Conclusion
28 / 39
The PADOUE ProjectThe PADOUE Project
Generic toolGeneric tool
MDweb tool:MDweb tool:•• Server application• Multi-platform (Unix, Linux, Windows)• Multi-RDBMS (Oracle, MySql, Postgres)• Structured around the ISO 19115 standard
Features:Features:• Semi-automatic entry and updating of metadata• Management of the publication and the confidentiality• Multi-criteria consultation and spatial requests• Global configuration of the interface
ROSELT application (Network of observatories for long-term ecological monitoring)
Introduction Architecture Metadata service Application Conclusion
29 / 39
The PADOUE ProjectThe PADOUE Project
MDWEBMDWEB
Introduction Architecture Metadata service Application Conclusion
30 / 39
The PADOUE ProjectThe PADOUE Project
Interfaces
AutomatizationCartographic interface
Data-entry
ImplementationIntroduction Architecture Metadata service Application
31 / 39
The PADOUE ProjectThe PADOUE Project
Implementation
Interfaces
Data-entry
AutomatizationThesaurus
Introduction Architecture Metadata service Application Conclusion
32 / 39
The PADOUE ProjectThe PADOUE Project
Implementation
Interfaces Local search
Introduction Architecture Metadata service Application Conclusion
33 / 39
The PADOUE ProjectThe PADOUE Project
StructureStructure
• Architecture– Sharing of heterogeneous data on a large scale
• Integration of schemas
– Localization help• Semantic organization of the network
• Experiences of users, communities, requests
• Metadata service– Metadata structuring
– Automatization of the management (MDweb)
• Application– Management and monitoring of dikes
Introduction Architecture Metadata service Application Conclusion
34 / 39
The PADOUE ProjectThe PADOUE Project
Dikes applicationDikes application• Dikes management (Diagnostic and maintenance of dikes,
Enumeration)
• Management of protected areas, crisis management
• Dikes in France– Several thousand km of dikes (~ 8000 km)
– Structures not well-known, old and fragile
– Considerable human, economic and environmental issues
• ~ 2 million people are sheltered by dikes
Introduction Architecture Metadata service Application Conclusion
35 / 39
The PADOUE ProjectThe PADOUE Project
Disturbances
Widespread impact Arles, December 2003: 1,000 K€
Breaks in dikes
36 / 39
The PADOUE ProjectThe PADOUE Project
‘Dikes management’ scenario‘Dikes management’ scenario• Dikes IS at two scales
(a) 1 x IS ‘national enumeration of dikes’ (Min. of Environment)
(b) n x IS local ‘Dike SRIS’ (available locally to dike managers): Camargue, Isère, Loire Moyenne, …
• Plan adoptedThe Padoue application:
1- Scans the network of nodes of the Dike SRIS community (b)
2- Enumerates the fragile dike sections (mathematical model included in the IS (b))
3- Combines with the IGN DB to identify the entities at risk near the fragile sections (topological requests)
4- Display the map with entities at risk by dike
5- Calculate and update the entities at risk field of the IS (a)
Introduction Architecture Metadata service Application Conclusion
37 / 39
The PADOUE ProjectThe PADOUE Project
ImplementationImplementation
Dike
SRIS
IGN DB
PostGIS
DB
Import
‘Sections at
risk’
Import
‘Buildings’
Java program…
SQL requests
…
create table buffer as (select
buffer(the_geom, 250),* from
dike_section)
Introduction Architecture Metadata service Application Conclusion
38 / 39
The PADOUE ProjectThe PADOUE Project
ResultsResults
Dikes
Sections at risk
Buildings
Zones potentially
affected by a break
Buildings at risk
Introduction Architecture Metadata service Application Conclusion
39 / 39
The PADOUE ProjectThe PADOUE Project
ConclusionConclusion
• Data mediation on a large scale– Customized schemas
– Semantic organization of the network
• Service for metadata management– Metadata structuring
– Help for automatizing acquisition and maintenance
– Search
• Applications – Inclusion of environmental data for help in decision making, crisis
management, etc.
Introduction Architecture Metadata service Application Conclusion