Co-funded by the European Community eContentplus programme
The NATURE-SDIplus Validation methodology
PRESENTATION OUTLINE
  Overall testing and validation approach
  Validation of data specification and data encoding
  Data accessibility and usability testing
  Data Quality evaluation
  Data generalisation
NATURE-SDIplus MAIN OUTCOMES
  Harmonised DS & MD (PS, BR, HB, SD)
  GEOPORTAL (network services)
  Data Models for 3 Annex III themes (BR, HB, SD)
  NatSDI MD profile(s)
NATURE-SDIplus DATASETS AND METADATA
  Before Task 4.1: DS + MD (PS, BR, HB, SD) before harmonisation
  After Task 4.1: Harmonised DS + MD (PS, BR, HB, SD) after harmonisation
WP5: TASKS AND INTER-RELATIONSHIPS
  T5.1: INSPIRE validation
  T5.2: Test on data accessibility & usability
  T5.3: Quality evaluation and dataset generalisation
NATURE-SDIplus VALIDATION METHODOLOGY
Generic validation process
Covers both:
  Validation of specification encoding
  Validation of data encoding
NATURE-SDIplus specifications and test data as examples
VALIDATION OF SPECIFICATION ENCODING
The required steps:
  Validate Schema
  Check transposition of specification
  Check validatability
The process: [diagram not reproduced in this transcript]
METADATA VALIDATION
The process: [diagram not reproduced in this transcript]
The required steps:
  Syntactic validation
  Semantic validation
DATA VALIDATION
The process: [diagram not reproduced in this transcript]
The required steps:
  Syntactic validation
  Semantic validation
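The two required steps, syntactic then semantic validation, can be sketched as follows. This is a minimal illustration and not the project's validation tooling: a well-formedness check stands in for XML Schema validation, and a single hand-written rule stands in for a Schematron rule set. The namespace, the element names and the rule itself are hypothetical, chosen only to keep the example self-contained with the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, used only for illustration.
NS = {"ex": "http://example.org/protected-sites"}


def syntactic_check(xml_text):
    """Return the parsed root if the document is well formed, else None.

    Assumption: well-formedness stands in for full XML Schema validation,
    which the standard library cannot perform.
    """
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


def semantic_check(root):
    """Return one non-conformity message per violated rule.

    Assumption: the only rule here is that every site has a non-empty name.
    """
    non_conformities = []
    for index, site in enumerate(root.findall("ex:site", NS), start=1):
        name = site.findtext("ex:name", default="", namespaces=NS)
        if not name.strip():
            non_conformities.append(f"site {index}: empty or missing name")
    return non_conformities


def validate(xml_text):
    """Run the syntactic step first; run the semantic step only if it passes."""
    root = syntactic_check(xml_text)
    if root is None:
        return ["document is not well-formed XML"]
    return semantic_check(root)


GOOD = (
    '<ex:sites xmlns:ex="http://example.org/protected-sites">'
    "<ex:site><ex:name>Site A</ex:name></ex:site>"
    "</ex:sites>"
)
BAD = (
    '<ex:sites xmlns:ex="http://example.org/protected-sites">'
    "<ex:site><ex:name> </ex:name></ex:site>"
    "</ex:sites>"
)

if __name__ == "__main__":
    for document in (GOOD, BAD, "<ex:sites>"):
        print(validate(document))
```

Running the semantic step only after the syntactic step has passed keeps the two kinds of non-conformity separate in the report.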
VALIDATION BRIEFCASE OVERVIEW
RESULTS OF THE USE OF THE VALIDATION BRIEFCASE
Harmonisation status:
  Harmonisation completed; 215; 92%
  Harmonisation in progress; 19; 8%
Validation reports:
  Validation Reports uploaded; 93; 43%
  Validation Reports not uploaded; 122; 57%
Validation status:
  Validation Completed; 63; 29%
  Validation Not Completed; 14; 7%
  Validation Not Applicable; 138; 64%
Reasons for Validation Not Applicable:
  Invalid file format (gml no 3.2.1); 79; 57,2%
  Invalid file format (shp); 46; 33,3%
  Other reasons (gml not uploaded, etc.); 13; 9,4%
Validation Completed, by outcome:
  without Non Conformities; 32; 50,8%
  with Non Conformities; 31; 49,2%
Non Conformities (NCs), by type:
  only Schema Validation NCs; 23; 74%
  only INSPIRE theme Schematron Validation NCs; 4; 13%
  both Schema Validation and INSPIRE theme Schematron Validation NCs; 4; 13%
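The percentages on this slide can be cross-checked against the counts. The sketch below assumes that the counts belong together as grouped in the dictionaries, which is inferred from the totals (63 + 14 + 138 = 215 and 79 + 46 + 13 = 138) rather than stated on the slide. The function name and the shortened labels are illustrative.

```python
def shares(counts, digits=0):
    """Return each count as a percentage of its group total."""
    total = sum(counts.values())
    return {
        label: round(100 * count / total, digits)
        for label, count in counts.items()
    }


# Counts copied from the slide; the grouping is an assumption (see above).
VALIDATION_STATUS = {"completed": 63, "not completed": 14, "not applicable": 138}
NOT_APPLICABLE_REASONS = {"gml not 3.2.1": 79, "shp": 46, "other": 13}
COMPLETED_OUTCOME = {"with NCs": 31, "without NCs": 32}

if __name__ == "__main__":
    print(shares(VALIDATION_STATUS))
    print(shares(NOT_APPLICABLE_REASONS, digits=1))
    print(shares(COMPLETED_OUTCOME, digits=1))
```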
STATISTICS OF THE REMODELLING
Tools used to harmonise the 63 datasets for which the validation has been completed:
  GO Publisher; 30; 47,6%
  Arc GIS Desktop 9.3, Quantum GIS, OGR tools, Altova MapForce, Oxygen XML Editor, (Humboldt tools); 14; 22,2%
  HALE; 10; 15,9%
  Geoconverter; 8; 12,7%
  FME, XmlSpy, deegree; 1; 1,6%
ASSESSING DATA ACCESSIBILITY
Level I Criteria: Discoverability
  Level II Criteria: Data Search, Multilingualism and Semantic search, Explore Metadata Details, Panning, Zooming and Exploring Feature Information
  Test Item: Geoportal functionalities enabling search of metadata and access to harmonised datasets
  Test Method: Evaluate using the test criterion in section 3.1
Level I Criteria: Retrievability
  Level II Criteria: Retrieving Spatial Features, Downloading GML
  Test Item: Geoportal functionalities enabling the download of GML as zip files
  Test Method: Evaluate using the test criterion in section 3.1
Level I Criteria: Exploitability
  Level II Criteria: Performance, Availability, Reliability, Compliance, Security
  Test Item: Portal and download services offered by the geoportal
  Test Method: Evaluate using the test criterion in section 3.1
ASSESSING DATA USABILITY
STEP 1: Design of online questionnaires
STEP 2: Distribution and survey
STEP 3: Result gathering and analysis
STEP 4: Reporting
DATA USABILITY ON-LINE QUESTIONNAIRE (1/2)
First part, to collect information about:
  the user
  the extent of the geographical AOI used
  the group of stakeholders the user belongs to
  the type of professional activity / field of expertise
  the data theme assessed
  the keywords used during data search
DATA USABILITY ON-LINE QUESTIONNAIRE (2/2)
Second part, to collect information about how usable the data relevant to a given theme are:
  within the geoportal (using its functionalities)
  outside the geoportal (downloading the data via the geoportal and using them inside your application, and/or consuming the WMS/WFS directly in your application)
The user is asked to rate as poor, moderate, good or excellent her/his level of satisfaction with:
  the overall Geoportal functionalities
  the specific search functionalities
  the data within the Geoportal
  the data outside the Geoportal
Built using Google Docs tools
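Processing the second part of the questionnaire amounts to counting, for each of the four rated items, how many users chose each satisfaction level. A minimal sketch of such a tally is given below. The item labels paraphrase the slide, the response format is an assumption, and the sample responses are invented purely to exercise the function; they are not survey results.

```python
from collections import Counter

RATINGS = ("poor", "moderate", "good", "excellent")
ITEMS = (
    "overall geoportal functionalities",
    "search functionalities",
    "data within the geoportal",
    "data outside the geoportal",
)


def tally(responses):
    """Count how often each rating was given for each rated item.

    Each response is assumed to be a dict mapping an item to a rating;
    a respondent may leave items unanswered.
    """
    counts = {item: Counter({rating: 0 for rating in RATINGS}) for item in ITEMS}
    for response in responses:
        for item, rating in response.items():
            if item not in counts or rating not in RATINGS:
                raise ValueError(f"unexpected answer: {item!r} -> {rating!r}")
            counts[item][rating] += 1
    return counts


# Hypothetical responses, invented for illustration only.
SAMPLE = [
    {"search functionalities": "good", "data within the geoportal": "moderate"},
    {"search functionalities": "excellent", "data within the geoportal": "moderate"},
]

if __name__ == "__main__":
    for item, counter in tally(SAMPLE).items():
        print(item, dict(counter))
```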
28/06/2011, INSPIRE Conference 2011
QUESTIONNAIRES PROCESSING (1/4 to 4/4)
[four chart slides, not reproduced in this transcript]
DATA QUALITY EVALUATION
Main objective of Task 5.3 in terms of quality evaluation: to assess the quality of the harmonised vs. the source datasets
A four-step methodology has been developed and applied
DATA QUALITY EVALUATION METHODOLOGY
Step 1: In-depth analysis of the background documentation:
  the international standards EN ISO 19113, 19114, ISO/TS 19138
  the data quality issues covered by INSPIRE
  the NatureSDIplus Metadata profile
from which the data quality elements and subelements, together with the corresponding measures and their reporting, have been extracted
Step 2: Elaboration of a set of guidelines enabling the quality evaluation of spatial datasets belonging to the four INSPIRE themes covered by NatureSDIplus (PS, BR, HB, SD)
Step 3: Adaptation of the step 2 guidelines in order to use the selected data quality elements and subelements to assess the quality of the NatureSDIplus harmonised vs. source datasets
Step 4: Application of the step 3 guidelines to 4 harmonised datasets (1 harmonised dataset for each of the four INSPIRE themes – PS, BR, HB, SD) and reporting of the quality evaluation results
METHODOLOGY FOLLOWED TO DEVELOP THE GUIDELINES FOR DQ EVALUATION
QUALITY OF DS & MD (DQ, MD):
  EN ISO 19113 Geographic Information – Quality principles
  EN ISO 19114 Geographic Information – Quality evaluation procedures
  ISO/TS 19138 Geographic information – Data quality measures
  EN ISO 19115 Geographic Information – Metadata
  INSPIRE DS Req’s and Rec’s
Steps:
  Select the DQ elements and sub-elements (cross-checking INSPIRE PS Data Specifications, NatureSDIplus MD profile and EN ISO 19113)
  For each sub-element define a DQ measure (in adherence to ISO/TS 19138)
  For each sub-element define a DQ reporting (in adherence to EN ISO 19114 and EN ISO 19115)
  For each sub-element provide an example of DQ evaluation
DATA QUALITY ELEMENTS AND SUB-ELEMENTS
Data quality element; Data quality sub-element; Covered by INSPIRE specification for Protected sites; Covered by NatureSDI+ Metadata profile
  Completeness; Commission; Optional - PS; Optional - PS
  Completeness; Omission; Optional - PS; Optional - PS, BR, HB, SD
  Positional accuracy; Absolute or external accuracy; Optional - PS; Optional - PS
  Temporal accuracy; Accuracy of a time measurement; -; Optional - BR, HB, SD
  Temporal accuracy; Temporal consistency; -; Optional - BR, HB, SD
  Thematic accuracy; Classification correctness; -; Optional - BR, HB, SD
  Thematic accuracy; Quantitative attribute correctness; -; Optional - BR, HB, SD
DATA QUALITY EVALUATION REPORTING
Data quality scope: All items of PS datasets of CountryX

Example 1 (Completeness, Commission)
  Data quality element: Completeness
  Data quality subelement: Commission
  Data quality measure description: Rate of excess items
  Data quality basic measure: Error rate
  Data quality measure identification code: 3 (ISO/TS 19138)
  Data quality evaluation method type: External
  Data quality evaluation method description: Number of excess items in the dataset in relation to the number of items that should have been present
  Data quality value type: Percentage
  Data quality value: 0%
  Data quality value unit: -
  Data quality date: 2011-02-01
  Conformance quality level: Zero items
  Dataset parameters: 0 excess items are present in the harmonised dataset; 480 items are present in the dataset.
  Quality result meaning: Dataset passes. No excess items exist.

Example 2 (Completeness, Omission)
  Data quality element: Completeness
  Data quality subelement: Omission
  Data quality measure description: Rate of missing items
  Data quality basic measure: Error rate
  Data quality measure identification code: 7 (ISO/TS 19138)
  Data quality evaluation method type: External
  Data quality evaluation method description: Number of missing items in the dataset in relation to the number of items that should have been present
  Data quality value type: Ratio
  Data quality value: 20:500
  Data quality value unit: -
  Data quality date: 2011-02-02
  Conformance quality level: Zero items
  Dataset parameters: 480 items in dataset are within the data quality scope; 500 items in the universe of discourse are within the scope.
  Quality result meaning: Dataset fails. The number of missing items in the dataset exceeds the data quality conformance quality level.
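The two completeness measures reported above (rate of excess items and rate of missing items, both relative to the number of items that should have been present) can be reproduced with a short calculation. The sketch below assumes that items in the dataset and in the universe of discourse can be matched by an identifier; the integer identifiers and the function name are hypothetical, while the counts (480 items in the dataset, 500 in the universe of discourse) and the "zero items" conformance level come from the example.

```python
def completeness(dataset_ids, universe_ids):
    """Compare a dataset with the universe of discourse.

    Commission counts items present in the dataset but not expected;
    omission counts expected items that are absent from the dataset.
    Both rates are relative to the number of expected items.
    """
    dataset_ids = set(dataset_ids)
    universe_ids = set(universe_ids)
    excess = dataset_ids - universe_ids
    missing = universe_ids - dataset_ids
    expected = len(universe_ids)
    return {
        "excess_items": len(excess),
        "missing_items": len(missing),
        "commission_rate": len(excess) / expected,
        "omission_rate": len(missing) / expected,
        # Conformance quality level in the example: zero items.
        "commission_pass": len(excess) == 0,
        "omission_pass": len(missing) == 0,
    }


if __name__ == "__main__":
    universe = range(1, 501)  # 500 items that should be present
    dataset = range(1, 481)   # 480 items actually present
    print(completeness(dataset, universe))
```

With these inputs the dataset passes on commission (no excess items) and fails on omission (20 of 500 items missing), matching the reported results.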
DATA QUALITY EVALUATION ADDITIONAL RESULTS
The Data Quality elements and subelements have been structured according to the EN ISO 19115 formalisms, so that they can later be encoded as metadata according to CEN ISO/TS 19139
The results can readily be applied to the other data themes as well, providing a basis for addressing Data Quality issues in the INSPIRE context
DATASETS GENERALISATION
Main objective: to assess issues related to datasets generalisation from the local level to the national/European level
Method: design of an off-line questionnaire to collect the feedback of the NatureSDIplus Data Providers (DPs) about the usability of the PS, BR, HB and SD Data Models and of the NatureSDIplus Metadata Profile when harmonising data and metadata at local level and aiming at generalising them from the local to the national/European level.
DATASETS GENERALISATION QUESTIONNAIRE
In particular, the feedback focused on two main aspects:
  whether DPs noticed a need/opportunity to extend/modify the target data models, in order to better take into account local aspects
  whether DPs noticed a need/opportunity to extend/modify the source data models, in order to facilitate INSPIRE compliance
The first aspect is consistent with Annex F (Example for an extension to an INSPIRE application schema) of the INSPIRE Data Specification D2.5 Generic Conceptual Model, according to which the data models of the INSPIRE data specifications can be modified at local level in order to take into account local aspects.
The feedback collected on the second aspect can support local communities engaged in implementing INSPIRE.
DATASETS GENERALISATION QUESTIONNAIRE MAIN RESULTS
19 questionnaires filled in by 19 different DPs, replies analysed and results processed
Need/opportunity to extend/modify the target data models, in order to better take into account local aspects:
  Yes, I noticed the need/opportunity; 7; 37%
  No; 12; 63%
Need/opportunity to extend/modify the source data models, in order to facilitate INSPIRE compliance:
  Yes, I noticed the need/opportunity; 16; 84%
  No; 3; 16%
DATASETS GENERALISATION FEEDBACK (1/2)
Some feedback about the need/opportunity to extend/modify the target data models, in order to better take into account local aspects:
“The information contained in the source PS dataset site Protection Classification is divided to 20 values. In NATURE-SDIplus data model 7 values. It is difficult to select the suitable one”
“We noticed that the Habitats and Biotopes target data Model doesn’t cover the whole information we have describing each habitats. Our information relates several habitats to one geographical feature and the target data model expects only the relation 1 – 1. Other ways, like duplicating geographical information could be taken into account but from our point of view is not the best solution in the future.”
“Metadata: I would leave out Data quality - Thematic and temporal accuracy and Acquisition method. The latter because it can be described already in the lineage part. Data models: BGR: I would leave out the detailed class description parameters such as temperature, rainfall, etc… HB: Also here I would leave out a number of attribute such as elevation, activities and impacts, development Stage, monitoring Assessment.”
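The second comment above describes a cardinality mismatch: the source relates several habitats to one geographical feature, while the target data model expects a 1 – 1 relation. The sketch below illustrates the duplication workaround the respondent mentions, so that its cost is visible. The class and field names are hypothetical and do not reproduce the NATURE-SDIplus Habitats and Biotopes data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceFeature:
    """One geometry related to several habitats (source side, hypothetical)."""
    feature_id: str
    geometry_wkt: str
    habitat_codes: Tuple[str, ...]


@dataclass(frozen=True)
class TargetFeature:
    """One geometry related to exactly one habitat (target side, hypothetical)."""
    feature_id: str
    geometry_wkt: str
    habitat_code: str


def flatten(source):
    """Duplicate the geometry once per habitat to satisfy a 1 - 1 relation."""
    return [
        TargetFeature(f"{source.feature_id}-{n}", source.geometry_wkt, code)
        for n, code in enumerate(source.habitat_codes, start=1)
    ]


if __name__ == "__main__":
    patch = SourceFeature("F1", "POLYGON((0 0, 1 0, 1 1, 0 0))", ("H1", "H2"))
    for feature in flatten(patch):
        print(feature)
```

Each habitat yields a separate target feature carrying an identical copy of the geometry, which is the redundancy the respondent considers a poor long-term solution.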
DATASETS GENERALISATION FEEDBACK (2/2)
Some feedback about the need/opportunity to extend/modify the source data models, in order to facilitate INSPIRE compliance:
“Some source datasets are missing a lot of mandatory information to be INSPIRE compliant. The need to restructure the datasets into a database (and not a collection of flat files) is crucial for some data providers”
“The information contained in the source dataset is not sufficient to populate the corresponding attributes of the target data model. E.g.: The attribute ‘MANAGPL’ of the source dataset contains, for some sites, information about the type of site management, whilst it should contain the URL or citation of a document describing the site management plans. Moreover, for other sites, the attribute contain references to many documents.”
“Our datasets are simple shapefiles with attributes.”