eXtended Metadata Registry (XMDR) for Ecoinformatics Test Bed
Interagency/International Cooperation on Ecoinformatics
Copenhagen, Denmark
June 20, 2006
Bruce Bargmeyer
Lawrence Berkeley National Laboratory and Berkeley Water Center
University of California, Berkeley
Tel: +1 [email protected]
XMDR Purpose
• Improve data management through use of stronger semantics management
  - Databases
  - XML data
• Enable new wave of semantic computing
  - Take meaning of data into account
  - Process across relations as well as properties
  - May use reasoning engines, e.g., to draw inferences
Vocabulary Management
• Vocabulary management is the first step for use of semantic technologies
  - Define concepts and relationships
  - Harmonize terminology, resolve conflicts
  - Collaborate with stakeholders
• An approach
  - Select a domain of interest
  - Enter core concepts and relationships
  - Enter metadata describing enterprise data
  - Engage community in vocabulary review
  - Harmonize, validate and vet the vocabulary
Use XMDR
• For vocabulary repository
  - Register, harmonize, validate, and vet definitions and relations
• To register mappings between multiple vocabularies
• To register mappings of concepts to data
• To provide semantics services
• To register and manage the provenance of data
XMDR is part of the infrastructure for semantics and data management.
XMDR Use
• Upside
  - Collaborative
    - Supports interaction with community of interest
    - Shared evolution and dissemination
    - Enables review cycle
  - Standards-based: don't lock semantics into proprietary technology
  - Foundation for strategic data-centric applications
  - Lays the foundation for ontology-based information management
  - Content is reusable for many purposes
• Downside
  - Managing semantics is HARD WORK, no matter how friendly the tools
  - Needs integration with other components
XMDR Project Participants
• Collaborative, interagency effort
  - EPA, USGS, NCI, Mayo Clinic, DOD, LBNL … & others
• Draws on and contributes to Interagency/International Cooperation on Ecoinformatics
• Involves Ecoterm, international, national, state, and local government agencies, and other organizations as content providers and potential users
• Interacts with many organizations around the world through ISO/IEC standards committees
• Expected to interact with R&D under EU 7th Framework Program
XMDR Update
• Extended the capabilities to register more difficult kinds of metadata and concept systems
  - Linguistic ontologies (OMEGA)
  - Axiomatized ontologies (OpenCyc)
• Created new draft of ISO/IEC 11179. Working Draft 4 out for comment, Committee Draft 1 to go out in June.
  - Includes UML packages to make it easier to understand and easier to align with other standards
  - Looking at alignment with OASIS ebXML Registry
• Worked on mapping existing 11179 MDR (E2) extended content to proposed Edition 3, particularly Cancer Data Standards Repository (caDSR).
XMDR Update
• Created new version of XMDR prototype software keyed to ISO/IEC 11179 Working Draft 4.
  - Revised ontology
  - Revised software
  - Reloaded previous content
  - Loading new content (ongoing)
    - OMEGA linguistic ontology
    - Cancer Data Standards Repository (caDSR)
    - OpenCyc ontology
    - SIC - NAICS codes
    - Mapping of NAICS to SIC codes
• Improved interface
XMDR for Ecoinformatics Test Bed
Demonstrate the use of the eXtended Metadata Registry (XMDR) to unite concept systems (such as ontologies) and metadata (which describes data) to support semantic services that help to answer tough questions.
Load selected concept systems and metadata into the XMDR, then use semantic technologies, including semantic services, to work with the data and demonstrate the results.
The demonstration is intended to help answer questions that are swirling around emerging semantics technologies:
• Do these open new doors? Answer new questions?
• How does this fit into the rest of what EPA is doing?
• How can EPA lead in the use of these new technologies?
• Why and how should EPA invest in the infrastructure that is necessary to make effective use of semantic technologies?
• How is EPA aligning?
• What is the EPA strategy?
XMDR in Ecoinformatics Test Bed
Think of XMDR as "embedded": an essential part of an infrastructure upon which applications are built.
• Embed XMDR in EU FP7 project technology
• Embed XMDR in traditional database application environment
• Embed XMDR in new semantic computing environment
XMDR in Ecoinformatics Test Bed
• Include XMDR (ISO/IEC 11179 Edition 3) in architectures: DoD, EPA, Federal Enterprise Architecture
• Include XMDR as key enabling capability for Ecoinformatics
• Looking for a collaborator who has the "rest of the story" that can demonstrate the utility of XMDR
XMDR Demonstration using Water Information
Potential collaboration with the following:
• USGS Terminology Web Services
• EU FP7 EcoSemantics project
• GEOSS data integration
• Water Information System for Europe
• Water Data Infrastructure (WADI)
• Berkeley Water Center (BWC)
• Microsoft Technical Computing Initiative (TCI)
• BWC Digital Watershed Research Thrust Area
• Estuarine and Great Lakes Program (EAGLES)
• LBNL Environmental Modeling projects
XMDR in Ecoinformatics Test Bed
Demonstrate capabilities:
• Register existing and formative water-related concept systems, based on their underlying structures, such as graphs of varying complexity.
• Register water ontologies as they are developed.
• Interrelate concept systems with each other.
• Support efforts to converge on consistency through harmonization and vetting activities.
• Interrelate concepts in concept systems with concepts in metadata and concepts in databases, knowledgebases, and text.
• Provide semantic services needed to support traditional computing as well as semantic computing.
  - E.g., dereferencing the URIs used in creating RDF statements, by providing relevant information describing the referenced concept and its authoritative standing within some community of interest.
Collaborate with USGS Terminology Web Services
• Already working with Mike Frame
• Capability to use web service to access terms in multiple concept systems
• Developed XMDR REST API to support this
• More from Mike Frame
XMDR Embedded in EcoSemantics Architecture
XMDR Prototype Modular Architecture: primary functional components
[Diagram components]
• Metadata Sources: concept systems, data elements
• Content Loading & Transformation
• Validation
• Mapping Engine
• Metamodel specs (UML & Editing)
• XMDR metamodel (OWL & XML Schema)
• XMDR data model & exchange format: XML, RDF, OWL
• Registry Store (standard XMDR files)
• Logic Indexer and Logic Index
• Text Indexer and Text Index
• Search & Content Serving
• Authentication Service
• Application Program Interface
• Human User Interface
• Users: web browsers, client software
XMDR Prototype open source software components
[Diagram components, with the software used for each in parentheses]
• Metadata Sources: concept systems, data elements
• Content Loading & Transformation (LexGrid & custom)
• Validation (XML Schema)
• Mapping Engine
• Metamodel specs, UML & Editing (Poseidon, Protege)
• XMDR metamodel (OWL & XML Schema)
• XMDR data model & exchange format: XML, RDF, OWL
• Registry Store (standard XMDR files)
• Postgres Database
• Logic Indexer (Jena & Pellet) and Logic Index
• Text Indexer (Lucene) and Text Index
• Search & Content Serving (Jena, Lucene)
• Authentication Service
• Application Program Interface (REST)
• Human User Interface (HTML from JSP and JavaScript; Exhibit)
• Users: web browsers, client software
New REST-style API facilitates interface for Web Services
[Diagram: the same component diagram as on the preceding slide, drawing attention to the Application Program Interface (REST) through which web browsers and client software reach Search & Content Serving and the Registry Store. Legend: Third Party Software.]
Collaborate with GEOSS (with EPA and Others)
• Global Earth Observation System of Systems (GEOSS) ten-year implementation plan.
• GEOSS is envisioned as a large national and international cooperative effort to bring together existing and new hardware and software, making it all compatible in order to supply data and information at no cost.
• The U.S. and developed nations have a unique role in developing and maintaining the system, collecting data, enhancing data distribution, and providing models to help all of the world's nations.
• Outcomes and benefits of a global informational system will include:
  - disaster reduction
  - integrated water resource management
  - ocean and marine resource monitoring and management
  - weather and air quality monitoring, forecasting and advisories
  - biodiversity conservation
  - sustainable land use and management
  - public understanding of environmental factors affecting human health and well being
  - better development of energy resources
  - adaptation to climate variability and change
• Demonstrate data integration
GEOSS Interoperability
[Diagram: GEOSS interoperability process. Boxes: GEOSS Standards and Interoperability Forum (experts, SDOs, community); GEOSS Interoperability Registry; Base GEOSS Standards; GEOSS Standards Registry; GEOSS Societal Benefit Activity; GEOSS Components Registry. Arrow labels: References, Recommendation. Steps shown:
  - Request for help with interoperability between two GEOSS components
  - Study for possible existing solutions
  - Register the issue as "under review"
  - Register the recommendations, if "accepted"]
From: S.J.S. Khalsa, IEEE Geoscience and Remote Sensing Society (ADC Co-Chair Meeting, 27 Nov 2006)
Collaborate with Water Information System for Europe (WISE)
• Register metadata about WISE data elements
• Register concept systems with concepts used in WISE data (glossary … ontology)
• Support data harmonization
• Initially shows support for traditional database computing
• Helps to enable introduction of semantic computing for WISE
• Are there any people working on WISE metadata and concept systems?
Collaboration with EPA Estuarine and Great Lakes Program (EAGLES)
EAGLES Program is designed to:
• Develop indicators and/or procedures useful for evaluating the 'health' or condition of important coastal natural resources (e.g., lakes, streams, coral reefs, coastal wetlands, inland wetlands, rivers, estuaries) at multiple scales, ranging from individual communities to coastal drainage areas to entire biogeographical regions.
• Develop indicators, indices, and/or procedures useful for evaluating the integrated condition of multiple resource/ecosystem types within a defined watershed, drainage basin, or larger biogeographical region of the U.S.
• Develop landscape measures that characterize landscape attributes and that concomitantly serve as quantitative indicators of a range of environmental endpoints, including water quality, watershed quality, freshwater/estuarine/marine biological condition, and habitat suitability.
• Develop nested suites of indicators that can both quantify the health or condition of a resource or system and identify its primary stressors at local to regional scales.
XMDR as extension to Environmental Information Management System (EIMS)
Collaborate with Water Data Infrastructure (WADI)
• WADI is a semantic computing application.
• WADI goes from data collection to indicator display
• XMDR could support concept management for WADI
• WADI still needs some R&D and demonstration
  - E.g., work on "integration" between a "data layer" (real data of RWS, all in XML and some basic low-level RDF) and some higher layer of vocabularies/thesauri/ontologies
Potential Collaboration with Berkeley Water Center
Digital Watershed Research Thrust Area
Understanding hydrological processes with sufficient accuracy--in the face of anthropogenic and global changes--is a prerequisite to successful water management.
Progress in this area requires research in engineering and IT: data, technologies, modeling, analysis tools (Theme 1), and cyberinfrastructure (Theme 2).
Developing an understanding requires synthesis of theory, concepts and engineering/IT tools
Digital Watershed Theme 1-TOOLS
Development of novel sensors, technologies, and modeling/ analysis approaches is needed to provide information about complex water systems and to ensure cost effective and sustainable delivery of clean water. Examples:
SENSORS to autonomously measure important components of the water cycle and water quality at sufficient resolution and coverage.
TECHNOLOGIES that promote, for example, point-of-use clean water use or cost-efficient desalinization.
NUMERICAL APPROACHES that represent the coupling between atmosphere, vegetation, vadose and groundwater processes that are important for accurately predicting watershed behavior and sustainability.
Theme 2: Water CyberInfrastructure
This theme focuses on the development of cyber-infrastructure that will enable researchers and water managers to:
• Curate, assimilate, and clean complex, multi-scale datasets collected from networked micro sensors to global satellite platforms;
• Connect datasets to analysis, modeling, and visualization tools to facilitate hypothesis testing and eventually decision making.
Microsoft Technical Computing Initiative Approach
• Demonstrate an advanced cyber-infrastructure approach for tackling 21st century challenges by leveraging web service concepts, technologies, and information technology expertise;
• Early focus will integrate the most critical components needed to address relevant science questions, rather than creating a fully developed problem solving environment.
• Demonstrate prototypes with end-to-end scenarios, and use feedback from water scientists to refine and augment
• Work on two different, yet scientifically related projects that will:
  - Permit us to understand what is common and what is distinct between different water research approaches;
  - Allow us to work with a wide range of water datasets and analysis techniques;
  - Provide demonstration vehicles to two different water research communities.
CA WATER RESOURCES
• Extremely diverse datasets from many data providers;
• Datasets typically 'dirtier' and larger than AmeriFlux;
• Project offers significant potential for transferability to other basins;
• Will build on advances developed under Carbon-Climate portal.
CARBON-CLIMATE
• Protocols for AmeriFlux data acquisition and reporting are well defined;
• Data are small and fairly clean;
• Will permit development and testing of a portal that will be rapidly useful for water scientists.
• Advances developed during this project will be applied to the development of the more challenging Central Valley portal.
The Microsoft TCI will focus on development based on the needs of different water research communities
Technical Computing Initiative: Carbon-Climate Workbench
[Diagram: Carbon-Climate Workbench. A web service interface to data and tools hosts AmeriFlux climate data, STATSGO soils data, and MODIS products (LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo), with web-based workbench access and import of other datasets. Workflow design example: choose AmeriFlux area/transect, time range, and data type; data harvest, sites 1-16; gap fill, technique A or technique B; Canoak model, site 1 and site 9; statistical and graphical analysis; network display of LAI; version control. Tool groups: data cleaning tools, data mining and analysis tools, modeling tools, visualization tools, ecology toolbox, knowledge generation tools, compute resources.]
California Water CyberInfrastructure
• BWC is in discussion with several groups to determine optimal project/place to develop and demonstrate Water TCI.
• Criteria:
  - Agency involvement and interest;
  - Problem characteristics (science and socioeconomic importance; reward/risk);
  - Leveraging opportunity (projects / datasets);
  - Transferability to other basins;
  - Visibility
  - Springboard for Digital CAL synthesis
• Ideal: Work with two different basins to explore what is similar and different in terms of water data IT and science challenges;
• Long Term: Scalability between water agency / basin datasets and supply/demand estimates and DWR State components. State Water Plan.
Example Water TCI focus: Central Valley Water Resources and Quality
Across the US, groundwater supplies roughly 40 percent of drinking water;
The State of California alone uses about 16 Million acre-feet of ground water each year, more than any other State in the Nation, and 80% of that goes toward crop irrigation;
The 400 Mile long Central Valley supplies ¼ of the food in the US.
California Groundwater quantity and quality is critical to the economic viability of the state;
Recognizing this importance, USGS has developed a $50 Million program focusing on CA water quality monitoring.
PROBLEM: Disparate datasets and tools hinder ability to assess water resources and quality in Central Valley (and most basins in world)….
[Map: Central Valley. Northern San Joaquin, 70 wells; Southern Sacramento, 86 wells; Southeast San Joaquin, ~100 wells. Source: Ken Belitz (USGS)]
USGS and State Water Resources Control Board
GAMA* and RASA** Projects
The importance of California groundwater quality and resources has prompted the USGS and SWRCB to develop a project to model flow pathways in the Central Valley (Central Valley RASA) and a $50M project to monitor ground water quality (GAMA);
As the GAMA project focuses on intensive data collection, no plans have been made to curate these data or to federate them with the other water datasets critical for understanding water balance and quality over time in the Central Valley.
* Ground Water Ambient Monitoring and Assessment Program; ** Regional Aquifer Systems Analysis (Ref: Ken Belitz, USGS)
List of Analytes
Volatile organic compounds
Pesticides
Stable Isotopes, D, O-18
Tritium-3He / Noble Gases
Specific Conductance
Stable isotopes, 3H/He, noble gases
Carbon Isotopes (C-13,C-14)
Radon, Radium, gross alpha/beta
Field parameters - temp, EC, DO, turbidity, pH, alk.
Major ions and trace elements
Arsenic & Iron speciation
Nutrients (nitrates, phosphates)
Dissolved Organic Carbon
Emerging Contaminants
E. Coli, total Coliform, Coliphage
Selected "Emerging Contaminants":
• Pharmaceuticals
• N-nitrosodimethylamine (NDMA)
• Perchlorate
• 1,4-dioxane
• Chromium (total and VI)
Example of GAMA Water Quality Data. Source: Ken Belitz (USGS)
California Water Portal
[Diagram: BWC Water Portal (Digital CAL). Distributed California water resource datasets pass through data harvesting and transformations into the BWC Data Gateway. The BWC Analysis Gateway provides data cleaning, models, and analysis tools, backed by computational resources. Outputs: knowledge discovery, hypothesis testing, water synthesis, and dissemination and archiving.]
FYI: Special Edition of IJMSO
• Editing special edition of International Journal of Metadata, Semantics and Ontologies
  - Open Forum on Metadata Registries
  - Topics related to metadata registries
• Inviting people to write articles
  - Contact Bruce Bargmeyer
In Response to Mike Frame’s Question
Describe the API for Terminology Web Services.
Initial XMDR REST-style Application Programming Interface (API)
• Search Methods (GET)
  - Text Search
  - SPARQL Search
  - XMDR Search (not documented yet)
• Registry Information Methods
  - Summary information (registered models)
  - Identified Items
• Method Parameters: can be included as part of any method, as part of the URL
  - Accept_type (what XML components to expect)
  - Stylesheet (how to display results)
REST API (Search Methods)
Resource URIs are relative to the application root.

Text Search
  URI: search/text?query={queryText}
  Method: GET
  Representation: application/xml (searchResult)
  Accept request: any (ignored)
  Description: Start a text search.

Text Search Results
  URI: search/text/{queryID}?offset={offset}&maxResults={maxResults}
  Method: GET
  Representation, with the Accept request that selects it:
    application/xml (textResultSet), for Accept: application/xml, application/*, or */*
    application/exhibit *, for Accept: application/exhibit
  Description: Retrieve the results of a text search.

SPARQL Search
  URI: search/sparql?query={queryText}&model={modelNameN}
  Method: GET
  Representation: application/xml (searchResult)
  Accept request: any (ignored)
  Description: Start a SPARQL search.

SPARQL Search Results
  URI: search/sparql/{queryID}?offset={offset}&maxResults={maxResults}
  Method: GET
  Representation, with the Accept request that selects it:
    application/xml (sparqlResultSet), for Accept: application/xml, application/*, or */*
    application/sparql-results+xml **, for Accept: application/sparql-results+xml
    application/sparql-results+json ***, for Accept: application/sparql-results+json, application/json
    application/exhibit *, for Accept: application/exhibit
  Description: Retrieve the results of a SPARQL search.
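The URI templates in the search methods table can be filled in mechanically. The following is a minimal sketch in Python, not the XMDR client itself. The application root `http://localhost:8080/xmdr/` and the helper names are assumptions for illustration (the slides give only relative URIs), and a single `model` value is assumed. The code only builds URLs and sends nothing.

```python
from urllib.parse import quote, urlencode, urljoin

# Hypothetical application root; the slides give only relative URIs.
APP_ROOT = "http://localhost:8080/xmdr/"


def text_search_url(query_text, root=APP_ROOT):
    """URL that starts a text search: search/text?query={queryText}."""
    return urljoin(root, "search/text") + "?" + urlencode({"query": query_text})


def sparql_search_url(query_text, model, root=APP_ROOT):
    """URL that starts a SPARQL search against one named model (assumed)."""
    params = urlencode({"query": query_text, "model": model})
    return urljoin(root, "search/sparql") + "?" + params


def results_url(kind, query_id, offset=0, max_results=10, root=APP_ROOT):
    """URL for one page of results; kind is "text" or "sparql"."""
    if kind not in ("text", "sparql"):
        raise ValueError("kind must be 'text' or 'sparql'")
    params = urlencode({"offset": offset, "maxResults": max_results})
    path = f"search/{kind}/{quote(query_id, safe='')}"
    return urljoin(root, path) + "?" + params


print(text_search_url("water quality"))
print(results_url("text", "jfs934js", offset=20, max_results=10))
```

Both searches are two-step: the first URL starts the search and returns a queryID, and the second URL pages through the results with `offset` and `maxResults`.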
REST API (Search Results)

searchResult (application/xml)
  <searchResult>
    <queryID>jfs934js</queryID>
  </searchResult>

textResultSet (application/xml)
  <resultSet>
    <itemSet>
      <item>
        <!-- element names will be names of fields in the Lucene document and element values will be their string values -->
      </item>
      …
      <item>
      </item>
    </itemSet>
    <locallyAvailable>0</locallyAvailable>
  </resultSet>

sparqlResultSet (application/xml)
  <resultSet>
    <itemSet>
      <item>
        <!-- SPARQL result set, in XML format; fill in from SPARQL protocol spec -->
      </item>
    </itemSet>
    <locallyAvailable>0</locallyAvailable>
  </resultSet>
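A client has to read the queryID out of a searchResult document and then unpack the items of a result set. The sketch below shows one way to do that with Python's standard library, assuming the document shapes shown on this slide. The function names and the Lucene field names `name` and `definition` in the sample are hypothetical; only the queryID value comes from the slide.

```python
import xml.etree.ElementTree as ET


def parse_query_id(search_result_xml):
    """Return the queryID from a searchResult document."""
    root = ET.fromstring(search_result_xml)
    query_id = root.findtext("queryID")
    if root.tag != "searchResult" or query_id is None:
        raise ValueError("not a searchResult document")
    return query_id.strip()


def parse_text_result_set(result_set_xml):
    """Return (items, locally_available) from a textResultSet document.

    Each item becomes a dict mapping child element names (Lucene field
    names, per the slide) to their string values.
    """
    root = ET.fromstring(result_set_xml)
    items = [
        {field.tag: (field.text or "") for field in item}
        for item in root.findall("itemSet/item")
    ]
    locally_available = int(root.findtext("locallyAvailable", default="0"))
    return items, locally_available


# Sample documents. The field names below are hypothetical.
SEARCH_RESULT = "<searchResult><queryID>jfs934js</queryID></searchResult>"
RESULT_SET = """<resultSet>
  <itemSet>
    <item><name>aquifer</name><definition>water-bearing rock</definition></item>
    <item><name>estuary</name></item>
  </itemSet>
  <locallyAvailable>0</locallyAvailable>
</resultSet>"""

print(parse_query_id(SEARCH_RESULT))
items, available = parse_text_result_set(RESULT_SET)
print(items, available)
```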
REST API: Registry (content) methods
Resource URIs are relative to the application root.

Registry content
  URI: content/
    GET *
      Representation: application/xml (contentList)
      Accept request: any (ignored)
      Description: Retrieve the names of the models (concept systems) registered in the registry.
    POST *
      Representation: XML/RDF
      Description: Create a new item in the registry.
  URI: content/{path} (where path does not correspond to an identifier for an item in the registry)
    GET *
      Representation: application/xml (contentList)
      Accept request: any (ignored)
      Description: Retrieves the immediate next portion of the path.

Identified Item
  URI: content/{ID}
    GET
      Representation: application/rdf+xml
      Accept request: any (ignored)
      Description: Retrieve an Identified Item from the registry.
    PUT *
      Representation: XML/RDF
      Description: Update an Identified Item in the registry.
    DELETE *
      Representation: -
      Description: Remove an Identified Item from the registry.

(* indicates that feature is not yet implemented)
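Of the registry methods listed, only GET on content/{ID} is marked as implemented. The following Python sketch builds such a request with the standard library without sending it. The application root and the item identifier are hypothetical, and the Accept header is set only for clarity, since the table says the server ignores it for this resource.

```python
from urllib.parse import quote, urljoin
from urllib.request import Request

# Hypothetical application root; the slides give only relative URIs.
APP_ROOT = "http://localhost:8080/xmdr/"


def identified_item_request(item_id, root=APP_ROOT):
    """Build (but do not send) a GET request for content/{ID}."""
    url = urljoin(root, "content/" + quote(item_id, safe=""))
    # The representation listed for an Identified Item is application/rdf+xml.
    return Request(url, headers={"Accept": "application/rdf+xml"}, method="GET")


req = identified_item_request("example-item-42")  # hypothetical identifier
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it; the response body would then be the RDF/XML description of the item.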
REST API (Registry Results)

contentList (application/xml)
  <contentList>
    <item>nameOfItem</item>
    …
    <item>nameOfItemN</item>
  </contentList>
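Reading a contentList comes down to collecting the text of its item elements. A minimal Python sketch, assuming the shape shown on this slide: the function name is hypothetical, and the sample names are borrowed from the content mentioned earlier in the slides, so they may not match the names the registry actually returns.

```python
import xml.etree.ElementTree as ET


def parse_content_list(content_list_xml):
    """Return the item names from a contentList document."""
    root = ET.fromstring(content_list_xml)
    if root.tag != "contentList":
        raise ValueError("not a contentList document")
    return [item.text.strip() for item in root.findall("item") if item.text]


# Illustrative sample only; real model names may differ.
SAMPLE = (
    "<contentList>"
    "<item>OMEGA</item><item>caDSR</item><item>OpenCyc</item>"
    "</contentList>"
)
names = parse_content_list(SAMPLE)
print(names)
```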
REST API (Method Parameters)

acceptType
  Description: Treated as the Accept header value in the HTTP request (limited support: only one type, with no modifiers).

stylesheet *
  Description: Apply the stylesheet at the provided URI or path to the results (for now, must be on the application server).
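Because these parameters travel in the URL, they can be appended to any method URL. The Python sketch below shows one way to do so, with a check that mirrors the stated limit of one media type with no modifiers. The function name, application root and stylesheet path are hypothetical, and `stylesheet` is marked above as not yet implemented.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_method_params(url, accept_type=None, stylesheet=None):
    """Append acceptType and/or stylesheet to an existing method URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if accept_type is not None:
        # The slide states limited support: one type, no modifiers.
        if "," in accept_type or ";" in accept_type:
            raise ValueError("only one media type with no modifiers is supported")
        query.append(("acceptType", accept_type))
    if stylesheet is not None:
        query.append(("stylesheet", stylesheet))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical application root and stylesheet path.
base = "http://localhost:8080/xmdr/search/text/jfs934js?offset=0&maxResults=10"
print(with_method_params(base, accept_type="application/xml"))
print(with_method_params(base, stylesheet="styles/results.xsl"))
```

A URL parameter of this kind can help when a client has no convenient way to set the HTTP Accept header itself.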
Acknowledgements
• Susan Hubbard, BWC
• John McCarthy, LBNL
• Karlo Berket, LBNL
This material is based upon work supported by the National Science Foundation under Grant No. 0637122, USEPA and USDOD. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, USEPA or USDOD.