Manage Scientific Metadata Using XML
Yang, R., M. Kafatos and X. Wang, "Managing Scientific Metadata Using XML," IEEE Internet Computing, vol. 6, no. 4, pp. 52-59, July-Aug. 2002.
Outline
Abstract
Introduction
Metadata
XML
DIMES
Conclusion
Abstract
With explosively increasing volumes of remote sensing, model and other Earth Science data available and the popularity of the Internet, scientists are now facing challenges to publish and to find interesting data sets effectively and efficiently.
Introduction
The Earth Observing System (EOS) satellite Terra alone adds more than half a terabyte of data each day.
Metadata have been recognized as a key technology to ease the search and retrieval of Earth science data.
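Metadata
Data that describes data (data about data)
Structured data that describes data (structured data about data)
A way of describing electronic resources that is used to define and identify them and to help users access them (from 國際圖書館協會 [International Library Association])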
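EXAMPLE
The Terracotta Warriors from mainland China have been on display at the National Museum of Natural Science in Taichung since March 22 of ROC year 90 (2001); the exhibition runs until May 10. (Liberty Times, Reporter A)
Subject - Terracotta Warriors exhibition
Organizer - National Museum of Natural Science
Location 1a - Taichung City
Location 1b - National Museum of Natural Science
Time 1a - 90/03/22
Time 1b - 90/05/20
Source - Liberty Times
Author - Reporter A
[Figure: the same record drawn as a tree rooted at "Terracotta Warriors exhibition"]
  Organizer: National Museum of Natural Science
  Location: Taichung City; National Museum of Natural Science
  Time: exhibition start 90/03/22; exhibition end 90/05/20
  Source: Liberty Times
  Author: Reporter A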
Metadata
Early use: index cards in library catalogs
Current use: data exchange, full-text search, and similar applications
Call number
BOOK/T58.6/H859
Metadata
Metadata are in very diverse formats since different data providers and data users usually define their own metadata schema.
Example (from the Academia Sinica metadata group, 中研院後設資料小組)
Example
Metadata
How to handle the metadata, therefore, becomes a challenge to the designers and developers of distributed information systems.
XML-Based Distributed Metadata Server (DIMES)
In this paper, we discuss the Distributed MEtadata Server (DIMES) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve and interoperate metadata in a distributed environment.
XML & Metadata
The Extensible Markup Language (XML) is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data.
Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML.
DIMES
Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines.
Our XML-based software solution, on the other hand, supports a wide variety of metadata.
DIMES
We have developed such software, based on the XML4J package, with document-type definitions (DTD).
DIMES
Metadata model
XML query engine
Web-based prototype interface
Metadata Model
A common weakness of many existing Earth science distributed information systems is the lack of metadata interoperability support.
A naive way to integrate metadata from heterogeneous sources is to represent metadata from different sources in XML format.
Metadata Model
There are two kinds of elements:
1. Node: an element with an ID attribute.
2. Nonnode: an element without an ID attribute.
A node is uniquely identified by the ID attribute’s value.
Metadata Model
A node, together with all its nonnode elements, forms a basic information block for describing objects (data or metadata), and is identified by the ID value.
We assume the metadata provided is an XML document, and that it is in XML nugget form: that is, a separate XML document describes each data object.
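The node/nonnode distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nugget below, its tag names (DataSet, Title, Format) and its ID value are invented, and the standard-library ElementTree parser stands in for the XML4J-based software the authors describe.

```python
import xml.etree.ElementTree as ET

# Hypothetical nugget: tag names and the ID value are invented for illustration.
NUGGET = """
<DataSet ID="ds1">
  <Title>Sea surface temperature</Title>
  <Format>HDF</Format>
</DataSet>
"""

def split_elements(root):
    """Return (nodes, nonnodes): elements with and without an ID attribute."""
    nodes, nonnodes = [], []
    for elem in root.iter():
        (nodes if "ID" in elem.attrib else nonnodes).append(elem)
    return nodes, nonnodes

root = ET.fromstring(NUGGET)
nodes, nonnodes = split_elements(root)
print([n.get("ID") for n in nodes])   # IDs of the node elements
print([e.tag for e in nonnodes])      # tags of the nonnode elements
```

Here the root element is the only node, and its two children are nonnodes that belong to its information block.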
XML nugget
[Figure: an XML nugget of metadata, drawn as a node (an element with an ID attribute) together with its nonnode elements.]
USING DTD FOR
Object identification
Type information
Node relationships
WHY DTD
From an ease-of-use viewpoint, DTD is arguably the best of the six proposed schema languages: XML DTD, XML Schema, XDR, SOX, Schematron, and DSD.
D. Lee and W.W. Chu, “Comparative Analysis of Six XML Schema Languages,” SIGMOD Record, vol. 29, no. 3, 2000.
Metadata Model: Object identification
Each XML nugget has a unique ID value, and the ID attribute is placed on the root element of the XML nugget.
Metadata Model: Type information
Since many XML nuggets can describe similar objects, we introduce a new XML element — a type node, which is assigned an ID attribute — for each object type, and make all XML nuggets that describe similar objects subelements of the type node.
[Figure: a type node whose subelements are XML nuggets, each with its own nonnode elements.]
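As a rough sketch of the type-node idea, the Python below wraps several nuggets in one new element that carries its own ID. The type name DataSet, the Granule tag, and the ID values are hypothetical, and the helper group_under_type_node is not part of DIMES.

```python
import xml.etree.ElementTree as ET

def group_under_type_node(type_id, nuggets):
    """Build a type node (hypothetical helper) and attach each nugget as a subelement."""
    type_node = ET.Element(type_id, {"ID": type_id})
    for nugget_xml in nuggets:
        type_node.append(ET.fromstring(nugget_xml))
    return type_node

# Two invented nuggets that describe similar objects.
nuggets = [
    '<Granule ID="ds1"><Title>SST 2001</Title></Granule>',
    '<Granule ID="ds2"><Title>SST 2002</Title></Granule>',
]
type_node = group_under_type_node("DataSet", nuggets)
print(ET.tostring(type_node, encoding="unicode"))
```

Each nugget keeps its own ID, so it stays individually addressable after it is placed under the type node.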
Metadata Model: Node relationships
There are two ways to code node relationships in XML documents: subtrees and pointers.
Node relationships: Subtrees
When a node is a descendant of another node in the XML tree, the two nodes are related.
Subtrees: Type–instance relationship
The child–parent relationship between two nodes often reflects the type–instance relationship between concepts.
Node relationships: Pointers
When a node points to another node in the XML tree by an IDREFS attribute, the two nodes are related.
IDREFS attributes are used for: node_type, type_instances, refer_to, and inline_types.
Node relationships
There can be multiple types for a single instance, however, so it is desirable for a node to have multiple parents.
[Figure: a single INSTANCE node with two TYPE nodes as its parents.]
Type information
Unfortunately, the basic XML model does not support multiple parents for a single element.
Hence, we introduce the attributes node_type to record a node’s additional parents, and type_instances to record the reverse relationship.
Type information
[Figure: four nodes (ID=1, ID=2, ID=3, ID=4) illustrating the attributes node_type=4 and type_instances=3.]
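The pairing of node_type with type_instances can be illustrated with a short Python sketch. The document below is invented, and the assumption that both attributes hold whitespace-separated ID lists follows from their being IDREFS attributes; how DIMES itself maintains the reverse links is not described here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document: node i1 sits under type T1 and names T2 as an extra parent.
DOC = """
<Catalog>
  <DataSet ID="T1">
    <Granule ID="i1" node_type="T2"/>
  </DataSet>
  <Phenomenon ID="T2"/>
</Catalog>
"""

def reverse_node_types(root):
    """Map each type ID to the IDs of the nodes that list it in node_type."""
    instances = defaultdict(list)
    for elem in root.iter():
        if "ID" in elem.attrib:
            for type_id in elem.get("node_type", "").split():
                instances[type_id].append(elem.get("ID"))
    return dict(instances)

root = ET.fromstring(DOC)
for type_id, ids in reverse_node_types(root).items():
    # Record the reverse relationship on the type node itself.
    root.find(f".//*[@ID='{type_id}']").set("type_instances", " ".join(ids))

print(ET.tostring(root, encoding="unicode"))
```

After the loop, the Phenomenon node carries type_instances="i1", so the extra parent link can be followed in both directions.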
IDREFS attribute: refer_to
For simplicity, we assume that the refer_to relationship is symmetric: if node A refers to node B, then B also refers back to A.
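One way to enforce the symmetry assumption is to add the missing back-links before querying. The sketch below keeps refer_to links in a plain dictionary of sets; that representation and the function name symmetrize are assumptions made for illustration only.

```python
def symmetrize(refer_to):
    """Return a copy of refer_to in which every link also has its reverse."""
    result = {node: set(targets) for node, targets in refer_to.items()}
    for node, targets in refer_to.items():
        for target in targets:
            result.setdefault(target, set()).add(node)
    return result

# A refers to B and C refers to A; the back-links are missing at first.
links = symmetrize({"A": {"B"}, "B": set(), "C": {"A"}})
print(links)
```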
IDREFS attribute: inline_types
Intuitively, a node represents a piece of identifiable metadata. In practice, many nodes share information.
IDREFS attribute: inline_types
For example, many data sets have the same temporal coverage, so we represent temporal coverage as a node.
We can define the temporal-coverage node type as an inline node of dataset nodes by using the inline_types attribute.
Metadata Model
This model requires:
Well-formed XML.
Do not use ID as an attribute name for any elements.
DIMES Metadata Model Summary
Data providers could add new nodes, new node attributes, and new links to satisfy their metadata requirements.
Additionally, having a flexible system implies that we can preserve much of the original metadata structure.
XML query engine
XML query engine
Basic query
Nearest-neighbor search
Tree-expand query
Basic queries
The simplest query is finding a node by its ID. To answer these queries, our XML-based search engine evaluates these conditions on each node, including inline nodes.
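A basic query of this kind can be sketched as a scan over all nodes. The metadata, tag names, and the two helper functions below are hypothetical; the sketch ignores inline nodes and says nothing about how the DIMES engine is actually implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata; tag names and ID values are invented for illustration.
DOC = """
<DataSet ID="DataSet">
  <Granule ID="ds1"><Title>SST 2001</Title><Format>HDF</Format></Granule>
  <Granule ID="ds2"><Title>SST 2002</Title><Format>NetCDF</Format></Granule>
</DataSet>
"""

def find_by_id(root, node_id):
    """Return the node whose ID attribute equals node_id, or None."""
    for elem in root.iter():
        if elem.get("ID") == node_id:
            return elem
    return None

def select_nodes(root, condition):
    """Return every node (element with an ID) for which condition(node) is true."""
    return [e for e in root.iter() if "ID" in e.attrib and condition(e)]

root = ET.fromstring(DOC)
hdf = select_nodes(root, lambda n: n.findtext("Format") == "HDF")
print(find_by_id(root, "ds2").findtext("Title"))
print([n.get("ID") for n in hdf])
```

The condition is an ordinary function of a node, so the same scan serves an ID lookup or a test on any nonnode child.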
Nearest-neighbor search
For a given node, its nearest-neighbor node from a given group is the one with the shortest distance.
Shortest distance between two nodes: the minimum number of relations (type–instance, parent–child, or refer_to) needed to connect the nodes.
EXAMPLE
<Query queryType="IDonly">
  <Source IDlist="Phenomenon1"/>
  <Target node_types="DataSet"/>
  <Constraints></Constraints>
</Query>
[Figure: Phenomenon1 linked to its nearest neighbors, Nearest-neighbor 1 through Nearest-neighbor n.]
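Because distance is counted in relations, a breadth-first search over the relation graph finds the nearest neighbors. The sketch below is one possible reading, not the authors' algorithm: it treats every relation type as an undirected edge of weight one, and the graph and node names other than Phenomenon1 are invented.

```python
from collections import deque

def nearest_neighbors(graph, source, targets):
    """Breadth-first search; return (distance, list of nearest target nodes)."""
    seen = {source}
    frontier = deque([(source, 0)])
    best, found = None, []
    while frontier:
        node, dist = frontier.popleft()
        if best is not None and dist > best:
            break  # everything left in the queue is farther away
        if node in targets and node != source:
            best = dist
            found.append(node)
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return best, found

# Hypothetical relation graph: each edge stands for one relation of any kind.
graph = {
    "Phenomenon1": ["Event1", "ds1"],
    "Event1": ["Phenomenon1", "ds2"],
    "ds1": ["Phenomenon1"],
    "ds2": ["Event1"],
}
print(nearest_neighbors(graph, "Phenomenon1", {"ds1", "ds2"}))
```

In this graph ds1 is one relation away from Phenomenon1 and ds2 is two away, so only ds1 is returned.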
Tree-expand query
If we choose one node as a root and all its nearest neighbors as the first-level branches, and so on, we will get a tree presentation.
In practice, we use the tree-expand query to present the metadata such that users can navigate it easily and understand its results quickly.
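The tree presentation can be approximated by expanding the relation graph level by level from the chosen root. This sketch simplifies the idea: it attaches each node to the first node that reaches it and stops at a depth limit, both of which are assumptions, as are the graph and the function name tree_expand.

```python
from collections import deque

def tree_expand(graph, root, max_depth):
    """Return {node: [children]} for a breadth-first tree rooted at root."""
    tree = {root: []}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand below the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                tree[node].append(neighbor)
                tree[neighbor] = []
                queue.append(neighbor)
    return tree

# Hypothetical relation graph, one edge per relation.
graph = {
    "Phenomenon1": ["Event1", "ds1"],
    "Event1": ["Phenomenon1", "ds2"],
    "ds1": ["Phenomenon1"],
    "ds2": ["Event1"],
}
print(tree_expand(graph, "Phenomenon1", 2))
```

A client could render the returned mapping as a collapsible tree, expanding one more level each time the user opens a branch.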
Prototype Web Browsers
A Web-based DIMES client usually includes a Web interface, an XML translator, and an XML-to-HTML mapper suite.
XML translator
When a Web user submits a query, the client passes the query to a specific XML translator, which automatically translates the query into one or more predefined types of queries in XML format, and then sends them to the XML query engine.
XML-to-HTML mapper
An XML-to-HTML mapper converts the output from XML into an HTML page, and returns the result to the user.
We use Java servlets and XSL Transformations for the translator and mapper tools.
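The mapping step can be imitated in Python to show the idea. Note the substitution: the prototype uses XSL Transformations, whereas this sketch walks the tree by hand with the standard library, and the nested-list HTML layout, the function name to_html, and the sample result are all invented.

```python
import xml.etree.ElementTree as ET
from html import escape

def to_html(elem):
    """Render an element and its descendants as nested HTML list items."""
    label = escape(elem.tag)
    if elem.get("ID"):
        label += f" ({escape(elem.get('ID'))})"
    text = (elem.text or "").strip()
    if text:
        label += f": {escape(text)}"
    children = "".join(to_html(child) for child in elem)
    nested = f"<ul>{children}</ul>" if children else ""
    return f"<li>{label}{nested}</li>"

# Hypothetical query result.
result = ET.fromstring('<DataSet ID="ds1"><Title>SST &amp; wind</Title></DataSet>')
page = f"<html><body><ul>{to_html(result)}</ul></body></html>"
print(page)
```

Escaping tag names, IDs, and text keeps characters such as & and < from breaking the generated page.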
Prototype Web Browsers
We have developed two Web-based prototypes for exploring DIMES capabilities:
Regular search: http://spring.scs.gmu.edu:8499/servlet/VASearchInterface
Metadata navigation: http://spring.scs.gmu.edu:8499/servlet/SiesipDataTree
DIMES Conclusion
Our work is closely related to mediators in federated databases, with the goal of accommodating various metadata sources into a unified framework.
Our long-term goal is to integrate software components with existing data servers to build Scientific Data and Information Super Servers (SDISS), defined here as servers that support interactive access to metadata, data, and domain knowledge.
THE END
THANK YOU