Manage Scientific Metadata Using XML

Yang, R., M. Kafatos, and X. Wang, “Managing Scientific Metadata Using XML,” IEEE Internet Computing, vol. 6, no. 4, pp. 52-59, July-Aug. 2002.

Outline

Abstract
Introduction
Metadata
XML
DIMES
Conclusion

Abstract

As volumes of remote sensing, model, and other Earth science data grow explosively and the Internet becomes more popular, scientists face the challenge of publishing and finding relevant data sets effectively and efficiently.

Introduction

The Earth Observing System (EOS) satellite Terra alone adds more than half a terabyte of data each day.

Metadata have been recognized as a key technology to ease the search and retrieval of Earth science data.

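Metadata

Data that describes data (data about data)
Structured data that describes data (structured data about data)
A way of describing electronic resources that is used to define and identify them and to help users access them (from 國際圖書館協會, the International Library Association)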

EXAMPLE

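News item: The Terracotta Warriors from mainland China are on display at the National Museum of Natural Science in Taichung from March 22 of ROC year 90 (2001) until May 10. (Liberty Times, Reporter A)

Subject - Terracotta Warriors exhibition
Organizer - National Museum of Natural Science
Location 1a - Taichung City
Location 1b - National Museum of Natural Science
Time 1a - 90/03/22
Time 1b - 90/05/20
Source - Liberty Times
Author - Reporter A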

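[Figure: the same metadata drawn as a tree. Root: Terracotta Warriors exhibition. Branches: Organizer (National Museum of Natural Science); Location (Taichung City; National Museum of Natural Science); Time (exhibition start 90/03/22; exhibition end 90/05/20); Source (Liberty Times); Author (Reporter A).]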

Metadata

Early use: catalog cards in libraries.
Current use: data exchange, full-text retrieval, and similar applications.

Call number

BOOK/T58.6/H859

Metadata

Metadata come in very diverse formats because different data providers and data users usually define their own metadata schemas.

Example (from the Academia Sinica Metadata Working Group, 中研院後設資料小組)

Example

Metadata

How to handle the metadata, therefore, becomes a challenge to the designers and developers of distributed information systems.

XML-Based Distributed Metadata Server (DIMES)

In this paper, we discuss the Distributed MEtadata Server (DIMES) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve, and interoperate metadata in a distributed environment.

XML & Metadata

The Extensible Markup Language (XML) is ideal for describing ASCII-based data because both human users and computers can understand XML-encoded data.

Most Earth science metadata are in ASCII format, and can therefore easily be migrated to XML.

DIMES

Currently, most work on XML-based metadata focuses on defining XML structure (tags and relations) for specific scientific disciplines.

Our XML-based software solution, on the other hand, supports a wide variety of metadata.

DIMES

We have developed such software, based on the XML4J package, with document-type definitions (DTD).

DIMES

Metadata model
XML query engine
Web-based prototype interface

Metadata Model

A common weakness of many existing Earth science distributed information systems is the lack of metadata interoperability support.

A naive way to integrate metadata from heterogeneous sources is to represent the metadata from each source in XML format.

Metadata Model

There are two kinds of elements:

1. Node: an element with an ID attribute.
2. Nonnode: an element without an ID attribute.

A node is uniquely identified by the ID attribute’s value.

Metadata Model

A node, together with all its nonnode elements, forms a basic information block for describing objects (data or metadata), and is identified by the ID value.

We assume the metadata provided is an XML document in XML nugget form; that is, a separate XML document describes each data object.
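The node and nonnode distinction can be sketched in a few lines of Python. This is only an illustration, not the DIMES implementation (which the slides say is built on XML4J): the nugget, its tag names, and its ID values are made up, and the rule that an information block stops at any nested node is our reading of the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical nugget: tag names and ID values are illustrative only.
NUGGET = """
<DataSet ID="ds1">
  <Title>Sea surface temperature</Title>
  <Coverage>
    <Start>2000-01-01</Start>
  </Coverage>
  <Parameter ID="p1">
    <Name>SST</Name>
  </Parameter>
</DataSet>
"""

def is_node(elem):
    """A node is an element that carries an ID attribute."""
    return "ID" in elem.attrib

def information_block(node):
    """Return the tags of the nonnode elements that belong to `node`.

    Assumption: descent stops at a nested node, because that node
    starts an information block of its own.
    """
    tags = []
    for child in node:
        if is_node(child):
            continue
        tags.append(child.tag)
        tags.extend(information_block(child))
    return tags

root = ET.fromstring(NUGGET)
nodes = {e.get("ID"): e for e in root.iter() if is_node(e)}
print(sorted(nodes))
print(information_block(nodes["ds1"]))
print(information_block(nodes["p1"]))
```

Here the DataSet element and the Parameter element are nodes; Title, Coverage, Start, and Name are nonnodes that belong to the nearest enclosing node.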

XML nugget

[Figure: an XML nugget holding metadata. The root is a node (an element with an ID attribute), and nonnode elements sit beneath it.]

Using DTD for

Object identification
Type information
Node relationships

Why DTD

From an ease-of-use viewpoint, DTD is arguably the best of the six proposed schema languages: XML DTD, XML Schema, XDR, SOX, Schematron, and DSD.

D. Lee and W.W. Chu, “Comparative Analysis of Six XML Schema Languages,” SIGMOD Record, vol. 29, no. 3, 2000.

Metadata Model: Object identification

Each XML nugget has a unique ID value, and an ID attribute goes in the root of the XML nugget.
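A minimal sketch of this rule, assuming each nugget arrives as a separate XML string. The function name nugget_ids and the sample nuggets are hypothetical; the check simply reads the ID attribute on each root element and rejects missing or repeated values.

```python
import xml.etree.ElementTree as ET

def nugget_ids(nuggets):
    """Map each nugget's root ID value to its parsed root element.

    Raises ValueError if a root lacks an ID or two nuggets share one.
    """
    by_id = {}
    for text in nuggets:
        root = ET.fromstring(text)
        node_id = root.get("ID")
        if node_id is None:
            raise ValueError(f"<{root.tag}> root has no ID attribute")
        if node_id in by_id:
            raise ValueError(f"duplicate ID value: {node_id}")
        by_id[node_id] = root
    return by_id

# Hypothetical nuggets; tag names and ID values are illustrative only.
nuggets = [
    '<DataSet ID="ds1"><Title>SST</Title></DataSet>',
    '<DataSet ID="ds2"><Title>NDVI</Title></DataSet>',
]
ids = nugget_ids(nuggets)
print(sorted(ids))
```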

Metadata Model: Type information

Since many XML nuggets can describe similar objects, we introduce a new XML element for each object type: a type node, which is assigned an ID attribute. All XML nuggets that describe similar objects become subelements of the type node.

[Figure: a type node with several XML nuggets as its subelements; each nugget contains its own nonnode elements.]
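Grouping nuggets under a type node might look like the following sketch. The tag name Type, the helper group_under_type, and the sample nuggets are assumptions made for illustration; the slides do not say how DIMES names its type nodes.

```python
import xml.etree.ElementTree as ET

def group_under_type(type_id, nugget_texts):
    """Create a type node and attach each nugget as a subelement.

    The tag name "Type" is a placeholder chosen for this sketch.
    """
    type_node = ET.Element("Type", {"ID": type_id})
    for text in nugget_texts:
        type_node.append(ET.fromstring(text))
    return type_node

# Hypothetical nuggets that describe similar objects (data sets).
dataset_type = group_under_type("DataSet", [
    '<Entry ID="ds1"><Title>SST</Title></Entry>',
    '<Entry ID="ds2"><Title>NDVI</Title></Entry>',
])
instances = [child.get("ID") for child in dataset_type]
print(dataset_type.get("ID"), instances)
```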

Metadata Model: Node relationships

There are two ways to code node relationships in XML documents:

Subtrees
Pointers

Node relationships: Subtrees

When a node is a descendant of another node in the XML tree, the two nodes are related.

Subtrees: Type–instance relationship

The child–parent relationship between two nodes often reflects the type–instance relationship between concepts.

Node relationships: Pointers

When a node points to another node in the XML tree by an IDREFS attribute, the two nodes are related.

IDREFS attributes are used for:

node_type
type_instances
refer_to
inline_types

Node relationships

There can be multiple types for a single instance, however, so it is desirable for a node to have multiple parents.

[Figure: a single instance node with two type nodes as its parents.]

Type information

Unfortunately, the basic XML model does not support multiple parents for a single element.

Hence, we introduce the attributes node_type to record a node’s additional parents, and type_instances to record the reverse relationship.

Type information

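[Figure: four nodes labeled ID=1 through ID=4. The labels node_type=4 and type_instances=3 indicate the pointer pair between nodes 3 and 4: node 3 records node 4 as an additional parent, and node 4 records node 3 as an instance.]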
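The node_type and type_instances pair can be sketched as follows. The document below is hypothetical and only loosely follows the figure's numbering (the IDs are prefixed with "n" because XML ID values cannot start with a digit); the helper names are ours. A node's types are taken to be its nearest enclosing node plus whatever node_type lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag names and layout are assumptions.
DOC = """
<Catalog ID="n1">
  <Entry ID="n2"/>
  <Entry ID="n3" node_type="n4"/>
  <Group ID="n4" type_instances="n3"/>
</Catalog>
"""

def index(root):
    """Return (ID -> element, child element -> parent element)."""
    by_id = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return by_id, parent_of

def types_of(root, node_id):
    """Tree parent (nearest enclosing node) plus additional node_type parents."""
    by_id, parent_of = index(root)
    node = by_id[node_id]
    types = []
    ancestor = parent_of.get(node)
    while ancestor is not None and ancestor.get("ID") is None:
        ancestor = parent_of.get(ancestor)
    if ancestor is not None:
        types.append(ancestor.get("ID"))
    types.extend(node.get("node_type", "").split())
    return types

def links_are_mirrored(root):
    """Check that every node_type pointer has a matching type_instances entry."""
    by_id, _ = index(root)
    for node_id, node in by_id.items():
        for type_id in node.get("node_type", "").split():
            if node_id not in by_id[type_id].get("type_instances", "").split():
                return False
    return True

root = ET.fromstring(DOC)
print(types_of(root, "n3"))
print(links_are_mirrored(root))
```

Keeping both attributes lets a query walk the relationship in either direction without scanning the whole document, at the cost of having to keep the two sides consistent.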

IDREFS attribute: refer_to

For simplicity, we assume that the refer_to relationship is symmetric; that is, if node A refers to node B, B also refers back to A.
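One way to honor the symmetry assumption is to add both directions whenever a refer_to value is read. The sketch below does that on a hypothetical document; the tag names, ID values, and the function refer_to_graph are illustrative only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document; only ds1 and inst1 declare refer_to explicitly.
DOC = """
<Catalog ID="catalog">
  <DataSet ID="ds1" refer_to="ph1 inst1"/>
  <Phenomenon ID="ph1"/>
  <Instrument ID="inst1" refer_to="ph1"/>
</Catalog>
"""

def refer_to_graph(root):
    """Build an undirected adjacency map from refer_to attributes."""
    links = defaultdict(set)
    for elem in root.iter():
        src = elem.get("ID")
        if src is None:
            continue
        for dst in elem.get("refer_to", "").split():
            links[src].add(dst)
            links[dst].add(src)  # symmetry: the target refers back
    return links

graph = refer_to_graph(ET.fromstring(DOC))
print(sorted(graph["ph1"]))
```

Although ph1 carries no refer_to attribute of its own, it ends up linked to both ds1 and inst1.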

IDREFS attribute: inline_types

Intuitively, a node represents a piece of identifiable metadata. In practice, many nodes share information.

IDREFS attribute: inline_types

For example, many data sets have the same temporal coverage, so we represent temporal coverage as a node.

We can define the temporal-coverage node type as an inline node of dataset nodes by using the inline_types attribute.
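The slides do not spell out where inline_types is written or how it is resolved, so the following is a sketch under stated assumptions: the attribute sits on the DataSet type node and lists type IDs, data sets point at a shared temporal-coverage node with refer_to, and an inline node's text is treated as if it were part of the referring data set. All tag names, IDs, and the function searchable_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: two data sets share one temporal-coverage node.
DOC = """
<Catalog ID="catalog">
  <Type ID="TemporalCoverage">
    <Coverage ID="tc1"><Start>2000-01-01</Start><End>2000-12-31</End></Coverage>
  </Type>
  <Type ID="DataSet" inline_types="TemporalCoverage">
    <Entry ID="ds1" refer_to="tc1"><Title>SST</Title></Entry>
    <Entry ID="ds2" refer_to="tc1"><Title>NDVI</Title></Entry>
  </Type>
</Catalog>
"""

def searchable_text(root, node_id):
    """Text of a node plus the text of referenced nodes of an inline type."""
    by_id = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = by_id[node_id]
    # Assumption: the instance's parent is its type node and carries inline_types.
    inline = set(parent_of[node].get("inline_types", "").split())
    parts = [t.strip() for t in node.itertext() if t.strip()]
    for ref in node.get("refer_to", "").split():
        target = by_id[ref]
        if parent_of[target].get("ID") in inline:
            parts.extend(t.strip() for t in target.itertext() if t.strip())
    return parts

root = ET.fromstring(DOC)
print(searchable_text(root, "ds1"))
print(searchable_text(root, "ds2"))
```

The shared coverage is stored once but shows up in the searchable content of both data sets.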

Metadata Model

This model requires:

Well-formed XML.
Do not use ID as an attribute name for any elements.

DIMES Metadata Model Summary

Data providers could add new nodes, new node attributes, and new links to satisfy their metadata requirements.

Additionally, having a flexible system implies that we can preserve much of the original metadata structure.

XML query engine

Basic query
Nearest-neighbor search
Tree-expand query

Basic queries

The simplest query is finding a node by its ID. To answer basic queries, our XML-based search engine evaluates the query conditions on each node, including inline nodes.
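A basic query can be pictured as a scan over all nodes. The sketch below is not the DIMES query engine; the document, the function names find_by_id and select, and the Title condition are hypothetical, and inline nodes are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document.
DOC = """
<Catalog ID="catalog">
  <Entry ID="ds1"><Title>SST</Title></Entry>
  <Entry ID="ds2"><Title>NDVI</Title></Entry>
</Catalog>
"""

def find_by_id(root, node_id):
    """Return the node whose ID attribute equals node_id, or None."""
    for elem in root.iter():
        if elem.get("ID") == node_id:
            return elem
    return None

def select(root, condition):
    """Evaluate a condition on every node and return the matching IDs."""
    return [e.get("ID") for e in root.iter()
            if e.get("ID") is not None and condition(e)]

root = ET.fromstring(DOC)
print(find_by_id(root, "ds2").findtext("Title"))
print(select(root, lambda e: e.findtext("Title") == "SST"))
```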

Nearest-neighbor search

For a given node, its nearest-neighbor node from a given group is the one with the shortest distance.

Shortest distance between two nodes: the minimum number of relations (type–instance, parent–child, or refer_to) needed to connect the nodes.

EXAMPLE

<Query queryType="IDonly">
  <Source IDlist="Phenomenon1"/>
  <Target node_types="DataSet"/>
  <Constraints></Constraints>
</Query>

[Figure: Phenomenon1 linked to its nearest-neighbor nodes, Nearest-neighbor 1 through Nearest-neighbor n.]
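Given the distance definition above, a nearest-neighbor search can be sketched as a breadth-first search over a graph in which every relation is one edge. The catalog below is hypothetical, as are the function names; the sketch treats the children of the DataSet type node as the target group, which is our reading of node_types="DataSet" in the example query.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Hypothetical catalog; tag names and IDs are illustrative only.
DOC = """
<Catalog ID="catalog">
  <Type ID="Phenomenon">
    <Entry ID="Phenomenon1" refer_to="ds1"/>
  </Type>
  <Type ID="Instrument">
    <Entry ID="inst1" refer_to="ds2"/>
  </Type>
  <Type ID="DataSet">
    <Entry ID="ds1" refer_to="inst1"/>
    <Entry ID="ds2"/>
  </Type>
</Catalog>
"""

def build_graph(root):
    """Undirected graph over node IDs with one edge per relation."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    def walk(elem, enclosing_id):
        node_id = elem.get("ID")
        if node_id is not None:
            if enclosing_id is not None:
                link(enclosing_id, node_id)  # parent-child / type-instance
            for attr in ("refer_to", "node_type", "type_instances"):
                for other in elem.get(attr, "").split():
                    link(node_id, other)
            enclosing_id = node_id
        for child in elem:
            walk(child, enclosing_id)

    walk(root, None)
    return graph

def nearest_neighbors(graph, source, group):
    """Return (distance, sorted IDs) of the group members closest to source."""
    seen = {source}
    frontier = deque([(source, 0)])
    best, found = None, []
    while frontier:
        node, dist = frontier.popleft()
        if best is not None and dist > best:
            break  # everything left is farther than the best match
        if node in group and node != source:
            best = dist
            found.append(node)
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return best, sorted(found)

root = ET.fromstring(DOC)
graph = build_graph(root)
group = {e.get("ID") for e in root.find(".//Type[@ID='DataSet']")}
print(nearest_neighbors(graph, "Phenomenon1", group))
```

In this catalog ds1 is one relation away from Phenomenon1, while ds2 is three relations away, so only ds1 is returned.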

Tree-expand query

If we choose one node as a root and all its nearest neighbors as the first-level branches, and so on, we will get a tree presentation.

In practice, we use the tree-expand query to present the metadata such that users can navigate it easily and understand its results quickly.
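A tree-expand query can be approximated by a breadth-first expansion from the chosen root. In this sketch the relation graph is given directly as a hypothetical dictionary, directly related nodes stand in for nearest neighbors, and the depth limit max_depth is our own addition to keep the output small.

```python
from collections import deque

# Hypothetical relation graph: node ID -> IDs that are one relation away.
GRAPH = {
    "Phenomenon1": ["ds1", "ds2"],
    "ds1": ["Phenomenon1", "inst1"],
    "ds2": ["Phenomenon1", "inst1"],
    "inst1": ["ds1", "ds2"],
}

def tree_expand(graph, root_id, max_depth=2):
    """Expand level by level; each node appears once, under the first node that reaches it."""
    tree = {root_id: []}
    depth = {root_id: 0}
    queue = deque([root_id])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                tree[node].append(nxt)
                tree[nxt] = []
                queue.append(nxt)
    return tree

def render(tree, node, indent=0):
    """Return the tree as indented lines, one node per line."""
    lines = ["  " * indent + node]
    for child in tree[node]:
        lines.extend(render(tree, child, indent + 1))
    return lines

tree = tree_expand(GRAPH, "Phenomenon1")
print("\n".join(render(tree, "Phenomenon1")))
```

The indented listing is the kind of view a navigation interface could show, with each level one relation farther from the root.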

Prototype Web Browsers

A Web-based DIMES client usually includes a Web interface, an XML translator, and an XML-to-HTML mapper suite.

XML translator

When a Web user submits a query, the client passes the query to a specific XML translator, which automatically translates the query into one or more predefined types of queries in XML format, and then sends them to the XML query engine.
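The translation step can be sketched as a function that turns Web form fields into a Query document shaped like the nearest-neighbor example. The slides say the real translator uses Java servlets; this Python version is a stand-in, and the form field names source_ids and target_type, plus the space-separated IDlist, are assumptions.

```python
import xml.etree.ElementTree as ET

def translate(form):
    """Turn Web form fields into a nearest-neighbor Query document.

    `form` is a plain dict; its keys are hypothetical field names.
    """
    query = ET.Element("Query", {"queryType": "IDonly"})
    ET.SubElement(query, "Source", {"IDlist": " ".join(form["source_ids"])})
    ET.SubElement(query, "Target", {"node_types": form["target_type"]})
    ET.SubElement(query, "Constraints")
    return ET.tostring(query, encoding="unicode")

xml_query = translate({"source_ids": ["Phenomenon1"], "target_type": "DataSet"})
print(xml_query)
```

Building the query with an XML library rather than string concatenation keeps attribute values correctly escaped.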

XML-to-HTML mapper

An XML-to-HTML mapper converts the output from XML into an HTML page, and returns the result to the user.

We use Java servlets and XSL Transformations for the translator and mapper tools.

Prototype Web Browsers

We have developed two Web-based prototypes for exploring DIMES' capabilities.

Regular search: http://spring.scs.gmu.edu:8499/servlet/VASearchInterface

Metadata navigation: http://spring.scs.gmu.edu:8499/servlet/SiesipDataTree

DIMES Conclusion

Our work is closely related to mediators in federated databases, with the goal of accommodating various metadata sources into a unified framework.

Our long-term goal is to integrate software components with existing data servers to build Scientific Data and Information Super Servers (SDISS), defined here as servers that support interactive access to metadata, data, and domain knowledge.

THE END

THANK YOU
