September, 1999 Grace Agnew
Metadata Overview
Metadata:
Data that describes data
Structured data about data
“Pure” metadata has meaning only in relation to the primary data that is being described.
Metadata may be either:
Extrinsic: Existing independently of the primary data being described, usually in an indexable metadatabase
or
Intrinsic: Existing as a part of the primary data being described
Design Criteria for a Metadata System:
Durable - Independent of changes to hardware, software and network infrastructure
Interoperable - Can be seamlessly shared across the web with disparate hardware, software, network infrastructure and search engines
Precise - Enables the creation of customized “virtual collections”, pulling objects together seamlessly from any digital space to meet exact information requirements.
Flexible - Supports any search engine, search strategy, transport or display option.
Efficient - Provides immediate access to the most appropriate asset for the searcher.
Controlled - Ensures digital assets pass from a trusted source to an authorized end user.
Granular - Able to search the top page, subsequent pages, or drill down to an underlying database of objects.
“Break through the web skin”
[Diagram labels: Query; Metadatabase; Search Engine; Underlying Object Database]
Key Concepts:
Semantics: Meaning ascribed by a community to a metadata element or to the values for that element. Organized into a “vocabulary.”
Structure: Imposes order for the unambiguous expression of the semantics: consistent coding, exchange and display of metadata elements, providing consistent interpretation by the end user.
Syntax: Provides a means to represent one or more structures in a flexible, extensible manner. Provides the underlying mechanism for encoding, exchange, display and machine processing of metadata. Example: XML
Schema: Identifies, defines, organizes and constrains the elements in a set, their characteristics and descriptions. Involves both semantics and structure. Examples: Dublin Core, RDF
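The split between semantics, structure and syntax can be made concrete with a small sketch. Everything here is illustrative: the two-element vocabulary, its definitions, and the function names are assumptions for the example, not part of any standard.

```python
from html import escape
from xml.sax.saxutils import escape as xml_escape

# Semantics: a tiny vocabulary (element name -> agreed meaning).
# The definitions are paraphrased for illustration only.
VOCABULARY = {
    "Title": "Name given to the resource",
    "Creator": "Entity primarily responsible for making the resource",
}

# Structure: a record may only carry vocabulary elements, each with a list of values.
def make_record(**elements):
    unknown = set(elements) - set(VOCABULARY)
    if unknown:
        raise ValueError(f"elements not in vocabulary: {sorted(unknown)}")
    return {name: list(values) for name, values in elements.items()}

# Syntax: two different encodings of the same structure.
def to_html_meta(record):
    return [f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
            for name, values in record.items() for value in values]

def to_xml(record):
    return [f"<{name}>{xml_escape(value)}</{name}>"
            for name, values in record.items() for value in values]

record = make_record(Title=["Metadata Overview"], Creator=["Grace Agnew"])
print("\n".join(to_html_meta(record)))
print("\n".join(to_xml(record)))
```

The same record is emitted twice without touching its meaning or its structure, which is the point of keeping syntax separate.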
Types of Metadata:
Structural
Describes the physical and logical attributes of the object, related to creation, transport, storage and display;
Describes the hardware and software used to create the object; (Some place this in Administrative metadata)
Describes the hardware, software and bandwidth needed to transport and display the object.
May be machine-readable, human-readable or both. May be part of the digital object header (e.g. TWAIN)
Provenance/Ingest Metadata:
“Admission ticket” to the Archive or Data Repository. Acknowledges the rules of entry and identifies the object for positioning within the Archive. Best if intrinsic in the object, e.g. in the Header.
Identifies the owner/creator of the metadata.
Identifies the owner/creator of the digital asset.
Provides date created, permanence of asset; updates and modifications to asset. May “push” asset to users when content changes.
Rights & Access:
Provides requirements for access, display and download/storage of asset.
Should integrate with the organization’s access and authorization system, e.g. reference/hyperlink to a digital certificate authority
Indicates user restrictions (may reference an attribute on the certificate authority’s user attribute server)
Supports multilayered access:
download only vs. store; free vs. fee; asset versions (high res. vs. low res.)
Descriptive
Should uniquely identify an asset through:
Physical description (overlap with structural metadata)
Publication/Creation information (overlap with ingest metadata)
Should describe the information content in subject and free-text fields to identify and select the asset in response to a query from a search engine.
Linking Metadata
Persistent Links:
Metadata record and the described asset.
All physical instantiations of the asset.
Registries for metadata schemas used to provide a “meta-schema” to describe the object.
Security system for access and authorization and/or link to intermediary access page
Considerable overlap with other metadata types
Mining Web Assets: Current Practice
A query is sent to a proprietary search engine, or a metasearch engine which queries many engines.
Benefits:
Ubiquitous and free; competition results in better precision and coverage
Drawbacks:
Access for assets only, not long-term management; “Ephemeral” metadata; Asset creator has no control over description and access.
Standards are Developed to:
Create durable, persistent metadata records that precisely define the asset so that exactly-relevant assets are identified and retrieved in response to a query.
Create metadata that is flexible, extensible, and scalable to support the needs of any organization, any type of asset, and varying skill and interest levels of metadata creators.
Allow the metadata records from many schemas with differing levels of complexity to interoperate for data discovery.
Enable machine-intervention for automatic interpretation of metadata and data discovery, particularly among disparate search and retrieval platforms
ISO 11179
Joint Standard of the ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission) to provide a robust framework for defining data elements in an unambiguous and persistent manner within user committees.
Also provides a framework for creating and maintaining metadata registries to store and maintain data element definitions.
NCITS L8 Draft Standards available at the following websites:
http://www.jtc1.org/
http://pueblo.lbl.gov/~olken/X3L8/drafts/draft.docs.html
Relevant Metadata Standards:
Dublin Core Element Set V. 1.1 (IETF Recommendation)
- Flexible “lowest common denominator” standard with 15 optional, repeatable fields;
- XML and HTML based: integrates completely with assets that live on the web, or that are accessed via the web and live in an attached database. May be intrinsic to or separate from the asset described;
- Automated tools for generating/validating Dublin Core are freely available, e.g. DC.dot: http://www.ukoln.ac.uk/metadata/dcdot/
Content: Title, Subject, Description, Source, Language, Relation, Coverage
Intellectual Property: Creator, Publisher, Contributor, Rights
Instantiation: Date, Type, Format, Identifier
From “Description of Dublin Core Elements”: http://purl.oclc.org/metadata/dublin_core_elements
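The grouping of the 15 elements lends itself to a simple lookup table. The sketch below is illustrative only: `check_record` and the sample record are hypothetical, and treating a dotted name such as `Title.Alternative` as a qualifier of its base element is an assumption borrowed from the HTML example later in this deck.

```python
# The 15 Dublin Core elements, grouped as on the slide.
DC_GROUPS = {
    "Content": ["Title", "Subject", "Description", "Source",
                "Language", "Relation", "Coverage"],
    "Intellectual Property": ["Creator", "Publisher", "Contributor", "Rights"],
    "Instantiation": ["Date", "Type", "Format", "Identifier"],
}
DC_ELEMENTS = {name for names in DC_GROUPS.values() for name in names}

def check_record(record):
    """Return the field names in `record` that are not usable Dublin Core fields.

    `record` maps element names to lists of values, because every element is
    optional and repeatable. A dotted name is accepted when its base element
    (the part before the first dot) is a Dublin Core element.
    """
    problems = []
    for field, values in record.items():
        base = field.split(".")[0]
        if base not in DC_ELEMENTS or not isinstance(values, list):
            problems.append(field)
    return problems

record = {
    "Title": ["A Thousand Wheels are Set in Motion"],
    "Contributor": ["Chritton, Heather", "Crafts, Laurel"],  # repeated element
    "Author": ["Agnew, Grace"],                              # not a Dublin Core element
}
print(check_record(record))
```

A check like this only confirms that element names are legal; it says nothing about whether the values are used consistently, which is the weakness the following slides discuss.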
Dublin Core Drawbacks:
Too flexible and simple for complex, sophisticated collections;
Elements lack standardized use and precision. Different communities are developing extensions to specify and categorize the elements. Approved extensions are available but slow to appear.
Some elements (rights, coverage) are ambiguous in their application
Dublin Core Drawbacks:
Intended for web objects that are textual or primarily textual. Does not provide for:
Media asset components (video sequences, scenes, shots, frames, objects);
sequential media (audio and video, slide shows);
synchronized media (video, audio, caption file or transcription; slide shows).
Result: Every Community Creates Their Own Metadata
Archives: EAD (Encoded Archival Description)
Government: GILS (Government or Global Information Locator Service)
IMS: Instructional Management Systems
TEI: Text Encoding Initiative - books and humanities; TEIH (TEI Header) used for metadata description
Dublin Core “Flavors”:
EdNA: http://www.edna.edu.au/edna/owa/info.getpage?sp=auto&pagecode=5210
CIMI Guide to Best Practice: Dublin Core. Available as PDF from http://www.cimi.org/
MARC: Machine-Readable Cataloging - most library catalogs worldwide.
MPEG-7: Digital audio, video and still image files. (In development. Committee draft due October 2000)
MPEG-7:
Intended to describe audiovisual information regardless of storage, coding, display, medium or technology. Will include analog and digital media and combinations of media formats
Will Standardize:
* Core set of Descriptors (D)
* Description Schemes (codified structures of Descriptors-- definition, constraints, relationships among Descriptors) (DS)
* Language defining Description Schemes and Descriptors
MPEG-7 Structural Model
[Figure from: Jane Hunter. “MPEG-7: Behind the Scenes.” D-Lib Magazine, September 1999 (v. 5, no. 9): 6]
Possible MPEG-7 schema incorporating DC:
<DC:Type>Image.Moving.TV.News.sequence.scene</DC:Type>
<DC:Description.text>"Footage of Grenade Attack"</DC:Description.text>
<DC:Description.transcript>"Sam Rainsy knows the violence of political life in Cambodia. Four months ago, 16 of his supporters were killed in a grenade attack in Phnom Penh."</DC:Description.transcript>
<DC:Format.Length>10seconds</DC:Format.Length>
<DC:Coverage.t.min DC.Scheme="SMPTE">19:31:57;1</DC:Coverage.t.min>
<DC:Coverage.t.max DC.Scheme="SMPTE">19:32:07;1</DC:Coverage.t.max>
From: Jane Hunter and Renato Iannella. “The Application of Metadata Standards to Video Indexing.” In Research and Advanced Technology for Digital Libraries: Second European Conference, ECDL '98, Heraklion, Crete, Greece, September 21-23, 1998: Proceedings. Berlin: Springer, 1998 (Lecture Notes in Computer Science 1513): 135-156.
Beyond the Metadata Schema:
Access to Information:
Information stored and managed within your organization (possibly under different metadata schema)
Information stored and managed by outside organizations
Metadatabase - Dublin Core
Record 1
DC.Creator Grace Agnew
Record 70
DC.Contributor Grace Agnew
Books and web sites written by Grace Agnew
Author: Agnew, Grace
Parameter mapping: DC.Creator, DC.Contributor
Result Set:
AGNEW, GRACE … 1999 …
AGNEW, GRACE … 1994 …
Books and web sites written by Grace Agnew
SEARCH ENGINE 1
Author: Agnew, Grace
Parameter mapping: DC.Creator, DC.Contributor
SEARCH ENGINE 2
Author: Agnew, Grace
Parameter mapping: 100, 700
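The parameter mapping on these two slides amounts to a lookup from a generic access point ("Author") to each engine's local fields. Here is a minimal sketch of that idea, assuming the 100/700 mapping refers to the MARC personal-name main and added entry tags. The engine names, record contents and function names are invented for illustration.

```python
# Hypothetical per-engine mappings from a generic access point to local fields.
PARAMETER_MAPPING = {
    "search_engine_1": {"Author": ["DC.Creator", "DC.Contributor"]},  # Dublin Core
    "search_engine_2": {"Author": ["100", "700"]},                    # MARC tags
}

# Toy metadatabases; the records are invented for this example.
DATABASES = {
    "search_engine_1": [
        {"id": 1, "DC.Creator": "Agnew, Grace"},
        {"id": 70, "DC.Contributor": "Agnew, Grace"},
        {"id": 71, "DC.Creator": "Hunter, Jane"},
    ],
    "search_engine_2": [
        {"id": "m1", "100": "Agnew, Grace"},
        {"id": "m2", "700": "Iannella, Renato"},
    ],
}

def search(engine, access_point, value):
    """Translate the access point to the engine's own fields, then match."""
    fields = PARAMETER_MAPPING[engine][access_point]
    return [rec["id"] for rec in DATABASES[engine]
            if any(rec.get(field) == value for field in fields)]

def federated_search(access_point, value):
    """Send the same logical query to every engine."""
    return {engine: search(engine, access_point, value) for engine in DATABASES}

print(federated_search("Author", "Agnew, Grace"))
```

The query itself never mentions DC.Creator or tag 100. Only the mapping table knows about them, and that table is what has to be agreed between the parties.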
Z39.50
Information Retrieval (Z39.50): Application Service Definition and Protocol Specification
Enables a client to interact with multiple servers, employing different search engines and different data element formats and definitions, to search databases and retrieve the records that result from the search
Z39.50
Initiates a session between client and server
Executes a query from the client against one or more databases on the server
Creates a result set consisting of records that match the query on one or more query attributes (access points)
Z39.50
Returns a report on the number of records matching the search
Returns records--individual records selected by the client--in a format selected by the client
Primary formats returned: MARC, SUTRS, extending to SQL, Dublin Core, other schema
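The Z39.50 steps listed above (open a session, search, get a hit count, then retrieve selected records in a chosen format) can be modeled in a few lines. This is a toy in-memory stand-in, not the wire protocol and not any real Z39.50 library. The class, its method signatures and the `"dict"` format are assumptions made for the example.

```python
class ToyServer:
    """In-memory stand-in for a Z39.50 server; it only mimics the sequence of steps."""

    def __init__(self, databases):
        self.databases = databases   # database name -> list of records
        self.result_sets = {}        # result set name -> matching records
        self.session_open = False

    def init(self):
        # Init: open a session between client and server.
        self.session_open = True
        return True

    def search(self, database_names, attribute, term, result_set="default"):
        # Search: query one or more databases, keep the result set on the
        # server, and report only the number of matching records.
        if not self.session_open:
            raise RuntimeError("no session; call init() first")
        hits = [rec for name in database_names
                for rec in self.databases[name]
                if term.lower() in rec.get(attribute, "").lower()]
        self.result_sets[result_set] = hits
        return len(hits)

    def present(self, start, count, record_format, result_set="default"):
        # Present: return the records the client selected (1-based position)
        # in the format the client asked for.
        selected = self.result_sets[result_set][start - 1:start - 1 + count]
        if record_format == "SUTRS":   # simple unstructured text
            return ["; ".join(f"{k}: {v}" for k, v in rec.items()) for rec in selected]
        if record_format == "dict":    # hypothetical structured format
            return selected
        raise ValueError(f"unsupported record format: {record_format}")

server = ToyServer({"books": [
    {"author": "Agnew, Grace", "title": "Metadata Overview"},
    {"author": "Hunter, Jane", "title": "MPEG-7: Behind the Scenes"},
]})
server.init()
hit_count = server.search(["books"], "author", "agnew")
records = server.present(1, 1, "SUTRS")
print(hit_count, records)
```

Separating the hit count from record retrieval is the design point: the client can inspect the size of a result set before deciding which records, and which format, to pull across the network.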
Z39.50 Version 3
Extends the capabilities of the standard to include:
• Boolean and proximity searching
• Extended services, including saved queries to be periodically re-executed (“SDI”)
• “Explain” facility to allow client to solicit information about the server and dynamically reconfigure itself.
Z39.50
Profiles for User Groups:
LOC: Access to Digital Collections
LOC: Access to Digital Library Objects
CIMI: Companion profile for museum digital collections and objects
GEO: Geospatial Datasets
Z+SQL: extension to the SQL query language
Z39.50 - Limitations
Requires client software and Z39.50-enabled server software (which requires Z39.50 aware search engine)
Most commercial client/server (C/S) products have not implemented the “Explain” facility in version 3
Requires human collaboration for implementation, particularly at the profile level
Limited primarily to features provided by commercial servers and clients
Z39.50 Limitations
Indexing parameters proprietary to server database are not shared with client to allow client to override or extend the proprietary search parameters
Databases that are not on a Z39.50 server are invisible
Metadata Registries:
Dynamic specification, maintenance and description of metadatabase structures:
unambiguous definition of data structures
unambiguous definition and description of relationships between data structures, behaviors of data structures, and integrity constraints on the contents of data structures
semantics (meaning in context) and structure definition
Metadata Registries
Links/Hooks into subordinate registries used to define data content within a metadata element
Mapping of data structures between registries
Should be both eye-readable and able to be interpreted by computer programs for seamless, unambiguous discovery, query and display across disparate database and search engine structures and to enable intelligent query agents, advanced data mining, etc.
Metadata Registries
Collaborative Effort of the Joint Technical Committee 1 (JTC1) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC)
Open Forum on Metadata Registries:
http://www.sdct.itl.nist.gov/~ftp/l8/sc32wg2/2000/events/openforum/index.htm
Metadata Registries
REGGIE - Java Applet that dynamically creates metadata according to available online registries;
Allows you to enter your own registry, describing, characterizing and constraining all the elements in the set.
http://metadata.net
UK/Australia joint effort
Anything by Grace Agnew?
Metadatabase: Scheme = DC <URL of Registry>
Dublin Core: Author defined as Creator, Contributor
Resource Description Framework
W3C Resource Description Framework (RDF) Model and Syntax Specification (22 February 1999): http://www.w3.org/TR/REC-rdf-syntax/
Provides robust application of metadata in the web environment:
Model for unambiguous, schema-independent description of resources.
Key Concepts:
Resource: Any object uniquely identifiable by a URI (uniform resource identifier)
Property-type: Property associated with a resource.
Value: Associated with a property type. May be atomic (a string) or another resource (creating a new hierarchy)
RDF
Property types express the relationships of values associated with resources:
“Famous Example”
The Author of “Metadata Overview” is Grace Agnew
Resource: Metadata Overview (http://www…….edu/mo)
Property Type: Author
Value: “Grace Agnew”
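The resource / property type / value model can be sketched as plain three-part statements. This is an illustration of the data model only, not an RDF library. The `example.org` URI is a placeholder for the address the slide elides, and the `person:agnew` identifier and helper names are hypothetical.

```python
from collections import namedtuple

Statement = namedtuple("Statement", ["resource", "property_type", "value"])

# Placeholder URI: the slide elides the real address.
OVERVIEW = "http://example.org/mo"

statements = [
    Statement(OVERVIEW, "Author", "person:agnew"),       # value is another resource
    Statement(OVERVIEW, "Title", "Metadata Overview"),   # atomic (string) value
    Statement("person:agnew", "Name", "Grace Agnew"),    # describes the author resource
]

def values_of(resource, property_type):
    """All values recorded for one property type of one resource."""
    return [s.value for s in statements
            if s.resource == resource and s.property_type == property_type]

def author_names(resource):
    """Follow Author -> Name; fall back to the raw value if it has no Name."""
    names = []
    for author in values_of(resource, "Author"):
        names.extend(values_of(author, "Name") or [author])
    return names

print(author_names(OVERVIEW))
```

Making the author a resource in its own right, rather than a bare string, is what creates the hierarchy mentioned under Key Concepts: further statements can then be attached to the author.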
RDF
Enables interoperability among metadata schemes, including the modular use of multiple schemes within a metadata record utilizing the XML namespace facility;
Adds machine-interpretable semantics to the encoding, exchange and reuse of structured metadata;
Enables automatic negotiation between search engine, metadata record, and metadata registry for powerful, flexible search and retrieval independent of server and client search and retrieval infrastructures (or, at least, it will!)
Application of Dublin Core and RDF for resource description:
Dublin Core in HTML - Resides in the Header Element
<html>
<head>
<title>A Thousand Wheels are set in Motion - Georgia Tech Library and Information Center</title>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="A Thousand Wheels are Set in Motion">
<meta name="DC.Title.Alternative" content="The Building of Georgia Tech at the Turn of the 20th Century, 1888-1908">
<meta name="DC.Creator.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
<meta name="DC.Subject" scheme="LCSH" content="Georgia Institute of Technology--Buildings">
<meta name="DC.Description" content="This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from 1888-1908. As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata format.">
<meta name="DC.Publisher.CorporateName" scheme="LCNAF" content="Georgia Institute of Technology Library and Information Center">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Chritton, Heather">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Crafts, Laurel">
Full Metadata record: http://www.library.gatech.edu/gtbuildings
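Because the Dublin Core statements sit in ordinary `<meta>` tags, a harvester can pull them out with a standard HTML parser. The sketch below uses Python's built-in `html.parser`. The class name and the shortened sample page are assumptions for illustration, not a description of any particular search engine.

```python
from html.parser import HTMLParser

class DublinCoreMetaParser(HTMLParser):
    """Collect <meta name="DC.*"> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.elements = []   # (name, scheme or None, content)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.upper().startswith("DC."):
            self.elements.append(
                (name, attributes.get("scheme"), attributes.get("content")))

# Shortened version of the record above.
SAMPLE = """<html><head>
<title>A Thousand Wheels are set in Motion</title>
<link rel="schema.DC" href="http://purl.org/dc">
<meta name="DC.Title" content="A Thousand Wheels are Set in Motion">
<meta name="DC.Subject" scheme="LCSH" content="Georgia Institute of Technology--Buildings">
<meta name="DC.Contributor.PersonalName" scheme="LCNAF" content="Chritton, Heather">
</head><body></body></html>"""

parser = DublinCoreMetaParser()
parser.feed(SAMPLE)
for name, scheme, content in parser.elements:
    print(name, scheme, content)
```

The prefix match is case-insensitive on purpose, since hand-written records often vary the capitalization of the `DC.` prefix.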
RDF / Dublin Core in XML
<?xml:namespace href="http://www.w3c.org/RDF/" as="RDF"?>
<?xml:namespace href="http://purl.org/RDF/DC" as="DC"?>
<?xml:namespace href="http://loc.gov/LCNAF" as="LCNAF"?>
<?xml:namespace href="http://loc.gov/LCSH" as="LCSH"?>
<RDF:RDF>
<RDF:Description RDF:HREF="http://purl.org/metadata/dublin_core_elements">
<DC:Title>A Thousand Wheels are Set in Motion</DC:Title>
<DC:Title.Alternative>The Building of Georgia Tech at the Turn of the 20th Century, 1888-1908</DC:Title.Alternative>
<DC:Creator.CorporateName>
<RDF:Description>
<LCNAF:CorporateName>Georgia Tech Library and Information Center</LCNAF:CorporateName>
</RDF:Description>
</DC:Creator.CorporateName>
<DC:Subject>
<RDF:Description>
<LCSH:CorporateName>Georgia Institute of Technology--Buildings</LCSH:CorporateName>
</RDF:Description>
</DC:Subject>
<DC:Description>This Web site provides photographs, engravings and sketches of the first buildings on the Georgia Tech Campus, from 1888-1908. As of 9/20/1999, 88 images are provided but more will be added. Cataloged in EAD Single Item Metadata (SIM) format.</DC:Description>
<RDF:Seq>
<RDF:Description>
<RDF:LI><LCSH:PersonalName>Chritton, Heather</LCSH:PersonalName></RDF:LI>
<RDF:LI><LCSH:PersonalName>Crafts, Laurel</LCSH:PersonalName></RDF:LI>
</RDF:Description>
</RDF:Seq>
</RDF:Description>
</RDF:RDF>
Notes:
1. RDF shows three types of relationships among collected resources:
Sequence (specified ordering of elements)
Bag (all members of equal importance)
Alternatives (choice between members)
In this example, I am specifying among contributors that Heather Chritton, the web page developer, appears first among contributors and Laurel Crafts, the digital image creator, appears second. Other contributors follow (text creation, metadata creation, indexing, etc.) in specified order in the complete record. I use the RDF Sequence list to establish this fixed contributor order.
2. LCSH (Library of Congress Subject Headings) and LCNAF (Library of Congress Name Authority File) do not currently reside on web pages at a URL. The URLs provided are for illustration only
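The practical difference between the three collection types in Note 1 is how two collections are compared or consumed. A rough sketch using plain Python lists follows. The function names are hypothetical, and treating the first listed alternative as the default is an assumption of this example.

```python
from collections import Counter

def same_sequence(a, b):
    # Sequence: the order of members is significant.
    return list(a) == list(b)

def same_bag(a, b):
    # Bag: order is irrelevant; only membership (with duplicates) counts.
    return Counter(a) == Counter(b)

def pick_alternative(alternatives, acceptable):
    # Alternatives: exactly one member is chosen. Here: the first member the
    # consumer accepts, otherwise the first listed member as a default.
    for member in alternatives:
        if member in acceptable:
            return member
    return alternatives[0]

contributors = ["Chritton, Heather", "Crafts, Laurel"]
reordered = list(reversed(contributors))

print(same_sequence(contributors, reordered))   # order differs
print(same_bag(contributors, reordered))        # same members
print(pick_alternative(["en", "fr"], acceptable={"fr"}))
```

Under sequence semantics, swapping the two contributors yields a different description, which is why the example record uses a sequence to fix the contributor order.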
XML
Extensible Markup Language, a subset of SGML (Standard Generalized Markup Language), provides the ability to define elements within a web document. XML documents have a logical and a physical structure. Each unit of an XML document is an entity. Entities are defined within the document in relation to each other. The logical and physical structures of the document include declarations, elements, comments, character references and processing instructions. Structural relationship is provided through nesting.
XML
XML display is governed by an attached style document, formulated in CSS (Cascading Style Sheet) or XSL (Extensible Style Language) to provide rules for display. Styles can be applied to single elements as well as to the entire document. More than one style sheet or style document can be provided for a document or element, with precedence rules governing the given display.
DTD: The Document Type Definition provides a formally defined structure, vocabulary and syntax for an XML document type. Documents are validated against a DTD to ensure that nested structure and semantic constraints are followed, giving consistent meaning across documents.
DCD: Document Content Description. A semantic superset of XML DTDs, intended to be conformant with the RDF Model and Syntax Specification. Describes an XML vocabulary for schemas, for specifying object classes. Based on elements (RDF property types) and attributes. Supports RDF vocabulary and constructs.
SOX: Schema for Object-Oriented XML
Alternative to DTD for validating XML documents. Supports scalar (numeric) datatypes, enumerated datatypes (values enumeration) and format datatypes. An expanded namespace facility allows objects from any identifiable namespace to be used to build the document.
Role of the Database:
A database that can be parsed and reported to a validated XML metadata format, as well as other metadata syntaxes, provides a robust space for metadata development. Also reports to any XML document type and hooks into applications via APIs, to support unique user needs.
[Diagram labels: Oracle Database; MARC-Based Catalog; Collaborative Research Space; Web-Based Courseware Application; Subject-Specific Web Research Tool; Personal Research Space]
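Reporting database rows out to an XML metadata record might look like the following sketch. It uses an in-memory SQLite database purely as a stand-in for the Oracle database named on the slide. The table layout, the `record` wrapper element and the choice of namespace URI are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed namespace for the Dublin Core elements; the slides use other URIs.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical table: one row per (asset, element, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (asset_id INTEGER, element TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [(1, "title", "A Thousand Wheels are Set in Motion"),
     (1, "contributor", "Chritton, Heather"),
     (1, "contributor", "Crafts, Laurel")],
)

def export_asset(connection, asset_id):
    """Report one asset's rows as an XML record, one child element per row."""
    record = ET.Element("record", {"id": str(asset_id)})
    rows = connection.execute(
        "SELECT element, value FROM metadata WHERE asset_id = ? ORDER BY rowid",
        (asset_id,),
    )
    for element, value in rows:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record

record = export_asset(conn, 1)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping one row per element value makes repeatable elements trivial to store. The same rows could be reported to a different syntax (HTML meta tags, MARC fields) by swapping only the export function.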
Last Step: Data Retrieval
Data storage, access and delivery architecture should be open, standards-based, and hardware and software independent, providing users across platforms with a common, consistent interface and underlying storage structure for efficient retrieval, display, storage and use of digital information
Data architecture should support a well-defined, widely available security system to validate authenticity of users and provide data for a variety of uses according to a scalable authorization hierarchy
Last Step: Data Retrieval
Data architecture should support data as objects for scalable, extensible access, with sophisticated and flexible support for object relationships, particularly to support different physical instantiations of identical data, e.g. digital video object as D1, MPEG1, Quicktime, etc.
CORBA: Common Object Request Broker Architecture - emerging architecture for open distributed object computing. Intended to provide transparent access to applications and databases, regardless of the hardware and software infrastructure at each end of the transaction
Putting It All Together:
A Digital Archive Architecture
Reference Model for Open Archival Information Systems (OAIS),
Developed by a US ISO archiving group under ISO TC20/SC13 and the Consultative Committee for Space Data Systems (CCSDS). This model has recently been released for formal ISO and CCSDS review. An electronic version of the OAIS Reference Model can be found at
http://www.ccsds.org/RP9905/RP9905.html
Reference Model for Open Archival Information Systems (OAIS)
EXTERNAL DATA FLOW DIAGRAM