Cornell CS 502
Metadata for the Web: Issues and Simple Answers
CS 502 – 20030219
Carl Lagoze – Cornell University
“Metadata is data about data”
Metadata is semi-structured data conforming to commonly agreed-upon models, providing operational interoperability in a heterogeneous environment
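A minimal sketch of what "semi-structured data conforming to commonly agreed-upon models" can mean in practice: a record is a loose mapping in which any element may be missing or repeated, but the element names must come from a shared vocabulary, here the 15 Dublin Core elements. The function name `conforms_to_dc` and the second sample record are hypothetical; real Dublin Core usage also allows qualifiers, which this sketch ignores.

```python
# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def conforms_to_dc(record: dict[str, list[str]]) -> bool:
    """Return True if every element name in the record is a Dublin Core element.

    Values are lists because elements may repeat; missing elements are fine.
    """
    return all(name in DC_ELEMENTS for name in record)


# Sample records: the first uses details from this lecture, the second is hypothetical.
lecture = {
    "title": ["Metadata for the Web: Issues and Simple Answers"],
    "creator": ["Carl Lagoze"],
    "date": ["2003-02-19"],
}
local_record = {"title": ["Some page"], "shelf_mark": ["X-42"]}

print(conforms_to_dc(lecture))       # True
print(conforms_to_dc(local_record))  # False: "shelf_mark" is not a DC element
```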
Some untested hypotheses
• Metadata is useful for…
  – People
  – Machines
• More metadata is better
• (Semi-)automated digital libraries and simple metadata
Some known facts
• Number and variety of metadata vocabularies will continue to increase
• The Tower of Babel is a franchise
  – There is not one common view of reality
• “The one thing I know about metadata is that it is expensive” (Bill Arms)
• “I hate metadata projects because they make every other digital library project more expensive” (Michael Lesk)
Are metadata and data distinguishable?
• Objectivity?
• Intellectual property?
• Structure?
• Aboutness?
The fiction of classification
…there is no classification of the universe that is not fictional and conjectural.
Jorge Luis Borges
Lenses and Views
• All classification does and should provide a biased lens or view of reality
• Each view emphasizes certain characteristics and hides others
(Diagram: views labeled Geospatial, Rights, and Museum)
Reality is Complex
Created by: George Castaldo
Created on: 1994
Created by: Leonardo da Vinci
Created on: 1506
Relationship?
Objects are Related
IFLA Entity Model
Entities, Events, and Agents
(Diagram labels: Photographer, Camera type, Software, Computer artist)
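A flat "Created by / Created on" record cannot say how one object came from another, or which agents and tools were involved. The sketch below is a hedged illustration of an event-centric alternative: agents (a photographer, a computer artist) and tools (a camera type, software) take part in events that consume and produce objects, and an object's lineage is recovered by walking back through those events. The class names, event kinds, agent IDs, and the example chain are all hypothetical and are not taken from the IFLA model or any particular standard.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    role: str  # hypothetical role label, e.g. "Photographer" or "Computer artist"


@dataclass
class Event:
    kind: str                                          # hypothetical event type
    agents: list[Agent] = field(default_factory=list)  # who took part
    tools: list[str] = field(default_factory=list)     # e.g. camera type, software
    inputs: list[str] = field(default_factory=list)    # objects the event starts from
    outputs: list[str] = field(default_factory=list)   # objects the event produces


def lineage(target: str, events: list[Event]) -> list[str]:
    """Walk backwards from `target` through the events that produced it.

    Follows only the first input of each producing event, and stops at an
    object that no event produced or when a cycle would be entered.
    """
    chain = [target]
    current = target
    while True:
        producer = next((e for e in events if current in e.outputs), None)
        if producer is None or not producer.inputs:
            return chain
        previous = producer.inputs[0]
        if previous in chain:  # guard against cyclic descriptions
            return chain
        chain.append(previous)
        current = previous


# Hypothetical chain: a photograph of a painting, later reworked digitally.
events = [
    Event("photograph", [Agent("agent-1", "Photographer")],
          ["camera"], ["painting"], ["photo"]),
    Event("digital rework", [Agent("agent-2", "Computer artist")],
          ["image-editing software"], ["photo"], ["digital image"]),
]
print(lineage("digital image", events))  # ['digital image', 'photo', 'painting']
```

Each object in such a chain could still carry its own simple descriptive record; the events are what make the relationship between records explicit.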
Haven’t we done metadata already?
What’s wrong with this model?
• Expensive
  – Complex (even for its original goal?)
  – Professional intervention (assumes single community of expertise)
• Monolithic
  – One size fits all approach
  – Reflects its centralized system origins
• Bias towards physical artifacts
  – Fixed resources
  – Incomplete handling of resource evolution and other resource relationships
• Anglo-centric
Web Challenge to Traditional Cataloging
• Scale
• Permanence
• Authenticity
• Organizational Context
• Custodial Control
• Variety
Internet Commons includes Multiple Communities
(Diagram: the Internet Commons and its communities: Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...)
Metadata Takes Many Forms
• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description
Metadata Challenges
• Accommodate multiple varieties of metadata
  – community-specific functionality, creation, administration, access
• Tensions
  – functionality and simplicity
  – extensibility and interoperability
  – human and machine creation and use
Interoperability has many facets
• Semantics
  – Meaning/classification/ontology
• Models/Structure
  – Entities and relationships
• Syntax
  – Grammars to convey semantics and structure
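To make the separation between syntax and the other facets concrete, the sketch below encodes one set of (element, value) statements in two different grammars, XML and JSON, using only the Python standard library, and checks that both parse back to identical statements. The element names are Dublin Core's; the bare `<record>` wrapper, the absence of XML namespaces, and the JSON layout are simplifying assumptions, not a standard Dublin Core encoding.

```python
import json
import xml.etree.ElementTree as ET

# One set of statements (semantics + structure): element name -> value.
statements = [("title", "Metadata for the Web"), ("creator", "Carl Lagoze")]


def to_xml(stmts):
    # Assumed, namespace-free encoding: <record><title>...</title>...</record>
    root = ET.Element("record")
    for name, value in stmts:
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return [(child.tag, child.text) for child in ET.fromstring(text)]


def to_json(stmts):
    # Assumed encoding: a list of {"element": ..., "value": ...} objects.
    return json.dumps([{"element": n, "value": v} for n, v in stmts])


def from_json(text):
    return [(d["element"], d["value"]) for d in json.loads(text)]


xml_text, json_text = to_xml(statements), to_json(statements)
print(xml_text)
# <record><title>Metadata for the Web</title><creator>Carl Lagoze</creator></record>

# Different syntax, same statements once parsed.
assert from_xml(xml_text) == from_json(json_text) == statements
```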
Warwick Framework: Containing Chaos
• Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2)
• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
• Provide context for metadata efforts (including Dublin Core)
  – avoids the “black-hole” of comprehensive element sets
  – focuses interoperability issues at package level
Metadata Container
(Diagram: a Container holding Packages labeled Dublin Core, MARC record, Indirect Reference, and Terms and Conditions, plus a URI)
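The container/package idea can be sketched as a container holding typed packages whose contents it never interprets, where an indirect package carries only a URI pointing at metadata kept elsewhere. This is a hedged illustration of the concept, not the Warwick Framework's own specification: the class names, the `resolve` callback, the choice to make Terms and Conditions the indirect package, and the example URI are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Package:
    kind: str      # e.g. "Dublin Core", "MARC record"
    content: dict  # opaque to the container; interpreted by the owning community


@dataclass
class IndirectPackage:
    kind: str      # type of the metadata held elsewhere
    uri: str       # where that metadata lives; fetched only when asked for


@dataclass
class Container:
    packages: list[Union[Package, IndirectPackage]]

    def kinds(self) -> list[str]:
        return [p.kind for p in self.packages]

    def get(self, kind: str, resolve: Callable[[str], dict]) -> Optional[dict]:
        """Return the first package of `kind`, dereferencing indirect packages via `resolve`."""
        for pkg in self.packages:
            if pkg.kind == kind:
                if isinstance(pkg, IndirectPackage):
                    return resolve(pkg.uri)
                return pkg.content
        return None


# Hypothetical remote store standing in for metadata reachable by URI.
remote_store = {"http://example.org/terms/42": {"access": "subscribers only"}}

container = Container([
    Package("Dublin Core", {"title": "Metadata for the Web"}),
    Package("MARC record", {"245": "Metadata for the Web"}),
    IndirectPackage("Terms and Conditions", "http://example.org/terms/42"),
])
print(container.kinds())  # ['Dublin Core', 'MARC record', 'Terms and Conditions']
print(container.get("Terms and Conditions", remote_store.__getitem__))
```

A client that only understands Dublin Core can ask for that one package and ignore the rest, which is the sense in which interoperability is negotiated at the package level rather than through one comprehensive element set.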
Modularization Allows Distributed Management
• Communities of expertise (not software vendors) are responsible for:
  – Semantics
  – Registration
  – Administration
  – Access management
  – Authority of data
  – Sharing and Distribution
Realities of Web search and discovery
• Search systems are motivated by advertising
• Index coverage is unpredictable and limited
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
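"Too much recall, too little precision" refers to the two standard retrieval measures: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A small sketch, with made-up page IDs that are purely illustrative:

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query; empty sets yield 0.0."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 10 pages come back, only 2 are relevant,
# and those 2 are all the relevant pages there are.
retrieved = {f"page{i}" for i in range(10)}
relevant = {"page0", "page1"}
print(precision_recall(retrieved, relevant))  # (0.2, 1.0)
```

Recall is perfect here, yet the user still has to wade through eight irrelevant pages; structured metadata aims to raise precision without giving up that recall.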
Metadata: Part of a Solution
• Structured data about data
  – helps to impose order on chaos
  – enables automated discovery/manipulation
• Variety across various dimensions:
  – specialization
  – decentralization
  – democratization
Web Metadata Models: Drill-Down Searching Paradigm
• Moving along a specificity spectrum
• Inter-domain vs. intra-domain terms, models, query mechanisms
• One size doesn't fit all
  – Cognitive models of searching and browsing
Drill-down search paradigm
(Diagram: from a Domain-Independent View down to a Domain-Specific View)
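One way to read the drill-down paradigm: a shared, domain-independent view is derived from richer domain-specific records through a crosswalk, searches run against that shared view, and the user then drills down to the full domain-specific record. The sketch below maps three MARC tags to Dublin Core elements (245 to title, 100 to creator, 650 to subject), following common MARC-to-DC correspondences but heavily simplified: it ignores indicators and subfields. The function names and the sample record are hypothetical.

```python
# Simplified crosswalk: MARC tag -> Dublin Core element (a small subset only).
MARC_TO_DC = {"245": "title", "100": "creator", "650": "subject"}


def domain_independent_view(marc_record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Project a domain-specific record onto a shared Dublin Core view.

    Unmapped tags are dropped from the shared view but remain in the full
    record, which is what the user drills down to.
    """
    view: dict[str, list[str]] = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is not None:
            view.setdefault(element, []).extend(values)
    return view


def search(records, element, term):
    """Match against the shared view; return the full domain-specific records."""
    return [r for r in records
            if any(term.lower() in v.lower()
                   for v in domain_independent_view(r).get(element, []))]


# Hypothetical MARC-like record (tag -> values, no indicators or subfields).
book = {"245": ["Metadata for the Web"], "100": ["Lagoze, Carl"],
        "650": ["Metadata"], "300": ["(physical description)"]}
print(domain_independent_view(book))
# {'title': ['Metadata for the Web'], 'creator': ['Lagoze, Carl'], 'subject': ['Metadata']}
```

The shared view is deliberately lossy: the unmapped 300 field never reaches it, but a user who finds the record through the Dublin Core view still gets the complete MARC record back from `search`.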
Metadata: Part of the Problem
(Chart: cost vs. functionality, with points labeled AACR2/MARC, Google, and Dublin Core)
Why hasn’t metadata worked on the Web?
• It's all about trust
• People are lazy
• Metadata is hard
• No perceived benefit
  – “Reverse tragedy of the commons”
• No agreement on one way to describe things
• “Metacrap” - http://www.well.com/~doctorow/metacrap.htm