Cornell CS 502
Metadata for the Web: Issues and Simple Answers
CS 502 – 20020221
Carl Lagoze – Cornell University
“Metadata is data about data”
Some untested hypotheses
• Metadata is useful for…
  – People
  – Machines
• More metadata is better
• (Semi-)automated digital libraries and simple metadata
Some known facts
• Number and variety of metadata vocabularies will continue to increase
• The Tower of Babel is a franchise
  – There is not one common view of reality
• “The one thing I know about metadata is that it is expensive”
Are metadata and data distinguishable?
• Objectivity?
• Intellectual property?
• Structure?
• Aboutness?
The fiction of classification
…there is no classification of the universe that is not fictional and
conjectural.
Jorge Luis Borges
Lenses and Views
• All classification does and should provide a biased lens or view of reality
• Each view emphasizes certain characteristics and hides others
[Diagram: views labeled Geospatial, Rights, Museum]
Reality is Complex
Created by: George Castaldo
Created on: 1994
Created by: Leonardo da Vinci
Created on: 1506
Relationship?
Objects are Related
IFLA Entity Model
Entities, Events, and Agents
[Diagram labels: Photographer, Camera type, Software, Computer artist]
Haven’t we done metadata already?
What’s wrong with this model?
• Expensive
  – Complex (even for its original goal?)
  – Professional intervention (assumes a single community of expertise)
• Monolithic
  – One-size-fits-all approach
  – Reflects its centralized system origins
• Bias towards physical artifacts
  – Fixed resources
  – Incomplete handling of resource evolution and other resource relationships
• Anglo-centric
Web Challenge to Traditional Cataloging
• Scale
• Permanence
• Authenticity
• Organizational Context
• Custodial Control
• Variety
Internet Commons includes Multiple Communities
[Diagram: the Internet Commons linking Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]
Realities of Web search and discovery
• Search systems are motivated by advertising
• Index coverage is unpredictable and limited
• Too much recall, too little precision
• Index spam abounds
• Resources (and their names) are volatile
Metadata: Part of a Solution
• Structured data about data
  – helps to impose order on chaos
  – enables automated discovery/manipulation
• Variety across various dimensions:
  – specialization
  – decentralization
  – democratization
Web Metadata Issues: Description vs. Discovery
• Library cataloging is motivated by describing resources
• Fuzzy search buckets
  – Separate books about Sigmund Freud from books by Sigmund Freud into different buckets
  – But different types of data are appropriate for different buckets: URLs, date strings, word strings, names
• But general, fuzzy categories may not be sufficient for describing resources
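The Freud example can be sketched in Python as two separate buckets, creator and subject, searched independently. This is a minimal illustration, not a cataloging system: the `Record` class, the `search` function, and its case-insensitive substring matching are hypothetical choices made here; the two titles are real books used only as sample data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical catalog record with two separate search buckets."""
    title: str
    creator: list[str] = field(default_factory=list)  # who made the work
    subject: list[str] = field(default_factory=list)  # what the work is about


def search(records: list[Record], bucket: str, term: str) -> list[str]:
    """Return titles whose chosen bucket contains `term` (case-insensitive substring match)."""
    needle = term.lower()
    return [
        r.title
        for r in records
        if any(needle in value.lower() for value in getattr(r, bucket))
    ]


catalog = [
    Record("The Interpretation of Dreams", creator=["Sigmund Freud"], subject=["Dreams"]),
    Record("Freud: A Life for Our Time", creator=["Peter Gay"], subject=["Sigmund Freud"]),
]

by_freud = search(catalog, "creator", "freud")     # books by Freud
about_freud = search(catalog, "subject", "freud")  # books about Freud
```

The same query term lands on different books depending on the bucket, which is the distinction the slide draws; it also shows the limit of fuzzy buckets, since every value here is just a word string.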
Web Metadata Models: Drill-Down Searching Paradigm
• Moving along a specificity spectrum
• Inter-domain vs. intra-domain terms, models, query mechanisms
• One size doesn't fit all
  – Cognitive models of searching and browsing
Metadata Takes Many Forms
• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description
Metadata: Part of the problem
[Figure: cost plotted against functionality for google, Dublin Core, and AACR2/MARC]
Metadata Challenges
• Accommodate multiple varieties of metadata
  – community-specific functionality, creation, administration, access
• Tensions
  – functionality and simplicity
  – extensibility and interoperability
  – human and machine creation and use
Interoperability has many facets
• Semantics
  – Meaning/classification/ontology
• Models/Structure
  – Entities and relationships
• Syntax
  – grammars to convey semantics and structure
Warwick Framework: Containing Chaos
• Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2)
• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata
• Provide context for metadata efforts (including Dublin Core)
  – avoids the “black hole” of comprehensive element sets
  – focuses interoperability issues at the package level
Metadata Container
[Diagram: a Container holding four Packages (Dublin Core, MARC record, Indirect Reference, Terms and Conditions); the Indirect Reference package is labeled with a URI]
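A minimal Python sketch of the container/package idea, assuming a hypothetical data model: the `Package` and `Container` classes, the package type labels, and the example URI are illustrative inventions, not part of the Warwick Framework, which was a conceptual architecture rather than a concrete API. Each package either carries its metadata inline or refers to it indirectly by URI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    """One typed metadata package (hypothetical structure)."""
    package_type: str                # hypothetical labels, e.g. "dublin-core", "marc"
    content: Optional[dict] = None   # metadata carried inline
    uri: Optional[str] = None        # indirect reference to metadata stored elsewhere

    def is_indirect(self) -> bool:
        return self.content is None and self.uri is not None


@dataclass
class Container:
    """Aggregates independently managed packages describing one resource."""
    packages: list[Package] = field(default_factory=list)

    def find(self, package_type: str) -> list[Package]:
        return [p for p in self.packages if p.package_type == package_type]


container = Container([
    Package("dublin-core", content={"title": "Grammar of Dublin Core", "creator": "Tom Baker"}),
    Package("marc", content={"245": "Grammar of Dublin Core"}),  # MARC field 245: title statement
    Package("terms-and-conditions", uri="http://example.org/terms"),  # hypothetical URI
])
```

A client that understands only Dublin Core can call `find("dublin-core")` and ignore the other packages; that separation is what lets each community manage its own package.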
Modularization Allows Distributed Management
• Communities of expertise (not software vendors) are responsible for:
  – Semantics
  – Registration
  – Administration
  – Access management
  – Authority of data
  – Sharing and distribution
Modularization Implementation Issues
• Data encoding
• Semantic interaction of overlapping sets
  – between semantically related packages
  – between semantically distinct packages
• Type registry
Dublin Core Metadata Initiative
• A simple set of properties to support resource discovery on the web (fuzzy search buckets)?
• A cross-domain switchboard for interoperable metadata?
• An extensible ontology for resource description?
http://dublincore.org
The fifteen Dublin Core Elements
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier
http://www.dublincore.org/documents/1999/07/02/dces/
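Because the element set is closed, a record can be checked mechanically against it. A minimal Python sketch, assuming records are plain dictionaries mapping lowercase element names to lists of values (lists because each element is optional and repeatable); `unknown_elements` is a hypothetical helper, not part of any Dublin Core tooling.

```python
# The fifteen elements listed above, as lowercase names.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record: dict[str, list[str]]) -> set[str]:
    """Return the keys of `record` that are not among the fifteen elements."""
    return {name for name in record if name.lower() not in DC_ELEMENTS}


record = {
    "title": ["Grammar of Dublin Core"],
    "creator": ["Tom Baker"],
    "subject": ["Metadata"],
    "author": ["Tom Baker"],  # not a DC element; "creator" is the DC term
}

problems = unknown_elements(record)  # {"author"}
```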
A Pidgin for Digital Tourists
• Metadata is language
• Dublin Core is a small and simple language (a pidgin) for finding resources across domains
• Speakers of different languages naturally "pidginize" to communicate
  – E.g., tourists using simple phrases to order beer ("zwei Bier bitte", "dva pivo", "biru o san bai"...)
• We are all "tourists" on the global Internet.
A Grammar of Dublin Core
• http://www.dlib.org/dlib/october00/baker/10baker.html
• By design not as subtle as mother tongues, but easy to learn and extremely useful in practice
• Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
• Simple grammars: sentences (statements) follow a simple fixed pattern...
Example Dublin Core statements
• Resource has Title 'Grammar of Dublin Core'.
• Resource has Creator 'Tom Baker'.
• Resource has Subject 'Metadata'.
• Resource has Relation http://foo.org/file.htm.
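These statements all share one fixed shape, so they can be stored as (property, value) pairs and rendered back into sentences. A minimal Python sketch; the `render` helper and its rule of leaving values that start with `http://` or `https://` unquoted (mirroring the Relation example) are assumptions made here.

```python
statements = [
    ("Title", "Grammar of Dublin Core"),
    ("Creator", "Tom Baker"),
    ("Subject", "Metadata"),
    ("Relation", "http://foo.org/file.htm"),
]


def render(prop: str, value: str) -> str:
    """Render one (property, value) pair as a 'Resource has ...' sentence."""
    # Assumption: URIs are shown bare, as in the Relation example; literals are quoted.
    shown = value if value.startswith(("http://", "https://")) else f"'{value}'"
    return f"Resource has {prop} {shown}."


sentences = [render(p, v) for p, v in statements]
```

Storing only the pairs works because the subject ("Resource") and the verb ("has") never vary.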
Resource has property
• X: implied subject (the resource)
• has: implied verb
• DC:Creator, DC:Title, DC:Subject, DC:Date, ...: one of 15 properties
• property value: an appropriate literal
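The fixed pattern (implied subject, implied verb, one of 15 properties, a literal value) makes statements easy to check by machine. A minimal Python sketch, assuming statements are written exactly in the "Resource has Property value." form used in the examples; the regular expression and the `parse` function are hypothetical, and real Dublin Core encodings (HTML meta tags, XML, RDF) are not parsed this way.

```python
import re

# The fifteen element names, capitalized as in the example statements.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}

# Hypothetical pattern: "Resource has <Property> <'quoted literal' or bare token>."
STATEMENT = re.compile(r"^Resource has (?P<prop>[A-Z][a-z]+) (?P<value>'[^']*'|\S+)\.$")


def parse(sentence: str) -> tuple[str, str]:
    """Split one statement into (property, value), rejecting non-DC properties."""
    m = STATEMENT.match(sentence)
    if m is None:
        raise ValueError(f"not a statement: {sentence!r}")
    prop, value = m.group("prop"), m.group("value").strip("'")
    if prop not in DC_ELEMENTS:
        raise ValueError(f"{prop} is not one of the 15 elements")
    return prop, value
```

For example, `parse("Resource has Relation http://foo.org/file.htm.")` yields `("Relation", "http://foo.org/file.htm")`, while a sentence using "Author" is rejected because that is not one of the fifteen properties.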