University of Washington Digital Library Project
Geri Bunker
The Fifth Dublin Core Metadata Workshop
October 7, 1997
Perspective
• Large, public research university:
– Multiple branches
– Growing demand for access, distance education
– Focus on collaboration, with government, business, academia
• University Libraries
– 20 branches; 4 major units
– Central technical and system services; rest distributed
UW Priorities: User-centered focus in design of services and products
• Web access for resources and services
– strategic, “not only or last”
• Digital Library
– commercially acquired full text
– locally digitized multimedia
• Process Improvement
– for redeployment of assets and resources
Convergence of developments
• University re-focus on outreach and access
– especially through use of technology (Web, distance education)
• Appearance of great multimedia archiving software--CONTENT
• ..and success with an Intel grant for hardware
• faculty and library collections abound--students need digital access to ever more course information
• International consensus-building around metadata: the Dublin Core
Avoiding “the 21st Century nightmare”
• The Digital Orphan metaphor created stark terror and focused us on need for inexpensive description
• Opportunity to test Dublin Core for images with new software tool
Content: a practical, scalable, high-performance multimedia archive
• to solve the need for small, fast startup systems for images
• …allowing them to grow to millions of images with fast, accurate retrieval
• standards-based and extensible, flexible
• facilitates collaboration among content providers, librarians, curators, archivists
Client-server; “federated heterogeneous system”
• Server on Windows NT, AIX and HP-UX
• Clients on Windows 95 (Search, Acquire and Administer) and Java (Search)
• Set of API functions--http/cgi-based
• Acquisition and DB admin for distribution of tasks
Workbox
• Keep track of images and videos
• Store only the link
• Items can be categorized
• HTML document can be built based on the workbox to share results with colleagues
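The workbox idea (store only links, group them by category, then generate a page to share) can be sketched in a few lines of Python. This is an illustration only, not the CONTENT implementation: the function name `workbox_to_html`, the `(category, label, url)` item shape, and the example URL are all assumptions.

```python
from html import escape


def workbox_to_html(title, items):
    """Build a simple HTML page from workbox items.

    `items` is a list of (category, label, url) tuples. Only the link
    is stored, never the image or video itself. (Hypothetical structure.)
    """
    by_category = {}
    for category, label, url in items:
        by_category.setdefault(category, []).append((label, url))

    parts = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for category in sorted(by_category):
        parts.append(f"<h2>{escape(category)}</h2>")
        parts.append("<ul>")
        for label, url in by_category[category]:
            parts.append(f'<li><a href="{escape(url)}">{escape(label)}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    # The URL below is a placeholder, not a real archive address.
    page = workbox_to_html(
        "Wright houses",
        [("Houses", "Fallingwater", "http://example.org/images/USA344.JPG")],
    )
    print(page)
```

Keeping only links in the workbox means the generated page stays small and always points back to the archive copy.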
Dictionary and Thesaurus
• Dictionary
– Contains all valid search words
– Dictionary for every field
• Hierarchical Thesaurus
– Organize related words
– Group in hierarchy
– Simple text file
– Indentation (tabbing) to indicate hierarchy
– Every field can have an optional thesaurus
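A tab-indented thesaurus file of the kind described above is straightforward to read. The Python sketch below assumes one term per line and one tab per hierarchy level; the exact file rules used by the software are not given in these slides, and the sample terms and function names are made up for illustration.

```python
def parse_thesaurus(text):
    """Parse tab-indented lines into nested dicts: {term: {narrower terms}}."""
    root = {}
    stack = [root]  # stack[d] is the dict that receives terms at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue
        stripped = raw.lstrip("\t")
        depth = len(raw) - len(stripped)  # number of leading tabs
        term = stripped.strip()
        if depth > len(stack) - 1:
            raise ValueError(f"indentation skips a level at {term!r}")
        del stack[depth + 1:]  # climb back up to the parent level
        children = {}
        stack[depth][term] = children
        stack.append(children)
    return root


def _walk(tree):
    """Yield every term in a subtree, depth first."""
    for name, children in tree.items():
        yield name
        yield from _walk(children)


def narrower_terms(tree, term):
    """Return all terms below `term`, or [] if it is a leaf or absent."""
    for name, children in tree.items():
        if name == term:
            return list(_walk(children))
        found = narrower_terms(children, term)
        if found:
            return found
    return []


SAMPLE = "building\n\thouse\n\t\tcottage\n\tchurch\nbridge\n"

if __name__ == "__main__":
    tree = parse_thesaurus(SAMPLE)
    print(narrower_terms(tree, "building"))
```

A search on a broad term could then be widened with its narrower terms, which is one plausible use of a per-field thesaurus.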
The thesaurus
Current projects
• All retrospective conversion to DC
– Collection of historical photographs
– Collection of teaching slides
• All happening in the context of Web integration for all resources
• All want their own labels displaying
– will be mapped to DC in the server
What problems have we encountered in our implementation?
– Need understanding of key elements
– Need some qualification method for more precise narrowing of hit set
– Need to distinguish DC names from administrative metadata
Elements causing the most confusion
• Source (visual resources have many levels)
• Date (of what?)
• Coverage (only useful if heavily qualified)
• Relation (is this the key to “containers”?)
…even some elements thought to be “safe”...
• Some elements perhaps thought to be clear are not--depends upon reason for digitizing
• E.g., in a set of instructional slides of architectural images, the name or address of a building shown in a photograph may be considered the “subject” of the photo.
• Here we map to “title”; possibly behind the scenes...
Administration
Default metadata as specified by Dublin Core, but configurable by CONTENT administrator’s tool
All CONTENT server administration can be performed via a Web interface
Database Configuration File
Slide number:identi:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
File format:format:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Subject:title:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
Detail:covera:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Nation:cover2:TEXT:SMALL:YES:BLANK:SEARCH:NOHIDE
City:cover3:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
State:cover4:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
Site:cover5:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Date:cover6:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Architect:creat2:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
Materials:subje2:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Building type:subje3:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
Keywords:subjec:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Photographer:creato:TEXT:BIG:BLANK:BLANK:SEARCH:NOHIDE
Copyright holder:rights:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Photographer refno:ident2:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Transmission data:forma2:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Date of photograph:date:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
Source:source:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
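The configuration entries above pair a local display label with a short Dublin Core field name. A minimal Python sketch of reading such entries follows. The column meanings are inferred from the listing, not documented in the slides: column 1 is taken as the label, column 2 as the field, columns 3 and 4 as type and size, column 7 as the search flag and column 8 as the hide flag, while columns 5 and 6 are left uninterpreted. Matching numbered fields such as `cover2` to a base element by their first five characters is likewise an assumption.

```python
from dataclasses import dataclass

# The fifteen Dublin Core elements.
DC_ELEMENTS = [
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


@dataclass
class FieldDef:
    label: str        # local label shown to users, e.g. "Nation"
    field: str        # short field name, e.g. "cover2"
    data_type: str    # e.g. "TEXT"
    size: str         # e.g. "SMALL" or "BIG"
    searchable: bool  # True for SEARCH, False for NOSEARCH
    hidden: bool      # True for HIDE, False for NOHIDE
    dc_element: str   # full DC element name (inferred)


def dc_element_for(field):
    """Guess the DC element from the first five characters of the field."""
    for element in DC_ELEMENTS:
        if element[:5] == field[:5]:
            return element
    raise ValueError(f"no Dublin Core element matches {field!r}")


def parse_config(text):
    """Parse colon-separated configuration lines into FieldDef objects."""
    defs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.strip().split(":")
        if len(parts) != 8:
            raise ValueError(f"expected 8 columns: {line!r}")
        label, field, data_type, size, _col5, _col6, search, hide = parts
        defs.append(FieldDef(label, field, data_type, size,
                             search == "SEARCH", hide == "HIDE",
                             dc_element_for(field)))
    return defs


SAMPLE = """\
Subject:title:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE
Nation:cover2:TEXT:SMALL:YES:BLANK:SEARCH:NOHIDE
Slide number:identi:TEXT:SMALL:BLANK:BLANK:NOSEARCH:NOHIDE
"""

if __name__ == "__main__":
    for d in parse_config(SAMPLE):
        print(f"{d.label} -> {d.dc_element} (searchable: {d.searchable})")
```

Read this way, the file shows how a local label such as "Subject" can be displayed to users while the value is stored under the DC `title` element.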
<title>Fallingwater</title>
<subjec>house</subjec>
<descri></descri>
<creato>Wright, Frank Lloyd; Meredith L. Clausen</creato>
<publis></publis>
<contri></contri>
<date></date>
<type></type>
<format>.JPG</format>
<identi>USA344</identi>
<source></source>
<langua></langua>
<relati></relati>
<covera>Bear Run, Pennsylvania, USA, distant view, 1936-37</covera>
<rights>Meredith L. Clausen</rights>
<find>USA344.JPG</find>
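The record above uses tag names that look like the Dublin Core element names cut to six characters. The Python sketch below reads such a record and expands the tags to full element names. It assumes flat, non-nested tags as in the sample; the `find` tag is not a Dublin Core element (it appears to hold the image file name) and is kept under its own name. The function and variable names are illustrative.

```python
import re

DC_ELEMENTS = [
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

# Six-character tag -> full element name, e.g. "subjec" -> "subject".
TAG_TO_ELEMENT = {name[:6]: name for name in DC_ELEMENTS}

# Matches <tag>value</tag>; the backreference requires the same closing tag.
RECORD_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_record(text):
    """Return {element name: value} for every non-empty field in a record."""
    record = {}
    for tag, value in RECORD_PATTERN.findall(text):
        value = value.strip()
        if value:  # skip empty elements such as <descri></descri>
            record[TAG_TO_ELEMENT.get(tag, tag)] = value
    return record


SAMPLE = (
    "<title>Fallingwater</title><subjec>house</subjec><descri></descri>"
    "<creato>Wright, Frank Lloyd; Meredith L. Clausen</creato>"
    "<format>.JPG</format><identi>USA344</identi>"
    "<find>USA344.JPG</find>"
)

if __name__ == "__main__":
    for element, value in parse_record(SAMPLE).items():
        print(f"{element}: {value}")
```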
Text Description
Features: implementation can influence success of metadata
• Chaos of the Web confounds instruction and research--faculty want a system built for them
• Template for Dublin Core description
– locally configurable to the field/tag level
– with compliant mapping behind the scenes to DC labels
• Vocabulary controls optional
• Work-box
Content tool plays to strengths of participants
• Acquisition module: start by scanning and build a skeletal record from template
• DB Admin: oversight and maintenance can be done remotely, asynchronously
• Metadata can be continuously enhanced through workbox and uploaded remotely.
Promise of the Dublin Core
• What holds the most promise for UW Digital Library Project?
– Core element set, an intersection of minimal but critical discovery elements
– Turn websites into repositories with reasonably accurate indexing.
• DC template focuses paraprofessional, student, faculty cataloging
Future work includes
• Code to the standards; Collaborate for synergy
• Train, document; Test for usability
• Enhance
– Adding thesauri and locally developed vocabularies
– Planning additional reformatting for open Web access
Contact
• Development lab: [email protected]
– Greg Zick
– Lawrence Yapp
– Craig Yamashita
• U.W. Library Project:
– Geri Bunker, UW Digital Library Coordinator/Interim Associate Director of Libraries for Technical Services
University of Washington
Email: [email protected]