Getting Started with Digital Collections
Erin Logsdon
Consultant, Digital Solutions
NELINET, Inc.
Details
• AM & PM breaks: 10:45 & 2:15
• Lunch: 12:00 to 1:00 PM
• Questions anytime
Introductions
• Name & organization/role
• What do you already know?
• What do you want to learn?
What is a Digital Library?
Define: Digital Library
“Digital libraries are organizations that provide the resources, including specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.” Digital Library Federation, Annual Report (1998-1999), 1.
Components
Digitization ≠ Preservation
Six Methods of Digital Preservation
1. Technology preservation
2. Technology emulation
3. Data migration
4. Enduring care
5. Refreshing
6. Digital archaeology
Why should we create a digital collection?
Sustainability
First Step
Audience
http://interconnectionsreport.org/
Stakeholders
What should we choose?
Selection Committee
Selection Criteria
Selection Process
Handbook for Digital Projects: A Management Tool for Preservation and Access (NEDCC)
Should, May, Can
• Should it be digitized?
• May it be digitized?
• Can it be digitized?
http://www.nedcc.org/resources/leaflets/6Reformatting/06PreservationAndSelection.php
Intellectual Property Rights
• Do you have the right rights?
– Public domain
– Fair use
– Obtain clearance from copyright holders
– Restrict access to comply with licensing and/or privacy stipulations
– Donor concerns
• Check with an expert
• See also:
– http://www.copyright.cornell.edu/public_domain/
Other Considerations
• Right of Publicity
• Right of Privacy
• Defamation: Libel and slander
• Obscenity and pornography
• Sensitivity to content
• Freedom of Information Act
• Linking
MONEY
Operational Costs
Organizational Costs
Staffing Costs
Breakdown
• Roughly 1/3 of the cost is digital conversion (32% overall)
• Slightly less than 1/3 of the cost is metadata creation: cataloging, description, and indexing (29% overall)
• Slightly more than 1/3 of the cost is other activities, such as administration and quality control (39% overall)
From Robin Crumri, Indiana University-Purdue University, 2003
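The percentages above can be turned into a rough first-pass budget. A minimal sketch, assuming a hypothetical total budget; the 32/29/39 shares are the figures cited above, and the function name `split_budget` is illustrative only. Real projects should expect their own split to differ.

```python
# Shares of total project cost, from the breakdown above
# (Robin Crumri, Indiana University-Purdue University, 2003).
COST_SHARES = {
    "digital conversion": 0.32,
    "metadata creation": 0.29,
    "administration, QC, other": 0.39,
}


def split_budget(total: float) -> dict[str, float]:
    """Split a (hypothetical) total budget according to the cited shares."""
    # The shares should account for the whole budget.
    assert abs(sum(COST_SHARES.values()) - 1.0) < 1e-9
    return {task: round(total * share, 2) for task, share in COST_SHARES.items()}


if __name__ == "__main__":
    for task, amount in split_budget(50_000).items():
        print(f"{task}: ${amount:,.2f}")
```

For a $50,000 project this suggests roughly $16,000 for conversion, $14,500 for metadata, and $19,500 for everything else.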
Cost Factors
• Costs can vary considerably from project to project
– Size of collection / number of items
– Uniformity of collection
• Books, photos, newspaper articles, sound clips, videos
– Age and condition of originals
– Preparation of originals
– Descriptions/cataloging
Cost Factors
• Imaging requirements
– Illustrations
– Charts, tables
• Post-processing of digital files
• Metadata requirements
• Text conversion
– Optical Character Recognition
– Keying
• Markup/encoding costs (HTML, XML)
http://flickr.com/photos/cheesepicklescheese/419050330/sizes/m/
Sample Digitization Costs*
Item: suggested digitization specs; average unit cost per item
• Printed letter, B&W: 300 dpi, 1-bit B&W; $0.18
• Printed letter, color: 300 dpi, 8-bit color; $1.31
• Color photos, 5” x 4”: 600 dpi, 24-bit color; $4.82
• 35 mm color slides: 2700 dpi, 24-bit color; $3.21
• Unmounted negative film, B&W: 2700 dpi, 8-bit grayscale; $2.34
*From: “Digitization: is it worth it?” by Stuart D. Lee in Computers in Libraries, vol. 21, no. 5, May 2001, pp. 28-31.
http://www.clir.org/pubs/reports/pub103/appendix6.html
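The specs above also drive storage needs: an uncompressed scan takes about width × height × dpi² × bit depth ÷ 8 bytes. A minimal sketch of that arithmetic; the page size of the letter (8.5” x 11”) and the 35 mm frame size (36 x 24 mm) are assumptions, and real TIFF files add headers, row padding, and possibly compression, so treat the results as estimates.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Approximate raw image size in bytes for a scan of a physical original."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # bits per pixel times pixel count, converted to bytes (row padding ignored)
    return width_px * height_px * bit_depth // 8


MIB = 1024 * 1024

# Color photo, 5" x 4", 600 dpi, 24-bit color (specs from the list above)
photo = uncompressed_bytes(5, 4, 600, 24)  # 3000 x 2400 pixels, 3 bytes each
# Printed letter, B&W, 300 dpi, 1-bit; the 8.5" x 11" page size is an assumption
letter = uncompressed_bytes(8.5, 11, 300, 1)
# 35 mm color slide, 2700 dpi, 24-bit; the 36 x 24 mm frame is an assumption
slide = uncompressed_bytes(36 / 25.4, 24 / 25.4, 2700, 24)

for name, size in [("photo", photo), ("letter", letter), ("slide", slide)]:
    print(f"{name}: {size / MIB:.1f} MiB")
```

The 5” x 4” color photo alone comes to about 20.6 MiB before compression, which is why master files are usually kept separate from smaller delivery copies.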
Funding Research
• Mission / goals of agency
• Geographic restrictions
• Subject focus
• Type of support (capital funds, research, programs, etc.)
• Type of institutions supported
• Populations served
• Communicate with potential funders
– Letter of inquiry / pre-proposal
Funding Trends
Outsourcing vs. In-house
Acquire
1. Gather and prepare source materials
2. Digitally capture originals
3. Process images
4. Store files
5. Maintain files - quality control
Standards
Establish Quality Benchmarks
Image Processing
• Image capture
– Resolution
– Bit depth
– Color control
• File formats
– TIFF, GIF, JPEG, PDF ...
http://daily.stanford.edu/article/2003/5/22/robotHelpsToDigitizeLibrary
Image Processing: Resolution
Image Resolution - Low
Image Resolution - High(er)
Archival Images/Master Files
• Scanned at the highest possible resolution: 600 dpi or higher
• High resolution scans allow for multiple uses (print, zoom, etc.)
• Large file size
• Often stored on CDs, DVDs, external drives, etc.
• TIFF file format
• Maintain over time: refresh/migrate
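Refreshing (copying master files onto new storage media) is usually paired with a fixity check, so you can show that the copy is bit-for-bit identical to the original. A minimal sketch using SHA-256 checksums from Python's standard library; the function names and any paths you pass in are hypothetical, and choice of checksum algorithm and checking schedule are local decisions the slides do not cover.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading in chunks so large TIFFs need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(master: Path, new_location: Path) -> str:
    """Copy a master file to new storage and verify the copy is identical.

    Returns the checksum so it can be recorded as administrative metadata.
    """
    before = sha256_of(master)
    new_location.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(master, new_location)  # copy2 also preserves file timestamps
    after = sha256_of(new_location)
    if before != after:
        raise OSError(f"Fixity check failed for {new_location}")
    return before
```

Storing the returned checksum lets later checks detect silent corruption, not just failures during the copy itself.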
Derivative Images
• Access image (JPG, GIF, PNG, PDF)
– Smaller file size for display/delivery
• Compressed and reduced resolution
– Requires less disk space
– Faster download times
• Thumbnail (JPG, GIF, PNG)
– Even smaller files
– Reference image of sufficient quality to determine further usefulness
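Access and thumbnail derivatives are typically made by scaling the master so its longest side fits a target size while keeping the aspect ratio. A minimal sketch of that sizing arithmetic; the function name is hypothetical, and the target sizes (100, 800, 1400, and 2000 pixels) simply match the example object in this presentation. An imaging library does the actual resampling and compression (Pillow's `Image.thumbnail()`, for example, performs a similar aspect-preserving fit).

```python
def fit_longest_side(width: int, height: int, max_px: int) -> tuple[int, int]:
    """Scale (width, height) so the longest side is at most max_px, keeping aspect ratio.

    Images already within max_px are returned unchanged (derivatives are never upscaled).
    """
    longest = max(width, height)
    if longest <= max_px:
        return width, height
    scale = max_px / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


# Master from a 5" x 4" photo scanned at 600 dpi: 3000 x 2400 pixels
master = (3000, 2400)
for target in (100, 800, 1400, 2000):
    print(target, fit_longest_side(*master, target))
```

With this master, the thumbnail comes out at 100 x 80 pixels and the largest access copy at 2000 x 1600.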
Image Storage and Presentation
• File naming
– Use a system to keep track of the multiple files associated with one source object:
• Original object
• Archival TIFF
• JPEGs (access and thumbnail)
• Backup/storage copy on CD or tape
• Print copy
– Link to description/metadata
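One way to build such a file naming system is to derive every file name from a single identifier for the source object, plus page and role codes. A minimal sketch; the identifier pattern (a collection prefix plus a zero-padded item number) and the role codes are hypothetical, not a published standard.

```python
# Hypothetical role codes and extensions for files derived from one source object
ROLES = {
    "master": ("m", "tif"),  # archival TIFF
    "access": ("a", "jpg"),  # access JPEG
    "thumb": ("t", "jpg"),   # thumbnail JPEG
}


def file_names(collection: str, item_number: int, page: int = 1) -> dict[str, str]:
    """Build related file names that all share one object identifier."""
    object_id = f"{collection}{item_number:05d}"
    return {
        role: f"{object_id}_p{page:03d}_{code}.{ext}"
        for role, (code, ext) in ROLES.items()
    }


print(file_names("nel", 42))
# {'master': 'nel00042_p001_m.tif', 'access': 'nel00042_p001_a.jpg', 'thumb': 'nel00042_p001_t.jpg'}
```

Because every name starts with the same object identifier, the metadata record can use that identifier as its link to all of the files.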
Object Components (21 files and counting…): 100-pixel GIF; 800-, 1400-, and 2000-pixel JPGs; TIFF; PDF; TEI; MrSID; AIFF; whole document and pages 1-4
Starting a new Family northwest of West Union, Nebraska.
http://memory.loc.gov/cgi-bin/displayPhoto.pl?path=/award/nbhips/lca/103&topImages=10358r.jpg&topLinks=10358v.jpg&displayProfile=0&title=Starting%20a%20new%20Family%20northwest%20of%20West%20Union,%20Nebraska.&m856s=$dnbhips$f10358&dir=ammem&itemLink=r?ammem/psbib:@field(DOCID+@lit(p10358))
New Insights
What is metadata?
http://www.flickr.com/photos/caterina/915384/sizes/o/
Why is metadata important?
• Legal issues
• Preservation
• System improvement and economics
http://www.flickr.com/photos/biwook/145765624/sizes/m/
Why is metadata UNimportant?
• Seven insurmountable obstacles to reliable metadata:
1. People lie
2. People are lazy
3. People are stupid
4. Mission Impossible: know thyself
5. Schemas aren't neutral
6. Metrics influence results
7. There's more than one way to describe something
Cory Doctorow - Metacrap
http://www.well.com/~doctorow/metacrap.htm
Metadata Types
• Descriptive
– What is it?
– Where is it?
– What is it about?
• Structural
– How many files are there?
– Which file is on page one?
• Administrative
– What do I need to know to manage it?
– Who can access it?
– What needs to be preserved?
• Technical
– What is the resolution of the image?
– What compression format was used?
http://www.flickr.com/photos/saltatempo/323462998/sizes/s/
Metadata Standards
• Metadata format standards
– XML
• Metadata element sets
– MARC, MODS, DC, EAD, TEI, ONIX
• Metadata content standards
– AACR/RDA, DACS, CCO
• Transmission standards and protocols
– OAI
• Controlled vocabularies / Thesauri
– LCSH, Getty Art and Architecture Thesaurus
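Element sets such as Dublin Core (DC) are commonly serialized as XML, the format standard listed above. A minimal sketch that builds a simple, unqualified DC record with Python's standard `xml.etree.ElementTree`; the namespace URI is the standard DC 1.1 namespace, while the wrapper element name `record` and the field values are assumptions for illustration (the title is borrowed from the Library of Congress example in this presentation, and the subject heading is a hypothetical LCSH-style string).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix


def dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a simple Dublin Core record; any DC element may repeat."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dc_record({
    "title": ["Starting a new Family northwest of West Union, Nebraska."],
    "type": ["Image"],
    "format": ["image/jpeg"],
    "subject": ["Frontier and pioneer life--Nebraska"],  # hypothetical heading
})
print(ET.tostring(record, encoding="unicode"))
```

The element set (DC) fixes which fields exist; a content standard and controlled vocabulary still decide how values such as the subject are written.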
Element Set Overview
Metadata Requirements
• Metadata requirements for project
– Determine metadata needs up front
– Documentation, guidelines, and training
– Consistency
• Constraints
– System
• OPAC = MARC
– Staff skills / training
Deciding on a scheme
It is very important to decide what the material is, what needs to be described, who it is intended for, how it will be retrieved, and how it will be processed and used before deciding on a scheme for its description.
- Dr. Peter Noerr, Digital Library Toolkit, Sun Microsystems
Metadata Content Standards
• In other words, rules for how we describe things
• May include punctuation, format, etc.
http://www.flickr.com/photo_zoom.gne?id=1252545857&size=m
Metadata Content Standards
• Rules and guidelines for metadata content
• Choice usually driven by type of content being described
– Anglo-American Cataloguing Rules (AACR)
– Describing Archives: A Content Standard (DACS)
– Cataloging Cultural Objects (CCO)
Relationships: content standard + element set
• AACR + MARC
• CCO + CDWA/VRA Core
• DACS + EAD
http://www.flickr.com/photo_zoom.gne?id=384440326&size=m
What data structure(s) do staff use to create metadata?
Metadata du Jour
• Description vs. discovery
– Full description is important for collection inventory and management, but less so for discovery
• Basic and shallow or deep and sophisticated?
– Basic discovery metadata supports broad, cross-domain searching that can lead users to more complete search mechanisms and descriptions
• Context
– Will your descriptions be adequate outside your institution’s environment?
Interoperability
• Allows different systems to make use of the same data
• Usually achieved by following standards
• In general, an increase in specialization results in a decrease in interoperability
• Important feature of metadata in today’s world
Interoperability
• The first two of the six core principles in the National Initiative for a Networked Cultural Heritage (NINCH) Guide to Good Practice:
1. Optimize interoperability
2. Enable broadest use
• IMLS Leadership Grant
– “Project design should demonstrate the use of existing standards and best practices for digital material where applicable, and products should be interoperable with digital content.”
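In practice, interoperability often means exposing metadata for harvesting over OAI-PMH (the OAI protocol listed under transmission standards), typically as unqualified Dublin Core (`oai_dc`). A minimal sketch that builds harvest request URLs; the repository base URL and function name are hypothetical, while `verb`, `metadataPrefix`, `from`, and `resumptionToken` are OAI-PMH 2.0 request arguments.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical endpoint


def list_records_url(metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when continuing
    an incomplete list it is sent alone with the verb.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # harvest only records changed since this date
    return f"{BASE_URL}?{urlencode(params)}"


print(list_records_url(from_date="2008-01-01"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2008-01-01
```

A harvester would fetch this URL, parse the XML response, and repeat with the returned resumption token until the list is complete.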
Shareable Metadata
• Six C’s:
– Content
– Consistency
– Coherence
– Context
– Communication
– Conformance
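Consistency and conformance are the easiest of the six to check automatically before sharing records. A minimal sketch, assuming a hypothetical local rule set: every record needs a title, date, and rights statement, and dates must be written as YYYY, YYYY-MM, or YYYY-MM-DD. The records are invented examples; a form such as “ca. 1890” may be perfectly valid under a content standard, and is flagged here only because the local rule requires machine-readable dates.

```python
import re

REQUIRED = ("title", "date", "rights")  # hypothetical local requirements
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")  # YYYY, YYYY-MM, or YYYY-MM-DD


def check_record(record: dict[str, str]) -> list[str]:
    """Return a list of consistency problems found in one metadata record."""
    problems = []
    for field in REQUIRED:
        if not record.get(field, "").strip():
            problems.append(f"missing {field}")
    date = record.get("date", "")
    if date and not ISO_DATE.match(date):
        problems.append(f"date not in YYYY[-MM[-DD]] form: {date!r}")
    return problems


records = [
    {"title": "Sod house", "date": "1887", "rights": "Public domain"},
    {"title": "Family portrait", "date": "ca. 1890", "rights": ""},
]
for r in records:
    print(r["title"], check_record(r))
```

Running checks like this across a whole collection catches inconsistencies that are invisible when records are reviewed one at a time.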
Information R/evolution
http://youtube.com/watch?v=-4CV05HyAbM
Technology
Technical Considerations
• Storage of metadata and digital files
• Database software
– Stores and organizes metadata for each digital file
– Includes link from metadata to resource
• Hardware
– Servers – storage and access
– Bandwidth
• User interface
– Usability testing
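The database role described above (storing metadata for each digital file, with a link from metadata to resource) can be sketched with Python's built-in `sqlite3` module. A minimal sketch; the table and column names and the sample values are hypothetical, and products such as CONTENTdm or DSpace use their own schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.executescript("""
CREATE TABLE item (
    object_id TEXT PRIMARY KEY,   -- identifier shared by all files of one source object
    title     TEXT NOT NULL,
    date      TEXT
);
CREATE TABLE file (
    object_id TEXT NOT NULL REFERENCES item(object_id),
    role      TEXT NOT NULL,      -- e.g. master, access, thumb
    location  TEXT NOT NULL       -- link from metadata to resource (path or URL)
);
""")

conn.execute("INSERT INTO item VALUES (?, ?, ?)", ("nel00042", "Sod house", "1887"))
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?)",
    [
        ("nel00042", "master", "/archive/nel00042_p001_m.tif"),
        ("nel00042", "access", "https://example.org/images/nel00042_p001_a.jpg"),
    ],
)

# Find the access copy to display alongside its descriptive metadata
row = conn.execute(
    """SELECT item.title, file.location FROM item
       JOIN file ON file.object_id = item.object_id
       WHERE file.role = 'access'"""
).fetchone()
print(row)
```

Keeping files in their own table lets one descriptive record point to any number of derivatives without duplicating metadata.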
Database Software
• Types
– Library automation software (ILS)
– Digital content management software
– Database software and Web tools
– Shared repository
Database Software
• Options
– “Off the shelf”
• CONTENTdm, Luna Insight, DigiTool, etc.
– Open source
• DSpace, Greenstone, Fedora
– Design your own
• Microsoft Access, MySQL
– Shared repositories
• Digital Commonwealth, Maine Memory
– Outsourced hosting
Database Software
• Which product is right for you?
• Considerations
– Functionality
• Meet goals for access to collections
– Software already in use at institution
– IT Dept recommendations / support
– Customization
– Cost
User Interface
• Intuitive
• Provide access to multiple file formats: PDF, HTML, Word
• Allow resource manipulation by user
• Ensure adequate information and options for appropriate use of the collection
Security?
Another Way
http://www.flickr.com/photos/chelmsfordpubliclibrary/sets/
Source: http://www.flickr.com/photo_zoom.gne?id=327122302&size=m
Contact Info: Erin Logsdon
Questions?