Gettingstartedwithdigitalcollectionsweb[1]

Getting Started with Digital Collections. Erin Logsdon, Consultant, Digital Solutions, NELINET, Inc.



Getting Started with Digital Collections

Erin Logsdon

Consultant, Digital Solutions

NELINET, Inc.


Details

• AM & PM breaks: 10:45 & 2:15

• Lunch: 12:00 to 1:00 PM

• Questions anytime


Introductions

• Name & organization/role

• What do you already know?

• What do you want to learn?


What is a Digital Library?


Define: Digital Library

“Digital libraries are organizations that provide the resources, including specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.” Digital Library Federation, Annual Report (1998-1999), 1.


Components

“Digital libraries are organizations that provide the resources, including specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities.” Digital Library Federation, Annual Report (1998-1999), 1.


Digitization ≠ Preservation


Six Methods of Digital Preservation

1. Technology preservation

2. Technology emulation

3. Data migration

4. Enduring care

5. Refreshing

6. Digital archaeology
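Refreshing (method 5) means copying files onto fresh storage media before the old media degrade, and a copy only counts if it is bit-for-bit identical to the original. A minimal sketch, assuming files on a local filesystem and SHA-256 as the checksum (the function names `sha256_of` and `refresh` are hypothetical), records a fixity value before copying and checks it afterwards:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(src: Path, dest_dir: Path) -> Path:
    """Copy src onto new storage and verify the copy is bit-for-bit identical."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    expected = sha256_of(src)          # fixity value recorded before copying
    shutil.copy2(src, dest)            # copy2 also preserves timestamps
    if sha256_of(dest) != expected:
        raise OSError(f"Fixity check failed: {src} -> {dest}")
    return dest
```

Keeping the checksum with the file lets each later refresh cycle be verified the same way. Migration (method 3) is different: converting to a new format changes the bits, so a new checksum has to be recorded after conversion.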


Why should we create a digital collection?


Sustainability


First Step


Audience


Stakeholders


What should we choose?


Selection Committee


Selection Criteria


Selection Process

Handbook for Digital Projects: A Management Tool for Preservation and Access (NEDCC)


Should, May, Can

• Should it be digitized?

• May it be digitized?

• Can it be digitized?

http://www.nedcc.org/resources/leaflets/6Reformatting/06PreservationAndSelection.php
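The three questions work as gates: an item goes forward only if every answer is yes, and the first "no" tells the committee what to resolve. A minimal sketch, assuming each question has already been reduced to a yes/no answer (the `Candidate` record and `selection_decision` function are hypothetical simplifications of what a committee would actually weigh):

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of a selection committee's answers for one item."""
    title: str
    should: bool  # value and demand justify digitizing it
    may: bool     # rights allow it (public domain, fair use, clearance)
    can: bool     # condition, format, staff, and equipment allow it


def selection_decision(c: Candidate) -> str:
    """Check the three questions in order and report the first that fails."""
    for question, answer in (("Should", c.should), ("May", c.may), ("Can", c.can)):
        if not answer:
            return f"Not yet: fails '{question} it be digitized?'"
    return "Digitize"
```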


Intellectual Property Rights

• Do you have the right rights?

– Public domain

– Fair use

– Obtain clearance from copyright holders

– Restrict access to comply with licensing and/or privacy stipulations

– Donor concerns

• Check with an expert

• See also:

– http://www.copyright.cornell.edu/public_domain/


Other Considerations

• Right of Publicity

• Right of Privacy

• Defamation: Libel and slander

• Obscenity and pornography

• Sensitivity to content

• Freedom of Information Act

• Linking


MONEY


Operational Costs


Organizational Costs


Staffing Costs


Breakdown

• About one third of the cost is digital conversion (32% overall)

• Slightly less than one third of the cost is metadata creation: cataloguing, description, and indexing (29% overall)

• Slightly more than one third of the cost is other activities, such as administration and quality control (39% overall)

From Robin Crumri, Indiana University-Purdue University, 2003
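These shares can be applied to a draft budget as a first rough split. A minimal sketch, assuming the 2003 percentages carry over to your project, which they may not, since costs vary considerably from project to project (`COST_SHARES` and `split_budget` are hypothetical names):

```python
# Shares of total cost from the 2003 breakdown above, in percent.
COST_SHARES = {
    "digital conversion": 32,
    "metadata creation": 29,
    "administration, QC, and other": 39,
}
assert sum(COST_SHARES.values()) == 100  # the three shares cover the whole budget


def split_budget(total: float) -> dict[str, float]:
    """Apportion a total budget across activities using the cited shares."""
    return {activity: total * pct / 100 for activity, pct in COST_SHARES.items()}
```

For a $50,000 budget this gives $16,000 for conversion, $14,500 for metadata, and $19,500 for administration, quality control, and other activities.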


Cost Factors

• Costs can vary considerably from project to project

– Size of collection / number of items

– Uniformity of collection

• Books, photos, newspaper articles, sound clips, videos

– Age and condition of originals

– Preparation of originals

– Descriptions/cataloging


Cost Factors

• Imaging requirements

– Illustrations

– Charts, tables

• Post-processing of digital files

• Metadata requirements

• Text conversion

– Optical Character Recognition

– Keying

• Markup/encoding costs (HTML, XML)

http://flickr.com/photos/cheesepicklescheese/419050330/sizes/m/


Sample Digitization Costs*

Item: suggested digitization specs; average unit cost per item

• Printed letter, B&W: 300 dpi, 1-bit B&W; $0.18

• Printed letter, color: 300 dpi, 8-bit color; $1.31

• Color photos, 5" x 4": 600 dpi, 24-bit color; $4.82

• 35 mm color slides: 2700 dpi, 24-bit color; $3.21

• Unmounted negative film, B&W: 2700 dpi, 8-bit grayscale; $2.34

*From: “Digitization: is it worth it?” by Stuart D. Lee in Computers in Libraries, vol. 21, no. 5, May 2001, pp. 28-31.
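Multiplying item counts by these unit costs gives a rough conversion estimate. A minimal sketch, assuming the 2001 averages above (current vendor quotes will differ, and by the cost breakdown cited earlier conversion is only about a third of total project cost); the item labels and function name are hypothetical:

```python
# Average 2001 unit costs (USD per item) from the Lee table above.
UNIT_COST = {
    "printed letter, B&W": 0.18,
    "printed letter, color": 1.31,
    "color photo, 5x4 in": 4.82,
    "35 mm color slide": 3.21,
    "negative film, B&W": 2.34,
}


def estimate_conversion_cost(counts: dict[str, int]) -> float:
    """Sum count x unit cost over item types, rounded to cents.

    Raises KeyError for an item type not in the table rather than guessing a price.
    """
    return round(sum(UNIT_COST[kind] * n for kind, n in counts.items()), 2)
```

For example, 1,000 B&W letters and 200 color slides come to 180 + 642 = $822 at 2001 prices, for conversion alone.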


http://www.clir.org/pubs/reports/pub103/appendix6.html


Funding Research

• Mission/goals of agency

• Geographic restrictions

• Subject focus

• Type of support (capital funds, research, programs, etc.)

• Type of institutions supported

• Populations served

• Communicate with potential funders

– Letter of inquiry/pre-proposal


Funding Trends


Outsourced vs. In-house


Acquire

1. Gather and prepare source materials

2. Digitally capture originals

3. Process images

4. Store files

5. Maintain files: quality control


Standards


Establish Quality Benchmarks


Image Processing

• Image capture

– Resolution

– Bit depth

– Color control

• File formats

– TIFF, GIF, JPEG, PDF ...

http://daily.stanford.edu/article/2003/5/22/robotHelpsToDigitizeLibrary


Image Processing: Resolution


Image Resolution - Low


Image Resolution - High(er)


Archival Images/Master Files

• Scanned at the highest possible resolution: 600 dpi or higher

• High resolution scans allow for multiple uses (print, zoom, etc.)

• Large file size

• Often stored on CDs, DVDs, external drives, etc.

• TIFF file format

• Maintain over time: refresh/migrate
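Why master files are large: an uncompressed image holds (width × dpi) × (height × dpi) pixels, and each pixel takes bit depth ÷ 8 bytes. A minimal sketch of that arithmetic (the function name is hypothetical; it ignores file headers and compression, so actual TIFF sizes will differ somewhat):

```python
def uncompressed_size_bytes(width_in: float, height_in: float, dpi: int, bit_depth: int) -> int:
    """Raw pixel data size: (width x dpi) x (height x dpi) pixels x bit depth / 8 bytes.

    Ignores file headers and any compression, so it is an approximation.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bit_depth // 8


letter = uncompressed_size_bytes(8.5, 11, 300, 1)   # 1-bit B&W page
photo = uncompressed_size_bytes(5, 4, 600, 24)      # 24-bit color photo
print(f"{letter / 1e6:.2f} MB, {photo / 1e6:.1f} MB")  # prints "1.05 MB, 21.6 MB"
```

Bit depth multiplies size as much as resolution does: the 24-bit photo needs 24 times the bytes per pixel of the 1-bit page.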


Derivative Images

• Access image (JPG, GIF, PNG, PDF)

– Smaller file size for display/delivery

• Compressed and reduced resolution

– Requires less disk space

– Faster download times

• Thumbnail (JPG, GIF, PNG)

– Even smaller files

– Reference image of sufficient quality to determine further usefulness
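Derivatives are often specified by a target length for the longer side, with the shorter side scaled to keep the aspect ratio. This minimal sketch computes only the target dimensions; the 100/800/1400/2000-pixel targets echo the object-components example in this deck, and the function names are hypothetical. The resampling itself would be done by an imaging library; Pillow's `Image.thumbnail()`, for instance, also preserves aspect ratio.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale so the longer side is max_side, preserving aspect ratio (never upscale)."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    return (max(1, round(width * max_side / longest)),
            max(1, round(height * max_side / longest)))


def derivative_sizes(width: int, height: int,
                     targets: tuple[int, ...] = (100, 800, 1400, 2000)) -> dict[int, tuple[int, int]]:
    """Pixel dimensions of each derivative, keyed by its longest-side target."""
    return {t: fit_within(width, height, t) for t in targets}


print(derivative_sizes(3000, 2400))
# {100: (100, 80), 800: (800, 640), 1400: (1400, 1120), 2000: (2000, 1600)}
```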


Image Storage and Presentation

• File naming

– Use a system to keep track of the multiple files associated with one source object:

• Original object

• Archival TIFF

• JPEGs (access and thumbnail)

• Backup/storage copy on CD or tape

• Print copy

– Link to description/metadata
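One common approach is to give every file derived from a source object the same base ID plus a role suffix, so any file can be traced back to its object and metadata record. The pattern below (`<collection>_<item number>_<role>.<extension>`) is a hypothetical convention for illustration, not a standard:

```python
from pathlib import PurePosixPath

# Hypothetical convention: <collection>_<item number>_<role>.<extension>
ROLES = {
    "master": "tif",   # archival master
    "access": "jpg",   # access derivative
    "thumb": "jpg",    # thumbnail
}


def filenames_for(collection: str, item_number: int) -> dict[str, str]:
    """One predictable filename per role, all sharing the same base ID."""
    base = f"{collection}_{item_number:05d}"
    return {role: f"{base}_{role}.{ext}" for role, ext in ROLES.items()}


def base_id(filename: str) -> str:
    """Recover the shared base ID (the key to the metadata record) from any file."""
    stem = PurePosixPath(filename).stem   # e.g. "hist_00042_thumb"
    return stem.rsplit("_", 1)[0]          # drop the role suffix


print(filenames_for("hist", 42))
# {'master': 'hist_00042_master.tif', 'access': 'hist_00042_access.jpg', 'thumb': 'hist_00042_thumb.jpg'}
```

Zero-padding the item number keeps files in order when sorted alphabetically, and splitting from the right means collection codes may themselves contain underscores.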


Object Components (21 Files and counting…)

• Whole document; Page 1, Page 2, Page 3, Page 4

• 100 pixel GIF, 800 pixel JPG, 1400 pixel JPG, 2000 pixel JPG

• TIFF, PDF, TEI, MrSID, AIFF


Starting a new Family northwest of West Union, Nebraska.

http://memory.loc.gov/cgi-bin/displayPhoto.pl?path=/award/nbhips/lca/103&topImages=10358r.jpg&topLinks=10358v.jpg&displayProfile=0&title=Starting%20a%20new%20Family%20northwest%20of%20West%20Union,%20Nebraska.&m856s=$dnbhips$f10358&dir=ammem&itemLink=r?ammem/psbib:@field(DOCID+@lit(p10358))


New Insights


What is metadata?

http://www.flickr.com/photos/caterina/915384/sizes/o/


Why is metadata important?

• Legal issues

• Preservation

• System improvement and economics

http://www.flickr.com/photos/biwook/145765624/sizes/m/


Why is metadata UNimportant?

• Seven insurmountable obstacles to reliable metadata:

1. People lie

2. People are lazy

3. People are stupid

4. Mission Impossible: know thyself

5. Schemas aren't neutral

6. Metrics influence results

7. There's more than one way to describe something

Cory Doctorow - Metacrap

http://www.well.com/~doctorow/metacrap.htm


Metadata Types

• Descriptive

– What is it?

– Where is it?

– What is it about?

• Structural

– How many files are there?

– Which file is on page one?

• Administrative

– What do I need to know to manage it?

– Who can access it?

– What needs to be preserved?

• Technical

– What is the resolution of the image?

– What compression format was used?

http://www.flickr.com/photos/saltatempo/323462998/sizes/s/
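The four types can be kept as separate groups within one item's record. A minimal sketch; the `ItemMetadata` class and every field name and value are hypothetical, chosen only to mirror the questions above:

```python
from dataclasses import dataclass, field


@dataclass
class ItemMetadata:
    """One item's metadata, grouped by type (all field names are illustrative)."""
    descriptive: dict[str, str] = field(default_factory=dict)     # what, where, about
    structural: list[str] = field(default_factory=list)           # files in page order
    administrative: dict[str, str] = field(default_factory=dict)  # rights, access, preservation
    technical: dict[str, str] = field(default_factory=dict)       # resolution, compression

    def file_for_page(self, page: int) -> str:
        """Structural lookup: which file holds a given 1-based page?"""
        return self.structural[page - 1]


item = ItemMetadata(
    descriptive={"title": "Town meeting minutes", "location": "Archives, Box 3"},
    structural=["minutes_p1.tif", "minutes_p2.tif"],
    administrative={"rights": "Public domain", "access": "Open"},
    technical={"resolution": "600 dpi", "compression": "none"},
)
print(len(item.structural), item.file_for_page(1))  # prints "2 minutes_p1.tif"
```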


Metadata Standards

• Metadata format standards

– XML

• Metadata element sets

– MARC, MODS, DC, EAD, TEI, ONIX

• Metadata content standards

– AACR/RDA, DACS, CCO

• Transmission standards and protocols

– OAI

• Controlled vocabularies/thesauri

– LCSH, Getty Art and Architecture Thesaurus
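To make the XML and DC entries concrete: simple Dublin Core is commonly exchanged as XML, and `oai_dc` is the format every OAI-PMH repository is required to support. A minimal sketch using Python's standard `xml.etree.ElementTree`; the title comes from the Library of Congress example in this deck, the subject values are hypothetical, and the `xsi:schemaLocation` attribute a production OAI record would carry is omitted:

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def dc_record(fields: dict[str, list[str]]) -> str:
    """Serialize simple Dublin Core elements (each repeatable) as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


print(dc_record({
    "title": ["Starting a new Family northwest of West Union, Nebraska."],
    "type": ["Image"],
    "subject": ["Families", "Nebraska"],   # hypothetical subject values
}))
```

Every Dublin Core element is optional and repeatable, which is why each value is a list.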


Element Set Overview


Metadata Requirements

• Metadata requirements for project

– Determine metadata needs up front

– Documentation, guidelines, and training

– Consistency

• Constraints

– System

• OPAC = MARC

– Staff skills / training
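Deciding requirements up front makes consistency checkable. A minimal sketch of a record audit, assuming a hypothetical project profile of four required elements (`REQUIRED`, `missing_fields`, and `audit` are illustrative names):

```python
# Hypothetical project profile: elements every record must have, agreed up front.
REQUIRED = ("title", "date", "rights", "identifier")


def missing_fields(record: dict[str, str]) -> list[str]:
    """Required elements that are absent or blank in one record."""
    return [name for name in REQUIRED if not record.get(name, "").strip()]


def audit(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each incomplete record (by identifier, else position) to its gaps."""
    problems: dict[str, list[str]] = {}
    for i, rec in enumerate(records):
        gaps = missing_fields(rec)
        if gaps:
            problems[rec.get("identifier") or f"record {i}"] = gaps
    return problems
```

Running a check like this as records are created catches gaps while the cataloguer still has the item in hand.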


Deciding on a scheme

“It is very important to decide what the material is, what needs to be described, who it is intended for, how it will be retrieved, and how it will be processed and used before deciding on a scheme for its description.”

Dr. Peter Noerr, Digital Library Toolkit, Sun Microsystems


Metadata Content Standards

• In other words, rules for how we describe things

• May include punctuation, format, etc.

http://www.flickr.com/photo_zoom.gne?id=1252545857&size=m


Metadata Content Standards

• Rules and guidelines for metadata content

• Choice usually driven by type of content being described

– Anglo-American Cataloguing Rules (AACR)

– Describing Archives: A Content Standard (DACS)

– Cataloging Cultural Objects (CCO)


Relationships: content standard + element set

• AACR + MARC

• CCO + CDWA/VRA Core

• DACS + EAD

http://www.flickr.com/photo_zoom.gne?id=384440326&size=m


What data structure(s) do staff use to create metadata?


Metadata du Jour

• Description vs. discovery

– Full description is important for collection inventory and management, less so for discovery

• Basic and shallow or deep and sophisticated?

– Basic discovery metadata supports broad, cross-domain searching that can lead users to more complete search mechanisms and descriptions

• Context

– Will your descriptions be adequate outside your institution’s environment?


Interoperability

• Allows different systems to make use of the same data

• Usually achieved by following standards

• In general, an increase in specialization results in a decrease in interoperability

• Important feature of metadata in today’s world
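The trade-off between specialization and interoperability shows up in a crosswalk: local fields with a shared equivalent carry over, and specialized ones do not. A minimal sketch mapping a hypothetical local schema to Dublin Core element names (the field names, values, and functions are illustrative, not from any particular system):

```python
# Hypothetical local schema -> Dublin Core element names.
CROSSWALK = {
    "item_title": "title",
    "photographer": "creator",
    "date_taken": "date",
    "topic": "subject",
    "rights_statement": "rights",
}


def to_dublin_core(local: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Map local fields to DC; also return the fields with no DC equivalent."""
    dc: dict[str, list[str]] = {}
    unmapped: list[str] = []
    for name, value in local.items():
        target = CROSSWALK.get(name)
        if target is None:
            unmapped.append(name)   # specialized detail that the shared view drops
        else:
            dc.setdefault(target, []).append(value)
    return dc, unmapped


dc, lost = to_dublin_core({
    "item_title": "Sod house",
    "photographer": "Unknown",
    "negative_condition": "Cracked emulsion",
})
print(dc, lost)  # {'title': ['Sod house'], 'creator': ['Unknown']} ['negative_condition']
```

The `negative_condition` field may matter a great deal locally, but no shared element receives it, which is the specialization cost in miniature.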


Interoperability

• The National Initiative for a Networked Cultural Heritage (NINCH) Guide to Good Practice lists these as the first two of its six core principles:

1. Optimize interoperability

2. Enable broadest use

• IMLS Leadership Grant

– “Project design should demonstrate the use of existing standards and best practices for digital material where applicable, and products should be interoperable with digital content.”


Shareable Metadata

• Six C’s:

– Content

– Consistency

– Coherence

– Context

– Communication

– Conformance


Technology


Technical Considerations

• Storage of metadata and digital files

• Database software

– Stores and organizes metadata for each digital file

– Includes link from metadata to resource

• Hardware

– Servers – storage and access

– Bandwidth

• User interface

– Usability testing
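For the "design your own" route, the core requirement (metadata for each item plus a link from that metadata to each file) fits in two tables. This minimal sketch uses Python's built-in `sqlite3` as a stand-in for the Access or MySQL option listed among the database options; the schema, function names, and example URL are hypothetical:

```python
import sqlite3
from typing import Optional

# Minimal hypothetical schema: descriptive metadata per item, plus its files.
SCHEMA = """
CREATE TABLE IF NOT EXISTS item (
    id     TEXT PRIMARY KEY,   -- shared base ID for the source object
    title  TEXT NOT NULL,
    rights TEXT
);
CREATE TABLE IF NOT EXISTS file (
    item_id  TEXT NOT NULL REFERENCES item(id),
    role     TEXT NOT NULL,    -- master, access, thumb, ...
    location TEXT NOT NULL,    -- path or URL: the link from metadata to resource
    PRIMARY KEY (item_id, role)
);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open a catalog database, creating the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def access_location(conn: sqlite3.Connection, item_id: str) -> Optional[str]:
    """Follow the link from an item's metadata to its access copy, if any."""
    row = conn.execute(
        "SELECT location FROM file WHERE item_id = ? AND role = 'access'", (item_id,)
    ).fetchone()
    return row[0] if row else None


conn = open_catalog()
conn.execute("INSERT INTO item VALUES (?, ?, ?)", ("hist_00042", "Town ledger", "Public domain"))
conn.execute("INSERT INTO file VALUES (?, ?, ?)",
             ("hist_00042", "access", "https://example.org/img/hist_00042_access.jpg"))
print(access_location(conn, "hist_00042"))
```

Keeping files in their own table lets one item carry any number of derivatives without changing the schema.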


Database Software

• Types

– Library automation software (ILS)

– Digital content management software

– Database software and Web tools

– Shared repository


Database Software

• Options

– “Off the shelf”

• CONTENTdm, Luna Insight, DigiTool, etc.

– Open source

• DSpace, Greenstone, Fedora

– Design your own

• Microsoft Access, MySQL

– Shared repositories

• Digital Commonwealth, Maine Memory

– Outsourced hosting


Database Software

• Which product is right for you?

• Considerations

– Functionality

• Meet goals for access to collections

– Software already in use at institution

– IT department recommendations/support

– Customization

– Cost


User Interface

• Intuitive

• Provide access to multiple file formats: PDF, HTML, Word

• Allow resource manipulation by user

• Ensure adequate information and options for appropriate use of the collection


Security?


Another Way

http://www.flickr.com/photos/chelmsfordpubliclibrary/sets/


Source: http://www.flickr.com/photo_zoom.gne?id=327122302&size=m

Contact Info: Erin Logsdon

Questions?