A Library Science Perspective on Digitization

Bryan Heidorn, University of Arizona


Page 1: A Library Science Perspective on Digitization

A Library Science Perspective on Digitization

Bryan Heidorn, University of Arizona

Page 2: A Library Science Perspective on Digitization

Library-Museum Parallels

• Intellectual Property Rights
• Physical/Digital Objects Sharing
• Descriptive Metadata Formats
• Preservation Metadata
• Transport Metadata Formats
• Communication Protocols (not so much)
• Similar Digitization Workflow
• OCR Challenges

Page 3: A Library Science Perspective on Digitization

Intellectual Property Rights

• Term expanded to 75 years in the US from 25
• Academic publishing anomalies
• Attribution required (for data, not so much)
• Decoupling of data from text

Page 4: A Library Science Perspective on Digitization

Online Computer Library Center (OCLC)

• Collaborative Automation of libraries including copy cataloging

• Started 1967
• Catalog 271 million items/year
• 72,000 libraries in 170 countries and territories use OCLC services to locate, acquire, catalog, lend and preserve library materials.

Page 5: A Library Science Perspective on Digitization

Descriptive Metadata Formats

• MARC(XML) 21 Standard
• METS
• Dublin Core (Interchange Format only)

Page 6: A Library Science Perspective on Digitization

Biodiversity Heritage Library Workflow

Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

Page 7: A Library Science Perspective on Digitization
Page 8: A Library Science Perspective on Digitization

MARC 21 Standard

• Formats: Bibliographic, Authority, Holdings, Classification, Community

• Bibliographic Material Types:
  – Books (BK)
  – Continuing resources (CR)
  – Computer files (CF)
  – Maps (MP)
  – Music (MU)
  – Visual materials (VM)
  – Mixed materials (MX)

http://www.loc.gov/marc/

Page 9: A Library Science Perspective on Digitization

MARC Fields

• 00X: Control Fields
• 01X-09X: Numbers and Code Fields
• Heading Fields - General Information
• 1XX: Main Entry Fields
• 20X-24X: Title and Title-Related Fields
• 25X-28X: Edition, Imprint, Etc. Fields
• 3XX: Physical Description, Etc. Fields
• 4XX: Series Statement Fields
• 5XX: Note Fields
• 6XX: Subject Access Fields
• 70X-75X: Added Entry Fields
• 76X-78X: Linking Entry Fields
• 80X-83X: Series Added Entry Fields
• 841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
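The tag ranges listed on this slide can be expressed as a small lookup table. The following is a minimal Python sketch, not part of any MARC library: the table name `MARC_FIELD_GROUPS` and the function `marc_field_group` are made up for illustration, and tags that fall outside the listed ranges are simply reported as unassigned.

```python
# Tag ranges copied from the slide; the names below are illustrative only.
MARC_FIELD_GROUPS = [
    (1, 9, "Control Fields"),
    (10, 99, "Numbers and Code Fields"),
    (100, 199, "Main Entry Fields"),
    (200, 249, "Title and Title-Related Fields"),
    (250, 289, "Edition, Imprint, Etc. Fields"),
    (300, 399, "Physical Description, Etc. Fields"),
    (400, 499, "Series Statement Fields"),
    (500, 599, "Note Fields"),
    (600, 699, "Subject Access Fields"),
    (700, 759, "Added Entry Fields"),
    (760, 789, "Linking Entry Fields"),
    (800, 839, "Series Added Entry Fields"),
    (841, 889, "Holdings, Location, Alternate Graphics, Etc. Fields"),
]


def marc_field_group(tag: str) -> str:
    """Return the field group for a three-digit MARC tag such as '245'."""
    number = int(tag)
    for low, high, name in MARC_FIELD_GROUPS:
        if low <= number <= high:
            return name
    # Tags outside the ranges on the slide (for example 840) land here.
    return "Unassigned"


if __name__ == "__main__":
    for tag in ("008", "245", "650"):
        print(tag, marc_field_group(tag))
```

A table of ranges keeps the mapping readable next to the slide; a real cataloging tool would more likely rely on the field definitions published by the Library of Congress.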

Page 10: A Library Science Perspective on Digitization

MARC Book Example

Leader/00-23 *****nam##22*****#a#4500
001 <control number>
003 <control number identifier>
005 19920331092212.7
007/00-01 ta
008/00-39 820305s1991####nyu###########001#0#eng##
020 ##$a0845348116 :$c$29.95 (£19.50 U.K.)
020 ##$a0845348205 (pbk.)
040 ##$a[organization code]$c[organization code]
050 14$aPN1992.8.S4$bT47 1991
082 04$a791.45/75/0973$219
100 1#$aTerrace, Vincent,$d1948-
245 10$aFifty years of television :$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.
246 1#$a50 years of television
260 ##$aNew York :$bCornwall Books,$cc1991.
300 ##$a864 p. ;$c24 cm.
500 ##$aIncludes index.
650 #0$aTelevision pilot programs$zUnited States$vCatalogs.
650 #0$aTelevision serials$zUnited States$vCatalogs.
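To make the notation in the example concrete, here is a hedged Python sketch that splits one variable-field line into its tag, its two indicators and its subfields. It assumes the display convention used in the example (`#` for a blank indicator, `$` followed by a one-character code for each subfield); `parse_marc_line` is a hypothetical helper, not a function from any MARC library. It is deliberately naive: a literal dollar sign in the data, as in the price in field 020, would be misread as a subfield code.

```python
def parse_marc_line(line: str) -> dict:
    """Split a displayed MARC variable field, e.g. '245 10$a...$b...$c...'.

    Assumes: three-character tag, one space, two indicator characters,
    then subfields introduced by '$' plus a one-character code.
    """
    tag = line[:3]
    indicators = line[4:6]
    body = line[6:]
    subfields = []
    # The text before the first '$' is empty, so skip the first chunk.
    for chunk in body.split("$")[1:]:
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


if __name__ == "__main__":
    title = ("245 10$aFifty years of television :"
             "$ba guide to series and pilots, 1937-1988 /$cVincent Terrace.")
    parsed = parse_marc_line(title)
    print(parsed["tag"], parsed["indicators"])
    for code, value in parsed["subfields"]:
        print(" ", code, value)
```

In the binary exchange format the subfield delimiter is a control character rather than `$`, so the ambiguity with prices does not arise there.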

Page 11: A Library Science Perspective on Digitization

Difference between Museum and Library

• Full Darwin Core has parallels in MARC
• Many more commercial and custom products
• Larger installed base
• Library entries somewhat more detailed
• There is a MARC(XML) and MARC Lite
• MARC differentiates among material types

Page 12: A Library Science Perspective on Digitization

Digital Content Transport

• METS – Metadata Encoding and Transmission Standard

• The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language.

Page 13: A Library Science Perspective on Digitization

Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

Page 14: A Library Science Perspective on Digitization

METS Components

• METS Header
• Descriptive Metadata
• Administrative Metadata
• File Section - The file section lists all files containing content which comprise the electronic versions of the digital object. <file> elements may be grouped within <fileGrp> elements, to provide for subdividing the files by object version.
• Structural Map
• Structural Links
• Behavior
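The components listed above map onto top-level elements of a METS document. The sketch below builds a bare skeleton with Python's standard `xml.etree.ElementTree`, showing how `<file>` elements sit inside a `<fileGrp>` and how the structural map points back to them. It is an illustration under assumptions: the object identifier, the file names and the `USE="master"` value are invented, the structural links and behavior sections are omitted, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_mets(object_id, page_files):
    """Return a skeleton <mets> element for a scanned, paged object."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)

    def q(name):
        # Qualify an element name with the METS namespace.
        return f"{{{METS_NS}}}{name}"

    root = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(root, q("metsHdr"))                 # METS Header
    ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})  # Descriptive Metadata
    ET.SubElement(root, q("amdSec"), {"ID": "AMD1"})  # Administrative Metadata
    file_sec = ET.SubElement(root, q("fileSec"))      # File Section
    group = ET.SubElement(file_sec, q("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, q("div"), {"TYPE": "book"})

    for order, href in enumerate(page_files, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(group, q("file"), {"ID": file_id})
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # Each page division in the structural map points at its file.
        page = ET.SubElement(book, q("div"),
                             {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return root


if __name__ == "__main__":
    mets = build_mets("example-object", ["page0001.tif", "page0002.tif"])
    print(ET.tostring(mets, encoding="unicode"))
```

Grouping the page images in one `<fileGrp>` mirrors the slide's point about subdividing files by object version; a second group could hold derivatives such as OCR text.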

Page 15: A Library Science Perspective on Digitization

I/O

• Submission Information Package (SIP), which is sent from the information producer to the archive;

• the Archival Information Package (AIP), which is the information package actually stored by the archive; and

• the Dissemination Information Package (DIP), which is the information package transferred from the archive in response to a request by a consumer.
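The flow between the three package types can be pictured as two steps: ingest turns a SIP into an AIP, and a consumer request turns an AIP into a DIP. The Python sketch below is only a toy model of that flow. The class `InformationPackage`, the functions `ingest` and `disseminate`, and the metadata keys are all hypothetical and do not come from any archive software.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    kind: str                      # "SIP", "AIP" or "DIP"
    content: dict                  # file name -> bytes
    metadata: dict = field(default_factory=dict)


def ingest(sip: InformationPackage) -> InformationPackage:
    """Archive side: accept a SIP and store it as an AIP."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    # The archive adds its own (here invented) preservation metadata.
    metadata = {**sip.metadata, "preservation": {"derived_from": "SIP"}}
    return InformationPackage("AIP", dict(sip.content), metadata)


def disseminate(aip: InformationPackage, wanted: set) -> InformationPackage:
    """Archive side: answer a consumer request with a DIP."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    content = {name: data for name, data in aip.content.items()
               if name in wanted}
    return InformationPackage("DIP", content, dict(aip.metadata))


if __name__ == "__main__":
    sip = InformationPackage("SIP", {"page1.tif": b"", "page2.tif": b""},
                             {"title": "Example volume"})
    aip = ingest(sip)
    dip = disseminate(aip, {"page1.tif"})
    print(aip.kind, sorted(aip.content))
    print(dip.kind, sorted(dip.content))
```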

Page 16: A Library Science Perspective on Digitization

Courtesy: Martin Kalfatovic, Program Director, Biodiversity Heritage Library, Smithsonian Institution Libraries

Page 17: A Library Science Perspective on Digitization

Open Archives Initiative Protocol for Metadata Harvesting

• The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.

Page 18: A Library Science Perspective on Digitization

OAI Verbs

• GetRecord
• Identify
• ListIdentifiers
• ListMetadataFormats
• ListRecords
• ListSets

Page 19: A Library Science Perspective on Digitization

GetRecord

• http://arXiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
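A request like the one above is just a base URL plus a `verb` parameter and that verb's arguments. The following Python sketch assembles such a URL with the standard library and rejects anything that is not one of the six OAI-PMH verbs. The function name `oai_request_url` is illustrative, and the sketch only builds the string; it does not contact any repository.

```python
from urllib.parse import urlencode

OAI_VERBS = {
    "GetRecord", "Identify", "ListIdentifiers",
    "ListMetadataFormats", "ListRecords", "ListSets",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for one of the six verbs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # Keep ':' and '/' readable so identifiers look as they do on the slide.
    query = urlencode({"verb": verb, **arguments}, safe=":/")
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(oai_request_url(
        "http://arXiv.org/oai2",
        "GetRecord",
        identifier="oai:arXiv.org:cs/0112017",
        metadataPrefix="oai_dc",
    ))
```

The same helper covers the other verbs, for example `oai_request_url(base, "ListRecords", metadataPrefix="oai_dc")`.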

Page 20: A Library Science Perspective on Digitization

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017" metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
          <dc:creator>Dushay, Naomi</dc:creator>
          <dc:subject>Digital Libraries</dc:subject>
          <dc:description>With the increasing technical sophistication of both information consumers and providers, there is increasing demand for more meaningful experiences of digital information. We present a framework that separates digital object experience, or rendering, from digital object storage and manipulation, so the rendering can be tailored to particular communities of users. </dc:description>
          <dc:description>Comment: 23 pages including 2 appendices, 8 figures</dc:description>
          <dc:date>2001-12-14</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
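A harvester reading a response like this one mostly needs the header fields and a few Dublin Core elements. The Python sketch below pulls them out with the standard `xml.etree.ElementTree`, using the three namespaces that appear in the response. The function name `summarize_record` and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the GetRecord response.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_record(xml_text: str) -> dict:
    """Extract header and basic Dublin Core fields from a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    if record is None:
        raise ValueError("response contains no record")
    header = record.find("oai:header", NS)
    dc = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [e.text for e in header.findall("oai:setSpec", NS)],
        "title": dc.findtext("dc:title", namespaces=NS),
        "creators": [e.text for e in dc.findall("dc:creator", NS)],
    }


if __name__ == "__main__":
    # A trimmed copy of the response, kept short for the demonstration.
    sample = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><GetRecord><record>'
        "<header><identifier>oai:arXiv.org:cs/0112017</identifier>"
        "<datestamp>2001-12-14</datestamp><setSpec>cs</setSpec></header>"
        '<metadata><oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>"
        "<dc:creator>Dushay, Naomi</dc:creator>"
        "</oai_dc:dc></metadata></record></GetRecord></OAI-PMH>"
    )
    print(summarize_record(sample))
```

Because every element in the response is namespace-qualified, the lookups pass the prefix map explicitly; searching for plain `title` or `record` would find nothing.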

Page 21: A Library Science Perspective on Digitization

Metadata Collection and Workflow (Macaw)

Page 22: A Library Science Perspective on Digitization

Physical/Digital Objects Sharing

• Books both part of an Edition and Unique
• 20th century books have standard front matter
• LMS contained Metadata Only
• Journals indexed by article
• Most digital content is commercially owned and born digital
• 2011 author-publishing exceeded commercial
• Born analog digitization (Google Books and BHL)

Page 23: A Library Science Perspective on Digitization

Governance

• Libraries pay for OCLC
• OCLC is Participatory
• Close Collaboration with Library of Congress on Standards
• School System exists to train librarians
• Libraries are being cut in academic, public and school sectors