Danielle Mericle [email protected] October 7, 2010 Digital Stewardship & Preservation.

61
Danielle Mericle [email protected] October 7, 2010 Digital Stewardship & Preservation

Transcript of Danielle Mericle [email protected] October 7, 2010 Digital Stewardship & Preservation.

Page 1: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Danielle Mericle [email protected]

October 7, 2010

Digital Stewardship & Preservation

Page 2: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Learning GoalsUnderstand basic strategies of digital

stewardship, both technical and institutional

Identify important metadata components of a well-managed collection

Understand the basics of  the PREMIS metadata model

Understand the basics of a preservation repository (OAIS)

Assess your own institution with the goal of implementing a preservation plan & policy

Page 3: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Course outline

9:30-10:45 Part I: Digital stewardship basics, technical and institutional strategies

11-12 Part II: Assessing your own institution(exercises)

1-2 Part III: Framework, metadata, and more

2:15-3:15 Part IV: Implementation strategies

Page 4: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Part I The Basics of Digital Stewardship

Page 5: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

What is Digital Preservation?

“A whole range of activities designed to extend the usable life of digital information and protect them from media failure, physical loss, and obsolescence.” —Libraries & Archives Canada

“Digital preservation combines policies, strategies and actions that ensure access to information in digital formats over time”—ALA working definition

Page 6: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Preservation versus backup

Back-up is the “periodic capture of information to guard against system or component failure or against accidental or deliberate corruption of the system or system metadata.” —Trustworthy Repositories, Center for Research Libraries, March 2007

Backing up is not the same as digital stewardship

Page 7: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Goals of preservation activities

Creation of digital objects that are, over time, -

– Authentic– Renderable– Understandable– Viable

Establishment of sustainable preservation system & accompanying institutional policies

Page 8: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Digital preservation strategies- technicalStandardization

File formats / FilenamingMinimum metadata requirements

Basic metadata for preservationCopying & replicationRefreshingMigration / Normalization / Localization Data recoveryEquipment / technologyEmulation Storage & backup

Page 9: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Standardization

Benchmarking strategiesUse known, widely adopted file-formats vetted

by community, preferably non-proprietary(http://www.digitalpreservation.gov/formats/)

Establish consistent naming convention throughout collections

Establish minimum metadata requirementsTechnicalAdministrativeDescriptivePreservation

Page 10: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Preservation metadata

Provenance: for authentication and a documented history of the file’s contents

Context: why the data was created, how it relates to other data

Reference identifiers: ISBN, accession number, etc. to demonstrate the relationship between the digital file and any physical holding you have

Technical: to describe the technology environment used to create the digital objects and suggest how the files might be read/used

Page 11: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Bitstream copying & replicationBitstream copying: the making of an exact

duplicate of a digital object

Replication: keeping many copies of the same digital object, preserving copies variously, with the hope that one of them will still be viable when it is needed

Combined copying, replication, and metadata form baseline preservation strategy

Page 12: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

RefreshingRefreshing is to copy digital information

from one long-term storage medium to another of the same type with no change whatsoever in the bitstream.

Example- Reburning CD’s in collection every three years

Page 13: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

MigrationWhen a format is at risk for obsolescence,

migration will convert data from one technology or format to another while preserving the essential characteristics of the data.

Some inherent challenges to migrating data- always do thorough quality analysis of material (preferably in an automated fashion)

Page 14: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

NormalizationWhen a file is in a less than optimal format

for preservation, a “normalized” version will be created in a preservation-worthy format, to be archived along with original

Example- PDF broken into page-level TIFFS (note, some functionality lost, such as hyperlinks, etc)

Page 15: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Localization When a file contains links to other files, a

localized version of file will be downloaded and maintained in the repository

Example- a link to an XML file with a metadata schema definition

Guarantees that references between files can always be resolved

Page 16: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Data recoveryData Recovery is rescuing content from

damaged media or hardware

Usually performed by commercial vendors prepared for broken CDs and other critical damage

This is an emergency recovery strategy ONLY!

Page 17: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Technology preservationTechnology Preservation is to preserve the

historic technological environment-- equipment, software, operating systems

Requires space / resource allocation

Page 18: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

EmulationEmulation combines software and

hardware to reproduce the essential characteristics of a different computer so that media designed for one environment can be used in another one.

Can be quite challenging and not fully represent original object

Page 19: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Storage strategies

Optical Media (CD’s & DVD’s)Magnetic tapeExternal driveRAID (redundant disk arrays)Consortium (LOCCKS, Meta-archive)Commercial (Amazon, OCLC)

Page 20: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Storage strategies

Always maintain at least two versions of master images in different geographic locations

Distinguish your masters from your access files- severely limit access to master images

When using localized media (disks, external drives), use common-sense handling- minimize dust, jarring, temperature fluctuations, magnets, UV, etc

Page 21: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

1) Start early in the project/program (when possible)2) Create extensive documentation3) Flexible design- have components separated out for

easy migration, repurposing, so that each component can be updated, altered or removed without interfering with another part of the system.

4) Understand minimum functional requirements and cost/benefit of option to convert in future iterations

Other technical strategies

Page 22: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Digital preservation activities- institutional

Develop preservation policy & plan

Establish digital rights management, including access provisions

Educate community

Secure institutional commitment

Page 23: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Clearly articulates roles & responsibilities among organization (who does what) Maintains & migrates data (back-end) Ensures ongoing access to data (front-end) Monitors & ensures ongoing financial support for

preservation

Defines scope/length of preservation Short/medium/long-term preservation models (may

vary according to data-type) Rate of refreshment Redundancy

Defines ultimate accountability for program

A preservation policy:

Page 24: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Short-term Access to digital materials either for a defined

period of time while use is predicted but which does not extend beyond the foreseeable future and/or until it becomes inaccessible because of changes in technology.

Medium-term Continued access to digital materials beyond

changes in technology for a defined period of time but not indefinitely.

Long-term Continued access to digital materials, or at least to

the information contained in them, indefinitely.

Types of preservation:

Page 25: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Align policy with goals and mission of institution

Include all key stakeholders in policy creation

Couple with other relevant policies, such as IP & IT standards (including benchmarking for minimum digitization requirements)

Should be documented/dated/signed policy by all stakeholders

Preservation policy considerations

Page 26: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Relevant policies Selection- content types, scope, etc Roles & responsibilities Institutional support/funding Format types, at appropriate benchmarks Metadata types, minimum requirements Migration, refreshment strategies Storage, including back-up / redundancy Copyright compliance Access and permissions, both for ingest and

download

All within an integrated technical architecture

Preservation plan

Page 27: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Digital Rights Management

•Acquire and maintain contractual and legal rights and responsibilities

•Includes, but not limited to:•Agreements with faculty regarding preservation and use of contributed items •Understanding of copyright law and fair use for preservation and education•Access provisions to content in preservation repository

Page 28: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Education

•Develop educational program for faculty, staff, and administrators to encourage widespread adoption of best practices and establish buy-in for importance of digital preservation

•Cover basic topics on creating preservation-worthy content, including scanning & metadata creation; copyright law; repository functionality

•Include overview of policies set by institution

Page 29: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Institutional commitment

Recognition of value/ importance by key decision makers

Financial support

Plan or program for responsible digital stewardship

Policies in place for preservation; selection criteria; collection development

Page 30: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

“Framing the benefits from preservation in ways that emphasize outcome rather than process helps place

the cost/benefit analysis underpinning digital preservation in its proper perspective.”

(Sustaining Digital Resources, JISC, 2009)

Outcomes:Maintaining current access to digitized contentServing broad user baseInstitutional recognition Future possibilities for new use and innovationReducing costs over the long-term

Making your case…

Page 31: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Part IIInstitutional Review & Policy Development

Page 32: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Collection Analysis

•Size of existing collection: number of items and storage requirements

•Format of objects

•Relationships between objects

•Anticipated growth rate- existing and new collections

•Existing metadata

•Copyright/ access restrictions

•Search functionality for objects

•Vulnerabilities

Page 33: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Policy analysis

•Who coordinates & approves preservation policies? Copyright compliance?

•Who creates and manages content?

•What content will be retained and for how long?

•Who can submit to the repository?

•Who can download or view content?

•What metadata standards are used?

•Who owns the content?

•How is copyright and IP controlled?

Page 34: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Part IIIRepository framework, metadata, and more…

Page 35: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Pieces of the puzzleDark archive / Light archiveObject data modelMETS metadata modelMIX metadata model PREMIS metadata modelRepository “packages” - SIP /AIP/ DIP OAIS framework

Page 36: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Repository by typeA dark archive cannot be accessed by

any users. Access to the data is either limited to a set few individuals or completely restricted to all.

A light or open archive can be accessed by many authorized users. Access to the data is open to all the members of the "community" that have a need for the data.

Page 37: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Object Data Model Composition of your digital object, may

include:

data file (PDF, TIFF, etc)Bitstream: a sequence of bits with common

attributes for preservation purposes; a data file may have one or more bitstreams embedded (AVI file contains audio and video)

Intellectual entity: coherent set of content that is described as unit (a book, a map, a photograph)

Persistent, unique identifier for object (PID)

Page 38: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

METS metadata model (metadata encoding & transmission standard)

“The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.”

Can be used in tandem with other metadata schemas, such as Dublin Core, VRA (for descriptive), and Premis (for preservation)

Commonly used in preservation repositories to describe digital objects

Page 39: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Seven sections of METS 1.METS Header - The METS Header contains metadata describing the METS document itself,

including such information as creator, editor, etc.

2. Descriptive Metadata - The descriptive metadata section may point to descriptive metadata external to the METS document (e.g., a MARC record in an OPAC or an EAD finding aid maintained on a WWW server), or contain internally embedded descriptive metadata, or both.

3. Administrative Metadata - The administrative metadata section provides information regarding how the files were created and stored, intellectual property rights, metadata regarding the original source object from which the digital library object derives, and information regarding the provenance of the files comprising the digital library object (i.e., master/derivative file relationships, and migration/transformation information

4. File Section - The file section lists all files containing content which comprise the electronic versions of the digital object. <file> elements may be grouped within <fileGrp> elements, to provide for subdividing the files by object version.

5. Structural Map - The structural map is the heart of a METS document. It outlines a hierarchical structure for the digital library object, and links the elements of that structure to content files and metadata that pertain to each element.

6. Structural Links - The Structural Links section of METS allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map. This is of particular value in using METS to archive Websites.

7. Behavior - A behavior section can be used to associate executable behaviors with content in the METS object. Each behavior within a behavior section has an interface definition element that represents an abstract definition of the set of behaviors represented by a particular behavior section. Each behavior also has a mechanism element which identifies a module of executable code that implements and runs the behaviors defined abstractly by the interface definition.

Page 40: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

MIX “MIX is an XML schema for a set of technical data elements required to manage digital image collections. The schema provides a format for interchange and/or storage of the data specified in the Data Dictionary - Technical Metadata for Digital Still Images (ANSI/NISO Z39.87-2006). This schema is currently referred to as ‘NISO Metadata for Images in XML (NISO MIX)’.”

Metadata For Images in XML An XML Schema designed for expressing technical metadata for digital still

images Based on the NISO Z39.87 Data Dictionary – Technical Metadata for Digital

Still Images Used to express attributes of digital images such as file format, file size,

dimensions, resolution, compression, etc. Version 1.0 (recently released) includes support for GIS images and JPEG

2000 images; data element names harmonized with PREMIS Can be used standalone or as an extension schema with METS JHOVE tool can automatically extract much of data

Page 41: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

PREMIS metadata model

A data dictionary for metadata to support the long-term preservation of digital objects

A supporting set of XML schema for implementation in a variety of contexts

Please note- many of the fields are meant to be automatically generated from repository system

Page 42: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

PREMISProvenance:

Who has had custody/ownership of the digital object?

Authenticity:Is the digital object what it purports to be?

Preservation Activity:What has been done to preserve the digital object?

Technical Environment:What is needed to render and use the digital object?

Rights Management:What IPR must be observed?

Page 43: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

PREMIS data model

Page 44: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

PREMIS objects a unique identifier for the object (type and value) fixity information such as a checksum (message digest) and

the algorithm used to derive it the size of the object the format of the object, which can be specified directly or by

linking to a format registry the original name of the object information about its creation information about inhibitors (passwords or encryption, for

example) information about its significant properties information about its environment (see below)

where and on what medium it is stored relationships with other objects and other types of entities

Page 45: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

PREMIS eventsa unique identifier for the event (type and

value)the type of event (creation, ingestion,

migration, etc.)the date and time the event occurreda detailed description of the eventa coded outcome of the event a more detailed description of the outcomeagents involved in the event and their roles objects involved in the event and their roles

Page 46: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Preservation metadata in context

Page 47: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Using the standards together A container format such as METS allows for packaging together forms

of metadata with objects or pointers to objects METS profiles detail how METS is used for particular object types or

applications Best practices are needed (and being developed) for use of PREMIS

with METS and MIX Using METS, MODS, PREMIS and MIX:

http://www.loc.gov/premis/louis.xml

Page 48: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Repository packagesSubmission Information Package (SIP):

metadata and content data files to be ingested in repository

Archival Information Package (AIP): ingested SIP that may have been normalized/localized for long-term storage

Dissemination Information Package (DIP): Export package of AIP, including “last, best” version of any file

Page 49: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Open Archival Information System

“An archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community. It meets a set of responsibilities that allows an OAIS archive to be distinguished from other uses of the term ‘archive’.”

Provides a standardized framework for the management and long-term preservation of content

“Open” refers to “open-source” development, and not open access

Options for implementation- inhouse, partnership, outsourced

Page 50: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Open Archival Information System

Page 51: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

OAIS dataflow overview Ingest functions

Receive SIPS; automated qa; generate AIPS; populate archive database with SIP information; localize, normalize, migrate

Archival storage functions Receive AIPS and add to permanent storage; create

redundant copies; perform error checking; refresh media; provide disaster recovery; provide AIPS for access (DIPs)

Database management functions Perform database updates; perform search queries on

archive; produce reports; maintain metadata schema and referential integrity

Administrative functions Soliciting & negotiating submission agreements;

auditing SIPS; developing standards and policies

Page 52: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Dataflow

Page 53: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Part IVImplementation

Page 54: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Develop relevant policies Determine selection- content types, scope, etc Define roles & responsibilities Secure institutional support/funding Benchmark

Format types and settings Metadata types, minimum requirements

Migration, refreshment strategies Storage, including back-up / redundancy Copyright compliance, access and permissions,

both for ingest and download Determine your solution for integrated technical

architecture

Preservation plan

Page 55: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Different strategiesBaseline – inhouse

Minimum requirementsOutsourced - OCLCPartnerships / consortiums

CLOCKSSMeta-archive

Full scale development- inhouse based on existing open-source softwareDAITSSFedora

Page 56: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Baseline preservation requirementsStandardized file formats & naming

conventionsMaintenance of “master” images separate from

“access” imagesPlan for refreshing and/or migrating dataMetadata capture and management

Technical- device; color management info; date; operator; format; inhibitors

Descriptive- any bib info; existing or new data Administrative- project info; provenance; history

Redundant storage Preferred geographic separation and media variety

Page 57: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

OCLCOutsourced digital archive- managed

storage; monitoring & reports; integrated workflows

Works with CONTENTdm; Fedora; D-Space Supports mandatory PREMIS informationCan support in file-migration /

normalizationSomewhat expensive

Page 58: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

LOCKSS/ CLOCKSS (Controlled)/ Lots of Copies Keep Stuff Safe LOCKSS can preserve local collections, including thesis, images,

AND subscription content from participating publishers. CLOCKSS is primarily for publisher generated content. Both are geared towards newly generated content and born digital.

Distributed preservation model- storage nodes exist throughout world

Provides an OAIS-compliant, open source, peer-to-peer, decentralized digital preservation infrastructure. It is format-agnostic, preserving all formats and genres of web-published content, provided the content has an authoritative version. The intellectual content, which includes the historical context (the look and feel), is preserved. Content preserved by libraries in their LOCKSS Box becomes a part of their collection, and they have perpetual access to all of it.

Annual fee

Page 59: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Meta-archiveCollaborative, community led initiative to provide low-

cost, high-impact preservation services to help ensure the long-term accessibility of the digital assets of universities, libraries, museums, and other cultural heritage institutions

Using a technical framework that is based on the LOCKSS (Lots of Copies Keep Stuff Safe) software, collections are ingested into a geographically distributed network where they are stored on secure file servers in multiple locations. These servers do not merely back up the materials, but rather provide a dynamic means of constantly checking each file and providing repairs whenever necessary.

Requires programmer/ hardware commitment Annual fee

Page 60: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

DAITSS, Fedora, and Planets

Skeletal preservation repository application as open source software

Dark Archive (DAITSS) / optional (Fedora)Supports full OAIS functional model & METS

metadata with PREMIS compliancyRequires robust inhouse programming &

hardware

Page 61: Danielle Mericle dkm26@cornell.edu October 7, 2010 Digital Stewardship & Preservation.

Contact [email protected]