Preservation Rumination Priscilla Caplan, FCLA OCLC DSS February 16, 2005.

Preservation Rumination

Priscilla Caplan, FCLA

OCLC DSS, February 16, 2005

Preservation Basics

THE NEED FOR DIGITAL PRESERVATION

Number of academic/scholarly journals published online: 15,757

Percent of U.S. federal government publications produced only online in 2003: 65 percent

Estimated percent of U.S. federal government publications available only online by 2008: 90 percent

From: California Digital Library, http://www.cdlib.org/inside/projects/preservation

The problem of abundance

[Chart: items (millions), LoC vs. Web; axis from 0 to 4,500]

• Percent of web-based references in scientific articles from 3 major journals inaccessible within 2 years of publication: 21%

• Proportion of websites in 1998 gone in 1999: 44%

• Life of an average website: 44 days

The problem of ephemerality

The problems of media life expectancy and obsolescence

The problem of format obsolescence

[Diagram: preservation strategies arranged along two axes. OBJECTIVE runs from Preserve Technology to Preserve Objects; APPLICABILITY runs from Specific to General.]

Strategies shown: Maintain original technology; Programmable Chips; Emulation; Viewer; Re-engineer Software; Virtual Machine; Universal Virtual Computer; Version Migration; Format Standardization; Rosetta Stone Translation; Typed Object Conversion; Persistent Archives; Object Interchange Format

Source: Thibodeau, 2002.

The problem of rights

The Preservation Pyramid

[Pyramid diagram. Labels: Integrity; Viability; Renderability; Description; Secure storage; Media management; Preservation strategies; Availability; Identity; Capture; Selection; Authenticity]

Traditionally, preserving things meant keeping them unchanged; however … if we hold on to digital information without modifications, accessing the information will become increasingly more difficult, if not impossible.

From: The Paradox of Preservation, Su-Shing Chen

“Preservation metadata ... is the information necessary to maintain the viability, renderability, and understandability of digital resources over the long-term.”

OCLC/RLG Preservation Metadata Framework Working Group

Revised Preservation Pyramid

[Pyramid diagram. Labels: Understandability; Integrity; Viability; Renderability; Description; Secure storage; Media management; Availability; Identity; Capture; Selection; Authenticity; Preservation strategies]

Who is doing preservation?

Research Libraries

Government Archives

Historical Societies

Individual Collectors

Who is doing digital preservation?

Research Libraries

Government Archives

Historical Societies

Individual Collectors

National Libraries

Research Centers

Public broadcasting

[Five slides repeat the Revised Preservation Pyramid diagram, one for each of the following systems:]

DSpace

LOCKSS

OCLC Digital Archive

LC Minerva

FCLA Digital Archive

Preservation in Action

State Universities

FCLA

• Designed as a “dark archive”

• Preservation repository functions only

• Based on OAIS functional architecture

• “Bit-level” and “Full” preservation

• Format migration and normalization

OAIS Functional Architecture

[Diagram. External entities: PRODUCER; CONSUMER; MANAGEMENT. Functional entities: Ingest; Data Management; Archival Storage; Access; Administration; Preservation Planning. Flows: SIP; AIP; DIP; Descriptive Info; queries; result sets; orders]

DAITSS Functional Architecture

[Diagram. Labels: LIBRARY (shown twice); SIP; Ingest; AIP; Storage management; Access; DIP; Reporting; Mgmt DB]

DAITSS Data Model

Information Package

Intellectual entity (1)

Data File (1..n)

Bitstream (0..n)
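The cardinalities on this slide can be read as a small object model. The following is a minimal Python sketch, assuming that an information package describes exactly one intellectual entity and holds one or more data files, and that each data file contains zero or more bitstreams. All class and field names are hypothetical and are not taken from DAITSS itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """A stream of content inside a data file (hypothetical name)."""
    kind: str


@dataclass
class DataFile:
    """A stored file; it may contain zero or more bitstreams (0..n)."""
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class InformationPackage:
    """One intellectual entity (1) with one or more data files (1..n)."""
    intellectual_entity: str
    data_files: List[DataFile]

    def __post_init__(self) -> None:
        # Enforce the assumed 1..n cardinality for data files.
        if not self.data_files:
            raise ValueError("an information package needs at least one data file")


# Illustrative package: a descriptor with no bitstreams and a PDF with two.
package = InformationPackage(
    intellectual_entity="example thesis",
    data_files=[
        DataFile("descriptor.xml"),
        DataFile("thesis.pdf", [Bitstream("text"), Bitstream("image")]),
    ],
)
```

Treating the bitstream as a separate object from the file is what lets a single file carry several kinds of content, each of which could be described on its own.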

DAITSS Data File Object

[Class diagram. Labels: Data File; Markup File; XML; SGML; DTD; Text File; TIFF File; PDF File]

DAITSS Bitstream Object

[Class diagram. Labels: Bitstream; Audio; Image; JPEG Image; TIFF Image; Text; Video]

Risk Management

• Storing multiple master copies of files

• Calculating two message digests

• Storing metadata as XML and in an RDBMS

• Normalizing when possible

• Always retaining original

• Action plans and background papers
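As an illustration of the "two message digests" item, here is a minimal Python sketch that computes two independent digests in a single pass over a file-like object. The slide does not say which algorithms are used; MD5 and SHA-1 are an assumption made for this example, and the function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO, Tuple


def two_digests(stream: BinaryIO, chunk_size: int = 65536) -> Tuple[str, str]:
    """Return (md5, sha1) hex digests, reading the stream once in chunks."""
    md5 = hashlib.md5()    # assumed algorithm, not confirmed by the slides
    sha1 = hashlib.sha1()  # assumed algorithm, not confirmed by the slides
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def is_intact(stream: BinaryIO, expected: Tuple[str, str]) -> bool:
    """A copy counts as intact only if both stored digests still match."""
    return two_digests(stream) == expected


# Record digests at ingest, then re-check a stored copy later.
recorded = two_digests(io.BytesIO(b"abc"))
print(is_intact(io.BytesIO(b"abc"), recorded))
print(is_intact(io.BytesIO(b"abd"), recorded))
```

Keeping two digests from different algorithms means that a weakness or accidental collision in one of them does not by itself let a corrupted copy pass the check.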

Ingest Functions

METS validation and metadata extraction

Virus check and checksum verification

File format identification

Creation of Data File and Bitstream objects

Harvesting of external files

Normalization and Forward Migration

Technical, relationship and event metadata

AIP creation

Storage update

Data table update
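The "file format identification" step can be illustrated with a simple signature check on the first bytes of a file. This is a minimal Python sketch, not a description of how DAITSS identifies formats: the function name and the signature table are hypothetical, and only a handful of well-known signatures are covered.

```python
from typing import Optional

# Leading byte signatures for a few common formats.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"<?xml", "XML"),     # simplification: only matches a leading XML declaration
]


def identify_format(header: bytes) -> Optional[str]:
    """Guess a format name from the first bytes of a file, or return None."""
    # AVI is a RIFF container: "RIFF", a 4-byte size field, then "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI"
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


print(identify_format(b"%PDF-1.4\n"))
print(identify_format(b"RIFF\x00\x00\x00\x00AVI LIST"))
print(identify_format(b"plain text"))
```

A signature match only suggests a format; a production ingest process would normally follow it with format-specific validation before creating the corresponding file object.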

Ingest Example: A simple SIP

[Diagram: a SIP containing an XML file, a PDF file, and an AVI file; after ingest, an AIP shown with XML and TIFF files, alongside the Database]

Future Plans

Find partners to install at other places

Finish DAITSS

Release under open source license

Build a community of developers for different formats

References

Priscilla Caplan: www.fcla.edu/~pcaplan, pcaplan@ufl.edu

FCLA Digital Archive: www.fcla.edu/digitalArchive

Terry Kuny, “A Digital Dark Ages?” www.ifla.org/IV/ifla63/63kuny1.pdf

PREMIS Implementation Survey: www.oclc.org/research/projects/pmwg/surveyreport.pdf

Roy Rosenzweig, “Scarcity or Abundance?” www.historycooperative.org/journals/ahr/108.3/rosenzweig.html

O’Neill et al., “Trends in the Evolution of the Public Web” www.dlib.org/dlib/april03/lavoie/04lavoie.html

Clifford Lynch, “Authenticity and Integrity in the Digital Environment” www.clir.org/pubs/reports/pub92/lynch.html