
Page 1: Preserving eScholarship and Digitized Special Collections: Distributed Digital Preservation

Bill Donovan
donovawf@bc.edu

Page 2: Summary

25 March 2010, Bill Donovan, Boston College

As stewards of eScholarship and digitized special collections, we are responsible for saving these and other treasures effectively and economically.

One approach to digital preservation is being spearheaded by the MetaArchive Cooperative: collections are replicated by peer institutions to guard against loss. The MetaArchive approach is one model that cultural memory organizations may consider adopting or adapting for their own use.

Page 3: Rationale for this talk

Not recruiting for MetaArchive Cooperative

DDP = a work in progress

Just one approach, but promising
– Adaptable for other “CMO” consortia?
– Cultural memory organizations (CMOs)

Perspective of just one member

Ulterior motive: convince management

Page 4: eScholarship@BC

Page 5: Special Collections

Page 6: “Digital Preservation” defined

“Digital preservation” combines policies, strategies and actions that ensure access to digital content over time.

http://www.ala.org/ala/mgrps/divs/alcts/resources/preserv/defdigpres0408.cfm

Page 7: Distributed Digital Preservation (DDP)

geographically dispersed sites
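As a rough illustration of what "geographically dispersed" can mean in practice, the sketch below checks whether the sites holding copies of a collection span enough distinct regions. The site names, the region labels, and the threshold of three regions are hypothetical assumptions made for the example. They are not MetaArchive policy.

```python
# Hypothetical placement check: are the copies of a collection spread
# across enough distinct regions? Site names, regions, and the default
# threshold are illustrative assumptions only.


def distinct_regions(replica_sites: dict[str, str]) -> set[str]:
    """Return the set of regions covered by the sites holding a copy.

    replica_sites maps a site name to the region where that site is located.
    """
    return set(replica_sites.values())


def is_dispersed(replica_sites: dict[str, str], min_regions: int = 3) -> bool:
    """True if the copies sit in at least `min_regions` different regions."""
    return len(distinct_regions(replica_sites)) >= min_regions


if __name__ == "__main__":
    sites = {
        "site-a": "Northeast",
        "site-b": "Southeast",
        "site-c": "Midwest",
        "site-d": "Northeast",  # a second copy in an already-covered region
    }
    print(is_dispersed(sites))  # True: three distinct regions
```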

Page 8: “MetaArchive Cooperative”?

low-cost, high-impact DDP for “CMOs”
– e.g., libraries, research centers, and museums

founded in 2004; funding from:
– NDIIPP (Library of Congress): National Digital Information Infrastructure and Preservation Program
– NHPRC (National Archives): National Historical Publications and Records Commission

Not vendor-based; enables CMOs to own and control the process of digital preservation for themselves.
Page 9: MetaArchive’s networks

Page 10: MetaArchive’s ETD network

Page 11: Policies & Strategy (1)

Flat, trim, tight-knit organization
• P2P: no supermember, no host institution
• Minimal overhead and bureaucracy
• Emphasis on communication & collaboration
• Committees: steering, technical, content, preservation

Self-sufficiency
• Avoid outsourcing; retain control
• Cost containment; understand & refine the process
• Sustainable sources of funding

Page 12: Policies & Strategy (2)

Caches (dark archives)
– 6 replications
– Access only via contributing member

Active monitoring of the integrity of stored digital content, NOT just back-ups

For ETDs, discovery via the Networked Digital Library of Theses & Dissertations (NDLTD)

Page 13: Local actions/responsibilities

Skills & infrastructure

Copyright responsibility

Data wrangling
– Format choices: proprietary versus open formats
– Bit preservation versus migration
– Filenaming & directories

Preservation information (OAIS)
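The slide lists filenaming and directories as a local data-wrangling task but does not say which rules to apply. The sketch below assumes a conservative rule set chosen only for illustration: ASCII letters, digits, hyphen and underscore in the name, then a single dot and a short extension, with no spaces. Under that assumption, it flags names that are likely to cause trouble when files are copied between systems.

```python
import re

# Hypothetical conservative naming rule (an assumption, not a MetaArchive
# requirement): letters, digits, "-" and "_" in the stem, then one dot and
# a 1-5 character alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]{1,5}")


def problem_names(filenames: list[str]) -> list[str]:
    """Return the filenames that do not match the conservative rule."""
    return [name for name in filenames if not SAFE_NAME.fullmatch(name)]


if __name__ == "__main__":
    names = ["etd_2009_smith.pdf", "Final Thesis (v2).PDF", "scan-0001.tif"]
    print(problem_names(names))  # ['Final Thesis (v2).PDF']
```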

Page 14

Adapted from: “Reference Model for an Open Archival Information System,” CCSDS 650.0-B-1 (2002)

OAIS = Open Archival Information System

Page 15: OAIS preservation information

Preservation Description Information
– Reference Information
– Provenance Information
– Context Information
– Fixity Information
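One way to keep the four Preservation Description Information categories together with each stored object is to use a simple record type. The sketch below is only an illustration: the class name, field types, and sample values are hypothetical. OAIS defines these categories conceptually and does not prescribe a data format.

```python
from dataclasses import dataclass, field


@dataclass
class PreservationDescriptionInformation:
    """Hypothetical record holding the four OAIS PDI categories."""
    reference: dict[str, str]   # identifiers, e.g. a local ID or an ISBN
    provenance: list[str]       # history: origin, changes, custody
    context: dict[str, str]     # why it was created, related objects
    fixity: dict[str, str] = field(default_factory=dict)  # e.g. checksums

    def missing_categories(self) -> list[str]:
        """Names of PDI categories that are still empty."""
        return [name for name in ("reference", "provenance", "context", "fixity")
                if not getattr(self, name)]


if __name__ == "__main__":
    pdi = PreservationDescriptionInformation(
        reference={"local_id": "etd-0001"},          # hypothetical value
        provenance=["deposited by author"],         # hypothetical value
        context={"collection": "example ETD collection"},
    )
    print(pdi.missing_categories())  # ['fixity']
```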

Page 16: OAIS preservation information: Reference Information

… identifies, and if necessary describes, one or more mechanisms used to provide assigned identifiers for the Content Information. It also provides identifiers that allow outside systems to refer, unambiguously, to a particular Content Information. An example of Reference Information is an ISBN.
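Identifiers used as Reference Information often carry their own internal check. For example, the ISBN named in the quotation has a standard ISBN-13 check digit: the digits are weighted 1, 3, 1, 3, and so on, and the weighted total must be divisible by 10. The function below verifies that check.

```python
def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 using its check digit (weights 1, 3, 1, 3, ...)."""
    digits = [c for c in isbn if c.isdigit()]   # ignore hyphens and spaces
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


if __name__ == "__main__":
    print(isbn13_is_valid("978-0-306-40615-7"))  # True
    print(isbn13_is_valid("978-0-306-40615-8"))  # False
```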

Page 17: OAIS preservation information: Provenance Information

… documents the history of the Content Information. … tells the origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated. Examples of Provenance Information are the principal investigator who recorded the data, and the information concerning its storage, handling, and migration.

Page 18: OAIS preservation information: Context Information

… documents the relationships of the Content Information to its environment. This includes why the Content Information was created and how it relates to other Content Information objects.

Page 19: OAIS preservation information: Fixity Information

… documents the authentication mechanisms and provides authentication keys to ensure that the Content Information object has not been altered in an undocumented manner. Example: Cyclical Redundancy Check code for a file.
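The quoted example uses a CRC. The sketch below records a fixity value and later re-checks it, using CRC-32 from Python's standard `zlib.crc32` to match the example. Many repositories prefer a cryptographic digest such as SHA-256 (available from `hashlib`), because a CRC detects accidental corruption but not deliberate tampering.

```python
import zlib
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the CRC-32 of a file as 8 hex digits, reading it in chunks."""
    crc = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
    return f"{crc & 0xFFFFFFFF:08x}"


def fixity_ok(path: Path, recorded: str) -> bool:
    """True if the file's current CRC-32 matches the recorded value."""
    return crc32_of_file(path) == recorded.lower()


if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_bytes(b"hello")
        recorded = crc32_of_file(sample)      # stored at ingest
        print(fixity_ok(sample, recorded))    # True
        sample.write_bytes(b"hellp")          # simulate a changed byte
        print(fixity_ok(sample, recorded))    # False
```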

Page 20: MetaArchive hierarchy

Archive (6+ caches per network)
– Genre- or format-based

Collections (1+ per member)
– Collection-level metadata

Archival unit (1+ per ingest)
– e.g., all ETDs for each year
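The three-level hierarchy on this slide can be pictured as nested records. The sketch below is a hypothetical data model, not MetaArchive's actual schema. The class names and fields are illustrative, and the "6 or more caches" check only restates the number given on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalUnit:
    name: str                      # e.g. all ETDs for one year


@dataclass
class Collection:
    member: str                    # contributing institution
    metadata: dict[str, str]       # collection-level metadata
    units: list[ArchivalUnit] = field(default_factory=list)  # 1+ per ingest


@dataclass
class Archive:
    genre: str                     # genre- or format-based, e.g. "ETD"
    caches: list[str]              # caches participating in this network
    collections: list[Collection] = field(default_factory=list)

    def meets_replication_target(self, minimum: int = 6) -> bool:
        """The slide gives 6+ caches per network; check that count."""
        return len(set(self.caches)) >= minimum


if __name__ == "__main__":
    etd = Archive(genre="ETD", caches=[f"cache-{i}" for i in range(1, 7)])
    coll = Collection(member="Example University", metadata={"title": "ETDs"})
    coll.units.append(ArchivalUnit(name="ETDs 2009"))
    etd.collections.append(coll)
    print(etd.meets_replication_target())  # True
```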

Page 21: Lots of Copies Keep Stuff Safe

LOCKSS: open-source software/support to preserve web-published materials

Decentralized digital preservation infrastructure

Migrates content forward in time

Bits & bytes continually audited & repaired

MetaArchive members also join LOCKSS
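LOCKSS audits and repairs content by having caches compare their copies with one another. The actual LOCKSS polling protocol is considerably more elaborate than this. The sketch below only illustrates the underlying majority idea, under the simplifying assumption that every cache's copy can be hashed and compared directly. The cache names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare the copies held by each cache and overwrite minority copies.

    copies maps a (hypothetical) cache name to its copy of one file.
    Returns the names of caches that were repaired. If no version holds a
    strict majority, nothing is changed.
    """
    votes = Counter(digest(c) for c in copies.values())
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(copies):
        return []  # no clear majority; leave for human review
    good = next(c for c in copies.values() if digest(c) == winner)
    repaired = []
    for cache, content in copies.items():
        if digest(content) != winner:
            copies[cache] = good  # replace the damaged copy
            repaired.append(cache)
    return repaired


if __name__ == "__main__":
    copies = {f"cache-{i}": b"thesis v1" for i in range(1, 7)}
    copies["cache-4"] = b"thesis v1 (corrupted)"
    print(audit_and_repair(copies))  # ['cache-4']
```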

Page 22: Private LOCKSS network (PLN)

A PLN is a LOCKSS network deployed by a set of like-minded institutions in order to preserve content in a closed preservation network.

Not maintained by the Stanford University-based LOCKSS staff

Page 23: Manifest page

Page 24: Archival unit

An independent collection of content in a LOCKSS cache. Archival units are maintained as a whole by LOCKSS daemons. They are defined by the plugin and plugin parameters.
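Because an archival unit is defined by its plugin together with the plugin parameters, that pair can serve as the AU's identity. The sketch below derives a stable key from a plugin name and a parameter dictionary. The key format, the plugin name, and the parameter names (`base_url`, `year`) are hypothetical. They are not the identifier format the LOCKSS daemon actually uses.

```python
from urllib.parse import quote


def au_key(plugin_id: str, params: dict[str, str]) -> str:
    """Build a stable key for an archival unit from plugin + parameters.

    Parameters are sorted so that the same AU always yields the same key,
    regardless of the order in which the parameters were supplied.
    """
    parts = [quote(plugin_id, safe="")]
    parts += [f"{quote(k, safe='')}={quote(v, safe='')}"
              for k, v in sorted(params.items())]
    return "&".join(parts)


if __name__ == "__main__":
    a = au_key("ExampleEtdPlugin", {"year": "2009", "base_url": "http://example.edu/etd/"})
    b = au_key("ExampleEtdPlugin", {"base_url": "http://example.edu/etd/", "year": "2009"})
    print(a == b)  # True
```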

Page 25: Digital object and its metadata

http://dcollections.bc.edu/webclient/DeliveryManager?metadata_request=true&GET_XML=1&pid=71872

http://dcollections.bc.edu/webclient/DeliveryManager?pid=71872
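The two URLs on this slide differ only in their query strings. Adding `metadata_request=true&GET_XML=1` to the same `pid` appears to return the object's metadata as XML rather than the object itself. The helper below builds both URLs for a given pid, assuming the pattern holds for other objects (the slide shows only pid 71872). It does not fetch anything.

```python
from urllib.parse import urlencode

# Base URL and parameter names are copied from the slide; that they behave
# the same way for every pid is an assumption.
BASE = "http://dcollections.bc.edu/webclient/DeliveryManager"


def object_url(pid: int) -> str:
    """URL for the digital object itself."""
    return f"{BASE}?{urlencode({'pid': pid})}"


def metadata_url(pid: int) -> str:
    """URL for the object's metadata as XML."""
    query = {"metadata_request": "true", "GET_XML": 1, "pid": pid}
    return f"{BASE}?{urlencode(query)}"


if __name__ == "__main__":
    print(object_url(71872))
    print(metadata_url(71872))
```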

Page 26: Metadata XML file

Page 27: ETD (electronic thesis/dissertation)

Page 28: Plug-in

An XML file that instructs the LOCKSS software how to ingest and preserve content.

Each cache on the network writes a plug-in for its collection, enabling other caches to replicate its content
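Because a LOCKSS plug-in is an XML file, another cache can read it with an ordinary XML parser. The fragment below is a simplified, hypothetical plug-in written as key/value entries. The key names and values are placeholders chosen for illustration, not a verified LOCKSS plugin schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified plug-in: each <entry> holds a key and a value.
# Key names and values are illustrative placeholders only.
PLUGIN_XML = """
<map>
  <entry><string>plugin_name</string><string>Example ETD plugin</string></entry>
  <entry><string>au_start_url</string><string>http://example.edu/etd/manifest.html</string></entry>
  <entry><string>plugin_version</string><string>1</string></entry>
</map>
"""


def read_plugin(xml_text: str) -> dict[str, str]:
    """Turn <entry><string>key</string><string>value</string></entry> pairs into a dict."""
    root = ET.fromstring(xml_text.strip())
    settings: dict[str, str] = {}
    for entry in root.findall("entry"):
        strings = entry.findall("string")
        if len(strings) >= 2:  # skip malformed entries
            settings[strings[0].text or ""] = strings[1].text or ""
    return settings


if __name__ == "__main__":
    plugin = read_plugin(PLUGIN_XML)
    print(plugin["plugin_name"])  # Example ETD plugin
```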

Page 29: Security

Copies on different power grids

No single person has access to all copies

Each cache secure and dedicated to DDP only

Security-Enhanced Linux

SSL-encrypted inter-cache communication

IP-address-based firewall exceptions
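Firewall exceptions based on IP address amount to an allowlist: a cache accepts inter-cache connections only from the addresses of its peers. The sketch below shows that check using Python's `ipaddress` module. The peer addresses come from the reserved documentation ranges and stand in for real caches. In practice the firewall itself enforces this rule, so the code only illustrates the logic.

```python
import ipaddress

# Placeholder peer caches (documentation address ranges, not real caches).
PEER_NETWORKS = [
    ipaddress.ip_network("192.0.2.10/32"),
    ipaddress.ip_network("198.51.100.0/28"),
    ipaddress.ip_network("203.0.113.25/32"),
]


def is_allowed_peer(address: str) -> bool:
    """True if the connecting address belongs to a listed peer network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PEER_NETWORKS)


if __name__ == "__main__":
    print(is_allowed_peer("198.51.100.7"))  # True
    print(is_allowed_peer("192.0.2.11"))    # False
```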

Page 30: For more details…

http://metaarchive.org/GDDP

Page 31: MA regional library systems

Massachusetts Networks:

CLAMS* MBLN SAILS*

NOBLE* C/W MARS* MVLC

Minuteman* OCLN