Preserving eScholarship and Digitized Special Collections
Distributed Digital Preservation
Bill Donovan, Boston College
[email protected]
25 March 2010
Summary
As stewards of eScholarship and digitized special collections, we are responsible for saving these and other treasures effectively and economically.
One approach to digital preservation is being spearheaded by the MetaArchive Cooperative; collections are replicated by peer institutions to guard against loss. The MetaArchive approach is one model for cultural memory organizations to consider adopting/adapting for their own use.
Rationale for this talk
Not recruiting for MetaArchive Cooperative
DDP = a work in progress
Just one approach, but promising
– Adaptable for other “CMO” consortia?
– Cultural memory organizations (CMOs)
Perspective of just one member
Ulterior motive: convince management
eScholarship@BC
Special Collections
“Digital Preservation” defined
“Digital preservation” combines policies, strategies and actions that ensure access to digital content over time.
http://www.ala.org/ala/mgrps/divs/alcts/resources/preserv/defdigpres0408.cfm
Distributed Digital Preservation (DDP)
geographically dispersed sites
“MetaArchive Cooperative”?
low-cost, high-impact DDP for “CMOs”
– e.g. libraries, research centers, and museums
founded in 2004; funding from:
– NDIIPP (Library of Congress)
– NHPRC (National Archives)
Not vendor-based; enables CMOs to own and control the process of digital preservation for themselves.
MetaArchive’s networks
MetaArchive’s ETD network
Policies & Strategy --- 1
Flat, Trim, Tight-Knit organization
• P2P: no supermember, no host institution
• Minimal overhead, bureaucracy
• Emphasis on communication & collaboration
• Committees: steering, technical, content, preservation
Self-sufficiency
• avoid outsourcing; retain control
• cost containment, understand & refine process
• sustainable sources of funding
Policies & Strategy --- 2
Caches (dark archives)
– 6 replications
– Access only via contributing member
Active monitoring of the integrity of stored digital content --- NOT just back-ups
For ETDs, discovery via Networked Digital Library of Theses & Dissertations, NDLTD
Local actions/responsibilities
Skills & infrastructure
Copyright responsibility
Data wrangling
– Format choices
• Proprietary versus open formats
– Bit preservation versus migration
– Filenaming & directories
Preservation information (OAIS)
Adapted from: “Reference Model for an Open Archival Information System” CCSDS 650.0-B-1 (2002)
OAIS = Open Archival Information System
OAIS preservation information
Preservation Description Information
– Reference Information
– Provenance Information
– Context Information
– Fixity Information
Reference Information … identifies, and if necessary describes, one or more mechanisms used to provide assigned identifiers for the Content Information. It also provides identifiers that allow outside systems to refer, unambiguously, to a particular Content Information. An example of Reference Information is an ISBN.
Provenance Information … documents the history of the Content Information. … tells the origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated. Examples of Provenance Information are the principal investigator who recorded the data, and the information concerning its storage, handling, and migration.
Context Information … documents the relationships of the Content Information to its environment. This includes why the Content Information was created and how it relates to other Content Information objects.
Fixity Information … documents the authentication mechanisms and provides authentication keys to ensure that the Content Information object has not been altered in an undocumented manner. Example: Cyclical Redundancy Check code for a file.
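A fixity check of the kind described above might look like the following Python sketch. It records a CRC-32 (the example the slide names) alongside a SHA-256 digest, which is an addition for illustration; the function names and the sample bytes are hypothetical and are not part of OAIS or MetaArchive.

```python
import hashlib
import zlib


def fixity(data: bytes) -> dict:
    """Return a CRC-32 and SHA-256 fixity record for a byte string."""
    return {
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def is_unaltered(data: bytes, recorded: dict) -> bool:
    """True when freshly computed fixity values match the recorded ones."""
    return fixity(data) == recorded


# Hypothetical content standing in for a preserved file.
original = b"Electronic thesis, Boston College, 2010"
record = fixity(original)                      # stored at ingest time

print(is_unaltered(original, record))          # unchanged copy
print(is_unaltered(original + b"!", record))   # copy altered after ingest
```

A CRC alone detects accidental corruption; a cryptographic digest is the usual choice when deliberate tampering is also a concern.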
MetaArchive hierarchy
Archive (6+ caches per network)
– Genre- or Format-based
Collections (1+ per member)
– Collection level metadata
Archival unit (1+ per ingest)
– e.g., all ETDs for each year
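One way to picture this three-level hierarchy is as nested records. The sketch below is only an illustration: the class names, field names, and sample values are assumptions made for this example and do not reflect MetaArchive's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalUnit:
    label: str                      # e.g. all ETDs for one year


@dataclass
class Collection:
    member: str                     # contributing institution
    metadata: dict                  # collection-level metadata
    units: list = field(default_factory=list)


@dataclass
class Archive:
    genre: str                      # genre- or format-based network
    caches: list                    # names of the replicating caches
    collections: list = field(default_factory=list)

    def meets_replication_floor(self, minimum: int = 6) -> bool:
        """Check the '6+ caches per network' rule from the slide."""
        return len(self.caches) >= minimum


# Hypothetical ETD network with six caches and one member collection.
etd = Archive(genre="ETD", caches=[f"cache-{i}" for i in range(1, 7)])
etd.collections.append(
    Collection(
        member="Boston College",
        metadata={"title": "Electronic theses and dissertations"},
        units=[ArchivalUnit("ETDs 2009"), ArchivalUnit("ETDs 2010")],
    )
)
print(etd.meets_replication_floor())
```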
Lots of Copies Keep Stuff Safe
LOCKSS open-source software/support to preserve web-published materials
decentralized digital preservation infrastructure
migrates content forward in time
bits & bytes continually audited & repaired
MetaArchive members also join LOCKSS
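The "audited & repaired" idea can be sketched as a majority vote over replica digests. This is a deliberately simplified illustration, not the LOCKSS polling protocol; the function names, cache names, and sample bytes are all hypothetical.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare replicas."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict) -> list:
    """Overwrite replicas that disagree with the majority; return their cache names."""
    votes = Counter(digest(content) for content in replicas.values())
    winning_digest, count = votes.most_common(1)[0]
    if count <= len(replicas) / 2:
        raise ValueError("no majority among replicas; manual review needed")
    good = next(c for c in replicas.values() if digest(c) == winning_digest)
    repaired = []
    for cache, content in list(replicas.items()):
        if digest(content) != winning_digest:
            replicas[cache] = good      # repair from an agreeing copy
            repaired.append(cache)
    return repaired


# Six hypothetical caches holding the same archival unit.
replicas = {f"cache-{i}": b"thesis bytes" for i in range(1, 7)}
replicas["cache-4"] = b"thesis bytez"   # simulated bit rot on one cache
print(audit_and_repair(replicas))       # prints ['cache-4']
```

With six replicas, a single damaged copy is outvoted five to one, which is why replication plus auditing protects content in a way a lone back-up cannot.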
Private LOCKSS network (PLN)
A PLN is a LOCKSS network deployed by a set of like-minded institutions in order to preserve content in a closed preservation network.
Not maintained by the Stanford University-based LOCKSS staff
Manifest page
Archival unit
An independent collection of content in a LOCKSS cache. Archival units are maintained as a whole by LOCKSS daemons. They are defined by the plugin and plugin parameters.
http://dcollections.bc.edu/webclient/DeliveryManager?metadata_request=true&GET_XML=1&pid=71872
http://dcollections.bc.edu/webclient/DeliveryManager?pid=71872
Digital object and its metadata
Metadata XML file
ETD (electronic thesis/dissertation)
Plug-in
An XML file that instructs the LOCKSS software how to ingest and preserve content.
Each cache on the network writes a plug-in for its collection, enabling other caches to replicate its content
Security
Copies on different power grids
No single person has access to all copies
Each cache secure and for DDP-only
Security-enhanced Linux
SSL-encrypted inter-cache communication
IP-address-based firewall exceptions
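The last point, firewall exceptions keyed to peer IP addresses, amounts to an allowlist check. The Python sketch below illustrates the logic only; in practice the rule would live in the firewall itself. The peer addresses are invented, taken from the ranges reserved for documentation, and the names are hypothetical.

```python
import ipaddress

# Hypothetical peer caches; addresses come from reserved documentation ranges.
PEER_CACHES = [
    ipaddress.ip_network("192.0.2.10/32"),     # a single peer cache
    ipaddress.ip_network("198.51.100.0/28"),   # a small block used by another member
]


def is_allowed(source_ip: str) -> bool:
    """True when the connecting address belongs to a known peer cache."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in PEER_CACHES)


print(is_allowed("192.0.2.10"))    # known peer
print(is_allowed("203.0.113.5"))   # unknown host
```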
For more details…
http://metaarchive.org/GDDP
MA regional library systems
Massachusetts Networks:
CLAMS* MBLN SAILS*
NOBLE* C/W MARS* MVLC
Minuteman* OCLN