Digital Preservation Best Practices: Lessons Learned From Across the Pond



Digital Preservation Best Practices: Lessons Learned From Across the Pond. Slavko Manojlovich (Associate University Librarian (IT) / Manager, Digital Archives Initiative, Memorial University, St. John's, Canada) and Benoit Pauwels (Head, Library Automation Team, Université libre de Bruxelles, Belgium)

Transcript of Digital Preservation Best Practices: Lessons Learned From Across the Pond

Page 1

COSUGI 2011, Phoenix, Arizona

Slavko Manojlovich, Associate University Librarian (IT) / Manager, Digital Archives Initiative, Memorial University

and

Benoit Pauwels, Head, Library Automation Team, Université Libre de Bruxelles

[with input from Michael J. Bennett, Digital Projects Librarian and Institutional Repository Coordinator, University of Connecticut]

Digital Preservation Best Practices

Lessons Learned From Across the Pond

Page 2

Outline

– What is digital preservation?
– Best practices information resources
– Open Archival Information System (OAIS)
– Preservation planning
– Digital preservation in action (Archivematica)
– Digital preservation @ ULB
– Our issues

Page 3

What is digital preservation?

Digital preservation is NOT digitization!

Page 4

Digital preservation is the series of actions and interventions required to ensure continued and reliable access to authentic digital objects for as long as they are deemed to be of value. This encompasses not just technical activities, but also all of the strategic and organisational considerations that relate to the survival and management of digital material.

Source

What is digital preservation?


Page 6

Disaster recovery strategies and backup systems are not sufficient to ensure survival and access to authentic digital resources over time.

Source

What is digital preservation?

Page 7

Digital preservation includes:
– Digitized analogue content (easy)
– Born-digital content (more difficult)

What is digital preservation?

– Text
– Research data
– Audio
– Databases
– Video
– Container files
– Email
– Spreadsheets
– Web sites
– Software
– Digital new media art

Page 8

Recent example from Memorial University:
– Preserve a faculty member's research outputs from 1977 to the present, stored in a variety of formats.

“All of the above represents a vast resource which cannot be lost from the University”.

What is digital preservation?

– Access databases
– Paper files (14 filing cabinets)
– Excel spreadsheets
– Progeny files
– Cyrillic files
– Photographic slides
– JPEG files of testing images
– PowerPoint presentations
– Web sites
– The researcher's memory

Page 9

Best practices may not always be the best option for your organization:
– British Library Microsoft Live Book Data Project: The DPT [Digital Preservation Team] have taken the view that since the budget for hard drive storage for this project has already been allocated, it would be impractical to recommend a change in the specifics as far as file format is concerned for this project... JPEG 2000 files compressed to 70 dB PSNR for the preservation copy.

Source

Digital Preservation Best Practices
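The 70 dB figure above is a peak signal-to-noise ratio (PSNR), a standard measure of how closely a lossy copy matches its source: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible sample value and MSE is the mean squared error between corresponding samples. The Python sketch below is illustrative only. It assumes 8-bit samples (MAX = 255) passed as flat sequences of integers, and the function names `psnr` and `max_mse_for` are hypothetical; they are not part of any British Library or JPEG 2000 tool.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], compressed: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(original) != len(compressed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical samples: the copy is lossless
    return 10 * math.log10(max_value ** 2 / mse)


def max_mse_for(target_db: float, max_value: int = 255) -> float:
    """Largest mean squared error that still meets a PSNR target (assumed 8-bit by default)."""
    return max_value ** 2 / 10 ** (target_db / 10)
```

If the samples were 8-bit, `max_mse_for(70)` is about 0.0065. A copy could then differ from its master by a single level in no more than about one sample in 150, so a 70 dB copy is lossy in principle but extremely close to the original.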


Page 11

– The National Gallery (UK): Preservation of Digital Photographs of the Collection. The National Gallery has photographed its entire collection using a high-end digital MARC camera capable of capturing and rendering colour with an accuracy at least five times better than traditional photography. It has selected the proprietary raw camera output format for long-term preservation because that format supports an advanced level of colour management. The company supporting the camera and associated software is very small and is not a market leader.

Source: Site Visit to National Gallery Photography Department, April, 2010.

Digital Preservation Best Practices


Page 13

Eighth European Conference on Digital Archiving, Geneva, Switzerland / April 28-30, 2010. Source

Archiving 2010, The Hague, Netherlands / June 1-4, 2010. Note: Archiving 2011, Salt Lake City (May 16-19, 2011). Source

Best Practices Information Sources

Conferences

Page 14

OR2010: The 5th International Conference on Open Repositories, Madrid, Spain / July 6-9, 2010. Note: OR2011, Austin, Texas (June 7-11, 2011). Source

iPRES2010: 7th International Conference on Preservation of Digital Objects, Vienna, Austria / September 19-24, 2010. Source

Best Practices Information Sources

Conferences

Page 15

Digital Preservation – The Planets Way, London, UK / February 9, 2010. Source

Digital Futures London 2010: From digitization to delivery. King’s Digital Consultancy Services (KDCS), King’s College, London, UK / April 19-23, 2010. Source

Best Practices Information Sources

Workshops

Page 16

Digital Preservation Management: Implementing Short-term Solutions for Long-term Problems, Cambridge, MA, USA / June 13-18, 2010. Note: Albany, New York / June 5-10, 2011. Source

Short digital preservation workshops are typically offered in conjunction with most digital preservation conferences.

Best Practices Information Sources

Workshops

Page 17

Open Planets Foundation Source

Digital Curation Centre Source

Library of Congress National Digital Information Infrastructure and Preservation Program Source

Best Practices Information Sources

Web Sites/Listservs/Blogs

Page 18

JISC Digital Preservation and Records Management Programme Source

PrestoPRIME: Keeping Audiovisual Contents Alive Source

International Internet Preservation Consortium Source

Best Practices Information Sources

Web Sites/Listservs/Blogs

Page 19

Best Practices Information Sources

Web Sites/Listservs/Blogs

Source

Page 20

International Journal of Digital Curation Source

ARIADNE Source

D-Lib Magazine Source

Best Practices Information Sources

Journals

Page 21

International Journal of Digital Curation

Source

Page 22

International Journal of Digital Curation

Source

Page 23

International Journal of Digital Curation

Source

Page 24

Best Practices Information Sources

Education

Source

Page 25

Best Practices Information Sources

Education

Source

Page 26

Best Practices Information Sources

Education

Source

Page 27

Best Practices Information Sources

Education

Source

Page 28

Best Practices Information Sources

Employment

Source

Page 29

Developed by the Consultative Committee for Space Data Systems (CCSDS) in 2002; became an ISO standard in 2003 (ISO 14721:2003). 148 pages of heavy reading.

“Those who will implement OAIS archives or administer them on a daily basis should read the entire document.”

Source

Open Archival Information System (OAIS)

Page 30

OCLC claims OAIS compliance for its “Digital Archive”. Source

Library and Archives Canada’s Trusted Digital Repository is based on OAIS. Source

The National Library of the Netherlands’ e-Depot is an exemplary, world-class, OAIS-based digital repository. Source

Open Archival Information System

Page 31

“GPO’s world-class preservation repository [Fdsys] went live in March 2009. The repository was built upon the Open Archival Information System (OAIS) model and provides sufficient control to ensure long-term preservation and access.” Source

Open Archival Information System

Page 32

“The use of this reference model as the basis of any archive implementation is recommended as it allows practitioners to use common language and potentially common tools to address common problems.”

Tessella Technology & Consulting White Paper Source

Open Archival Information System

Page 33

OAIS Reference Model


Source

Page 34

OAIS Reference Model

Source

Page 35

OAIS Reference Model

Source

Page 36

OAIS Reference Model - Actors

Source

Page 37

OAIS Reference Model - Objects

Source

Page 38

OAIS Reference Model - Actions

Source

Page 39

Preservation Planning
– Monitor designated community (consumer needs and expectations)
– Monitor technology
– Develop preservation strategies and standards
– Develop packaging designs and migration plans

Source

Page 40

Monitor Technology: Internet Archive Wayback Machine

Wayback for www.unb.ca

Page 41

Monitor Technology: Cross-Platform Access Video Format

2005: wmv (Windows Media Video) format, using Windows Media Player (or other players) on Windows and the Flip4Mac QuickTime extension on Macintosh.

2005-2009: swf (Adobe Flash) format, with Adobe Flash plug-ins available for Windows and Macintosh browsers, becomes the flavour of the day for web delivery of video content.

Page 42

Monitor Technology: Cross-Platform Access Video Format

Fast forward to April 2010: mp4 (H.264) format, with players/support for Windows, Macintosh and iPad.

The iPad does not support the wmv or swf video formats.

Video conversion history: wmv > swf > mp4, from the original DVD VOB files.

The DVD VOB files are being preserved with the goal of converting them to MXF Motion JPEG 2000 for long-term preservation.

Page 43

Monitor Technology: Google Drops H.264 Support (Jan 11, 2011)

Source

Page 44

Monitor Technology: Microsoft Adds H.264 Support (Feb 2, 2011)

Source

Page 45

Plato: The PLANETS Preservation Planning Tool

Source

Page 46

Plato: The PLANETS Preservation Planning Tool

Source

Developed by the PLANETS Consortium:

– The British Library
– The National Library of the Netherlands
– Austrian National Library
– The Royal Library of Denmark
– State and University Library, Denmark
– The National Archives of the Netherlands
– The National Archives of England, Wales and the UK
– Swiss Federal Archives
– University of Cologne
– University of Freiburg
– HATII at the University of Glasgow
– Vienna University of Technology
– The Austrian Institute of Technology
– IBM Netherlands
– Microsoft Research Limited
– Tessella Plc

Page 47

Plato: The PLANETS Preservation Planning Tool

Source

A preservation plan defines a series of preservation actions to be taken by a responsible institution in response to an identified risk for a given set of digital objects or records (called a collection).

The preservation plan takes into account the preservation policies, legal obligations, organisational and technical constraints, user requirements and preservation goals and describes the preservation context, the evaluated preservation strategies and the resulting decision for one strategy, including the reasoning for the decision.

It also specifies a series of steps or actions (called a preservation action plan), along with responsibilities and the rules and conditions for executing them on the collection.

Provided that the actions and their deployment as well as the technical environment allow it, this action plan is an executable workflow definition.

Access to a library of preservation plans.
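The elements of a preservation plan listed above map naturally onto a small data structure. The following Python sketch is a hypothetical model, not Plato's own plan format: the class and field names are assumptions, and the action plan is reduced to an ordered list of steps, each with a responsible party and a condition, so that it can be run as a simple executable workflow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ActionStep:
    """One step of the preservation action plan (hypothetical structure)."""
    name: str
    responsible: str                                      # who carries out the step
    action: Callable[[str], str]                          # transforms one object (here: a file name)
    condition: Callable[[str], bool] = lambda obj: True  # rule deciding when the step applies


@dataclass
class PreservationPlan:
    """A plan: context, evaluated strategies, the decision and its reasoning, plus the action plan."""
    collection: List[str]            # the digital objects or records covered by the plan
    identified_risk: str
    context: List[str]               # policies, legal obligations, constraints, user requirements
    evaluated_strategies: List[str]
    chosen_strategy: str
    reasoning: str
    action_plan: List[ActionStep] = field(default_factory=list)

    def execute(self) -> List[str]:
        """Apply every step whose condition holds to each object, in plan order."""
        results = []
        for obj in self.collection:
            for step in self.action_plan:
                if step.condition(obj):
                    obj = step.action(obj)
            results.append(obj)
        return results
```

For example, a plan whose single step migrates names ending in `.gif` to `.png` returns `["a.png", "b.tif"]` for the collection `["a.gif", "b.tif"]`. Here the step only renames; a real action would invoke a migration tool.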

Page 48

Plato: The PLANETS Preservation Planning Tool

Source

Page 49

Plato: TIFF to JPEG 2000 Case Study

Source YouTube Video

Page 50

Plato: TIFF to JPEG 2000 Case Study

Source

The British Library’s 2 million newspaper pages are held as uncompressed, high-quality TIFF-5 files. File size is 40 MB per page.

The Plato experiment compares the image quality and file size of TIFF-5 images converted to lossless JPEG 2000.

Experiment results: lossless JPEG 2000 image quality is as good as uncompressed TIFF-5, and file size is reduced by 25-30 percent. JPEG derivatives made from TIFF-5 are as good as JPEG derivatives made from lossless JPEG 2000.
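These figures translate into a rough storage budget: 2 million pages at 40 MB each is 80 TB of uncompressed TIFF, and a 25-30 percent reduction saves roughly 20-24 TB. A small Python sketch of the arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and a uniform 40 MB per page; `storage_estimate` is a hypothetical helper name.

```python
from typing import Tuple


def storage_estimate(pages: int, mb_per_page: float, reduction: float) -> Tuple[float, float, float]:
    """Return (original, saved, remaining) storage in terabytes, using decimal units."""
    original_tb = pages * mb_per_page / 1_000_000
    saved_tb = original_tb * reduction
    return original_tb, saved_tb, original_tb - saved_tb


# The two ends of the 25-30 percent range reported by the experiment.
for reduction in (0.25, 0.30):
    total, saved, remaining = storage_estimate(2_000_000, 40, reduction)
    print(f"{reduction:.0%} smaller: {total:.0f} TB -> {remaining:.0f} TB (saves {saved:.0f} TB)")
```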

Page 51

Planets Time Capsule

Source

Page 52

E-Prints: Integration of Bit-Level and Logical Preservation (New)

Source

Page 53

E-Prints: Integration of Bit-Level and Logical Preservation (New)

Source

Page 54

E-Prints: Integration of Bit-Level and Logical Preservation (New)

Source

GIF files will be migrated to PNG with the ImageMagick utility

Page 55

E-Prints: Integration of Bit-Level and Logical Preservation (New)

Source

Upload Plato preservation plan to E-Prints

The prescribed preservation plan action is applied to each set of files classified as “at risk”.

E-Prints creates provenance metadata for all preservation actions (e.g., “File was migrated from file format A to file format B on this date according to preservation plan NNN”).
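The migration and provenance steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not EPrints code. It calls ImageMagick's `convert` command-line tool (ImageMagick 7 installations also provide `magick`), and it assumes single-frame GIFs, because an animated GIF would be written out as several PNG files. The provenance field names are hypothetical rather than EPrints' own metadata schema, and `plan_id` stands in for the plan identifier (NNN on the slide).

```python
import datetime
import subprocess
from pathlib import Path
from typing import Optional


def migrate_gif_to_png(src: Path, plan_id: str, run: bool = True,
                       when: Optional[datetime.date] = None) -> dict:
    """Convert one single-frame GIF to PNG and return a (hypothetical) provenance record."""
    dst = src.with_suffix(".png")
    # ImageMagick 6 syntax: convert <input> <output>; the format follows from the extension.
    cmd = ["convert", str(src), str(dst)]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if the conversion fails
    when = when or datetime.date.today()
    return {
        "event": "migration",
        "source": str(src),
        "outcome": str(dst),
        "from_format": "GIF",
        "to_format": "PNG",
        "date": when.isoformat(),
        "preservation_plan": plan_id,
        "agent": " ".join(cmd),
    }


def describe(record: dict) -> str:
    """Human-readable provenance note in the style quoted on the slide."""
    return (f"File was migrated from {record['from_format']} to {record['to_format']} "
            f"on {record['date']} according to preservation plan {record['preservation_plan']}.")
```

Passing `run=False` builds the record without invoking ImageMagick, which is convenient for a dry run over the "at risk" set before the plan is actually executed.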

Page 56

Sample Media Type Preservation Plan

Source

Page 57

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

Page 58

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

Page 59

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

Page 60

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

1. The repository commits to continuing maintenance of digital objects for identified community/communities.

2. Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment.

Page 61

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

3. Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.

4. Has an effective and efficient policy framework.

5. Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.

Page 62

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

6. Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time.

7. Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation.

Page 63

Trustworthy Repositories Audit & Certification (TRAC) Checklist

Source

8. Fulfills requisite dissemination requirements.

9. Has a strategic program for preservation planning and action.

10. Has technical infrastructure adequate to continuing maintenance and security of its digital objects.

Complete TRAC Document

Page 64

Digital Curation Micro-Services

“Micro-services are an approach to digital curation based on devolving curation function into a set of independent, but interoperable, services that embody curation values and strategies. Since each of the services is small and self-contained, they are collectively easier to develop, deploy, maintain, and enhance. Equally as important, they are more easily replaced when they have outlived their usefulness. Although the individual services are narrowly scoped, the complex function needed for effective curation emerges from the strategic combination of individual services.”

Source
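The micro-services idea can be illustrated with a few small functions that share one narrow interface and are combined into a pipeline. In this Python sketch the service names and the dictionary-based package state are hypothetical; it is not Archivematica's implementation, which in version 0.7 chains command-line utilities in a Unix pipeline.

```python
import hashlib
from functools import reduce
from typing import Callable, Dict, List

State = Dict[str, dict]             # e.g. {"files": {name: bytes}, "checksums": {...}}
Service = Callable[[State], State]  # every micro-service takes and returns the same shape


def clean_filenames(state: State) -> State:
    """Replace spaces in file names (a stand-in for a fuller sanitising rule)."""
    files = {name.replace(" ", "_"): data for name, data in state["files"].items()}
    return {**state, "files": files}


def record_checksums(state: State) -> State:
    """Compute an MD5 checksum for every file."""
    sums = {name: hashlib.md5(data).hexdigest() for name, data in state["files"].items()}
    return {**state, "checksums": sums}


def run_pipeline(services: List[Service], state: State) -> State:
    """Apply each service in turn; each one sees only the output of the previous one."""
    return reduce(lambda acc, service: service(acc), services, state)
```

Replacing `record_checksums` with, say, a SHA-256 variant means changing one function and one list entry, which is the "more easily replaced" property the quotation stresses.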

Page 65

Archivematica (http://archivematica.org) is an open-source software toolkit that takes the OAIS model and turns its various conceptual entities into actionable functionalities.

It takes SIPs (Submission Information Packages) and turns them into AIPs (Archival Information Packages) and DIPs (Dissemination Information Packages).

In v. 0.7 alpha this is accomplished through a Unix pipeline design that makes use of various open-source utilities to perform designated actions.

Digital Preservation in Action: Archivematica (version 0.7 alpha)
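The SIP to AIP to DIP flow can be modelled with three small types. This Python sketch is a simplification for illustration only. The class layouts, the SHA-256 fixity values and the `ingest`/`disseminate` functions are assumptions, and a real AIP carries much richer preservation description information than shown here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to the archive."""
    producer: str
    files: Dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package: content plus fixity and provenance information."""
    identifier: str
    files: Dict[str, bytes]
    fixity: Dict[str, str]
    provenance: List[str] = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    identifier: str
    files: Dict[str, bytes]


def ingest(sip: SIP, identifier: str) -> AIP:
    """Turn a SIP into an AIP, recording a SHA-256 fixity value for each file."""
    fixity = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    return AIP(identifier, dict(sip.files), fixity, [f"ingested from {sip.producer}"])


def disseminate(aip: AIP) -> DIP:
    """Derive a DIP; a real access copy might be a normalized derivative instead."""
    return DIP(aip.identifier, dict(aip.files))
```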

Page 66

Open source software developed by Artefactual Systems (Vancouver, Canada)

Development partners include:
– UNESCO Memory of the World Programme
– International Monetary Fund
– Vancouver City Archives
– University of British Columbia
– University of Virginia (Rubymatica)
– Many alpha installations

Digital Preservation in Action: Archivematica (version 0.7 alpha)

Page 67

Archivematica & OAIS: SIP > AIP > DIP

Source

Page 69

Archivematica & OAIS: Curation Micro-services

1. Receive SIP
   1. verifyChecksum

2. Review SIP
   1. extractPackage
   2. assignIdentifier
   3. parseManifest
   4. cleanFilename

Source
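The first micro-service, verifyChecksum, compares each received file with the checksum recorded for it. A minimal Python sketch, assuming the SIP arrives with an MD5 manifest of `checksum path` lines (the layout BagIt uses for `manifest-md5.txt`). The function names are hypothetical and this is not Archivematica's own code.

```python
import hashlib
from pathlib import Path
from typing import List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large masters need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(base: Path, manifest: Path) -> List[str]:
    """Return a problem report for every manifest entry that is missing or altered."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = base / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif md5_of(target) != expected.lower():
            problems.append(f"mismatch: {rel_path}")
    return problems
```

An empty list means every file listed in the manifest arrived intact; anything else would stop the SIP before it reaches quarantine and appraisal.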

Page 70

Archivematica & OAIS: Curation Micro-services

3. Quarantine SIP
   1. lockAccess
   2. virusCheck

4. Appraise SIP
   1. identifyFormat
   2. validateFormat
   3. extractMetadata
   4. decidePreservationAction

Source

Page 71

Archivematica & OAIS: Curation Micro-services

5. Prepare AIP
   1. gatherMetadata
   2. normalizeFiles
   3. createPackage

6. Review AIP
   1. decideStorageAction

Source

Page 72

Archivematica & OAIS: Curation Micro-services

7. Store AIP
   1. writePackage
   2. replicatePackage
   3. auditfixity
   4. readPackage
   5. updatePackage

8. Provide DIP
   1. uploadPackage
   2. updateMetadata

Source

Page 73

Archivematica & OAIS: Curation Micro-services

9. Monitor Preservation
   1. checkFormatRegistry
   2. updatePreservationPlanPolicies
   3. migrateFormat
   4. synchronizeAIPsandDIPs

Source

Page 74

Digital Curation Software Tools: PRONOM File Format Registry

PRONOM is a resource for anyone requiring impartial and definitive information about the 320+ file formats, software products and other technical components required to support long-term access to electronic records and other digital objects of cultural, historical or business value. It is maintained by The National Archives (UK). Source

Source

Page 75

Digital Curation Software Tools: PRONOM File Format Registry (Excel 2.1)

Source

Page 76

Digital Curation Software Tools: PRONOM File Format Registry (Excel 2.1)

Source

Page 77

Digital Curation Software Tools: FITS (Developed by Harvard University)

– The File Information Tool Set (FITS) identifies, validates, and extracts technical metadata for various file formats. It wraps several third-party open source tools, normalizes and consolidates their output, and reports any errors.

– Current tools are: JHOVE, ExifTool, National Library of New Zealand Metadata Extractor, DROID, FFIdent, File Utility, Fileinfo and XMLMetadata.

Source
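One way to picture "normalizes and consolidates their output" is to merge the format each wrapped tool reports and flag any disagreement. The Python sketch below uses a simple majority vote, which is an assumption made for illustration; FITS's actual consolidation rules are not described on this slide, and the tool names and MIME types in the example are illustrative.

```python
from collections import Counter
from typing import Dict, Optional


def consolidate(identifications: Dict[str, Optional[str]]) -> dict:
    """Merge per-tool format identifications, flagging tools that disagree.

    `identifications` maps a tool name to the format it reported (None = no answer).
    The majority format wins; any dissent turns the status into "conflict".
    """
    reported = {tool: fmt for tool, fmt in identifications.items() if fmt}
    if not reported:
        return {"format": None, "status": "unidentified", "agreeing": [], "dissenting": []}
    best, _ = Counter(reported.values()).most_common(1)[0]
    agreeing = sorted(tool for tool, fmt in reported.items() if fmt == best)
    dissenting = sorted(tool for tool, fmt in reported.items() if fmt != best)
    return {
        "format": best,
        "status": "conflict" if dissenting else "consensus",
        "agreeing": agreeing,
        "dissenting": dissenting,
    }
```

A "conflict" result is exactly the case the release notes on the following slides hint at: tools with uneven support for a format (JP2 versus JPX, for example) can disagree, and a human may need to decide.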

Page 78

Digital Curation Software Tools: FITS (Developed by Harvard University)
– File identification using DROID
– File validation using JHOVE
– Metadata extraction using NZ Metadata Extractor
– Metadata normalization and consolidation using XMLMetadata

Source

Page 79

Digital Curation Software Tools: FITS (Developed by Harvard University)
– Not every file format is supported by every tool, as illustrated by the latest FITS release notes:
  – Improved support for audio formats
  – Better identification of JP2 and JPX images
  – Improved identification of EXIF and JFIF JPEGs
  – Fixed DROID format output for SVG files

Source

Page 80

Digital Curation Software Tools: FITS (DROID Tool – file identification)

– DROID (Digital Record Object Identification) uses internal and external signatures, maintained in the PRONOM technical registry, to identify and report the specific file format versions of digital files.

Source
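DROID's internal signatures are byte patterns inside the file, and its external signatures are file extensions. The Python sketch below imitates that two-step approach with a handful of well-known leading "magic" byte sequences. It is a toy, not DROID: real PRONOM signatures include offsets, wildcards and end-of-file patterns, and they cover far more formats and versions. The table contents and the `identify` function are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional

# A few well-known internal signatures: bytes that appear at the very start of the file.
INTERNAL_SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"\xff\xd8\xff", "JPEG"),
]
# External signatures: file extensions, used only as a weaker fallback.
EXTERNAL_SIGNATURES = {".tif": "TIFF", ".tiff": "TIFF", ".png": "PNG", ".gif": "GIF",
                       ".jpg": "JPEG", ".jpeg": "JPEG", ".pdf": "PDF"}


def identify(name: str, header: bytes) -> Optional[str]:
    """Identify a format (and version, where the header states it) from a file's first bytes."""
    if header.startswith(b"%PDF-"):
        # The version number follows the marker, e.g. "%PDF-1.4".
        return "PDF " + header[5:8].decode("ascii", errors="replace")
    for magic, fmt in INTERNAL_SIGNATURES:
        if header.startswith(magic):
            return fmt
    return EXTERNAL_SIGNATURES.get(Path(name).suffix.lower())
```

Distinguishing GIF 87a from GIF 89a, or PDF 1.4 from PDF 1.7, is the kind of version-level identification the slide attributes to DROID.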

Page 81

Digital Curation Software Tools: FITS (JHOVE Tool – file identification, validation and characterization)
– File identification as per DROID
– File validation

A file is well-formed if it meets the purely syntactic requirements for a format. For example, a TIFF object is well-formed if it starts with an 8-byte header followed by a sequence of Image File Directories (IFDs), each composed of a 2-byte entry count, a series of 12-byte tagged entries and a 4-byte offset to the next IFD.

Source
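The TIFF well-formedness rule can be checked with a few lines of Python using the `struct` module. This sketch illustrates a purely syntactic check and is not JHOVE's TIFF module. It verifies the byte-order mark, the magic number 42, and that every IFD in the chain (2-byte count, 12-byte entries, 4-byte next-IFD offset) fits inside the file. It does not inspect the entries themselves, which is where validity checks such as the samples-per-pixel rule would begin. The function name is hypothetical.

```python
import struct
from typing import Tuple


def tiff_is_well_formed(data: bytes) -> Tuple[bool, str]:
    """Check only the syntactic TIFF structure: header, IFD entry counts and IFD chain."""
    if len(data) < 8:
        return False, "shorter than the 8-byte header"
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if order is None:
        return False, "unknown byte order"
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        return False, "bad magic number"
    seen = set()
    while offset != 0:  # an offset of 0 ends the IFD chain
        if offset in seen:
            return False, "IFD chain loops"
        seen.add(offset)
        if offset + 2 > len(data):
            return False, f"IFD offset {offset} is beyond the end of the file"
        (count,) = struct.unpack(order + "H", data[offset:offset + 2])
        end = offset + 2 + 12 * count  # each IFD entry is 12 bytes
        if end + 4 > len(data):
            return False, f"IFD at {offset} is truncated"
        (offset,) = struct.unpack(order + "I", data[end:end + 4])
    if not seen:
        return False, "no IFDs"
    return True, f"{len(seen)} IFD(s)"
```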

Page 82

Digital Curation Software Tools: FITS (JHOVE Tool – file identification, validation and characterization)
– File validation (continued)

A well-formed file is also valid if it meets additional semantic-level requirements. For example, an RGB file must have at least three sample values per pixel.

Source

Page 83

Digital Curation Software Tools: FITS (JHOVE Tool – file identification, validation and characterization)
– File characterization: the process of determining the format-specific significant properties of an object of a given format.
– JHOVE can report the file pathname or URI, last modification date, byte size, format, format version, MIME type, format profiles and, optionally, a checksum.

Source
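Several of the properties JHOVE reports can be gathered with Python's standard library, which makes it clear which ones come from the file system and which need format-aware parsing. In this sketch the function name and report keys are hypothetical. The MIME type is only guessed from the file extension, whereas JHOVE parses the content, and no format version or profile is determined.

```python
import datetime
import hashlib
import mimetypes
from pathlib import Path


def characterize(path: Path, with_checksum: bool = False) -> dict:
    """Report pathname, modification date, size, guessed MIME type and, optionally, an MD5."""
    info = path.stat()
    report = {
        "pathname": str(path.resolve()),
        "last_modified": datetime.datetime.fromtimestamp(
            info.st_mtime, tz=datetime.timezone.utc).isoformat(),
        "size_bytes": info.st_size,
        "mime_type": mimetypes.guess_type(path.name)[0],  # extension-based guess only
    }
    if with_checksum:
        report["md5"] = hashlib.md5(path.read_bytes()).hexdigest()
    return report
```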

Page 84

Digital Curation Software Tools: FITS (JHOVE Tool – sample output)

Source

Page 85

Digital Curation Software Tools: FITS (JHOVE Tool – supported file formats)

Source

AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG 2000, PDF, TIFF, UTF-8, WAVE, XML

Page 86

Digital Curation Software Tools FITS (New Zealand Metadata Extraction Tool)
– Automatically extracts preservation-related metadata from digital files.
– Supported file formats:
Images: BMP, GIF, JPEG and TIFF.
Office documents: MS Word (versions 2, 6), WordPerfect, OpenOffice (version 1), MS Works, MS Excel, MS PowerPoint, and PDF.
Audio and video: WAV and MP3.
Markup languages: HTML and XML.

Source

Page 87: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools FITS (New Zealand Metadata Extraction Tool)
– Potential metadata elements that can be extracted from an audio file header include: resolution, duration, bitrate, compression, encapsulation, channels.

Source
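The NLNZ tool's own code is not reproduced in these slides. As an illustration of header-level extraction only, Python's standard `wave` module reads several of the same fields from a WAV file. The bitrate is computed assuming uncompressed PCM (sample rate × bits per sample × channels). The encapsulation value is hard-coded because `wave` only reads RIFF/WAVE files.

```python
import wave


def wav_header_metadata(path: str) -> dict:
    """Read audio properties from a WAV header (uncompressed PCM assumed)."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        bits = w.getsampwidth() * 8  # sample width is reported in bytes
        rate = w.getframerate()
        frames = w.getnframes()
        compression = w.getcomptype()  # 'NONE' for uncompressed PCM
    return {
        "channels": channels,
        "resolution_bits": bits,
        "sample_rate_hz": rate,
        "duration_s": frames / rate,
        "bitrate_bps": rate * bits * channels,
        "compression": compression,
        "encapsulation": "RIFF/WAVE",  # the only container `wave` reads
    }
```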

Page 88: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools BagIt

A specification for the packaging of digital content for transfer. Content is packaged (the bag) along with a small amount of machine-readable text (the tag) to help automate the content's receipt, storage and retrieval. There is no software to install. A bag consists of a base directory containing the tag and a subdirectory that holds the content files. The tag is a simple text-file manifest, like a packing slip, that consists of two elements:
– An inventory of the content files in the bag
– A checksum for each file

Source

Source
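To make the packing-slip idea concrete, the sketch below builds a small bag using only the standard library. It copies payload files into `data/` and writes a `bagit.txt` declaration and an MD5 manifest at the top level, matching the tag files of the SFU example in these slides. `make_bag` is a hypothetical helper. It handles a flat list of files only and writes no `bag-info.txt`; in production, a maintained BagIt implementation would be the safer choice.

```python
import hashlib
import os
import shutil


def md5_of(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(bag_dir: str, payload_files: list) -> None:
    """Copy payload files into bag_dir/data and write bagit.txt + manifest-md5.txt."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)
    manifest_lines = []
    for src in payload_files:
        name = os.path.basename(src)
        dest = os.path.join(data_dir, name)
        shutil.copy2(src, dest)
        # Manifest paths are relative to the bag root and use forward slashes
        manifest_lines.append(f"{md5_of(dest)} data/{name}\n")
    with open(os.path.join(bag_dir, "bagit.txt"), "w", encoding="utf-8") as f:
        f.write("BagIt-Version: 0.96\nTag-File-Character-Encoding: UTF-8\n")
    with open(os.path.join(bag_dir, "manifest-md5.txt"), "w", encoding="utf-8") as f:
        f.writelines(manifest_lines)
```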

Page 89: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools BagIt: bag directory contents

/6-1999-06-07
    bagit.txt
    bag-info.txt
    manifest-md5.txt
    /data
        6-1999-06-07.tif
        6-1999-06-07_general_metadata.xml
        6-1999-06-07_technical_metadata.xml

Source

Source

Page 90: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools

BagIt: bagit.txt

BagIt-Version: 0.96
Tag-File-Character-Encoding: UTF-8

Source

Page 91: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools

BagIt: bag-info.txt

Source-organization: Simon Fraser University Library
Organization-URL: http://www.lib.sfu.ca
Bagging-Date: 2009-06-26
External-Description: TIFF master files and associated metadata for item 6-1999-06-07 in the SFU Editorial Cartoons Collection.

Source
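Tag files such as `bag-info.txt` are plain `Label: Value` lines. The BagIt specification also allows a long value to continue on a following line indented with whitespace, and it allows labels to repeat. For that reason, the hypothetical parser below keeps a list of values per label.

```python
def parse_tag_file(text: str) -> dict:
    """Parse 'Label: Value' lines; indented lines continue the previous value."""
    fields = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Continuation line: append to the most recent value of that label
            fields[last_label][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed tag line: {line!r}")
        last_label = label.strip()
        fields.setdefault(last_label, []).append(value.strip())
    return fields
```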

Page 92: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools

BagIt: manifest-md5.txt

91a6ce58ad2628b81c46c034d434816f data/6-1999-06-07.tif
8c2712026f0f54c4ad156674e87f573b data/6-1999-06-07_general_metadata.xml
28fa197bbfd61e4da0f6119ed7420bff data/6-1999-06-07_technical_metadata.xml

Source
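Receiving a bag then comes down to recomputing each checksum and comparing it with the manifest. The minimal sketch below assumes an MD5 manifest in the `checksum path` layout shown above. It reports missing or altered payload files; a full BagIt validator would also flag files present in `data/` that the manifest does not list. `verify_bag` is a hypothetical name.

```python
import hashlib
import os


def verify_bag(bag_dir: str) -> list:
    """Return a list of problems; an empty list means every listed file matched."""
    problems = []
    with open(os.path.join(bag_dir, "manifest-md5.txt"), encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            expected, rel_path = line.strip().split(None, 1)
            path = os.path.join(bag_dir, *rel_path.split("/"))
            if not os.path.isfile(path):
                problems.append(f"missing: {rel_path}")
                continue
            h = hashlib.md5()
            with open(path, "rb") as payload:
                for chunk in iter(lambda: payload.read(1 << 16), b""):
                    h.update(chunk)
            if h.hexdigest() != expected.lower():
                problems.append(f"checksum mismatch: {rel_path}")
    return problems
```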

Page 93: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools BagIt: 6-1999-06-07.tif

Ingrid Rice, June 7, 1999

Source

Page 94: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools BagIt: General metadata file

Source

Page 95: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital Curation Software Tools BagIt: Technical metadata file

Source

Page 96: Digital Preservation Best Practices: Lessons Learned From Across the Pond

DSpace 1.7 (New Features) AIP Backup and Restore
– Outputs metadata and bitstreams into zipped, self-contained Archival Information Packages, which can be loaded into another instance of DSpace or into another institutional repository platform (Fedora, CONTENTdm, etc.)
– DSpace AIPs can function as SIPs or DIPs.
– Possible to load Archivematica AIPs into DSpace.

Source

Page 97: Digital Preservation Best Practices: Lessons Learned From Across the Pond

DSpace 1.7 (New Features) Curation System
– Infrastructure to support the implementation of digital curation micro-services for the long-term preservation of your DSpace content.
– Initial services include:
Bitstream format profiler: examines all the bitstreams and generates a count and support level for each type of bitstream format. Useful tool for format migration. Note: this does not identify or validate bitstreams.
Required metadata: checks whether required metadata is present in all records.
Virus scan: virus check using the ClamAV tool.

Source
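DSpace curation tasks are Java classes running inside DSpace, and their code is not shown here. Purely as an illustration of what the two checks compute, here is a Python sketch. The first function is a profiler that counts files per format, guessing the format from the file extension; like the profiler described above, it neither identifies nor validates anything. The second function lists records that are missing required fields. The support levels and field names are hypothetical.

```python
import os
from collections import Counter


def profile_formats(filenames, support_levels):
    """Count files per (lower-cased) extension and attach a support level."""
    counts = Counter(os.path.splitext(n)[1].lower() or "(none)" for n in filenames)
    return {ext: (n, support_levels.get(ext, "unknown")) for ext, n in counts.items()}


def missing_required(records, required_fields):
    """Map record id -> required fields that are absent or empty."""
    report = {}
    for rec_id, metadata in records.items():
        missing = [field for field in required_fields if not metadata.get(field)]
        if missing:
            report[rec_id] = missing
    return report
```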

Page 98: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Objectives
– Show the complete ingest/archival/dissemination chain for one SIP
– Our demo SIP contains object files in various image formats: TIFF, BMP, SVG, PNG, JP2, EPS, GIF, JPG, TGA
– Check the contents of the Archivematica SIP throughout the process, as it is transformed into a self-contained AIP and DIP

Page 100: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Archivematica Release 0.7 Alpha: YouTube videos 1 and 2, along with step-by-step instructions.

Page 101: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Boot your PC with the bootable Archivematica DVD.
Login: demo
Password: demo
You see the File Manager:
– Shortcuts
– Directories used throughout the archiving process

Imagine you’re an archivist and you have a set of object files sitting in demo/testFiles:
– structured into a number of directories
– each directory corresponds to a logical unit of resources, be it a distinct item or a complete fonds
– each directory in testFiles = one SIP

You could also drag and drop, or copy and paste, from a USB stick.

Page 102: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Launch the dashboard and resize it so that it can be viewed as you navigate through the Archivematica processes.
– Firefox: uncheck File/Work Offline

Web-based administration for the archivist
– Tracks the various stages of the archival process

In this demo setup of Archivematica, manual approval by the archivist is required at various stages in the process:
– we’ll look at the contents of the SIP, AIP and DIP at each of these stages

Page 103: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Archivematica-SIP

Folder structure containing metadata, checksums and object files:
– logs
– logs/fileMeta
– metadata: checksum and descriptive metadata
– objects: digital objects to be preserved

The content changes as the SIP moves through the different stages of the archiving process.

Demo SIP = ImagesSIP directory
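The SIP layout above is simple enough to prepare outside the demo environment, for instance when SIPs are generated by another system. The sketch below assumes the 0.7 alpha directory names listed on the slide. `make_sip` is hypothetical, and Archivematica itself fills in `logs/` and `metadata/` as processing goes on.

```python
import os
import shutil

# Directory names taken from the 0.7 alpha SIP layout shown on the slide
SIP_SUBDIRS = ("logs", os.path.join("logs", "fileMeta"), "metadata", "objects")


def make_sip(sip_root: str, object_files: list) -> None:
    """Create an empty SIP skeleton and copy the object files into objects/."""
    for sub in SIP_SUBDIRS:
        os.makedirs(os.path.join(sip_root, sub), exist_ok=True)
    for src in object_files:
        shutil.copy2(src, os.path.join(sip_root, "objects", os.path.basename(src)))
```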

Page 104: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Start the archival process:
– Drag and drop the ImagesSIP directory into the receiveSIP watched directory
– Rename the SIP

The SIP appears in the Dashboard
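A watched directory is simply a folder that a service checks for new arrivals. The sketch below shows the polling idea in its plainest form. It is not Archivematica's code, and the function name is hypothetical. A real watcher would also need to wait until a drag-and-drop copy has finished before acting on it.

```python
import os


def new_arrivals(watched_dir: str, already_seen: set) -> list:
    """Return names of subdirectories that appeared since the previous poll.

    `already_seen` is updated in place, so the next call only reports newer ones.
    """
    current = {entry.name for entry in os.scandir(watched_dir) if entry.is_dir()}
    fresh = sorted(current - already_seen)
    already_seen.update(fresh)
    return fresh
```

In a long-running service, this function would be called in a loop (for example every few seconds, with `time.sleep`), and each new SIP name would be passed on to the next step.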

Page 105: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

First approval: appraise SIP for submission

Click on Micro-Services to look at the actions performed by Archivematica so far:
– SIP backup, SIP compliance check, assign UUIDs (package and object files), check delivered checksums (if any were delivered)

Click on Browse to see the contents of the SIP at this stage:
– logs/fileUUIDs.log
– logs/fileMeta/*.xml: for each object file, PREMIS-formatted metadata: file name, UUID, SHA-256 hash, and the events that occurred on the object file
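The UUID assignment and hashing steps can be sketched as follows. The tab-separated `uuid, relative path, sha256` line format is a hypothetical assumption, since the slides do not show the actual layout of `logs/fileUUIDs.log`. Archivematica also records these facts per file in the PREMIS XML under `logs/fileMeta`.

```python
import hashlib
import os
import uuid


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def assign_uuids(objects_dir: str) -> list:
    """Give every file under objects_dir a random (version 4) UUID and a SHA-256 hash."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(objects_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, objects_dir)
            rows.append((str(uuid.uuid4()), rel, sha256_of(path)))
    return rows


def write_uuid_log(rows: list, log_path: str) -> None:
    """Write one hypothetical tab-separated line per object file."""
    with open(log_path, "w", encoding="utf-8") as f:
        for file_uuid, rel, digest in rows:
            f.write(f"{file_uuid}\t{rel}\t{digest}\n")
```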

Page 106: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

First approval: appraise SIP for submission

The submitted SIP should be in accordance with the institution’s submission agreements:
– delete any unwanted files or directories (File Manager/appraiseSIPForSubmission)
– add descriptive metadata about the SIP in metadata/dublincore.xml

Click on Approve
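Adding descriptive metadata means editing `metadata/dublincore.xml`. The sketch below writes the standard Dublin Core elements in their usual namespace. The root element name `dublincore` and the helper name are assumptions, since the slides do not show the schema Archivematica 0.7 expects for this file.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: dict) -> bytes:
    """Serialize {element: value} pairs as dc:* children of a root element."""
    root = ET.Element("dublincore")  # hypothetical root element name
    for element, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```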

Page 107: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

SIP quarantined

The SIP is placed in quarantine for virus checking.
Why quarantine?
– To give ClamAV a chance to pick up the latest version of its virus database
How long?
– demo: preset to one minute
– National Archives of Australia: 1 month
– the archivist can manually remove the SIP from quarantine
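The quarantine rule itself is a simple time comparison. The SIP may move on once the configured period has elapsed, so that the next virus scan runs against a fresher ClamAV database, or earlier if the archivist releases it. The sketch below uses hypothetical names, and one month is approximated as 30 days in the usage checks.

```python
from datetime import datetime, timedelta


def may_leave_quarantine(entered: datetime, period: timedelta, now: datetime,
                         manual_release: bool = False) -> bool:
    """True once the quarantine period has elapsed or the archivist releases the SIP."""
    return manual_release or now >= entered + period
```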

Page 108: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Second approval: appraise SIP for preservation

Zipped/tarred/… files are extracted
Check directory and file names
Scan for viruses
Using FITS:
– identify and validate the format of object files
– extract technical metadata
– PREMIS
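Checking directory and file names usually means replacing characters that later tools may not handle correctly. The sketch below applies an assumed rule: transliterate accented letters, then keep only ASCII letters, digits, `.`, `-` and `_`, replacing anything else with `_`. The rule Archivematica actually applies may differ, and name collisions after sanitizing are not handled here.

```python
import re
import unicodedata

# Anything outside this set is replaced (assumed rule, for illustration)
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_name(name: str) -> str:
    """Transliterate accents where possible, then replace unsafe characters."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return UNSAFE.sub("_", ascii_name) or "_"
```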

Page 109: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Second approval: appraise SIP for preservation

logs/clamAVScan.txt: report on virus checking
logs/extraction.log: report on extracted zip files
logs/fileMeta/*.xml: augmented PREMIS-formatted metadata
– format designation (PRONOM PUID identifier)
– events
– technical metadata
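Format identification tools such as DROID match byte signatures from the PRONOM registry and report a PUID. The toy sketch below shows only the signature-matching idea, using a handful of well-known magic numbers. It returns plain format names rather than PUIDs, and its table covers a tiny fraction of what PRONOM describes.

```python
# A few well-known leading byte signatures; short, ambiguous ones go last
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF 87a"),
    (b"GIF89a", "GIF 89a"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"BM", "BMP"),
]


def identify(header: bytes) -> str:
    """Return the first format whose signature is a prefix of the header bytes."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```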

Page 110: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Second approval: appraise SIP for preservation

Technical metadata: object characteristics as <fits_output> XML-formatted metadata
– <fits/identification>
– <fits/fileinfo>
– <fits/filestatus>: well-formed / valid
– <fits/metadata>: technical metadata of the object
– <fits/toolOutput>: output of the tools used: Jhove, File Utility, ExifTool, DROID, NLNZ Metadata Extractor, ffident, File Information, XML Metadata
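Reading the main results back out of a FITS document can be sketched with ElementTree. The sample is abbreviated and hand-written, modeled on the sections listed above. The `format` and `mimetype` attributes on `identity` are assumptions to check against real FITS output. The `{*}` namespace wildcard (Python 3.8+) keeps the parser independent of the exact namespace URI.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification>
    <identity format="Portable Network Graphics" mimetype="image/png"/>
  </identification>
  <filestatus>
    <well-formed>true</well-formed>
    <valid>true</valid>
  </filestatus>
</fits>"""


def summarize_fits(xml_text: str) -> dict:
    """Pull format, MIME type and well-formed/valid flags out of FITS-style XML."""
    root = ET.fromstring(xml_text)
    identity = root.find("{*}identification/{*}identity")
    status = root.find("{*}filestatus")

    def flag(tag: str) -> bool:
        el = status.find("{*}" + tag) if status is not None else None
        return el is not None and (el.text or "").strip() == "true"

    return {
        "format": identity.get("format") if identity is not None else None,
        "mimetype": identity.get("mimetype") if identity is not None else None,
        "well_formed": flag("well-formed"),
        "valid": flag("valid"),
    }
```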

Page 111: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Second approval: appraise SIP for preservation

Delete any unwanted files or directories from the SIP (File Manager/appraiseSIPForPreservation)

Click on Approve: Archivematica now creates an AIP and a DIP for this SIP
– normalization based on the format identified
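“Normalization based on the format identified” amounts to a lookup from the identified format to a preservation target (kept in the AIP alongside the original) and an access target (used for the DIP). The policy table below is hypothetical and only illustrates the mechanism; it is not Archivematica 0.7's actual format policy.

```python
# Hypothetical policy: identified format -> (preservation format, access format)
POLICY = {
    "TIFF": ("TIFF", "JPEG"),
    "BMP": ("TIFF", "JPEG"),
    "PNG": ("TIFF", "JPEG"),
    "JPEG": ("TIFF", "JPEG"),
    "PDF": ("PDF/A", "PDF"),
}


def normalization_plan(identified_formats: dict) -> dict:
    """Map each file to its (preservation, access) targets, or flag it for review."""
    return {
        filename: POLICY.get(fmt, ("manual review", "manual review"))
        for filename, fmt in identified_formats.items()
    }
```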

Page 112: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Third approval: push AIP to archival storage

storeAIP contains one zip file for the AIP, containing a bag (according to the BagIt specification)

Click on Browse next to the Store AIP micro-service

Look in the bag

Page 113: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Archivematica-AIP

data/
– logs/normalizationLog.txt
– metadata: the dublincore.xml
– checksum.sha256 for the AIP
– objects: all original formats + preservation formats
– METS.xml: METS XML container with structural, descriptive and administrative metadata of the AIP

Page 114: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Building a METS Document: The Framework

<METS:mets>

<METS:metsHdr /> Header

<METS:dmdSec /> Descriptive MD

<METS:amdSec /> Administrative MD

<METS:fileSec /> File list

<METS:structMap /> Structural Map

<METS:behaviorSec /> Behavior Section

</METS:mets>

Source
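The framework above can be generated programmatically. The sketch below uses ElementTree to emit the six top-level sections in the order shown, with a single `div` in the structural map (the `structMap` is the one section METS requires). The result is only a skeleton, not a schema-valid METS document: real sections need IDs and content, and the `OBJID` value is a placeholder.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("METS", METS_NS)

SECTIONS = ("metsHdr", "dmdSec", "amdSec", "fileSec", "structMap", "behaviorSec")


def mets_skeleton(obj_id: str) -> ET.Element:
    """Build an empty METS framework with the six top-level sections."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": obj_id})
    for name in SECTIONS:
        section = ET.SubElement(root, f"{{{METS_NS}}}{name}")
        if name == "structMap":
            ET.SubElement(section, f"{{{METS_NS}}}div")  # root of the structure
    return root
```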

Page 115: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Source

METS Diagrammed

[Figure: the METS sections by role. Structure: structMap > div. Content: fileSec > fileGrp > file. Administrative MD: amdSec (techMD, sourceMD, digiprovMD, rightsMD). Descriptive MD: dmdSec. Behavior: behaviorSec.]

Page 116: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Archivematica-AIP / METS.xml

<structMap>: structure of the AIP
<fileSec>: list of files included in the AIP
<dmdSec>: descriptive metadata for the AIP (the dublincore.xml)
<amdSec>: administrative metadata
– <digiprovMD>: PREMIS-formatted digital provenance metadata, most of it taken from the logs/fileMeta files: object identification and characteristics, events, agents, and the relation between original and preservation copies
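Listing the files that an AIP's METS.xml declares takes only a few lines. The sketch below assumes the common METS pattern of `file` elements carrying a `MIMETYPE` attribute and an `FLocat` child whose location is in `xlink:href`. The sample is hand-written rather than taken from an actual Archivematica AIP.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}

SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="f1" MIMETYPE="image/bmp">
        <mets:FLocat LOCTYPE="URL" xlink:href="objects/photo.bmp"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def list_files(mets_xml: str) -> list:
    """Return (file ID, MIME type, location) for every file in the fileSec."""
    root = ET.fromstring(mets_xml)
    rows = []
    for f in root.iterfind("mets:fileSec//mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        href = loc.get(f"{{{NS['xlink']}}}href") if loc is not None else None
        rows.append((f.get("ID"), f.get("MIMETYPE"), href))
    return rows
```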

Page 117: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Third approval: push AIP to archival storage

If wanted, check the contents of the AIP; note that you cannot make any changes to an AIP.

Click on Approve: the AIP is pushed into archival storage
– our demo setup: the AIPsStore directory
– real life: cloud storage, Amazon S3, your own network storage device, CLOCKSS, …
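Whatever the storage target, a safe push copies the AIP and then re-reads the copy to confirm its checksum. The local-filesystem sketch below uses a hypothetical `push_aip` helper. For targets such as Amazon S3, the copy step would go through that service's client library instead of `shutil.copy2`.

```python
import hashlib
import os
import shutil


def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def push_aip(aip_path: str, store_dir: str) -> str:
    """Copy an AIP into storage, verify the copy, and return its SHA-256."""
    expected = sha256_of(aip_path)
    os.makedirs(store_dir, exist_ok=True)
    dest = os.path.join(store_dir, os.path.basename(aip_path))
    shutil.copy2(aip_path, dest)
    if sha256_of(dest) != expected:
        os.remove(dest)  # do not leave a corrupt copy behind
        raise IOError(f"fixity check failed for {dest}")
    return expected
```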

Page 118: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

Fourth approval: upload DIP to public access system

A directory is created for this DIP under uploadDIP:
– objects: normalized access copies of the object files
– objectsBackup: idem
– METS.xml: identical to the one in the AIP

If wanted, check and change the contents of the DIP (File Manager/uploadDIP)

Click on Approve
– removed from SIPbackups, copied to DIPbackups
– our demo setup: the DIP is pushed to an ICA-AtoM public access system

Page 119: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Archivematica 0.7 Alpha Demo

ICA-AtoM public access system

Fully web-based archival description application based on International Council on Archives standards
AtoM = Access to Memory

Point Firefox to http://localhost/ica-atom
Uploaded DIPs are in draft status by default. Change their status to ‘published’ for them to become visible in public access:
– Log in: [email protected] / demo
– Choose from archival descriptions
– Edit: change publication status to ‘published’
– Log out
The selected archive is now publicly visible

Page 120: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Context: multiple digital archives
– DI-pot
All academic output (except PhD theses)
Mostly born-digital; some digitized by library staff
Self-submission by academic staff
Extensively modified DSpace 1.4.2
– Metadata granularity
– Semi-automated metadata ingest from PubMed, Scopus, Web of Science, BibTeX and RIS files
– Integrated with central administration databases (staff, departments, controlled vocabulary, ...)
55K descriptions
8K full-text [ PDF ]

Page 121: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Context: multiple digital archives
– Bictel
PhD theses (since 2004)
Mostly born-digital; some digitized by library staff
Self-submission, with some support from faculty staff
ETD software from Virginia Tech
Metadata per object file: access restrictions, deposit dates, MIME type, location
1300 descriptions
Typically multiple object files per thesis
[ PDF ]

Page 122: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Context: multiple digital archives
– Iconothèque
Audiovisual material as support for courses
Mostly born-digital; some digitized by faculty staff
Self-submission by faculty staff
CONTENTdm 5.4
12K descriptions [ JPEG ]

Page 123: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Context: multiple digital archives
– Digithèque
Out-of-print / public domain books and journals
Digitized by library staff
Submission by library staff
Symphony + file system (available over SMB, HTTP)
100K pages / 344 publications [ TIFF + PDF ]

Page 124: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Context: multiple digital archives
– Near future: archives of ULB (our ISAD(G)-enabled DSpace)

Page 125: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
All our digital archives:
– Talk OAI-PMH
– Expose an identical exchange format
Based on MPEG-21 DIDL
Compound object of an item and its associated object files
– “Globally unique persistent identifier” (GUPI) for the item and each object file
– Descriptive metadata for the item expressed in MODS
– Metadata for object files: descriptive, version, access restrictions, deposit/embargo dates, MIME type, location

Page 126: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB

DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
      Descriptor/type (« descriptiveMetadata »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by value (XML)
    Item[0..∞] (of type objectFile)
      Descriptor/type (« objectFile »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by ref. (URL)
    Item[0..1] (of type humanStartPage)
      Descriptor/type (« humanStartPage »)
      Component/Resource -- representation by ref. (URL)
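With that structure, pulling the object-file references out of a DIDL record means walking the nested Items. The sketch below works on a simplified, hand-made record. Here each Item's type is written as a plain-text `Statement` in its `Descriptor`, whereas real records usually express it as a URI in a namespaced element, so the matching rule is an assumption to adapt. The namespace shown is the MPEG-21 DIDL one; the `{*}` wildcard means the parser does not depend on it.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<didl:DIDL xmlns:didl="urn:mpeg:mpeg21:2002:02-DIDL-NS">
  <didl:Item>
    <didl:Item>
      <didl:Descriptor><didl:Statement mimeType="text/plain">descriptiveMetadata</didl:Statement></didl:Descriptor>
      <didl:Component><didl:Resource mimeType="text/xml"/></didl:Component>
    </didl:Item>
    <didl:Item>
      <didl:Descriptor><didl:Statement mimeType="text/plain">objectFile</didl:Statement></didl:Descriptor>
      <didl:Component><didl:Resource mimeType="application/pdf" ref="http://example.org/files/thesis.pdf"/></didl:Component>
    </didl:Item>
  </didl:Item>
</didl:DIDL>"""


def object_file_refs(didl_xml: str) -> list:
    """Return (mimeType, ref) for every Item whose type statement is objectFile."""
    root = ET.fromstring(didl_xml)
    refs = []
    for item in root.iterfind(".//{*}Item"):
        types = {(s.text or "").strip() for s in item.iterfind("{*}Descriptor/{*}Statement")}
        if "objectFile" not in types:
            continue
        for res in item.iterfind("{*}Component/{*}Resource"):
            refs.append((res.get("mimeType"), res.get("ref")))
    return refs
```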

Page 127: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
One dissemination platform
– SAMBURU: harvest and index
DIDL records are harvested from the digital archives
The DIDL record is stored as-is in a MySQL database
The DIDL record is transformed into a SOLR document and stored in Lucene indexes
– DI-fusion: web portal
Based on VuFind
Search/retrieve records through SOLR
Use XSLT to transform DIDL into HTML
Additional Web 2.0 functionality with AJAX technology
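Harvesting over OAI-PMH follows a fixed loop. The harvester issues `ListRecords` with a `metadataPrefix`, then repeats the request with only the `resumptionToken` from the previous response until the token is absent or empty. The minimal sketch below implements that loop. Here `fetch` is a stand-in for any HTTP client (URL in, XML out), so it can be stubbed in tests. OAI-PMH error responses are not handled, and the `didl` prefix used in the checks is an assumption about how the archives name their DIDL format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(base_url, metadata_prefix, fetch):
    """Yield every <record> element, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        yield from root.iter(OAI + "record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send it with the verb only
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For a live repository, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`.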

Page 128: Digital Preservation Best Practices: Lessons Learned From Across the Pond

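Digital preservation @ ULB: SAMBURU

[Figure: SAMBURU architecture. The digital archives (DI-pot, BicTel, Iconothèque, Digithèque, UMons) are harvested over OAI-PMH by the Harvester into the MySQL Metadata Store; the Indexer builds the Lucene indexes (SOLR) that serve the DI-fusion web portal; a Metadata Enrichment service exchanges records with SAMBURU over OAI-PMH.]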

Page 129: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB

Page 130: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB

Page 131: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB

Page 132: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Enrichment process
– Fetch DIDL records from the SAMBURU metadata store + fetch object files (depending on the enrichment type)
– Calculate the enrichment and create a DIDL-formatted enrichment record
– Make the enrichment record available over OAI-PMH
– SAMBURU harvests the enrichment record and merges it with the original DIDL record, before re-indexing into Lucene
– The end user sees the enrichment through DI-fusion

Page 133: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Enrichment: 3 prototype setups
1. An enrichment service at Erasmus University Rotterdam fetches publications in economics from the metadata store and determines JEL classification codes based on text analysis.
2. An enrichment service @ ULB extracts text from PDFs and indexes all words, so that DI-fusion lets the end user do a full-text search.
3. An enrichment service @ ULB adds JCR impact factors (based on ISSN and publication year).
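The third setup joins records to impact factors on ISSN and publication year, which works best if ISSNs are normalized and checked first. In the sketch below, the check digit uses the standard ISSN rule: weights 8 down to 2 on the first seven digits, modulo 11, with 10 written as X. The impact-factor table is a made-up placeholder, since JCR data is licensed.

```python
def issn_check_digit(first7: str) -> str:
    """Standard ISSN check digit: weights 8..2, modulo 11, 10 -> 'X'."""
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def normalize_issn(raw: str):
    """Return 'NNNN-NNNC' for a valid ISSN, otherwise None."""
    s = raw.replace("-", "").strip().upper()
    if len(s) != 8 or not s[:7].isdigit():
        return None
    if issn_check_digit(s[:7]) != s[7]:
        return None
    return f"{s[:4]}-{s[4:]}"


def impact_factor(impact_table: dict, raw_issn: str, year: int):
    """Look up an impact factor keyed by (normalized ISSN, publication year)."""
    issn = normalize_issn(raw_issn)
    return impact_table.get((issn, year)) if issn else None
```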

Page 134: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
Back to digital preservation
– SUBMISSION of metadata and object files (through 4 submission interfaces)
– DISSEMINATION through DI-fusion
– ARCHIVAL: we need a PAS, a “Perpetual Archiving System”, based on the idea of enrichment

Page 135: Digital Preservation Best Practices: Lessons Learned From Across the Pond

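Digital preservation @ ULB: SAMBURU

[Figure: SAMBURU architecture with a PAS. The digital archives (DI-pot, BicTel, Iconothèque, Digithèque, UMons) are harvested over OAI-PMH by the Harvester into the MySQL Metadata Store; the Indexer builds the Lucene indexes (SOLR) behind the DI-fusion web portal. The PAS exchanges records with SAMBURU over OAI-PMH, produces SIPs, AIPs and DIPs, records their status in PAS-Admin, and passes AIPs on to LOCKSS.]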

Page 136: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-SIP
– Retrieve DIDL records over OAI-PMH from the SAMBURU metadata store
– Fetch object files, based on the references included in the DIDL record
– Make and store an Archivematica-SIP
– Alternative to OAI-PMH + web grabbing: prepare Archivematica-SIPs on a network-attached filesystem
More practical for bulk ingest into Archivematica: less network traffic
We would probably try a combined approach: bulk + incremental
– Specific package information registered in PAS-Admin

Page 137: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-AIP
– Use Archivematica micro-services to create and store an Archivematica-AIP, according to the preservation plan for its media type
– Fully automated, at least for certain media types (PDF, JPEG, TIFF)
– Update package information in PAS-Admin

Page 138: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-DIP
– Use Archivematica micro-services to create and store an Archivematica-DIP, according to the preservation plan for its media type
– DIPped object files are made available through a web service
– Update package information in PAS-Admin

Page 139: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-Admin
– Digital preservation status of packages accessible over a web service:
The original digital archive wants to find out the archival status of its items, based on the GUPI of an item or object file
– End users access DIPped object files through a web service: not publicly available, since access depends on the restrictions set by the IPR owner in the original digital archive
– AIPs are pushed into outer preservation space, e.g. LOCKSS, and registered as such in PAS-Admin

Page 140: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-Admin
– Throughout SIP/AIP/DIP processing, relevant information about the packages should be registered in a database
– For each SIP, AIP, DIP:
GUPI of the item and all object files
UUID of the package
identifier of the original digital archive
date of creation/modification
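The PAS-Admin registry described above maps naturally onto two tables: one row per package, and one row per GUPI that the package contains. This also answers the original archive's question about the archival status of a given item or object file. The sketch below uses the standard `sqlite3` module; the table and column names are hypothetical, not ULB's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE package (
    uuid        TEXT PRIMARY KEY,
    kind        TEXT NOT NULL CHECK (kind IN ('SIP', 'AIP', 'DIP')),
    archive_id  TEXT NOT NULL,   -- identifier of the original digital archive
    created     TEXT NOT NULL,
    modified    TEXT NOT NULL
);
CREATE TABLE package_gupi (
    package_uuid TEXT NOT NULL REFERENCES package(uuid),
    gupi         TEXT NOT NULL,  -- item or object file identifier
    PRIMARY KEY (package_uuid, gupi)
);
"""


def register(conn, package_uuid, kind, archive_id, timestamp, gupis):
    """Record a package and the GUPIs it contains in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO package VALUES (?, ?, ?, ?, ?)",
                     (package_uuid, kind, archive_id, timestamp, timestamp))
        conn.executemany("INSERT INTO package_gupi VALUES (?, ?)",
                         [(package_uuid, g) for g in gupis])


def status(conn, gupi):
    """List (kind, uuid) of every package that includes this item or object file."""
    return conn.execute(
        "SELECT p.kind, p.uuid FROM package p JOIN package_gupi g "
        "ON g.package_uuid = p.uuid WHERE g.gupi = ? ORDER BY p.kind",
        (gupi,)).fetchall()
```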

Page 141: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Digital preservation @ ULB
PAS-Admin
– Relevant metadata of DIPs is made available as DIDL-structured (enrichment) records over OAI-PMH for SAMBURU to pick up
Parse/extract from METS.xml: essentially MIME type and location
– The sum of original metadata and PAS-created metadata is available to DI-fusion
– DI-fusion could, for example, decide to show only the DIP version of an object file, and inform the end user that the object file also exists in its original format

Page 142: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Open Discussion

Alternative options for integrating Archivematica, or a subset of digital curation micro-services, into your digitization workflow.

Page 143: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Institutional repositories are also used to maintain an institution’s bibliography, with frequent updates of descriptive metadata and object files.

When should digital objects from an IR be preserved?

Page 144: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Dappert, A. & Enders, M. (2008). Using METS, PREMIS and MODS for archiving eJournals. D-Lib Magazine, 14(9/10). http://www.dlib.org/dlib/september08/dappert/09dappert.html

“AIP per generation”: a generation is a change in metadata and/or object file
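One way to implement “AIP per generation” is to fingerprint everything that defines a generation (the descriptive metadata plus the checksums of the object files) and to start a new AIP only when that fingerprint changes. The sketch below works under that assumption; the fingerprinting scheme and names are hypothetical and are not taken from the Dappert & Enders article.

```python
import hashlib
import json


def generation_fingerprint(metadata: dict, file_checksums: dict) -> str:
    """Stable SHA-256 over metadata and object-file checksums (key order ignored)."""
    payload = json.dumps({"md": metadata, "files": file_checksums}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_new_aip(previous_fingerprint, metadata: dict, file_checksums: dict) -> bool:
    """A new generation (and AIP) starts whenever the fingerprint differs."""
    return generation_fingerprint(metadata, file_checksums) != previous_fingerprint
```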

Page 145: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Both Archivematica and LOCKSS are looking into solutions for the normalization and packaging of objects, so at first sight the two systems seem redundant.

How does Archivematica interact with LOCKSS?

Page 146: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Archivematica-AIPs, DSpace-AIPs, exchange of packages between digital archives, a nationwide preservation solution.

Need for interoperability standards?
– TIPR: Towards Interoperable Preservation Repositories
– RXP: Repository eXchange Package

Page 147: Digital Preservation Best Practices: Lessons Learned From Across the Pond

AIP Repository Interoperability

“For reasons of redundancy, succession planning and software migration, repositories must be able to exchange copies of archival information packages with each other. Every different repository application, however, describes and structures its archival packages differently. Therefore each system produces dissemination packages that are rarely understandable or usable as submission packages by other repositories.”

Source

Page 148: Digital Preservation Best Practices: Lessons Learned From Across the Pond

AIP Repository Interoperability
One possible solution: RXP (Repository eXchange Package), developed by the Towards Interoperable Preservation Repositories (TIPR) project, which has defined a standards-based package of metadata files that can act as an intermediary information package: the RXP, a lingua franca all repositories can read and write.

Another option: create AIPs following the HathiTrust specification for digital objects.

Source

Source

Page 149: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues
AIPs are intended for perpetual access and therefore only contain objects that comply with an open, documented format. Any human being within 50 years should be able to re-read the contents of the object files, given textual documentation.

So, why migrate AIPs into a new(er) format?

Page 150: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Archivematica normalizes moving pictures into MPEG-2 = loss of quality

Lossless conversion would be Motion JPEG 2000

However: no open-source, CLI-based tool for conversion into the Motion JPEG 2000 format is available

Page 151: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

The more copies of a digital object are stored in different places, the harder it becomes to control copyright.

Is geo-independent perpetual archiving in contradiction with IPR issues?

Page 152: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Issues

Packages are self-contained: if you find an AIP, you know what it is about, and you can read it, look at it or listen to it. But how do you find the AIP in a sea of billions of AIPs?

Don’t forget to preserve finding aids! How?

Page 153: Digital Preservation Best Practices: Lessons Learned From Across the Pond

Slavko Manojlovich
Associate University Librarian (IT)
Manager, Digital Archives Initiative
Memorial University of Newfoundland, St. John’s
[email protected]

&

Benoit Pauwels
Head, Library Automation Team
Université Libre de Bruxelles
[email protected]

Contact

*This presentation may be downloaded at: http://dl.dropbox.com/u/18652253/phoenix%20presentation.pptx