The Evolving Process to Add Preservation Support for New Formats at Harvard Library IS&T Archiving...

The Evolving Process to Add Preservation Support for New Formats at Harvard Library IS&T Archiving 2015 Andrea Goethals. Franziska Frey and David Ackerman.



Transcript

The Evolving Process to Add Preservation Support for New Formats at Harvard Library
IS&T Archiving 2015
Andrea Goethals. Franziska Frey and David Ackerman

Digital Repository Service (DRS), Harvard Library's Dig. Pres. Repository

DRS "Support"
- Allowable in at least one DRS "content model"
- Repository tools "know" the format
- Usable now (e.g. through delivery services)
- Preservation staff reasonably certain it can be made usable on an ongoing basis via interventions

Formats Supported Per Year
[Chart. Formats labeled: Text, XML, Target Images, Kodak PhotoCD, GIF, RealAudio, TIFF, JPEG, JP2, ICC color profiles, AIFF, ESRI world files, WAV, SMIL playlists, Web harvests, GZIP containers, ZIP containers, PDF documents]

55 Harvard Units Using the DRS

Born Digital Formats in Harvard Libraries
[Chart. Axis: Number of Libraries (out of 21 that answered). Series: Already have; Will have in 3 years]
Source: HL Preservation Needs Assessment (2013)

DRS Format Requests (2004 -)
[Chart. Last updated: 12/23/2013 (39 requests for 53 formats)]

Support Gap
[Chart of DRS Format Requests (2004 -). Formats labeled: Additional audio, Video, Vector graphics, PDF, Databases, E-articles, Datasets, DNG, Word processing docs, Spreadsheets, Software, CAD, Presentations, 3D Models, E-books, E-newspapers, Python notebooks, Shapefiles, Disk images]

2008: Stop-Gap Solution
- Opaque objects and containers
- Any format, BUT...
- Only bit-level preservation
- No delivery
- Very coarse description
- Less attention by preservation staff
- Moderate uptake - < 20,000 Zip files

Adding Format Support: Old Workflow
- All analysis & development done in-house by existing staff
  - concurrently with other projects / operations
  - intermittently (requiring re-familiarization)
- Sometimes stalled by lack of expertise
- Ad-hoc, undocumented process

Fast-Tracking Experiment
- 3 year project enabled by the Arcadia Foundation
- Formats: video, word processing, vector graphics, 3D graphics, disk images, image stacks, spreadsheets, presentations
- Goal: create a faster format support workflow that can be repeated

Old way: sequential development, one after the other, in between other work
[Diagram: Analysis & Development for New Format A, then Analysis & Development for New Format B]

New way:
1.) Split into 2 sub-projects
    [Diagram: Analysis & Development for New Format A becomes Analysis for New Format A and Development for New Format A]
2.) Hire consultants to help with analysis
    [Diagram: Analysis for New Format A, Development for New Format A]
3.) Schedule independently and in parallel as expertise and resources become available
    [Diagram: Analysis for New Formats A, B, C, D, E, F and Development for New Formats A, B, C, D, etc...]
    Specifications & guidelines are ready in advance for developers

Format Expert Consultants
- AVPreserve (video, disk images, image stacks)
- Paul Wheatley Consulting (word processing documents, spreadsheets, presentations)
- Applied Informatics Group (AIG) at the College of Computing and Informatics (CCI), Drexel University (vector graphics, 3D formats)
- Tarkus Imaging Inc. (camera raw images)

Analysis Tasks
1. Divide up analysis responsibilities
2. Determine format analysis criteria
3. Analyze formats
4. Create format profiles
5. Determine preservation strategy
6. Analyze metadata
7. Design DRS content model
8. Analyze tools

Collaboration of Internal & External Experts
Table columns: Format group; Divide respons.; Format criteria; Format analysis; Format profiles; Preserv. strategy; Metad. analysis; DRS content model; Tool analysis
Responsibility per format group (values in the order they appear on the slide):
- Video: Internal, External, Combo, External
- Word Processing: Internal, Combo, External, Combo, External, Combo, External
- 2D vector: Internal, Combo, External, Combo, External
- 3D formats: Internal, Combo, External, Combo, External
- Camera raw: Internal, Combo, External, Combo, External
- Image stacks: Internal, Combo, External, Combo, External, Combo, External
- Disk images: Internal, Combo, External, Combo, External, Combo, External

Ex. Video Format Criteria
- Generic criteria, prioritized
  - (9) Very important (ex: dependency on a single organization or company)
  - (9) Somewhat important (ex: standardized)
  - (10) Not very important (ex: descriptive metadata support)
- (7) Format-specific criteria, examples:
  - Ability to encode in true lossless compression
  - Max resolution

Ex. Video Format Analysis

Ex. Video Preservation Strategy
- Prefer several formats as archival: uncompressed, JPEG 2000, MPEG-2 and DV (for DV tape); provide a video reformatting service for these
- Accept a few popular proprietary formats but expect to fast-track migrations for them: DNxHD, ProRes
- Few wrapper formats (QT, MXF)
- One delivery format (H.264)

Ex. Video Metadata Analysis
- Technical metadata: EBU Core 1.5 (aligns well with AES-60, structure mirrors MediaInfo's output)
- Source metadata: a revised UTVideoSrc (native suitability to physical media, right amount of detail)
- Process history: a revised reVTMD (specific, simple, sufficient)

Ex. Video DRS Content Model
- VIDEO OBJECT = 1 Object Descriptor, 1..n Video Files, 0..n Video Files
- Callout: 1 metadata file and 1 or more derivative video files
- Relationships shown: HAS_SOURCE, HAS_DOCUMENTATION, HAS_LARGER_CONTEXT, HAS_SUPPLEMENT
- Related objects shown: VIDEO OBJECT, VIDEO EDIT DECISION LIST OBJECT, DOUBLE SYSTEM AUDIO OBJECT, CLOSED CAPTION DATA OBJECT, SUBTITLE DATA OBJECT, POSTER FRAME OBJECT, DISK IMAGE OBJECT

Ex. Video Tool Analysis
- Incorporate MediaInfo into FITS (fitstool.info)
- Make FITS track-aware

Models for Obtaining Format Expertise
- Our old model: build all expertise internally (slow, inefficient)
- Our new model: build a network of external experts to back up internal experts
- Other potential models
  - Rely completely on external experts (risky?)
  - Dig. Pres. institutions form a network of experts; declare areas of expertise (NDSA idea)

Thank you!
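
Illustrative sketch (not part of the original slides): the video content model slide gives cardinalities ("1 Object Descriptor", "1..n Video Files") and four relationship names. The Python below is one possible way to express those constraints as a data structure. The class name `VideoObject`, the `validate` function, the file names, and the object identifier are all hypothetical and do not reflect the actual DRS implementation. The slide does not say which relationship connects which object types, so the sketch leaves that unconstrained.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Relationship(Enum):
    # Relationship names as they appear on the content model slide.
    HAS_SOURCE = "HAS_SOURCE"
    HAS_DOCUMENTATION = "HAS_DOCUMENTATION"
    HAS_LARGER_CONTEXT = "HAS_LARGER_CONTEXT"
    HAS_SUPPLEMENT = "HAS_SUPPLEMENT"


@dataclass
class VideoObject:
    # Hypothetical in-memory stand-in for a DRS video object.
    descriptors: List[str]
    video_files: List[str]
    # (relationship, identifier of the related object) pairs.
    related: List[Tuple[Relationship, str]] = field(default_factory=list)


def validate(obj: VideoObject) -> List[str]:
    """Return a list of cardinality problems; an empty list means valid."""
    problems = []
    if len(obj.descriptors) != 1:
        problems.append("expected exactly 1 object descriptor")
    if len(obj.video_files) < 1:
        problems.append("expected at least 1 video file")
    return problems


# Hypothetical example objects: one well-formed, one empty.
ok = VideoObject(
    descriptors=["descriptor.xml"],
    video_files=["master.mov", "delivery.mp4"],
)
ok.related.append((Relationship.HAS_SOURCE, "video-object-0001"))
bad = VideoObject(descriptors=[], video_files=[])

print(validate(ok))
print(validate(bad))
```

Returning a list of problems rather than raising on the first one lets a caller report every cardinality violation for an object in a single pass.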