Physical preservation with EPrints: 2 File Formats and Risk Analysis, by David Tarrant, Adam Field,...

Digital Preservation: Logical and bit-stream preservation using Plato and EPrints
Physical preservation with EPrints: 2 File Formats and Risk Analysis

Hannes Kulovits, Andreas Rauber
Department of Software Technology and Interactive Systems, Vienna University of Technology

David Tarrant, Adam Field
School of Electronics and Computer Science, University of Southampton, UK

description

This presentation, part of an extensive practical tutorial on logical and bit-stream preservation using Plato (a preservation planning tool) and EPrints (software for creating digital repositories), places the process of managing formats and risk analysis in the EPrints repository interface. The presentation was given as part of module 4 of a 5-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. For more on this and other presentations in this course look for the tag ’KeepIt course’ in the project blog http://blogs.ecs.soton.ac.uk/keepit/

Transcript of Physical preservation with EPrints: 2 File Formats and Risk Analysis, by David Tarrant, Adam Field,...

Page 1: Physical preservation with EPrints: 2 File Formats and Risk Analysis, by David Tarrant, Adam Field, Hannes Kulovits and Andreas Rauber


Page 2:

The Preservation Process

Preservation - Check

• Bit checking & checksum calculation

Preservation - Analyse

• What is the type of file? Is the file valid?
• Is the file at risk of not having an editor/reader?
• Is there a better format available? Lossless or lossy?

Preservation - Action

• File migration to avert risks found by analysis.
• Movement of file to new storage.
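The "Check" step can be illustrated with a minimal sketch of checksum calculation and bit checking. This is not EPrints code: the function names `file_checksum` and `is_intact`, and the choice of SHA-256, are assumptions made for illustration only.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_checksum: str, algorithm: str = "sha256") -> bool:
    """Bit check: compare the current checksum with the one recorded earlier."""
    return file_checksum(path, algorithm) == recorded_checksum
```

A repository would typically record the checksum when a file is deposited and re-run `is_intact` on a schedule; any mismatch signals that the stored bits have changed.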

Page 3:

Analysis

Preservation - Analyse

• What is the type of file? Is the file valid?
  • DROID is a good classification tool for this.
• Is the file at risk of not having an editor/reader?
  • Functionality is being developed in the PRONOM technical registry.
• Is there a better format available? Lossless or lossy?
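To make the idea of file classification concrete, here is a minimal sketch of signature-based identification: matching the leading bytes of a file against known "magic numbers". It is a simplified stand-in, not DROID's actual algorithm or signature file, and the labels are plain descriptions rather than PRONOM identifiers. The name `classify` and the `SIGNATURES` table are assumptions for illustration.

```python
from typing import Optional

# Well-known leading byte signatures. The list is deliberately tiny and the
# labels are illustrative descriptions, not PRONOM identifiers.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy PPT)"),
    (b"PK\x03\x04", "ZIP container (e.g. PPTX)"),
]


def classify(data: bytes) -> Optional[str]:
    """Return a label for the first matching signature, or None if unknown."""
    for signature, label in SIGNATURES:
        if data.startswith(signature):
            return label
    return None
```

Passing the first few bytes of a file is enough, for example `classify(open(path, "rb").read(16))`. A real classifier also checks signatures at other offsets and distinguishes format versions, which this sketch does not attempt.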

Page 4:

File Format Analysis

Preservation - Analyse

EPrints File Classification

Page 5:

Risk Analysis

Preservation - Analyse

• Is the file at risk of not having an editor/reader?
  • Functionality is being developed in the PRONOM technical registry.
• Simple SOAP web service
  • Takes file format identification IDs, hands back a risk score.
  • A breakdown of the risk score may also be available in future releases.
• A stub that provides this functionality with mock-up risk scores, ahead of the official release, can be downloaded and run from http://preserv2.googlecode.com
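The request/response shape of such a service can be sketched as follows. The slides do not give the service's WSDL, so the operation name `getRiskScore`, the element names `formatId` and `riskScore`, and the namespace `http://example.org/risk` are all hypothetical; only the SOAP 1.1 envelope namespace is standard. The sketch builds and parses messages without making any network call.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
RISK_NS = "http://example.org/risk"  # hypothetical namespace, not the real service


def build_risk_request(format_id: str) -> bytes:
    """Wrap a format identification ID in a SOAP envelope (hypothetical operation)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{RISK_NS}}}getRiskScore")
    ET.SubElement(request, f"{{{RISK_NS}}}formatId").text = format_id
    return ET.tostring(envelope, encoding="utf-8")


def parse_risk_response(xml_bytes: bytes) -> float:
    """Extract the risk score from a response shaped like the hypothetical schema."""
    root = ET.fromstring(xml_bytes)
    score = root.find(
        f"{{{SOAP_NS}}}Body/{{{RISK_NS}}}getRiskScoreResponse/{{{RISK_NS}}}riskScore"
    )
    if score is None or score.text is None:
        raise ValueError("response contains no riskScore element")
    return float(score.text)
```

A client would POST the bytes from `build_risk_request("fmt/18")` (a PRONOM-style identifier) to the service endpoint and pass the reply to `parse_risk_response`. Against the downloadable stub, the returned scores would be mock-up values.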

Page 6:

Risk Analysis In EPrints

Preservation - Analyse

EPrints File Classification + Risk Analysis

Page 7:

Risk Analysis In EPrints: Detail View

Preservation - Analyse

EPrints File Classification + Risk Analysis

Page 8:

Risk Analysis In EPrints: Migration?

Preservation - Action

Mock-up Transformation Interface

Migration Tools

Columns: Transformation?, Tool, Preservation Level

• PPT -> PPTX
• PPT -> PDF
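How risk scores from the Analyse step might feed the Action step can be sketched as below. This is not the EPrints interface: the `Migration` class, the `candidates_for` function, the threshold, and any risk values passed in are hypothetical. Only the two transformations, PPT -> PPTX and PPT -> PDF, come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Migration:
    source: str
    target: str


# The two transformations named on the slide.
MIGRATIONS = [Migration("PPT", "PPTX"), Migration("PPT", "PDF")]


def candidates_for(format_risk: Dict[str, float], threshold: float) -> Dict[str, List[str]]:
    """Map each format whose risk score exceeds the threshold to its migration targets."""
    result: Dict[str, List[str]] = {}
    for fmt, risk in format_risk.items():
        if risk > threshold:
            result[fmt] = [m.target for m in MIGRATIONS if m.source == fmt]
    return result
```

For instance, `candidates_for({"PPT": 0.8, "PDF": 0.1}, threshold=0.5)` lists PPTX and PDF as options for PPT and nothing for PDF. Choosing between a target that preserves editability and one that preserves appearance is the kind of decision the "Preservation Level" column points at.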

Page 9:

Recap

Preservation - Check

• Handled by our storage manager and reported back via the preservation interface.

Preservation - Analyse

• Parallels can be drawn with storage, in that we are integrating with and utilising currently available services to perform our analysis.
• Processing of the results leads to a powerful interface which tells us many things about the repository ecosystem and its future.

Preservation - Action

• The future plan is to utilise further web-based services to ensure the information set remains comprehensive and up to date: 0-day digital preservation.