Chapter 12 - Basic Preservation Strategies
7/31/2019 Chapter 12 - Basic Preservation Strategies
Chapter 12
Basic Preservation Strategies
Strategy without tactics is the slowest route to victory. Tactics without strategy is the
noise before defeat.
(Sun Tzu)
There are a number of basic preservation strategies upon which more complex strategies can be built. These are the strategies described, explicitly or implicitly, by OAIS, and they are based on ensuring that the digital object remains usable and understandable to the Designated Community. Of course one also has to maintain the trail of information that supports evidence of Authenticity, along with the other PDI.
Many publications on digital preservation say that the available strategies may be summed up in the phrase "emulate or migrate". We show here that this is inadequate.
OAIS discusses some important aspects of information preservation as follows.
The fast-changing nature of the computer industry and the ephemeral nature of electronic data storage media are at odds with the key purpose of an OAIS: to preserve information over a long period of time. No matter how well an OAIS maintains its current holdings, it will eventually need to migrate much of its holdings to different media (which may or may not involve changing the bit sequences) and/or to a different hardware or software environment to keep them accessible. Today's digital data storage media can typically be kept at most a few decades before the
probability of irreversible loss of data becomes too high to ignore. Further, the rapid
pace of technology evolution makes many systems much less cost-effective after
only a few years. In addition to the technology changes there will be changes to the
Knowledge Base of the Designated Community which will affect the Representation
Information needed.
There are a number of fundamental approaches to information preservation. In the first, the Content Data Object remains in its original form, and access and use are achieved by providing adequate descriptions of the digital encoding with Structure and Semantic Representation Information. In some cases the original access and use mechanisms are adequate, in which case software emulation (using Other Representation Information) may be useful, although this tends to limit the ways in which the Content Data Object may be used. One advantage of leaving the bit sequences unchanged is that evidence of Authenticity is more easily sustained.
D. Giaretta, Advanced Digital Preservation, DOI 10.1007/978-3-642-16809-3_12, © Springer-Verlag Berlin Heidelberg 2011
Alternatively the object may be changed into one that can be processed with
contemporary access and use mechanisms. This is referred to in OAIS as a
Transformation, a type of Migration, which is discussed below. There are implications for Authenticity which are discussed in Chap. 13, particularly Sect. 13.6.2.
The following matrix shows the various combinations of these alternatives.
Access service unchanged, Content Data Object unchanged: if using the original software executable, emulation; if using the original source code, rebuild the executable.
Access service unchanged, Content Data Object changed: re-implement the access service.
Access service changed, Content Data Object unchanged: implement new access services based on the Representation Information describing the original Content Data Object.
Access service changed, Content Data Object changed: implement new access services based on the Representation Information describing the new Content Data Object.
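Read as a decision table, the matrix has four cells, one of which offers two options depending on whether the original executable or the original source code is used. The Python sketch below simply encodes that reading; the enum labels and the function name are illustrative and do not come from OAIS.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical labels for the options in the four cells of the matrix.
    EMULATION = "emulate the original software executable"
    REBUILD = "rebuild the executable from the original source code"
    REIMPLEMENT_ACCESS = "re-implement the access service"
    NEW_ACCESS_ORIGINAL_RI = "new access services from RepInfo of the original object"
    NEW_ACCESS_NEW_RI = "new access services from RepInfo of the new object"


def choose_strategy(object_changed: bool, access_changed: bool,
                    use_source_code: bool = False) -> Strategy:
    """Look up one cell of the matrix for a given combination of changes."""
    if not access_changed:
        if not object_changed:
            # Same object, same access service: keep the original software working.
            return Strategy.REBUILD if use_source_code else Strategy.EMULATION
        # Changed object, but users keep the same access service.
        return Strategy.REIMPLEMENT_ACCESS
    if not object_changed:
        return Strategy.NEW_ACCESS_ORIGINAL_RI
    return Strategy.NEW_ACCESS_NEW_RI
```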
12.1 Description Adding Representation Information
As should be clear from the discussion in earlier chapters, it is necessary to maintain the Representation Network so that it remains adequate for a member of the Designated Community to continue to understand and use the digital object. However, things change over time, and so the Representation Network must be altered appropriately. To do this, the techniques discussed extensively in Chap. 8 for identifying potential gaps in the Representation Network can be used. Practical ways of doing this are described in detail in Chap. 16 and illustrated in Part II.
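As a rough illustration of gap identification, the Representation Network can be modelled as a graph in which each item points to the Representation Information that describes it, with the recursion stopping at items already in the Designated Community's Knowledge Base. The sketch below assumes that simple dict-of-lists model and treats any reachable item that is neither in the Knowledge Base nor further described as a gap; the actual techniques are those of Chap. 8, and all names used here are hypothetical.

```python
from collections import deque


def find_gaps(network: dict[str, list[str]], root: str,
              knowledge_base: set[str]) -> set[str]:
    """Return items reachable from `root` that the Designated Community cannot
    understand: not in its Knowledge Base and not described by further RepInfo."""
    gaps: set[str] = set()
    seen = {root}
    queue = deque([root])
    while queue:
        item = queue.popleft()
        if item in knowledge_base:
            continue  # already understood, so the recursion stops here
        children = network.get(item, [])
        if not children:
            gaps.add(item)  # not understood and nothing further describes it
        for child in children:
            if child not in seen:  # guard against cycles in the network
                seen.add(child)
                queue.append(child)
    return gaps


# Hypothetical example network and Knowledge Base.
network = {
    "survey.fits": ["FITS standard", "column semantics"],
    "FITS standard": ["English", "IEEE 754"],
    "column semantics": [],
}
print(sorted(find_gaps(network, "survey.fits", {"English", "IEEE 754"})))
# prints ['column semantics']
```

Dropping "IEEE 754" from the assumed Knowledge Base makes it appear as a gap as well, which mirrors the point above that changes in the Designated Community, and not only in the data, can open gaps in the Representation Network.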
This approach allows the greatest flexibility because it makes it possible to discover entirely new ways of looking at the digital objects. However, whilst it can be the most rewarding approach, it can also be the most difficult.
12.2 Maintaining Access
An alternative to using description is to maintain the current ways of accessing the digital object, and OAIS discusses several ways of doing this. One can think of this in terms of interfaces, either programmatic interfaces or user interfaces. Hardware emulation can be viewed as doing essentially the same thing, but it deserves the more extensive discussion given in Sect. 7.9; another type of emulation is described below.
12.2.1 Access and Use Services
OAIS discusses maintaining the Dissemination API in order to continue to support the applications which the Designated Community uses to access and use the digital object. This is closely related to the ideas of virtualisation discussed in Sect. 7.8. The virtualisation approach has the advantage that it allows the Designated Community to use their favourite applications to access and use the digital object. This can be consistent with maintaining the Dissemination API by means of appropriate software wrappers. A number of options are discussed in some detail in Chap. 9.
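One way to picture a software wrapper that maintains a Dissemination API is the adapter pattern: applications are written against a fixed interface, and a wrapper maps each new encoding onto it. In this Python sketch the interface (`TableAccess`), the pipe-delimited "original" encoding and all class and function names are hypothetical; Chap. 9 discusses the real options.

```python
import csv
import io
from typing import Protocol


class TableAccess(Protocol):
    """Hypothetical Dissemination API that community applications are written against."""

    def column_names(self) -> list[str]: ...

    def rows(self) -> list[list[str]]: ...


class PipeTable:
    """Access to an assumed original encoding: pipe-delimited text, header first."""

    def __init__(self, text: str) -> None:
        lines = text.strip().splitlines()
        self._header = lines[0].split("|")
        self._rows = [line.split("|") for line in lines[1:]]

    def column_names(self) -> list[str]:
        return self._header

    def rows(self) -> list[list[str]]:
        return self._rows


class CsvTableWrapper:
    """Wrapper presenting a transformed CSV encoding through the same API."""

    def __init__(self, text: str) -> None:
        records = list(csv.reader(io.StringIO(text)))
        self._header, self._rows = records[0], records[1:]

    def column_names(self) -> list[str]:
        return self._header

    def rows(self) -> list[list[str]]:
        return self._rows


def column_mean(table: TableAccess, column: str) -> float:
    """An 'application' that only knows TableAccess, never the underlying encoding."""
    index = table.column_names().index(column)
    values = [float(row[index]) for row in table.rows()]
    return sum(values) / len(values)
```

Given the same two-row table held in either encoding, `column_mean` returns the same value: the application is unchanged even though the underlying bits are not.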
12.2.2 Access Software Look and Feel
This option is based on the assumption that the Designated Community wishes to maintain the original look and feel of the Content Information of a set of AIUs as presented by a specified application or set of applications. Hardware emulation, which provides the ultimate maintenance of look and feel, is discussed in Sect. 7.9. Conceptually, the OAIS provides (i.e. makes available or points to) a software environment that allows the Consumer to view the AIUs' Content Information through the application's transformation and presentation capabilities. For example,
there may be a desire to use a particular application that extracts data from an ISO 9660 CD-ROM and presents it as a multi-spectral image. This application runs under a particular operating system, requires a set of control information and the use of a CD-ROM reading device, and presents the information to driver software for a particular display device. In some cases this application may be so pervasive that all members of the Designated Community have access to the environment, and the OAIS merely designates the Content Data Object to be the bit string used by the application. Alternatively, when the environment is less readily available, an OAIS may supply such an environment (as Representation Information), including the Access Software application. However, as the OAIS and/or the Designated Community moves to new computing environments, at some point the application will cease to function or will function incorrectly. At that point Transformation will become an attractive option.
12.2.2.1 Emulation of Look and Feel the Hard Way
It is worth discussing in a little more detail another way of maintaining look and feel when, for example, neither the compiled version of the application (or of the libraries it depends upon) nor its source code is available. The term emulation may be applied to this technique, since emulation may be defined as the ability of a computer program or electronic device to imitate another program or device [79].
The OAIS may, despite the drawbacks, consider emulation for the access application in the following way. If the application provides a well-known set of operations and a well-defined API for access, the API could be adequately documented and
tested to attempt an emulation of that application. However, if the consumer interface is primarily one of display or other devices which affect human senses (e.g., sound), this reverse engineering becomes nearly impossible, because it may not be obvious when the application runs but does not function correctly for all possible inputs. To guarantee the discovery of all such situations, it would be necessary to record the Access Software's correctly functioning output and preserve this alongside the emulation. The behaviour would then need to be checked against the results obtained from the emulation. This may be quite difficult if the application has many different modes of operation. Further, if the application's output is primarily sent to a display device, recording this stream does not guarantee that the display looks the same in the new environment, and therefore the combination of application and environment may no longer be giving completely correct information to the Consumer.
Maintaining a consistent look and feel may require, as a starting point, capturing that look and feel with a separate recording to use as validation information. In general, it may be difficult, if not impossible, to describe the look and feel formally. However, a number of Transformational Information Properties may essentially define criteria against which preservation may be tested; validation against these Information Properties would be a necessary, although not always sufficient, condition for testing the adequacy of the preservation activity.
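The recording-based validation described above can be sketched as follows, assuming the application (and its emulation) can be treated as a function from input bytes to an output stream, and that output digests are an acceptable stand-in for the full recording. Consistent with the text, agreement is necessary but not sufficient: it covers only the inputs that were recorded, and identical output streams do not guarantee that a display looks the same. The function names are illustrative.

```python
import hashlib
from typing import Callable, Iterable

# An application or its emulation, modelled as a function from input bytes
# to the output stream it produces (a simplifying assumption).
Renderer = Callable[[bytes], bytes]


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_reference(original: Renderer, inputs: Iterable[bytes]) -> dict[str, str]:
    """Capture digests of the original Access Software's output, keyed by input digest."""
    return {sha256(x): sha256(original(x)) for x in inputs}


def validate(emulation: Renderer, inputs: Iterable[bytes],
             reference: dict[str, str]) -> list[bytes]:
    """Return the inputs for which the emulation's output differs from the recording."""
    failures = []
    for x in inputs:
        if sha256(emulation(x)) != reference[sha256(x)]:
            failures.append(x)
    return failures
```

An empty list from `validate` means only that no difference was found for the recorded inputs; it says nothing about inputs that were never recorded.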
12.3 Migration/Transformation
At some point it may be decided that maintaining the original medium or the Representation Network for a digital object is impractical for cost reasons, or fails to meet requirements for some other reason. The digitally encoded information must then be encoded in some other way: either the same bit sequences on new media, or changed bit sequences.
It is possible to identify four primary digital Migration types. The primary types,
ordered by increasing risk of information loss, are:
1. Operations which do not change the bit sequences
Refreshment: A Digital Migration where a media instance, holding one or more AIPs or parts of AIPs, is replaced by a media instance of the same type by copying the bits on the medium used to hold AIPs and to manage and access the medium. As a result, the existing Archival Storage mapping infrastructure, without alteration, is able to continue to locate and access the AIP. As discussed at the start of the book, many processes are involved in translating magnetic domains (on a magnetic disk) into bits, so this bit copy is not necessarily a physical copy.
Replication: A Digital Migration where there is no change to the Packaging
Information, the Content Information and the PDI. The bits used to convey
these information objects are preserved in the transfer to the same or new
media-type instance. Refreshment is also a Replication, but Replication may
require changes to the Archival Storage mapping infrastructure.
2. Operations which change the bit sequences
Repackaging: A Digital Migration where there is some change in the bits of the Packaging Information.
Transformation: A Digital Migration where there is some change in the
Content Information or PDI bits while attempting to preserve the full
information content. This deserves some extended discussion, which follows.
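The four Migration types can be told apart by asking which bits changed and whether the storage mapping changed. The Python sketch below assumes a much-simplified AIP record (`StoredAIP`, hypothetical) and uses SHA-256 digests, in the spirit of Fixity Information, to compare bit sequences. The checks run from the highest risk of information loss to the lowest, and Refreshment is reported in preference to the more general Replication when both apply.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredAIP:
    """Hypothetical, much-simplified view of an AIP as held in Archival Storage."""
    packaging: bytes
    content: bytes
    pdi: bytes
    media_type: str  # e.g. a tape or disk generation
    location: str    # key used by the Archival Storage mapping infrastructure


def digest(data: bytes) -> str:
    # SHA-256 stands in for the Fixity Information an archive would record.
    return hashlib.sha256(data).hexdigest()


def classify_migration(before: StoredAIP, after: StoredAIP) -> str:
    """Name the primary Digital Migration type that turned `before` into `after`."""
    if digest(before.content) != digest(after.content) or digest(before.pdi) != digest(after.pdi):
        return "Transformation"  # Content Information or PDI bits changed
    if digest(before.packaging) != digest(after.packaging):
        return "Repackaging"     # only Packaging Information bits changed
    if before.media_type == after.media_type and before.location == after.location:
        return "Refreshment"     # same media type, mapping unchanged
    return "Replication"         # bits intact, but new media type or new mapping
```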
12.3.1 Transformation
Transformation implies a change in the bit sequence of either the Content Information or the PDI.
In many discussions of digital preservation the term Migration is used
when in fact what is meant is specifically Transformation because
the aim in those discussions is to change the digital encoding of the
information.
Given a certain piece of information there could be many different ways of
encoding it digitally. For example an image could be encoded as a TIFF file or a
JPEG; a document could be held as Word or PDF; a table containing scientific data
could be held as a FITS table or as a CSV (comma-separated values) file. Each of these alternatives would need its own, different, Representation Network.
However, some Transformations make more sense than others. A Transformation is commonly regarded as a change from one data format to another, but one must also think about the associated semantics: some formats have little or no room for them. Another consideration is the number and types of applications commonly associated with the various formats.
For example, an image could be regarded as a table in which each cell contains a number. However, it would not make good sense to encode the image as a CSV file because of the loss of semantics involved. Moreover, the applications normally used to deal with a CSV file (e.g. spreadsheet programmes) do not display the data as one would expect an image to be displayed.
With regard to the semantics, one can supplement the capabilities of a particular format with something else; e.g. the CSV file could have an associated text file to supply the semantic information which would otherwise be missing, such as the meanings of the columns. In this case one would need Representation Information for (1) the CSV file, (2) the text file and (3) the relationship between them. While this is possible, the more attractive option would be to choose a new format which can itself handle the required semantics, and for which applications are available that supply the required functionality at least as well as those for the original format. Therefore, given a piece of digitally encoded information that one needs to preserve, the Transformation which one should reasonably apply is not arbitrary.
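The CSV-plus-text-file option can be sketched as below. The sidecar is assumed here to be JSON (still a text file), and its `describes` field is one hypothetical way of recording the relationship between the two files; the function name is illustrative. Note that all three pieces, the CSV, the sidecar and the link between them, still need Representation Information of their own.

```python
import csv
import io
import json


def to_csv_with_sidecar(columns: list[dict[str, str]], rows: list[list[object]],
                        csv_name: str = "table.csv") -> tuple[str, str]:
    """Encode a table as CSV plus a JSON sidecar carrying the column semantics.

    The sidecar's "describes" field records which CSV file it belongs to,
    i.e. the relationship between the two files.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col["name"] for col in columns])  # CSV keeps only the names
    writer.writerows(rows)
    sidecar = json.dumps({"describes": csv_name, "columns": columns}, indent=2)
    return buf.getvalue(), sidecar
```

The units and meanings survive only in the sidecar; if the two files are separated, or the link is not understood, the CSV alone carries nothing beyond the column names and numbers.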
There are deep reasons for making a careful choice and documenting that choice
appropriately. This is discussed in detail in Sect. 13.6.
However, there are a number of useful points which should be made here. For example, one can think of the ideal Transformation in which the new digital object has the same information as the original. If this is the case, then it should be possible to confirm it by means of another Transformation back to the original bit sequence. If one can find this pair of Transformations, then one can define (following the revised version of OAIS):
Reversible Transformation: A Transformation in which the new representation defines a set (or a subset) of resulting entities that are equivalent to
the resulting entities defined by the original representation. This means that
there is a one-to-one mapping back to the original representation and its set
of base entities.
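A round-trip check makes the idea of a Reversible Transformation concrete. In this sketch the original bit sequence is assumed to be a packed array of little-endian signed 16-bit integers, and the Transformation writes them as decimal text, one per line. Because every 16-bit value has exactly one decimal form here, the mapping is one-to-one, so the round trip succeeds for every input and not only for the samples tested. The format and function names are illustrative.

```python
import struct


def int16_to_text(raw: bytes) -> str:
    """Transformation: little-endian signed 16-bit integers -> one decimal per line."""
    if len(raw) % 2:
        raise ValueError("expected a whole number of 16-bit values")
    values = struct.unpack(f"<{len(raw) // 2}h", raw)
    return "".join(f"{v}\n" for v in values)


def text_to_int16(text: str) -> bytes:
    """Inverse Transformation: decimal lines -> the original packed integers."""
    values = [int(line) for line in text.splitlines() if line]
    return struct.pack(f"<{len(values)}h", *values)


def round_trips(raw: bytes) -> bool:
    """True if transforming and transforming back reproduces the exact bit sequence."""
    return text_to_int16(int16_to_text(raw)) == raw
```

For example, the four bytes `01 00 FE FF` become the two lines `1` and `-2`, and `round_trips` returns `True` for them.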
On the other hand, if one looks at the other Transformations mentioned above, for example from FITS to CSV, then without additional information (e.g. the supplementary text file mentioned above) one would lose information and therefore not be able to make the reverse Transformation.
It is therefore reasonable to define:
Non-Reversible Transformation: A Transformation which cannot be guaranteed to be a Reversible Transformation.
An important point to note is that the definition of non-reversible is drawn as broadly as possible. For example, one does not need to prove that there is no backward Transformation, only that one cannot guarantee that such a Transformation can be constructed.
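The asymmetry in this definition can be seen with a deliberately lossy Transformation. The sketch assumes IEEE 754 doubles written to text with six significant digits, a simpler source of loss than the missing semantics in the FITS-to-CSV example; the function names are illustrative. A single failed round trip shows that the Transformation is not reversible, but a sample that happens to survive proves nothing: without a general guarantee, the Transformation is Non-Reversible under the definition above.

```python
import struct
from typing import Optional


def doubles_to_text(raw: bytes, digits: int = 6) -> str:
    """Transformation: IEEE 754 doubles written with a fixed number of significant digits."""
    if len(raw) % 8:
        raise ValueError("expected a whole number of 64-bit values")
    values = struct.unpack(f"<{len(raw) // 8}d", raw)
    return "".join(f"{v:.{digits}g}\n" for v in values)


def text_to_doubles(text: str) -> bytes:
    """Attempted inverse: parse the text back into packed doubles."""
    values = [float(line) for line in text.splitlines() if line]
    return struct.pack(f"<{len(values)}d", *values)


def find_counterexample(samples: list[bytes]) -> Optional[bytes]:
    """Return a sample whose round trip changes the bits, or None if none does."""
    for raw in samples:
        if text_to_doubles(doubles_to_text(raw)) != raw:
            return raw
    return None
```

With `half = struct.pack("<d", 0.5)` and `third = struct.pack("<d", 1 / 3)`, `find_counterexample([half])` returns `None`, while `find_counterexample([half, third])` returns `third`, because 1/3 is written as `0.333333` and cannot be recovered exactly.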
We will come back to these definitions in Chap. 13 where they play an important
role in considerations of Authenticity.
12.4 Summary
This chapter has raced through a number of the basic preservation strategies and techniques; it should be clear that each technique has its own strengths and weaknesses, and one must be careful to recognise these. The reader should not be misled by the amount of material on emulation here; this chapter was simply a convenient place for it. Other preservation techniques are discussed in much more detail throughout this book, with other chapters devoted to descriptive Representation Information and to Transformations.
In Part II we provide examples of many of these techniques with evidence to
support their efficacy when applied appropriately.