PERICLES - Choice of Information Encapsulation (IE) Technique

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation] “This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no601138”. Choice of IE Technique Anna-Grit Eggers (University of Goettingen)

Transcript of PERICLES - Choice of Information Encapsulation (IE) Technique

Page 1: PERICLES - Choice of Information Encapsulation (IE) Technique

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 601138”.

Choice of IE Technique

Anna-Grit Eggers (University of Goettingen)

Page 2: PERICLES - Choice of Information Encapsulation (IE) Technique

Encapsulation Techniques

Features

Page 3: PERICLES - Choice of Information Encapsulation (IE) Technique

Range of uses

• IE techniques cover a wide range of uses, which differ regarding the:
◦ processing velocity
◦ required disk space
◦ location of storage
◦ accessibility and perceptibility of the payload (by humans / by machine analysis)
◦ preservation level of the carrier / digital object
◦ processability of digital object and payload file formats and file sizes
◦ provided compression mechanisms.

Page 4: PERICLES - Choice of Information Encapsulation (IE) Technique

Criteria

• We identified criteria to distinguish between the techniques.
• An encapsulation technique that fits a specific use scenario can be chosen based on the technique-specific characteristics of these criteria.
• Definition of criterion in this context: a property or feature of information encapsulation techniques that can be used to compare different techniques on the basis of the criterion characteristics.
• For example:
◦ Robustness: how well the information encapsulated by an algorithm survives processing of the carrier.
◦ Perceptibility: whether the encapsulated information can be perceived by observing the carrier.

Page 5: PERICLES - Choice of Information Encapsulation (IE) Technique

Criterion characteristics

• Examples of characteristics for these two criteria:
• The characteristic of a technique for the criterion “Robustness” can be:
◦ “robust” (“true”)
◦ “not robust” (“false”)
• The characteristic of a technique for the criterion “Perceptibility” can be:
◦ “visible”
◦ “not perceptible by humans”
◦ “detectable by computers”
◦ “not perceptible at all”.
• The assignment for this criterion is harder, because the threshold for “human perceptibility” or “computer detectability” is blurred and not a “true/false” value.
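
The two example criteria can be modelled as a boolean and a graded value. The following is only a minimal sketch: the class names, the `matching` helper and the three catalogue entries are assumptions made for illustration, with characteristics chosen to follow the general statements in this presentation rather than any measured data.

```python
from dataclasses import dataclass
from enum import Enum


class Perceptibility(Enum):
    """The four example characteristics named on the slide."""
    VISIBLE = "visible"
    NOT_PERCEPTIBLE_BY_HUMANS = "not perceptible by humans"
    DETECTABLE_BY_COMPUTERS = "detectable by computers"
    NOT_PERCEPTIBLE_AT_ALL = "not perceptible at all"


@dataclass(frozen=True)
class TechniqueProfile:
    name: str
    robust: bool                    # "Robustness" is a true/false characteristic
    perceptibility: Perceptibility  # "Perceptibility" is graded, not boolean


# Illustrative entries only (assumed values, not taken from an evaluation).
PROFILES = [
    TechniqueProfile("visible watermark", True, Perceptibility.VISIBLE),
    TechniqueProfile("steganography", True, Perceptibility.DETECTABLE_BY_COMPUTERS),
    TechniqueProfile("fragile watermark", False, Perceptibility.DETECTABLE_BY_COMPUTERS),
]


def matching(profiles, *, robust, allowed):
    """Return the names of techniques with the wanted characteristics."""
    return [p.name for p in profiles
            if p.robust == robust and p.perceptibility in allowed]


hidden = {Perceptibility.NOT_PERCEPTIBLE_BY_HUMANS,
          Perceptibility.DETECTABLE_BY_COMPUTERS,
          Perceptibility.NOT_PERCEPTIBLE_AT_ALL}
robust_and_hidden = matching(PROFILES, robust=True, allowed=hidden)
```

Because the perceptibility threshold is blurred, a real catalogue would probably need a finer scale than this enumeration.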

Page 6: PERICLES - Choice of Information Encapsulation (IE) Technique
Page 7: PERICLES - Choice of Information Encapsulation (IE) Technique

Encapsulation Techniques

User scenario

Page 8: PERICLES - Choice of Information Encapsulation (IE) Technique

Creating a scenario

• IE techniques are used for a specific purpose in the context of a usage scenario.
• This usage scenario defines the features and characteristics that a potential IE technique should fulfil.
• The overall task is to find the best IE technique by capturing and evaluating the scenario defined by the user.
• The user could aim to encapsulate messages or metadata with digital files.
• The aim could also be to add legal information, ownership information, corporate designs, or information to ensure the authenticity of a digital object.

Page 9: PERICLES - Choice of Information Encapsulation (IE) Technique

Creating a scenario (cont.)

• Some scenarios need a visible payload, others prefer to hide it.
• The most valuable information might be the digital object or the payload itself (the latter is often the case with steganographic messages).

Page 10: PERICLES - Choice of Information Encapsulation (IE) Technique

Implementing a scenario

• Three procedures are required to implement the scenario:
◦ scenario capturing
◦ weighting
◦ decision calculation mechanism
• Capturing: a questionnaire that asks for the importance of a set of scenario criteria for the given scenario. The criteria are chosen so that they can be mapped to the features of the IE techniques.
• The number of investigated criteria depends on the number of available IE techniques: it should be high enough to distinguish between all main techniques, but low enough that the user is not overwhelmed while filling in the questionnaire.
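
The three procedures could be chained as in the following sketch. It is not the project's implementation: the function names, the 0 to 5 importance scale and the characteristic scores in `TECHNIQUE_SCORES` are assumptions chosen only to make the example runnable.

```python
# Hypothetical characteristic scores (0 = poor, 1 = good) per technique.
TECHNIQUE_SCORES = {
    "packaging": {"processability": 0.4, "disk_space": 0.8, "restorability": 0.9},
    "embedding": {"processability": 0.9, "disk_space": 0.6, "restorability": 0.5},
}


def capture_scenario(answers):
    """Scenario capturing: validate questionnaire answers (importance 0..5)."""
    for criterion, importance in answers.items():
        if not 0 <= importance <= 5:
            raise ValueError(f"importance of {criterion!r} must be within 0..5")
    return dict(answers)


def to_weights(scenario):
    """Weighting: normalise the importance ratings so that they sum to 1."""
    total = sum(scenario.values())
    if total == 0:
        raise ValueError("at least one criterion must be important")
    return {criterion: value / total for criterion, value in scenario.items()}


def decide(weights, technique_scores):
    """Decision calculation: weighted sum per technique, best technique first."""
    totals = {
        technique: sum(w * scores.get(criterion, 0.0)
                       for criterion, w in weights.items())
        for technique, scores in technique_scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


answers = {"processability": 5, "disk_space": 1, "restorability": 2}
ranking = decide(to_weights(capture_scenario(answers)), TECHNIQUE_SCORES)
```

With these assumed numbers, a scenario that rates processability highest ranks embedding first, while a scenario that rates restorability highest would rank packaging first.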

Page 11: PERICLES - Choice of Information Encapsulation (IE) Technique

Implementing a scenario

• Weighting: another crucial aspect is to ask the user
◦ which of the criteria are important for the scenario,
◦ how important they are,
◦ which should be excluded because they are not pertinent.
• The Analytic Hierarchy Process is a sophisticated but complex method: criteria are compared pairwise, and the user has to decide for each comparison which criterion is the more important one.
• A simpler approach is to include an option to exclude unimportant criteria and add a weighting mechanism for the user to indicate how desirable a characteristic of a criterion is, or how important it is for the scenario.
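
To illustrate why the Analytic Hierarchy Process is more demanding, the sketch below turns pairwise judgments into weights. It uses the row geometric mean, a common approximation of the principal eigenvector that the full method computes. The function name `ahp_weights`, the criteria names and the judgment values are assumptions for illustration.

```python
import math


def ahp_weights(criteria, judgments):
    """Derive weights from pairwise comparisons.

    judgments[(a, b)] states how many times more important criterion a is
    than criterion b (for example on a 1..9 scale). Pairs that are not
    listed are treated as equally important.
    """
    n = len(criteria)
    index = {criterion: i for i, criterion in enumerate(criteria)}
    matrix = [[1.0] * n for _ in range(n)]
    for (a, b), ratio in judgments.items():
        if ratio <= 0:
            raise ValueError("comparison ratios must be positive")
        i, j = index[a], index[b]
        matrix[i][j] = float(ratio)
        matrix[j][i] = 1.0 / ratio  # reciprocal judgment
    # The geometric mean of each row approximates the principal eigenvector.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return {criterion: geo_means[index[criterion]] / total
            for criterion in criteria}


weights = ahp_weights(
    ["robustness", "restorability", "disk_space"],
    {("robustness", "restorability"): 2,
     ("robustness", "disk_space"): 6,
     ("restorability", "disk_space"): 3},
)
```

For n criteria the user has to answer n(n-1)/2 comparisons, which is the burden that the simpler weighting approach avoids.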

Page 12: PERICLES - Choice of Information Encapsulation (IE) Technique

Decision criteria

• There are two types of criteria to consider:
• Technical criteria
◦ must be fulfilled to be able to use a specific algorithm.
◦ File formats are an example of a technical criterion, because some algorithms can only be used for specific file formats.
• Scenario criteria
◦ depend on a usage scenario or the user preferences.
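
Since technical criteria must be fulfilled, they can act as a hard filter before any scenario-based weighting takes place. The following sketch assumes a hypothetical catalogue: the technique names, supported formats and payload limits are invented for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Technique:
    name: str
    carrier_formats: Optional[FrozenSet[str]]  # None = any carrier format
    max_payload_bytes: Optional[int]           # None = no payload limit


# Hypothetical catalogue; formats and limits are assumed values.
CATALOGUE = [
    Technique("packaging", None, None),
    Technique("lsb_embedding", frozenset({"png", "bmp"}), 4096),
    Technique("metadata_field", frozenset({"jpeg", "tiff"}), 1024),
]


def usable_techniques(catalogue, carrier_format, payload_bytes):
    """Keep only the techniques whose technical criteria are fulfilled."""
    result = []
    for technique in catalogue:
        format_ok = (technique.carrier_formats is None
                     or carrier_format in technique.carrier_formats)
        size_ok = (technique.max_payload_bytes is None
                   or payload_bytes <= technique.max_payload_bytes)
        if format_ok and size_ok:
            result.append(technique.name)
    return result


small_payload = usable_techniques(CATALOGUE, "png", 1000)
large_payload = usable_techniques(CATALOGUE, "png", 10_000)
```

Only the techniques that pass this filter would then be ranked by the scenario criteria.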

Page 13: PERICLES - Choice of Information Encapsulation (IE) Technique

Technical decision criteria

Technical criteria:
◦ File formats (for carrier files as well as the payload files)
◦ Number of files
◦ Capacity

Page 14: PERICLES - Choice of Information Encapsulation (IE) Technique

Technical decision criteria

• File formats
◦ Embedding algorithms usually support a set of file formats and cannot be used on files in other formats.
◦ For packaging, the metadata has to be mapped to one of the standard XML packaging formats. The created package reduces the risk of data loss.
• Number of files
◦ A growing number of files increases the risk of losing one of them.
◦ With more than one file, the files can help to identify the files that belong together by providing indications of the formats used. This reduces the impact of file format obsolescence.

Page 15: PERICLES - Choice of Information Encapsulation (IE) Technique

Technical criteria (cont.)

• Capacity
◦ Capacity is the message size constraint: the number of payload bits that can be embedded by an embedding algorithm into a specific digital object.
◦ It is influenced by the data format and the method used. With some methods, a message that is too big increases the risk of damaging the carrier file, or the payload becomes visible.
◦ While packaging methods have no limit on the size of the payload files, embedding methods mostly have a maximum payload size.
◦ With invisible watermarking and steganographic embedding methods, a payload that is too large not only becomes visible; the cost of the algorithmic calculations also increases strongly with the size of the payload.
◦ The use of an information frame can, in theory, scale well to large payload files. It might be unproductive, though, if the frame outsizes the original digital object. In such a case the use of a packaging method can be considered.
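
As a worked example of the capacity constraint, the sketch below estimates the payload size for one simple embedding method, replacing low-order bits of uncompressed image samples. The function names and the choice of this method are assumptions; other embedding methods have different capacity formulas.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_sample=1):
    """Payload bytes that fit if `bits_per_sample` low-order bits of every
    sample (one value per pixel and channel) carry payload."""
    if min(width, height, channels, bits_per_sample) < 1:
        raise ValueError("all arguments must be positive")
    return (width * height * channels * bits_per_sample) // 8


def payload_fits(payload_size, width, height, **options):
    """True if a payload of `payload_size` bytes respects the capacity."""
    return payload_size <= lsb_capacity_bytes(width, height, **options)


# 640 * 480 pixels * 3 channels * 1 bit = 921600 bits = 115200 bytes
capacity = lsb_capacity_bytes(640, 480)
```

Raising `bits_per_sample` increases the capacity, but it also changes the carrier more strongly, which is the visibility risk described above.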

Page 16: PERICLES - Choice of Information Encapsulation (IE) Technique

Scenario criteria

• Processability and robustness
• Complexity (space/time) of the algorithm
• Used disk space of the output
• Restorability of the carrier
• Risk of data loss
• Perceptibility
• Location of the encapsulated information
• Spreading, standardization
• Security, confidentiality
• Authenticity

Page 17: PERICLES - Choice of Information Encapsulation (IE) Technique

Processability

• To be re-usable, digital objects need to be processable normally by applications, unhindered by their encapsulated information.
• Packaging techniques might require unpacking before processing the digital object, thus consuming additional calculation power.
• Embedding techniques do not change the file format of a digital object, which therefore can usually be processed directly.
• A method that allows encapsulated information to survive processing steps is considered “robust”.
• In an ideal case the embedded metadata survives even file format conversions.

Page 18: PERICLES - Choice of Information Encapsulation (IE) Technique

Robustness

• Robustness can be strong or weak. Weakness implies an additional extraction step to keep the metadata safe.
• In a scenario where the digital object is frequently viewed and processed, the metadata has to be embedded with a robust method.
• Steganographic methods are often very robust:
◦ they take an attacker into account.
◦ the digital objects can be processed normally, because a usage restriction would betray the hidden messages.
• Visible digital watermarks can be very robust.
• Imperceptible watermarks are often fragile or semi-fragile, to allow the recognition of authenticity violations.
• The use of available metadata fields is a very robust method that survives even object conversions. This can be valid for information frames, too.

Page 19: PERICLES - Choice of Information Encapsulation (IE) Technique

Complexity of algorithms

Parameters for calculating the costs of algorithmic calculations and the resources needed for the encapsulation process:

• The time and space requirements related to the complexity of the algorithm used for the encapsulation and for the recovery of the original digital object and the environment information.
• This includes costs for validation calculations and for the unpacking algorithms of packaging strategies.
• Big O notation can help to express the algorithm's behaviour in relation to the embedding payload size: http://web.mit.edu/16.070/www/lecture/big_o.pdf
• Frequent use of a digital object and its metadata requires a faster restoration time.

Page 20: PERICLES - Choice of Information Encapsulation (IE) Technique

Complexity of algorithms (cont.)

• The time for decompression has to be added if the data was compressed.
• Packaging mostly needs a lot more time for this than embedding.
• With robust methods, an extraction is not necessary before the reuse of the object.
• Editing of available metadata fields is integrated in many programs, and is therefore not very time-intensive.
• The extension of an information frame in itself is not necessarily time-intensive, whereas the embedding method used on the frame might be.

Page 21: PERICLES - Choice of Information Encapsulation (IE) Technique

Disk space requirements

• The difference in disk space between the enriched digital object and the original digital object can be an important parameter when preserving a large amount of data.
• Some methods offer the possibility of compressing the data, so that disk space can be saved.
• Packaging containers compress both payload and digital object.
• Embedding methods offering compression only compress the embedded metadata and not the carrier.
• Packaging can save more disk space than embedding with compression.
• An integrity check can detect possible damage during compression.
• Compression, decompression and validation require extra calculation time.
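
The trade-off between saved disk space and the extra validation step can be sketched with Python's standard `zlib` and `hashlib` modules. The helper names and the sample metadata are assumptions; the sketch does not reproduce a specific IE tool.

```python
import hashlib
import zlib


def compress_with_check(data: bytes):
    """Compress the data and record a checksum of the uncompressed original."""
    digest = hashlib.sha256(data).hexdigest()
    return zlib.compress(data, 9), digest


def decompress_and_verify(blob: bytes, expected_digest: str) -> bytes:
    """Decompress and fail loudly if the result differs from the original."""
    data = zlib.decompress(blob)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("integrity check failed after decompression")
    return data


# Assumed sample payload: repetitive XML-like environment metadata.
metadata = b"<environment><software>viewer 1.0</software></environment>\n" * 200
blob, digest = compress_with_check(metadata)
restored = decompress_and_verify(blob, digest)
saved_bytes = len(metadata) - len(blob)
```

Both the compression and the checksum comparison cost calculation time, which has to be weighed against the disk space gained.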

Page 22: PERICLES - Choice of Information Encapsulation (IE) Technique

Disk space requirements (cont.)

• Embedding methods mostly do not need much extra disk space.
• With steganography algorithms changing only single bits, the size of the digital object remains constant. This method, however, limits the capacity.
• The use of available metadata fields doesn’t need much disk space. Compression can extend the capacity for these methods.
• Information frames need additional disk space proportional to the size of the metadata files that should be stored.

Page 23: PERICLES - Choice of Information Encapsulation (IE) Technique

Restoration

• The encapsulation method has to ensure that the digital object and the metadata can be restored.
• The digital object has to be restored to its original state unscathed.
• The integrity has to be verified by checksum comparisons.
• There are different levels of integrity, e.g. just ensuring that the significant properties survive, or a bit-exact replica.
• The significant properties have to survive in any case.
• A validation of the whole object is often simpler than the validation of the significant properties.
• It is highly improbable that packaging damages the digital object. A validation is easy if the checksum was added to the metadata file.

Page 24: PERICLES - Choice of Information Encapsulation (IE) Technique

Restoration (cont.)

• Not all algorithms that are used for embedding are completely reversible.
• Reversible embedding algorithms often embed the information of how to reverse them into a defined location of the digital object.
• If metadata is converted during the encapsulation, for example by compression, encryption, or format conversion, it might be necessary to validate the metadata.
• The methods using available metadata fields or information frames offer easy restoration.

Page 25: PERICLES - Choice of Information Encapsulation (IE) Technique

Risk of data loss

• The following factors increase the risk of damage to the digital object or the metadata:
◦ encryption usage
◦ information hiding
◦ compression
◦ processing
◦ conversion of the digital object
• Packaging stores the metadata in separate files. This guarantees access to the encapsulated information, which in turn may help identify the related digital data.
• Data containers used for packaging mostly have standard formats that are not as vulnerable as non-standardised formats.
• At the same time the risk of data loss is higher for separated files when unpacking.
• For some embedding methods object modifications are inevitable.

Page 26: PERICLES - Choice of Information Encapsulation (IE) Technique

Visibility

• The term “Data Hiding” describes methods to embed information in such a way that it is not perceptible by humans.
• Steganography and invisible digital watermarks are mostly detectable by machines.
• For most preservation scenarios it is necessary to be able to detect the encapsulated information.
• Data hiding increases the risk of losing the knowledge about the existence of this data.
• To avoid this, the carrier can be tagged with a visible method.
• Packaging is always visible, whereas steganographic methods are usually invisible.

Page 27: PERICLES - Choice of Information Encapsulation (IE) Technique

Location

• The location where the metadata is encapsulated can be a decision criterion:
◦ a separate file
◦ the exact location in a file
◦ the time dimension of an audio file
◦ the background noise.
• The location of storage is the main difference between packaging and embedding methods:
◦ Packaging stores the information in a separate file, mostly in a standardised XML format.
◦ Embedding stores the environment information directly in the digital object.

Page 28: PERICLES - Choice of Information Encapsulation (IE) Technique

Location (cont.)

• Embedding into the background noise has no influence on the significant properties and doesn't need additional disk space.
• For this to hold, the noise has to be clearly identifiable, to prevent damage to the digital object.
• Using available metadata fields, or an information frame, does not influence the significant properties of the digital object directly.
• Some embedding methods store information by changing elements of the object, e.g. by inverting single bits of an image pixel, or by using an imperceptible frequency of audio files.
• Some data formats offer extra space for the storage of additional information.
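
Storing information by changing single bits can be sketched as least-significant-bit (LSB) embedding. The sketch works on a raw sequence of sample bytes rather than on a real image or audio file format, and the function names are assumptions. It is a minimal illustration, not a robust or secure steganographic method.

```python
def embed_lsb(carrier: bytes, payload: bytes) -> bytes:
    """Write the payload bits into the lowest bit of successive carrier bytes."""
    bits = [(byte >> shift) & 1
            for byte in payload
            for shift in range(7, -1, -1)]  # most significant bit first
    if len(bits) > len(carrier):
        raise ValueError("payload exceeds the capacity of the carrier")
    stego = bytearray(carrier)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # replace only the lowest bit
    return bytes(stego)


def extract_lsb(stego: bytes, payload_len: int) -> bytes:
    """Read `payload_len` bytes back from the lowest bits."""
    payload = bytearray()
    for i in range(payload_len):
        value = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        payload.append(value)
    return bytes(payload)


carrier = bytes(range(256))          # stand-in for raw pixel or audio samples
stego = embed_lsb(carrier, b"IE")
recovered = extract_lsb(stego, 2)
```

The size of the carrier stays the same and every sample changes by at most one, but the extractor must know the payload length, and any re-encoding of the samples would destroy the payload.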

Page 29: PERICLES - Choice of Information Encapsulation (IE) Technique

Security

• Some encapsulation tools offer security features, like encryption.
• If encryption requires a secret key for accessing the data, there is a high risk of losing the data by losing the key.
• The preservation and re-use of confidential objects or encapsulated information requires adequate prudence. For this purpose encryption makes sense.
• The confidentiality of steganographic methods is based on the retention of knowledge or authorisation, if no additional encryption is used. This constitutes only a very weak kind of confidentiality.

Page 30: PERICLES - Choice of Information Encapsulation (IE) Technique

Authenticity

• Authenticity and integrity of the digital object and its environment information are paramount for many usages.
• Authenticity can be important if the digital object has special legal requirements.
• Fragile or semi-fragile digital watermarks can be used in some cases to ensure the integrity of a delivery copy of an object; the object is changed slightly by the application of the watermarking algorithm.
• The mark would be destroyed if the file were altered, so an intact mark can ensure that no third party changed the object.
• Authenticity plays a major role in the archive context, in which the provenance and chain of custody of an object are also important.
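
The idea of a fragile mark can be sketched as follows: a keyed hash of the content is written into the lowest bits of the first samples, and any later change to the hashed content invalidates it. This is an assumed toy scheme for raw sample bytes, not a published watermarking algorithm, and the names `add_fragile_mark` and `mark_is_intact` are hypothetical.

```python
import hashlib
import hmac

MARK_BITS = 256  # length of a SHA-256 based mark in bits


def _mark_bits(samples: bytes, key: bytes) -> list:
    """Keyed hash over the samples with their lowest bits cleared."""
    cleared = bytes(sample & 0xFE for sample in samples)
    digest = hmac.new(key, cleared, hashlib.sha256).digest()
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]


def add_fragile_mark(samples: bytes, key: bytes) -> bytes:
    """Embed the mark into the lowest bits of the first MARK_BITS samples."""
    if len(samples) < MARK_BITS:
        raise ValueError("carrier too small for the mark")
    marked = bytearray(samples)
    for i, bit in enumerate(_mark_bits(samples, key)):
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)


def mark_is_intact(samples: bytes, key: bytes) -> bool:
    """Recompute the mark and compare it with the embedded bits."""
    if len(samples) < MARK_BITS:
        return False
    expected = _mark_bits(samples, key)
    return all((samples[i] & 1) == bit for i, bit in enumerate(expected))


key = b"demo key (assumption)"
original = bytes((i * 7) % 256 for i in range(1024))
marked = add_fragile_mark(original, key)
tampered = bytearray(marked)
tampered[500] ^= 0x10  # alter one higher-order bit of one sample
```

A limitation of this sketch is that changes to the lowest bits of samples beyond the first 256 are not detected, because those bits are neither hashed nor part of the mark.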

Page 31: PERICLES - Choice of Information Encapsulation (IE) Technique

Integrity

• To guarantee the integrity of a digital object, it is often kept apart in its original state and context, and all changes to the original are avoided.
• The BagIt directory structure can be used, without applying an additional packaging or compression method, to prevent object alterations.
• Metadata is added to other defined directories of the structure, so that the digital object remains untouched even when information is added at a future date.
• The integrity of the encapsulated information can be verified by adding the checksums of the originals to the restoration metadata.
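
A BagIt-style layout with a checksum manifest can be sketched with the Python standard library alone. The file names `bagit.txt`, `manifest-sha256.txt` and the `data/` directory follow the BagIt specification; the `metadata/` directory name, the helper functions and the sample files are assumptions for illustration, and a production system would more likely use an existing BagIt library.

```python
import hashlib
import tempfile
from pathlib import Path


def create_bag(bag_dir: Path, payload: dict) -> None:
    """Write payload files to data/ and record their SHA-256 checksums."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8")
    lines = []
    for name, content in sorted(payload.items()):
        (data_dir / name).write_bytes(content)
        lines.append(f"{hashlib.sha256(content).hexdigest()}  data/{name}\n")
    (bag_dir / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")


def verify_bag(bag_dir: Path) -> bool:
    """Compare every payload file with the checksum in the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        expected, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "bag"
    create_bag(bag, {"image.tif": b"original object bytes"})
    # Later metadata goes to a separate directory (assumed name); data/ is untouched.
    (bag / "metadata").mkdir()
    (bag / "metadata" / "environment.xml").write_text("<environment/>", encoding="utf-8")
    intact_after_adding_metadata = verify_bag(bag)
    (bag / "data" / "image.tif").write_bytes(b"altered object bytes")
    intact_after_tampering = verify_bag(bag)
```

Adding metadata next to the payload leaves the manifest valid, whereas any change to a file under `data/` is caught by the checksum comparison.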