PERICLES Policy management & ontology supported preservation - Acting on Change 2016

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation] “This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no601138”. Policy Management & Ontology Supported Preservation Jean-Yves Vion-Dury (Xerox/PERICLES) Justin Simpson (Artefactual) Stratos Kontopoulos (CERTH/PERICLES) Joel Simpson (Artefactual) @PericlesFP7 #PERIconf2016

Transcript of PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Page 1: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

GRANT AGREEMENT: 601138 | SCHEME FP7 ICT 2011.4.3 Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics [Digital Preservation]

“This project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement no601138”.

Policy Management & Ontology Supported Preservation
Jean-Yves Vion-Dury (Xerox/PERICLES)
Justin Simpson (Artefactual)
Stratos Kontopoulos (CERTH/PERICLES)
Joel Simpson (Artefactual)

@PericlesFP7 #PERIconf2016

Page 2: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Objectives

How do preservation policies evolve, and how are they managed over time?

▶ Institutions’ preservation practices (and policies) evolve as the institution learns from its own work or from knowledge/best practices used by the community.

▶ Before implementing changes, it would be helpful to understand the implications or potential impact on the current repository of digital objects (DOs).

▶ We will explore changing needs for email preservation.

Page 3: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

About Artefactual & Archivematica

Who we are:
Artefactual (the company) is the organisational home for two open source projects: Archivematica (a preservation platform) and a second project (a digital repository)

Why we are here:
▶ Archivematica already uses policies
▶ We believe that there are many opportunities to improve how policies are used
▶ We are excited by the potential of leveraging technology and learning from the PERICLES project

Page 4: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Archivematica Format Policies

[Diagram: a Rule is applied to a Format, is for a particular preservation Purpose, and executes a Command using a Tool]

▶Format policies are simple rules applied to digital objects of a particular format, for a particular preservation purpose

➔ Over 850 file formats defined with suggested assessment for preservation & access purposes

➔ Over 1,000 predefined rules provided

➔ 39 predefined commands provided

➔ 18 different tools available to be used
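The Rule / Format / Purpose / Command / Tool relationship on this slide can be sketched as a small data model. This is an illustrative sketch only, not Archivematica's actual schema: the class fields, the `find_rule` helper, and the example format, command and tool names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str


@dataclass(frozen=True)
class Command:
    description: str
    tool: Tool  # a command runs "using a" tool


@dataclass(frozen=True)
class Rule:
    file_format: str  # the format the rule is "applied to"
    purpose: str      # the preservation purpose the rule "is for"
    command: Command  # the command the rule "executes"


def find_rule(rules, file_format, purpose):
    """Return the first rule matching a format and purpose, or None."""
    for rule in rules:
        if rule.file_format == file_format and rule.purpose == purpose:
            return rule
    return None


# Hypothetical example data: names are placeholders, not real registry entries.
converter = Tool("example-converter")
to_archival = Command("Convert to an archival format", converter)
rules = [Rule("example/format-a", "preservation", to_archival)]

match = find_rule(rules, "example/format-a", "preservation")
```

Looking up a rule by (format, purpose) mirrors the slide's description of a format policy: one rule per format, per preservation purpose.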

Page 5: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

How the FPR Works Today

▶The Format Policy Registry (FPR) is a significant body of knowledge derived from Artefactual and our users

[Diagram components: FPR Server, Local FPR Database, Workflow Engine, Storage Services, and a Web based User Interface (Preservation Planning, Workflow Dashboards, Access & Admin)]

Artefactual maintains the current knowledge base of formats, rules & commands.

When Archivematica is installed, the latest rules are downloaded so users can start to use them immediately.

The Preservation Planning user interface provides an easy way to manage all of the rules, commands, etc.

The rules are executed by the workflow engine.

Users can review status and perform manual steps using the workflow dashboards.
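The flow described on this slide (rules copied from the FPR server at install time, edited locally, then executed by the workflow engine) can be sketched roughly as follows. Everything here is a hypothetical stand-in: `SERVER_RULES`, `install_local_fpr`, `run_workflow` and the format and command names are invented for illustration and are not Archivematica APIs.

```python
from copy import deepcopy

# Hypothetical stand-in for the knowledge base held on the FPR server:
# (format, purpose) -> name of the command to run.
SERVER_RULES = {
    ("format-a", "preservation"): "convert-to-format-b",
    ("format-a", "access"): "make-access-copy",
}


def install_local_fpr(server_rules):
    """Copy the latest server rules into a new local database at install time."""
    return deepcopy(server_rules)


def run_workflow(files, local_rules, purpose):
    """Look up the rule for each file and report what the engine would do."""
    statuses = {}
    for name, file_format in files.items():
        command = local_rules.get((file_format, purpose))
        # Files without a matching rule are left for a manual dashboard step.
        statuses[name] = command if command is not None else "needs manual review"
    return statuses


local_rules = install_local_fpr(SERVER_RULES)
# A local edit, as might be made through a preservation planning interface.
local_rules[("format-c", "preservation")] = "keep-original"

statuses = run_workflow(
    {"report.a": "format-a", "notes.c": "format-c", "odd.z": "format-z"},
    local_rules,
    "preservation",
)
```

Keeping the local copy separate from the server copy is the design point: local edits change what the engine runs without touching the shared knowledge base.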

Page 6: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Policies: Current Benefits

We believe the focus on policies in Archivematica today provides a number of benefits:

▶Simplification: separating rules (policies) from workflow makes both easier to configure and manage

▶Understandability: abstracting policies from technical implementation enables non-technical users to interact more directly with the system

▶Shareability: enables some level of sharing best practices across the community

Page 7: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Policies: Potential Improvements

We think the PERICLES approach may help us improve upon our existing focus on policies:

▶Simplification: many important preservation decisions are still deeply embedded in technical implementation

▶Understandability: using well defined vocabularies & languages (ontologies) to define policy will make it easier to be precise and eliminate ambiguity

▶Shareability: using common standards will make it easier to share policy within a community

Page 8: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Policies: New Benefits

There are a few benefits that may be achieved through the PERICLES approach to policies:

▶Impact analysis: ability to determine the impact of a change in policy before it is made

▶Reasoning / change management: once impact analysis is automated, it is possible to automate the management (resolution) of impacts

Page 9: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Policies: New Benefits

▶Validation: we can attach ad hoc validation processes (tests)

▶Reuse: making use of existing ontological knowledge bases on formats and preservation policies in general

Page 10: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

PERICLES Model-driven Preservation

Page 11: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Model-driven Preservation

Models
▶Abstraction of complex systems as models that can be manipulated independently

Digital ecosystem
◦ Analogy with biological systems
◦ Evolving systems of interdependent entities

Capture and representation of the environment
▶ Understand the wider context around digital objects that impacts their long-term reuse

Continuous change and reuse

Continuum approach
▶ Merging of active-life and archival phases
▶ Non-custodial

Page 12: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

What is an Ontology?

“... a formal, explicit specification of a shared conceptualization...” [Studer et al., 1998]

◦ formal: machine readable with computational semantics
◦ explicit: unambiguous definition of concepts, properties, functions, axioms
◦ shared: commonly accepted, consensual knowledge
◦ conceptualization: abstract, simplified model of a domain

Upper ontology: A model of the common objects that are generally applicable across multiple knowledge domains.

Domain ontology: A model of concepts that belong to a specific domain or part of the world.

[Studer et al., 1998] Studer, R., Benjamins, V.R. and Fensel, D. (1998), Knowledge engineering: Principles and methods. Data & Knowledge Engineering, Elsevier Ltd, Vol. 25, Issues 1-2, pp. 161-197

Page 13: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

OWL - the Web Ontology Language

Key Notions

◦ Classes (concepts): superclass/subclass relationship
◦ Properties (relationships): Subject → Predicate → Object
◦ Axioms, restrictions and constraints
◦ Individuals (instances)
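To make these notions concrete, here is a minimal sketch in plain Python (not OWL, and not a PERICLES model) that stores Subject → Predicate → Object triples and follows the superclass/subclass relationship to infer the classes of an individual. The class, property and individual names are invented for the example.

```python
# Triples are (subject, predicate, object); all names here are illustrative.
triples = {
    ("Email", "subClassOf", "DigitalObject"),
    ("Attachment", "subClassOf", "DigitalObject"),
    ("DigitalObject", "subClassOf", "Resource"),
    ("msg42", "type", "Email"),           # an individual (instance)
    ("msg42", "hasAttachment", "file7"),  # a property linking two individuals
}


def superclasses(cls, triples):
    """Return every class reachable from cls via subClassOf (transitive)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, triples):
    """Return the asserted classes of an individual plus all their superclasses."""
    asserted = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


msg_types = types_of("msg42", triples)
```

An OWL reasoner performs this kind of subclass inference (and much more) from the axioms; the loop above only shows the basic idea.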

Page 14: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

PERICLES Models

▶LRM - ontology for modelling linked resources

▶DEM - formalism for digital ecosystems

▶Domain ontologies

Page 15: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Linked Resource Model (LRM)

Aims
▶Model digital objects, dependencies between them, temporal evolution
▶Maximise interoperability with other ontologies (existing or future)
▶Interoperate with environment information and digital ecosystem models

Page 16: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Dependency and Change

▶Relation between change and dependency

▶Understanding dependencies between digital objects and resources within their environment is the key to assessing and managing change

▶Given objects A and B, A is dependent on B if changes to B have a significant impact on the state of A, or if changes to B can impact the ability to perform function X on A.
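The definition of dependency on this slide suggests a simple form of impact analysis: if B changes, everything that depends on B, directly or through a chain of dependencies, may be affected. The sketch below illustrates that idea with a plain dictionary and a breadth-first search. It is not the LRM itself, and the resource names are hypothetical.

```python
from collections import deque

# depends_on[a] lists the resources that a depends on (hypothetical data).
depends_on = {
    "email_archive": ["mbox_parser", "attachment_store"],
    "mbox_parser": ["python_runtime"],
    "attachment_store": [],
    "finding_aid": ["email_archive"],
}


def impacted_by(changed, depends_on):
    """Return every resource that directly or transitively depends on `changed`."""
    # Invert the edges: for each resource, record who depends on it.
    dependents = {}
    for resource, targets in depends_on.items():
        for target in targets:
            dependents.setdefault(target, set()).add(resource)

    impacted, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for resource in dependents.get(current, ()):
            if resource not in impacted:
                impacted.add(resource)
                queue.append(resource)
    return impacted


runtime_impact = impacted_by("python_runtime", depends_on)
```

In this example a change to `python_runtime` reaches `finding_aid` only through two intermediate dependencies, which is the kind of indirect impact that is easy to miss without an explicit model.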

Page 17: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

LRM Dependency

Dependency: the association, relation or interaction among two or more Resources

Plan: presents a set of actions/steps to be executed by an Agent

precondition and impactDescription:

intention: the intended usage of a Resource

specification: the context of the Dependency itself

Page 18: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

LRM Dependency

Page 19: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Digital Ecosystem Model (DEM)

▶A digital ecosystem represents the surrounding environment of a digital object that impacts reuse
▶A digital ecosystem can include data objects, software, user communities, processes, technical services and their dependencies
▶Scope depends on the particular use case

Page 20: PERICLES Policy management & ontology supported preservation - Acting on Change 2016
Page 21: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Domain Ontologies

▶Modelling DP-related risks in
◦ Digital Video Art (DVA)
◦ Software-based Art (SBA)
◦ Born-digital Archives (BDA)

▶Help curators model, project & tackle risks throughout the DP process
▶Extensible for future adopters
▶Ontology reuse: LRM, DEM, CIDOC-CRM, CRMdig, DC

Page 22: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Key Constructs

▶Dependencies
◦ HW dependencies: HW requirements for a resource to function properly
◦ SW dependencies: Dependency of a resource or activity on specific SW
◦ Data dependencies: Requirement of knowledge/data/information
◦ Further dependency specialization via intentions & specifications

▶Activities: Temporal entities representing actions intentionally carried out by actors that generate changes
◦ E.g. creation, acquisition, display etc.

▶Agents: Resources that may bring change to another resource or participate in an activity
◦ Further specialized in Human & SW Agents

Page 23: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

DVA Domain Ontology

Page 24: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

HW Dependency (DVA)

Page 25: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

PERICLES Design Pattern for Policies

http://ontologydesignpatterns.org/wiki/Submissions:Policy

Page 26: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Email Scenario

Page 27: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Workshop Approach

▶Real world examples of email preservation policies.
▶Changes proposed based on lessons learned.
▶“Historic” set of processes & policies and “future” set.
▶Explore how PERICLES can help understand the implications of moving from the historic to the future policies before enacting those changes.
▶Explore benefits of digital ecosystem approach compared to existing approaches.

Page 28: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Simplified Email Process

[Diagram stages: Transfer Email from Source System; Process Submission Information Package (SIP); Process Archival Package (AIP); Process Dissemination Package (DIP)]

[Diagram steps, in order: Export Email Data; Pre-accession review; Transfer (to preservation platform); Virus Scan; Fixity Check; Extract Attachments; Identify & Validate Format; Clean File Names; Normalize Emails; Create AIP; Add Rights Metadata; Identify Sensitive Information; Create DIP]

Page 29: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Historic Email Policies

Process Step | Policy
Export Email Data | Preferred source format for email is maildir (where possible)
Normalize Emails | Preferred preservation format for email is maildir
Extract Attachments | Attachments must be extracted from source emails & stored as discrete objects
 | Converted attachments should retain links to emails to which they were attached
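The two attachment policies can be illustrated with Python's standard `email` package: each attachment is pulled out as a discrete record, and each record keeps a link back to the email it came from. This is only a sketch of the idea, not how Archivematica implements it. The `extract_attachments` function, the record fields and the `email_id` values are assumptions made for the example.

```python
from email import message_from_bytes


def extract_attachments(raw_email, email_id):
    """Return each attachment as a discrete record that links back to its email."""
    message = message_from_bytes(raw_email)
    extracted = []
    for part in message.walk():
        # Only MIME parts explicitly marked as attachments are extracted.
        if part.get_content_disposition() != "attachment":
            continue
        extracted.append({
            "filename": part.get_filename(),
            # decode=True undoes base64 / quoted-printable transfer encodings.
            "data": part.get_payload(decode=True),
            # The link that lets an attachment be traced back to its email.
            "attached_to": email_id,
        })
    return extracted
```

The `attached_to` field is where the choice of identifier matters: any stable identifier for the parent email can be stored there.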

Page 30: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Changing Email Policies

Historic Policy | Future (changed) Policy
1) Preferred source format for email is maildir (where possible) | Preferred source format for email is IMAP protocol (second preference is mbox)
2) Preferred preservation format for email is maildir | Preferred preservation format for email is mbox
3) Attachments must be extracted from source emails & stored as discrete objects | Attachments must be extracted from source emails for archival processing (but need not be retained as discrete objects)
4) Converted attachments should retain links to emails to which they were attached | No change to policy per se -- but implementation will change from using Archivematica UUIDs to native email UUIDs

Comment (Fabio Corubolo): Please note that these can be expressed in the models as policy parameters. We have a similar example in the test we are demonstrating for the practice workshop, where the parameter is changed. It could theoretically be applied here too (given the rules, and that the models are built with DEM according to the specifications).
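One way to read the idea of policy parameters is to write each policy set as a mapping of parameter names to values and compare the two. The sketch below does that for the four policies on this slide. The parameter names and value labels are invented for illustration and are not taken from the PERICLES models or the DEM.

```python
# Hypothetical parameter names; the values paraphrase the slide.
historic = {
    "source_format": "maildir",
    "preservation_format": "maildir",
    "retain_attachments_as_objects": True,
    "attachment_link_id": "archivematica_uuid",
}

future = {
    "source_format": "imap",  # second preference on the slide: mbox
    "preservation_format": "mbox",
    "retain_attachments_as_objects": False,
    "attachment_link_id": "native_email_uuid",
}


def changed_parameters(old, new):
    """Return {parameter: (old_value, new_value)} for every parameter that differs."""
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }


changes = changed_parameters(historic, future)
```

Each entry in `changes` is a candidate starting point for impact analysis: it names the parameter that moved and the values before and after.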
Page 31: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

New Email Policies

These policies did not have a precedent or equivalent in our historic process.

5) Email accounts should be characterized; metadata to be extracted / described should include: total number of attachments, size of mailbox, first email sent date, first email received date, last email sent date, etc.

6) Digital signatures provided within any email should be verified
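As a rough illustration of policy 5, the characterization metadata can be computed from raw messages with Python's standard `email` package. This sketch makes several assumptions that are not in the slides: messages are supplied as raw bytes, every message has a parseable `Date` header with a time zone offset, and "sent" means the `From` address matches the account address. The function name and result keys are hypothetical.

```python
from email import message_from_bytes
from email.utils import parseaddr, parsedate_to_datetime


def characterize_account(raw_messages, account_address):
    """Summarise a list of raw messages (bytes) for one email account."""
    sent_dates, received_dates = [], []
    attachments = 0
    for raw in raw_messages:
        msg = message_from_bytes(raw)
        # Assumes every message carries a Date header with a UTC offset.
        date = parsedate_to_datetime(msg["Date"])
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if sender == account_address.lower():
            sent_dates.append(date)
        else:
            received_dates.append(date)
        attachments += sum(
            1 for part in msg.walk()
            if part.get_content_disposition() == "attachment"
        )
    return {
        "total_messages": len(raw_messages),
        "total_attachments": attachments,
        "mailbox_size_bytes": sum(len(raw) for raw in raw_messages),
        "first_sent": min(sent_dates, default=None),
        "last_sent": max(sent_dates, default=None),
        "first_received": min(received_dates, default=None),
    }
```

Verifying digital signatures (policy 6) is deliberately left out, since it depends on the signature scheme in use and needs more than the standard library.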

Page 32: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Modelling Policies in PERICLES

Page 33: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Impact Analysis of Email Changes

How will PERICLES help us identify the impacts of the policy changes?

1. Identify all objects where source format was Maildir? (in practice we may not care to attempt to extract source data again)

2. Identify all objects preserved in Maildir format so that we know how many should be re-normalized into Mbox format

3. Identify all extracted attachments we no longer need to store

4. Identify all attachments that need a new reference (the native email UUID instead of the previously generated Archivematica UUID)

5. Identify all email accounts that should be characterized

6. Identify all emails with digital signatures that should be verified
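The six questions above amount to queries over what the repository knows about each object. The sketch below runs them over a list of plain dictionaries. It is a simplified stand-in for querying an ontology-backed model: the record fields (`kind`, `source_format`, `link_id_type` and so on) and the sample records are hypothetical.

```python
# Hypothetical repository records; field names are invented for this example.
objects = [
    {"id": "e1", "kind": "email", "source_format": "maildir",
     "preservation_format": "maildir", "has_signature": True},
    {"id": "e2", "kind": "email", "source_format": "mbox",
     "preservation_format": "mbox", "has_signature": False},
    {"id": "a1", "kind": "attachment", "stored_as_discrete_object": True,
     "link_id_type": "archivematica_uuid"},
    {"id": "acct1", "kind": "account", "characterized": False},
]


def impact_report(objects):
    """Group object ids by the numbered impact question (1-6) that affects them."""
    def ids(predicate):
        return [o["id"] for o in objects if predicate(o)]

    return {
        1: ids(lambda o: o["kind"] == "email" and o.get("source_format") == "maildir"),
        2: ids(lambda o: o["kind"] == "email" and o.get("preservation_format") == "maildir"),
        3: ids(lambda o: o["kind"] == "attachment" and o.get("stored_as_discrete_object")),
        4: ids(lambda o: o["kind"] == "attachment" and o.get("link_id_type") == "archivematica_uuid"),
        5: ids(lambda o: o["kind"] == "account" and not o.get("characterized")),
        6: ids(lambda o: o["kind"] == "email" and o.get("has_signature")),
    }


report = impact_report(objects)
```

Running the report before enacting the new policies gives a count of affected objects per change, which is the purpose of impact analysis as described earlier in the deck.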

Page 34: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

A Sample Policy in PERICLES

Page 35: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Conclusions

▶How do preservation policies evolve, and how are they managed over time?
▶Organizations seeking ways to improve how policies are used
▶PERICLES model-driven preservation approach
▶The email policy preservation scenario

Page 36: PERICLES Policy management & ontology supported preservation - Acting on Change 2016

Thank You!