Scape project presentation - Scalable Preservation Environments

SCAPE Scalable Preservation Environments

description

This presentation describes the EU-funded project SCAPE (Scalable Preservation Environments), its developments and its sustainability plans. The SCAPE project has developed scalable services for planning and executing institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects. The project ran for around 3½ years, from 2011 to 2014. Read more about SCAPE at www.scape-project.eu

Transcript of Scape project presentation - Scalable Preservation Environments

Page 1: Scape project presentation - Scalable Preservation Environments

SCAPE: Scalable Preservation Environments

Page 2: Scape project presentation - Scalable Preservation Environments

• Your collection of digital data is growing rapidly.

• Your preservation activities must become more efficient and more scalable.

• You need SCAPE!

• The SCAPE project has developed scalable solutions for long-term preservation of large-scale and heterogeneous data sets.


Digital Preservation – What do I need?

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Page 3: Scape project presentation - Scalable Preservation Environments


It's all about scalability!

• Scalable services for planning and execution of institutional preservation strategies

• Infrastructure for the execution of digital preservation processes on large volumes of data

• Existing tools have been improved and extended.

• New tools have been developed where necessary.

What is SCAPE?


Page 4: Scape project presentation - Scalable Preservation Environments


SCAPE covers a whole digital preservation life cycle

What is SCAPE?


• Interconnecting services support the preservation of large repositories of digital objects

• Applications support the formulation of preservation policies, decision making and selection of preservation actions

Page 5: Scape project presentation - Scalable Preservation Environments


Take your pick – choose what you need!

What is SCAPE?


• Use the full set of interconnected SCAPE components or a selected series of SCAPE tools or workflows.

• Many SCAPE components can be individually incorporated.

Page 6: Scape project presentation - Scalable Preservation Environments

• All SCAPE solutions arise from real-world challenges at partner institutions.

• Each challenge is tested in testbeds at the partner institutions.


Solutions Tested in Real Life


Web Content

Digital Repositories

Data Centres

Research Data Sets

Testbeds

Page 7: Scape project presentation - Scalable Preservation Environments

Scalability: in four dimensions (heterogeneity of collections as well as number, size and complexity of objects)

Automation: through scalable, automated and simple-to-design preservation workflows

Planning: answering core preservation planning questions

Integration: through a robust, integrated, open source preservation system

Solutions for Content Holders


Page 8: Scape project presentation - Scalable Preservation Environments


Overview: SCAPE Architecture


Page 9: Scape project presentation - Scalable Preservation Environments


Overview: SCAPE Components


The SCAPE Platform is a reference architecture for scalable preservation environments

Page 10: Scape project presentation - Scalable Preservation Environments


Overview: SCAPE Components


The SCAPE Preservation Components are tools which enhance the functionality of a digital preservation system in:

• Scalability

• Functional coverage

• Quality

Page 11: Scape project presentation - Scalable Preservation Environments


Overview: SCAPE Components


The SCAPE Planning and Watch components address the bottleneck of decision processes and processing information required for decision making

Page 12: Scape project presentation - Scalable Preservation Environments

Examples of tools and services


Page 13: Scape project presentation - Scalable Preservation Environments


Scout – an Automated Preservation Watch System

Scalable Planning and Watch


• Enables you to monitor your collections

• Lets you access community knowledge

• Collects relevant knowledge and enables automated notification

Page 14: Scape project presentation - Scalable Preservation Environments


C3PO – Content Profiling Tool for Preservation Analysis

Scalable Planning and Watch


• Analyses characterisation metadata for digital collections

• Aggregates and combines the metadata information across collections

• Generates a profile of the content set

• Allows use of different metadata formats
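
The aggregation step described above can be pictured with a small sketch. This is not C3PO's code or data model; the record fields (`path`, `format`, `size`) and the `build_profile` function are hypothetical names chosen for illustration, and the sketch only assumes that each file already has a characterisation record.

```python
from collections import Counter

def build_profile(records):
    """Aggregate per-file characterisation records into a collection profile."""
    formats = Counter(r["format"] for r in records)
    total_size = sum(r["size"] for r in records)
    return {
        "count": len(records),
        "total_size": total_size,
        "formats": dict(formats),
    }

# Hypothetical records, roughly as a characterisation tool might report them.
records = [
    {"path": "a.tif", "format": "image/tiff", "size": 1200},
    {"path": "b.jp2", "format": "image/jp2", "size": 800},
    {"path": "c.tif", "format": "image/tiff", "size": 1000},
]
profile = build_profile(records)
```

A profile like this gives a quick view of which formats dominate a collection, which is the kind of input a planning step would need.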

Page 15: Scape project presentation - Scalable Preservation Environments


Plato – Scalable Preservation Planning

Scalable Planning and Watch


• Decision-making support tool

• Guides you through the preservation planning workflow

• Provides trust through controlled experiments and documentation

• Provides an executable plan

Page 16: Scape project presentation - Scalable Preservation Environments


ToMaR – let your Preservation Tools Scale

Scalable Tools


• Run existing tools against large amounts of files

• Execute tools in a scalable fashion on a MapReduce cluster

• Enable scalable workflows which chain together a set of tools

• Process payloads too big to be computed on a single machine
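
The map-then-reduce pattern behind these points can be illustrated on a single machine. The sketch below does not use ToMaR or a MapReduce cluster: it swaps in a local thread pool, and `identify` is a hypothetical stand-in for a real preservation tool. It only shows the shape of the computation, with one tool invocation per input file followed by a reduction by key.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def identify(path):
    """Map step: a stand-in 'tool' that labels a file by its extension."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else "unknown"
    return ext, 1

def run_job(paths, workers=4):
    """Run the map step over all inputs in parallel, then reduce by key."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(identify, paths))
    totals = Counter()
    for key, value in mapped:  # reduce step: sum the counts per key
        totals[key] += value
    return dict(totals)

result = run_job(["a.TIF", "b.jp2", "c.tif", "README"])
```

On a real cluster the inputs would be split across many nodes, but the map and reduce steps keep the same roles.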

Page 17: Scape project presentation - Scalable Preservation Environments


Pagelyzer – Monitor your Web Content

Preservation Components


• Detect changes in web pages

• Compare web page versions on a large scale

• Compare web page rendering in different browsers

• Determine appropriate frequency of web harvestings
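
As a rough illustration of change detection between two versions of a page, the sketch below compares only the raw HTML text with Python's standard `difflib`. It is a swapped-in technique, not Pagelyzer's method, and it does not cover rendering comparison; the `threshold` value is an assumption chosen for the example.

```python
import difflib

def similarity(old_html, new_html):
    """Return a similarity ratio between 0.0 and 1.0 for two page versions."""
    return difflib.SequenceMatcher(None, old_html, new_html).ratio()

def has_changed(old_html, new_html, threshold=0.9):
    """Flag a change when similarity falls below the (assumed) threshold."""
    return similarity(old_html, new_html) < threshold

same = has_changed("<p>Hello</p>", "<p>Hello</p>")
rewritten = has_changed("aaaa", "bbbb")
```

Tracking how often such a check reports a change is one simple way to reason about how frequently a page needs to be harvested.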

Page 18: Scape project presentation - Scalable Preservation Environments


Jpylyzer – Easy Validation of JPEG 2000

Preservation Components


• Automated JP2 validation and feature extraction

• Enables you to confirm whether an image is a valid, intact JP2 file

• Reports the key technical properties of the image
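
To give a feel for what the very first validation step looks like, the sketch below checks only the 12-byte JP2 signature box that opens every JP2 file. It is not Jpylyzer's code and covers a tiny fraction of a full validation; the function name `has_jp2_signature` is hypothetical.

```python
# The JP2 signature box: length 12, type 'jP  ', content 0D 0A 87 0A.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")

def has_jp2_signature(data: bytes) -> bool:
    """Return True if the data starts with the JP2 signature box."""
    return data[:12] == JP2_SIGNATURE

looks_like_jp2 = has_jp2_signature(JP2_SIGNATURE + b"remaining boxes")
looks_like_jpeg = has_jp2_signature(b"\xff\xd8\xff\xe0")
```

A file that passes this check may still be truncated or malformed, which is why a dedicated validator inspects the full box structure and codestream.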

Page 19: Scape project presentation - Scalable Preservation Environments


Matchbox – Easy Detection of Near-Duplicate Images

Preservation Components


• Identify duplicate content, even where files differ in size, format, cropping etc. or were scanned from a different original copy

• Automate quality assurance and reduce manual effort
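
The idea of matching images that are similar but not byte-identical can be sketched with a simple average hash. This is a swapped-in technique for illustration, not the algorithm Matchbox uses, and it assumes the images have already been reduced to small greyscale grids of equal size (here 2×2 lists of pixel values).

```python
def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming(a, b):
    """Count the positions at which two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

original = [[10, 200], [220, 30]]
rescanned = [[14, 190], [210, 40]]  # same picture, slightly different values
different = [[200, 10], [30, 220]]

near = hamming(average_hash(original), average_hash(rescanned))
far = hamming(average_hash(original), average_hash(different))
```

A small Hamming distance suggests a near-duplicate; the cut-off to use would have to be tuned on real collections.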

Page 20: Scape project presentation - Scalable Preservation Environments


xcorrSound – Automate Sound Wave Analysis

Preservation Components


• Compare two audio files and output the similarity

• Detect overlaps in audio files

• Detect occurrences of a smaller audio file (e.g. a jingle) within a larger audio file or an index of audio files
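
Finding a short clip inside a longer recording can be illustrated with a plain normalised cross-correlation over sample lists. The sketch below is not xcorrSound's implementation; `best_offset` is a hypothetical name, the samples are made-up values, and a brute-force loop like this would be far too slow for real audio.

```python
import math

def best_offset(haystack, needle):
    """Slide needle over haystack; return (offset, score) of the best match."""
    n = len(needle)
    needle_norm = math.sqrt(sum(x * x for x in needle))
    best = (0, -1.0)
    for offset in range(len(haystack) - n + 1):
        window = haystack[offset:offset + n]
        window_norm = math.sqrt(sum(x * x for x in window))
        if window_norm == 0 or needle_norm == 0:
            continue  # a silent window cannot be normalised
        dot = sum(a * b for a, b in zip(window, needle))
        score = dot / (window_norm * needle_norm)
        if score > best[1]:
            best = (offset, score)
    return best

jingle = [1.0, -1.0, 0.5]
recording = [0.0, 0.1, 1.0, -1.0, 0.5, 0.0, 0.2]
offset, score = best_offset(recording, jingle)
```

The score is close to 1.0 where the clip matches the recording, and the offset tells you where in the recording the clip starts.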

Page 21: Scape project presentation - Scalable Preservation Environments

SCAPE tools are published as open source software.

Tools and services from SCAPE are sustained by

• Open Planets Foundation: addresses core digital preservation challenges and engages with the community

• COPTR: Community Owned digital Preservation Tool Registry


Sustainability of Tools and Services


Page 22: Scape project presentation - Scalable Preservation Environments

Ultimate sustainability goal:

• Supporting communities of practice by enabling efficient collaboration during the project and beyond.

Open Planets Foundation will take post-project ownership of the outputs, supported by other partners providing specific capabilities.

Sustainability of SCAPE results


Page 23: Scape project presentation - Scalable Preservation Environments

Five complementary approaches:

• Visibility: Providing integrated outreach to multiple audiences to maximise discoverability.

• Quality: Ensuring that project outputs conform to standards-driven quality assurance.

• Training: Supporting skills development to further institutional capacity building.

• Open licensing: Using open licences to encourage the adoption and reuse of project outputs.

• Community integration: Integrating project outputs into commercial and non-commercial systems and services.

Sustainability of SCAPE results


Page 24: Scape project presentation - Scalable Preservation Environments

• EU-funded project under FP7 (Research and Technological Development)

• Project runtime: February 2011 to September 2014

• 20 partners from 10 countries: memory institutions, data centres, research labs, universities, and industrial firms

• Public Project materials are licensed under a CC-BY-SA International License


About SCAPE


Page 25: Scape project presentation - Scalable Preservation Environments


SCAPE Consortium


Page 26: Scape project presentation - Scalable Preservation Environments


Additional Sources of Interest


• Development Infrastructure

  • Code repository hosted by the Open Planets Foundation and GitHub: https://github.com/openplanets/scape/

  • Development Wiki: http://wiki.opf-labs.org/display/SP/Home

• Tools: http://www.scape-project.eu/tools

• Experimental Workflows: http://www.myexperiment.org/search?query=SCAPE&type=all&commit=Search

• Publications: http://www.scape-project.eu/category/publication

• Public Deliverables: http://www.scape-project.eu/category/deliverable

Page 27: Scape project presentation - Scalable Preservation Environments


More Information


• SCAPE website: www.scape-project.eu

• Blog posts and more: www.openplanetsfoundation.com/projects/scape

• Tools and Services: https://github.com/openplanets/scape

• SCAPE Twitter: @SCAPEProject, #SCAPEProject

• SCAPE Newsletter: Sign up via www.scape-project.eu

All images © the SCAPE Project or its partners, except images on slides 3, 6 and 26 © www.digitalbevaring.dk