Putting the Shock back into the Future · Long Term Repositories 2006 Existing Tools • JHOVE,...

Putting the Shock back into the Future

Prof. Jane Hunter [email protected]


Page 2

Long Term Repositories 2006

Agenda
• PANIC, AONS – Semi-automated Preservation
• FUSION
  – Provenance Model cf. PREMIS
  – Secure Provenance Visualization
• DART
  – Metadata Schema Registry
  – SRB + Fedora
  – Annotation tools
• Future Issues/Research
  – Interoperability of Preservation Metadata – across heterogeneous archiving systems
  – Collaborative Preservation Tools
  – Trusted repositories – Social networks
  – Preservation selection metrics

Page 3

PANIC Project

Objectives
– Address the long term preservation and accessibility of (composite) digital objects

Partners
– DSTC, UQ

Page 4

Challenges

Within digital libraries/archives:
• Wide range of file formats - different platforms, different authoring/display software
• Composite mixed-media objects – web pages, images, video, audio, Flash, SMIL, SVG
• Highly proprietary – software & hardware dependent
• Dynamic and interactive
• Difficult to capture – boundary problem
• Large scale
• Little guidance

Page 5

Existing Tools

• JHOVE, DROID – Metadata extraction tools
• OCLC’s INFORM, Cornell’s VRC – risk assessment -> notification services
• GDFR, PRONOM, DCC-RIR – Format registries
• VersionTracker, IIPC – Software Registries
• XENA, TOM – Conversion/migration services
• IBM’s UVC (Universal Virtual Computer)
• Koninklijke Bibliotheek – Emulation services

Page 6

Objectives

Provide an Integrated Preservation Framework which supports:
• Large, heterogeneous, distributed collections
• Multiple formats and composite digital objects
• Changing organizational needs
  – Range of solutions
• Flexible, Dynamic, Scalable, Extensible
• New emerging formats, software, recommendations
• New migration, emulation services
• Recommender services/decision support
• Sustainable - cost-effective, semi-automated
• Collaborative effort!

Page 7

[Diagram: PANIC within an Enterprise Service Oriented Architecture (SOA)]
• Networked Distributed Archives – Protein DataBank, SDSS SkyServer, ESO Science Archive, GenBank, ADIL
• Preservation Metadata Capture Tools (PREMINT, JHOVE, NLNZ, DROID)
• Registries
  – Software Registry (VersionTracker)
  – Format Registry (PRONOM, GDFR, RIR)
  – Recommendation Registry (INFORM)
• Web services
  – Service Descriptions (OWL-S)
  – Risk Assessment & Notification Services (VRC, INFORM)
  – Preservation Services (XENA, TOM, UVC)

Page 8

Steps
• Archival – selection and capture of digital object(s) + preservation metadata
• Risk assessment and notification of potential obsolescence
  – New recommendations, formats, software versions
• Service Specification and Request
  – Emulation or Migration
  – Inputs/Outputs
  – Cost
  – Speed
  – Remote/Distributed/Local
  – Reliability
  – Lossiness
• Select, Compose, Invoke Preservation Service
• Record preservation events
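The "Service Specification and Request" and "Record preservation events" steps can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class names `ServiceRequest` and `PreservationLog`, the field names and the event layout are hypothetical and are not taken from the PANIC implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ServiceRequest:
    """Hypothetical request covering the criteria listed on the slide."""
    kind: str            # "emulation" or "migration"
    input_format: str    # Inputs
    output_format: str   # Outputs
    max_cost: float      # Cost ceiling, in arbitrary units
    location: str        # "remote", "distributed" or "local"
    lossless: bool       # Lossiness requirement


@dataclass
class PreservationLog:
    """Hypothetical in-memory record of preservation events."""
    events: list = field(default_factory=list)

    def record(self, object_id: str, request: ServiceRequest, service: str) -> dict:
        # Store what was done, to which object, with which service, and when.
        event = {
            "object": object_id,
            "kind": request.kind,
            "service": service,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        self.events.append(event)
        return event


request = ServiceRequest(
    kind="migration",
    input_format="TIFF",
    output_format="JPEG2000",
    max_cost=0.0,
    location="remote",
    lossless=True,
)
log = PreservationLog()
event = log.record("image-001", request, "TIFF-to-JPEG2000")
```

Keeping the request and the event log as separate structures mirrors the slide's separation between specifying a service and recording what was eventually invoked.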

Page 9

PANIC Architecture

[Architecture diagram; labels grouped by component]
• Custodial Organization – Collections Manager, Multimedia Collection, Preservation Metadata, Preservation Metadata input tool
• Notification component – Obsolescence Detector, Notification Service, Registry(s)
• Discovery component – Discovery Agent (e.g. Semantic Matchmaker), Preservation Service Registry, OWL-S Profiles
• Invocation component – Requester Agent: Service Discovery, Service Selection, Service Invocation
• Provider component – Preservation Service Provider Agent, Preservation Web Services (TIFF-to-JPEG2000, AIFF-to-MP3, Mac OS1 Emulator); Retrieve and Invoke Appropriate Service(s)
• Other labels – Internet, WSDL, SOAP, Apache AXIS, Sesame RDF Store, MySQL databases

Page 10

Notification component

[Diagram labels: Obsolescence Detector, Extract, Compare, Format Details, Software Dependencies, Return Incompatibilities]

Metadata Encoding and Transmission Standard (METS)
• Presentation Metadata
• Intention Metadata
• Descriptive Metadata
• File Groups
• Structural Map
• Administrative – Technical Metadata, Rights Metadata, Source Metadata, DigiProv Metadata
• Extensions

Format Registry – FormatName, FormatType, CurrentVersion, PreviousVersion, ReleaseDate
Software Registry – SoftwareName, SoftwareType, CurrentVersion, PreviousVersion, ReleaseDate, FormatSupported, Company, Platform
Recommendation Registry – Recommendation, FormatName, FormatVersion, Authority, URL, ReleaseDate

Obsolescence detector – periodically compares the preservation metadata for each object with the registries to determine when an object is at risk of obsolescence
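The comparison performed by the obsolescence detector can be sketched in a few lines of Python. This is a minimal illustration, assuming a registry reduced to a mapping from FormatName to CurrentVersion and a simple rule that any version other than the current one is flagged; the names `ObjectMetadata`, `FORMAT_REGISTRY` and `detect_at_risk` are hypothetical, and the registry contents are sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical slice of an object's preservation metadata."""
    object_id: str
    format_name: str
    format_version: str


# Hypothetical registry snapshot: FormatName -> CurrentVersion (sample values).
FORMAT_REGISTRY = {"TIFF": "6.0", "SMIL": "2.0"}


def detect_at_risk(objects, registry):
    """Return (object_id, reason) for every object that looks at risk."""
    at_risk = []
    for obj in objects:
        current = registry.get(obj.format_name)
        if current is None:
            # The registry has no entry, so the format cannot be monitored.
            at_risk.append((obj.object_id, "format not in registry"))
        elif current != obj.format_version:
            reason = f"{obj.format_name} {obj.format_version} superseded by {current}"
            at_risk.append((obj.object_id, reason))
    return at_risk


collection = [
    ObjectMetadata("img-1", "TIFF", "6.0"),
    ObjectMetadata("pres-1", "SMIL", "1.0"),
    ObjectMetadata("doc-1", "WordStar", "4"),
]
report = detect_at_risk(collection, FORMAT_REGISTRY)
```

A periodic job would call `detect_at_risk` on each run and pass the resulting report to the notification service.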

Page 11

Discovery component
• Discovery Agent - matches service request against OWL-S descriptions of Preservation Web services
• Returns a ranked list of Preservation Web services that match the request

[Diagram: Discovery component – Discovery Agent (e.g. Semantic Matchmaker), Preservation Service Registry, OWL-S Profiles, Sesame RDF Store]
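The idea of returning a ranked list of matching services can be illustrated as follows. This sketch replaces semantic matchmaking over OWL-S profiles with plain dictionary comparison, so it shows only the ranking step; the scoring weights, the `LEVELS` scale and the service entries are assumptions made for the example.

```python
# Hypothetical ordinal scale for the Speed and Reliability quality ratings.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}


def rank_services(request, services):
    """Return (score, name) pairs for matching services, best first."""
    ranked = []
    for svc in services:
        # Hard constraint: original and target formats must match the request.
        if (svc["original_format"], svc["target_format"]) != (
            request["original_format"],
            request["target_format"],
        ):
            continue
        score = LEVELS[svc["speed"]] + LEVELS[svc["reliability"]]
        if svc["lossiness"] == request.get("lossiness"):
            score += 2  # Assumed bonus for meeting the lossiness preference.
        ranked.append((score, svc["name"]))
    # Highest score first; ties are broken alphabetically by name.
    return sorted(ranked, key=lambda pair: (-pair[0], pair[1]))


services = [
    {"name": "tiff2jp2-a", "original_format": "TIFF", "target_format": "JPEG2000",
     "lossiness": "lossless", "speed": "High", "reliability": "Low"},
    {"name": "tiff2jp2-b", "original_format": "TIFF", "target_format": "JPEG2000",
     "lossiness": "lossy", "speed": "Medium", "reliability": "High"},
    {"name": "aiff2mp3", "original_format": "AIFF", "target_format": "MP3",
     "lossiness": "lossy", "speed": "High", "reliability": "High"},
]
request = {"original_format": "TIFF", "target_format": "JPEG2000", "lossiness": "lossless"}
ranked = rank_services(request, services)
```

A real matchmaker would reason over class hierarchies (for example, treating a request for any image format as satisfied by a TIFF service), which exact string comparison cannot do.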

Page 12

OWL-S Preservation Extensions

[Class diagram: Emulation and Migration are each subClassOf PreservationService]

PreservationService
• Service ExecutionStatus
• SystemRequirement – e.g. WindowsXP
• RemoteExecution
• Download
• Creator – e.g. JohnDoe
• ReleaseDate – e.g. 8-12-2003
• ServiceQuality
  – Speed – e.g. High
  – Reliability – e.g. Low

Emulation
• EmulatedObject – e.g. MAC OS
• EmulationType – e.g. OS
• SystemSetting – e.g. 256 bit palette

Migration
• OriginalObjectFormat – e.g. TIFF
• OriginalObjectVersion – e.g. 5.12
• TargetObjectFormat – e.g. JPEG2000
• TargetObjectVersion – e.g. 2.02
• Lossiness – e.g. lossless

Page 13

Provider component

Provider Agent either:
• retrieves and invokes preservation service locally; or
• invokes preservation service remotely

[Diagram: Provider component – Preservation Service Provider Agent, Preservation Web Services (TIFF-to-JPEG2000, AIFF-to-MP3, Mac OS1 Emulator)]
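The Provider Agent's choice between local and remote invocation might look like the sketch below. Everything here is a stand-in: `LOCAL_SERVICES`, `invoke_remote` and the `downloadable` flag are hypothetical names, the "conversion" only prefixes the bytes, and no SOAP call or network traffic takes place.

```python
from typing import Callable, Dict, Tuple

# Hypothetical services that have been retrieved and can run locally.
# A real converter would transform the bytes; this one only tags them.
LOCAL_SERVICES: Dict[str, Callable[[bytes], bytes]] = {
    "TIFF-to-JPEG2000": lambda data: b"JP2:" + data,
}


def invoke_remote(service_name: str, data: bytes) -> bytes:
    """Placeholder for a remote Web service call; nothing is sent anywhere."""
    return b"REMOTE:" + service_name.encode() + b":" + data


def provider_agent(service_name: str, data: bytes, downloadable: bool) -> Tuple[str, bytes]:
    """Run the service locally when it can be retrieved, otherwise remotely."""
    if downloadable and service_name in LOCAL_SERVICES:
        return "local", LOCAL_SERVICES[service_name](data)
    return "remote", invoke_remote(service_name, data)


local_result = provider_agent("TIFF-to-JPEG2000", b"pixels", downloadable=True)
remote_result = provider_agent("AIFF-to-MP3", b"samples", downloadable=False)
```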

Page 14

AONS

• Automated Obsolescence Notification Service
• APSR (Aust. Partnership for Sustainable Repositories) funded Project
• Collaboration between
  – University of Qld
  – ANU (Peter Raftos, Joseph Curtis)
  – NLA

Page 15

AONS Architecture

[Architecture diagram]
• Digital Collections – DSpace (ANU), Fedora (UQ), Pandora
• DROID – Summary of Collection Formats
• Registries – PRONOM, LCSDF, (GDFR)
• AONS Registry – Formats, Software, Versions
• Comparison -> Email Notification
• Registered Collections Manager, Owner or Consumers of data?
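The path from a summary of collection formats through comparison to an email notification can be sketched with the Python standard library. This is an illustration only: the function names, the registry contents, the identified formats and the recipient address are all assumptions, and the message is built but never sent.

```python
from collections import Counter
from email.message import EmailMessage


def summarise_formats(identified):
    """Count objects per (format name, version) pair, as a format summary."""
    return Counter(identified)


def build_notification(summary, registry, recipient):
    """Build an email listing formats whose version differs from the registry's."""
    lines = []
    for (name, version), count in sorted(summary.items()):
        current = registry.get(name)
        if current is not None and current != version:
            lines.append(f"{count} object(s) in {name} {version}; registry lists {current}")
    if not lines:
        return None  # Nothing to report, so no message is created.
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "AONS obsolescence notification"
    msg.set_content("\n".join(lines))
    return msg


# Hypothetical identification results and registry snapshot.
identified = [("TIFF", "6.0"), ("SMIL", "1.0"), ("SMIL", "1.0")]
registry = {"TIFF": "6.0", "SMIL": "2.0"}

summary = summarise_formats(identified)
message = build_notification(summary, registry, "manager@example.org")
```

Who should receive the message (collections manager, owner or consumers of the data) is left open on the slide, so the recipient is a plain parameter here.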

Page 16

Future of AONS

• Implement over Fez at UQ… subset of PANDORA
• Build a GUI
• Test, evaluate and refine
• Investigate release as open source middleware
• Integrate - GDFR, risk assessments/rankings
• Provide access to trusted services - quality ratings
• OWL-S versus WSMO
• Grid Services - Web Services Resource Framework (WSRF)
• Composite services, composite objects

Page 17

Example – Scientific Models

[Diagram: a data table, a publication and a formula, linked by Derived_from]

   Area  Mean   S.D.   X     Y     Mode  Length  Major  Minor  Angle  Int.Den  Back.  Min  Max
1  0.01  208.2  88.14  0.34  0.06  253   0.34    0.11   0.08   102.7  0        0      35   253
2  0.01  206.8  89.14  0.17  0.07  253   0.34    0.1    0.08   17.57  0        0      35   253
3  0.01  212.9  84.54  0.26  0.11  253   0.37    0.11   0.1    158    0        0      35   253
4  0     190.4  98.85  0.07  0.1   253   0.21    0.07   0.05   76.53  0        0      35   253
5  0.03  228.8  68.54  0.67  0.38  253   0.75    0.24   0.15   154.8  0        0      35   253
6  0.09  240.7  50.36  0.34  0.48  253   1.24    0.38   0.3    95.89  0        0      35   253
7  0.08  240.1  51.46  0.59  0.59  253   1.18    0.35   0.28   81.38  0        0      35   253

Slattery, O., Lu, R., Zheng, J., Byers, F., Tang, X. "Stability Comparison of Recordable Optical Discs - A study of error rates in harsh conditions," Journal of Research of the NIST, 109, 517-524, 2004

Average LE = 1/T exp –(A – B/T)

Derived_from

Each component has software, OS, hardware dependencies

Page 18

Preservation of Composite Objects

• Use XML to package metadata, component objects and relationships
  – METS, MPEG-21 DIDL, XFDU, IMS-CP
• Maintain preservation metadata for both
  – Composite object
  – Atomic components
• Maintain index of file formats
• Monitor atomic objects first
  – JPEG -> JPEG-2000
• Then check currency of composite objects
  – SMIL 1.0 -> SMIL 2.0
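The "atomic objects first, then the composite" order can be shown with a short sketch. The dictionary layout for a composite object, the `MIGRATIONS` table and the function name `check_composite` are hypothetical; only the two migrations (JPEG -> JPEG-2000, SMIL 1.0 -> SMIL 2.0) come from the slide.

```python
# Hypothetical lookup from a format at risk to its recommended successor.
MIGRATIONS = {"JPEG": "JPEG-2000", "SMIL 1.0": "SMIL 2.0"}


def check_composite(composite, migrations):
    """Check atomic components first, then the composite wrapper itself."""
    findings = []
    for component in composite["components"]:
        target = migrations.get(component["format"])
        if target:
            findings.append(("atomic", component["id"], component["format"], target))
    target = migrations.get(composite["format"])
    if target:
        findings.append(("composite", composite["id"], composite["format"], target))
    return findings


presentation = {
    "id": "talk-1",
    "format": "SMIL 1.0",
    "components": [
        {"id": "slide-1", "format": "JPEG"},
        {"id": "audio-1", "format": "MP3"},
    ],
}
findings = check_composite(presentation, MIGRATIONS)
```

Checking components before the wrapper matters because migrating a component (for example to a new file name or format) may change what the composite's own references have to point at.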

Page 19

Scientific Publishing

Increasing pressure to:
• publish raw and derivative data
• document precise provenance
• share data and analytical, modelling services
• enable duplication and validation
• protect IP

Page 20

eScience Workflow

BPEL4WS – workflow based on web services

[Workflow diagram spanning Organization A, Organization B and Organization C; time labels t1, t2, t3, t5, t6 and t8 (t8 appears three times); tool label: Kepler]
• Conduct Experiments
• Capture Experimental Results/Data
• Data Processing, Semantic Indexing
• Data Integration, Exploration
• Model Formulation
• Model Validation, Statistical Analysis
• Model Publication
• Initiate New Experiments

Page 21

Components
• Prior work - pre-existing data, models, publications;
• Experimental, observational data
  – numerical data, survey data, images, video, audio, maps, spectral data, real-time sensor data;
  – instrumental conditions, settings and parametric ranges
• Formulae, rules, hypotheses;
• Conceptual models - axioms, models and metaphors;
• Numerical models – mathematical functions;
• Software - source code, executables, applets, web services
  – Analysis, processing, transformation services
  – Computational models – simulation software
• Hardware – instruments and computers;
• Visualizations – 2D, 3D imagery, graphs, tables, charts, diagrams, animations;
• Textual - publications, reports, documentation, annotations, bibliographies, reviews

Page 22

Harmony ABC Model

[Class hierarchy diagram; classes shown: Entity, Temporality, Actuality, Abstraction, Time, Place, Event, State, Action, Artifact, Work, Agent, Manifestation, Item]

Page 23

Harmony ABC Model

[Example graph for the Work with title "Charlie and the Chocolate Factory"]
• Event1 – type "Authoring"; Act with agent "Roald Dahl", role "Author"
• Event2 – type "Publishing"; Act with agent "Knopf", role "Publisher"
• Manifestations – Man1 (format "Manuscript"), Man2 (format "hardcover")
• Other labels – State1, State2, creates (twice), involves, inState (twice), hasRealization (twice), date "1964", date "1985", illustrator, value "Shindleman"

Page 24

Extended ABC Ontology

[Class hierarchy diagram]
• ABC classes – Entity, Temporality, Actuality, Abstraction, Time, Place, Event, State, Action, Artifact, Work, Agent, Manifestation, Item
• Additional classes – Experiment, SimulationRun, Processing, Model, Design, Hypothesis, Theory, Data, Numerical, Textual, Image, Graphical, Audio, Video, Mapping, TheoreticalModel, ComputationalModel, GraphicalModel
• Properties shown – description, objectives, conditions, results, creator, discipline, input_parameters, output_parameters, scope

Page 25

Modelling eScience Provenance

[Diagram: two events linking State1, State2 and State3]
• Event E1 – Type: Experiment; Context: Date/Time, Place; Action: Agent, Role, Input, Output, Tool, Conditions
• Event E2 – Type: Processing; Context: Date/Time, Place; Action: Agent, Role, Input, Output, Tool (MatLab)
• Other labels – Experimental Design, Samples, Experimental Results, Model, Visualization, Objectives, Scope, Zeiss STEMI microscope

Agents can be people, instruments or software e.g., web services

Page 26

A Visualisation of an Electrolyte Manufacture experiment.

Page 27

The direct relationship between selected nodes is automatically inferred.
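The slide does not say how the relationship between two selected nodes is inferred. One plausible reading, sketched below purely as an assumption, is to search the provenance graph for a chain of events that connects them and present that chain as a single derived link. The edge list, the edge labels and the function name `infer_relationship` are hypothetical.

```python
from collections import deque

# Hypothetical provenance edges: (source, target, label).
EDGES = [
    ("Samples", "Experiment E1", "input_to"),
    ("Experiment E1", "Experimental Results", "creates"),
    ("Experimental Results", "Processing E2", "input_to"),
    ("Processing E2", "Model", "creates"),
]


def infer_relationship(edges, start, goal):
    """Breadth-first search for the shortest chain of edges from start to goal."""
    graph = {}
    for source, target, label in edges:
        graph.setdefault(source, []).append((target, label))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for target, label in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, label, target)]))
    return None  # No chain of events connects the two nodes.


chain = infer_relationship(EDGES, "Samples", "Model")
```

If a chain exists, a viewer could collapse it into one "derived from" edge between the two selected nodes, which is one way to obtain a coarse-grained view from fine-grained provenance.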

Page 28

Coarse-grained view of provenance

Page 29

Example of a Scientific Model Package

Each component has software, OS, hardware dependencies + interdependencies

[Diagram: RDF Package whose components are linked by the relationships derived_from, refers_to (twice), image_of, analysis_of and graph_of; one component is an ePrints database]

   Area  Mean   S.D.   X     Y     Mode  Length  Major  Minor  Angle  Int.Den  Back.  Min  Max
1  0.01  208.2  88.14  0.34  0.06  253   0.34    0.11   0.08   102.7  0        0      35   253
2  0.01  206.8  89.14  0.17  0.07  253   0.34    0.1    0.08   17.57  0        0      35   253
3  0.01  212.9  84.54  0.26  0.11  253   0.37    0.11   0.1    158    0        0      35   253
4  0     190.4  98.85  0.07  0.1   253   0.21    0.07   0.05   76.53  0        0      35   253
5  0.03  228.8  68.54  0.67  0.38  253   0.75    0.24   0.15   154.8  0        0      35   253
6  0.09  240.7  50.36  0.34  0.48  253   1.24    0.38   0.3    95.89  0        0      35   253
7  0.08  240.1  51.46  0.59  0.59  253   1.18    0.35   0.28   81.38  0        0      35   253

Drennan, J., Knibbe R., Auchterlonie, G., “Effect of porosity and firing Temperature on fuel cell efficiency”, Journal of Solid State Ionics, Vol 8, No 2, 517-524, 2004

Average LE = 1/T exp –(A – B/T)

Package metadata: Title, Creator, Description, Type, Discipline, Date.Published, License
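A package of this kind can be pictured as a set of subject, predicate, object triples. In the sketch below only the predicate names (derived_from, refers_to, image_of, analysis_of, graph_of) come from the slide; which component each relationship connects is not recoverable from the transcript, so the pairings, the component names and the helper `related` are hypothetical.

```python
# Hypothetical triples; predicate names are from the slide, pairings are assumed.
PACKAGE = [
    ("formula", "derived_from", "data_table"),
    ("publication", "refers_to", "formula"),
    ("publication", "refers_to", "eprints_record"),
    ("micrograph", "image_of", "sample"),
    ("data_table", "analysis_of", "micrograph"),
    ("chart", "graph_of", "data_table"),
]


def related(triples, subject=None, predicate=None, obj=None):
    """Return triples matching every field that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything the publication refers to, and everything built on the data table.
cited = related(PACKAGE, subject="publication", predicate="refers_to")
built_on_table = related(PACKAGE, obj="data_table")
```

Queries like `built_on_table` are what make the interdependencies useful for preservation: if the data table's format becomes obsolete, the formula and chart that depend on it can be found immediately.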

Page 30

Required Tools

• Scientific Model/Publication construction tools
  – Drag and drop tools – hyperlinks and bitstreams
  – Metadata generation/capture tools
  – Science Commons license attachment
  – Database ingestion – to institutional repository
  – RDF Datastore
    • Kowari + links to SRB, DSpace, Fedora
• Search, Browse and Retrieval
  – RDFQL
  – Jgraph, Haystack – Relationship graphs

[Diagram: Kowari linked to SRB, DSpace and Fedora]

Page 31

DART

Dataset Acquisition, Accessibility & Annotation eResearch Technologies

• $3.23M – DEST ARIIC funding
• 3 partners (Monash (PI), UQ, JCU)
• 15 months -> Dec 2006
• 28 Separate work packages
  – Data Collection, Monitoring and Quality Assurance (DMQ)
  – Storage and Interoperability (SI)
  – Content and Rights (CR)
  – Annotation and Assessment (AA)
  – Discovery and Access (DA)

Page 32

Page 33

UQ Workpackages
• DMQ4 – Online remote access to instruments/sensors
• AA1 – Annotations of scientific data
• AA2 – Secure annotation server
• AA3 – Collaborative annotations
• SI1 – Integration of Fedora and SRB
• SI3 – Semantic search interface on SRB
• CR2 – Creative Commons – licensing and enhanced search engine
• CR3 – Science Commons tools (SHERPA/ROMEO)
• DA3 – Metadata Schema Registry

Page 34

Page 35

Page 36

Vannotea – Collaborative Annotation and Discussion of Medical Images/Videos

Page 37

Page 38

Future Research

• Interoperability of Preservation Metadata – across heterogeneous archiving systems
• Collaborative Preservation – Decision support
  – Jabber instant messaging, chat, Skype
  – Producer, Consumers, Repository Manager
• Trusted repositories -> trusted data/files/annotations
  – Social networks, FOAF with ratings
• Preservation metrics

Page 39

Preservation Metadata Interoperability

[Diagram labels]
• PREMIS Ontology (OWL)
• METS Profile 1, METS Profile 2, MODS, MPEG-21 DIDL
• Network of heterogeneous digital archiving systems
• Integrated Preservation Monitoring & Management System

Page 40

Trusted Repositories

Objects
– Data/files - genres
– Services/software
– Methodologies
– Annotations/reviews

Agents
– Organizations
– Research Groups
– Individuals

Page 41

Preservation Selection Metrics
• What is “high” quality data? - what are the significant attributes
  – Accessibility
  – Accuracy, Completeness, Consistency
  – Reliability, Trustworthiness – reputation of source
  – Authenticity, certified – not tampered with
  – Provenance, metadata
  – Repeatable, reproducible, validation
  – Value-add – annotations, metadata
  – Re-use - citations
  – Uniqueness
  – Objectivity – unbiased
  – Relevance
  – Concise representation
  – Standards compliance
  – Currency, timeliness
  – Positive peer reviews, citations
• How to measure/assess quality
  – Which attributes are measurable? How to measure them?

Page 42

References

http://www.itee.uq.edu.au/~eResearch
contact: [email protected]