D.3.1: State of the Art - Linked Data and Digital Preservation
![Page 1: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/1.jpg)
State of the Art
SUMMARY OF D3.1 STATE OF THE ART
D GIARETTA
![Page 2: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/2.jpg)
Outline
• Preservation – State of the Art
• Challenges for Linked Data
• Options
• Conclusions
![Page 3: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/3.jpg)
EC policy – a brief history – a personal view
EC support for DP research for creating digital objects
Data Digitisation
e-Infrastructure to Digital Agenda
National funding
◦ Significantly more than EC funding
◦ What is the EC role?
![Page 4: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/4.jpg)
DP research: approx 100M€ from EC
From “Research on Digital Preservation within projects co-funded by the European Union in the ICT programme”, 2011, Stephan Strodl et al., http://cordis.europa.eu/fp7/ict/creativity/report-research-digital-preservation_en.pdf
![Page 5: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/5.jpg)
Situation now
The digital preservation community has failed to persuade the EC that there is a need for more funding for DP research
◦ We do not have a consistent story about:
  ◦ Costs
  ◦ Rights
  ◦ Methods etc.
  ◦ “Emulate or Migrate” inadequate!
  ◦ Who is doing it right
Luxembourg unit which previously funded DP research – name changed to “Creativity” - now shows no funding for digital preservation research
EC expects the previous 100 M€ of research to deliver results through deployed solutions
![Page 6: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/6.jpg)
Digital Preservation – some quotes:
Head of unit funding the Digital Preservation projects asked repeatedly:
◦ “Who pays and why?”
NSF colleague:
◦ “Digital preservation is like VAT – people don’t like it”
![Page 7: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/7.jpg)
Value pyramid
From Riding the Wave
![Page 8: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/8.jpg)
“The Digital Agenda for Europe outlines policies and actions to maximise the benefit of the digital revolution for all. Supporting research and innovation is a key priority of the Agenda, essential if we want to establish a flourishing digital economy.”
Neelie Kroes,
Vice-President of the EC, responsible for the Digital Agenda
Data is the new gold.
“We have a huge goldmine… Let’s start mining it.”
Neelie Kroes
That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
![Page 9: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/9.jpg)
……but
Gold is precious because
◦ it is rare
◦ it does not combine with other elements
◦ it does not perish
……..but……….
Data is valuable because
◦ there is so much of it
◦ it is more valuable when it is combined together
◦ BUT it is far from imperishable
Role for Linked Data
![Page 10: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/10.jpg)
OR
![Page 11: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/11.jpg)
Preservation – State of the Art
![Page 12: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/12.jpg)
Problems when preserving data
Preserve?
Preserve what?
For how long?
How to test?
Which people?
Which organisations?
How well?
• Metadata? – What kind? How much?
![Page 13: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/13.jpg)
Difficulties in digital preservation
Many different terminologies
Many different views of preservation
Many different kinds of digital objects
◦ Documents
◦ Data
◦ …… and new types of objects
Tools and Services
◦ Which ones work for which digital objects?
◦ Which tools/techniques fit together?
◦ How to integrate new tools
Consistent training needed
Risks vs Cost
Who can you trust?
Need a consistent, coherent approach to digital preservation: APARSEN.
Need an Audit and Certification system – ISO 16363
OAIS – ISO 14721
![Page 14: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/14.jpg)
Preservation techniques
For each technique
look for evidence – what evidence?
must at least make sure we consider different types of data
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive
must look at all types of threats
![Page 15: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/15.jpg)
Basic preservation activities
Libraries say:
“Emulate or migrate”
◦ Works well with data only in special cases
◦ Can repeat what was done before instead of new things
◦ Does not help with building cross-disciplinary communities
• Can repeat what has been done before
BUT
• Cannot use new applications
• Convert to format which new software can use
BUT
• What if there are many software systems?
![Page 16: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/16.jpg)
Contains numbers – need meaning
![Page 17: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/17.jpg)
...to be combined and processed to get this
[Figure labels: Level 0, Level 1, Level 2; Processing; Processing/combining]
![Page 18: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/18.jpg)
...or this
![Page 19: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/19.jpg)
OAIS Information model: Representation Information
The Information Model is key
Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)
Does not demand that ALL Representation Information be collected at once.
A process which can be tested
![Page 20: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/20.jpg)
Rep Info Network
[Figure nodes: FITS FILE, FITS STANDARD, FITS DICTIONARY, FITS JAVA SOFTWARE, JAVA VM, PDF STANDARD, PDF SOFTWARE, DICTIONARY SPECIFICATION, XML SPECIFICATION, UNICODE SPECIFICATION]
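The Rep Info Network can be read as a dependency graph that is walked until the Designated Community's knowledge base is reached. The sketch below is only an illustration of that recursion: the node names come from the slide, but the edges between them and the function name `rep_info_needed` are assumptions, since the arrows of the original figure are not recoverable from the transcript.

```python
from typing import Dict, List, Set

# Node names are taken from the slide; the dependency edges are ASSUMED
# for illustration, because the figure's arrows are not in the transcript.
REP_INFO_NETWORK: Dict[str, List[str]] = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY", "FITS JAVA SOFTWARE"],
    "FITS STANDARD": ["PDF STANDARD", "PDF SOFTWARE"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
    "FITS JAVA SOFTWARE": ["JAVA VM"],
}


def rep_info_needed(
    obj: str,
    network: Dict[str, List[str]],
    knowledge_base: Set[str],
) -> List[str]:
    """List the Representation Information a repository must hold for obj.

    The recursion stops at anything already in the Designated Community's
    knowledge base, so a better-informed community needs less RepInfo.
    """
    needed: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        for dep in network.get(node, []):
            if dep in knowledge_base or dep in seen:
                continue  # already understood, or already collected
            seen.add(dep)
            needed.append(dep)
            visit(dep)  # RepInfo may itself need RepInfo

    visit(obj)
    return needed
```

Calling `rep_info_needed("FITS FILE", REP_INFO_NETWORK, set())` walks the whole assumed network, while passing a non-empty knowledge base prunes every branch the community already understands. Re-running the function after the knowledge base changes shows which Representation Information has newly become necessary.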
![Page 21: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/21.jpg)
Additional technique: add Representation Information
Descriptions of the digitally encoded object
Ideal description allows a machine to extract information
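As a rough illustration of a description that a machine can use to extract information, the sketch below drives a binary decoder from a declarative field table. The field names, types and units are invented for this example and do not describe any real format; only Python's standard `struct` module is used.

```python
import struct
from typing import Dict, List, Tuple

# HYPOTHETICAL structure description: (field name, struct type code, unit).
# None of these fields comes from the source; they only show the idea.
DESCRIPTION: List[Tuple[str, str, str]] = [
    ("station_id", "H", "dimensionless"),  # unsigned 16-bit integer
    ("temperature", "f", "kelvin"),        # 32-bit float
    ("pressure", "f", "pascal"),           # 32-bit float
]


def extract(
    record: bytes,
    description: List[Tuple[str, str, str]],
    byte_order: str = ">",
) -> Dict[str, Tuple[object, str]]:
    """Decode one binary record using only the supplied description."""
    fmt = byte_order + "".join(code for _, code, _ in description)
    values = struct.unpack(fmt, record)
    # Pair each decoded number with its name and unit, i.e. its meaning.
    return {
        name: (value, unit)
        for (name, _, unit), value in zip(description, values)
    }
```

The decoder contains no knowledge of the format itself; replacing `DESCRIPTION` is enough to read a different layout, which is the property the slide asks of an ideal description.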
![Page 22: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/22.jpg)
Migration
OAIS defines various types of Migration:
◦ Do not change the bits
  ◦ Refresh
  ◦ Replicate
◦ Change the packaging but not the content
  ◦ Repackage
◦ Change the content
  ◦ Transform (usually non-reversible)
  ◦ Need to consider “Transformational Information Properties” – important for AUTHENTICITY
  ◦ Related to “Significant properties”
  ◦ Add appropriate Representation Information for the new format
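The three groups of Migration listed above can be told apart by checking which bits changed. The following is a minimal sketch under that reading; the `StoredPackage` type and the `classify_migration` function are hypothetical names introduced here, not part of OAIS or of any toolkit.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredPackage:
    """Hypothetical two-part view of a stored package."""
    packaging: bytes  # wrapper bits, e.g. a container header
    content: bytes    # the content bits being preserved


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def classify_migration(before: StoredPackage, after: StoredPackage) -> str:
    """Name the kind of Migration that turns `before` into `after`."""
    if _digest(before.content) != _digest(after.content):
        # Content bits changed: authenticity evidence and new
        # Representation Information are needed.
        return "Transform"
    if _digest(before.packaging) != _digest(after.packaging):
        return "Repackage"
    # No bits changed: only the medium or the copy differs.
    return "Refresh or Replicate"
```

Only the "Transform" outcome calls for checking Transformational Information Properties and adding Representation Information for the new format; the other two outcomes leave the content bits untouched.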
![Page 23: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/23.jpg)
AND – be prepared to hand over
Preservation requires funding
Funding for a dataset (or a repository) may stop
Need to be ready to hand over everything needed for preservation
◦ OAIS (ISO 14721) defines “Archival Information Package” (AIP).
◦ Issues:
  ◦ Storage naming conventions
  ◦ Representation Information
  ◦ Provenance
  ◦ ….
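Being ready to hand over can be approximated as a completeness check on the package. The sketch below uses the component names of the OAIS Archival Information Package as dictionary keys; representing an AIP as a flat dictionary, and the function name `missing_for_handover`, are simplifying assumptions made for this example. Storage naming conventions, raised as an issue above, are not modelled.

```python
from typing import Dict, List

# Component names follow OAIS vocabulary; the flat-dictionary layout
# is an ASSUMPTION of this sketch, not something OAIS prescribes.
REQUIRED_COMPONENTS: List[str] = [
    "content_data_object",
    "representation_information",
    "reference",
    "provenance",
    "context",
    "fixity",
    "access_rights",
    "packaging_information",
]


def missing_for_handover(package: Dict[str, object]) -> List[str]:
    """Return the components that are absent or empty in the package."""
    return [name for name in REQUIRED_COMPONENTS if not package.get(name)]
```

An empty result means every component is at least present; it says nothing about whether the Representation Information is adequate for the receiving repository's Designated Community.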
![Page 24: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/24.jpg)
Preserving digitally encoded information
Ensure that digitally encoded information is understandable and usable over the long term
Long term could start at just a few years
Chain of preservation
Need to do something because things become “unfamiliar” over time
But the same techniques enable use of data which is “unfamiliar” right now
![Page 25: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/25.jpg)
When things change
We need to:
◦Know something has changed
◦ Identify the implications of that change
◦Decide on the best course of action for preservation
◦What RepInfo we need to fill the gaps
◦ Created by someone else or creating a new one
◦ If transformed: how to maintain data authenticity
◦Alternatively: hand it over to another repository
◦Make sure data continues to be usable
Orchestration Service
Gap Identification Service
Preservation Strategy Tk
RepInfo Registry Service
Authenticity Toolkit
Packaging Tk
Data Virtualisation Toolkit
Process Virtualisation Toolkit
RepInfo Toolkit
![Page 26: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/26.jpg)
SCIDIP-ES
Storage Service
Gap Identification Service
Orchestration Service
RepInfo Registry Service
Preservation Strategy Toolkit
Data Virtualisation Toolkit
Process Virtualisation Toolkit
Authenticity Toolkit
Packaging Toolkit
RepInfo Toolkit
Finding Aid Toolkit
Cloud Storage
External Access/Use Services
Persistent ID i/f Service
External PI services
ISO Certification Organisation
Certification Toolkit
Services: run on remote servers
Toolkits: run on local machines
• These SUPPLEMENT what repositories do (customised for repositories)
• Make it easier for repositories to do preservation – share the effort
![Page 27: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/27.jpg)
![Page 28: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/28.jpg)
Preservation objectives
The same digital object may be preserved with different aims in mind by different repositories:
For a digital document
◦ Re-print the pages?
◦ Understand the numbers printed on the page to do further research?
For a piece of performance art
◦ Replay a recording of a particular performance?
◦ Re-perform the work?
For a scientific data file
◦ Understand the numbers?
◦ Understand the numbers in the context of a particular theory?
![Page 29: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/29.jpg)
Preservation, Value and Re-use
(Re-)usability is the essential test for success of preservation
◦ Usability is usually essential for justifying the cost of preservation
Impossible to insist on common formats, semantics or software
◦ How to avoid the N² problem?
Impossible to know what formats, semantics or software will be used in future
Needs appropriate Representation Information
◦ for preservation (use in the future when things have become unfamiliar)
◦ for use now (use of unfamiliar data, i.e. most of it!)
◦ automated (re-)use as far as possible
APARSEN is bringing together a coherent, consistent, evidence-based approach to digital preservation involving tools, services, consultancy and training.
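The N² problem raised on this slide can be made concrete with a small count. The slide does not spell out the arithmetic, so the reading below is an assumption: with N formats, direct pairwise conversion needs a converter for every ordered pair, whereas describing each format once with Representation Information needs only N descriptions. Both function names are hypothetical.

```python
def pairwise_converters(n: int) -> int:
    """Converters needed if every format is translated directly
    into every other format (ordered pairs, so roughly n squared)."""
    return n * (n - 1)


def format_descriptions(n: int) -> int:
    """Descriptions needed if each format is described once and
    software reads the data through that description."""
    return n
```

For ten formats this gives 90 direct converters against 10 descriptions, and the gap widens quadratically as new formats appear.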
![Page 30: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/30.jpg)
Classification of objects
must at least make sure we consider different types of data
◦ rendered vs non-rendered
◦ composite vs simple
◦ dynamic vs static
◦ active vs passive
RDF Triple: dynamic/complex/non-rendered/passive
![Page 31: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/31.jpg)
Key questions about what is to be preserved
What is the object to be preserved?
◦ The specific piece of RDF?
◦ The specific RDF plus data pointed to?
◦ The underlying database (if any)?
◦ The whole linked “world”?
What are the preservation objectives?
◦ The RDF and whole inference system?
◦ Just the RDF?
◦ Just the underlying database (if any)?
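The difference between preserving the specific piece of RDF and preserving the RDF plus the data pointed to can be sketched by listing which referenced resources live outside the repository's own host. This is an illustration only: triples are modelled as plain string tuples rather than with an RDF library, any non-URL string is treated as a literal, and the function name `pointed_to` and all example hosts are invented.

```python
from typing import Iterable, Set, Tuple
from urllib.parse import urlparse

Triple = Tuple[str, str, str]  # (subject, predicate, object) as strings


def pointed_to(triples: Iterable[Triple], local_host: str) -> Set[str]:
    """Return subject/object URIs that are hosted outside local_host.

    Predicates are skipped here; the vocabularies they come from are
    a separate question (the schema as Representation Information).
    """
    external: Set[str] = set()
    for subject, _predicate, obj in triples:
        for term in (subject, obj):
            parsed = urlparse(term)
            if parsed.scheme in ("http", "https") and parsed.hostname != local_host:
                external.add(term)
    return external
```

An empty result means the RDF is self-contained with respect to its data; every URI in a non-empty result is something whose preservation depends on another party.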
![Page 32: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/32.jpg)
Key questions about RDF
What Representation Information is needed for the LD?
◦ Schema?
◦ Additional semantics?
◦ Evolution of links (e.g. replace this host by a new one)?
◦ Snapshots?
What Transformation?
◦ One version of RDF to another?
◦ Move to replacement for RDF?
◦ Change of underlying database?
◦ Authenticity??
Who to hand over to?
◦ What to do with the URIs? – maintain or change?
◦ What to do with the underlying database (if any)?
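One way to picture the evolution of links after a hand-over is a host rewrite that also records every change, so the mapping can be kept as evidence for authenticity. The sketch below is a hedged illustration, not a recommended procedure: triples are plain string tuples, the function names are hypothetical, and the example hosts are placeholders.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit, urlunsplit

Triple = Tuple[str, str, str]  # (subject, predicate, object) as strings


def replace_host(term: str, old_host: str, new_host: str) -> str:
    """Rewrite term if it is an http(s) URI on old_host; else return it as is."""
    parts = urlsplit(term)
    if parts.scheme in ("http", "https") and parts.hostname == old_host:
        # Keep an explicit port if one was given.
        netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
        return urlunsplit(
            (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
        )
    return term


def migrate_links(
    triples: List[Triple], old_host: str, new_host: str
) -> Tuple[List[Triple], Dict[str, str]]:
    """Return the rewritten triples plus an old-to-new URI mapping."""
    rewritten: List[Triple] = []
    changes: Dict[str, str] = {}
    for triple in triples:
        new_triple = tuple(replace_host(t, old_host, new_host) for t in triple)
        for old, new in zip(triple, new_triple):
            if old != new:
                changes[old] = new  # record the change for later audit
        rewritten.append(new_triple)
    return rewritten, changes
```

Rewriting changes the content of the RDF, so in the Migration vocabulary used earlier it is a Transformation; the returned mapping is the kind of record that lets a later user check what was altered.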
![Page 33: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/33.jpg)
Key questions about the things the RDF points to
◦ Will they be preserved?
◦ How to find the Representation Information?
◦ Will the Persistent Identifiers change?
![Page 34: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/34.jpg)
Joint Key Questions
Who will pay, and why?
For which things?
Are some things more valuable – and therefore more likely to be preserved?
What happens when some things disappear?
![Page 35: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/35.jpg)
Options
◦ Be clear about what is meant
◦ Understand what is possible
◦ Start with what is agreed as valuable
◦ Don’t promise too much
![Page 36: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/36.jpg)
Input to standards
See http://www.iso16363.org
Audit and Certification of Trustworthy repositories
Forum: OAIS Futures
![Page 37: D.3.1: State of the Art - Linked Data and Digital Preservation](https://reader033.fdocuments.us/reader033/viewer/2022052907/5593ed3e1a28ab583b8b45a0/html5/thumbnails/37.jpg)
Conclusions
A great deal of funding (€100M) has been invested in digital preservation research by the EU
EC is not putting further funding into digital preservation research
There are technical challenges
The biggest challenge is to be clear about what the preservation aims are for Linked Data