Robert Sharpe, Tessella. PRELIDA Workshop 2013. ENSURE Linked Data Registry.
Agenda
• Archives, libraries and representation information
• Previous “technical registries”:
  – Potted history
  – Issues
• ENSURE linked data technical registry:
  – What’s different?
  – Why we hope it should succeed?
• Conclusions and feedback…
Archives, libraries & representation information
• Hold descriptive / cataloguing information for centuries:
  – Helps determine context and makes things unambiguous
  – E.g., census records:
    • Frequency, type of information
    • Professions
    • Parish boundaries
  – Includes references to other sources / archives
• A “representation information network” of “linked data”
• With advent of digital material:
  – Need information on formats, rendering software etc.
  – Look to add “Technical Registry”
Technical Registries: Potted History 1/2
• PRONOM:
  – Started in 2001
  – On-line from 2005
  – “File format registry”
  – In fact, holds more…
• Planets Core Registry (2008):
  – Holds even more entities
• Both:
  – Database-based
  – Web-based GUI
• Issues:
  – Partially populated
  – Hard to add new entities
  – Hard to synchronise
Technical Registries: Potted History 2/2
• Move to linked data:
  – Linked Data PRONOM
  – UDFR
  – …
• Issues:
  – Partially populated
  – Hard to add new entities
  – Partial projects: enough to be used?
  – Hard for people to query: SPARQL but not via simple GUI
  – Complex provenance
What’s different?
• ENSURE Linked Data Technical Registry:
  – Fewer entities, more population:
    • Expand later
  – Start with the synchronisation issue
  – Good querying and user interface:
    • Human Search / Browse
    • Human View / Edit
  – Simple view of provenance
  – Long term commitment:
    • Will integrate with SDB/Preservica
    • 20+ organisations will use it
Data Model
• Keep it simple:
  – Things actually used
  – Things actually populated
  – Add more if and when needed
• Format:
  – ID, Name, Version, Description
  – Release Date, Withdrawn Date
  – Internal Signature, External Signature
  – Relationships
• Not:
  – Assessments, Risk scores
  – Documents, Reference files, Agents
  – Intellectual Property
  – Technical Environments
  – XCDL, XCEL
  – Types, Faceting
  – Complex provenance
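As an illustration of how small the "Format" entity is, the listed fields could be sketched as plain Python dataclasses. This is only a sketch under assumptions: the class and field names follow the slide, but the types, the fixed-offset signature matching, the `identify` helper and the `example/pdf` identifier are all hypothetical and not taken from the ENSURE registry itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InternalSignature:
    """Byte sequence expected at a fixed offset from the start of a file."""
    byte_sequence: bytes
    offset: int = 0


@dataclass
class FormatRelationship:
    """Typed link to another format, e.g. 'is previous version of'."""
    relationship_type: str
    target_format_id: str


@dataclass
class Format:
    id: str
    name: str
    version: str
    description: str = ""
    release_date: Optional[date] = None
    withdrawn_date: Optional[date] = None
    internal_signatures: List[InternalSignature] = field(default_factory=list)
    # External signatures are modelled here as lower-case file extensions.
    external_signatures: List[str] = field(default_factory=list)
    relationships: List[FormatRelationship] = field(default_factory=list)


def identify(filename: str, content: bytes, formats: List[Format]) -> List[Format]:
    """Return formats matching by internal signature, else by file extension."""
    by_bytes = [
        f for f in formats
        if any(
            content[s.offset:s.offset + len(s.byte_sequence)] == s.byte_sequence
            for s in f.internal_signatures
        )
    ]
    if by_bytes:
        return by_bytes
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return [f for f in formats if ext in f.external_signatures]


# Hypothetical record; the identifier scheme is made up for illustration.
pdf = Format(
    id="example/pdf",
    name="Portable Document Format",
    version="1.4",
    internal_signatures=[InternalSignature(b"%PDF-1.4")],
    external_signatures=["pdf"],
)
```

Calling `identify("report.pdf", data, [pdf])` first tries the byte signature and only falls back to the extension when no internal signature matches, which is why both signature kinds sit on the same entity.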
Islands of Information / Synchronise
[Class diagram: pkg Class Mo...]
Format
+ Format
+ FormatExternalSignature
+ FormatInternalSignature
+ FormatRelationship
+ FormatRelationshipType
+ InternalSignature
+ InternalSignatureByteSequence
ComponentProperty
+ ComponentProperty
+ ComponentType
+ ComponentTypeProperty
+ SingleFileComponentType
Cost
+ FormatToolStatistics
+ ProcessingCost
+ ServerType
+ StorageCost
+ ToolStatistics
FileInstanceProperty
+ FileInstanceProperty
+ FormatInstanceProperty
MigrationPathway
+ ManifestationType
+ MigrationPathway
+ MigrationPathwayStep
+ MigrationPathwayType
Policy
+ CharacterisationToolApplicabilityPolicy
+ CharcterisationToolApplicabilityParameter
+ CollectionType
+ DeliverableUnitType
+ MigrationPathwayPolicy
+ MigrationPathwayStepToolParameter
+ MigrationPathwayStepValidation
+ Policy
+ StoragePolicy
Software
+ CharacterisationToolApplicability
+ CharacterisationToolPurposeType
+ Software
+ Tool
+ ToolParameter
StorageMedia
+ StorageSystem
Maintained by UK National Archives
Maintained by Tessella to describe capabilities of the software
Maintained by host organisation to maintain local configuration
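One way to read the "islands" idea is that each package has a single maintainer, and an incoming update may only replace the packages its sender maintains, so local configuration is never overwritten. The sketch below assumes exactly that. The `OWNER` table is an illustrative guess: the transcript does not say which packages belong to which maintainer, and `apply_update` is a hypothetical helper.

```python
from typing import Dict

Registry = Dict[str, Dict[str, str]]  # package name -> {entity id: value}

# ASSUMPTION: this package-to-maintainer assignment is illustrative only;
# the slide names the three maintainers but the mapping is not in the text.
OWNER = {
    "Format": "UK National Archives",
    "Software": "Tessella",
    "Policy": "host organisation",
}


def apply_update(registry: Registry, update: Registry, source: str) -> Registry:
    """Return a new registry in which only packages owned by `source` are replaced."""
    merged = dict(registry)
    for package, entities in update.items():
        if OWNER.get(package) == source:
            merged[package] = dict(entities)
    return merged
```

An update from the format maintainer that also happens to carry policy data would change only the format package; the host organisation's policy package stays as it was.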
Allow view / edit
• Needs to be simple and user friendly
• Not clear whether it can then expand with the model without extra effort?
Provenance
• Blocks of information:
  – Format, Software, Property, Pathway
• Who made change to format, when and based on what info?
• Need provenance of block not each item
  – Store every change:
    • Rollback
    • Diff
• In fact makes synchronise easy:
  – Receive update and detect change
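The block-level provenance described above can be sketched as an append-only history per block, where every stored revision carries who, when and based-on-what. This is a minimal sketch, not the registry's actual design: the names `Revision`, `BlockHistory`, `fingerprint` and `receive_update` are hypothetical, and comparing SHA-256 hashes of the serialised block is just one assumed way to "detect change".

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    content: Dict[str, str]  # the whole block as it stood after the change
    who: str
    when: str                # ISO 8601 timestamp
    based_on: str            # the information the change was based on


class BlockHistory:
    """Append-only history for one block (e.g. one Format)."""

    def __init__(self) -> None:
        self.revisions: List[Revision] = []

    def current(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def record(self, content: Dict[str, str], who: str, when: str, based_on: str) -> None:
        self.revisions.append(Revision(dict(content), who, when, based_on))

    def diff(self, old: int, new: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Map each changed key to its (old, new) values."""
        a, b = self.revisions[old].content, self.revisions[new].content
        return {
            key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)
        }

    def rollback(self, index: int, who: str, when: str) -> None:
        """Restore an earlier revision by recording it as a new change."""
        self.record(self.revisions[index].content, who, when,
                    f"rollback to revision {index}")


def fingerprint(content: Dict[str, str]) -> str:
    """Stable hash of a block, independent of key order."""
    return hashlib.sha256(json.dumps(content, sort_keys=True).encode("utf-8")).hexdigest()


def receive_update(history: BlockHistory, content: Dict[str, str],
                   who: str, when: str, based_on: str) -> bool:
    """Record an incoming block only if it differs from the current revision."""
    cur = history.current()
    if cur is not None and fingerprint(cur.content) == fingerprint(content):
        return False
    history.record(content, who, when, based_on)
    return True
```

Because rollback is stored as just another revision, the history stays complete, and synchronisation reduces to calling `receive_update` for each block received and noting which calls report a change.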