Robert Sharpe, Tessella PRELIDA Workshop 2013 ENSURE Linked Data Registry.

Robert Sharpe, Tessella
PRELIDA Workshop 2013

ENSURE Linked Data Registry

Agenda

• Archives, libraries and representation information
• Previous “technical registries”:
  – Potted History
  – Issues
• ENSURE linked data technical registry:
  – What’s different?
  – Why we hope it will succeed
• Conclusions and feedback…

Archives, libraries & representation information

• Hold descriptive / cataloguing information for centuries:
  – Helps determine context and makes things unambiguous
  – E.g., census records:
    • Frequency, type of information
    • Professions
    • Parish boundaries
  – Includes references to other sources / archives
• A “representation information network” of “linked data”
• With advent of digital material:
  – Need information on formats, rendering software etc.
  – Look to add “Technical Registry”

Technical Registries: Potted History 1/2

• PRONOM:
  – Started in 2001
  – On-line from 2005
  – “File format registry”
  – In fact, holds more…
• Planets Core Registry (2008):
  – Holds even more entities
• Both:
  – Database-based
  – Web-based GUI
• Issues:
  – Partially populated
  – Hard to add new entities
  – Hard to synchronise

Technical Registries: Potted History 2/2

• Move to linked data:
  – Linked Data PRONOM
  – UDFR
  – …
• Issues:
  – Partially populated
  – Hard to add new entities
  – Partial projects: enough to be used?
  – Hard for people to query: SPARQL but not via simple GUI
  – Complex provenance

What’s different?

• ENSURE Linked Data Technical Registry:
  – Fewer entities, more population:
    • Expand later
  – Start with the synchronisation issue
  – Good querying and user interface:
    • Human Search / Browse
    • Human View / Edit
  – Simple view of provenance
  – Long term commitment:
    • Will integrate with SDB/Preservica
    • 20+ organisations will use it

Data Model

• Keep it simple:
  – Things actually used
  – Things actually populated
  – Add more if and when needed
• Format:
  – ID, Name, Version, Description
  – Release Date, Withdrawn Date
  – Internal Signature, External Signature
  – Relationships
• Not:
  – Assessments, Risk scores
  – Documents, Reference files, Agents
  – Intellectual Property
  – Technical Environments
  – XCDL, XCEL
  – Types, Faceting
  – Complex provenance
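The Format fields listed above can be sketched as a small record type. This is only an illustration and not the registry's actual schema: the field types, the `identify` helper and the example identifier are assumptions, and an internal signature is simplified here to a byte sequence at a fixed offset from the start of the file, with an external signature reduced to a file extension.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class InternalSignature:
    """Simplified: a byte pattern expected at a fixed offset from file start."""
    offset: int
    byte_sequence: bytes

    def matches(self, data: bytes) -> bool:
        end = self.offset + len(self.byte_sequence)
        return data[self.offset:end] == self.byte_sequence


@dataclass
class FormatRelationship:
    relationship_type: str  # free-text label (hypothetical)
    target_format_id: str


@dataclass
class Format:
    id: str
    name: str
    version: str
    description: str = ""
    release_date: Optional[date] = None
    withdrawn_date: Optional[date] = None
    internal_signatures: List[InternalSignature] = field(default_factory=list)
    external_signatures: List[str] = field(default_factory=list)  # extensions
    relationships: List[FormatRelationship] = field(default_factory=list)

    def identify(self, filename: str, data: bytes) -> str:
        """Return which kind of signature matched: internal, external or none."""
        if any(sig.matches(data) for sig in self.internal_signatures):
            return "internal"
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        if ext in self.external_signatures:
            return "external"
        return "none"


# Example entry; the id is a made-up placeholder, not a real registry identifier.
pdf_14 = Format(
    id="example/pdf-1.4",
    name="Portable Document Format",
    version="1.4",
    internal_signatures=[InternalSignature(offset=0, byte_sequence=b"%PDF-1.4")],
    external_signatures=["pdf"],
)

print(pdf_14.identify("report.pdf", b"%PDF-1.4\n..."))  # matched on content
print(pdf_14.identify("report.pdf", b"not a pdf"))      # matched on extension only
```

Keeping the record this small reflects the "things actually used, things actually populated" point: anything on the "Not" list would be added as further optional fields only when needed.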

Islands of Information / Synchronise

pkg Class Mo...

Format
+ Format
+ FormatExternalSignature
+ FormatInternalSignature
+ FormatRelationship
+ FormatRelationshipType
+ InternalSignature
+ InternalSignatureByteSequence

ComponentProperty
+ ComponentProperty
+ ComponentType
+ ComponentTypeProperty
+ SingleFileComponentType

Cost
+ FormatToolStatistics
+ ProcessingCost
+ ServerType
+ StorageCost
+ ToolStatistics

FileInstanceProperty
+ FileInstanceProperty
+ FormatInstanceProperty

MigrationPathway
+ ManifestationType
+ MigrationPathway
+ MigrationPathwayStep
+ MigrationPathwayType

Policy
+ CharacterisationToolApplicabilityPolicy
+ CharcterisationToolApplicabilityParameter
+ CollectionType
+ DeliverableUnitType
+ MigrationPathwayPolicy
+ MigrationPathwayStepToolParameter
+ MigrationPathwayStepValidation
+ Policy
+ StoragePolicy

Software
+ CharacterisationToolApplicability
+ CharacterisationToolPurposeType
+ Software
+ Tool
+ ToolParameter

StorageMedia
+ StorageSystem

Maintained by UK National Archives

Maintained by Tessella to describe capabilities of the software

Maintained by host organisation to maintain local configuration

Allow view / edit

• Needs to be simple and user friendly
• Not clear whether it can then expand with the model without effort

Provenance

• Blocks of information:
  – Format, Software, Property, Pathway
• Who made change to format, when and based on what info?
• Need provenance of block not each item
  – Store every change:
    • Rollback
    • Diff
• In fact makes synchronise easy:
  – Receive update and detect change
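Block-level provenance with rollback, diff and change detection can be sketched as follows. This is a minimal illustration under stated assumptions, not the registry's implementation: the `Revision` and `BlockHistory` names, the flat string-to-string block content and the caller-supplied timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Revision:
    content: Dict[str, str]  # full snapshot of the block at this point
    author: str              # who made the change
    timestamp: str           # when (ISO 8601 string supplied by the caller)
    basis: str               # what information the change was based on


class BlockHistory:
    """Stores every change to one block (e.g. one Format) as a full snapshot."""

    def __init__(self) -> None:
        self._revisions: List[Revision] = []

    @property
    def current(self) -> Optional[Revision]:
        return self._revisions[-1] if self._revisions else None

    def __len__(self) -> int:
        return len(self._revisions)

    def record(self, content: Dict[str, str], author: str,
               timestamp: str, basis: str) -> bool:
        """Store a new revision; return False if the content is unchanged."""
        if self.current is not None and self.current.content == content:
            return False
        self._revisions.append(Revision(dict(content), author, timestamp, basis))
        return True

    def diff(self, old: int, new: int) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        """Map each changed key to its (old value, new value) pair."""
        a = self._revisions[old].content
        b = self._revisions[new].content
        return {k: (a.get(k), b.get(k))
                for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}

    def rollback(self, index: int, author: str, timestamp: str) -> bool:
        """Re-record an earlier snapshot as the newest revision."""
        target = self._revisions[index]
        return self.record(target.content, author, timestamp,
                           basis=f"rollback to revision {index}")


history = BlockHistory()
history.record({"name": "PDF", "version": "1.4"},
               "alice", "2013-01-01T00:00:00Z", "initial entry")
history.record({"name": "Portable Document Format", "version": "1.4"},
               "bob", "2013-02-01T00:00:00Z", "format specification")

changes = history.diff(0, 1)
# A received update identical to the current state is detected as "no change".
unchanged = history.record({"name": "Portable Document Format", "version": "1.4"},
                           "sync", "2013-03-01T00:00:00Z", "upstream update")
history.rollback(0, "alice", "2013-04-01T00:00:00Z")
```

Because each revision is a whole-block snapshot, synchronisation reduces to comparing an incoming block with the current one: `record` returns `False` when nothing changed, and a rollback is itself stored as a new revision so the history stays complete.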

Conclusions

• Simple, Usable
• Synchronised (as needed)
• Provenance held (simply)
• Expandable (with limited but not zero effort)

• Being built now
• Should be complete by December
• Will be integrated into a working repository and thus used
• Will need to iterate from there…

• Comments and ideas welcome