Robert Sharpe, Tessella

PRELIDA Workshop 2013

ENSURE Linked Data Registry

Agenda

• Archives, libraries and representation information
• Previous “technical registries”:
  – Potted history
  – Issues
• ENSURE linked data technical registry:
  – What’s different?
  – Why we hope it will succeed
• Conclusions and feedback…

Archives, libraries & representation information

• Hold descriptive / cataloguing information for centuries:
  – Helps determine context and makes things unambiguous
  – E.g., census records:
    • Frequency, type of information
    • Professions
    • Parish boundaries
  – Includes references to other sources / archives
• A “representation information network” of “linked data”
• With advent of digital material:
  – Need information on formats, rendering software etc.
  – Look to add “Technical Registry”

Technical Registries: Potted History 1/2

• PRONOM:
  – Started in 2001
  – On-line from 2005
  – “File format registry”
  – In fact, holds more…
• Planets Core Registry (2008):
  – Holds even more entities
• Both:
  – Database-based
  – Web-based GUI
• Issues:
  – Partially populated
  – Hard to add new entities
  – Hard to synchronise

Technical Registries: Potted History 2/2

• Move to linked data:
  – Linked Data PRONOM
  – UDFR
  – …
• Issues:
  – Partially populated
  – Hard to add new entities
  – Partial projects: enough to be used?
  – Hard for people to query: SPARQL but not via simple GUI
  – Complex provenance

What’s different?

• ENSURE Linked Data Technical Registry:
  – Fewer entities, more fully populated:
    • Expand later
  – Start with the synchronisation issue
  – Good querying and user interface:
    • Human Search / Browse
    • Human View / Edit
  – Simple view of provenance
  – Long-term commitment:
    • Will integrate with SDB/Preservica
    • 20+ organisations will use it

Data Model

• Keep it simple:
  – Things actually used
  – Things actually populated
  – Add more if and when needed
• Format:
  – ID, Name, Version, Description
  – Release Date, Withdrawn Date
  – Internal Signature, External Signature
  – Relationships
• Not:
  – Assessments, Risk scores
  – Documents, Reference files, Agents
  – Intellectual Property
  – Technical Environments
  – XCDL, XCEL
  – Types, Faceting
  – Complex provenance
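The Format entity listed on this slide can be pictured as a small record type. The following is a minimal sketch only, assuming Python dataclasses; the class names, field names, the `matches` helper and the `example/pdf-1.4` identifier are hypothetical illustrations and are not taken from the ENSURE registry schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class FormatRelationship:
    # Hypothetical relationship record: a typed link to another format.
    relationship_type: str
    target_format_id: str


@dataclass
class Format:
    # Fields follow the slide: ID, Name, Version, Description,
    # Release Date, Withdrawn Date, signatures and relationships.
    format_id: str
    name: str
    version: str = ""
    description: str = ""
    release_date: Optional[date] = None
    withdrawn_date: Optional[date] = None
    # Assumption: an internal signature is a leading byte sequence and
    # an external signature is a file extension.
    internal_signatures: List[bytes] = field(default_factory=list)
    external_signatures: List[str] = field(default_factory=list)
    relationships: List[FormatRelationship] = field(default_factory=list)

    def is_withdrawn(self, on: date) -> bool:
        """True if the format had been withdrawn by the given date."""
        return self.withdrawn_date is not None and self.withdrawn_date <= on


def matches(fmt: Format, filename: str, header: bytes) -> bool:
    """True if the extension or the leading bytes match any signature."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    by_extension = ext in [e.lower() for e in fmt.external_signatures]
    by_bytes = any(header.startswith(sig) for sig in fmt.internal_signatures)
    return by_extension or by_bytes


# Illustrative record; the identifier is made up for this example.
pdf = Format(
    format_id="example/pdf-1.4",
    name="Portable Document Format",
    version="1.4",
    internal_signatures=[b"%PDF-1.4"],
    external_signatures=["pdf"],
)
```

Keeping the record this small mirrors the "things actually used, things actually populated" rule: anything on the "Not" list would be added as a new field or class only when needed.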

Islands of Information / Synchronise

[Class diagram "pkg Class Mo...": packages and the classes they contain]

• Format: Format, FormatExternalSignature, FormatInternalSignature, FormatRelationship, FormatRelationshipType, InternalSignature, InternalSignatureByteSequence
• ComponentProperty: ComponentProperty, ComponentType, ComponentTypeProperty, SingleFileComponentType
• Cost: FormatToolStatistics, ProcessingCost, ServerType, StorageCost, ToolStatistics
• FileInstanceProperty: FileInstanceProperty, FormatInstanceProperty
• MigrationPathway: ManifestationType, MigrationPathway, MigrationPathwayStep, MigrationPathwayType
• Policy: CharacterisationToolApplicabilityPolicy, CharacterisationToolApplicabilityParameter, CollectionType, DeliverableUnitType, MigrationPathwayPolicy, MigrationPathwayStepToolParameter, MigrationPathwayStepValidation, Policy, StoragePolicy
• Software: CharacterisationToolApplicability, CharacterisationToolPurposeType, Software, Tool, ToolParameter
• StorageMedia: StorageSystem

Diagram annotations:

• Maintained by UK National Archives
• Maintained by Tessella to describe capabilities of the software
• Maintained by host organisation to maintain local configuration

Allow view / edit

• Needs to be simple and user friendly
• Not clear whether it can then expand with the model without effort

Provenance

• Blocks of information:
  – Format, Software, Property, Pathway
• Who made a change to a format, when, and based on what information?
• Need provenance of the block, not of each item:
  – Store every change:
    • Rollback
    • Diff
• In fact, this makes synchronisation easy:
  – Receive update and detect change
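The block-level provenance idea can be sketched as an append-only list of whole-block snapshots. This is a minimal illustration under stated assumptions, not the registry's implementation: the `Revision` and `BlockHistory` names, the use of a SHA-256 digest over canonical JSON to detect change, and the string-valued block content are all hypothetical choices made for the example.

```python
import difflib
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Revision:
    content: Dict[str, str]  # snapshot of the whole block, not one item
    changed_by: str          # who made the change
    changed_at: str          # when (timestamp supplied by the caller)
    basis: str               # what information the change was based on

    def digest(self) -> str:
        # Canonical JSON so that equal content always hashes the same.
        canonical = json.dumps(self.content, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()


class BlockHistory:
    """Every change to one block (e.g. one format) is stored."""

    def __init__(self) -> None:
        self.revisions: List[Revision] = []

    def current(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def commit(self, content: Dict[str, str], changed_by: str,
               changed_at: str, basis: str) -> bool:
        """Store a new revision; return False if nothing changed."""
        rev = Revision(dict(content), changed_by, changed_at, basis)
        cur = self.current()
        if cur is not None and cur.digest() == rev.digest():
            return False  # received update is identical: nothing to do
        self.revisions.append(rev)
        return True

    def rollback(self, index: int, changed_by: str, changed_at: str) -> bool:
        """Re-commit an earlier snapshot so the rollback is itself recorded."""
        old = self.revisions[index]
        return self.commit(old.content, changed_by, changed_at,
                           f"rollback to revision {index}")

    def diff(self, a: int, b: int) -> List[str]:
        """Line-based diff between two revisions of the block."""
        def as_lines(rev: Revision) -> List[str]:
            return [f"{k}: {v}" for k, v in sorted(rev.content.items())]
        return list(difflib.unified_diff(as_lines(self.revisions[a]),
                                         as_lines(self.revisions[b]),
                                         lineterm="", n=0))
```

In this sketch, synchronisation reduces to calling `commit` with each received block: an unchanged block returns `False` and adds no revision, while a changed one is appended with its own who / when / basis record.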

Conclusions

• Simple, usable
• Synchronised (as needed)
• Provenance held (simply)
• Expandable (with limited but not zero effort)

• Being built now
• Should be complete by December
• Will be integrated into a working repository and thus used
• Will need to iterate from there…

• Comments and ideas welcome