Working with metadata in digital archives

Erpanet Metadata in Digital Preservation, Marburg, 3-5 September 2003

Bill Roberts, [email protected]

Tessella Support Services plc, 3 Vineyard Chambers, Abingdon OX14 3PX, United Kingdom

www.tessella.com

Page 2: Working with metadata in digital archives

Metadata functions

Collect

Store

Import

Search

Export

View

Edit

Page 3: Working with metadata in digital archives

Collect metadata (1)

Some must be manual – assist user, prevent mistakes

Avoid duplication – record hierarchies

automation in user environment (business process, workflow etc.)

automatic analysis of file properties

processing history (virus checking results etc.)
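Automatic analysis of file properties can be illustrated with a minimal sketch. The function name `collect_file_properties` and the dictionary keys are assumptions made for this example, not part of any archive system described in the talk; only Python standard-library calls are used.

```python
import os
from datetime import datetime, timezone


def collect_file_properties(path):
    """Gather technical metadata that needs no user input."""
    stat = os.stat(path)
    name = os.path.basename(path)
    return {
        # Key names are hypothetical; a real archive would follow its own schema.
        "file_name": name,
        "extension": os.path.splitext(name)[1].lower(),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }
```

Values gathered this way do not have to be typed by the user, which addresses both the "assist user" and "prevent mistakes" points for at least the technical part of the metadata.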

Page 4: Working with metadata in digital archives

Collect metadata (2)

UK National Archives Digital Archive – Stellent “OutsideIn”

analyses file to determine type

could also form part of approach to extract metadata from content
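The idea of analysing a file to determine its type can be sketched with a simple signature check on the leading bytes. This is not how Stellent OutsideIn works internally (the slides do not say); it is a deliberately small stand-in, and the `SIGNATURES` table and `identify_format` function are hypothetical names covering only a few well-known formats.

```python
# Hypothetical signature table: a handful of well-known magic numbers only.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def identify_format(data):
    """Guess a format from the leading bytes; None if no signature matches."""
    for magic, mime_type in SIGNATURES:
        if data.startswith(magic):
            return mime_type
    return None
```

Returning `None` rather than guessing keeps unidentified files visible, so they can be routed to manual review.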

Page 5: Working with metadata in digital archives

Collect metadata (3)

Pfizer Central Electronic Archive

Small metadata set

Automatic collection of metadata

Software agents on user servers

Possible to do more

Improve ease of use

Improve accuracy

Pfizer aiming to simplify provenance metadata

Page 6: Working with metadata in digital archives

Import metadata (1)

Transfer format – XML

link metadata to files during transfer

virus checking, file format analysis etc.

Maintain loose coupling between components of system – agreed interfaces
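One way to link metadata to files during transfer is an XML manifest that travels with the files. The sketch below is an assumption about what such a manifest might look like: the element names (`Transfer`, `File`, `Checksum`, `VirusCheck`, `Format`) and the input keys are invented for illustration and do not come from any schema mentioned in the talk.

```python
import xml.etree.ElementTree as ET


def build_manifest(files):
    """Build a transfer manifest from a list of dicts (hypothetical keys)."""
    root = ET.Element("Transfer")
    for entry in files:
        # The path attribute is what ties each metadata block to its file.
        file_el = ET.SubElement(root, "File", path=entry["path"])
        checksum = ET.SubElement(file_el, "Checksum", algorithm="SHA-256")
        checksum.text = entry["sha256"]
        ET.SubElement(file_el, "VirusCheck").text = entry["virus_check"]
        ET.SubElement(file_el, "Format").text = entry["format"]
    return ET.tostring(root, encoding="unicode")
```

Because the manifest is the agreed interface, the sending and receiving components only need to agree on this document, not on each other's internals.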

Page 7: Working with metadata in digital archives

Import metadata (2)

Efficiency – large transfers

XML can be expensive to process

speed

memory – DOM can be 20 times larger than XML file
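A common way to avoid holding a whole DOM in memory is to stream the XML and discard each element once it has been handled. The sketch below uses `xml.etree.ElementTree.iterparse` from the Python standard library; the `Record` tag name and the `iter_records` function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag="Record"):
    """Yield one dict of attributes per matching element, without a full tree."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            # Copy what is needed before the element is cleared.
            yield dict(element.attrib)
            # Drop the element's children and text to keep memory bounded.
            element.clear()
```

Cleared elements still leave empty stubs attached to the root, so memory grows slowly rather than not at all; for very large transfers the root's processed children would also need to be removed.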

Page 8: Working with metadata in digital archives

Storage - requirements

don’t lose it!

maintain links between metadata, records and files

find what you are looking for

retrieve

Page 9: Working with metadata in digital archives

Storage approaches

encapsulation vs. ease of access

volume of data

speed of searching vs. speed of import/export

typically metadata in database and files on file server

Page 10: Working with metadata in digital archives

The National Archives (UK) Digital Archive approach

Relational database for metadata, file server for computer files

Metadata stored as XML documents in database

A few key elements stored in tables and indexed (unique identifier, PROCAT reference)

Links between records, files, accessions, metadata managed in database

Subset of metadata identified as searchable – values extracted into text based index

File contents not currently searchable
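The storage pattern described on this slide (whole XML document kept in the database, a few key elements in indexed columns, a searchable subset extracted separately) can be sketched with SQLite. This is only an illustration under stated assumptions: the table and column names, the `SEARCHABLE` set, and the use of SQLite are all invented here, and the slides do not say which database product or index the Digital Archive uses.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical subset of element names treated as searchable.
SEARCHABLE = {"Title", "Creator"}


def create_store(connection):
    connection.execute(
        "CREATE TABLE record ("
        "identifier TEXT PRIMARY KEY, catalogue_ref TEXT, metadata_xml TEXT)"
    )
    connection.execute("CREATE INDEX record_ref ON record (catalogue_ref)")
    connection.execute(
        "CREATE TABLE search_index (identifier TEXT, name TEXT, value TEXT)"
    )


def store_record(connection, identifier, catalogue_ref, metadata_xml):
    """Keep the XML whole; copy only searchable values into the index table."""
    connection.execute(
        "INSERT INTO record VALUES (?, ?, ?)",
        (identifier, catalogue_ref, metadata_xml),
    )
    for element in ET.fromstring(metadata_xml).iter():
        if element.tag in SEARCHABLE and element.text:
            connection.execute(
                "INSERT INTO search_index VALUES (?, ?, ?)",
                (identifier, element.tag, element.text.strip()),
            )


def search(connection, term):
    rows = connection.execute(
        "SELECT DISTINCT identifier FROM search_index WHERE value LIKE ?",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]
```

Elements outside the searchable subset stay available in `metadata_xml` but cannot be found by `search`, which mirrors the trade-off the slide describes.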

Page 11: Working with metadata in digital archives

UK Digital Archive (2)

record and file metadata kept separately

flexible relationship between records and computer files

Unlimited depth of record hierarchy (records can contain sub-records)

metadata imported/exported as XML so easier/quicker to store as XML

designed for ease of extension to metadata (disadvantage of extracting metadata into database tables)

<GSMElement name="Title"> rather than <Title>
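The benefit of a generic element such as `<GSMElement name="Title">` is that code reading the metadata does not need to know the element names in advance. A minimal sketch follows; the `GSMElement` tag and its `name` attribute come from the slide, while the assumption that the value is held as element text and the `read_generic_elements` function are illustrative guesses.

```python
import xml.etree.ElementTree as ET


def read_generic_elements(metadata_xml):
    """Collect GSMElement values into a dict keyed by the name attribute."""
    values = {}
    for element in ET.fromstring(metadata_xml).iter("GSMElement"):
        # A list per name, since the same element may legitimately repeat.
        values.setdefault(element.get("name"), []).append(element.text or "")
    return values
```

A newly introduced metadata element then shows up as a new dictionary key, with no change to the parsing code or to any database table.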

Page 12: Working with metadata in digital archives

Alternatives

VERS approach: metadata and content files encapsulated together within XML file

+ve: record is self-contained

+ve: well-suited to use of digital signatures on both metadata and content

-ve: more denormalisation required for access

-ve: complexity of adding to or editing metadata

-ve: if file is needed for more than one record, must be duplicated

Page 13: Working with metadata in digital archives

Interoperability

Not much experience in practice so far

XML helps - but not much!

Likely to be similar but not identical schemas

Different implementations of same schema

Short term: ad hoc mapping between schemas for specific systems

Longer term: various initiatives, but standardisation and semantics-based approaches are difficult
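The short-term approach of ad hoc mapping between schemas can be as simple as a lookup table between field names. In the sketch below, every field name and the `map_record` function are hypothetical; the point is only that fields without a known counterpart should be reported rather than silently dropped.

```python
# Hypothetical field names for two similar but not identical schemas.
FIELD_MAP = {
    "Title": "title",
    "Creator": "author",
    "DateCreated": "created",
}


def map_record(source_record, field_map=FIELD_MAP):
    """Translate known fields and list the ones the mapping does not cover."""
    mapped, unmapped = {}, []
    for name, value in source_record.items():
        if name in field_map:
            mapped[field_map[name]] = value
        else:
            unmapped.append(name)
    return mapped, unmapped
```

A mapping like this only renames fields. It does not address differences in meaning between schemas, which is where the longer-term, semantics-based approaches become difficult.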

Page 14: Working with metadata in digital archives

Extending or changing the schema

Schema may (will!) change in future

No “one size fits all” approach

TNA plans for extensions to core metadata according to file type and according to function

Version control

Page 15: Working with metadata in digital archives

Preservation metadata

Maintain ability to understand and authentically reproduce content files

PRONOM system – separate database for file formats/accessibility

KB preservation layer model approach

Technology watch

Page 16: Working with metadata in digital archives

Authentication/Integrity

Digital signatures – has something changed? (also simpler hashing algorithms)

Digital signatures – who signed it?

Control access

Audit logs
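The "has something changed?" question can be answered with a plain hash, without a full digital signature. The sketch below assumes SHA-256 as the algorithm (the slides do not name one) and uses hypothetical function names.

```python
import hashlib
import hmac


def fingerprint(data):
    """Return a hex digest recorded when the content is ingested."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data, stored_fingerprint):
    """True if the content still matches the fingerprint recorded at ingest."""
    return hmac.compare_digest(fingerprint(data), stored_fingerprint)
```

A hash on its own shows only that content is unchanged relative to the stored fingerprint. Answering "who signed it?" needs a digital signature made with the signer's private key, and the stored fingerprint itself has to be protected by access control and audit logs.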

Page 17: Working with metadata in digital archives

Conclusions

Digital preservation is still a young discipline, so “best” approach not always clear

Do something! Learn from experience

Design for flexibility/replaceability – records must outlive any implementation