Implementation of PREMIS in METS Rebecca Guenther Sr. Networking & Standards Specialist, Library of...

Post on 03-Jan-2016


Implementation of PREMIS in METS

Rebecca Guenther
Sr. Networking & Standards Specialist, Library of Congress
rgue@loc.gov

PREMIS Implementation Fair
San Francisco, CA
October 7, 2009

METS records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata

A METS document may be a unit of storage (e.g. OAIS AIP) or a transmission format (e.g. OAIS SIP or DIP)

METS is extensible and modular

METS uses the XML Schema facility for combining vocabularies from different namespaces

The METS Editorial Board has endorsed PREMIS as an extension schema

Many institutions are trying to use PREMIS within the METS context
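To make the "combining vocabularies from different namespaces" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It builds a METS tree whose `techMD` section wraps a PREMIS object, records a file and its location in `fileSec`, and points at that file from a `structMap`. The namespace URIs (METS, XLink, and the PREMIS version 2 namespace `info:lc/xmlns/premis-v2`) and the helper names `q` and `build_mets` are assumptions for illustration; the output is not schema-valid (mandatory PREMIS elements such as the object identifier are omitted).

```python
import xml.etree.ElementTree as ET

# Namespace URIs (assumed): METS and XLink as published; PREMIS version 2.
METS = "http://www.loc.gov/METS/"
PREMIS = "info:lc/xmlns/premis-v2"
XLINK = "http://www.w3.org/1999/xlink"

# Register prefixes so serialized output reads mets:/premis:/xlink:.
for prefix, uri in (("mets", METS), ("premis", PREMIS), ("xlink", XLINK)):
    ET.register_namespace(prefix, uri)


def q(ns: str, tag: str) -> str:
    """Return a qualified name in ElementTree's {namespace}tag notation."""
    return f"{{{ns}}}{tag}"


def build_mets(file_id: str, href: str, size: int) -> ET.Element:
    """Build a minimal (not schema-valid) METS tree with one PREMIS-described file."""
    mets = ET.Element(q(METS, "mets"))

    # amdSec/techMD is the METS "socket"; the PREMIS object is plugged in via mdWrap/xmlData.
    amd = ET.SubElement(mets, q(METS, "amdSec"))
    tech = ET.SubElement(amd, q(METS, "techMD"), ID="TMD1PREMIS")
    wrap = ET.SubElement(tech, q(METS, "mdWrap"), MDTYPE="PREMIS")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    obj = ET.SubElement(xml_data, q(PREMIS, "object"))
    chars = ET.SubElement(obj, q(PREMIS, "objectCharacteristics"))
    ET.SubElement(chars, q(PREMIS, "size")).text = str(size)

    # fileSec records the file's name/location and points to its techMD via ADMID.
    grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"))
    f = ET.SubElement(grp, q(METS, "file"), ID=file_id, ADMID="TMD1PREMIS")
    ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "OTHER", q(XLINK, "href"): href})

    # structMap records the (possibly hierarchical) structure and points at the file.
    smap = ET.SubElement(mets, q(METS, "structMap"), TYPE="physical")
    div = ET.SubElement(smap, q(METS, "div"), TYPE="page", ORDER="1")
    ET.SubElement(div, q(METS, "fptr"), FILEID=file_id)
    return mets


print(ET.tostring(build_mets("FID1", "BXF22.JPG", 184302), encoding="unicode"))
```

Each vocabulary keeps its own namespace inside one document; METS only supplies the wrapper (`mdWrap`/`xmlData`) and the ID/IDREF links (`ADMID`, `FILEID`).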

Structure of a METS file

[Figure: OAIS, METS and PREMIS. Maps the components of an OAIS Archival Information Package (Descriptive, Content, Representation, Preservation Description, Provenance, Reference, Fixity, Context and Packaging Information) to METS sections (<METS>, <dmdSec>, <amdSec> with <techMD>, <sourceMD>, <digiprovMD> and <rightsMD>, <fileGrp>/<file>, <structMap>, <mdRef>) and to the extension schemas plugged into them (MODS, MARCXML, DC, premis:object, premis:event, premis:rights, metsRights, textMD, MIX). Legend: black Arial = OAIS; red Times New Roman = METS primary schema; blue Times New Roman italics = extension schema.]

METS extension schemas

“wrappers” or “sockets” where elements from other schemas can be plugged in
• Provides extensibility
• Uses the XML Schema facility for combining vocabularies from different namespaces
• Endorsed extension schemas:
  • Descriptive: MODS, DC, MARCXML
  • Technical metadata: MIX (image); textMD (text)
  • Preservation related: PREMIS

Why do we need guidelines for using PREMIS with METS?

Contents of each information package may vary depending on its function within a repository

Need to determine how to include representation metadata and associate it with package components

PREMIS data entities (objects, events, rights, agents) do not map perfectly to the METS categories for administrative metadata (techMD, digiprovMD, rightsMD, sourceMD)

There are redundant elements between the two standards

Both have extensibility mechanisms

Flexibility of both standards requires implementation choices

Development of Guidelines for Using PREMIS with METS for Exchange

PREMIS in METS Guidelines Working Group
• Consists of PREMIS and METS experts
• Focuses on the METS document as a mechanism for exchanging digital objects and their metadata (SIP or DIP)
• Facilitates communication when internal requirements and technical environments vary

Tension between flexibility and being prescriptive to facilitate interoperability
• Consider usage scenarios
• If a SIP, it may get unwrapped and stored in different structures
• If a DIP, it is converted from internal structures to PREMIS
• A more liberal approach is possible for a SIP than for a DIP

Establishing guidelines, a METS profile, and examples: http://www.loc.gov/standards/premis/guidelines-premismets.pdf

Implementation issues in using PREMIS with METS

Location of PREMIS metadata within METS documents

Whether to record elements redundantly if they occur in both PREMIS and METS

Relationship of different structural metadata mechanisms in PREMIS and METS

How to record PREMIS Agent entities in METS documents

Use of identifiers to link elements in PREMIS and METS

How to record elements that are also part of a format specific technical metadata schema (e.g. MIX)

Some recommendations from Guidelines

METS sections:
• Use Object in techMD or digiprovMD
• Use Event in digiprovMD
• Use Rights in rightsMD
• Use Agent in digiprovMD or rightsMD

PREMIS Container -- use only if keeping all PREMIS metadata together. Do not use if separating PREMIS metadata into different amdSec subelements

PREMIS and METS redundancies -- choosing which options to use is an implementation decision; document it in the profile (e.g. attributes of the METS <file> element such as SIZE, and subelements of <objectCharacteristics> in PREMIS such as <size>)
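The placement recommendations above can be expressed as a small lookup table plus a checker, for example as part of validating documents against a local profile. This is a minimal sketch: the table restates the recommendations, section names use the METS schema spelling (`digiprovMD`), and the pseudo-entity `"premis"` (standing for the PREMIS container) and the function name `check_placements` are hypothetical conventions of this example. Where the container itself belongs is deliberately left unchecked.

```python
# Placements recommended on the slide (element names as spelled in the METS schema).
RECOMMENDED = {
    "object": {"techMD", "digiprovMD"},
    "event": {"digiprovMD"},
    "rights": {"rightsMD"},
    "agent": {"digiprovMD", "rightsMD"},
}


def check_placements(placements):
    """Check (premis_entity, mets_section) pairs against the recommendations.

    The pseudo-entity "premis" stands for the PREMIS container element, which
    should only be used when all PREMIS metadata is kept together.
    Returns a list of human-readable problems (empty if none).
    """
    problems = []
    uses_container = any(entity == "premis" for entity, _ in placements)
    splits_entities = any(entity != "premis" for entity, _ in placements)
    if uses_container and splits_entities:
        problems.append("PREMIS container used while also splitting entities into amdSec subelements")
    for entity, section in placements:
        if entity == "premis":
            continue  # the container's own location is left to the profile
        allowed = RECOMMENDED.get(entity)
        if allowed is None:
            problems.append(f"unknown PREMIS entity {entity!r}")
        elif section not in allowed:
            problems.append(f"{entity} in {section}; recommended: {', '.join(sorted(allowed))}")
    return problems


print(check_placements([("object", "techMD"), ("event", "digiprovMD"), ("agent", "rightsMD")]))  # []
print(check_placements([("event", "techMD")]))  # ['event in techMD; recommended: digiprovMD']
```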

Recommendations (cont.)

Structural relationship elements -- use the METS structMap to record structural relationships; use PREMIS relationship elements to record preservation and derivation relationships, and structural relationships if desired

ID/IDREF and PREMIS identifier elements -- use METS ID/IDREF mechanisms; best practices for using these ID/IDREF mechanisms apply

Use PREMIS extensibility mechanism for format specific technical metadata

Document decisions in METS profiles

<fileSec>
  <fileGrp>
    <file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT"
          CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1">
      <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
    </file>
  </fileGrp>
</fileSec>
<techMD ID="TMD1PREMIS">
  <mdWrap MDTYPE="PREMIS"><xmlData>
    <premis:object>
      <objectCharacteristics>
        <fixity>
          <messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>
          <messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</messageDigest>
          <messageDigestOriginator>EchoDep</messageDigestOriginator>
        </fixity>
        <size>184302</size>
      </objectCharacteristics>
    </premis:object>
  </xmlData></mdWrap>
</techMD>

Elements defined in both METS and PREMIS:
• METS: CHECKSUM, CHECKSUMTYPE
  • attributes of <file>
  • not repeatable
• PREMIS: <fixity>
  • also includes messageDigestOriginator
  • repeatable (allows multiple digests)
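If a profile decides to record size and checksum in both places, the two copies can drift apart. The sketch below, using Python's standard `xml.etree.ElementTree`, follows each `<file>`'s ADMID to its PREMIS `techMD` and compares METS `SIZE`/`CHECKSUM` with PREMIS `<size>`/`<fixity>`. Because PREMIS fixity is repeatable and METS CHECKSUM is not, the METS value only has to match one PREMIS digest. The PREMIS v2 namespace URI, the `<mets>` wrapper added so the sample parses, the `premis:` prefix on all PREMIS elements, and the function name are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (PREMIS version 2); adjust to the schema version in use.
NS = {"mets": "http://www.loc.gov/METS/", "premis": "info:lc/xmlns/premis-v2"}

# The example above, wrapped in a <mets> root with namespaces declared so it parses.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:premis="info:lc/xmlns/premis-v2"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <amdSec>
    <techMD ID="TMD1PREMIS"><mdWrap MDTYPE="PREMIS"><xmlData>
      <premis:object><premis:objectCharacteristics>
        <premis:fixity>
          <premis:messageDigestAlgorithm>SHA-1</premis:messageDigestAlgorithm>
          <premis:messageDigest>4638bc65c5b9715557d09ad373eefd147382ecbf</premis:messageDigest>
          <premis:messageDigestOriginator>EchoDep</premis:messageDigestOriginator>
        </premis:fixity>
        <premis:size>184302</premis:size>
      </premis:objectCharacteristics></premis:object>
    </xmlData></mdWrap></techMD>
  </amdSec>
  <fileSec><fileGrp>
    <file ID="FID1" SIZE="184302" ADMID="TMD1PREMIS"
          CHECKSUM="4638bc65c5b9715557d09ad373eefd147382ecbf" CHECKSUMTYPE="SHA-1">
      <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
    </file>
  </fileGrp></fileSec>
</mets>"""


def redundancy_mismatches(root):
    """Compare METS <file> SIZE/CHECKSUM with the PREMIS object its ADMID points to."""
    tech = {t.get("ID"): t for t in root.iterfind(".//mets:techMD", NS)}
    problems = []
    for f in root.iterfind(".//mets:file", NS):
        for admid in f.get("ADMID", "").split():
            chars = tech[admid].find(".//premis:objectCharacteristics", NS) if admid in tech else None
            if chars is None:
                continue  # ADMID points at something other than a PREMIS techMD
            size = chars.findtext("premis:size", default="", namespaces=NS).strip()
            if f.get("SIZE") and size and f.get("SIZE") != size:
                problems.append(f"{f.get('ID')}: METS SIZE {f.get('SIZE')} != PREMIS size {size}")
            # PREMIS fixity is repeatable, METS CHECKSUM is not: the METS value
            # only needs to match one of the PREMIS digests.
            digests = {
                (fx.findtext("premis:messageDigestAlgorithm", default="", namespaces=NS).strip(),
                 fx.findtext("premis:messageDigest", default="", namespaces=NS).strip().lower())
                for fx in chars.findall("premis:fixity", NS)
            }
            checksum = f.get("CHECKSUM")
            if checksum and (f.get("CHECKSUMTYPE"), checksum.lower()) not in digests:
                problems.append(f"{f.get('ID')}: METS CHECKSUM not found among PREMIS fixity values")
    return problems


print(redundancy_mismatches(ET.fromstring(SAMPLE)))  # []
```

Comparing algorithm names verbatim is a simplification; a real profile would also need to agree on how algorithm names are spelled in the two standards.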

<fileSec>
  <fileGrp>
    <file ID="FID1" ADMID="TMD1PREMIS DP1EVENT DP1AGENT" MIMETYPE="image/jpeg">
      <FLocat LOCTYPE="OTHER" xlink:href="BXF22.JPG"/>
    </file>
  </fileGrp>
</fileSec>
<techMD ID="TMD1PREMIS">
  <mdWrap MDTYPE="PREMIS"><xmlData>
    <premis:object>
      <objectCharacteristics>
        <format>
          <formatDesignation>
            <formatName>image/jpeg</formatName>
            <formatVersion>1.02</formatVersion>
          </formatDesignation>
        </format>
      </objectCharacteristics>
    </premis:object>
  </xmlData></mdWrap>
</techMD>

Elements defined in both METS and PREMIS:
• METS: MIMETYPE
  • attribute of <file>
  • optional
• PREMIS: <format>
  • more granular; includes name and version (although the name may be a MIME type)
  • mandatory
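A profile that records the format in both places might check that the optional METS attribute agrees with the mandatory PREMIS value. This minimal sketch treats a missing MIMETYPE as acceptable and compares it case-insensitively with `formatName`. It only makes sense when the profile puts a MIME type in `formatName`, which PREMIS does not require. The PREMIS v2 namespace URI and the function name `mimetype_consistent` are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # assumed PREMIS version 2 namespace
FORMAT_NAME = f"{{{PREMIS_NS}}}formatDesignation/{{{PREMIS_NS}}}formatName"


def mimetype_consistent(file_el, format_el):
    """Check METS <file> MIMETYPE against PREMIS <format>/<formatDesignation>/<formatName>.

    MIMETYPE is optional in METS, so a missing attribute is not an error.
    This only makes sense if the profile records a MIME type in formatName;
    PREMIS does not require formatName to be a MIME type.
    """
    mimetype = file_el.get("MIMETYPE")
    if mimetype is None:
        return True
    name = format_el.findtext(FORMAT_NAME, default="")
    return name.strip().lower() == mimetype.strip().lower()


file_el = ET.fromstring(f'<file xmlns="{METS_NS}" ID="FID1" MIMETYPE="image/jpeg"/>')
format_el = ET.fromstring(
    f'<premis:format xmlns:premis="{PREMIS_NS}"><premis:formatDesignation>'
    "<premis:formatName>image/jpeg</premis:formatName>"
    "<premis:formatVersion>1.02</premis:formatVersion>"
    "</premis:formatDesignation></premis:format>"
)
print(mimetype_consistent(file_el, format_el))  # True
```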

<fileSec>
  <fileGrp>
    <file ID="FID1" ADMID="TMD1PREMIS TMD1MIX DP1EVENT DP1AGENT">
...
<techMD ID="TMD1PREMIS">
  ...
  <linkingEventIdentifier>
    <linkingEventIdentifierType>ECHODEP Hub Event</linkingEventIdentifierType>
    <linkingEventIdentifierValue>echo12345</linkingEventIdentifierValue>
  </linkingEventIdentifier>
...
<digiprovMD ID="DP1EVENT">
  ...
  <premis:event>
    <eventIdentifier>
      <eventIdentifierType>ECHODEP Hub Event</eventIdentifierType>
      <eventIdentifierValue>echo12345</eventIdentifierValue>
    </eventIdentifier>
    <eventType>ingestion</eventType>
    <eventDateTime>2006-05-02T15:12:53</eventDateTime>
  </premis:event>

Elements defined in both METS and PREMIS:
• METS ID/IDREF: used to associate metadata in different sections and for different files
• PREMIS identifiers: explicit linking between entity types
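The two linking mechanisms can be checked independently. In the sketch below, `dangling_admids` verifies the METS side (every ADMID token must match some element's ID) and `unresolved_event_links` verifies the PREMIS side (every `linkingEventIdentifier` type/value pair must match an `eventIdentifier`). The sample is simplified and not schema-valid: the ADMID list is trimmed to the two sections present, and one `eventIdentifierValue` deliberately carries a trailing space to show why values are trimmed before comparison, a design choice of this sketch rather than a PREMIS rule. The PREMIS v2 namespace URI and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

M = "{http://www.loc.gov/METS/}"
P = "{info:lc/xmlns/premis-v2}"  # assumed PREMIS version 2 namespace

# Simplified sample: ADMID trimmed to the two sections shown here.
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/" xmlns:premis="info:lc/xmlns/premis-v2">
  <amdSec>
    <techMD ID="TMD1PREMIS"><mdWrap MDTYPE="PREMIS"><xmlData><premis:object>
      <premis:linkingEventIdentifier>
        <premis:linkingEventIdentifierType>ECHODEP Hub Event</premis:linkingEventIdentifierType>
        <premis:linkingEventIdentifierValue>echo12345</premis:linkingEventIdentifierValue>
      </premis:linkingEventIdentifier>
    </premis:object></xmlData></mdWrap></techMD>
    <digiprovMD ID="DP1EVENT"><mdWrap MDTYPE="PREMIS"><xmlData><premis:event>
      <premis:eventIdentifier>
        <premis:eventIdentifierType>ECHODEP Hub Event</premis:eventIdentifierType>
        <premis:eventIdentifierValue>echo12345 </premis:eventIdentifierValue>
      </premis:eventIdentifier>
      <premis:eventType>ingestion</premis:eventType>
      <premis:eventDateTime>2006-05-02T15:12:53</premis:eventDateTime>
    </premis:event></xmlData></mdWrap></digiprovMD>
  </amdSec>
  <fileSec><fileGrp><file ID="FID1" ADMID="TMD1PREMIS DP1EVENT"/></fileGrp></fileSec>
</mets>"""


def dangling_admids(root):
    """METS ID/IDREF: every ADMID token must match some element's ID attribute."""
    ids = {el.get("ID") for el in root.iter() if el.get("ID")}
    return [(f.get("ID"), ref) for f in root.iter(M + "file")
            for ref in f.get("ADMID", "").split() if ref not in ids]


def identifier(el, prefix):
    """(type, value) pair from <{prefix}Type>/<{prefix}Value>, whitespace-trimmed."""
    return (el.findtext(P + prefix + "Type", default="").strip(),
            el.findtext(P + prefix + "Value", default="").strip())


def unresolved_event_links(root):
    """PREMIS identifiers: every linkingEventIdentifier must match an eventIdentifier."""
    events = {identifier(e, "eventIdentifier") for e in root.iter(P + "eventIdentifier")}
    links = (identifier(link_el, "linkingEventIdentifier")
             for link_el in root.iter(P + "linkingEventIdentifier"))
    return [link for link in links if link not in events]


root = ET.fromstring(SAMPLE)
print(dangling_admids(root), unresolved_event_links(root))  # [] []
```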

<structMap TYPE="physical">
  <div ORDER="1" TYPE="text">
    <fptr FILEID="FID9"/>
    <div ORDER="1" TYPE="page" LABEL="Page [1]"><fptr FILEID="FID1"/></div>
    <div ORDER="2" TYPE="page" LABEL="Page [2]"><fptr FILEID="FID2"/></div>
  </div>
</structMap>

<relationship>
  <relationshipType>structural</relationshipType>
  <relationshipSubType>is sibling of</relationshipSubType>
  <relatedObjectIdentification>
    <relatedObjectIdentifierType>UCB</relatedObjectIdentifierType>
    <relatedObjectIdentifierValue>FID2</relatedObjectIdentifierValue>
    <relatedObjectSequence>1</relatedObjectSequence>
  </relatedObjectIdentification>
</relationship>

Elements defined in both METS and PREMIS:
• METS: structMap
  • details structural relationships and is the heart of the METS document
  • hierarchical, so may be more expressive than PREMIS semantic units
  • links the elements of the structure to content files and metadata
• PREMIS: <relationship>
  • details all kinds of relationships, including structural
  • the data dictionary says that implementations may record them by other means
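Because the structMap is the richer structure, a profile that also wants PREMIS structural relationships could derive them from it rather than maintain both by hand. This sketch pairs the files pointed to directly by `<div>` elements that share a parent, producing "is sibling of" triples like the PREMIS example. It is a hypothetical derivation rule, not one defined by either standard, and it does not attempt to compute `relatedObjectSequence`.

```python
import xml.etree.ElementTree as ET
from itertools import permutations, product

M = "{http://www.loc.gov/METS/}"

# The structMap example, with the METS namespace declared so it parses.
STRUCT_MAP = """<structMap xmlns="http://www.loc.gov/METS/" TYPE="physical">
  <div ORDER="1" TYPE="text">
    <fptr FILEID="FID9"/>
    <div ORDER="1" TYPE="page" LABEL="Page [1]"><fptr FILEID="FID1"/></div>
    <div ORDER="2" TYPE="page" LABEL="Page [2]"><fptr FILEID="FID2"/></div>
  </div>
</structMap>"""


def sibling_relationships(struct_map):
    """Derive (file, "is sibling of", file) triples from <div>s sharing a parent.

    Only the files pointed to directly by each sibling <div> are paired;
    deeper descendants are handled when their own parent is visited.
    """
    triples = []
    for parent in struct_map.iter():
        divs = sorted(parent.findall(M + "div"), key=lambda d: int(d.get("ORDER", "0")))
        files = [[ptr.get("FILEID") for ptr in d.findall(M + "fptr")] for d in divs]
        for own, other in permutations(files, 2):
            for a, b in product(own, other):
                triples.append((a, "is sibling of", b))
    return triples


print(sibling_relationships(ET.fromstring(STRUCT_MAP)))
# [('FID1', 'is sibling of', 'FID2'), ('FID2', 'is sibling of', 'FID1')]
```

FID9 produces no triple because the "text" `<div>` has no sibling at the top level of this structMap.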

Some METS profiles with PREMIS

• UCSD simple and complex object
• UC Berkeley
• ECHO Dep Generic METS Profile for Preservation and Digital Repository Interoperability
• LC Profile for Recorded Events
• Australian METS Profile
• TIPR
• … many others

Additional changes to Guidelines

Make extensibility mechanism consistent with METS:
• significantPropertiesExtension
• objectCharacteristicsExtension
• creatingApplicationExtension
• environmentExtension
• signatureInformationExtension
• eventOutcomeDetailExtension
• rightsExtension
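Each of these extension elements works like a METS "socket": an element from another schema, such as MIX for still images, is placed inside it and keeps its own namespace. The minimal sketch below, using Python's standard `xml.etree.ElementTree`, restricts wrapping to the extension points listed above. The PREMIS v2 and MIX 2.0 namespace URIs are assumptions, the MIX payload is intentionally left empty, and `add_extension` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

PREMIS = "info:lc/xmlns/premis-v2"  # assumed PREMIS version 2 namespace
MIX = "http://www.loc.gov/mix/v20"   # assumed MIX 2.0 namespace

# Extension points listed on the slide.
EXTENSION_POINTS = frozenset({
    "significantPropertiesExtension", "objectCharacteristicsExtension",
    "creatingApplicationExtension", "environmentExtension",
    "signatureInformationExtension", "eventOutcomeDetailExtension",
    "rightsExtension",
})


def add_extension(parent, extension, foreign):
    """Plug an element from another schema into a PREMIS *Extension container."""
    if extension not in EXTENSION_POINTS:
        raise ValueError(f"not a PREMIS extension point: {extension}")
    container = ET.SubElement(parent, f"{{{PREMIS}}}{extension}")
    container.append(foreign)  # the foreign element keeps its own namespace
    return container


chars = ET.Element(f"{{{PREMIS}}}objectCharacteristics")
mix = ET.Element(f"{{{MIX}}}mix")  # MIX payload omitted in this sketch
add_extension(chars, "objectCharacteristicsExtension", mix)
```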

Additional changes to Guidelines (cont.)

Add the same elements and attributes as in METS to PREMIS extension elements in schema and data dictionary:
• mdRef, mdWrap
• binData, xmlData
• Attributes: ID, LABEL, MDTYPE, MIMETYPE, SIZE, CREATED, CHECKSUM, CHECKSUMTYPE

Allow URI or string for MDTYPE

Add use cases/examples to illustrate choices made

Clarify structural relationships

Implementing an Exchange Standard

PREMIS Implementation Tool
• Some tools documented on the PREMIS website: http://www.loc.gov/standards/premis/tools_for_premis.php
• PiM tool developed by the Florida Center for Library Automation
• Further work to generate metadata from digital files in PREMIS elements