
Source: Global Digital Format Registry (GDFR) - Harvard Library, hul.harvard.edu/gdfr/documents/DataModel-v4-2004-01-12.doc

Global Digital Format Registry (GDFR)
Data Model v.4
Rev. 2004-01-12

1 Introduction

The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, conditioned on a format-specific basis. The existence of a sustainable registry of authoritative representation information about digital formats has been identified as a crucial component of the research agenda for effective digital preservation [NSF-DELOS]. The DLF has sponsored a series of invitational workshops to investigate the technical and policy questions surrounding the establishment of a Global Digital Format Registry (GDFR).

2 Scope

The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats.

3 Definitions

Format. A fixed, byte-serialized encoding of an information model.

Information model. A formal expression of exchangeable knowledge [ISO 14721].

Representation information. Information that maps formatted content streams into more meaningful concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO 14721].

4 Data Types

4.1 Primitive Data Types

ByteStream. A sequence of arbitrary octets.

Enumeration. A set of unique values.

Integer. An integer numeric value.

String. A sequence of characters represented in the UTF-8 encoding [UTF-8].

4.2 Derived Data Types

Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as constrained by [Wolf].

Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].

MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].

NonNegative. A non-negative integer, i.e., 0, 1, 2, …

Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].

URI. A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].
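As an illustration of how the derived data types constrain their primitive base types, the following Python sketch checks three of them. It is not part of the data model: the function names are hypothetical, the Date check covers only the syntax of the profile cited as [Wolf] (it does not reject impossible calendar dates such as February 30), and the Telephone check assumes the common international form of a plus sign followed by at most 15 digits.

```python
import re

# Syntax of the W3C date-time profile cited as [Wolf]: YYYY, YYYY-MM,
# YYYY-MM-DD, or a full date plus time with a required time zone designator.
_W3C_DATE = re.compile(
    r"\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?"
)

# Assumption: international notation, "+" followed by 2 to 15 digits.
_E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_date(value: str) -> bool:
    """Syntactic check of a Date string (no calendar validation)."""
    return _W3C_DATE.fullmatch(value) is not None


def is_non_negative(value: object) -> bool:
    """NonNegative: an Integer restricted to 0, 1, 2, ..."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_telephone(value: str) -> bool:
    """Simplified check of a Telephone string (see assumption above)."""
    return _E164.fullmatch(value) is not None


print(is_date("2004-01-12"))        # True
print(is_date("2004-01-12T10:30"))  # False: time given without a time zone
print(is_non_negative(-1))          # False
```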

5 Data Model

All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and definition. Obligation is indicated as: 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional. Cardinality is indicated as 'R' for (arbitrarily) repeatable.
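The five attributes can be pictured as a small record type. The Python sketch below is only an illustration under stated assumptions: `PropertyDef`, `PLATFORM`, and `check` are hypothetical names, the schema shown is the Platform property from section 5.1, and the record values are invented. Only the 'M' obligation and the 'R' cardinality are checked, since 'MA' depends on context that cannot be tested mechanically.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDef:
    name: str
    type: str
    obligation: str           # 'M', 'MA', or 'O'
    repeatable: bool = False  # True when cardinality is 'R'
    definition: str = ""


# The Platform property from section 5.1, transcribed as data.
PLATFORM = [
    PropertyDef("Name", "String", "M", definition="Platform name"),
    PropertyDef("Version", "String", "M", definition="Version identifier"),
    PropertyDef("Release", "Date", "M", definition="Release date"),
    PropertyDef("Vendor", "Agent", "O", definition="Vendor"),
    PropertyDef("Note", "String", "O", True, "Informative note"),
    PropertyDef("LastModified", "Date", "M",
                definition="Modification date/timestamp"),
]


def check(record: dict, schema: list[PropertyDef]) -> list[str]:
    """Report missing mandatory values and repeated non-repeatable values."""
    problems = []
    for prop in schema:
        value = record.get(prop.name)
        if value is None:
            if prop.obligation == "M":
                problems.append(f"{prop.name}: mandatory property is missing")
            continue
        if isinstance(value, list) and not prop.repeatable:
            problems.append(f"{prop.name}: property is not repeatable")
    return problems


# Invented example record; Release is deliberately left out.
record = {"Name": "ExampleOS", "Version": "1.0", "LastModified": "2004-01-12"}
print(check(record, PLATFORM))  # ['Release: mandatory property is missing']
```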

5.1 Primitive Properties

Access
  Type             Enumeration    M      Access type:
      Escrow       Inaccessible copy on file
      License      Access by license only
      On-site      On-site access only
      Public       Unrestricted access
      Restricted   No access
      Other        Requires informative note
  Start            Date           O      Starting date
  End              Date           O      Ending date
  Note             String         MA  R  Informative note
  LastModified     Date           M      Modification date/timestamp

Agent
  Name             String         M      Personal or corporate name of agent
  Type             Enumeration    M      Agent type:
      Commercial     Commercial (for-profit) entity
      Government     Governmental agency
      Education      Educational institution
      Non-profit     Non-profit entity
      Professional   Professional organization
      Standard       Accredited standards body
      Trade          Trade association
      Other          Requires informative note
  Address          String         O      Postal address
  Telephone        Telephone      O      Telephone number
  Fax              Telephone      O      Facsimile number
  Email            Email          O      Email address
  Web              URI            O      Web site
  Note             String         MA  R  Informative note
  LastModified     Date           M      Modification date/timestamp

Application
  Name             String         M      Application name
  Version          String         M      Version identifier
  Release          Date           M      Release date
  Vendor           Agent          O      Vendor
  Process          Process        O   R  Process
  HWDependency     Platform       O   R  Hardware dependency
  SWDependency     Application    O   R  Software dependency
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Authority
  Agent            Agent          M      Authority agent
  Start            Date           MA     Starting date of effective authority
  End              Date           MA     Ending date of effective authority
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Class
  Identifier       Cognomen       M      Class identifier
  Description      String         M      Description
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Cognomen
  Value            String         M      Cognomen value
  Type             Enumeration    M      Cognomen type:
      AFNOR         AFNOR standard
      ANSI          ANSI standard
      ARK           CDL Archival Resource Key
      BSI           BSI standard
      CCITT         CCITT standard
      DDC           Dewey Decimal Classification
      DOI           Digital Object Identifier
      ECMA          ECMA standard
      GDFRClass     GDFR classification identifier
      GDFRFormat    GDFR format identifier
      GDFRRegistry  GDFR registry identifier
      Handle        CNRI handle
      Informal      No defined syntax or embedded semantics
      ISO           ISO standard
      ISBN          International Standard Book Number
      ISSN          International Standard Serial Number
      ITU           ITU recommendation
      JEITA         JEITA standard
      LCC           Library of Congress Classification
      LCCN          Library of Congress Control Number
      MIME          MIME media type [MIME]
      NISO          NISO standard
      PII           Publisher Item Identifier [PII]
      PURL          Persistent URL
      RFC           IETF Request for Comments
      SICI          Serial Item and Contribution Identifier [SICI]
      TOM           Typed Object Model identifier
      UUID/GUID     Universally/globally-unique Identifier [UUID]
      URI           Uniform Resource Identifier [URI]
      URL           Uniform Resource Locator
      URN           Uniform Resource Name [URN]
      Other         Requires informative note
  Note             String         MA  R  Informative note
  LastModified     Date           M      Modification date/timestamp

Document
  Title            String         M      Document title
  Type             Enumeration    M      Document type:
      Article
      Correspondence
      Manual
      Monograph
      Report
      Standard
      Thesis
      Web
      Other          Requires informative note
  Author           Agent          O   R  Author
  Edition          String         O      Edition
  Publisher        Agent          O   R  Publisher
  Date             Date           O      Publication date
  Accessibility    Access         M   R  Access regime
  Identifier       Cognomen       O   R  Identifier
  Note             String         MA  R  Informative note
  LastModified     Date           M      Modification date/timestamp

Event
  Agent            Agent          M      Agent effecting the event
  Type             Enumeration    M      Event type:
      Delete         Deletion of a format
      Initial        Initial registration of a format
      Obsolescence   Declaration of format obsolescence
      Update         Update format representation information
      Other          Requires informative note
  Scope            Enumeration    M      Scope of the event:
      Editorial      Non-substantive editorial change
      Technical      Substantive technical change
  Review           Enumeration    M      Review type:
      Full           Full technical review
      Partial        Requires informative note
      None           No review
  Date             Date           M      Date/timestamp
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Interface
  Protocol         Enumeration    M      Interface protocol:
      HTTP
      .NET
      RMI            Remote method invocation
      SOAP           Web Service
      Other          Requires informative note
  Connection       String         MA     Protocol-specific connection parameters
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Ontology
  Class            Class          M      Ontological class
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Platform
  Name             String         M      Platform name
  Version          String         M      Version identifier
  Release          Date           M      Release date
  Vendor           Agent          O      Vendor
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Process
  Type             Enumeration    M      Process type:
      Create         Create new instantiation of formatted object
      Render         Media type-specific rendering of formatted object
      TransformFrom  Requires source auxiliary format
      TransformTo    Requires target auxiliary format
      Validate       Validation of formatted object
      Other          Requires informative note
  Auxiliary        Cognomen       MA  R  Source or target format of transformation
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Registry
  Identifier       Cognomen       M      Registry identifier
  Service          Service        M   R  Supported GDFR service
  LastHarvestedBy  Date           O      Date/timestamp of last harvest by this registry
  LastHarvest      Date           O      Date/timestamp of last harvest of this registry
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Relation
  Identifier       Cognomen       M      Target format identifier
  Registry         Cognomen       O      Target registry identifier
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Service
  Type             Enumeration    M      Service type:
      Approval         Technical review
      Description      Query for specific format
      Export           Bulk export of registry data
      Introspection    Information about registry instance
      Maintenance      Maintain format representation information
      Notification
      Synchronization  Distributed synchronization
  Interface        Interface      M   R  Service interface
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

Signature
  Value            ByteStream     M      Signature value
  Obligation       Enumeration    M      Signature obligation:
      Mandatory
      MandatoryIfApplicable  Requires informative note
      Optional
  Note             String         MA  R  Informative note
  LastModified     Date           M      Modification date/timestamp

5.2 Derived Properties

Derived properties inherit all of the attributes of their parent.

ExternalSignature IS-A Signature
  Type             Enumeration    M      External signature type:
      Extension      File extension
      Type           Mac OS data type
      Other          Requires informative note

FormatRelation IS-A Relation
  Type             Enumeration    M      Format relation type:
      EquivalentTo           Equivalent to target
      IsPreviousVersionOf    Previous version of target
      IsSubsequentVersionOf  Subsequent version of target
      IsSubtypeOf            Subtype of target
      IsSupertypeOf          Supertype (parent) of target
      MayContain             May encapsulate target
      UsedBy                 May be encapsulated by target
      Other                  Requires informative note

InternalSignature IS-A Signature
  Position         Enumeration    M      Signature position:
      Fixed          Fixed position; requires offset
      Arbitrary      Arbitrary position
  Offset           NonNegative    MA     Byte offset
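To show how the Position and Offset attributes of an internal signature might be applied, here is a minimal Python sketch. The function name and its parameters are hypothetical, and the data model itself does not prescribe a matching procedure. The sample bytes use the well-known "%PDF-" file header purely as an illustration; that value does not come from this document.

```python
from typing import Optional


def matches_internal_signature(
    data: bytes,
    value: bytes,
    position: str,
    offset: Optional[int] = None,
) -> bool:
    """Test a byte stream against an internal signature.

    position is 'Fixed' (the signature must start at the given byte
    offset) or 'Arbitrary' (the signature may occur anywhere).
    """
    if position == "Fixed":
        if offset is None or offset < 0:
            raise ValueError("Fixed position requires a non-negative offset")
        return data[offset:offset + len(value)] == value
    if position == "Arbitrary":
        return value in data
    raise ValueError(f"unknown signature position: {position!r}")


sample = b"%PDF-1.4\n...rest of file..."
print(matches_internal_signature(sample, b"%PDF-", "Fixed", 0))  # True
print(matches_internal_signature(sample, b"rest", "Arbitrary"))  # True
print(matches_internal_signature(sample, b"rest", "Fixed", 0))   # False
```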

Person IS-A Agent
  Title            String         O      Personal title
  Affiliation      Agent          O      Organizational affiliation
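The IS-A relationship maps naturally onto class inheritance. The following Python sketch models Person as a subclass of Agent; it is one possible rendering, not a prescribed binding. The attribute names are lower-cased versions of the property names, dates are kept as plain strings, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    # Mandatory (M) attributes come first, without defaults.
    name: str
    type: str
    last_modified: str
    # Optional (O) attributes default to None.
    address: Optional[str] = None
    telephone: Optional[str] = None
    fax: Optional[str] = None
    email: Optional[str] = None
    web: Optional[str] = None
    # Repeatable (R) attribute, modelled as a list.
    notes: List[str] = field(default_factory=list)


@dataclass
class Person(Agent):
    # Person inherits every Agent attribute and adds two optional ones.
    title: Optional[str] = None
    affiliation: Optional[Agent] = None


library = Agent(name="Example Library", type="Education",
                last_modified="2004-01-12")
person = Person(name="A. Author", type="Education",
                last_modified="2004-01-12", title="Dr.",
                affiliation=library)
print(isinstance(person, Agent))  # True
print(person.affiliation.name)    # Example Library
```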

5.3 Registry Properties

GDFR IS-A Registry
  Version          String         M      Version identifier for registry code base and data model
  Date             Date           M      Build date for registry code base and data model
  Aegis            Authority      M   R  Responsible authority
  ExternalRegistry Registry       O   R  Known external registry
  Ontology         Ontology       M      Ontological classification scheme
  Format           Format         O   R  Format representation information

5.4 Format Properties

Format
  Identifier       Cognomen       M      Format canonical identifier
  Description      String         M      Short description of format
  Alias            Cognomen       O   R  Variant identifier
  Version          String         O      Format version identifier
  Author           Agent          O   R  Author
  Owner            Authority      M   R  Legal owner
  Maintainer       Authority      O   R  Maintainer
  Classification   Cognomen       O   R  Ontological classification
  Relationship     FormatRelation O   R  Typed relationship with other format
  Specification    Document       M   R  Specification document
  Signature        Signature      O   R  External or internal signature
  Application      Application    O   R  Application system using format
  Provenance       Event          M   R  Provenance event
  Note             String         O   R  Informative note
  LastModified     Date           M      Modification date/timestamp

6 Identifiers

GDFR requires three types of identifiers: for ontological classifications, formats, and registries. If these identifiers serve strictly for identification, i.e., no resolution is necessary, they should be defined in a registered gdfr namespace of the info URI scheme [INFO]:

info:gdfr/c/classid
info:gdfr/f/formatid
info:gdfr/r/registryid

If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme [URN]:

urn:gdfr:c:classid
urn:gdfr:f:formatid
urn:gdfr:r:registryid
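The two identifier syntaxes differ only in their prefix and separator, which the Python sketch below makes explicit. The helper names are hypothetical, and the sketch assumes that local identifiers are opaque strings used verbatim; it does not apply any escaping or case normalization that the info or URN schemes may require.

```python
from typing import Tuple

# Single-letter codes taken from the patterns above.
_CODES = {"class": "c", "format": "f", "registry": "r"}
_KINDS = {code: kind for kind, code in _CODES.items()}


def make_identifier(kind: str, local_id: str, resolvable: bool = False) -> str:
    """Build an info: identifier, or a urn: identifier if resolution is desired."""
    code = _CODES[kind]
    if resolvable:
        return f"urn:gdfr:{code}:{local_id}"
    return f"info:gdfr/{code}/{local_id}"


def parse_identifier(identifier: str) -> Tuple[str, str, str]:
    """Split an identifier into (scheme, kind, local_id)."""
    for scheme, prefix, sep in (("info", "info:gdfr/", "/"),
                                ("urn", "urn:gdfr:", ":")):
        if identifier.startswith(prefix):
            code, found, local_id = identifier[len(prefix):].partition(sep)
            if found and code in _KINDS and local_id:
                return scheme, _KINDS[code], local_id
    raise ValueError(f"not a GDFR identifier: {identifier!r}")


print(make_identifier("format", "formatid"))        # info:gdfr/f/formatid
print(make_identifier("format", "formatid", True))  # urn:gdfr:f:formatid
print(parse_identifier("urn:gdfr:r:registryid"))    # ('urn', 'registry', 'registryid')
```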

References

[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, Internet draft, December 2003 <http://www.ietf.org/internet-drafts/draft-vandesompel-info-uri-01.txt>.

[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.

[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for information interchange.

[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of dates and times.

[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements – Part 3: basic attributes of data elements.

[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system – Reference model <http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>.

[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, November 1996 <http://www.ietf.org/rfc/rfc2046.txt>.

[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation, 2003 <http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf>.

[PII] Elsevier Science, Publisher Item Identifier as a means of document identification <http://www.elsevier.nl/inca/homepage/about/pii>.

[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).

[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998 <http://www.ietf.org/rfc/rfc2396.txt>.

[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001 <http://www.ietf.org/rfc/rfc2821.txt>.

[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call (RPC).

[URN] R. Moats, URN Syntax, RFC 2141, May 1997 <http://www.ietf.org/rfc/rfc2141.txt>.

[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).

[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997 <http://www.w3.org/TR/NOTE-datetime>.
