Global Digital Format Registry (GDFR)
Data Model v.4
Rev. 2004-01-12
1 Introduction
The concept of format permeates all technical areas of digital preservation and repositories. Policy and processing decisions regarding ingest, storage, access, and preservation are frequently, if not uniformly, made on a format-specific basis. The existence of a sustainable registry of authoritative representation information about digital formats has been identified as a crucial component of the research agenda for effective digital preservation [NSF-DELOS]. The Digital Library Federation (DLF) has sponsored a series of invitational workshops to investigate the technical and policy questions surrounding the establishment of a Global Digital Format Registry (GDFR).
2 Scope
The Global Digital Format Registry (GDFR) will maintain persistent, unambiguous bindings between public identifiers for digital formats and representation information for those formats.
3 Definitions
Format. A fixed, byte-serialized encoding of an information model.
Information model. A formal expression of exchangeable knowledge [ISO 14721].
Representation information. Information that maps formatted content streams into more meaningful concepts; in the narrower scope of GDFR, the significant syntactic and semantic properties of formats [ISO 14721].
4 Data Types
4.1 Primitive Data Types
ByteStream. A sequence of arbitrary octets.
Enumeration. A set of unique values.
Integer. An integer numeric value.
String. A sequence of characters represented in the UTF-8 encoding [UTF-8].
4.2 Derived Data Types
Date. A time and date in the Gregorian calendar represented as an ISO 8601-encoded string [ISO 8601] as constrained by [Wolf].
Email. An SMTP email address represented as an RFC 2821-encoded string [SMTP].
MIME. A MIME media type represented as an RFC 2046-encoded string [MIME].
NonNegative. A non-negative integer, i.e., 0, 1, 2, …
Telephone. A telephone number represented as an ITU-T E.164-encoded string [ITU E.164].
URI. A Uniform Resource Identifier represented as an RFC 2396-encoded string [URI].
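To make some of the derived types concrete, here is a minimal Python sketch of validators for four of them. The function names are hypothetical, and the patterns are simplified approximations of [Wolf], [MIME], and [ITU E.164] rather than complete grammars. In particular, telephone numbers are assumed to be written with a leading "+".

```python
import re
from datetime import datetime

# Simplified patterns for some GDFR derived data types (illustrative only).

# W3C date/time profile [Wolf]: YYYY, YYYY-MM, YYYY-MM-DD, or a full date
# followed by Thh:mm[:ss[.s+]] and a time zone designator (Z or +/-hh:mm).
_DATE = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)

# MIME media type [MIME]: type "/" subtype; parameters are not handled.
_MIME = re.compile(r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*")

# E.164 numbers have at most 15 digits; a leading "+" is assumed here.
_TELEPHONE = re.compile(r"\+[1-9]\d{1,14}")


def is_date(value: str) -> bool:
    """Check the W3C profile shape, then that the calendar date exists.

    Time-of-day ranges (e.g. hour 25) are not checked.
    """
    if _DATE.fullmatch(value) is None:
        return False
    parts = value[:10].split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    try:
        datetime(year, month, day)
    except ValueError:
        return False
    return True


def is_mime(value: str) -> bool:
    return _MIME.fullmatch(value) is not None


def is_non_negative(value: object) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_telephone(value: str) -> bool:
    return _TELEPHONE.fullmatch(value) is not None
```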
5 Data Model
All property attributes are defined in the data model in terms of their name, type, obligation, cardinality, and definition. Obligation is indicated as: 'M' for mandatory, 'MA' for mandatory-if-applicable, and 'O' for optional. Cardinality is indicated as 'R' for (arbitrarily) repeatable.
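The attribute notation lends itself to a direct data representation. The following sketch uses hypothetical names that are not part of the data model. It records each attribute's name, type, obligation, and cardinality, and checks a record against them. "MA" is left unenforced because applicability depends on context:

```python
from dataclasses import dataclass
from enum import Enum


class Obligation(Enum):
    MANDATORY = "M"
    MANDATORY_IF_APPLICABLE = "MA"
    OPTIONAL = "O"


@dataclass(frozen=True)
class AttributeDef:
    name: str
    datatype: str
    obligation: Obligation
    repeatable: bool  # True when the cardinality column shows "R"
    definition: str


def check_record(defs: list[AttributeDef], record: dict) -> list[str]:
    """Return a list of problems found in a record (empty if none).

    A record is assumed to map attribute names to a value, or to a list of
    values for repeatable attributes. Whether an "MA" attribute applies
    depends on context, so this generic check does not enforce it.
    """
    problems = []
    known = {d.name for d in defs}
    for d in defs:
        value = record.get(d.name)
        present = value is not None and value != []
        if d.obligation is Obligation.MANDATORY and not present:
            problems.append(f"{d.name}: mandatory attribute missing")
        if isinstance(value, list) and len(value) > 1 and not d.repeatable:
            problems.append(f"{d.name}: not repeatable")
    for name in record:
        if name not in known:
            problems.append(f"{name}: unknown attribute")
    return problems


# Example: the attributes of the Authority property.
AUTHORITY = [
    AttributeDef("Agent", "Agent", Obligation.MANDATORY, False, "Authority agent"),
    AttributeDef("Start", "Date", Obligation.MANDATORY_IF_APPLICABLE, False,
                 "Starting date of effective authority"),
    AttributeDef("End", "Date", Obligation.MANDATORY_IF_APPLICABLE, False,
                 "Ending date of effective authority"),
    AttributeDef("Note", "String", Obligation.OPTIONAL, True, "Informative note"),
    AttributeDef("LastModified", "Date", Obligation.MANDATORY, False,
                 "Modification date/timestamp"),
]

# A plain string stands in for an Agent record here.
print(check_record(AUTHORITY, {"Agent": "ISO"}))
# ['LastModified: mandatory attribute missing']
```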
GDFR Data Model v.4 1
5.1 Primitive Properties
Access
  Type             Enumeration    M     Access type:
      Escrow                 Inaccessible copy on file
      License                Access by license only
      On-site                On-site access only
      Public                 Unrestricted access
      Restricted             No access
      Other                  Requires informative note
  Start            Date           O     Starting date
  End              Date           O     Ending date
  Note             String         MA R  Informative note
  LastModified     Date           M     Modification date/timestamp
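Several enumerations in this section mark their Other value as requiring an informative note. In that situation, a mandatory-if-applicable Note becomes mandatory. Below is a minimal sketch for the Access property, assuming records arrive as plain strings and lists; the helper is hypothetical:

```python
from enum import Enum


class AccessType(Enum):
    ESCROW = "Escrow"
    LICENSE = "License"
    ON_SITE = "On-site"
    PUBLIC = "Public"
    RESTRICTED = "Restricted"
    OTHER = "Other"


def validate_access(type_value: str, notes: list[str]) -> AccessType:
    """Check the Type/Note rule of an Access record and return its type.

    Hypothetical helper: type_value is the string used in the data model,
    and notes is the (possibly empty) list of Note values.
    """
    access_type = AccessType(type_value)  # ValueError for an unknown value
    # "Other" requires an informative note, so the MA Note applies here.
    if access_type is AccessType.OTHER and not any(n.strip() for n in notes):
        raise ValueError("Access type 'Other' requires an informative note")
    return access_type
```

The same pattern would apply to the Agent, Cognomen, Document, Event, and other enumerations that carry an Other value.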
Agent
  Name             String         M     Personal or corporate name of agent
  Type             Enumeration    M     Agent type:
      Commercial             Commercial (for-profit) entity
      Government             Governmental agency
      Education              Educational institution
      Non-profit             Non-profit entity
      Professional           Professional organization
      Standard               Accredited standards body
      Trade                  Trade association
      Other                  Requires informative note
  Address          String         O     Postal address
  Telephone        Telephone      O     Telephone number
  Fax              Telephone      O     Facsimile number
  Email            Email          O     Email address
  Web              URI            O     Web site
  Note             String         MA R  Informative note
  LastModified     Date           M     Modification date/timestamp
Application
  Name             String         M     Application name
  Version          String         M     Version identifier
  Release          Date           M     Release date
  Vendor           Agent          O     Vendor
  Process          Process        O R   Process
  HWDependency     Platform       O R   Hardware dependency
  SWDependency     Application    O R   Software dependency
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Authority
  Agent            Agent          M     Authority agent
  Start            Date           MA    Starting date of effective authority
  End              Date           MA    Ending date of effective authority
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
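The Start and End attributes bound the period in which an Authority is effective. A small sketch of that check follows. Two choices here are assumptions rather than rules stated in the data model: a missing bound is treated as open-ended, and both bounds are treated as inclusive.

```python
from datetime import date
from typing import Optional


def authority_in_effect(on: date, start: Optional[date] = None,
                        end: Optional[date] = None) -> bool:
    """Return True if an Authority is in effect on the given date.

    Assumptions: an absent Start or End leaves the interval open on that
    side, and both bounds are inclusive.
    """
    if start is not None and on < start:
        return False
    if end is not None and on > end:
        return False
    return True


# GDFR Date values are ISO 8601 strings; plain dates parse with fromisoformat.
print(authority_in_effect(date.fromisoformat("2004-01-12"),
                          start=date.fromisoformat("2003-01-01")))  # True
```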
Class
  Identifier       Cognomen       M     Class identifier
  Description      String         M     Description
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Cognomen
  Value            String         M     Cognomen value
  Type             Enumeration    M     Cognomen type:
      AFNOR                  AFNOR standard
      ANSI                   ANSI standard
      ARK                    CDL Archival Resource Key
      BSI                    BSI standard
      CCITT                  CCITT standard
      DDC                    Dewey Decimal Classification
      DOI                    Digital Object Identifier
      ECMA                   ECMA standard
      GDFRClass              GDFR classification identifier
      GDFRFormat             GDFR format identifier
      GDFRRegistry           GDFR registry identifier
      Handle                 CNRI handle
      Informal               No defined syntax or embedded semantics
      ISO                    ISO standard
      ISBN                   International Standard Book Number
      ISSN                   International Standard Serial Number
      ITU                    ITU recommendation
      JEITA                  JEITA standard
      LCC                    Library of Congress Classification
      LCCN                   Library of Congress Control Number
      MIME                   MIME media type [MIME]
      NISO                   NISO standard
      PII                    Publisher Item Identifier [PII]
      PURL                   Persistent URL
      RFC                    IETF Request for Comments
      SICI                   Serial Item and Contribution Identifier [SICI]
      TOM                    Typed Object Model identifier
      UUID/GUID              Universally/globally unique identifier [UUID]
      URI                    Uniform Resource Identifier [URI]
      URL                    Uniform Resource Locator
      URN                    Uniform Resource Name [URN]
      Other                  Requires informative note
  Note             String         MA R  Informative note
  LastModified     Date           M     Modification date/timestamp
Document
  Title            String         M     Document title
  Type             Enumeration    M     Document type:
      Article
      Correspondence
      Manual
      Monograph
      Report
      Standard
      Thesis
      Web
      Other                  Requires informative note
  Author           Agent          O R   Author
  Edition          String         O     Edition
  Publisher        Agent          O R   Publisher
  Date             Date           O     Publication date
  Accessibility    Access         M R   Access regime
  Identifier       Cognomen       O R   Identifier
  Note             String         MA R  Informative note
  LastModified     Date           M     Modification date/timestamp
Event
  Agent            Agent          M     Agent effecting the event
  Type             Enumeration    M     Event type:
      Delete                 Deletion of a format
      Initial                Initial registration of a format
      Obsolescence           Declaration of format obsolescence
      Update                 Update format representation information
      Other                  Requires informative note
  Scope            Enumeration    M     Scope of the event:
      Editorial              Non-substantive editorial change
      Technical              Substantive technical change
  Review           Enumeration    M     Review type:
      Full                   Full technical review
      Partial                Requires informative note
      None                   No review
  Date             Date           M     Date/timestamp
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Interface
  Protocol         Enumeration    M     Interface protocol:
      HTTP
      .NET
      RMI                    Remote method invocation
      SOAP                   Web Service
      Other                  Requires informative note
  Connection       String         MA    Protocol-specific connection parameters
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Ontology
  Class            Class          M     Ontological class
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Platform
  Name             String         M     Platform name
  Version          String         M     Version identifier
  Release          Date           M     Release date
  Vendor           Agent          O     Vendor
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Process
  Type             Enumeration    M     Process type:
      Create                 Create new instantiation of formatted object
      Render                 Media type-specific rendering of formatted object
      TransformFrom          Requires source auxiliary format
      TransformTo            Requires target auxiliary format
      Validate               Validation of formatted object
      Other                  Requires informative note
  Auxiliary        Cognomen       MA R  Source or target format of transformation
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
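The TransformFrom and TransformTo values are the cases in which the mandatory-if-applicable Auxiliary attribute applies. A sketch of that rule follows, with hypothetical names and plain strings standing in for Cognomen values:

```python
from enum import Enum


class ProcessType(Enum):
    CREATE = "Create"
    RENDER = "Render"
    TRANSFORM_FROM = "TransformFrom"
    TRANSFORM_TO = "TransformTo"
    VALIDATE = "Validate"
    OTHER = "Other"


# The enumeration definitions say these two types require an auxiliary format.
NEEDS_AUXILIARY = {ProcessType.TRANSFORM_FROM, ProcessType.TRANSFORM_TO}


def check_process(type_value: str, auxiliary: list[str],
                  notes: list[str]) -> list[str]:
    """Return problems with a Process record (an empty list if none).

    Auxiliary formats are Cognomens in the data model; plain identifier
    strings stand in for them here as a simplification.
    """
    ptype = ProcessType(type_value)
    problems = []
    if ptype in NEEDS_AUXILIARY and not auxiliary:
        problems.append(f"{ptype.value} requires an auxiliary format")
    if ptype is ProcessType.OTHER and not notes:
        problems.append("Other requires an informative note")
    return problems
```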
Registry
  Identifier       Cognomen       M     Registry identifier
  Service          Service        M R   Supported GDFR service
  LastHarvestedBy  Date           O     Date/timestamp of last harvest by this registry
  LastHarvest      Date           O     Date/timestamp of last harvest of this registry
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
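The LastHarvest and LastHarvestedBy timestamps, together with the LastModified timestamp on every property, suggest incremental harvesting between registries. The data model does not define the procedure, so the following is only a sketch. It assumes that a harvester fetches the records modified after the last harvest; the function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def records_to_harvest(records: list[dict],
                       last_harvest: Optional[datetime]) -> list[dict]:
    """Select records whose LastModified is later than the last harvest.

    Assumption: the harvester keeps the LastHarvest timestamp of the source
    registry and compares it with each record's LastModified; the data model
    defines both attributes but not this procedure.
    """
    if last_harvest is None:
        return list(records)  # first harvest: take everything
    return [r for r in records if r["LastModified"] > last_harvest]


# Hypothetical records with timezone-aware timestamps.
RECORDS = [
    {"id": "a", "LastModified": datetime(2004, 1, 10, tzinfo=timezone.utc)},
    {"id": "b", "LastModified": datetime(2004, 1, 12, tzinfo=timezone.utc)},
]
changed = records_to_harvest(RECORDS, datetime(2004, 1, 11, tzinfo=timezone.utc))
print([r["id"] for r in changed])  # ['b']
```

The strict comparison skips a record modified at exactly the last harvest instant. Using `>=` instead would refetch such records, at the cost of occasional duplicates.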
Relation
  Identifier       Cognomen       M     Target format identifier
  Registry         Cognomen       O     Target registry identifier
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Service
  Type             Enumeration    M     Service type:
      Approval               Technical review
      Description            Query for specific format
      Export                 Bulk export of registry data
      Introspection          Information about registry instance
      Maintenance            Maintain format representation information
      Notification
      Synchronization        Distributed synchronization
  Interface        Interface      M R   Service interface
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
Signature
  Value            ByteStream     M     Signature value
  Obligation       Enumeration    M     Signature obligation:
      Mandatory
      MandatoryIfApplicable  Requires informative note
      Optional
  Note             String         MA R  Informative note
  LastModified     Date           M     Modification date/timestamp
5.2 Derived Properties
Derived properties inherit all of the attributes of their parent.
ExternalSignature IS-A Signature
  Type             Enumeration    M     External signature type:
      Extension              File extension
      Type                   Mac OS data type
      Other                  Requires informative note
FormatRelation IS-A Relation
  Type             Enumeration    M     Format relation type:
      EquivalentTo           Equivalent to target
      IsPreviousVersionOf    Previous version of target
      IsSubsequentVersionOf  Subsequent version of target
      IsSubtypeOf            Subtype of target
      IsSupertypeOf          Supertype (parent) of target
      MayContain             May encapsulate target
      UsedBy                 May be encapsulated by target
      Other                  Requires informative note
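The relation types come in natural pairs. If A IsPreviousVersionOf B, then B IsSubsequentVersionOf A; subtype/supertype and MayContain/UsedBy pair up the same way, and EquivalentTo is symmetric. These pairings are inferred from the definitions rather than stated by the data model. A sketch with hypothetical names:

```python
from enum import Enum


class FormatRelationType(Enum):
    EQUIVALENT_TO = "EquivalentTo"
    IS_PREVIOUS_VERSION_OF = "IsPreviousVersionOf"
    IS_SUBSEQUENT_VERSION_OF = "IsSubsequentVersionOf"
    IS_SUBTYPE_OF = "IsSubtypeOf"
    IS_SUPERTYPE_OF = "IsSupertypeOf"
    MAY_CONTAIN = "MayContain"
    USED_BY = "UsedBy"
    OTHER = "Other"


R = FormatRelationType
# Inverse pairs inferred from the definitions (an assumption).
_INVERSE = {
    R.EQUIVALENT_TO: R.EQUIVALENT_TO,
    R.IS_PREVIOUS_VERSION_OF: R.IS_SUBSEQUENT_VERSION_OF,
    R.IS_SUBSEQUENT_VERSION_OF: R.IS_PREVIOUS_VERSION_OF,
    R.IS_SUBTYPE_OF: R.IS_SUPERTYPE_OF,
    R.IS_SUPERTYPE_OF: R.IS_SUBTYPE_OF,
    R.MAY_CONTAIN: R.USED_BY,
    R.USED_BY: R.MAY_CONTAIN,
}


def inverse(relation: FormatRelationType) -> FormatRelationType:
    """Return the relation type seen from the target's side.

    Raises KeyError for Other, whose meaning is given only by a note.
    """
    return _INVERSE[relation]


def reciprocal(source: str, relation: FormatRelationType, target: str):
    """Restate 'source relation target' as 'target inverse source'."""
    return target, inverse(relation), source
```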
InternalSignature IS-A Signature
  Position         Enumeration    M     Signature position:
      Fixed                  Fixed position; requires offset
      Arbitrary              Arbitrary position
  Offset           NonNegative    MA    Byte offset
Person IS-A Agent
  Title            String         O     Personal title
  Affiliation      Agent          O     Organizational affiliation
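The IS-A relationships map naturally onto class inheritance. The sketch below uses hypothetical class names and Python dataclasses. It models InternalSignature as a subclass of Signature and shows one way Fixed and Arbitrary positions might be tested against a byte stream. The PNG signature serves only as an illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Signature:
    # Attributes common to every signature (Value, Obligation, Note).
    value: bytes
    obligation: str  # "Mandatory", "MandatoryIfApplicable", or "Optional"
    notes: list[str] = field(default_factory=list)


@dataclass
class InternalSignature(Signature):
    # IS-A Signature: inherits value, obligation, and notes, and adds these.
    position: str = "Arbitrary"   # "Fixed" or "Arbitrary"
    offset: Optional[int] = None  # byte offset; applicable when Fixed

    def matches(self, data: bytes) -> bool:
        """Return True if the signature value occurs in data as specified."""
        if self.position == "Fixed":
            if self.offset is None:
                raise ValueError("a Fixed signature requires an offset")
            end = self.offset + len(self.value)
            return data[self.offset:end] == self.value
        return self.value in data  # Arbitrary: anywhere in the stream


# Example: the eight-byte PNG signature at offset 0.
PNG = InternalSignature(b"\x89PNG\r\n\x1a\n", "Mandatory", position="Fixed", offset=0)
print(PNG.matches(b"\x89PNG\r\n\x1a\n" + bytes(8)))  # True
```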
5.3 Registry Properties
GDFR IS-A Registry
  Version          String         M     Version identifier for registry code base and data model
  Date             Date           M     Build date for registry code base and data model
  Aegis            Authority      M R   Responsible authority
  ExternalRegistry Registry       O R   Known external registry
  Ontology         Ontology       M     Ontological classification scheme
  Format           Format         O R   Format representation information
5.4 Format Properties
Format
  Identifier       Cognomen       M     Format canonical identifier
  Description      String         M     Short description of format
  Alias            Cognomen       O R   Variant identifier
  Version          String         O     Format version identifier
  Author           Agent          O R   Author
  Owner            Authority      M R   Legal owner
  Maintainer       Authority      O R   Maintainer
  Classification   Cognomen       O R   Ontological classification
  Relationship     FormatRelation O R   Typed relationship with other format
  Specification    Document       M R   Specification document
  Signature        Signature      O R   External or internal signature
  Application      Application    O R   Application system using format
  Provenance       Event          M R   Provenance event
  Note             String         O R   Informative note
  LastModified     Date           M     Modification date/timestamp
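A format has one canonical Identifier but any number of Alias values, so a registry client will likely need to look formats up by either. Below is a sketch of such an index, with Cognomens modeled as (type, value) pairs; the helper names and identifiers are hypothetical. Canonical identifiers are required to be unique, in keeping with the unambiguous bindings described in the Scope section. An alias such as a MIME type may name several formats:

```python
from collections import defaultdict
from dataclasses import dataclass, field

# A Cognomen is modeled here as a (type, value) pair, e.g. ("MIME", "image/tiff").
Cognomen = tuple[str, str]


@dataclass
class Format:
    identifier: Cognomen  # canonical identifier
    description: str
    aliases: list[Cognomen] = field(default_factory=list)  # variant identifiers


def build_index(formats: list[Format]) -> dict[Cognomen, list[Format]]:
    """Map each canonical identifier and alias to the formats it names.

    Canonical identifiers must be unique (ValueError otherwise); an alias
    such as a MIME type may be shared, so every lookup returns a list.
    """
    index: dict[Cognomen, list[Format]] = defaultdict(list)
    seen: set[Cognomen] = set()
    for fmt in formats:
        if fmt.identifier in seen:
            raise ValueError(f"duplicate canonical identifier {fmt.identifier!r}")
        seen.add(fmt.identifier)
        # dict.fromkeys drops a repeated key while keeping order.
        for key in dict.fromkeys([fmt.identifier, *fmt.aliases]):
            index[key].append(fmt)
    return dict(index)


# Hypothetical entries for two versions that share a MIME alias.
TIFF5 = Format(("GDFRFormat", "example-tiff-5"), "Example TIFF 5.0 entry", [("MIME", "image/tiff")])
TIFF6 = Format(("GDFRFormat", "example-tiff-6"), "Example TIFF 6.0 entry", [("MIME", "image/tiff")])
INDEX = build_index([TIFF5, TIFF6])
print([f.identifier[1] for f in INDEX[("MIME", "image/tiff")]])
# ['example-tiff-5', 'example-tiff-6']
```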
6 Identifiers
GDFR requires three types of identifiers: for ontological classifications, for formats, and for registries. If these identifiers serve strictly for identification, i.e., no resolution is necessary, they should be defined in a registered gdfr namespace of the info URI scheme [INFO]:
  info:gdfr/c/classid
  info:gdfr/f/formatid
  info:gdfr/r/registryid
If resolution is desired, then the identifiers should be defined in a registered gdfr namespace of the URN scheme [URN]:
  urn:gdfr:c:classid
  urn:gdfr:f:formatid
  urn:gdfr:r:registryid
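Here is a minimal sketch of building and parsing identifiers in the two forms above. It assumes the gdfr namespaces are registered as proposed. The function names and the "example" local ids are hypothetical, and the local part of the identifier is not validated:

```python
import re

# Kind codes used in the identifier patterns.
KIND_CODES = {"class": "c", "format": "f", "registry": "r"}

_INFO = re.compile(r"info:gdfr/([cfr])/(.+)")
_URN = re.compile(r"urn:gdfr:([cfr]):(.+)")


def make_identifier(kind: str, local_id: str, resolvable: bool = False) -> str:
    """Build an identifier: URN form if resolution is wanted, else info form."""
    code = KIND_CODES[kind]
    if resolvable:
        return f"urn:gdfr:{code}:{local_id}"
    return f"info:gdfr/{code}/{local_id}"


def parse_identifier(identifier: str) -> tuple[str, str]:
    """Return (kind code, local id) for either form.

    Matching is case-sensitive and the local id is not validated; both are
    simplifications.
    """
    for pattern in (_INFO, _URN):
        m = pattern.fullmatch(identifier)
        if m:
            return m.group(1), m.group(2)
    raise ValueError(f"not a GDFR identifier: {identifier!r}")
```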
References
[INFO] H. Van de Sompel, T. Hammond, E. Neylon, and S. L. Weibel, The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces, Internet draft, December 2003 <http://www.ietf.org/internet-drafts/draft-vandesompel-info-uri-01.txt>.
[ITU E.164] ITU-T E.164, The international public telecommunications numbering plan, May 1997.
[ISO 6093] ISO 6093:1985, Information processing – Representation of numerical values in character strings for information interchange.
[ISO 8601] ISO 8601:1997, Data elements and interchange formats – Information interchange – Representation of dates and times.
[ISO 11179] ISO/IEC 11179-3:2003, Information technology – Specification and standardization of data elements – Part 3: basic attributes of data elements.
[ISO 14721] ISO 14721:2003, Space data and information transfer systems – Open archival information system – Reference model <http://wwwclassic.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf>.
[MIME] N. Freed and N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC 2046, November 1996 <http://www.ietf.org/rfc/rfc2046.txt>.
[NSF-DELOS] M. Hedstrom, S. Ross, et al., Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation, 2003 <http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf>.
[PII] Elsevier Science, Publisher Item Identifier as a means of document identification <http://www.elsevier.nl/inca/homepage/about/pii>.
[SICI] ANSI/NISO Z39.56-1996, Serial Item and Contribution Identifier (SICI).
[URI] T. Berners-Lee, R. Fielding, and L. Masinter, Uniform Resource Identifiers (URI): Generic Syntax, RFC 2396, August 1998 <http://www.ietf.org/rfc/rfc2396.txt>.
[SMTP] J. Klensin, Simple Mail Transfer Protocol, RFC 2821, April 2001 <http://www.ietf.org/rfc/rfc2821.txt>.
[UUID] ISO/IEC 11578:1996, Information technology – Open Systems Interconnection – Remote Procedure Call (RPC).
[URN] R. Moats, URN Syntax, RFC 2141, May 1997 <http://www.ietf.org/rfc/rfc2141.txt>.
[UTF-8] Unicode Consortium, The Unicode Standard, Version 3.0 (Reading: Addison-Wesley, 2000).
[Wolf] M. Wolf and C. Wicksteed, Date and Time Formats, W3C Note, September 15, 1997 <http://www.w3.org/TR/NOTE-datetime>.