![Page 1: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/1.jpg)
Summer Institute on Data Curation:
Digital Preservation & Standards
Jerome McDonough
Asst. Professor, GSLIS
June 4, 2008
![Page 2: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/2.jpg)
I love standards. There are so many of them to choose from.
![Page 3: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/3.jpg)
Standards & Sustainability
Disclosure: Are complete specifications available? For free?
Adoption: To what extent is the standard already used?
Documentation: Is the specification clear and straightforward? Are there additional resources to assist in understanding the standard?
![Page 4: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/4.jpg)
Standards & Sustainability
External Dependencies: To what extent does use of the standard rely on particular hardware or software? On other standards? On other non-standards?
Impact of Patents: If patents cover some or all of the standard, are licensing issues likely to complicate use of the standard?
Technological Protection Measures: Does the standard rely on technological protection measures which will inhibit your ability to preserve data?
Tip of the hat to the Library of Congress Sustainability of Digital Formats site: http://www.digitalpreservation.gov/formats/sustain/sustain.shtml
![Page 5: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/5.jpg)
Part I: How to Operate an Archive
![Page 6: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/6.jpg)
Open Archival Information System Reference Model
Developed by the Consultative Committee for Space Data Systems
Adopted as ISO 14721:2003
Available at http://public.ccsds.org/publications/archive/650x0b1.pdf
Provides definitions of components of an archive, their relationship to each other, a set of mandatory responsibilities for an archive, and both functional and data models.
![Page 7: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/7.jpg)
OAIS Reference Model: Mandatory Responsibilities
Negotiate for and accept information from producers
Obtain sufficient control of information to ensure long-term preservation (including necessary IP permissions and authority to migrate)
Determine which communities should be the Designated Communities and should be able to understand the information provided
Ensure that the information to be preserved is independently understandable to the designated community (i.e., they can understand it without the assistance of the experts who created it).
Follow documented policies and procedures ensuring information is preserved against reasonable contingencies
Make the information available to the designated community
![Page 8: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/8.jpg)
OAIS Functional Model
![Page 9: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/9.jpg)
OAIS Functional Model: Ingest
![Page 10: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/10.jpg)
OAIS Functional Model: Archival Storage
![Page 11: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/11.jpg)
OAIS Functional Model: Data Management
![Page 12: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/12.jpg)
OAIS Functional Model: Access
![Page 13: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/13.jpg)
OAIS Functional Model: Preservation Planning
![Page 14: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/14.jpg)
OAIS Functional Model: Administration
![Page 15: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/15.jpg)
OAIS Data Model
![Page 16: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/16.jpg)
Part II: How to Create Content for an Archive
![Page 17: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/17.jpg)
Archival Content
A syllogism to ponder:
No digital media can be read without a hardware device designed to read the media format.
It is exceedingly rare for a hardware device intended to read a specific digital media format to be manufactured for more than 30 years, and many have had shorter lifespans.
Therefore, if your content is not device independent, it is not really archival.
![Page 18: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/18.jpg)
Archival Content: Text
Some Issues to Consider When Examining Text Standards
Technical aspects of character encoding
Character Repertoire (Script & Language Support)
Line Break Handling & Line Orientation
Indexing
Formatting
Other processing
![Page 19: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/19.jpg)
Archival Content: Text
A Standard for Characters
Unicode 5.1 - ISO/IEC 10646
Two variable-length encodings (UTF-8, UTF-16) and a fixed-length encoding (UTF-32). In UTF-8, byte order is not an issue. In UTF-16 and UTF-32, both big-endian and little-endian encodings are supported.
Over 100K characters, supporting 75 different scripts and many additional symbols and diacritics, with room for expansion to 1,114,112 code points.
Support for a variety of line breaking mechanisms
Support for different text directionality, including algorithms specifying the appropriate handling of text of mixed directionality
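The encoding points above can be checked directly. The following is a minimal Python sketch (the sample characters and the helper name `encoded_lengths` are chosen here purely for illustration): it shows how many bytes one character takes in each encoding, that byte order changes the UTF-16 bytes, and where the 1,114,112 figure comes from.

```python
import sys


def encoded_lengths(ch):
    """Return the byte length of one character in (UTF-8, UTF-16, UTF-32)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


# Illustrative samples: Latin A, e-acute, a CJK ideograph, and a musical
# symbol that lies outside the Basic Multilingual Plane.
samples = ["A", "\u00e9", "\u6f22", "\U0001D11E"]

for ch in samples:
    u8, u16, u32 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8={u8}  UTF-16={u16}  UTF-32={u32} bytes")

# Byte order matters for UTF-16 (and UTF-32) but not for UTF-8.
big_endian = "A".encode("utf-16-be")
little_endian = "A".encode("utf-16-le")
print(big_endian, little_endian)

# The Unicode code space is 17 planes of 65,536 code points each.
CODE_SPACE = 17 * 0x10000
print(CODE_SPACE, sys.maxunicode + 1)
```

UTF-8 and UTF-16 grow with the code point, while UTF-32 always uses four bytes; that is the variable-length versus fixed-length distinction.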
![Page 20: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/20.jpg)
Archival Content: Text
A Standard for Syntax
XML (World Wide Web Consortium)
Standards for Semantics
Chemical Markup Language, Chemical Industry Data Exchange
Astronomical Markup Language, Astronomical Dataset Markup Language, Astronomical Instrument Markup Language
Earth Science Markup Language, Geography Markup Language, NetCDF Markup Language, ArcGIS Markup Language
MathML
Etc., etc., etc.
![Page 21: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/21.jpg)
Archival Content: Images
Some Issues to Consider When Examining Image Standards
Color Depth
Color Space
Color Management
Image Resolution Scalability
Compression
![Page 22: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/22.jpg)
Archival Content: Images
Tagged Image File Format (TIFF) 6.0 -- 1 to 64-bit color depth, supports grayscale, RGB, YCbCr, CMYK and CIELab color spaces, supports embedded ICC color profiles, raster format, supports uncompressed as well as lossless and lossy DCT-based compression
JPEG 2000 (ISO/IEC 15444) -- 1-48 bits per channel with multiple channels (including alpha & transparency), supports wide array of color spaces with sRGB and sYCC as defaults, supports ICC color profiles, raster format, supports uncompressed as well as lossless and lossy wavelet based compression
Scalable Vector Graphics 1.2 -- uses sRGB color spaces, supports ICC Color Profiles, vector format
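To see why color depth and compression matter for raster formats, it can help to estimate the uncompressed size of an image. This is a rough Python sketch, not part of any format specification: the function name and the 3000 x 2000 pixel dimensions are hypothetical, and it ignores file headers and any per-row padding a real format may add.

```python
def uncompressed_bytes(width, height, channels, bits_per_channel):
    """Estimate raw pixel storage in bytes, rounding up to a whole byte."""
    total_bits = width * height * channels * bits_per_channel
    return (total_bits + 7) // 8


# Hypothetical 3000 x 2000 pixel scan at three different color depths.
WIDTH, HEIGHT = 3000, 2000

bilevel = uncompressed_bytes(WIDTH, HEIGHT, channels=1, bits_per_channel=1)
rgb_8 = uncompressed_bytes(WIDTH, HEIGHT, channels=3, bits_per_channel=8)
rgb_16 = uncompressed_bytes(WIDTH, HEIGHT, channels=3, bits_per_channel=16)

for label, size in [("1-bit bilevel", bilevel),
                    ("24-bit RGB", rgb_8),
                    ("48-bit RGB", rgb_16)]:
    print(f"{label}: {size:,} bytes")
```

Doubling the bits per channel doubles the storage, which is one reason the choice between uncompressed, lossless, and lossy encodings comes up for every archival master.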
![Page 23: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/23.jpg)
Archival Content: Audio/Video
Some Issues to Consider When Examining Audio/Video Standards
Audio sampling rate
Audio bit depth
Video frame rate
Video color space/depth
Compression
Good News: Audio/Video is a bit more standardized than the text/image world
Bad News: Lossless digital audio is rare; lossless digital video is almost nonexistent.
![Page 24: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/24.jpg)
Archival Content: Audio/Video
Broadcast WAVE Audio (EBU Standard N22 - 1997)
For video, picture is less clear. Proprietary solutions dominate market. Many of these (e.g., QuickTime, WMV) do support lossless image frame and audio data. MXF, a SMPTE standard, is gaining some traction in digital library circles (and the movie industry)
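Sampling rate and bit depth are recorded in the header of a WAVE file, so they can be written and read back programmatically. The sketch below uses Python's standard `wave` module, which writes plain RIFF/WAVE PCM; it does not add the extra metadata chunk that distinguishes Broadcast WAVE. The 48 kHz, 24-bit, stereo values are assumptions chosen for illustration.

```python
import io
import wave

SAMPLE_RATE = 48_000   # samples per second (assumed value)
SAMPLE_WIDTH = 3       # bytes per sample, i.e. 24-bit audio (assumed value)
CHANNELS = 2           # stereo (assumed value)


def pcm_bytes_per_second(rate, width, channels):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return rate * width * channels


# Write one second of silence into an in-memory WAVE file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(CHANNELS)
    writer.setsampwidth(SAMPLE_WIDTH)
    writer.setframerate(SAMPLE_RATE)
    silence = bytes(pcm_bytes_per_second(SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS))
    writer.writeframes(silence)

# Read the technical characteristics back from the header.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    params = reader.getparams()

print(params.framerate, "Hz,", params.sampwidth * 8, "bit,",
      params.nchannels, "channels,", params.nframes, "frames")
```

The values read back here are the kind of technical metadata an archive would record for an audio object.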
![Page 25: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/25.jpg)
Archival Content: Data
Some disciplinary de facto standards (e.g., Chemical Markup Language). Cover Pages (http://xml.coverpages.org) is a good source for information on many of the major ones.
No single standard for general use for data encoding, although many contenders
![Page 26: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/26.jpg)
Archival Content: Data
Binary Format Description Language (BFD) -- XML language based on the Extensible Scientific Interchange Language (XSIL) that supports documentation of binary and ASCII data
eXtensible Data Format (XDF) -- scientific data format supporting hierarchical data structures, N-dimensional arrays, scalar and vector fields, user-defined coordinate systems
![Page 27: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/27.jpg)
Archival Content: Data
Data Format Description Language (DFDL) -- A language for describing the structure of binary and character encoded data to expose their structure, format and metadata so that machine processes can work upon them.
Data Documentation Initiative (DDI) -- An effort by the ICPSR at Univ. of Michigan to develop an XML format for documenting social science data sets. XML files can be used to produce either bibliographic descriptions of data sets or SAS/SPSS/STATA data definition statements.
![Page 28: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/28.jpg)
Archival Content: Data
Hierarchical Data Format (HDF5) -- General purpose file format (with supporting software library) for storing scientific data, developed by NCSA. Uses two fundamental structures, groups and data sets, where a data set is an N-dimensional array of data elements with metadata.
![Page 29: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/29.jpg)
Archival Content: Paper
ANSI/NISO Z39.48-1992, Permanence of Paper for Publications and Documents in Libraries and Archives
ISO 9706-1994, Information and documentation -- Paper for documents -- Requirements for permanence
![Page 30: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/30.jpg)
Part III: How to Create Metadata for an Archive
![Page 31: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/31.jpg)
Metadata: Identifiers
Persistence is important, but…
Clarity on what is being identified may be more important (or, why an OpenURL is not a call number).
Standards proliferate in this space; choice of any identifier may depend on:
Social concerns (for whom am I identifying something?)
Identifier/address resolution (how do I find a copy/item using this identifier?)
![Page 32: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/32.jpg)
Metadata: Structural
Metadata intended to identify the components of an object and their relationship to each other in order to support the object’s navigation and use
Metadata Encoding & Transmission Standard (METS)
MPEG-21 Digital Item Declaration Language
XML Formatted Data Units (XFDU)
OAI-ORE
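As an illustration of structural metadata, the following Python sketch builds a very small METS-style document: a file section listing two files and a structural map whose division points at them. It is an assumption-laden outline rather than a validated METS instance. The helper names, the file IDs, the `USE` values, and the `LABEL` are invented here; the element names follow the METS schema, and the image URLs are the ones given for the exercise at the end of this deck.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)

# (file ID, MIME type, location, use) -- IDs and use labels are hypothetical.
FILES = [
    ("FILE1", "image/tiff", "http://people.lis.uiuc.edu/~jmcdonou/Bryce.tif", "master"),
    ("FILE2", "image/jp2", "http://people.lis.uiuc.edu/~jmcdonou/Bryce.jp2", "access"),
]


def q(namespace, tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{namespace}}}{tag}"


def build_mets(files):
    """Return a <mets> element with a fileSec and a one-division structMap."""
    root = ET.Element(q(METS, "mets"))
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    struct_map = ET.SubElement(root, q(METS, "structMap"), TYPE="logical")
    div = ET.SubElement(struct_map, q(METS, "div"), TYPE="image", LABEL="Bryce")
    for file_id, mime_type, url, use in files:
        group = ET.SubElement(file_sec, q(METS, "fileGrp"), USE=use)
        file_el = ET.SubElement(group, q(METS, "file"), ID=file_id, MIMETYPE=mime_type)
        ET.SubElement(file_el, q(METS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK, "href"): url})
        # The structural map refers to files by ID, not by location.
        ET.SubElement(div, q(METS, "fptr"), FILEID=file_id)
    return root


root = build_mets(FILES)
print(ET.tostring(root, encoding="unicode"))
```

The separation shown here, where components are described once and the structural map only points to them, is what lets one object carry several structural views of the same files.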
![Page 33: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/33.jpg)
Metadata: Provenance
Metadata documenting the origins and life-cycle of a digital object
PREMIS Data Dictionary for Preservation Metadata 2.0
Joint project of OCLC & RLG
Defines metadata element set that “supports the viability, renderability, understandability, authenticity and integrity of digital objects in a preservation context.”
![Page 34: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/34.jpg)
Metadata: Provenance
The PREMIS Data Model
![Page 35: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/35.jpg)
Metadata: Provenance
PREMIS Object Metadata:
Identifier
Category
Preservation Level
Significant Properties
Characteristics (fixity, size, format, etc.)
Original Name
Storage
Environment
Signature
Relationships to other Objects, Events, Rights
![Page 36: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/36.jpg)
Metadata: Provenance
PREMIS Event Metadata
Identifier
Type
Date & Time
Details
Outcome
Relationship to Agents and Objects
![Page 37: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/37.jpg)
Metadata: Provenance
PREMIS Agent Metadata
Identifier
Name
Type
![Page 38: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/38.jpg)
Metadata: Provenance
PREMIS Rights Metadata
Rights Statement
Rights Basis
Copyright Information
License Information
Statute Information
Rights Granted
Relationship to Objects and Agents
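The relationships among PREMIS Objects, Events, and Agents can be sketched in a few lines of Python. This is a simplified illustration, not the PREMIS XML schema: the class and field names are hypothetical abbreviations of the element lists on the preceding slides, Rights is omitted for brevity, and SHA-256 is an assumed choice of fixity algorithm.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass
class Agent:
    identifier: str
    name: str
    type: str


@dataclass
class PreservationObject:
    identifier: str
    category: str
    original_name: str
    size: int
    format: str
    fixity_algorithm: str
    fixity_digest: str


@dataclass
class Event:
    identifier: str
    type: str
    date_time: datetime
    detail: str
    outcome: str
    object_ids: list = field(default_factory=list)
    agent_ids: list = field(default_factory=list)


def describe(identifier, original_name, data, fmt):
    """Record size, format, and fixity characteristics for a byte stream."""
    return PreservationObject(
        identifier=identifier,
        category="file",
        original_name=original_name,
        size=len(data),
        format=fmt,
        fixity_algorithm="SHA-256",
        fixity_digest=hashlib.sha256(data).hexdigest(),
    )


def check_fixity(obj, data, agent, event_id):
    """Recompute the digest and record the result as an Event."""
    matches = hashlib.sha256(data).hexdigest() == obj.fixity_digest
    return Event(
        identifier=event_id,
        type="fixity check",
        date_time=datetime.now(timezone.utc),
        detail=f"Recomputed {obj.fixity_algorithm} digest",
        outcome="pass" if matches else "fail",
        object_ids=[obj.identifier],
        agent_ids=[agent.identifier],
    )


# Hypothetical usage with stand-in bytes rather than a real image file.
curator = Agent("agent-1", "Repository fixity service", "software")
master = describe("obj-1", "Bryce.tif", b"stand-in image bytes", "image/tiff")
event = check_fixity(master, b"stand-in image bytes", curator, "evt-1")
print(event.type, event.outcome, event.object_ids, event.agent_ids)
```

The Event carries identifiers for the Object it concerns and the Agent that performed it, which is how the provenance of an object accumulates over its life cycle.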
![Page 39: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/39.jpg)
Metadata: Administrative
Technical Metadata
  Z39.87 and MIX
  Technical Metadata for Text (TextMD)
  AES-X098 Administrative Metadata for Audio Objects
  SMPTE RP210.10-2007 Metadata Dictionary
Rights Metadata
  Standards, yes. That you want to use, no.
![Page 40: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/40.jpg)
Metadata: Descriptive
Issues to consider:
Nature of object to be described
Real purpose(s) of description
Community(ies) that will utilize description
Supporting standards of descriptive practice and controlled vocabularies
![Page 41: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/41.jpg)
Metadata: Descriptive
Library/Archives/Museums/Educators
  MARC, MODS, Dublin Core
  EAD
  VRA Core, CDWA
  IEEE LOM
Data Repositories
  Data Documentation Initiative
  Content Standard for Digital Geospatial Metadata
  Darwin Core
  Access to Biological Collection Data (ABCD)
![Page 42: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/42.jpg)
How to Evaluate an Archive
![Page 43: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/43.jpg)
Evaluating Archives
Trustworthy Repositories Audit & Certification (TRAC) Criteria & Checklist
http://www.crl.edu/content.asp?l1=13&l2=58&l3=162&l4=91
Digital Repository Audit Method Based on Risk Assessment (DRAMBORA)
http://www.repositoryaudit.eu/
![Page 44: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/44.jpg)
Exercise: URLs
Images
http://people.lis.uiuc.edu/~jmcdonou/Bryce.tif
http://people.lis.uiuc.edu/~jmcdonou/Bryce.jp2
![Page 45: Summer Institute on Data Curation: Digital Preservation & Standards Jerome McDonough Asst. Professor, GSLIS June 4, 2008.](https://reader036.fdocuments.us/reader036/viewer/2022062321/56649e805503460f94b846ee/html5/thumbnails/45.jpg)
Exercise: URLs
METS Schema, Documentation, Namespace
http://www.loc.gov/standards/mets/mets.xsd
http://www.loc.gov/standards/mets/docs/mets.v1-7.html
http://www.loc.gov/METS/
PREMIS Schema, Documentation, Namespace
http://www.loc.gov/standards/premis/v1/PREMIS-v1-1.xsd
http://www.loc.gov/standards/premis/
http://www.loc.gov/standards/premis/v1
MIX Schema, Documentation, Namespace
http://www.loc.gov/standards/mix/mix20/mix20.xsd
http://www.niso.org/kst/reports/standards?step=2&gid=None&project_key=b897b0cf3e2ee526252d9f830207b3cc9f3b6c2c
http://www.loc.gov/mix/v20