Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives
![Page 1: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/1.jpg)
Why Can’t I Read This File? Born-Digital Challenges
at the Smithsonian Institution Archives
Lynda Schmitz Fuhrig
Mid-Atlantic Regional Archives Conference, Fall 2011, Bethlehem, PA
![Page 2: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/2.jpg)
![Page 3: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/3.jpg)
Smithsonian Institution Archives’ Mission
• Appraise, acquire, and preserve
• Offer a range of research and reference services
• Create and promote products and services that broaden the understanding of the Smithsonian
• Provide professional archival and conservation expertise
Above, a collection storage area for the Smithsonian Institution Archives, located on the third floor of Capital Gallery West. Upper left, in 1894 a room on the fourth floor, East Wing of the Smithsonian Institution Building, was converted for use as the Smithsonian Institution Archives.
![Page 4: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/4.jpg)
SI Archives Digital Services Division
• Curate and preserve born-digital collections
• Digitize images, video, and audio
• Research digital preservation issues
• Promote the archives through web and outreach
SIA Accession 11-124
![Page 5: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/5.jpg)
Born-digital records that document the Smithsonian’s history
• Text
• Images
• Drawings/CAD
• Databases and spreadsheets
• Audio
• Video
• Websites and social media
• Email accounts
Many are part of mixed collections of paper and electronic records
Received on removable media or by server/FTP transfer
SIA Accession 11-281
![Page 6: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/6.jpg)
![Page 7: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/7.jpg)
SI Archives’ procedures
• Inspect media
• Virus scan
• Conduct transfer/ingest with checksums
• Make copy
• Analyze files for formats and issues
• Convert proprietary files to preservation formats
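The transfer-with-checksums and copy steps above can be sketched roughly as follows. This is a minimal illustration, not SIA's actual workflow: the function names, the choice of SHA-256, and the directory layout are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_checksums(source_dir: Path, target_dir: Path) -> dict:
    """Copy every file under source_dir into target_dir and verify each copy.

    Returns a manifest mapping relative path -> checksum.
    Raises ValueError if a copy does not match its source.
    (Hypothetical helper for illustration only.)
    """
    manifest = {}
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        before = sha256_of(source)       # checksum of the original
        shutil.copy2(source, target)     # copy, keeping timestamps
        after = sha256_of(target)        # checksum of the copy
        if before != after:
            raise ValueError(f"Checksum mismatch for {relative}")
        manifest[relative.as_posix()] = before
    return manifest
```

Storing the returned manifest alongside the accession lets the same checksums be recomputed later to confirm that nothing has changed in storage.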
![Page 8: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/8.jpg)
Current preservation formats
• MS Word/WordPerfect → PDF/A or PDF
• PowerPoint, Excel → PDF/A or PDF
• GIF, JPG, BMP, etc. → TIF
• Access databases → SIARD XML
• Audio → WAV/BWF
• Websites → crawled and captured as WARC
• Email → saved to XML following the CERP/EMCAP preservation schema
• Born-digital video → not straightforward; different options
• Digitized video → Motion JPG2000
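The format pairings on this slide could be kept as a small lookup table for a batch script. The sketch below is only an illustration: the target names come from the slide, but the specific file extensions and the function name are assumptions, and a real workflow would identify formats by content and not by extension alone.

```python
from pathlib import Path
from typing import Optional

# Preservation targets transcribed from the slide. The extensions chosen
# to stand for each source format are illustrative assumptions.
PRESERVATION_TARGETS = {
    ".doc": "PDF/A or PDF",   # MS Word
    ".wpd": "PDF/A or PDF",   # WordPerfect
    ".ppt": "PDF/A or PDF",   # PowerPoint
    ".xls": "PDF/A or PDF",   # Excel
    ".gif": "TIF",
    ".jpg": "TIF",
    ".bmp": "TIF",
    ".mdb": "SIARD XML",      # Access database
    ".mp3": "WAV/BWF",        # audio (assumed example extension)
    ".aif": "WAV/BWF",        # audio (assumed example extension)
}


def preservation_target(filename: str) -> Optional[str]:
    """Return the preservation format for a file name, or None if unmapped."""
    return PRESERVATION_TARGETS.get(Path(filename).suffix.lower())
```

Files that return `None` are the ones that need a closer look, which is where most of the challenges on the following slides come from.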
![Page 9: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/9.jpg)
Tools for processing
• Open source and proprietary software
• JHOVE, DROID, FITS (FITS is also a format)
• MediaInfo
• In-house batch scripts
• Duke Data Accessioner
• Evaluating Curator’s Workbench
• CERP (SIA-Rockefeller Archive Center) parser
![Page 10: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/10.jpg)
Files in disguise
• No extension – right click to open in Notepad to see coding, especially helpful with WordPerfect
• Wrong extension – a .doc could be a Word file or it could be WordPerfect; a BMP could really be a JPG
• Complete unknowns that date back 20 years or more
Accession 10-052
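Opening a file in Notepad works because many formats begin with a recognizable signature; a WordPerfect file, for instance, starts with the bytes `FF 57 50 43`, which display as "ÿWPC". A minimal sketch of the same check in code follows. The signature list is deliberately short and the function names are hypothetical; identification tools such as DROID rely on far larger signature registries.

```python
# Leading "magic" bytes for a few formats mentioned in this talk.
# Short signatures such as b"BM" can produce false positives.
SIGNATURES = [
    (b"\xffWPC", "WordPerfect document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound file (legacy Word, Excel, PowerPoint)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"II*\x00", "TIFF image"),
    (b"MM\x00*", "TIFF image"),
    (b"%PDF-", "PDF document"),
    (b"BM", "BMP image"),
]


def sniff_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file, ignoring its name."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A file named `memo.doc` whose first bytes are `FF 57 50 43` would be reported as WordPerfect, and a `.bmp` that begins with `FF D8 FF` as a JPEG.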
![Page 11: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/11.jpg)
Older files
• Gerber
• PCD (Kodak Photo CD)
• EXE (Executables)
Gerber overlay, by AA7JC, Creative Commons: Attribution-NonCommercial-ShareAlike 2.0 Generic.
![Page 12: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/12.jpg)
DATs (Digital Audio Tapes)
Transfer them now, if you can!
Machine production ended
Tapes susceptible to fungus, other problems
DAT recorded in 1990 for the Folk Masters radio program. SIA Accession 06-106
![Page 13: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/13.jpg)
It Says It Is
PDF/A
Accession 08-149
![Page 14: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/14.jpg)
![Page 15: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/15.jpg)
But It’s Not PDF/A
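A file "says" it is PDF/A through an identification entry in its XMP metadata (`pdfaid:part`, optionally with `pdfaid:conformance`). That entry is only a claim made by whatever software wrote the file. The sketch below reads the claim from the raw bytes; the function name is hypothetical, it assumes the XMP packet is stored uncompressed, and it does not validate anything. Confirming or refuting the claim requires a dedicated PDF/A validator.

```python
import re

# Match the PDF/A identification entry in either XMP form:
#   <pdfaid:part>1</pdfaid:part>   or   pdfaid:part="1"
PDFA_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
PDFA_CONFORMANCE = re.compile(
    rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])"
)


def claimed_pdfa_level(data: bytes):
    """Return the PDF/A level a file claims (e.g. 'PDF/A-1b'), or None.

    This reports the claim only; it says nothing about actual compliance.
    """
    part = PDFA_PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conformance = PDFA_CONFORMANCE.search(data)
    if conformance:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

Comparing this claimed level with a validator's report is one way to flag files like the one shown here, where the label and the content disagree.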
![Page 16: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/16.jpg)
Software incompatibility issues
![Page 17: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/17.jpg)
New formats/flavors/technologies
Geospatial PDF
WWF – PDF that doesn’t print
Keep an eye on mobile sites/apps
3D scanning and printing - Point clouds
![Page 18: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/18.jpg)
Digital forensics
![Page 19: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/19.jpg)
Resources for formats
Sustainability of Digital Formats – Library of Congress
http://www.digitalpreservation.gov/formats
PRONOM – The National Archives in the UK
http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
Unified Digital Formats Registry – Expected date of operation 2012
http://www.udfr.org/
FILExt – File Extension Source
http://filext.com/
TrID – File Identifier
http://mark0.net/soft-trid-e.html
![Page 20: Why Can’t I Read This File? Born-Digital Challenges at the Smithsonian Institution Archives](https://reader033.fdocuments.us/reader033/viewer/2022051816/54442aa6afaf9f9c098b4713/html5/thumbnails/20.jpg)
Lynda Schmitz Fuhrig
Digital Services [email protected]
Smithsonian Institution Archives website: http://siarchives.si.edu