January 2006 Archival Storage Strategies and Technologies Presentation
Porter-Roth Associates 1
Archival Storage Strategies & Technologies
AIIM Presentation
January 25, 2006
Bud Porter-Roth
Porter-Roth Associates
http://www.erms.com
Agenda
Introduction
The Preservation Problem
Recommendations
Introduction
Basic Need
Compliance
Disaster Recovery
Introduction
Flash Drives
File Systems
E-Mail Servers
Local Drives
Web Servers
Imaging Repositories
Paper Files
Electronic Document Repositories
Microfilm
Business Systems
Video Libraries
Photographs
PDAs
The Preservation Problem
The problem is actually two separate, largely unrelated issues:
Hardware and software to store and read documents (hardware, OS, applications)
The format that the documents are in (Word, PDF, XML)
The Preservation Problem
The “Problem” in brief:
Software formats change and become unsupported
Software formats fall out of favor over time and disappear
Hardware drives change and become unsupported
Storage media change over time and become obsolete:
Floppy disks
Optical disks (WORM, CD, DVD)
Tape (many flavors)
Portable storage media like the “Memory Stick” in use today
Taken together, these issues mean that digital documents will very likely have to be converted from one format or medium to another over time, and for the foreseeable future that conversion will be a continuing process.
The Preservation Problem
TIFF (Tagged Image File Format), usually with ITU Group 4 compression
JPEG (Joint Photographic Experts Group)
GIF (Graphics Interchange Format)
PNG (Portable Network Graphics)
Native file formats (Word, Excel, etc.), also known as “Born Digital” documents
PDF, PDF/A, PDF/X
Many other proprietary electronic formats
Paper
Film
The Preservation Problem
What is the best option for preserving electronic documents over archival time spans? (Disregarding the hardware storage issues)
TIFF? A “digital picture” of your page
Widely adopted standard for document imaging
Not human readable without the software
No access to underlying text without OCR
XML? A format description of the page (a style sheet)
Good for describing logical structure, but not appearance
Many incompatible domain-specific schemas
Native Format (e.g., MS Word)? Several ubiquitous, but closed proprietary formats
Can you spell WordPerfect?
PDF? PDF/A?
Microsoft Metro (renamed XPS)?
Desirable Properties of a Format
Device independence: can be reliably and consistently rendered without regard to the hardware/software platform
Self-contained: contains all resources necessary for rendering
Self-documenting: contains its own description
Transparency: amenable to direct analysis with basic tools
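As one illustration of the transparency property, several of the formats named in this presentation can be recognized from their first few bytes with nothing more than a short script. The sketch below is illustrative only: the names `SIGNATURES`, `sniff_format`, and `sniff_file` are assumptions introduced here, and a matching signature suggests a format but does not prove the file is valid.

```python
# Leading byte signatures for a few of the formats named in this presentation.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff_format(header: bytes) -> str:
    """Return a format label based on the leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and guess its format."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(16))
```

A format that can be examined this way, without the originating application, is easier to audit years later than one that requires the original software to open.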
Adobe PDF and PDF/A
PDF is a ubiquitous open format for electronic documents
Proprietary, but with a publicly available specification
Companies other than Adobe make PDF products
Many statutory, regulatory, and institutional policies mandate the retention of PDF-based documents over multiple generations of technology
The feature-rich nature of PDF can complicate preservation efforts
PDF/A
PDF/A is intended to address three primary issues:
Define a file format that preserves the static visual appearance of electronic documents over time
Provide a framework for recording metadata about electronic documents
Provide a framework for defining the logical structure and semantic properties of electronic documents
PDF/A
PDF/A constraints include:
Audio and video content are forbidden
JavaScript and executable file launches are prohibited
All fonts must be embedded and also must be legally embeddable for unlimited, universal rendering
Color spaces must be specified in a device-independent manner
Encryption is disallowed
Use of standards-based metadata is mandated
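The constraints on this slide can be read as a checklist. The following is a minimal sketch of that checklist, not a PDF/A validator: the `PdfFeatures` record and the `pdfa_violations` function are hypothetical names introduced here, and the record is assumed to be filled in by some other inspection step that is not shown.

```python
from dataclasses import dataclass


@dataclass
class PdfFeatures:
    """Hypothetical summary of a PDF's features (not produced by any real library)."""
    has_audio_or_video: bool = False
    has_javascript: bool = False
    launches_executables: bool = False
    all_fonts_embedded: bool = True   # legal embeddability cannot be checked here
    device_independent_color: bool = True
    is_encrypted: bool = False
    has_standard_metadata: bool = True


def pdfa_violations(doc: PdfFeatures) -> list[str]:
    """List which of the constraints named on the slide the document breaks."""
    checks = [
        (doc.has_audio_or_video, "audio or video content present"),
        (doc.has_javascript, "JavaScript present"),
        (doc.launches_executables, "executable file launch present"),
        (not doc.all_fonts_embedded, "fonts not all embedded"),
        (not doc.device_independent_color, "device-dependent color space"),
        (doc.is_encrypted, "encryption used"),
        (not doc.has_standard_metadata, "standards-based metadata missing"),
    ]
    return [message for failed, message in checks if failed]
```

An empty result means only that none of the listed constraints is violated according to the record; actual conformance would need a real validator working from the PDF/A specification.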
PDF/A
However, PDF/A alone does not guarantee preservation
PDF/A alone does not guarantee exact replication of source material
The intent of PDF/A is not to claim that PDF-based solutions are the best way to preserve electronic documents
But once you have decided to use a PDF-based approach, PDF/A defines an archival profile of PDF that is more amenable to long-term preservation
PDF/A… Nevertheless
PDF/A may not be the last preservation format you will use or need
However, proper application of PDF/A should result in reliable, predictable, and unambiguous access to the full information content of electronic documents
Microsoft XPS
XPS is an abbreviation for the XML Paper Specification.
The XML Paper Specification describes the XPS Document format. A document in XPS Document format (XPS Document) is a paginated representation of electronic paper described in an XML-based format. The XPS Document format is an open, cross-platform document format that allows customers to effortlessly create, share, print, and archive paginated documents.
XPS Documents use a file container that conforms to the Open Packaging Conventions. The new file formats in the next version of the Microsoft Office System, codenamed Office "12," also use the Open Packaging Conventions for organizing data into files, allowing businesses to manage Office "12" documents and XPS Documents in the same manner.
The XPS Document format is a fixed-layout document interchange format, a native Windows Vista spool file format, and a PDL (Page Description Language, used by printing devices).
http://www.microsoft.com/whdc/xps/default.mspx
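Because an Open Packaging Conventions container is physically a ZIP archive with a `[Content_Types].xml` part at its root, such a package can be inspected with ordinary ZIP tools. The sketch below is a rough heuristic under that assumption, not an XPS validator; the function names and the demo package contents are placeholders introduced here.

```python
import io
import zipfile

CONTENT_TYPES_PART = "[Content_Types].xml"


def looks_like_opc_package(source) -> bool:
    """Heuristic: a ZIP archive (path or binary file object) with a content-types part."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as package:
        return CONTENT_TYPES_PART in package.namelist()


def build_demo_package() -> io.BytesIO:
    """Build a tiny in-memory archive with placeholder parts (not a valid XPS file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CONTENT_TYPES_PART, "<Types/>")        # placeholder content
        package.writestr("Documents/1/Pages/1.fpage", "<FixedPage/>")  # placeholder part
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(looks_like_opc_package(build_demo_package()))
```

The same check would apply to Office "12" files, which is the practical meaning of managing both kinds of document "in the same manner".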
Recommendations
This is still a wild frontier, with no certain outcome or single standard
“The good thing about standards is that there are so many of them….”
When in doubt about long-term storage of vital documents, paper or film is still a good answer
Beware of new technologies, even ones that are “standards”
TIFF, JPEG, PDF, and PDF/A are recommended
The weight of in-place document formats means that change will be very slow, and may stop altogether, unless a dramatic “out of the blue” technology appears
Conclusion & Questions
Finally!
Questions?