Preserving Email: The Nature of the Problem
Christopher J. Prom, Ph.D
Assistant University Archivist and
Associate Professor of Library Administration
Digital Preservation Coalition Briefing
Wellcome Collection Conference Center, London
July 29, 2011
Googling . . .
A Twelve Step Plan?
Step One
Step Three . . .
• ;AS^T >S[Enter]
Step Twelve?
• Having had a spiritual awakening as the result of these steps, we tried to carry this message to email‐holics, and to practice these principles in all our affairs.
Reason One: What Email Is
• As technology it is a:
– Saturated
– Interwoven
– Commonplace
– Malleable
– Embedded . . .
• Utility, which
• Leaves behind evidence . . .
Email as Evidence
v.‐Man
Reason two: Tech
• Communicated information = A record
• Interaction of Mail Transfer Agents and User Agents
• Flexible/extendable headers, body, and content
• MIME = Multipurpose Internet Mail Extensions
• Embedded formats and references
• What are the significant properties?
– http://www.significantproperties.org.uk/email-testingreport.html
• No standard storage format for msgs or MIME
– Many binary formats, styles, etc.
– Where’s Wally?
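The layering of extensible headers, body, and MIME parts can be made concrete with a minimal Python sketch using the standard library `email` package. The sample message, addresses, and the `X-Custom-Header` field are invented for illustration; the helper names `build_sample` and `describe` are hypothetical and do not come from the talk.

```python
from email import message_from_string
from email.message import EmailMessage


def build_sample() -> str:
    """Build a small multipart message as raw text (all values are made up)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Minutes"
    # Any header name may be added: headers are an open-ended list.
    msg["X-Custom-Header"] = "headers are extensible"
    msg.set_content("Plain-text body.")
    # Adding an attachment turns the message into a multipart container.
    msg.add_attachment(
        b"\x00\x01binary",
        maintype="application",
        subtype="octet-stream",
        filename="minutes.bin",
    )
    return msg.as_string()


def describe(raw: str) -> dict:
    """Return the headers plus (content type, filename) for each leaf MIME part."""
    msg = message_from_string(raw)
    parts = [
        (part.get_content_type(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()  # skip the multipart container itself
    ]
    return {"headers": dict(msg.items()), "parts": parts}


if __name__ == "__main__":
    info = describe(build_sample())
    for content_type, filename in info["parts"]:
        print(content_type, filename)
```

Even this tiny message carries two embedded formats, which is one reason deciding on "significant properties" is harder than it first looks.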
(Tech positives)
• Transmission standardization
• Move to server-based storage and IMAP
• MBOX as quasi-standard
• Ability to develop storage standard
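Because MBOX is a quasi-standard, many languages can read it without special tooling. The following is a minimal sketch using Python's standard `mailbox` module; the file name, subjects, and the helper names `write_mbox` and `list_subjects` are hypothetical, and real MBOX files come in several dialects that this sketch does not try to distinguish.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def write_mbox(path: str, subjects: list) -> None:
    """Create an mbox file holding one made-up message per subject."""
    box = mailbox.mbox(path)
    try:
        for subject in subjects:
            msg = EmailMessage()
            msg["From"] = "alice@example.org"
            msg["Subject"] = subject
            msg.set_content("Body of " + subject)
            box.add(msg)
        box.flush()  # write pending messages to disk
    finally:
        box.close()


def list_subjects(path: str) -> list:
    """Return the Subject header of every message, in file order."""
    box = mailbox.mbox(path, create=False)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.mbox")
        write_mbox(sample, ["first", "second"])
        print(list_subjects(sample))
```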
Reason three: Legal context
• Incentives to keep email
• Incentives to destroy email
• Discovery rules: the wildcard, nation-specific
Reason four: Institutional Factors
• High cost
• Low (perceived) benefit to keep
• Risk management outlook
• How to winnow?
• Why bother?
– Quoting an academic . . .
• Result: It’s all (usually) on the end user
The present (and future?) of email preservation
Policy: Does it work?
• Typically addresses:
– Ownership, access rights, privacy
– Quotas, storage, personal usage
– Saving (where to), use of other accounts
– Reference to other policies
• Minimal guidance
• Bottom line: It does not work to change behavior, but may help us design better systems
Three current technical approaches
• Sweep up the crumbs
– Guide the user
– Migrate at . . . when exactly??
• Tag it and bag it
– ERM-driven approach
• Capture carbon . . .
– (and hope we can mine it)
Sweeping it up: some brooms
• MailStore Home
• readpst (command-line tool)
• Xena
• Follow up: InSPECT report recommendations
A Vacuum
A few XML ‘dustpans’
• Java Aperture Library (XML RDF)
• Antwerp City Archives format
• Australian National Archives (XENA)
• PeDALS email extractor
XML Account Schema
• http://www.records.ncdcr.gov/emailpreservation/mail-account/mail-account_docs.html
• Stores all email for single account
• Could be used as storage system for user agent
• Multiple options for handling Unicode (embed or convert)
• Extensive text and MIME handling possibilities (leave as original, convert to BinHex, save externally, etc.)
• Extensible headers
– <name> <value> pairs
• Could write custom format via Aid4Mail scripting
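The idea of storing headers as extensible name/value pairs can be sketched in a few lines of Python. This is only an illustration of the pattern: the element names `Message`, `Header`, `Name`, `Value`, and `Body` are assumptions made for this example and are not taken from the schema documentation linked above, and the sample message is invented.

```python
import xml.etree.ElementTree as ET
from email import message_from_string

# Invented sample message; X-Priority stands in for any non-standard header.
RAW = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Budget\n"
    "X-Priority: 1\n"
    "\n"
    "See attached.\n"
)


def message_to_xml(raw: str) -> ET.Element:
    """Serialize one message to XML, one <Header> element per header field.

    Element names are hypothetical, chosen only to show the name/value idea.
    """
    msg = message_from_string(raw)
    root = ET.Element("Message")
    for name, value in msg.items():
        header = ET.SubElement(root, "Header")
        ET.SubElement(header, "Name").text = name
        ET.SubElement(header, "Value").text = value
    # A single-part message has a plain string payload.
    ET.SubElement(root, "Body").text = msg.get_payload()
    return root


if __name__ == "__main__":
    print(ET.tostring(message_to_xml(RAW), encoding="unicode"))
```

Because every header is stored the same way, an unfamiliar header such as `X-Priority` needs no schema change, which is the point of the name/value design.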
Pick up crumbs: CERP Parser
• Email migration tools
Email Account Schema Overview
Classify It
• Alfresco White Paper: Total Cost of Ownership for Enterprise Content Management
– http://blogs.alfresco.com/wp/democast/category/email-archive/
• A corporate archivist’s perspective
• MeMail Project:
– http://e-records.chrisprom.com/?p=1965
Carbon Capture
• Auto blindcc
• Email archiving software market
• What it does
– Single instance storage
• Unknowns:
– Cost (Forrester report)
– Format
– Ability to permanently preserve
– Access outside of existing infrastructure
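Single instance storage means that identical content, such as an attachment sent to fifty recipients, is kept only once. A minimal Python sketch of the idea follows, assuming content is identified by its SHA-256 digest; the class `SingleInstanceStore` is hypothetical, and commercial archiving products may well implement this differently.

```python
import hashlib


class SingleInstanceStore:
    """Toy in-memory store that keeps one copy of each distinct byte string."""

    def __init__(self) -> None:
        self._blobs = {}  # digest -> content
        self._refs = {}   # digest -> number of times the content was submitted

    def put(self, data: bytes) -> str:
        """Store data if it is new and return its digest as the lookup key."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def reference_count(self, digest: str) -> int:
        return self._refs.get(digest, 0)

    def stored_bytes(self) -> int:
        """Total bytes actually held, counting each distinct item once."""
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = SingleInstanceStore()
    attachment = b"quarterly report" * 100
    key = store.put(attachment)  # copy from the first mailbox
    store.put(attachment)        # same attachment from a second mailbox
    print(store.reference_count(key), store.stored_bytes())
```

The saving in space is also what makes the "format" and "access outside of existing infrastructure" unknowns matter: a message may only be reassembled by the system that holds the shared pieces.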
The Access Elephant
• Copyright / Third-Party IP
• Search, Discovery, Retrieval
• Fedora and other repositories
– Hydra Project. Need:
• Content models
• Deep search (Lucene/Solr or similar)
• Front end (Blacklight)
Sarah’s inbox: an access model?
Three Fundamental Challenges
• Building a research and development agenda:
– User behavior, policy, standards (build on InSPECT significant properties report)
• Building tools to acquire, preserve, and make email useful for the long term (cyber-infrastructure)
– Capture, storage, conversion, metadata, access
• Making the case to funders and potential donors
Personal ‘Archiving’
• Cathy Marshall, “Rethinking Personal Digital Archiving”
– http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html
• http://www.thedigitalbeyond.com/
• Lifestream concept (Eric Freeman and David Gelernter)
• Services:
– Carbonite, CrashPlan, Mozy, etc.
– Backupify, ThinkUp (Gina Trapani)
A Modest Proposal
• Provide the users (and institutions) something of value given their ‘piling’ behaviors
– Backup services, plus
– ThinkUp-like services, plus
– Trust, plus
– the ability to donate!
– http://www.iKive.com
• Investing users and funders in the problem?
Questions and Discussion