Practical Approaches to Electronic Records Management and Preservation
Preservation of Electronic Mail
Druscie Simpson, NC State Archives
November 19, 2004
E-mail: The Digital Divide Also Multiplies
E-mail as a Burden
The Radicati Group and Merrill Lynch estimate that email is growing at a rate of 300% annually. (The Age, July 8, 2003)
The real problem: not more email, but “larger and larger attachments, generating an average of 5MB of email content” daily. (The Age, July 8, 2003)
Email generates about 400,000 terabytes of new information each year worldwide.
About 31 billion emails are sent daily, on the Internet and elsewhere, a figure expected to double by 2006 (source: International Data Corporation (IDC)). The average email is about 59 kilobytes in size, so the annual flow of email worldwide is about 667,585 terabytes. (How Much Information 2003, UC Berkeley)
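The 667,585-terabyte figure can be reproduced from the two numbers quoted above. A quick Python check, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a 365-day year; these units are our assumption, since the slide does not state them:

```python
# Reproduce the annual e-mail flow estimate from the figures on this slide.
# ASSUMPTION: decimal units (1 KB = 1,000 bytes, 1 TB = 10**12 bytes), 365-day year.
EMAILS_PER_DAY = 31_000_000_000   # IDC: about 31 billion e-mails per day
AVG_EMAIL_BYTES = 59 * 1_000      # about 59 kilobytes per e-mail
DAYS_PER_YEAR = 365
BYTES_PER_TB = 10**12


def annual_email_flow_tb(per_day=EMAILS_PER_DAY, size=AVG_EMAIL_BYTES,
                         days=DAYS_PER_YEAR):
    """Return the yearly volume of e-mail traffic in terabytes."""
    return per_day * size * days / BYTES_PER_TB


print(f"{annual_email_flow_tb():,.0f} TB per year")  # prints "667,585 TB per year"
```

The daily volume works out to about 1,829 TB, and multiplying by 365 gives the annual total quoted above.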
What do I do with ALL that e‑mail?!
Why are we so interested in E‑Mail and Digital Records?
Email’s far-reaching effects
Loss of Corporate Knowledge
Imagine you’re new in the office. All of the information you need to do your job was on your computer, but your predecessor deleted it before leaving, or it is password protected and you don’t have the password.
Legal Implications
If it is in an e-mail and it was sent from, received by, or stored on a government computer, it is a legal record.
Never put anything in an e-mail you don’t want on the front page of the local paper.
Always CYO (cover your office).
Users have several options for keeping their saved e-mail:
They may leave it on the mail provider’s server.
They may leave it on a web-based mail server such as Hotmail or Yahoo.
They may store it in their e-mail client, such as Outlook, Eudora, or Netscape.
They may store it on the file system of their PC as individual .eml files (MS Outlook Express Electronic Mail).
In each of these circumstances, the actual byte stream used to represent the e-mail message is slightly different.
While e-mail servers and clients are obliged to communicate with each other using standards (SMTP, POP3, and IMAP), they are not required to store the e-mail using any sort of standard.
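What the standards do fix is the format of the message on the wire: whatever a server or client does internally, the bytes it exchanges over SMTP, POP3, or IMAP follow the Internet message format (RFC 2822 at the time of this talk). A minimal Python sketch, using only the standard library's email package and a made-up sample message, of reading the fields any compliant client would see from those bytes:

```python
from email import policy
from email.parser import BytesParser

# Made-up sample message in wire format, as it would arrive over SMTP or IMAP.
RAW = (
    b"From: Jane Doe <jane@example.gov>\r\n"
    b"To: records@example.gov\r\n"
    b"Subject: Budget memo\r\n"
    b"Date: Fri, 19 Nov 2004 10:15:00 -0500\r\n"
    b"Message-ID: <123@example.gov>\r\n"
    b"\r\n"
    b"Please file the attached memo.\r\n"
)


def summarize(raw: bytes) -> dict:
    """Parse standard wire-format bytes into the fields a client would display."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body = msg.get_body(preferencelist=("plain",))  # best text/plain part, if any
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "subject": str(msg.get("Subject", "")),
        "date": str(msg.get("Date", "")),
        "message_id": str(msg.get("Message-ID", "")),
        "body": body.get_content().strip() if body is not None else "",
    }


print(summarize(RAW)["subject"])  # prints "Budget memo"
```

Because every client can produce these bytes on request, a preservation tool that starts from the wire format does not need to understand each vendor's private storage format.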
We will be looking for a solution that will have the widest possible use:
Start with an IMAP server.
Enhance the server with the ability to take the contents of its message store and create XML files in the desired standard format, called XMTP.
Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing. XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML.
In the testing phase, but not launched yet: http://sourceforge.net/projects/smtp/
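The server enhancement could be sketched roughly as follows, turning one message from the store into an XML document. This is an illustration under stated assumptions, not XMTP: the element names below are hypothetical, chosen for readability, and attachments are carried as base64 text.

```python
import base64
import xml.etree.ElementTree as ET
from email import policy
from email.parser import BytesParser


def message_to_xml(raw: bytes) -> str:
    """Serialize one RFC 2822 message into a simple XML document.

    The element names (<message>, <headers>, <header>, <body>, <attachment>)
    are HYPOTHETICAL; they illustrate the idea and do not follow the XMTP schema.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    root = ET.Element("message")
    headers = ET.SubElement(root, "headers")
    for name, value in msg.items():
        # Keep every header in its original order, including repeats such as Received.
        ET.SubElement(headers, "header", name=name).text = str(value)
    body = msg.get_body(preferencelist=("plain", "html"))
    ET.SubElement(root, "body").text = body.get_content() if body is not None else ""
    for part in msg.iter_attachments():
        # Attachments are stored as base64 text so the XML stays plain text.
        att = ET.SubElement(root, "attachment",
                            filename=part.get_filename() or "",
                            type=part.get_content_type())
        att.text = base64.b64encode(part.get_payload(decode=True) or b"").decode("ascii")
    return ET.tostring(root, encoding="unicode")
```

Header values are written in decoded form (encoded-word subjects become plain Unicode); a preservation format might also want to keep the raw header bytes alongside them.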
IMAP seems to be the only protocol that supports moving and copying e-mail messages from place to place while preserving the e-mail message’s native format.
This means that no matter where the e-mail message ends up, almost any IMAP-compliant e-mail client can send it to an “archives” server.
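As a sketch of how a message can travel between servers without being altered, the Python standard library's imaplib can fetch each message's raw bytes from one server and APPEND them to a folder on another. The function names, host names, and folder names here are hypothetical:

```python
import imaplib


def connect(host: str, user: str, password: str) -> imaplib.IMAP4_SSL:
    """Open and authenticate an IMAP-over-SSL connection."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def copy_folder(src, dst, src_folder="INBOX", dst_folder="Archives") -> int:
    """Copy every message in src_folder on one IMAP server to dst_folder on another.

    The raw message bytes are fetched and APPENDed unchanged, so the archives
    server receives each message in its native format. Returns the count copied.
    """
    typ, _ = src.select(src_folder, readonly=True)  # read-only: leave flags untouched
    if typ != "OK":
        raise RuntimeError(f"cannot open folder {src_folder!r}")
    typ, data = src.search(None, "ALL")
    copied = 0
    for num in data[0].split():
        typ, msg_data = src.fetch(num, "(RFC822)")
        raw = msg_data[0][1]  # response tuple: (b'N (RFC822 {size}', message bytes)
        # flags=None, date_time=None: the archives server stamps its own arrival date
        typ, _ = dst.append(dst_folder, None, None, raw)
        if typ != "OK":
            raise RuntimeError(f"append failed for message {num!r}")
        copied += 1
    return copied
```

For example, copy_folder(connect("mail.agency.example", user, pw), connect("archives.example", user, pw), "INBOX", "Governor-1999") would copy one mailbox. The destination folder is assumed to exist already (imaplib's create() can make it), and because no date is passed, the archive records its own arrival time rather than the original internal date.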
How?
Have the user send e-mail directly to a server hosted by the NC State Archives.
Have the user send e-mail to an enhanced IMAP server maintained by their agency. This would enable the agency to access the archived e-mail messages locally. The IMAP server could then send snapshots to us, or the agency could send us the XMTP files on electronic media via USPS.
Have the user collect and send .pst files to the NC State Archives
The Archives will open them with Outlook and move them to the enhanced IMAP server (the process would be automated).
The Archives should also be able to access packages of e-mail in other formats, since Outlook can convert from Eudora, Netscape, etc.
Once loaded into Outlook, the e-mail packages would then be sent to the IMAP server.
Any strategy based on intercepting the data stream is out, since we want to collect e-mail messages only after the user has had a chance to cull and organize them.
Our proposal is to use hMailServer (a SourceForge open source project), an IMAP server that uses MySQL or Microsoft SQL Server as its message store.
http://www.hmailserver.com
The hMailServer installation contains a minimal MySQL-installation, so if you don't already have a database server in your network, MySQL is installed automatically when you install hMailServer.
The XML creation utility could interface directly with the message store instead of going through the IMAP protocol.
hMailServer comes with an attendant COM component that can be used to access the data store.
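Reading the message store directly might look something like the following. hMailServer's actual database schema is not described here, so the messages table and its columns are hypothetical, and Python's built-in sqlite3 stands in for MySQL or Microsoft SQL Server (a DB-API driver for either would follow the same query pattern, though its parameter placeholder style may differ). This sketch also bypasses the COM component entirely:

```python
import sqlite3

# HYPOTHETICAL schema for illustration only; NOT hMailServer's actual tables.
SCHEMA = """
CREATE TABLE messages (
    id      INTEGER PRIMARY KEY,
    account TEXT NOT NULL,
    folder  TEXT NOT NULL,
    raw     BLOB NOT NULL  -- full RFC 2822 message bytes
)
"""


def messages_in_folder(conn, account, folder):
    """Yield (id, raw_bytes) for one account's folder, read straight from the
    message store rather than through the IMAP protocol."""
    cur = conn.execute(
        "SELECT id, raw FROM messages WHERE account = ? AND folder = ? ORDER BY id",
        (account, folder),
    )
    for msg_id, raw in cur:
        yield msg_id, bytes(raw)
```

Each raw message yielded here could then be handed to whatever XML conversion the utility uses.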
Life of an e-mail message
The e-mail message is sent to the user’s mail server.
The user downloads the message to his/her mailbox.
The user optionally places the message into a folder on his/her local system.
The user creates a folder on the “Archives” IMAP server.
The user moves the mail from his/her inbox or a specified folder to the folder on the “Archives” IMAP server.
An administrator requests that the IMAP server create one or more XML files containing the user’s e-mail
XML files are saved as a preservation copy
Access to Email #1
Load the XML into ENCompass.
Utilize the IMAP server by enhancing it to provide web access to its native store, similar to the user interface provided by Lurker: http://sourceforge.net/projects/lurker
Access to Email #2
Utilize Documentum by enhancing it to ingest the XML produced by the IMAP server. The Documentum server would be used purely as an e-mail repository, not as a document management application.
Utilize Documentum as a document management application to interfile e-mail messages into named record series.
Access to Email #3
Move e-mail messages into a SharePoint Portal Server (SPP). Use Outlook to collect the messages from the IMAP server and send them to SPP.
XML files would serve purely as a preservation copy.
This Particular Project
Take 6 gigabytes of e-mail from Governor Jim Hunt’s administration (1993-2001; bulk dates 1997-2001) and make it accessible and preservable.
The e-mail has been appraised and culled to create the core for preservation.
The e-mail is in Microsoft Outlook .pst files and can be accessed only by using the correct version of Outlook.
Create/utilize programs to move the e-mails out of Microsoft’s proprietary .pst format into a non-proprietary and stable XML format
We also want to write software that is more universal in scope and can be used with most electronic records.
Hire a programmer to write code to convert the .pst files from their format to XML format
Take the converted XML files and load them onto our server and make them available to the public via the web and searchable through our online catalog system (ENCompass/MARS)
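For public web access, each XML file would need to be rendered as a page. The slides mention XSLT for this; as a swapped-in alternative, here is a minimal Python sketch that produces the same kind of HTML view with ElementTree and html.escape. It assumes a hypothetical message/headers/header/body element layout, not the XMTP schema:

```python
import html
import xml.etree.ElementTree as ET

DISPLAY_HEADERS = ("From", "To", "Date", "Subject")


def xml_to_html(xml_text: str) -> str:
    """Render one archived message as a minimal HTML page.

    Expects the HYPOTHETICAL layout
    <message><headers><header name="...">...</header></headers><body>...</body></message>.
    This is a Python stand-in for an XSLT stylesheet, not the XMTP transformation.
    """
    root = ET.fromstring(xml_text)
    # If a header repeats, the last occurrence wins; fine for these display fields.
    headers = {h.get("name"): h.text or "" for h in root.iter("header")}
    subject = html.escape(headers.get("Subject", "(no subject)"))
    rows = "".join(
        f"<tr><th>{html.escape(name)}</th><td>{html.escape(value)}</td></tr>"
        for name, value in headers.items()
        if name in DISPLAY_HEADERS
    )
    body_el = root.find("body")
    body = html.escape(body_el.text or "") if body_el is not None else ""
    return (
        f"<html><head><title>{subject}</title></head><body>"
        f"<h1>{subject}</h1><table>{rows}</table><pre>{body}</pre>"
        "</body></html>"
    )
```

Escaping every value keeps message text such as angle-bracketed addresses from being interpreted as markup in the browser.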
Wish us luck! We are very excited to have this opportunity to explore this potential solution.
We hope to take what we learn and apply it to the collection of other electronic government resources that are archival
We’ll keep you posted!