Preservation of Electronic Mail

Preservation of Electronic Mail Druscie Simpson NC State Archives November 19, 2004


Transcript of Preservation of Electronic Mail

Page 1: Preservation of Electronic Mail

Preservation of Electronic Mail

Druscie Simpson, NC State Archives

November 19, 2004

Page 2: Preservation of Electronic Mail

E-mail: The Digital Divide Also Multiplies

Page 3: Preservation of Electronic Mail

E-mail as a Burden

The Radicati Group and Merrill Lynch estimate that email is growing at a rate of 300% annually. The Age (July 8, 2003)

The real problem: not more email, but “larger and larger attachments, generating an average of 5MB of email content” daily. The Age (July 8, 2003)

Email generates about 400,000 terabytes of new information each year worldwide

About 31 billion emails are sent daily, on the Internet and elsewhere, a figure which is expected to double by 2006 (source: International Data Corporation (IDC)). The average email is about 59 kilobytes in size, thus the annual flow of emails worldwide is 667,585 terabytes. (How Much Information 2003, UC Berkeley)
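The annual figure above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 kilobyte = 1,000 bytes, 1 terabyte = 10^12 bytes) and a 365-day year, neither of which the slide states.

```python
# Figures quoted on the slide (IDC / How Much Information 2003).
EMAILS_PER_DAY = 31_000_000_000   # about 31 billion messages per day
AVG_SIZE_BYTES = 59 * 1000        # about 59 kilobytes, assuming 1 KB = 1,000 bytes
DAYS_PER_YEAR = 365               # assumption: non-leap year

BYTES_PER_TERABYTE = 10**12       # assumption: decimal terabytes

annual_bytes = EMAILS_PER_DAY * AVG_SIZE_BYTES * DAYS_PER_YEAR
annual_terabytes = annual_bytes // BYTES_PER_TERABYTE

print(f"{annual_terabytes:,} terabytes per year")  # 667,585 terabytes per year
```

Under these assumptions the product matches the 667,585 terabytes quoted from the Berkeley study.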

Page 4: Preservation of Electronic Mail

What do I do with ALL that e‑mail?!

Why are we so interested in E‑Mail and Digital Records?

Email’s far reaching effects

Page 5: Preservation of Electronic Mail

Loss of Corporate Knowledge

Imagine you’re new in the office. All of the information to do your job was on your computer. Your predecessor deleted the information before leaving or it was password protected. You don’t have the password.

Page 6: Preservation of Electronic Mail

Legal Implications

If it is in an email and it is sent from, received by, or stored on a government computer, it is a legal record.

Never put anything in an e-mail you don’t want on the front page of the local paper.

Always CYO (cover your office).

Page 7: Preservation of Electronic Mail

Users have several options for keeping their saved e-mails:

They may leave them on the mail provider’s server

They may leave them on a web-based mail server such as Hotmail or Yahoo

They may store them in their e-mail client, such as Outlook, Eudora, or Netscape

They may store them on the file system of their PC as individual .eml files (MS Outlook Express Electronic Mail)

Page 8: Preservation of Electronic Mail

In each of these circumstances the actual byte stream used to represent the e-mail message is slightly different.

While an e-mail server and e-mail client are obliged to communicate with each other using standards (SMTP, POP3, and IMAP), they are not required to store the e-mail using any sort of standard.
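To illustrate the point, the sketch below stores one and the same message in two common on-disk layouts (mbox and Maildir) using Python's standard `mailbox` module and compares the stored bytes. The addresses and message text are made up for the example, and these two formats are only stand-ins: the stores used by Outlook, Eudora, or a web mail provider differ in their own ways.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def store_both_ways():
    """Store one message as mbox and as Maildir; return the bytes each format keeps."""
    msg = EmailMessage()
    msg["Subject"] = "Budget meeting"        # hypothetical sample message
    msg["From"] = "sender@example.org"
    msg["To"] = "recipient@example.org"
    msg.set_content("See you at ten.")

    with tempfile.TemporaryDirectory() as tmp:
        # mbox: all messages in one file, each preceded by a "From " separator line.
        mbox_path = os.path.join(tmp, "store.mbox")
        box = mailbox.mbox(mbox_path)
        box.add(msg)
        box.flush()
        box.close()
        with open(mbox_path, "rb") as handle:
            mbox_bytes = handle.read()

        # Maildir: one file per message, no separator line.
        maildir = mailbox.Maildir(os.path.join(tmp, "maildir"))
        key = maildir.add(msg)
        maildir_bytes = maildir.get_bytes(key)

    return mbox_bytes, maildir_bytes


mbox_bytes, maildir_bytes = store_both_ways()
print(mbox_bytes != maildir_bytes)  # the two stores hold different byte streams
```

The message content is the same in both cases; only the stored representation differs, which is why a preservation format has to be chosen deliberately.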

Page 9: Preservation of Electronic Mail

We will be looking for a solution that will have the widest possible use

Start with an IMAP server

Enhance the server with the ability to take the contents of its message store and create the desired standard XML files, called XMTP

Using XMTP, SMTP messages can be transformed via XSLT into HTML pages for viewing.

XMTP has been used to implement a telemedicine consultation system using SMTP e-mail and HTML

In the testing phase, but not launched yet

http://sourceforge.net/projects/smtp/
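The idea of turning a message into XML can be sketched with Python's standard library. This is not the XMTP schema: the element names (`Message`, `Headers`, `Header`, `Body`) and the function `message_to_xml` are hypothetical, chosen only to show the general shape of such a conversion, and attachments are ignored.

```python
import xml.etree.ElementTree as ET
from email import message_from_bytes
from email.policy import default


def message_to_xml(raw: bytes) -> str:
    """Convert one raw RFC 822 message into a simple, illustrative XML document."""
    msg = message_from_bytes(raw, policy=default)

    root = ET.Element("Message")              # hypothetical element names throughout
    headers = ET.SubElement(root, "Headers")
    for name, value in msg.items():
        header = ET.SubElement(headers, "Header", name=name)
        header.text = str(value)

    # Keep only the plain-text body in this sketch.
    body = msg.get_body(preferencelist=("plain",))
    ET.SubElement(root, "Body").text = body.get_content() if body is not None else ""

    return ET.tostring(root, encoding="unicode")


sample = (
    b"From: sender@example.org\r\n"
    b"To: recipient@example.org\r\n"
    b"Subject: Hi\r\n"
    b"\r\n"
    b"Hello\r\n"
)
print(message_to_xml(sample))
```

An XML file of this kind could then be fed to an XSLT stylesheet to produce HTML for viewing, as the slide describes for XMTP.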

Page 10: Preservation of Electronic Mail

IMAP seems to be the only protocol that supports moving and copying e-mail messages from place to place while preserving the e-mail message’s native format.

This means that no matter where the e-mail message ends up, almost any IMAP compliant e-mail client can send it to an “archives” server.
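A minimal sketch of such a transfer, using Python's standard `imaplib`, is shown below. The function name `copy_to_archive`, the folder name, and the host names in the usage note are assumptions for illustration; a real deployment would also need authentication, error handling, and folder creation.

```python
import imaplib


def copy_to_archive(source: imaplib.IMAP4, archive: imaplib.IMAP4,
                    uid: str, archive_folder: str = "Archive") -> bytes:
    """Fetch one message by UID from `source` and append its raw bytes to `archive`.

    Both connections are assumed to be logged in already, and a mailbox
    must already be selected on `source`.
    """
    status, data = source.uid("FETCH", uid, "(RFC822)")
    if status != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError(f"could not fetch message with UID {uid}")
    raw_message = data[0][1]  # the message in its RFC 822 form

    # No flags and no explicit date: the archive server assigns its own.
    status, _ = archive.append(archive_folder, None, None, raw_message)
    if status != "OK":
        raise RuntimeError(f"could not append message to {archive_folder}")
    return raw_message
```

In use, `source` and `archive` would be two connections such as `imaplib.IMAP4_SSL("mail.agency.example")` and `imaplib.IMAP4_SSL("archives.example")` (hypothetical hosts), each logged in, with a mailbox selected on the source. Because the message travels as raw RFC 822 bytes, nothing client-specific is introduced along the way.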

Page 11: Preservation of Electronic Mail

How?

Have the user send e-mail directly to a server hosted by the NC State Archives

Have the user send e-mail to an enhanced IMAP server maintained by their agency

This would enable the agency to locally access the archived e-mail messages

The IMAP server could then send snapshots to us or send us the XMTP files on electronic media via USPS

Page 12: Preservation of Electronic Mail

Have the user collect and send .pst files to the NC State Archives

Archives will open them with Outlook and move them to the enhanced IMAP server (process would be automated)

Archives should also be able to access packages of e-mail in other formats since Outlook can convert from Eudora, Netscape, etc.

Once loaded into Outlook, the e-mail packages would then be sent to the IMAP server.

Page 13: Preservation of Electronic Mail

Any strategy based on the interception of the data stream is out, since we want to collect the e-mail messages only after the user has been given a chance to cull and organize them.

Page 14: Preservation of Electronic Mail

Our proposal is to use hMailServer (a SourceForge open source project), which is an IMAP server that uses MySQL or Microsoft SQL Server as its message store.

http://www.hmailserver.com

Page 15: Preservation of Electronic Mail

The hMailServer installation contains a minimal MySQL-installation, so if you don't already have a database server in your network, MySQL is installed automatically when you install hMailServer.

The XML creation utility could interface directly with the message store instead of the IMAP protocol.

hMailServer comes with an attendant COM component that can be used to access the data store.

Page 16: Preservation of Electronic Mail

Life of an e-mail message

E-mail message is sent to the user’s mail server

User downloads the message to his/her mailbox

User optionally places the message into a folder on his/her local system

User creates a folder on the “Archives” IMAP server

User moves the mail from his/her inbox or specified folder to the folder on the “Archives” IMAP server

An administrator requests that the IMAP server create one or more XML files containing the user’s e-mail

XML files are saved as a preservation copy

Page 17: Preservation of Electronic Mail

Access to Email #1

Load the XML into ENCompass

Utilize the IMAP server by enhancing it to provide web access to its native store, similar to the user interface provided by Lurker

http://sourceforge.net/projects/lurker

Page 18: Preservation of Electronic Mail

Access to Email #2

Utilize Documentum by enhancing it to ingest the XML produced by the IMAP server.

The Documentum server would be used purely as an e-mail repository, not as a document management application.

Utilize Documentum as a document management application to interfile e-mail messages into named record series

Page 19: Preservation of Electronic Mail

Access to Email #3

Move e-mail messages into a SharePoint Portal (SPP) server

Use Outlook to collect the messages from the IMAP server and send them to SPP.

XML files would serve purely as a preservation copy.

Page 20: Preservation of Electronic Mail

This Particular Project

Take 6 gigabytes of e-mail from Governor Jim Hunt’s administration (1993-2001; bulk dates 1997-2001) and make it accessible and preservable.

E-mail has been appraised and culled to create the core for preservation

E-mail is in Microsoft Outlook .pst files and can be accessed only by using the correct version of Outlook

Create/utilize programs to move the e-mails out of Microsoft’s proprietary .pst format into a non-proprietary and stable XML format

Page 21: Preservation of Electronic Mail

Also want to write software that is more universal in scope and can be used with most electronic records.

Hire a programmer to write code to convert the .pst files from their format to XML format

Take the converted XML files and load them onto our server and make them available to the public via the web and searchable through our online catalog system (ENCompass/MARS)

Page 22: Preservation of Electronic Mail

Wish us luck!

We are very excited to have this opportunity to explore this potential solution

We hope to take what we learn and apply it to the collection of other electronic government resources that are archival

We’ll keep you posted!