Digital TV Presented by Peter Chan to the Maxidigm Investment Club Mar. 16, 2007.
Peter Chan CURATEcamp
ePADD Email, Process, Appraise, Discover, Deliver
CurateCamp 2015
Peter Chan Digital Archivist Apr. 23, 2015
Email Archives in Our Collections
• Robert Creeley - ~50,000
• Richard Fikes - ~100,000
• Terry Winograd - ~650,000
• Benoit Mandelbrot
• Harrison Studio
• Stanford Humanity Lab
Common Ways to Archive Emails
Paper
• Print the emails
• File the printed emails in the respective content folders
Electronic
• Archive emails using functions provided in email clients
Normalization
• Convert email from closed, proprietary file formats to standard, portable formats
• Emailchemy, MailStore
Appraisal
• Owner:
– Filter messages to/from certain correspondents
– Review messages containing certain words (divorce, daughter, etc.)
• Curator:
– Ensure certain information exists
– Get an overall view of who, where, and what are mentioned in the messages
• Email clients
• ePADD
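The two owner-side appraisal steps above (filtering by correspondent and reviewing by keyword) can be sketched with Python's standard `email` package. This is only an illustration, not how ePADD or any email client implements them. The helper names (`correspondents`, `involves`, `mentions`, `make_message`) and the sample messages are hypothetical.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def correspondents(msg):
    """Return the lowercased addresses found in the From, To, and Cc headers."""
    headers = []
    for name in ("From", "To", "Cc"):
        headers.extend(msg.get_all(name, []))
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def involves(msg, addresses):
    """True if any of the given addresses sent or received the message."""
    wanted = {a.lower() for a in addresses}
    return bool(correspondents(msg) & wanted)


def mentions(msg, words):
    """True if the subject or plain-text body contains any word (substring match)."""
    body = msg.get_body(preferencelist=("plain",))
    text = msg.get("Subject", "") + "\n" + (body.get_content() if body else "")
    text = text.lower()
    return any(w.lower() in text for w in words)


def make_message(sender, recipient, subject, body):
    """Build a small in-memory message for the demonstration (hypothetical data)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


sample = [
    make_message("Alice <alice@example.org>", "owner@example.org",
                 "Lunch", "See you at noon."),
    make_message("bob@example.org", "owner@example.org",
                 "Family news", "My daughter starts school."),
]
from_alice = [m for m in sample if involves(m, ["alice@example.org"])]
flagged = [m for m in sample if mentions(m, ["divorce", "daughter"])]
```

Substring matching is deliberately crude here; a real review pass would probably want word boundaries and stemming.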
Processing
• Place restrictions on messages containing
– personally identifiable information (SS#, credit card #, etc.)
– private information (student grades, salary, grievances, medical information, etc.)
– information stipulated by donors
• ePADD
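One common way to flag messages with Social Security or credit card numbers is pattern matching plus a checksum. The sketch below is an assumption about how such screening could work, not a description of ePADD. The two regular expressions and the names `find_sensitive` and `luhn_ok` are illustrative, and the patterns would need tuning against real mail.

```python
import re

# Illustrative patterns: US SSN written as 3-2-4 digits, and 13 to 19 digit
# card numbers with optional spaces or hyphens between digits.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):  # discard digit runs that fail the checksum
            hits.append(("card", m.group()))
    return hits


hits = find_sensitive("My SSN is 078-05-1120 and card 4111 1111 1111 1111.")
```

The Luhn check reduces false positives from phone numbers or order numbers. Messages with hits would still need a human decision before being restricted.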
Processing: Organizing
• Group messages on certain words (project name, event name) together
• Gather all messages belonging to the same person who uses multiple email addresses
• Group all image attachments in one place
• List all person, location, organization entities
• ePADD
20 Email Addresses for 1 Person
• [20 email addresses for the same correspondent; the addresses are obscured in this transcript]
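To see how many addresses could be gathered under one person, here is a minimal sketch that groups addresses whose display names normalize to the same string. The heuristic and the names `normalize_name` and `group_addresses` are assumptions for illustration; the slides do not say how ePADD resolves identities.

```python
from collections import defaultdict
from email.utils import parseaddr


def normalize_name(name):
    """Lowercase, drop punctuation, and sort tokens so 'Doe, Jane' equals 'Jane Doe'."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.lower())
    return " ".join(sorted(cleaned.split()))


def group_addresses(header_values):
    """Group addresses by normalized display name.

    Addresses without a display name are keyed by the address itself.
    """
    groups = defaultdict(set)
    for value in header_values:
        name, addr = parseaddr(value)
        if addr:
            key = normalize_name(name) or addr.lower()
            groups[key].add(addr.lower())
    return dict(groups)


groups = group_addresses([
    '"Doe, Jane" <jane@cs.example.edu>',
    "Jane Doe <JDoe@example.com>",
    "other@example.org",
])
```

Name matching alone will merge different people who share a name and will miss nicknames, so the result is best treated as a set of suggestions for an archivist to confirm.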
Processing
• Facilitate reconciliation with authority files
– OCLC FAST
– Freebase
– Geonames
• User-defined regular expressions
• Local kill list
• ePADD
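As a rough picture of reconciliation with a kill list, the sketch below drops extracted entities that appear on a local kill list and matches the rest against an authority table by normalized label. The in-memory dictionary stands in for a real authority file such as FAST. The identifier `auth:0001` and the function name `reconcile` are made up for the example.

```python
def reconcile(entities, authority, kill_list):
    """Split entities into authority matches and unmatched leftovers.

    entities:  extracted entity strings
    authority: mapping of authorized label to identifier (stand-in data)
    kill_list: strings that should never be treated as entities
    """
    kill = {k.casefold() for k in kill_list}
    index = {label.casefold(): ident for label, ident in authority.items()}
    matched, unmatched = {}, []
    for entity in entities:
        # Collapse whitespace and ignore case before comparing.
        key = " ".join(entity.split()).casefold()
        if key in kill:
            continue
        if key in index:
            matched[entity] = index[key]
        else:
            unmatched.append(entity)
    return matched, unmatched


authority = {"Stanford University": "auth:0001"}  # hypothetical identifier
matched, unmatched = reconcile(
    ["stanford  university", "Unsubscribe", "Palo Alto"],
    authority,
    ["unsubscribe"],
)
```

Unmatched entities are the ones an archivist would review, either adding them to the kill list or creating a local authority record.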
Processing: Extract interesting items
• List all books and movies mentioned in the messages
• Give a breakdown of organizations by type (universities, companies, museums, etc.)
• List events
• List all topics discussed in messages
• Create local authority records
• Future ePADD
Discovery
• Existence of email archives
• Information about the email archives (as in traditional finding aids)
• Information about the email archives (all person, location, organization entities and correspondents)
• Institution catalog system, Wiki, Finding Aid Repository (OAC etc.), search engines
• Finding Aids
• ePADD
Delivery
• Email messages
• Full text search
• Request copy
• See attachment files (documents, spreadsheets)
• See image attachments
• Bulk search
• Annotate messages
• Organize messages
• Email clients
• ePADD
• Quickview Plus
Named Entity Recognition
• Stanford Named Entity Recognizer (NER)
– Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
– GNU General Public License (v2 or later)
• OpenNLP
– Apache license
• Custom NER
– Use address book, Wikipedia, Freebase
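The address-book idea behind a custom NER can be illustrated with a simple gazetteer lookup: known names are compiled into one pattern and matched against message text, longest name first. This is a sketch under that assumption only. It does not reproduce Stanford NER, OpenNLP, or ePADD's own recognizer, and `build_matcher` and `find_people` are hypothetical names.

```python
import re


def build_matcher(names):
    """Compile a case-insensitive pattern that prefers the longest known name.

    Assumes `names` is non-empty.
    """
    ordered = sorted(set(names), key=len, reverse=True)
    pattern = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(?:" + pattern + r")\b", re.IGNORECASE)


def find_people(text, matcher):
    """Return every address-book name found in the text, in order of appearance."""
    return [m.group() for m in matcher.finditer(text)]


# Names as they might appear in a correspondent's address book.
address_book = ["Terry Winograd", "Terry", "Robert Creeley"]
matcher = build_matcher(address_book)
people = find_people(
    "Lunch with Terry Winograd and robert creeley; Terry agreed.", matcher
)
```

A lookup like this only finds names it already knows, which is why the slide also lists statistical recognizers and larger sources such as Wikipedia and Freebase.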