Peter Chan CURATEcamp

ePADD Email, Process, Appraise, Discover, Deliver

CurateCamp 2015

Peter Chan, Digital Archivist
Apr. 23, 2015

Email Archives in Our Collections

• Robert Creeley - ~50,000
• Richard Fikes - ~100,000
• Terry Winograd - ~650,000
• Benoit Mandelbrot
• Harrison Studio
• Stanford Humanities Lab

Common Ways to Archive Emails

Paper
• Print the emails
• File the printed emails in the respective content folders

Electronic
• Archive emails using functions provided in email clients

Process  

Appraise  

Deliver  

Preserve  

Discover  

Normalization

• Converts email from closed, proprietary file formats to standard, portable formats

• Emailchemy, MailStore

 

Appraisal

• Owner:
  – Filter messages to/from certain correspondents
  – Review messages containing certain words (divorce, daughter, etc.)
• Curator:
  – Ensure certain information exists
  – Get an overall view of who, where, and what are mentioned in the messages

• Email clients • ePADD
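The two owner-side appraisal tasks above (filtering by correspondent and reviewing messages that contain certain words) can be sketched with Python's standard `email` package. This is only an illustration of the idea, not how ePADD or any email client implements it; the function names, the sample message, and the addresses are hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses


def correspondents(msg):
    """Return the lowercased addresses found in the From, To and Cc headers."""
    fields = []
    for header in ("From", "To", "Cc"):
        fields.extend(msg.get_all(header, []))
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def involves(msg, addresses):
    """True if the message is to or from any of the given addresses."""
    return bool(correspondents(msg) & {a.lower() for a in addresses})


def plain_text(msg):
    """Concatenate the text/plain parts of a message (attachments are ignored)."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def mentions(msg, words):
    """True if the subject or plain-text body contains any of the words."""
    if not words:
        return False
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    text = str(msg.get("Subject", "")) + "\n" + plain_text(msg)
    return pattern.search(text) is not None


if __name__ == "__main__":
    # Hypothetical message used only to exercise the two filters.
    raw = (
        "From: Alice <alice@example.org>\n"
        "To: bob@example.org\n"
        "Subject: Family news\n"
        "\n"
        "My daughter starts school next week.\n"
    )
    msg = message_from_string(raw)
    print(involves(msg, ["bob@example.org"]), mentions(msg, ["divorce", "daughter"]))
```

Whole-word, case-insensitive matching is a deliberate simplification; a real review pass would likely also need stemming and a look inside attachments.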

Processing

• Place restrictions on messages containing
  – personally identifiable information (SS#, credit card #, etc.)
  – private information (student grades, salary, grievances, medical information, etc.)
  – information stipulated by donors

• ePADD
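Flagging messages that may contain personally identifiable information is often done with pattern matching. The following is a minimal sketch under stated assumptions: the two regular expressions and the `find_pii` helper are hypothetical, the SSN pattern only covers the `NNN-NN-NNNN` layout, and card-like digit runs are kept only if they pass the Luhn checksum. It is not ePADD's implementation, and every hit would still need review by an archivist.

```python
import re

# Assumed patterns for illustration; real collections will need broader ones.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")  # 13 to 19 digits, optional separators


def luhn_ok(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text):
    """Return a list of (kind, matched_text) pairs for candidate PII in text."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            hits.append(("credit_card", m.group()))
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number, not a real account.
    sample = "Please charge 4111 1111 1111 1111; my SS# is 123-45-6789."
    for kind, value in find_pii(sample):
        print(kind, value)
```

A message with any hit could then be marked as restricted pending review. The other categories on the slide (grades, salary, medical information, donor stipulations) do not reduce to a fixed pattern and would need keyword lists or manual review.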

Can do more than paper-based archives!!

Processing: Organizing

• Group messages containing certain words (project name, event name) together
• Gather all messages belonging to the same person with multiple email addresses together
• Group all image attachments in one place
• List all person, location, organization entities

• ePADD

20 Email Addresses for 1 Person

[List of 20 email addresses belonging to one correspondent; the addresses are obscured in this transcript.]
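One way to gather the many addresses of a single correspondent is to link every address to the display names it appears with, then merge anything connected through a shared name or a shared address. The sketch below does this with a small union-find structure. It is an assumption about how such grouping could work, not a description of ePADD's algorithm, and the names and addresses are made up. Name-based merging can wrongly join two people who share a name, so the result should be reviewed.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase a display name and turn 'Last, First' into 'first last'."""
    name = name.strip().strip('"').lower()
    if "," in name:
        last, first = name.split(",", 1)
        name = first.strip() + " " + last.strip()
    return " ".join(name.split())


def group_addresses(pairs):
    """Group addresses linked by a shared display name.

    `pairs` is an iterable of (display_name, address) tuples, for example
    taken from the From/To/Cc headers of every message in an archive.
    Returns a sorted list of sorted address lists, one list per person.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, addr in pairs:
        addr_key = ("addr", addr.lower())
        find(addr_key)  # register the address even if it has no name
        norm = normalize_name(name)
        if norm:
            union(("name", norm), addr_key)

    groups = defaultdict(set)
    for key in list(parent):
        if key[0] == "addr":
            groups[find(key)].add(key[1])
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Hypothetical header data.
    pairs = [
        ("Pat Doe", "pdoe@example.edu"),
        ("Doe, Pat", "pat@example.org"),
        ("P. Doe", "PAT@example.org"),
        ("Sam Roe", "sam@example.com"),
    ]
    for group in group_addresses(pairs):
        print(group)
```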

Processing

• Facilitate reconciliation with authority files
  – OCLC FAST
  – Freebase
  – GeoNames
• User-defined regular expressions
• Local kill list

• ePADD

List of Reconciled Authority Records

Processing: Extract interesting items

• List all books, movies mentioned in all messages
• Give breakdown of organizations by type (universities, companies, museums, etc.)
• List events
• List all topics discussed in messages
• Create local authority records

• Future ePADD

Discovery

• Existence of email archives
• Information about the email archives (as in traditional finding aids)
• Information about the email archives (all person, location, organization entities and correspondents)

• Institution catalog system, wiki, finding aid repository (OAC, etc.), search engines
• Finding aids • ePADD

Delivery

• Email messages
• Full text search
• Request copy
• See attachment files (documents, spreadsheets)
• See image attachments
• Bulk search
• Annotate messages
• Organize messages

• Email clients • ePADD • Quickview Plus

Named Entity Recognition

• Stanford Named Entity Recognizer (NER)
  – Jenny Rose Finkel, Trond Grenager, and Christopher Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp. 363-370. http://nlp.stanford.edu/~manning/papers/gibbscrf3.pdf
  – GNU General Public License (v2 or later)
• OpenNLP
  – Apache license
• Custom NER
  – Use address book, Wikipedia, Freebase
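The "custom NER" idea of reusing an address book or other name lists can be illustrated with a simple dictionary (gazetteer) lookup that prefers the longest matching phrase. This is a sketch of that general technique only. It is not the Stanford NER, OpenNLP, or ePADD implementation, and the entries, entity types, and function names are hypothetical.

```python
import re


def build_gazetteer(entries):
    """Map lowercased token tuples to an entity type.

    `entries` is an iterable of (name, entity_type) pairs, for example names
    from an address book labelled PERSON.
    """
    gazetteer = {}
    for name, entity_type in entries:
        tokens = tuple(t.lower() for t in re.findall(r"\w+", name))
        if tokens:
            gazetteer[tokens] = entity_type
    return gazetteer


def tag_entities(text, gazetteer):
    """Return (matched_text, entity_type) pairs, longest match first."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max((len(key) for key in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok[0] for tok in tokens[i:i + n])
            if key in gazetteer:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                found.append((text[start:end], gazetteer[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


if __name__ == "__main__":
    # Hypothetical entries standing in for an address book and a place list.
    gazetteer = build_gazetteer([
        ("Pat Doe", "PERSON"),
        ("Example University", "ORGANIZATION"),
        ("Paris", "LOCATION"),
    ])
    for match in tag_entities("Pat Doe wrote from Paris about Example University.", gazetteer):
        print(match)
```

Matching is case-insensitive and ignores context, so common words that double as names will produce false positives. A statistical recognizer such as those listed above is better at using context, which is one reason to combine the two approaches.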