Drake Mendez curatecamp 2015
Maximizing Description to Enhance Access to Born-Digital Archival Collections
Seeley G. Mudd Manuscript Library
Princeton University Library
Rossy Mendez, Public Services Project Archivist
Jarrett M. Drake, Digital Archivist
CURATEcamp, Brooklyn Historical Society
April 23, 2015
“How we describe the collections in our care influences the ability of people to discover, access, use and interpret them”
Trends in Practice: Archival Arrangement and Description, pg 17.
<extent>
<scopecontent>
<unittitle>
The beginnings…
<extent>
1. Physical Space Quantity/Arrangement
2. Electronic Digital
<unittitle>
Office of President Records, Shirley Tilghman Subgroup (AC379)
<scopecontent>
Series Level
“One third of the digital files are a mixture of PDF’s and Excel Spreadsheets”
<phystech>
<unitdate>
Multi-level Description of Digital Records
Reality
For born-digital records, the Archive’s existing descriptive workflows failed to provide sufficient context and precision for <did> elements, including <unitdate>, <unittitle>, and <extent>.
Challenge
For multi-level records, how does one create these elements programmatically?
Previous Workflow
Create disk image
CSV output from FTK Imager
AT Resource Record
Windows Explorer
EAD <did> element
Revised Workflow
Question
What are the key metadata points we should extract from born-digital records and later represent in EAD?
Answer
1. Names of each folder <unittitle>
2. Modified dates of the oldest and newest files <unitdate>
3. Numbers of folders and numbers of files <extent>
Current Description Workflow
Complete digital records processing documentation for Mudd Library is available at: http://rbsc.princeton.edu/policies/guidance-recommended-file-formats
.txt
.csv
.xls
.xml
Current Description Workflow: Extract
Shell script to extract <unittitle>, <unitdate>, and <extent> values
-maxdepth 1
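The shell script itself is not reproduced in this transcript. As a rough illustration of the extraction step only, here is a minimal Python sketch (not the authors' script) that gathers the three metadata points for each immediate subfolder of a directory. It assumes that -maxdepth 1 means only top-level folders become described components. The function names and the dictionary keys are hypothetical.

```python
import os
from datetime import datetime, timezone


def describe_folder(path):
    """Collect title, modified-date range, and counts for one folder."""
    n_dirs = 0
    n_files = 0
    mtimes = []
    # Walk everything beneath the folder so the counts and dates cover its full contents.
    for root, dirs, files in os.walk(path):
        n_dirs += len(dirs)
        for name in files:
            n_files += 1
            full = os.path.join(root, name)
            mtimes.append(os.stat(full, follow_symlinks=False).st_mtime)

    def iso(timestamp):
        # Report dates in UTC so results do not depend on the local time zone.
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()

    return {
        "unittitle": os.path.basename(os.path.normpath(path)),
        "unitdate": (iso(min(mtimes)), iso(max(mtimes))) if mtimes else None,
        "folders": n_dirs,
        "files": n_files,
    }


def describe_top_level(root):
    """Describe each immediate subfolder of root, sorted by name."""
    with os.scandir(root) as entries:
        folders = sorted(
            (e.path for e in entries if e.is_dir(follow_symlinks=False)),
            key=os.path.basename,
        )
    return [describe_folder(p) for p in folders]
```

Calling describe_top_level on the root of an extracted disk image would return one dictionary per top-level folder, with the oldest and newest modified dates as a pair and the folder and file counts as integers.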
Current Description Workflow: Transform
Output of shell script as .txt file
Output of shell script transformed into EAD
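The transformation itself is not shown in this transcript. As an illustration only, the Python sketch below turns one extracted record into an EAD 2002 component with a <did> block. The record layout, the function name, and the choice of level="file" are assumptions, not the authors' implementation.

```python
import xml.etree.ElementTree as ET


def to_ead_component(record, level="file"):
    """Build an EAD <c> element from a dict of extracted values."""
    c = ET.Element("c", {"level": level})
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = record["unittitle"]

    if record.get("unitdate"):
        start, end = record["unitdate"]  # ISO dates of the oldest and newest files
        unitdate = ET.SubElement(
            did, "unitdate", {"type": "inclusive", "normal": f"{start}/{end}"}
        )
        # Display text is the year or year span; the full dates stay in @normal.
        years = (start[:4], end[:4])
        unitdate.text = years[0] if years[0] == years[1] else "-".join(years)

    # One <extent> per count, with the unit carried in an attribute.
    physdesc = ET.SubElement(did, "physdesc")
    ET.SubElement(physdesc, "extent", {"unit": "folders"}).text = str(record["folders"])
    ET.SubElement(physdesc, "extent", {"unit": "files"}).text = str(record["files"])
    return c


# Hypothetical record, shaped like the three metadata points listed earlier.
sample = {
    "unittitle": "Correspondence",
    "unitdate": ("2000-01-01", "2002-01-01"),
    "folders": 1,
    "files": 2,
}
print(ET.tostring(to_ead_component(sample), encoding="unicode"))
```

Keeping the counts as numbers and putting the unit in an attribute means the <extent> values need no later string parsing.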
Description Workflow Enhancements
Eliminate string values for <extent> elements and minimize post-processing of data
Use topic modeling for textual data (fondz or another program) and write scripts for basic textual analysis (e.g., automated page count for PDFs)
Index all names of directories and files and represent their structure through a file browser embedded in the finding aid and/or the repository (Hydra)
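As a rough illustration of the indexing idea, the Python sketch below records every directory and file name as a nested structure that a file browser could render. The function name and the JSON layout are hypothetical. Nothing here reflects how a finding aid or a Hydra repository would actually consume the index.

```python
import json
import os


def build_index(path):
    """Return a nested dict mirroring the directory tree under path."""
    node = {
        "name": os.path.basename(os.path.normpath(path)),
        "type": "directory",
        "children": [],
    }
    with os.scandir(path) as entries:
        # Sort by name so the index is stable between runs.
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                node["children"].append(build_index(entry.path))
            else:
                node["children"].append({"name": entry.name, "type": "file"})
    return node


def index_as_json(path):
    """Serialize the index so it can be stored or served alongside a finding aid."""
    return json.dumps(build_index(path), indent=2)
```

Calling index_as_json on a processed folder yields one JSON document with a "children" list at each directory level, which preserves the original structure without copying any file contents.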