Drake Mendez curatecamp 2015

Maximizing Description to Enhance Access to Born-Digital Archival Collections

Seeley G. Mudd Manuscript Library

Princeton University Library

Rossy Mendez, Public Services Project Archivist

Jarrett M. Drake, Digital Archivist

CURATEcamp, Brooklyn Historical Society

April 23, 2015

“How we describe the collections in our care influences the ability of people to discover, access, use and interpret them”

Trends in Practice: Archival Arrangement and Description, p. 17.

<extent>

<scopecontent>

<unittitle>

The beginnings…

<extent>

1. Physical Space Quantity/Arrangement

2. Electronic Digital

<unittitle>

Office of President Records, Shirley Tilghman Subgroup (AC379)

<scopecontent>

Series Level

“One third of the digital files are a mixture of PDF’s and Excel spreadsheets”

Multi-level Description of Digital Records

Reality

For born-digital records, the Archive’s existing descriptive workflows failed to provide sufficient context and precision for the elements within <did>, including <unitdate>, <unittitle>, and <extent>.

Challenge

For multi-level records, how does one create these elements programmatically?

Previous Workflow

Create disk image


CSV output from FTK Imager

Archivists' Toolkit (AT) resource record

Windows Explorer

EAD <did> element

Revised Workflow

Question

What are the key metadata points we should extract from born-digital records and later represent in EAD?

Answer

1. Name of each folder <unittitle>

2. Modification dates of the oldest and newest files <unitdate>

3. Number of folders and number of files <extent>

Current Description Workflow

Complete documentation of digital records processing at Mudd Library is available at: http://rbsc.princeton.edu/policies/guidance-recommended-file-formats

.txt

.csv

.xls

.xml

Current Description Workflow: Extract

Shell script to extract <unittitle>, <unitdate>, and <extent> values

-maxdepth 1
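The extraction step might look roughly like the following. This is a minimal Python sketch, not the shell script from the slide. It assumes the disk image has been mounted or exported to a local directory, and it treats each immediate subfolder as one component. That mirrors what `-maxdepth 1` appears to do in the original script, presumably as a `find` option, though this is an assumption. For each folder it emits one tab-separated line with the folder name, the oldest and newest file modification dates, and the folder and file counts. The names `describe_folder` and `extract` are hypothetical.

```python
import os
import sys
from datetime import datetime, timezone


def describe_folder(folder):
    """Return (unittitle, oldest, newest, n_folders, n_files) for one folder.

    oldest/newest are the modification dates (ISO, UTC) of the oldest and
    newest files anywhere beneath the folder, or None if it holds no files.
    """
    mtimes = []
    n_folders = 0
    n_files = 0
    for root, dirs, files in os.walk(folder):
        n_folders += len(dirs)
        n_files += len(files)
        for name in files:
            # lstat so that a broken symlink does not raise an error
            mtimes.append(os.lstat(os.path.join(root, name)).st_mtime)

    def as_date(ts):
        # Assumption: dates are reported in UTC, not the creator's local time
        return datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()

    oldest = as_date(min(mtimes)) if mtimes else None
    newest = as_date(max(mtimes)) if mtimes else None
    title = os.path.basename(os.path.normpath(folder))
    return title, oldest, newest, n_folders, n_files


def extract(top):
    """Yield one tab-separated line per immediate subfolder of `top`
    (the top-level-only behaviour suggested by -maxdepth 1)."""
    with os.scandir(top) as entries:
        subfolders = sorted(
            (e for e in entries if e.is_dir(follow_symlinks=False)),
            key=lambda e: e.name,
        )
    for entry in subfolders:
        title, oldest, newest, n_dirs, n_files = describe_folder(entry.path)
        yield "\t".join([title, oldest or "", newest or "", str(n_dirs), str(n_files)])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract.py /mnt/disk_image > description.txt
    for line in extract(sys.argv[1]):
        print(line)
```

Loose files at the top level are skipped because only folders become components. Whether that matches the original workflow is not stated on the slide.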

Current Description Workflow: Transform

Output of shell script as .txt file

Output of shell script transformed into EAD
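A transform from the text output into EAD components could be sketched as follows. The sketch assumes a hypothetical line layout (title, oldest date, newest date, folder count, file count, separated by tabs), because the slide does not show the actual format. It builds un-namespaced EAD 2002 `<c>` elements whose `<did>` holds `<unittitle>`, an inclusive `<unitdate>` with an ISO 8601 `normal` range, and `<physdesc><extent>`. Two choices here are illustrative rather than the authors' method: recording each count as a number with a `unit` attribute, and setting `level="file"` by default. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def line_to_component(line, level="file"):
    """Turn one tab-separated line into an EAD <c> element with a <did>."""
    title, oldest, newest, n_folders, n_files = line.rstrip("\n").split("\t")
    c = ET.Element("c", level=level)
    did = ET.SubElement(c, "did")
    ET.SubElement(did, "unittitle").text = title
    if oldest and newest:
        # Display the year span; keep full ISO dates in the normal attribute
        years = oldest[:4] if oldest[:4] == newest[:4] else f"{oldest[:4]}-{newest[:4]}"
        unitdate = ET.SubElement(
            did, "unitdate", type="inclusive", normal=f"{oldest}/{newest}"
        )
        unitdate.text = years
    physdesc = ET.SubElement(did, "physdesc")
    # Numeric values with a unit attribute instead of one free-text string
    ET.SubElement(physdesc, "extent", unit="folders").text = n_folders
    ET.SubElement(physdesc, "extent", unit="files").text = n_files
    return c


def transform(text, level="file"):
    """Wrap every non-blank line of the .txt output in a <dsc> element."""
    dsc = ET.Element("dsc")
    for line in text.splitlines():
        if line.strip():
            dsc.append(line_to_component(line, level))
    return ET.tostring(dsc, encoding="unicode")
```

The resulting `<c>` elements would still have to be merged into the collection's finding aid. That step is outside this sketch.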

Description Workflow Enhancements

Eliminate string values for <extent> elements and minimize post-processing of data

Use topic modeling for textual data (fondz or another program) and write scripts for basic textual analysis (e.g., automated page counts for PDFs)

Index all names of directories and files and represent their structure through a file browser embedded in the finding aid and/or the repository (Hydra)
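An index of this kind could be built by walking the directory tree recursively and serializing it in a form that a file browser can read. The sketch below writes the hierarchy as nested JSON, where each node has a name, a type, and (for folders) its children. The names `build_index` and `index_as_json` and the JSON shape are assumptions. The sketch does not address how such an index would be embedded in a finding aid or a Hydra repository.

```python
import json
import os


def build_index(path):
    """Recursively index a directory as nested dicts for a file browser."""
    node = {
        "name": os.path.basename(os.path.normpath(path)),
        "type": "folder",
        "children": [],
    }
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                node["children"].append(build_index(entry.path))
            else:
                node["children"].append({"name": entry.name, "type": "file"})
    return node


def index_as_json(path):
    """Serialize the index so a browser widget could load it."""
    return json.dumps(build_index(path), indent=2)
```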


