Drake Mendez curatecamp 2015

Maximizing Description to Enhance Access to Born-Digital Archival Collections. Seeley G. Mudd Manuscript Library, Princeton University Library. Rossy Mendez, Public Services Project Archivist; Jarrett M. Drake, Digital Archivist. CURATEcamp, Brooklyn Historical Society, April 23, 2015.

Transcript of Drake Mendez curatecamp 2015

Page 1: Drake Mendez curatecamp 2015

Maximizing Description to Enhance Access to Born-Digital Archival Collections

Seeley G. Mudd Manuscript Library

Princeton University Library

Rossy Mendez, Public Services Project Archivist

Jarrett M. Drake, Digital Archivist

CURATEcamp, Brooklyn Historical Society

April 23, 2015

Page 2: Drake Mendez curatecamp 2015
Page 3: Drake Mendez curatecamp 2015

“How we describe the collections in our care influences the ability of people to discover, access, use and interpret them”

Trends in Practice: Archival Arrangement and Description, pg 17.

<extent>

<scopecontent>

<unittitle>

Page 4: Drake Mendez curatecamp 2015

The beginnings…

Page 5: Drake Mendez curatecamp 2015

<extent>

1. Physical Space Quantity/Arrangement

2. Electronic Digital

Page 6: Drake Mendez curatecamp 2015
Page 7: Drake Mendez curatecamp 2015

<unittitle>

Office of President Records, Shirley Tilghman Subgroup (AC379)

Page 8: Drake Mendez curatecamp 2015

<scopecontent>

Series Level

“One third of the digital files are a mixture of PDF’s and Excel Spreadsheets”

Page 9: Drake Mendez curatecamp 2015

<phystech>

Page 10: Drake Mendez curatecamp 2015

<unitdate>

Page 11: Drake Mendez curatecamp 2015

Multi-level Description of Digital Records

Reality

For born-digital records, the Archive’s existing descriptive workflows failed to provide sufficient context and precision for <did> elements, including <unitdate>, <unittitle>, & <extent>.

Challenge

For multi-level records, how does one create these elements programmatically?

Page 12: Drake Mendez curatecamp 2015

Previous Workflow

Create disk image

Page 13: Drake Mendez curatecamp 2015

Previous Workflow

CSV output from FTK Imager

AT Resource Record

Windows Explorer

EAD <did> element

Page 14: Drake Mendez curatecamp 2015

Revised Workflow

Question

What are the key metadata points we should extract from born-digital records and later represent in EAD?

Answer

1. Names of each folder <unittitle>

2. Modified dates of the oldest and newest files <unitdate>

3. Numbers of folders and numbers of files <extent>
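
The three metadata points above can be gathered with a short directory walk. The following is a minimal Python sketch, not the authors' script (which the slides describe as a shell script). The function name `summarize_folder` and the shape of the returned dictionary are assumptions made for illustration, as is the choice to report dates in UTC.

```python
import os
from datetime import datetime, timezone


def summarize_folder(folder):
    """Collect title, date range, and counts for one folder (hypothetical helper)."""
    mtimes = []
    folder_count = 0
    file_count = 0
    # os.walk visits the folder and every folder below it.
    for dirpath, dirnames, filenames in os.walk(folder):
        folder_count += len(dirnames)
        file_count += len(filenames)
        for name in filenames:
            mtimes.append(os.path.getmtime(os.path.join(dirpath, name)))

    def iso_date(timestamp):
        # Assumption: dates are reported in UTC as YYYY-MM-DD.
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()

    return {
        # Folder name, destined for <unittitle>.
        "unittitle": os.path.basename(os.path.normpath(folder)),
        # Modified dates of the oldest and newest files, destined for <unitdate>.
        "unitdate": (iso_date(min(mtimes)), iso_date(max(mtimes))) if mtimes else None,
        # Numbers of folders and files, destined for <extent>.
        "extent": {"folders": folder_count, "files": file_count},
    }
```

Modified dates reflect the last change to a file rather than its creation, so the resulting range is only as reliable as the timestamps preserved during transfer.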

Page 15: Drake Mendez curatecamp 2015

Current Description Workflow

Complete documentation of digital records processing at Mudd Library can be found at: http://rbsc.princeton.edu/policies/guidance-recommended-file-formats

.txt

.csv

.xls

.xml

Page 16: Drake Mendez curatecamp 2015

Current Description Workflow: Extract

Shell script to extract <unittitle>, <unitdate>, and <extent> values

-maxdepth 1
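
Only the `-maxdepth 1` fragment of the script survives in this transcript. `-maxdepth` is an option of the `find` utility that limits how many directory levels it descends. As a rough Python analogue, and only an illustration of depth-limited listing rather than the authors' script, the sketch below returns the folders directly under a root without descending further, comparable to `find ROOT -mindepth 1 -maxdepth 1 -type d`. The function name is hypothetical.

```python
import os


def immediate_subfolders(root):
    """List names of folders directly under root, without descending (hypothetical helper)."""
    with os.scandir(root) as entries:
        # follow_symlinks=False keeps symbolic links to folders out of the list.
        return sorted(e.name for e in entries if e.is_dir(follow_symlinks=False))
```

Each name returned corresponds to one candidate component in a multi-level description.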

Page 17: Drake Mendez curatecamp 2015

Current Description Workflow: Transform

Output of shell script as .txt file

Output of shell script transformed into EAD
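
The slides do not reproduce the text output or the transformation itself. As a hedged sketch of what such a transform could look like, the Python below assumes one tab-delimited line per folder in the order title, earliest date, latest date, folder count, file count. That layout and the name `did_from_line` are assumptions. The element and attribute names (`<did>`, `<unittitle>`, `<unitdate>` with `type` and `normal`, `<physdesc>`, `<extent>`) follow EAD 2002.

```python
import xml.etree.ElementTree as ET


def did_from_line(line):
    """Build an EAD <did> element from one tab-delimited line (assumed layout)."""
    title, earliest, latest, folders, files = line.rstrip("\n").split("\t")

    did = ET.Element("did")
    ET.SubElement(did, "unittitle").text = title

    # The normal attribute carries the machine-readable range; the text is for display.
    unitdate = ET.SubElement(
        did, "unitdate", {"type": "inclusive", "normal": f"{earliest}/{latest}"}
    )
    start_year, end_year = earliest[:4], latest[:4]
    unitdate.text = start_year if start_year == end_year else f"{start_year}-{end_year}"

    # In EAD 2002, <extent> is a child of <physdesc>.
    physdesc = ET.SubElement(did, "physdesc")
    ET.SubElement(physdesc, "extent").text = f"{folders} folders"
    ET.SubElement(physdesc, "extent").text = f"{files} files"
    return did


# Example with made-up values.
sample = "Speeches\t2000-01-01\t2010-01-01\t3\t12"
print(ET.tostring(did_from_line(sample), encoding="unicode"))
```

The extent values here are still strings such as "3 folders", which is the kind of post-processing burden the enhancements slide proposes to reduce.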

Page 18: Drake Mendez curatecamp 2015

Description Workflow Enhancements

Eliminate string values for <extent> elements and minimize post-processing of data

Use topic modeling for textual data (fondz or another program) and write scripts for basic textual analysis (e.g., automated page count for PDFs)

Index all names of directories and files and represent their structure through a file browser embedded in the finding aid and/or the repository (Hydra)
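
For the indexing enhancement, one possible starting point is a nested index of directory and file names that a file browser could later render. The Python below is a minimal sketch under that assumption. The function name `build_index` and the JSON-friendly node layout (`name`, `type`, `children`) are hypothetical and are not drawn from Hydra or from the finding aid software.

```python
import json
import os


def build_index(path):
    """Return a nested dict for the folder at path and everything below it (hypothetical)."""
    node = {
        "name": os.path.basename(os.path.normpath(path)),
        "type": "folder",
        "children": [],
    }
    with os.scandir(path) as entries:
        # Sort by name so the index is stable between runs.
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                node["children"].append(build_index(entry.path))
            else:
                node["children"].append({"name": entry.name, "type": "file"})
    return node


def index_as_json(path):
    """Serialize the index so a browser component could load it."""
    return json.dumps(build_index(path), indent=2)
```

Calling `index_as_json` on a transfer directory yields a single JSON document that preserves the original folder hierarchy.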