
WHERE HAVE ALL THE BINDERS GONE?

Greg Colati, University of Denver

Jennifer King, George Washington University

Sylvia Augusteijn, George Washington University

SAA Chicago Session #801

September 1, 2007

WHY MANAGE WITH A DATABASE?

Scale

Centralized management

Access

Reusability

Rearrange-ability

REAL DRIVERS OF CHANGE

Demand for item level access

Born Digital content

Digitized content

Researcher demands and expectations

MANY INPUTS, MANY OUTPUTS

[Diagram: Collections Management Database at the center]

Inputs:

Metadata from Records Management system

Metadata from local systems, e.g. DUVAGA, or IR

In-house cataloging or imported metadata

Outputs:

OAI metadata for harvesters and aggregators

EAD XML for RMOA or other uses

MARC records for III or other uses

Metadata for local systems: e.g. Heritage West, Penrose web, DUVAGA

Also shown:

Physical object storage location

Digital object storage location

OBJECTS AND ATTRIBUTES

I belong to a collection

I belong to a series

I came from somewhere

I am an image

I am a certain file format(s)

I am about something(s)

I am green, blue, and brown

CLUSTERING

VISUALIZATION

© 2007 Gregory C. Colati

CONTEXTUALIZE THE RESOURCE

The Encyclopedia of Chicago http://www.encyclopedia.chicagohistory.org/

I WANT WHAT I WANT …

A CULTURAL SHIFT

General

Specific Association

Object

EXTEND INTEROPERABILITY

Descriptive standards at the item level

MANAGE FROM THE BOTTOM UP

Items and attributes

Create associations, implicit and explicit

PRODUCTIVITY APPROACH TO PROCESSING, MANAGEMENT, AND ACCESS

Automate metadata creation

Metadata extraction

Pre-populate metadata fields using default and automatically generated terms

Stop writing extensive biographical and historical notes

Automate digital content creation

USE THE POWER OF DATABASE TOOLS

Ingest tools discussed above

Export templates for:

MARC

EAD

Various XML schemas for item level export: MARCXML, DC, TEI, VRA etc.
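As an illustration of item-level export, here is a minimal sketch that serializes one item record as simple Dublin Core XML. It is not the export template of any particular database: the `record` wrapper element, the `item_to_dc_xml` function, and the dictionary keys are assumptions made for the example. Only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def item_to_dc_xml(item: dict) -> str:
    """Serialize a hypothetical item record as a small Dublin Core XML fragment."""
    ET.register_namespace("dc", DC_NS)
    # "record" is an illustrative wrapper element, not part of Dublin Core.
    root = ET.Element("record")
    for field in ("identifier", "title", "date"):
        value = item.get(field)
        if value:  # skip fields that are missing or empty
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


item = {"identifier": "00001", "title": "Correspondence", "date": "1950-1957"}
print(item_to_dc_xml(item))
```

The same record could feed other templates (MARCXML, TEI, VRA) by swapping the element mapping, which is the reusability argument made earlier in the talk.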

LEVERAGE USE OF DIGITAL REPOSITORIES

We don’t have to be self-sufficient

Outsource low-level functions

Mass storage

Backup

CREATE PARTNERSHIPS

Computer scientists

Librarians

Academic technologists

GET INTO MAINSTREAM DISCOVERY TOOLS

GET “INTO THE FLOW”

Can everyone say Google, MySpace, YouTube, Facebook?

CREATE ACCESS TOOLS BASED ON USER NEEDS

Understand how all of our constituencies seek information and use information

Make our tools reflect these behaviors. When those behaviors change, our tools should change with them.

NEW SKILLS FOR THE DIGITAL ERA

Jennifer King

George Washington University

RE:DISCOVERY MAIN PAGE

RE:DISCOVERY FOR INTERNET SEARCH

RFI AND FINDING AID

From Document

To Database

Sylvia Augusteijn

George Washington University

Special Collections and University Archives

SAA session 801

September 1, 2007

Out from the binders

Scope and content notes, series descriptions simple to cut and paste into Re:Discovery

Cut and paste not feasible for thousands of item-level records

“Container list” project is born

Goal: to separate elements of each item name (number, title, date) so Re:Discovery could import them into their respective fields

Container lists

Each item has a number, title, and date, but formats vary slightly in punctuation or spacing

Ways of writing the same name:

1. Correspondence, 1950-57

I. Correspondence – 1950-1957

i. correspondence 1950 to 1957

Naming conventions generally consistent within each finding aid

How to automate?

Automation, part 1:

Delimiting the text

Container lists saved in a text editor (TextPad)

Delimiters are special characters placed within the text to separate the elements

We chose * to signal the beginning and end of each field and % to signal the boundary between fields

Item as it appears in text of finding aid: 1. Correspondence, 1950-57

Item with delimiters inserted: *1*%*Correspondence*%*1950-57*

Delimiting the text (continued)

Re:Discovery can import directly from the text editor, with instructions

Instructions to Re:Discovery: the first element of this name will be the number, the second will be the title, the third will be the date

*1*%*Correspondence*%*1950-57*
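The delimiter scheme can be sketched in a few lines of Python. This is an illustration of the format only, not Re:Discovery's import logic: the `delimit` and `parse` functions and the field names are assumptions made for the example.

```python
FIELD_WRAP = "*"  # marks the beginning and end of each field
FIELD_SEP = "%"   # marks the boundary between fields
FIELD_NAMES = ("number", "title", "date")


def delimit(number: str, title: str, date: str) -> str:
    """Wrap each element in * and join the three fields with %."""
    return FIELD_SEP.join(
        f"{FIELD_WRAP}{value}{FIELD_WRAP}" for value in (number, title, date)
    )


def parse(record: str) -> dict:
    """Read a delimited record back into number, title, and date by position."""
    parts = record.split(FIELD_SEP)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 3 fields, got {len(parts)}: {record!r}")
    values = []
    for part in parts:
        wrapped = part.startswith(FIELD_WRAP) and part.endswith(FIELD_WRAP)
        if len(part) < 2 or not wrapped:
            raise ValueError(f"field not wrapped in {FIELD_WRAP}: {part!r}")
        values.append(part[1:-1])  # strip the surrounding * characters
    return dict(zip(FIELD_NAMES, values))


record = delimit("1", "Correspondence", "1950-57")
print(record)         # *1*%*Correspondence*%*1950-57*
print(parse(record))
```

The sketch assumes that neither * nor % occurs inside a title. A title containing either character would need a different pair of delimiters.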

How to add these delimiters to thousands of item records?

Automation, part 2: Regular expressions

A regular expression is a string that uses special characters (such as \ + $ ^ ]) to describe and match patterns of text within a document

Regular expressions (continued)

First used regular expressions to search through our text for anything formatted like an item (i.e. to search for a pattern in which an item number is followed by a title and date)

Then used regular expressions to insert our delimiters in between those elements

To turn a page of this:

1. Journals, 1950-60

2. Photographs, 1970-80

3. Postcards, 1940-50

Into a page of this:

*00001*%*Journals*%*1950-60*

*00002*%*Photographs*%*1970-80*

*00003*%*Postcards*%*1940-50*
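The whole-page transformation can be sketched with Python's `re` module. The project itself used find-and-replace in TextPad, so this is a translation of the idea rather than the method used. The `ITEM` pattern and `delimit_page` function are assumptions, and the pattern covers only the "number. title, date range" layout shown here.

```python
import re

# One item per line: item number, period, title, comma, date range.
ITEM = re.compile(r"^(\d+)\.\s+(.+?),\s+(\d{4}-\d{2,4})\s*$")


def delimit_page(text: str) -> str:
    """Insert * and % delimiters into every line that looks like an item."""
    out = []
    for line in text.splitlines():
        match = ITEM.match(line)
        if match:
            number, title, date = match.groups()
            # Zero-pad the item number to five digits, as in the example.
            out.append(f"*{int(number):05d}*%*{title}*%*{date}*")
        else:
            out.append(line)  # leave headings and notes untouched
    return "\n".join(out)


page = "1. Journals, 1950-60\n2. Photographs, 1970-80\n3. Postcards, 1940-50"
print(delimit_page(page))
```

Lines that do not match the pattern pass through unchanged, so headings and notes in the container list survive and can be reviewed by hand.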

Examples of regular expressions

To turn 1. Correspondence, 1950-1957 into

*00001*%*Correspondence, 1950-1957

Find: \([0-9]\)\. (find any digit followed by a period)

Replace: *0000\1*%* (replace with *, four zeroes, that digit and *%*)

Then to turn *00001*%*Correspondence, 1950-1957 into *00001*%*Correspondence*%*1950-1957

Find: , \([0-9]\{4\}\) (find any four-digit number preceded by a comma and space)

Replace: *%*\1 (replace the comma and space with *%*)
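The same two find-and-replace steps can be written in Python, whose `re` module uses plain parentheses and braces for groups and counts instead of the backslash-escaped forms. This is a sketch, not the expressions used in the project. The `step_one` and `step_two` names are hypothetical, and two details are assumptions added here: the `^` anchor, which keeps the first step from touching digits elsewhere in the line, and the `\s*`, which consumes the space after the period.

```python
import re


def step_one(text: str) -> str:
    """Turn a leading single digit and period into a zero-padded, delimited number."""
    return re.sub(r"^([0-9])\.\s*", r"*0000\1*%*", text, flags=re.MULTILINE)


def step_two(text: str) -> str:
    """Replace the comma and space before a four-digit year with *%*."""
    return re.sub(r", ([0-9]{4})", r"*%*\1", text)


line = "1. Correspondence, 1950-1957"
after_one = step_one(line)
after_two = step_two(after_one)
print(after_one)  # *00001*%*Correspondence, 1950-1957
print(after_two)  # *00001*%*Correspondence*%*1950-1957
```

As on the slide, the first step handles only single-digit item numbers. Two-digit numbers would need their own expression with three zeroes of padding.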

Challenges

Tweaking expressions slightly for each new container list

Writing the wrong expression and accidentally replacing the wrong text

Failing to export correctly to Re:Discovery due to small number of missing delimiters
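A check run before import can catch missing delimiters early. The sketch below assumes one record per line with exactly three fields. The `RECORD` pattern and `find_bad_lines` function are illustrative and not part of Re:Discovery.

```python
import re

# A complete record: three fields, each wrapped in *, separated by %.
RECORD = re.compile(r"^\*[^*%]+\*%\*[^*%]+\*%\*[^*%]+\*$")


def find_bad_lines(text: str) -> list:
    """Return (line number, line) pairs for non-blank lines that are not complete records."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.strip() and not RECORD.match(line)
    ]


sample = (
    "*00001*%*Journals*%*1950-60*\n"
    "*00002*%*Photographs%*1970-80*\n"  # missing * after the title
    "*00003*%*Postcards*%*1940-50"      # missing final *
)
for number, line in find_bad_lines(sample):
    print(f"line {number}: {line}")
```

Reporting line numbers makes it practical to fix a handful of bad records by hand instead of rerunning the whole replacement.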

Re:Discovery and beyond

Delimited text exported into Re:Discovery

From Re:Discovery, easy creation of EAD finding aids using a template

To date: 257 collections in Re:Discovery (and EAD finding aids on the web)

0 binders

CONTACT INFORMATION:

Greg Colati

Digital Initiatives Coordinator

University of Denver

[email protected]

Jennifer King

Manuscripts Librarian


Washington, DC

[email protected]

Sylvia Augusteijn

Project Archivist


[email protected]
