WHERE HAVE ALL THE BINDERS GONE?
Greg Colati, University of Denver
Jennifer King, George Washington University
Sylvia Augusteijn, George Washington University
SAA Chicago Session #801
September 1, 2007
WHY MANAGE WITH A DATABASE?
Scale
Centralized management
Access
Reusability
Rearrange-ability
REAL DRIVERS OF CHANGE
Demand for item level access
Born Digital content
Digitized content
Researcher demands and expectations
MANY INPUTS, MANY OUTPUTS
Metadata from Records Management system
Metadata from local systems, e.g. DUVAGA or IR
In-house cataloging or imported metadata
Collections Management Database
Physical object storage location
Digital object storage location
OAI metadata for harvesters and aggregators
EAD XML for RMOA or other uses
MARC records for III or other uses
Metadata for local systems, e.g. Heritage West, Penrose web, DUVAGA
OBJECTS AND ATTRIBUTES
I belong to a collection
I belong to a series
I came from somewhere
I am an image
I am a certain file format(s)
I am about something(s)
I am green, blue, and brown
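One way to picture an item that carries its own attributes is as a small record whose fields mirror the statements above. The sketch below is illustrative only: the `Item` class, its field names, and the `group_by_subject` helper are assumptions made for this example, not the schema of any particular collections management system.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """Hypothetical item-level record; all field names are illustrative."""
    identifier: str
    collection: str                                   # "I belong to a collection"
    series: str                                       # "I belong to a series"
    source: str                                       # "I came from somewhere"
    kind: str                                         # "I am an image"
    file_formats: List[str] = field(default_factory=list)
    subjects: List[str] = field(default_factory=list)
    colors: List[str] = field(default_factory=list)


def group_by_subject(items):
    """Cluster item identifiers under each subject they carry."""
    groups = defaultdict(list)
    for item in items:
        for subject in item.subjects:
            groups[subject].append(item.identifier)
    return dict(groups)
```

Because every item states its own subjects, the same records can be regrouped by collection, format, or color without rearranging anything on the shelf.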
CLUSTERING
VISUALIZATION
© 2007 Gregory C. Colati
CONTEXTUALIZE THE RESOURCE
The Encyclopedia of Chicago http://www.encyclopedia.chicagohistory.org/
I WANT WHAT I WANT …
A CULTURAL SHIFT
General
Specific Association
Object
EXTEND INTEROPERABILITY
Descriptive standards at the item level
MANAGE FROM THE BOTTOM UP
Items and attributes
Create associations, implicit and explicit
PRODUCTIVITY APPROACH TO PROCESSING, MANAGEMENT, AND ACCESS
Automate metadata creation
Metadata extraction
Pre-populate metadata fields using default and automatically generated terms
Stop writing extensive biographical and historical notes
Automate digital content creation
USE THE POWER OF DATABASE TOOLS
Ingest tools discussed above
Export templates for:
MARC
EAD
Various XML schemas for item level export: MARCXML, DC, TEI, VRA etc.
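As a rough illustration of item-level export, the sketch below renders one record as simple Dublin Core XML with Python's standard library. The `item_to_dc` function, the `record` wrapper element, and the choice of three fields are assumptions for this example; a real export template would follow whatever the database vendor provides.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def item_to_dc(record):
    """Render a dict of item fields as a Dublin Core XML string."""
    root = ET.Element("record")  # wrapper element name is an assumption
    for name in ("identifier", "title", "date"):
        value = record.get(name)
        if value:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = item_to_dc(
    {"identifier": "00001", "title": "Correspondence", "date": "1950-1957"}
)
```

The same dictionary could feed a MARCXML or EAD template, which is the point of keeping the data in fields rather than in a formatted document.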
LEVERAGE USE OF DIGITAL REPOSITORIES
We don’t have to be self-sufficient
Outsource low-level functions
Mass storage
Backup
CREATE PARTNERSHIPS
Computer scientists
Librarians
Academic technologists
GET INTO MAINSTREAM DISCOVERY TOOLS
GET “INTO THE FLOW”
Can everyone say
Google
MySpace
YouTube
Facebook
CREATE ACCESS TOOLS BASED ON USER NEEDS
Understand how all of our constituencies seek information and use information
Make our tools reflect these behaviors. When those behaviors change, our tools
should change with them.
NEW SKILLS FOR THE DIGITAL ERA
Jennifer King
George Washington University
RE:DISCOVERY MAIN PAGE
RE:DISCOVERY FOR INTERNET SEARCH
RFI AND FINDING AID
From Document
To Database
Sylvia Augusteijn
George Washington University
Special Collections and University Archives
SAA session 801
September 1, 2007
Out from the binders
Scope and content notes, series descriptions simple to cut and paste into Re:Discovery
Cut and paste not feasible for thousands of item-level records
“Container list” project is born
Goal: to separate elements of each item name (number, title, date) so Re:Discovery could import them into their respective fields
Container lists
Each item has a number, title, and date, but formats vary slightly in punctuation or spacing
Ways of writing the same name:
1. Correspondence, 1950-57
I. Correspondence – 1950-1957
i. correspondence 1950 to 1957
Naming conventions generally consistent within each finding aid
How to automate?
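The three spellings above can be read by a single tolerant pattern. The sketch below is one possible approach in Python, not the method used in the project: the `parse_item` function is hypothetical, and expanding a two-digit end year with the century of the start year is an assumption.

```python
import re

ITEM_PATTERN = re.compile(
    r"""^\s*
        (?P<number>[0-9]+|[ivxlcdm]+)\.\s+   # item number, arabic or roman
        (?P<title>.+?)\s*                    # title, as short as possible
        (?:,|–|-)?\s*                        # optional separator before the date
        (?P<start>\d{4})\s*(?:-|–|to)\s*     # start year and range marker
        (?P<end>\d{2,4})\s*$                 # end year, two or four digits
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_item(line):
    """Split one container-list line into number, title, and date."""
    match = ITEM_PATTERN.match(line)
    if match is None:
        return None  # the line is not formatted like an item
    start, end = match["start"], match["end"]
    if len(end) == 2:
        end = start[:2] + end  # assumption: same century as the start year
    return {
        "number": match["number"],
        "title": match["title"],
        "date": f"{start}-{end}",
    }
```

All three variants yield the same date, `1950-1957`, while the number and title are kept as written so that a person can still review them.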
Automation, part 1:
Delimiting the text
Container lists saved in a text editor (TextPad)
Delimiters are special characters placed within the text to separate the elements
We chose * to signal the beginning and end of each field and % to signal the boundary between fields
Item as it appears in text of finding aid: 1. Correspondence, 1950-57
Item with delimiters inserted: *1*%*Correspondence*%*1950-57*
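The delimiter scheme can be expressed in a few lines of Python. This is a minimal sketch: the `delimit` function is a hypothetical name, and the check that rejects values already containing `*` or `%` is an added safeguard, not something described in the project.

```python
FIELD_MARK = "*"      # signals the beginning and end of each field
FIELD_BOUNDARY = "%"  # signals the boundary between fields


def delimit(fields):
    """Wrap each field in * and join the fields with %."""
    for value in fields:
        if FIELD_MARK in value or FIELD_BOUNDARY in value:
            # Added safeguard: a delimiter inside a value would corrupt the record.
            raise ValueError(f"field contains a delimiter character: {value!r}")
    return FIELD_BOUNDARY.join(
        f"{FIELD_MARK}{value}{FIELD_MARK}" for value in fields
    )


line = delimit(["1", "Correspondence", "1950-57"])
```

Here `line` holds `*1*%*Correspondence*%*1950-57*`, the same delimited form shown above.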
Delimiting the text (continued)
Re:Discovery can import directly from the text editor, with instructions
Instructions to Re:Discovery: the first element of this name will be the number, the second will be the title, the third will be the date
*1*%*Correspondence*%*1950-57*
How to add these delimiters to thousands of item records?
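The import instructions amount to a mapping from position to field name. The sketch below shows that idea in Python; it is not Re:Discovery's import mechanism, and the `read_delimited` function and `FIELD_NAMES` tuple are assumptions for illustration.

```python
# Position-to-field mapping: first element is the number, then title, then date.
FIELD_NAMES = ("number", "title", "date")


def read_delimited(line):
    """Turn one delimited line into a dict keyed by field name."""
    parts = line.strip().split("%")
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields: {line!r}")
    values = []
    for part in parts:
        # Each field must be wrapped in a leading and a trailing *.
        if len(part) < 2 or not (part.startswith("*") and part.endswith("*")):
            raise ValueError(f"field is not wrapped in *: {part!r}")
        values.append(part[1:-1])
    return dict(zip(FIELD_NAMES, values))


record = read_delimited("*1*%*Correspondence*%*1950-57*")
```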
Automation, part 2: Regular expressions
A regular expression is a string that uses special characters (such as \ + $ ^ ]) to describe and match patterns of text within a document
Regular expressions (continued)
First used regular expressions to search through our text for anything formatted like an item (i.e. to search for a pattern in which an item number is followed by a title and date)
Then used regular expressions to insert our delimiters in between those elements
To turn a page of this:
1. Journals, 1950-60
2. Photographs, 1970-80
3. Postcards, 1940-50
Into a page of this:
*00001*%*Journals*%*1950-60*
*00002*%*Photographs*%*1970-80*
*00003*%*Postcards*%*1940-50*
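The whole-page transformation can be sketched in Python as a single pass over the lines. The project itself used find and replace in TextPad; the `delimit_page` function and the pattern below are assumptions that fit only the "number, title, date range" format shown here.

```python
import re

# number, period, title, comma, date range (e.g. "1. Journals, 1950-60")
ITEM_LINE = re.compile(r"^(\d+)\.\s+(.+?),\s*(\d{4}-\d{2,4})\s*$")


def delimit_page(text):
    """Insert * and % delimiters into every line formatted like an item."""
    converted = []
    for line in text.splitlines():
        match = ITEM_LINE.match(line)
        if match:
            number, title, date = match.groups()
            # Pad the item number to five digits, as in the example above.
            converted.append(f"*{int(number):05d}*%*{title}*%*{date}*")
        else:
            converted.append(line)  # leave non-item lines untouched
    return "\n".join(converted)


page = "1. Journals, 1950-60\n2. Photographs, 1970-80\n3. Postcards, 1940-50"
result = delimit_page(page)
```

Lines that do not match are passed through unchanged, so headings and notes in the container list survive for manual review.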
Examples of regular expressions
To turn 1. Correspondence, 1950-1957 into *00001*%*Correspondence, 1950-1957
Find: \([0-9]\)\. (find any digit followed by a literal period)
Replace: *0000\1*%* (replace with *, four zeroes, that digit, and *%*)
Then to turn *00001*%*Correspondence, 1950-1957 into *00001*%*Correspondence*%*1950-1957
Find: , \([0-9]\{4\}\) (find any four-digit number preceded by a comma and space)
Replace: *%*\1 (replace the comma and space with *%*, keeping the four-digit number)
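For readers who work in Python rather than TextPad, the two find-and-replace steps might look like the sketch below. Python's `re` module writes groups as `( )` and repetition as `{4}` without backslashes. The function names are hypothetical, and two details are assumptions: the period is escaped so that it matches only a literal period, and the first pattern is anchored to the start of a line so that digits inside dates are left alone.

```python
import re


def add_number_delimiters(text):
    """Step 1: replace a leading 'digit, period, space' with *0000<digit>*%*."""
    return re.sub(r"^([0-9])\. ", r"*0000\1*%*", text, flags=re.MULTILINE)


def add_date_delimiter(text):
    """Step 2: replace the comma and space before a four-digit year with *%*."""
    return re.sub(r", ([0-9]{4})", r"*%*\1", text)


step1 = add_number_delimiters("1. Correspondence, 1950-1957")
step2 = add_date_delimiter(step1)
```

After the first step the text reads `*00001*%*Correspondence, 1950-1957`, and after the second it reads `*00001*%*Correspondence*%*1950-1957`.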
Challenges
Tweaking expressions slightly for each new container list
Writing the wrong expression and accidentally replacing the wrong text
Failing to export correctly to Re:Discovery due to small number of missing delimiters
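A small check run before export could catch records with missing delimiters. The sketch below is a suggestion, not part of the project as described: `find_bad_records` is a hypothetical helper, and it assumes every record has exactly three fields (number, title, date).

```python
import re

# Exactly three *-wrapped fields separated by %.
RECORD = re.compile(r"\*[^*%]*\*(?:%\*[^*%]*\*){2}")


def find_bad_records(text):
    """Return (line_number, line) pairs that are not well-formed records."""
    bad = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not RECORD.fullmatch(stripped):
            bad.append((line_number, line))
    return bad


sample = (
    "*00001*%*Journals*%*1950-60*\n"
    "*00002*%*Photographs*1970-80*\n"   # missing % and * before the date
    "*00003*%Postcards*%*1940-50*"      # missing * before the title
)
problems = find_bad_records(sample)
```

Reporting line numbers makes it possible to fix the handful of broken records by hand instead of rerunning every expression.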
Re:Discovery and beyond
Delimited text exported into Re:Discovery
From Re:Discovery, easy creation of EAD finding aids using a template
To date: 257 collections in Re:Discovery (and EAD finding aids on the web)
0 binders
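To give a sense of what template-based EAD creation involves, the sketch below fills a small template for one item. It is not Re:Discovery's template. The element names (`c01`, `did`, `unitid`, `unittitle`, `unitdate`) come from EAD 2002, but placing an item directly at the `c01` level and the `item_to_ead` function are assumptions for illustration.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical template for one item-level component of a finding aid.
ITEM_TEMPLATE = Template(
    '<c01 level="item"><did>'
    "<unitid>$number</unitid>"
    "<unittitle>$title</unittitle>"
    "<unitdate>$date</unitdate>"
    "</did></c01>"
)


def item_to_ead(record):
    """Fill the template from a record, escaping XML special characters."""
    safe = {key: escape(value) for key, value in record.items()}
    return ITEM_TEMPLATE.substitute(safe)


component = item_to_ead(
    {"number": "00001", "title": "Letters & notes", "date": "1950-1957"}
)
```

Escaping matters because titles in container lists often contain ampersands, which would otherwise make the XML invalid.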
CONTACT INFORMATION:
Greg Colati
Digital Initiatives Coordinator
University of Denver
Jennifer King
Manuscripts Librarian
George Washington University
Washington, DC
Sylvia Augusteijn
Project Archivist
George Washington University