Automating Name Authority Record Updates and Bibliographic File Maintenance

Automating Name Authority Record Updates and Bibliographic File Maintenance: A Proof of Concept. Lucas Mak, Michigan State University Libraries. Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013.


Transcript of Automating Name Authority Record Updates and Bibliographic File Maintenance

Page 1: Automating Name Authority Record Updates and Bibliographic File Maintenance

Automating Name Authority Record Updates and Bibliographic File Maintenance

Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013

Lucas Mak, Michigan State University Libraries

A Proof of Concept

Page 2: Automating Name Authority Record Updates and Bibliographic File Maintenance

Authority Control at MSU

1.5 million authority records (1.1 million NARs)
In-house
  NACO institution
  Database maintenance
Post-cataloging authority control
  New Headings Report
    • Download NARs from SkyRiver
Updates to NARs not necessarily caught
    • 1XX (a changed 1XX with no item cataloged under it does not appear in the New Headings Report)
    • Elements other than 1XX (e.g. 4XX, 670)

Page 3: Automating Name Authority Record Updates and Bibliographic File Maintenance

LC/NACO NAF RDA Transition

PCC Day 1 for RDA NAR: Mar. 31, 2013
Phased reissuance of NARs
  Phase 1
    • Scope
      – NARs with characteristics known to be at variance with RDA practice
      – Not candidates for any of the mechanical changes to be made during Phase 2
    • Adding a 667 note: “THIS 1XX FIELD CANNOT BE USED UNDER RDA UNTIL THIS RECORD HAS BEEN REVIEWED AND/OR UPDATED”
      – Completed Aug. 20, 2012 (436,943 records processed)
  Phase 2
    • Programmatic changes to 1XX headings that are not acceptable under RDA (e.g., changes to Bible headings, spelling out Dept. and months, abbreviations in subfield $d for personal names, etc.)
    • Completed March 27, 2013 (371,942 records changed)

Page 4: Automating Name Authority Record Updates and Bibliographic File Maintenance

Updates of NARs by NACO institutions
  Reviewing, upgrading, and recoding Phase 1 records to RDA
  Adding any of the 17 new MARC fields (e.g. 046, 372)
  Routine NAR maintenance
    • PCC post-RDA test guidelines “strongly encourage” evaluating and recoding the “RDA-acceptable AACR2 NARs” to RDA whenever possible

Page 5: Automating Name Authority Record Updates and Bibliographic File Maintenance

Objectives

To catch changes to NARs
  Changes in 1XX
  Addition, deletion, or updates of elements other than 1XX
To perform related BFM if the 1XX in a NAR is changed

Page 6: Automating Name Authority Record Updates and Bibliographic File Maintenance

Tasks

To download NARs one-by-one or in bulk
To detect updates to NARs already existing in the ILS
To overlay existing NARs with updated ones
To update authorized access points (AAPs) in bib records if the 1XX in a NAR is updated
To automate and link up the above tasks

Page 7: Automating Name Authority Record Updates and Bibliographic File Maintenance

Task #1: Download NARs

OCLC LCNAF SRU Service
  Can be searched by LCCN
  Available in multiple schemas, including MARCXML
  SRU-based service (HTTP request)
  FREE!!
  But:
    • Updated every Monday night
    • Bulk download only by search term (e.g. after a certain date)
Implementation
    • Search LCCNs one-by-one by AutoIt script
      – Around 10 records/sec. retrieved
    • Download XML files into one folder (files named by LCCN)
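A minimal sketch of how one LCCN lookup could be compiled as an SRU request, written in Python rather than the AutoIt used in the talk. The parameter names (operation, version, query, recordSchema, maximumRecords) are standard SRU searchRetrieve parameters; the base URL and the LCCN index name below are placeholders, not the real service values, and must be replaced with whatever the service documents.

```python
from urllib.parse import urlencode

# Hypothetical values: the talk does not give the endpoint or the index name.
SRU_BASE_URL = "https://example.org/srw/search/lcnaf"
LCCN_INDEX = "local.LCCN"


def build_sru_url(lccn: str, base_url: str = SRU_BASE_URL) -> str:
    """Compile an SRU searchRetrieve URL that looks up a single LCCN."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{LCCN_INDEX} = "{lccn}"',
        # Standard SRU identifier for MARCXML; assumed to be accepted here.
        "recordSchema": "info:srw/schema/1/marcxml-v1.1",
        "maximumRecords": "1",
    }
    return f"{base_url}?{urlencode(params)}"


def output_filename(lccn: str) -> str:
    """Name the downloaded file after the LCCN, with spaces removed."""
    return "".join(lccn.split()) + ".xml"
```

Each URL would then be fetched with an ordinary HTTP GET and the response saved under `output_filename(lccn)` in the download folder.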

Page 8: Automating Name Authority Record Updates and Bibliographic File Maintenance

Task #2: NAR Update Detection

To compare NARs from the ILS and NARs from LC/NACO NAF by XSLT
  MARC 005 (timestamp)
  If the timestamp is more current on the NAR from NAF
    • Overlay the NAR in the ILS
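The comparison can be sketched as follows, in Python instead of the XSLT used in the talk. It assumes both records are MARCXML strings and that 005 follows the fixed MARC pattern yyyymmddhhmmss.f, so plain string comparison orders the timestamps. The rule for a missing 005 on the ILS side follows Data Integrity Issue #3; skipping the overlay when the NAF record has no 005 is an assumption. The sample records are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def get_005(marcxml: str) -> Optional[str]:
    """Return the 005 timestamp of a MARCXML record, or None if absent."""
    root = ET.fromstring(marcxml)
    field = root.find(".//marc:controlfield[@tag='005']", MARC_NS)
    if field is None or not (field.text or "").strip():
        return None
    return field.text.strip()


def needs_overlay(ils_record: str, naf_record: str) -> bool:
    """True when the NAF copy should replace the ILS copy."""
    ils_ts = get_005(ils_record)
    naf_ts = get_005(naf_record)
    if naf_ts is None:
        return False  # assumption: nothing to compare against, leave as is
    if ils_ts is None:
        return True   # old NAR lacks 005: bring in the new NAR
    return naf_ts > ils_ts  # fixed-width timestamps sort as strings


# Made-up sample records for illustration only.
_TEMPLATE = ('<record xmlns="http://www.loc.gov/MARC21/slim">'
             '<controlfield tag="005">{}</controlfield></record>')
ILS_NAR = _TEMPLATE.format("20120820093000.0")
NAF_NAR = _TEMPLATE.format("20130327120000.0")
NO_005 = '<record xmlns="http://www.loc.gov/MARC21/slim"/>'
```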

Page 9: Automating Name Authority Record Updates and Bibliographic File Maintenance

Task #3: Export/Overlay of NARs

MarcEdit
  Export updated NARs into the ILS
  Through TCP/IP (host address, port, .mrc file)
  One-by-one (though a .mrc file can contain multiple NARs)

Page 10: Automating Name Authority Record Updates and Bibliographic File Maintenance

Task #4: Updates of Bib AAPs

XSLT
  To detect changes in 1XX between old and new NARs
  To build an AAP conversion table (a TXT file) when the 1XX is changed
AutoIt
  Automate bib AAP updates through the “Global Update” module in the ILS
    • Read old and new AAPs from the TXT file and fill out the info required in the “Global Update” process
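A rough Python equivalent of the 1XX comparison step (the talk does this in XSLT). It assumes MARCXML input, joins the subfields of the first 1XX field with single spaces, and emits one tab-delimited "old, new" row per changed heading. The row layout of the TXT conversion table is an assumption, and the sample headings are made up.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DATAFIELD = "{http://www.loc.gov/MARC21/slim}datafield"


def heading_1xx(marcxml: str) -> Optional[str]:
    """Return the text of the first 1XX field, subfields joined by spaces."""
    root = ET.fromstring(marcxml)
    for field in root.iter(DATAFIELD):
        if field.get("tag", "").startswith("1"):
            parts = [(sf.text or "").strip() for sf in field]
            return " ".join(p for p in parts if p)
    return None


def conversion_row(old_xml: str, new_xml: str) -> Optional[str]:
    """Return 'old<TAB>new' when the 1XX changed, otherwise None."""
    old, new = heading_1xx(old_xml), heading_1xx(new_xml)
    if old and new and old != new:
        return f"{old}\t{new}"
    return None


# Made-up sample records for illustration only.
_TEMPLATE = ('<record xmlns="http://www.loc.gov/MARC21/slim">'
             '<datafield tag="100" ind1="1" ind2=" ">'
             '<subfield code="a">Smith, John,</subfield>'
             '<subfield code="d">{}</subfield></datafield></record>')
OLD_NAR = _TEMPLATE.format("b. 1900")
NEW_NAR = _TEMPLATE.format("1900-")
```

Rows returned by `conversion_row` would be appended to the TXT file that the Global Update automation reads.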

Page 11: Automating Name Authority Record Updates and Bibliographic File Maintenance

Task #5: Automation

Use AutoIt to:
  Link up various steps in the workflow
  Automate searching against the OCLC LCNAF SRU Service by compiling and sending HTTP requests
  Execute various XSLTs in a predetermined sequence
    • e.g. NAR comparison → AAP comparison
  Read TXT files (LCCN list, AAP conversion table) created by XSLT processes
  Run MarcEdit to overlay obsolete NARs
  Execute the “Global Update” process

Page 12: Automating Name Authority Record Updates and Bibliographic File Maintenance

Basic Workflow

[Workflow diagram: ILS → extract by Create Lists → ILS NARs → extract by XSLT → LCCNs → search by AutoIt → LC/NACO NARs → retrieve → compare by XSLT → updated NARs (overlay by MarcEdit) and updated headings (Global Update) → ILS]

Page 13: Automating Name Authority Record Updates and Bibliographic File Maintenance

Data Integrity Issue #1

No ILS ARN in extracted NARs
  Needed for the 949 overlay command
Solution
    • Extract “LCCN” & “ILS ARN” pairs through Create Lists
    • Merge the ARN into extracted NARs (907$a) by XSLT/MarcEdit
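The merge step could look roughly like this in Python (the talk uses XSLT/MarcEdit). It assumes the Create Lists export is a tab-delimited text of LCCN and ARN pairs and that records are MARCXML; both formats, the helper names, and the sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

MARC_URI = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_URI)  # keep MARCXML as the default namespace


def parse_pairs(text: str) -> Dict[str, str]:
    """Read 'LCCN<TAB>ARN' lines into a dict keyed by space-free LCCN."""
    pairs = {}
    for line in text.splitlines():
        if "\t" in line:
            lccn, arn = line.split("\t", 1)
            pairs["".join(lccn.split())] = arn.strip()
    return pairs


def find_subfield(record: ET.Element, tag: str, code: str) -> Optional[str]:
    """Return the text of the first matching subfield, or None."""
    path = (f"{{{MARC_URI}}}datafield[@tag='{tag}']/"
            f"{{{MARC_URI}}}subfield[@code='{code}']")
    sf = record.find(path)
    return sf.text if sf is not None else None


def add_arn(record: ET.Element, pairs: Dict[str, str]) -> bool:
    """Append 907$a with the ARN paired to the record's 010$a LCCN."""
    lccn = find_subfield(record, "010", "a")
    arn = pairs.get("".join(lccn.split())) if lccn else None
    if arn is None:
        return False
    field = ET.SubElement(record, f"{{{MARC_URI}}}datafield",
                          {"tag": "907", "ind1": " ", "ind2": " "})
    ET.SubElement(field, f"{{{MARC_URI}}}subfield", {"code": "a"}).text = arn
    return True


# Made-up sample data for illustration only.
PAIRS_TEXT = "n 85342238\ta12345678\n"
SAMPLE_NAR = ('<record xmlns="http://www.loc.gov/MARC21/slim">'
              '<datafield tag="010" ind1=" " ind2=" ">'
              '<subfield code="a">n  85342238 </subfield></datafield></record>')
```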

Page 14: Automating Name Authority Record Updates and Bibliographic File Maintenance

Data Integrity Issue #2

NARs without 010
  010 contains the LCCN
  Some LCCNs transposed into 035
    • Original prefix (n, no, nb, nr) removed
    • Prepended with prefix (OCoLC)
    • Possibly done during system migration
Solution
  1. Search the string in 035 (excl. prefix) as a keyword in SkyRiver
  2. Retrieve the complete LCCN from the matched record
  3. Search the retrieved LCCN against the OCLC Service and download the record

Page 15: Automating Name Authority Record Updates and Bibliographic File Maintenance

Data Integrity Issue #3

Existing NARs without 005
  No timestamp
    • Bring in the new NAR whenever the old NAR lacks 005

Page 16: Automating Name Authority Record Updates and Bibliographic File Maintenance

Data Integrity Issue #4

Local data in NAR
  Local call no. (e.g. 050, 090, 053$5)
  Institution code & initials (shared catalog)
Copy local data into the new NAR before overlay

Page 17: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #1

“Blank” XML File from OCLC LCNAF SRU Service

Page 18: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #1 (Cont’d)

No hit for some LCCNs
  XML file size: < 2KB
  LCCNs in places other than 010$a are not indexed
    • Cancelled LCCNs (010$z)
Solution
  1. Compile a list of LCCNs with file size < 2KB
  2. Search the LCCNs in SkyRiver by keyword
  3. Get the new LCCNs from 010$a
  4. Search the OCLC LCNAF SRU Service using the new LCCNs
But …

Page 19: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #2

Keyword search in SkyRiver returns multiple hits
  Undifferentiated & related NARs
  Write LCCNs with multiple hits to a log file for manual review

[Screenshot callouts: person broken out from undifferentiated NAR; original undifferentiated NAR cancelled]

Page 20: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #2 (Cont’d)

Keyword search in SkyRiver returns multiple hits
  Same numeral part of LCCN with different prefixes
  Write LCCNs with multiple hits to a log file for manual review

[Screenshot callouts: NAR contributed via RLIN; NAR contributed via OCLC]

Page 21: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #2 (Cont’d)

Keyword search in SkyRiver returns no hit
  The LCCN in question no longer exists in NAF
    • NAR containing the cancelled LCCN was cancelled again
      – Loss of 010$z
    • Write no-hit LCCNs into a log file for manual review

Page 22: Automating Name Authority Record Updates and Bibliographic File Maintenance

Search and Retrieval Issue #2 (Cont’d)

Keyword search in SkyRiver returns no hit
  False negative
    • Space between prefix and number removed
    • Hyphen within number removed (e.g. n 85-342238 → n 85342238)
      – Search normalized LCCNs
    • Delay in returning the result of a search due to a slow or unstable Internet connection
      – Set a longer wait time before trying to copy the new LCCN
      – Run the keyword search in SkyRiver in a loop until
        » the number of entries in the log file equals that of the immediately preceding round, or
        » the file size of the no-hit log file equals zero

Page 23: Automating Name Authority Record Updates and Bibliographic File Maintenance

Global Update Issues

ILS interface navigation
AAPs with diacritics
  Found by search in the Global Update module but couldn’t be replaced
  Code points & exact match in Global Update
Old AAPs not found
  Corresponding bib records deleted
  “Orphan” NARs
  Write LCCN to log file for manual review
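One plausible cause of the diacritics problem, offered here as an assumption since the slide does not say which code points differed: the same heading can be stored with precomposed characters in one place and with base letters plus combining marks in another, so an exact code-point match fails even though the text looks identical. A small Python illustration with a made-up heading:

```python
import unicodedata


def same_heading(a: str, b: str) -> bool:
    """Compare two headings after normalizing both to the same Unicode form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Same visible text, different code points (made-up example heading).
precomposed = "Dvo\u0159\u00e1k, Anton\u00edn"           # single code points
decomposed = unicodedata.normalize("NFD", precomposed)   # letters + combining marks

exact_match = precomposed == decomposed                  # raw code-point comparison
normalized_match = same_heading(precomposed, decomposed)
```

Here `exact_match` is False while `normalized_match` is True, which is the kind of mismatch an exact-match replace would trip over.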

Page 24: Automating Name Authority Record Updates and Bibliographic File Maintenance

Revised Workflow

[Workflow diagram. Main flow: ILS → extract → ILS NARs (merged with the ARN-LCCN pairs) → extract → LCCNs → search → found & retrieve → LC/NACO NARs → compare → updated NARs (overlay by MarcEdit) and updated AAPs (Global Update) → ILS. Side branches: LCCNs not found or with multiple hits → log file; not found → retrieve new LCCN → search again. Other labels: “Fishy” NARs; AAPs not found.]

Page 25: Automating Name Authority Record Updates and Bibliographic File Maintenance

Test Results

82,398 NARs tested
81,362 NARs needed to be overlaid*
4,584 AAPs became obsolete
10,900 bib records had at least one heading flipped

* Many NARs exported from the ILS do not contain field 005

Page 26: Automating Name Authority Record Updates and Bibliographic File Maintenance

Limitations

Identities broken out from undifferentiated NARs can’t be detected
  Partially taken care of by the “New Headings Report”
AAPs that have no corresponding NARs
Non-Latin script parallel APs in field 880
Scalability issues
  Slow export using MarcEdit
  Slow “Global Update” process
  Memory-intensive XSLT process
    • “Java heap space” out-of-memory error

Page 27: Automating Name Authority Record Updates and Bibliographic File Maintenance

Possible Enhancements

“Data Exchange” module for NAR overlay
  Data Exchange module: record load function
  Manual intervention needed
SQL backend of Sierra (Sierra DNA)
  Write SQL commands to batch changes
  But the EDIT function is not yet available through SQL commands
AACP (Automatic Authority Control Processing)
  Flip AAPs matching 4XX in NARs to the corresponding 1XX in an overnight process
  Replace “Global Update” with AACP
    • “Rig” updated NARs by inserting the obsolete AAP as a 4XX
    • Export “rigged” NARs to the ILS to trigger the overnight process
    • Overlay exported “rigged” NARs in the ILS with the original updated NARs
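The “rigging” idea can be sketched in Python under stated assumptions: records are MARCXML, the obsolete AAP is the first 1XX of the old NAR, and the see-from tag is formed by swapping the leading 1 for a 4 (100 → 400, 110 → 410, and so on). The function name and sample records are made up; this is not the process as implemented in any ILS.

```python
import copy
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_URI)  # keep MARCXML as the default namespace
DATAFIELD = f"{{{MARC_URI}}}datafield"


def rig_record(old_record: ET.Element, new_record: ET.Element) -> ET.Element:
    """Return a copy of the new NAR with the old 1XX added as a 4XX."""
    old_field = next((f for f in old_record.iter(DATAFIELD)
                      if f.get("tag", "").startswith("1")), None)
    if old_field is None:
        raise ValueError("old record has no 1XX field")
    rigged = copy.deepcopy(new_record)
    see_from = copy.deepcopy(old_field)
    new_tag = "4" + old_field.get("tag")[1:]
    see_from.set("tag", new_tag)
    # Insert before the first field with a higher tag to keep tag order.
    position = len(rigged)
    for i, child in enumerate(rigged):
        if child.tag == DATAFIELD and child.get("tag", "") > new_tag:
            position = i
            break
    rigged.insert(position, see_from)
    return rigged


# Made-up sample records for illustration only.
_NS = 'xmlns="http://www.loc.gov/MARC21/slim"'
OLD_NAR = (f'<record {_NS}><datafield tag="100" ind1="1" ind2=" ">'
           '<subfield code="a">Smith, John,</subfield>'
           '<subfield code="d">b. 1900</subfield></datafield></record>')
NEW_NAR = (f'<record {_NS}><datafield tag="100" ind1="1" ind2=" ">'
           '<subfield code="a">Smith, John,</subfield>'
           '<subfield code="d">1900-</subfield></datafield>'
           '<datafield tag="670" ind1=" " ind2=" ">'
           '<subfield code="a">Sample source</subfield></datafield></record>')
```

The rigged copy would be exported first to trigger the overnight flip, and the untouched updated NAR would then overlay it.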

Page 28: Automating Name Authority Record Updates and Bibliographic File Maintenance

Questions?

Lucas Mak ([email protected])