A Case Study of EAD Implementation at Durham University Library Archives and Special Collections

Archives and Museum Informatics 12: 221–234, 1998. © 2000 Kluwer Academic Publishers. Printed in the Netherlands.

RICHARD HIGGINS
Durham University Library Archives and Special Collections, Palace Green, Durham, DH1 3RN, England

Abstract. Issues arising from using EAD for small-scale retrospective conversion of non-standardized item level finding aids for a broad range of materials.

Introduction

Every project is the product of its circumstances, and the progress of this EAD implementation has been moulded by the combination of opportunities and the existing state of finding aids at Durham. The opportunity arose in 1995, with the award of a four year funded project to create an “information system” under the Follett initiative.1 While the objectives were vague enough to allow for experimentation, and the result could not have been predicted at the time, the fixed term nature of the task has had some positive benefits on its outcome. Whereas it is tempting in an archival setting to start working to a rather protracted time scale, in which years may pass before anything is even decided upon, much less produced, with a four year deadline it was necessary to produce results quickly without the luxury of too much debate. This of itself led to a useful compromise: clearly there would be no time to recast existing handlists, so it would be necessary to use a format within which all of our existing finding aids could be accommodated verbatim. Given the utterly mixed structure and content of the Durham finding aids, a Procrustean solution was required. Laxity of structure is not always a problem however, and it can be turned to advantage in retrospective conversion: although it will reduce the speed of the process and mean far less automation is possible and more intervention will be required, a loosely organised finding aid does allow for more leeway in forming the result. While the solution applied at Durham was entirely dictated by the peculiar development and history here, it may provide useful information for other implementations, and has certainly provided a severe testbed for evaluating EAD.

1 The programme was called Non-Formula Funding of Specialised Research Collections in the Humanities, and further information can be found at its website <http://www.kcl.ac.uk/projects/srch/>. The project was never going to complete the retrospective conversion of all finding aids, but was intended to create a system in which the work could continue. It will have created about 250 EAD encoded finding aids during its course.

The Archival Situation

Durham University Library Archives and Special Collections (DULASC) is the product of several institutions, and its heterogenous origins did not contribute towards any underlying consistency of holdings or their description. If any characteristic of Durham practice can be adduced it is that of item-level listing, and therefore lengthy handlists. Although many of the finding aids were made available for sale, they were clearly primarily intended for use within the search room. They provide very little contextual information; presumably this would have been supplied by asking the archivist on duty in the search room. An online handlist will be used in a quite different context, one where there is no reassuring archivist with the accumulated knowledge of a lifetime present to explain the intricacies glossed over or alluded to by the description. Not only does the release of a handlist outside of the repository place it in a different context, which requires more explanation to be present, but the creation of a long-term handlist format such as is implied by the use of SGML has to confront the need to record the archivist’s knowledge for users far in the future who may have to work without being able to consult directly the original cataloguer.

Existing finding aids at Durham span the usual categories: handwritten card indices or ring-bound volumes, stab-bound duplicated typescript and several generations of computer files, culminating with WordPerfect 5.1 for DOS, the standard method in place at the start of the project. The margins of the search room copies had by now been filled with annotations, notes of relocation or reinterpretation, and even references to extra sheets of description in even greater depth filed elsewhere (not all of which turned out to be easy to trace).2 No house style existed: listing broadly followed the principle of transcribing the details from the document. Given that DULASC is in any case the product of several institutions, this means that there is no regular format for which an easy conversion program could be designed, while the time constraint on the project allowed no scope for accessing the original material and rewriting the finding aids. The resulting online lists would lack a consistent structure, and have no authority control, so the best course was to try to integrate these failings into the solution. Clearly there would be a requirement to add an authority control layer, but as there was no possibility of doing this during the conversion the chosen format would have to allow for adding control terms as and when resources became available.

2 A few examples are viewable at <http://www.dur.ac.uk/Library/asc/sgml/paperfa.html> for the benefit of all those whose repositories have no such materials.


A further problem with the data was that while there were item descriptions, no information was made available at collection or major series level, which exacerbated the inadequacies of the data as far as remote access was concerned. Whereas EAD evolved within a mature culture of MARC-AMC collection descriptions, from which institutions could link to their new online finding aids, no such tradition or data existed at Durham.3 In fact the first phase of the project involved compiling this information from the knowledge of existing curatorial staff, and mounting it on a website in ISAD(G) influenced prose as simple HTML pages.4 As well as providing a means of arranging the collections into manageable groups, the rapid growth of a website that quickly consolidates information is an excellent demonstrator of what is possible. One reason the WWW has taken off so quickly is that it advertises itself: without necessarily understanding how it works it is possible to see what it does, and it provides textual information in a fast and simple fashion. The main deficiency of HTML was the lack of any structural elements, which was where the enriched information storage of EAD became a clear solution, and soon after the alpha version of EAD was made publicly available, it was adopted as the format for Durham’s retrospective conversion work.

3 This seems a common situation in Britain, where the lack of any one standard approach to listing methods such as APPM provides in the United States may make any training in EAD far more difficult here. It will be difficult to establish a standard training programme without first having a standard method of describing archival collections.

4 DULASC website: <http://www.dur.ac.uk/asc/index.html>.

Choosing EAD

Given the expansive item level descriptions typical of Durham’s handlists, lengthy prose passages are a common feature which need to be accommodated by a finding aid system.5 While there are database systems tailored for archival cataloguing, a field based system does not give much flexibility for prose presentation, and one factor that united Durham’s existing lists was the luxuriance of existing description. The solution chosen allowed for the combination of what would never be standardized description with authority controlled index terms. While it is possible to use EAD under a rigorous controlled regime, if the existing content permits, it can also afford great freedom to include wide variations of description where these are closer to the norm. It is clear to see the reasons for MARC, which was designed to be a data exchange format, controlling the usage of all its fields. Such a requirement is less clear in EAD, which has far less of a need to work as a means of data exchange. Archival objects tend to be unique, so few opportunities will arise either for repositories to economise by reusing existing catalogue records, or of their creating confusion with two radically different descriptions of instances of the same object. This is not to be taken as an argument for abandoning descriptive standards, but rather for using them where they can be realistically controlled. EAD has a complete range of elements to accommodate index terms (approximating to the fields of MARC A – the USMARC Format for Authority Data), which can be used in many parts of the description – indeed one of the early phases of EAD markup is characterised by a compulsion to mark every person and place with <persname> and <geogname> tags. All of these elements can be used in a wrapper element <controlaccess> and attached at the relevant level of description to provide access terms for a collection, series or item. Furthermore, if desired, the contents of <controlaccess> can be hidden in most presentation software, which allows online searching or printed index generation with the authoritative subject, genre or name term. With authority control being exercised over these fields, there seems less need to control the other parts of the description. When remote searching is established for EAD finding aids, these are the fields that will be searched, and the rest of the description is what will be returned as the search result. Providing this result is comprehensible, its content will not require any further conformity.

5 The basic reasons for using SGML were covered in a previous article, “Standardised languages for data exchange and storage. The Encoded Archival Description: using SGML to create permanent electronic handlists”, Business Archives Principles and Practice 73 (May 1997): 33–47, available at <http://www.dur.ac.uk/asc/eadarticle.html>.
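The division between free description and controlled index terms can be illustrated with a short sketch. It is not taken from the Durham system: the sample component, its identifiers and the function name index_terms are invented for illustration, and the fragment is written as well-formed XML so that the Python standard library can parse it, whereas the finding aids discussed here were SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical item-level component: uncontrolled prose sits in
# <scopecontent>, authority terms are gathered in <controlaccess>.
SAMPLE = """
<c01 level="item" id="x-1">
  <did><unitid>X 1</unitid><unitdate>1805</unitdate></did>
  <scopecontent><p>Letter from J. Smith about the harvest.</p></scopecontent>
  <controlaccess>
    <persname>Smith, John, 1770-1840</persname>
    <geogname>Durham (England)</geogname>
  </controlaccess>
</c01>
"""

def index_terms(component):
    """Return (element name, term) pairs for the children of each <controlaccess>."""
    terms = []
    for wrapper in component.iter("controlaccess"):
        for child in wrapper:
            terms.append((child.tag, (child.text or "").strip()))
    return terms

component = ET.fromstring(SAMPLE)
terms = index_terms(component)
```

A search or index built only from terms would match on the authoritative form of the name, while the prose in <scopecontent> could be returned unchanged as the result.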

The project centred upon the creation of an online system, but this was not the only local requirement. It was felt that printed handlists were still an important tool, both in the search room and as something that could be provided for external users. A great deal of effort was being expended on producing these from WordPerfect, manually adjusting page breaks and fiddling with settings, or photocopying a gradually degrading typescript. It was clear that the amount being charged for these handlists hardly covered the internal administration cost, much less that of producing a printout or photocopy. If the onus for finding and printing the finding aid is devolved upon the user, the previous revenues generated by occasional sales are outweighed by saving of staff time and repository resources. Thus it seemed sensible to try to make the finding aids available to searchers and allow them to print them out if required, which has been done, while retaining the potential to produce well-formatted printed handlists here as well.

Conversion Method

With no consistent structure to the legacy finding aids, it was not possible to devise a simple conversion process. Duplicate typescript lists could be sent out, but a significant part of the material only existed in a unique version: not only would it be a risk to send out and possibly lose these, but it would leave no means of using the collection while absent. None of the finding aids scanned successfully: the handwritten ones eventually proved enough of a challenge to human comprehension, while the typescripts had been duplicated to an extent that had degraded and fractured the original typeface beyond the powers of OCR. The method adopted was to send out a large block of the material for which spares existed, and use local typists for the rest. In both cases the system was the same: rather than spend time trying to impose structure where little existed, the lists were keyed as basic text (without formatting, but using some extended characters which could be replaced with entities). The initial block of typing provided enough marking up work to allow for the slower processing of the handwritten lists, although we were very lucky in our typists who were able to deal more quickly and accurately with difficult material than could have been hoped or planned for.

Rather than having to work through the lists beforehand annotating them with keying instructions, this was postponed until after the keying returned. At this point the resulting text was checked during the marking up stage – while this seems not to take full advantage of the keyboarders to do as much of the work as possible, it required only a single reading of the lists in question by combining the proof-reading and tagging processes.

Most item level listings comprise a reference number, date, content and physical description, for which the markup process involved wrapping the elements around each body of text, and deriving attributes for an ID reference for each reference number and a normalized form of each unit date. However, these appeared in different orders, with more or less description, and many date formats reflecting the original means of dating the document. In a very few cases, where the structure of the finding aid was regular over several hundred pages, it was possible to automate the ensuing markup process, using awk scripts, but this was an occasional bonus rather than a regular feature. Most of the markup was added manually, as the text was checked against the original, and the surrounding annotations that had accreted to the original description were added. WordPerfect 7 for SGML proved an extremely apt tool for the task. It has an extensive macro facility, which could be adjusted to reflect as much structure as each document possessed, and in many cases it was possible to create a macro that could mark up a single item and stop at the point where editorial intervention was required, or simply to allow for checking. It also allows the process to be done in ASCII and saved as SGML with minimal effort, using only a simple batch conversion script to substitute three characters. Some conversion could be done by simple search and replace, but it was often the case that this was less automatable than would be assumed. It quickly became clear, for example, that the most overused character glyph was “′”, which served variously as an apostrophe, opening or closing single quote, the abbreviation for feet in measurements, the general abbreviation mark in medieval Latin and as the transliteration of several Arabic characters. In very few cases did it serve only one of these roles throughout a finding aid, and so it had to be replaced with the correct entity at every instance, demonstrating only one possible pitfall of automating the process.
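For the few finding aids regular enough to automate, the wrapping step might look something like the following sketch. It is written in Python rather than awk, and everything specific in it is an assumption made for illustration: the pipe-separated input layout, the "day month year" date form, the "ref-" ID scheme, the choice of elements and the function names. Dates that do not fit the expected form are left without a normalized attribute, so that an editor can deal with them by hand.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime

def normalise_date(text):
    """Return an ISO date for 'day month year' text, or None if it needs an editor.

    English month names are assumed (the default C locale).
    """
    try:
        return datetime.strptime(text.strip(), "%d %B %Y").date().isoformat()
    except ValueError:
        return None

def mark_up_item(line, tag="c02"):
    """Wrap one keyed line of the hypothetical form 'ref | date | content | physical'."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError("irregular entry, needs manual markup: " + line)
    ref, date, content, physical = parts
    # Derive an ID from the reference number (illustrative scheme only).
    ident = "ref-" + re.sub(r"[^A-Za-z0-9]+", "-", ref).strip("-").lower()
    comp = ET.Element(tag, {"level": "item", "id": ident})
    did = ET.SubElement(comp, "did")
    ET.SubElement(did, "unitid").text = ref
    unitdate = ET.SubElement(did, "unitdate")
    unitdate.text = date
    normal = normalise_date(date)
    if normal is not None:
        unitdate.set("normal", normal)
    ET.SubElement(did, "physdesc").text = physical
    scope = ET.SubElement(comp, "scopecontent")
    ET.SubElement(scope, "p").text = content
    return comp

item = mark_up_item("Misc. 12 | 12 March 1805 | Grant of land. | Parchment, 1 seal")
undated = mark_up_item("Misc. 13 | temp. Henry III | Quitclaim. | Parchment")
```

The second call shows the limitation described above: a regnal or approximate date passes through as text only, and the entry still needs an editor before it can be searched by date.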

One frequent demand during the project has been for a statistical representation of the progress of the conversion. In the entire course of this process, it was impossible to come up with a satisfactory unit by which to measure this. Suggested criteria were records, finding aids, collections, and printed pages. Viewed across the repository, none of these really proved workable: an item description varied between one line and two pages (which also invalidates the use of the printed page as a unit, even if it were not for different typefaces, margin, and even page sizes). Finding aids ranged from two to over eight hundred pages in length, while one collection might have a single finding aid, another ten or even a hundred more. Fortunately the project was able to justify itself by its appearance online, without having to produce too many statistics, as it would have been very difficult to have correlated progress with any of the suggested criteria.

Software

A major obstacle for implementing EAD has always been the software. Nearly all inquiries received from other archivists about the work in progress at Durham have raised this question, and it is not easily answered. The most commonly suggested solution, that XML will produce a supply of cheap new tools, was of no help in a fixed term project, which is now coming to an end before XML has developed into an option backed by mature software. As has been described above, the conversion work was done using WordPerfect for SGML, which proved an effective tool for adding markup to text, so the next priority was for a display mechanism. At the time the only cheap option for this was Softquad’s Panorama, which had to be installed on the client machine. In spite of the then existence of a downloadable free version of Panorama, it was never going to be viable to expect all users to obtain and install this. The important role that Panorama played was local, in that it provided an excellent tool for learning how SGML functions. The creation of stylesheets and navigators is quick and simple enough to experiment with how the attributes and context can control the appearance of the EAD file. Even if it plays no part in the final implementation of an EAD system, Panorama should be used at the development stage, if only as a learning tool for the implementors.

Aside from the probability that it will not be present on the client machine, Panorama had the serious flaw that it could only deal with one file at a time. What was required was a means of dealing with all the potential finding aids as a group, or any possible sub-sets thereof, and which could communicate with client machines using generic software. To do this, Dynaweb was chosen as a low maintenance serving system. This required the purchase of Dynatext, the SGML presentation system, as well as Dynaweb, which converts these files to HTML and serves them out to clients. Dynatext compiles files from SGML which it presents on Dynatext browsers according to stylesheet controlled rules, while a further set of rules controls the conversion to HTML (and XML if required) and the included web server provides the distribution system. The installation at Durham is based on UNIX, where a central Dynatext system resides: this is translated by the Dynaweb system for external use, or attached as an NFS drive to PC’s with Dynatext browsers for internal use, including a recently installed public terminal in the search room. The main constraint on Dynaweb’s presentation is that it can only do what HTML can do, which often means mapping complex structural combinations to their nearest HTML equivalent. It also means less flexibility with languages, as the character sets for transliterations will not exist on WWW browser equipped client machines. The server is also used to distribute the raw SGML files, allowing them to be used directly with Panorama or accessed by SGML aware software such as that used by RLG to compile their Eureka archive catalogue.6 Dynaweb is often criticised for its cost and speed. It appears expensive in comparison with desktop PC software, but it is industrial strength software, as is much that is designed for SGML. Fortunately at Durham there were several projects underway that required an online presence, so it was possible to combine several budgets in order to obtain a minimal implementation. Dynatext is also generic software, and so could be used for other text based requirements beyond that of handlists (which unfortunately has not yet happened at Durham, although some work is in progress on providing transcriptions of documents). The speed question is difficult to resolve, as many of the factors depend upon the capacity of the network rather than the software. However, the way in which Dynaweb automates the process of conversion does reduce the amount of staff time required, and removes one possible area of error in the document control process, one which gains in importance as the number of EAD finding aids increases into the hundreds. It also provides a rudimentary mechanism to limit the quantity of text served out for each request, which can improve performance on the Internet where large unwieldy handlists are involved.

6 <http://flambard.dur.ac.uk:6336/sgmlink/>. However, it should be noted that the Panorama stylesheets for these are not maintained at present, so display results may not be satisfactory. This route is primarily provided for simple access to EAD files.

The whole question of document control does not often seem to have been addressed in current EAD implementations, indeed archivists often seem to pay less attention to the finding aid than any other type of document in their care. At Durham, not only had computer files for some finding aids proven impossible to find at all, in other cases the search revealed several versions on different computers. The central storage of data implicit in this implementation provides an excellent start to a document control system by providing an obvious place to keep the single authoritative data file. It seemed sensible to look at how SGML was used in larger organisations, which meant investigating commercial software. On the basis that it was a tool designed for technical writers, whose requirements are similar to those of archivists producing handlists, and with the additional incentives of its strength in controlling printing and a very competitive educational pricing policy, Adobe’s FrameMaker+SGML was examined as a possible editing tool. This powerful package produced some excellent results, but has revealed one of EAD’s idiosyncrasies. Like many commercial SGML products, FrameMaker+SGML is oriented around the book model – a two level structure of a wrapper around a series of chapters. However EAD does not quite follow this structural feature, which makes it far more difficult to implement (although it is of course one of the great strengths of the DTD). Instead of chapters, which form regular sub-divisions, the container structure of EAD means that large sections of the listing can be embedded at different levels within the <dsc>, which can only be treated as a single segment. Following the book model, a large file will be divided into chapters, whereas a <dsc>, because it is in turn embedded in an <archdesc>, cannot be split into parts. This means that software designed around the book model, such as FrameMaker+SGML, is unable to use the DTD structure to break the EAD file into sections one level into the file, as the sections are always buried several levels deep. This makes it impossible to use the software’s mechanism to break up the EAD file into manageable pieces and has required the creation of an interim automatic transformation process. As a customisable editor FM is now proving very effective, the printing highly controllable, and it adds the bonus of generating Acrobat PDF files which provide another alternative means of distributing our finding aids.

Basic Problems with EAD

Certainly the most irritating feature of EAD’s structure when working at item level is the permitted content of the <did>. Whatever their purpose or function may be at higher levels, the available elements do not work well for item level listing. Even amongst our inconsistent finding aids, there were always common features, a reference number, usually a date, a description of the content of the item, and some physical description. Although the order varied, it always seemed to embed the content description in the middle, which created a record that was easy to read and comprehend. According to the <did> model, this was not possible, as the <scopecontent> element always had to trail along afterwards, excluded from the <did>. It seems that the primary function of the <did>, to provide a small extractable chunk of data, while useful at series level, is not so suited to item level description. The later inclusion of the <abstract> element in the <did> only serves to emphasise this: it does not allow for any more than a brief piece of text (<p> paragraphs not being permitted) – once more understandable when summarizing a series of materials, but item listing using both <abstract> and <scopecontent> becomes simply pleonastic. Once calendar entries were taken into consideration, which often required a dozen descriptive subordinate elements, it became clear that the descriptions were showing too many signs of the violence required to recast them into the <did>’s format. It was decided to use the accommodating <note> element to contain the content description. This was only the start of the wholesale tag abuse that <note> has suffered here since. It has proven the most versatile of elements, but care has been taken always to add an attribute, so that all its separate functions can be distinguished. If, in future, a particular element usage is changed, or a new more suitable element is allowed, translation will be simple, and controllable, not resulting in changes to other uses of <note>.
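The kind of controlled translation anticipated here can be sketched briefly. The text does not say which attribute was used at Durham, so the type attribute, its values "content" and "seal", and the function name retarget_notes are assumptions made purely for illustration; the sample is well-formed XML rather than SGML so that the Python standard library can read it.

```python
import xml.etree.ElementTree as ET

# Hypothetical item with two differently purposed <note> elements.
SAMPLE = """
<c02 level="item">
  <did><unitid>A 1</unitid></did>
  <note type="content"><p>Grant of land.</p></note>
  <note type="seal"><p>Seal missing.</p></note>
</c02>
"""

def retarget_notes(root, note_type, new_tag):
    """Rename every <note> whose type matches; return how many were changed."""
    changed = 0
    # list() takes a snapshot, so renaming does not disturb the iteration.
    for note in list(root.iter("note")):
        if note.get("type") == note_type:
            note.tag = new_tag
            del note.attrib["type"]
            changed += 1
    return changed

root = ET.fromstring(SAMPLE)
count = retarget_notes(root, "content", "scopecontent")
```

Because the selection is made on the attribute value, the <note> recording the seal is left untouched, which is the point of distinguishing each use when the tag is first applied.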

The <note> element solved another problem involving data stored in attributes, which is more of a fault of SGML software than EAD. The most relevant example for this is the langmaterial attribute of the container elements <c01> etc. This has been the suggested place for recording the language of the material being described, but three features of many SGML packages militate against its use. In some cases, browsers will not find text generated from attributes: although those who understood this could do a more complex search on the contents of the langmaterial attribute, while you would see “French” displayed on the screen, a free text search would not find that text. In all cases, controlling exactly where the browser displays this attribute-derived text proves difficult with the options usually allowed, which tend to be at the start or end of the container element. With item level descriptions of varying lengths, sometimes with custodial or alternate available forms information following, the sudden appearance of a line saying “French” can appear out of context and less than clear in purpose. A final minor problem is that most software will not recognise character entities in attributes, which would have caused problems with the names of some obscure languages. The use of a <note> element, with an encodinganalog attribute set to “ISADG4.4”, allowed the descriptor to be placed exactly, and to be searched for, as well as retaining the possibility of future conversion to a more satisfactory option.
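The search problem can be reproduced in miniature. The sketch below imitates a naive free text search that looks only at element content; it makes no claim about how any particular SGML browser works, and the two sample fragments and the function name free_text_hits are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The same language statement recorded two ways (hypothetical samples).
ATTRIBUTE_STYLE = (
    '<c02 langmaterial="French"><did><unittitle>Letter</unittitle></did></c02>'
)
NOTE_STYLE = (
    '<c02><did><unittitle>Letter</unittitle></did>'
    '<note encodinganalog="ISADG4.4"><p>French</p></note></c02>'
)

def free_text_hits(xml_text, word):
    """Naive free text search: examines element content, never attribute values."""
    root = ET.fromstring(xml_text)
    content = " ".join(root.itertext())
    return word.lower() in content.lower()

found_in_attribute = free_text_hits(ATTRIBUTE_STYLE, "French")
found_in_note = free_text_hits(NOTE_STYLE, "French")
```

Under these assumptions only the <note> form is found, even though a stylesheet could display the word “French” in both cases.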

A more general structural problem with EAD relates to distribution. It is designed to break finding aids down into analytical sections – such as <c01> components and their subordinate files and items, which do not tend to come in similar and convenient sizes. For distribution on the Internet, breaking a large file up is desirable to optimize the speed of access, but there is no mechanism suitable for this built into the EAD structure. As it is difficult to see how such a mechanism could be built into the element structure, perhaps it is enough to rely upon the data serving system to perform this task.
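One way a serving layer might take on this task is to batch the top-level components of a <dsc> up to a size limit, as in the sketch below. This is only a guess at the general approach and not a description of how Dynaweb limits its output; the function name chunk_components, the character-count limit and the sample data are all assumptions.

```python
import xml.etree.ElementTree as ET

def chunk_components(dsc, max_chars):
    """Group the top-level components of a <dsc> into batches of limited size.

    A component larger than max_chars is never split: it forms a batch of its own.
    """
    batches, current, size = [], [], 0
    for comp in dsc:
        length = len(ET.tostring(comp, encoding="unicode"))
        if current and size + length > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(comp)
        size += length
    if current:
        batches.append(current)
    return batches

# Five small hypothetical components, each 48 characters when serialised.
dsc = ET.fromstring(
    "<dsc>"
    + "".join(
        f'<c01 id="c{i}"><did><unitid>{i}</unitid></did></c01>' for i in range(5)
    )
    + "</dsc>"
)
batches = chunk_components(dsc, max_chars=100)
```

The batches follow the analytical divisions of the finding aid, so they remain as uneven as the components themselves, which is the difficulty the paragraph above describes.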

Some Conversion Examples

(i) DURHAM DEAN & CHAPTER MUNIMENTS

The Durham Dean & Chapter Muniments are probably the best preserved set of records of their kind, and are further distinguished by having been looked after and used continually in situ for nearly a millennium, and for retaining much of their medieval organisation. They record the administration of Durham cathedral, from its medieval life as a Benedictine priory, through the dissolution of the monasteries to its modern status as a cathedral. The finding aids follow traditional models: in the 15th century a large repertory, the Magnum Repertorium, was created to provide descriptions of the documents and their existing organisation, and it is still used today as a finding aid for parts of the collection. In the 19th century some sorting of remaining material was done, and listing (still in Latin, for Latin documents, but English for those in English or French) continued. Following the formation of the University’s Department of Palaeography and Diplomatic in the late 1940s (for which the deposit of the Muniments formed the foundation collection), many of the medieval classes of documents were calendared in extreme detail.7

In spite of the sheer bulk of description devoted to this collection, there were features that were ideally suited to SGML, aside from its promised longevity, especially in the existing practice of describing the document in full at its earliest occurrence, and referring to this for each subsequent instance. As well as the original document, it would be registered in one or more cartularies, and often recited in subsequent documents as evidence of prior enjoyment of privileges. This ancient archival system is reflected very pleasingly in EAD: cartularies can largely be described with a series of hypertext links to originals, and conversely documents that have not survived can be reconstructed from their later copies.8 The existence of repertories of documents has also meant that where calendars do not yet exist for classes of documents, it has been possible to create skeleton handlists indicating the quantity of documents, and whether they are now missing or transferred to another class. This has the additional advantage that some documents have been calendared in the past in the course of compiling generic guides, although the class within which they are located has not yet been fully listed. By inserting these into the skeleton finding aids, no available descriptions need be wasted, and in instances where documents have been printed the searcher can at least be pointed towards this, even if there is as yet no catalogue entry.
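The linking of cartulary copies to fully described originals lends itself to a simple consistency check, sketched below. The reference numbers, identifiers and the function name unresolved_refs are invented, and the use of <ref> with a target attribute is an assumption about how such links might be encoded rather than a record of Durham practice.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing: one original described in full, two cartulary
# copies that point to originals by ID instead of repeating the description.
SAMPLE = """
<dsc>
  <c01 id="orig-1"><did><unitid>Doc. 1</unitid></did>
    <scopecontent><p>Full calendar entry for the original.</p></scopecontent></c01>
  <c01 id="cart-7"><did><unitid>Cartulary, f.7</unitid></did>
    <note><p>Copy of <ref target="orig-1">Doc. 1</ref>.</p></note></c01>
  <c01 id="cart-8"><did><unitid>Cartulary, f.8</unitid></did>
    <note><p>Copy of <ref target="lost-2">a lost original</ref>.</p></note></c01>
</dsc>
"""

def unresolved_refs(root):
    """List link targets that match no id in the document."""
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    return [
        ref.get("target")
        for ref in root.iter("ref")
        if ref.get("target") not in ids
    ]

missing = unresolved_refs(ET.fromstring(SAMPLE))
```

A target with no matching entry marks exactly the case mentioned above, where a document has not survived and has to be reconstructed from its later copies.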

However, there are limitations of the EAD structure that become evident when working at this level of detail. The division of elements within the <did>, and the allowance of only one <did> element within the <c0?> container, make some types of documents difficult to render. The major problem is with documents that contain others within them. A good example of this is the item that recites earlier examples within it, as can be common in documents narrating legal cases, or the inspeximus. As only a single <did> is allowable, once the containing document has been described, there is no facility for including within that several others, exactly where they appear, as is the practice with calendaring. Two possible solutions arise: one, the re-entrant <did>, involved editing the DTD to allow the container <c0?> element to contain more than one <did>. This actually served the exact purpose, but removed any vestige of conformity. Although this is less of a problem with SGML, as in many cases the handlists turn up bag and baggage with the DTD files and stylesheet, it is still a departure from EAD’s design. The solution eventually was a compromise, repeating the elements within the <did> and differentiating them with attributes. This ends up as a misuse of SGML as procedural rather than descriptive markup, but while unsatisfactory it does convey the desired information.

7 There is not room to cover the full complexity of this collection, but the EAD guide to the collection can of course be consulted at <http://flambard.dur.ac.uk:6336/dynaweb/handlist/ddc/dcdguide/>.

8 An excellent example of this is the Papalia class, which as was often the case with documents issued by popes, did not survive with anything like the same success as other classes. However, the short descriptions in the Magnum Repertorium and transcriptions in cartularies allow the finding aid for this class to give a far more complete idea of what it contained <http://flambard.dur.ac.uk:6336/dynaweb/handlist/ddc/dcdpap/>.
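The compromise of repeating elements within a single <did> can be illustrated with a minimal sketch. The fragment below is hypothetical: the source does not say which attribute Durham used, so the use of `label`, the values "containing" and "recited", and the sample document itself are all assumptions made for illustration. The Python function simply regroups the repeated elements by that attribute, which is roughly what a stylesheet would have to do to display the recited document separately from its container.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar entry: an inspeximus reciting one earlier grant.
# The "label" values are illustrative assumptions, not Durham's actual encoding.
FRAGMENT = """
<c02>
  <did>
    <unitid label="containing">Doc. 12</unitid>
    <unitdate label="containing">1340</unitdate>
    <unittitle label="containing">Inspeximus confirming an earlier grant</unittitle>
    <unitdate label="recited">1195</unitdate>
    <unittitle label="recited">Grant of land recited in full</unittitle>
  </did>
</c02>
"""


def group_by_label(component_xml):
    """Group the children of the single <did> by their distinguishing attribute."""
    did = ET.fromstring(component_xml).find("did")
    groups = {}
    for child in did:
        role = child.get("label", "unspecified")
        groups.setdefault(role, []).append((child.tag, child.text))
    return groups


descriptions = group_by_label(FRAGMENT)
for role, fields in descriptions.items():
    print(role, fields)
```

The sketch also shows why the approach sits uneasily with descriptive markup: the relationship between the two documents is carried only by a convention about attribute values, not by the element structure.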

A notable feature of the medieval documents is the number of surviving seals, which have been an object of cataloguers’ attention for more than a century. A printed catalogue appeared, itself the product of a revision of a previous card index,9 listing over 3,700 seals and providing photographs for over a quarter of these. This in turn has been extensively revised with annotations, re-attributions and some major recasting, but at the same time has been cited in published work, and even used in the calendaring of the muniments. There is thus a complex trail to be preserved in the online version: not only must it include a great many corrections, but it must be possible to track these from the original reference number to the new location. There also remains the extensive task of linking from the calendared documents to which they are attached to the description of the seal, which has not yet been performed.
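The requirement to track a seal from its original reference number to its new location amounts to a concordance that can be followed through successive revisions. The following is a minimal sketch under stated assumptions: the numbers are invented, and the source does not describe how the concordance is actually stored in the online catalogue.

```python
# Hypothetical concordance: a cited catalogue number mapped to the number that
# superseded it. Numbers absent from the mapping are taken to be current.
REVISIONS = {
    "1001": "1001a",
    "1001a": "2050",
    "1500": "1502",
}


def resolve(number, revisions):
    """Follow re-attributions from a cited number to its current entry, keeping the trail."""
    trail = [number]
    seen = {number}
    while trail[-1] in revisions:
        successor = revisions[trail[-1]]
        if successor in seen:
            # Guard against a revision chain that loops back on itself.
            raise ValueError(f"circular revision chain at {successor}")
        trail.append(successor)
        seen.add(successor)
    return trail


print(resolve("1001", REVISIONS))
```

Returning the whole trail rather than only the final number keeps the intermediate citations visible, which matters where published work refers to a number that has since been revised more than once.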

Aside from the comprehensive description of the items in the muniments, other metadata has accreted to the calendars. The appearance in print or manuscript of other versions of the documents has already been mentioned, and these references have easily been included. There was also scope to include explanatory notes, often longer than the descriptions themselves, such as those giving the reasons for dating medieval accounts to particular years, using the pop-up features of SGML display to preserve a uniform basic layout, with the opportunity to allow access to greater detail if required.

(ii) SUDAN ARCHIVE

The Sudan Archive has been collected at Durham over the last thirty years, and now comprises over 325 collections, mostly the papers of people who worked in the Sudan during the Condominium period. As well as being one of the largest sources of information on the Sudan, it complements several other collections at Durham that record Britain’s colonial history. Its interest as an example of EAD conversion comes partly from its non-British content, but mostly because it was the only example at Durham of finding aids created to a single pattern. As it turned out, this pattern fitted the EAD <drow> or tabular method, so with some misgivings about implementing two systems, the <drow> rather than the <did> method was used. While this may be creating problems for the future trying to run two structures in parallel, there are enough handlists involved to justify some effort, and using the two approaches raises some interesting comparisons. Unlike the <did>, the <drow> method does not limit the elements available to certain parts of the description. Within the <dentry> the elements can be combined in any order, so the problems of the <did> raised above simply do not occur.

9 C. Hunter Blair, “Durham Seals”, in Archaeologia Aeliana, 3rd series, vols. 7–9 (1911–1913); 11–16 (1914–1919). This was based upon the work of W. Greenwell, and has been revised by M. Snape. Available online at <http://flambard.dur.ac.uk:6336/dynaweb/handlist/ddc/dcdmseal/>.


The practical implementation of the <drow> method has proven quite difficult, as it is only a partial table model, which means that software that rigorously applies table rules has considerable difficulty interpreting it. Whereas standard SGML table models consist of a wrapping <table> element that contains, amongst others, a <row> element that in turn contains one or more <cell> elements, although EAD has the latter two elements in the form of <drow> and <dentry>, the equivalent of the <table> wrapper is not as simple. What tends to happen is a nest of tables: opening a <c01> starts the proceedings, and every subsequent container element is a table nested within it, often to the depth of several sub-divisions. While this is by no means impossible for software to deal with, it is not how the designers of most SGML software would have expected it to be used. In a standard document a table may be occasionally expected to have a table nested within it, but in the normal course of document creation it is hard to think of a situation in which this would be as convoluted as it is in EAD. The best practical implementations of the tabular <drow> method that have been managed at Durham have involved side-stepping the table producing features of the software and using running headers or more simple column models rather than a true series of tables. As with the Durham Dean & Chapter Muniments, the size of the Sudan Archive has meant creating an overall finding aid for the entire group with links to the finding aids for each collection. Rather than using the eadgrp DTD to clump together many finding aids into a rather large file, it has proven easier to produce an EAD finding aid for the Archive.10 From this, links are created to the finding aid to each collection as these become available.
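The idea of side-stepping nested tables in favour of a simpler column model can be sketched as follows. This is an illustration only, not the Durham implementation: the fragment is simplified and has not been validated against the EAD DTD, the reference codes and titles are invented, and the flattening function is a hypothetical stand-in for whatever the display software was configured to do.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical <drow>-style fragment with components nested three deep.
SAMPLE = """
<c01>
  <drow><dentry>Box 1</dentry><dentry>Official papers</dentry></drow>
  <c02>
    <drow><dentry>1/1</dentry><dentry>Reports</dentry></drow>
    <c03>
      <drow><dentry>1/1/1</dentry><dentry>Annual report</dentry></drow>
    </c03>
  </c02>
  <c02>
    <drow><dentry>1/2</dentry><dentry>Correspondence</dentry></drow>
  </c02>
</c01>
"""


def flatten(component, depth=0):
    """Yield (depth, cells) for each <drow>, walking nested components in document order."""
    for child in component:
        if child.tag == "drow":
            yield depth, [cell.text for cell in child.findall("dentry")]
        elif child.tag.startswith("c0"):
            # A nested component becomes deeper rows in the same list, not a nested table.
            yield from flatten(child, depth + 1)


rows = list(flatten(ET.fromstring(SAMPLE)))
for depth, cells in rows:
    print("  " * depth + " | ".join(cells))
```

Each row carries its nesting depth, so a display layer can indent it or print a running header when the depth changes, without ever having to render one table inside another.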

(iii) PHOTOGRAPHIC MATERIAL

As well as the seal catalogue, other objects have been included within the EAD format. Fragile materials such as glass lantern slides or negatives have been rendered more usable by scanning and attaching the image to the catalogue entry. Aside from reducing the risk of breakage if these items were produced in the search room, the result is easier to use as a screen image than relying upon a light box or the light from a convenient window. One entire collection of glass negatives, produced by a local photographer’s firm, and comprising some 2,000 images, has been placed online, creating a more practical means of accessing the material.11 As with the finding aids themselves, the images can be placed on the Internet where they are readily accessible, and there does not seem to be any risk that these low resolution images will lead to any great loss of revenue for the repository. The major obstacle is not related to the technology at all, but that of obtaining copyright clearance, so the images made available so far are those for which the repository owns the copyright.

10 The guide, with links to all collections listed online, is at <http://flambard.dur.ac.uk:6336/dynaweb/handlist/sad/sudan/>.

11 The Edis Negatives, at <http://flambard.dur.ac.uk:6336/dynaweb/handlist/pho/edisneg/>.


Implications of Online Finding Aids and Carrying forward the Project

At the outset the question was raised of how being online affected the finding aid. Much archival listing practice has relied upon context – it is implicit in the basic principle ISAD(G) 2.1.1, that of description from the general to the specific, and required by the last of the sequence ISAD(G) 2.1.4, which stipulates non-repetition of information.12 Aside from the obvious problems, ditto marks or “as above” which can be expanded at conversion if spotted, there are subtler problems which can arise. By putting material online control is lost over how it is presented, which can be more important than a change of background colour. It is likely that the item description that makes sense in context will be retrieved as an isolated record. Where in the past the approximate date of an undated item could be estimated from its context in a sequence or series, now a date, at the very least a century, needs to be supplied. Even a few undated items can invalidate the entire process of date searching. Furthermore, a great deal of contextual information has to be added to the finding aid in a place where the information will be found by the end user, on the assumption that they will not have an in depth knowledge of the collection beforehand, and will not be looking at the finding aid as one of the last stages of their search. Rather, it will be likely that they have jumped to it from a general search, and are now in the process of sorting through a morass of unrelated results, which if they have been retrieved from deep within a finding aid, may have lost some of the contextual information. The obvious place for such description would be in the introduction to the finding aid, but will the searcher refer all the way back there from the item their search has initially located? This remains to be seen, and is partially a problem of presentation of information, but could also be helped by the layout of the search mechanism. At least with Dynaweb, or Panorama, the results of a search are displayed in context, and the option is there to work back to the top level of description. The difficulty is making sure that the searcher does this before getting to work on the retrieved data, perhaps oblivious of important information stored at a higher level.
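The effect of undated items on date searching can be shown with a small sketch. Everything in it is hypothetical: the item references, the record structure and the function names are assumptions made for illustration, and the search is a plain overlap test rather than the behaviour of any particular retrieval software.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Item:
    ref: str
    title: str
    date_range: Optional[Tuple[int, int]]  # (earliest, latest) year; None if undated


# Hypothetical items from one series; A/2 was left undated in the paper list.
ITEMS = [
    Item("A/1", "Letter", (1898, 1898)),
    Item("A/2", "Undated memorandum", None),
    Item("A/3", "Report", (1901, 1902)),
]


def search_by_date(items, start, end):
    """Return items whose date range overlaps [start, end]; undated items never match."""
    return [
        item for item in items
        if item.date_range is not None
        and item.date_range[0] <= end
        and item.date_range[1] >= start
    ]


def supply_century(item, century):
    """Give an undated item a whole-century range, e.g. 19 -> (1801, 1900)."""
    if item.date_range is None:
        return Item(item.ref, item.title, ((century - 1) * 100 + 1, century * 100))
    return item


before = [item.ref for item in search_by_date(ITEMS, 1895, 1900)]
dated = [supply_century(item, 19) for item in ITEMS]
after = [item.ref for item in search_by_date(dated, 1895, 1900)]
print(before, after)
```

In the sketch the undated memorandum is silently omitted from the first search and appears in the second only once a century has been supplied, which is the sense in which a few undated items can undermine date searching as a whole.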

The major issue still to be dealt with here also relates to improving the searching system. This is the establishment of an authority system, which as already indicated, was absent from the existing legacy data, and which it was decided to leave to one side until after the conversion process. Given the presence of this material on the Internet, this needs to be global rather than local, which creates several requirements. An established set of authorities will prevent too much experimentation being required, and will allow the finding aids to be used in conjunction with an existing body of material that already uses these standards, and in order to be able to catalogue in an online environment, the authority thesauri need to be available online. In view of this, and as part of a library that already uses AACR2 and LCSH, the Library of Congress maintained name and subject authority files seem the obvious path to pursue. This does not seem to be a common approach in Britain, where other authority schemes have been suggested. As these have not been used extensively, and do not have the background of sustained online use, it seems sensible to adopt a mature system over one that does not yet exist, and one with an existing international userbase. While many criticisms are levelled at these authority files, it is unlikely that any such system will be ideal, indeed the main feature will always be compromise – the primary purpose after all is standardization rather than perfection. Such standards as are being devised nationally are still in development, and lack practical implementations. On a cost level alone, it seems unlikely that the massive infrastructure required to place and maintain thesauri online, as the Library of Congress has done, would be a practical exercise to duplicate.

12 ISAD(G): General International Standard Archival Description (Stockholm, 1993), p. 11.

Once authority controlled index terms have been added, the data will be suitable for distributing via a Z39.50 server, which will complement the existing system. This will allow for the integration of Durham’s finding aids into an international network of information resources, using SGML as the storage medium and providing access to individual records or entire finding aids distributed in a variety of formats that can be downloaded to screen or printer. The heavy annotations in the margins of many of our paper finding aids show how they have always been evolving, live documents, and it is to be hoped that they will now be available to anyone who wishes to consult them.