Cataloguing manuscripts in an international context: Experiences from the Europeana regia project
Torsten Schaßan – Encodage de documents – Caen – 2012, March 30


Europeana Regia: partners

• „A digital cooperative library of royal manuscripts in Medieval and Renaissance Europe“
• Co-funded by the European Commission (ICT-PSP, 50%)
• Jan 2010 – June 2012
• 5 partners
  – BnF Bibliothèque nationale de France (and many municipal libraries)
  – BSB Bayerische Staatsbibliothek, Munich
  – BHUV Biblioteca historica, Universitat de Valencia
  – HAB Herzog August Bibliothek Wolfenbüttel
  – KBR Bibliothèque royale de Belgique, Brussels

ER: collections + formats

• 3 collections
  – Bibliotheca Carolina (8th/9th cent., 425 mss)
  – The Library of King Charles V of France (14th cent., 167 mss)
  – The Library of the Aragonese Kings of Naples (15th cent., 282 mss)
• Information in 6 languages
  – Catalan, Dutch, English, French, German, Spanish
• Descriptions in 5 formats
  – MARC, EAD, TEI, MAB, MXML (= format of the German national manuscript database Manuscripta Mediaevalia)

Work packages

1. Management
2. Specification of metadata
3. Integration of metadata
4. Digitisation
5. Integration of images
6. Dissemination

WP leaders: BnF 1+6, HAB 2+3, BSB 4+5

WP2 Specification of metadata

• This work package aims at consolidating the list and format of metadata to be used by each participant:
  – formats
  – level of metadata
  – quality of metadata
  – organising the ingestion format

D2.1 State of the art in metadata

• Global survey of cataloguing and metadata standards
• Obtain a matched OAI extraction despite the different formats used in libraries (e.g. EAD and TEI)
  – BnF: EAD
  – KBR: local DB; had to decide upon a format (= TEI)
  – BSB: MAB → METS/MODS, MXML → TEI
  – HAB: TEI
  – BHUV: MARC-XML
  – Lyons: special data, with mapping to MODS
• Quality and amount of metadata varied heavily

D2.2 Minimum metadata

• Which information about a manuscript is necessary for a very short overview and sufficient for basic orientation?
  – the manuscript identifiers (current and former shelfmarks, hosting institution or possessors)
  – a summary title
  – basic historical information (date of origin, place of origin, previous owners)
  – basic material information (material of the support, number of leaves, size of leaves)
  – introductory bibliographical information
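The minimum level can be pictured as a small record type. The Python sketch below is illustrative only: the class MinimumRecord, its field names and the selection of fields treated as required are assumptions, not the project's actual schema, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical choice of fields that a minimum record must not leave empty.
REQUIRED = ("shelfmark", "institution", "summary_title",
            "date_of_origin", "place_of_origin", "support", "leaves")


@dataclass
class MinimumRecord:
    """Illustrative record for the D2.2 minimum metadata (field names assumed)."""
    # manuscript identifiers
    shelfmark: str = ""
    former_shelfmarks: List[str] = field(default_factory=list)
    institution: str = ""          # hosting institution or possessor
    # summary title
    summary_title: str = ""
    # basic historical information
    date_of_origin: str = ""
    place_of_origin: str = ""
    previous_owners: List[str] = field(default_factory=list)
    # basic material information
    support: str = ""              # e.g. "parchment"
    leaves: Optional[int] = None
    leaf_size: str = ""            # e.g. "240 x 170 mm"
    # introductory bibliographical information
    bibliography: List[str] = field(default_factory=list)

    def missing(self) -> List[str]:
        """Return the REQUIRED fields that are still empty, in order."""
        return [name for name in REQUIRED
                if getattr(self, name) in ("", None, [])]


record = MinimumRecord(shelfmark="Ms. 1", institution="HAB",
                       summary_title="Psalter")
print(record.missing())
# ['date_of_origin', 'place_of_origin', 'support', 'leaves']
```

Only empty strings, None and empty lists count as missing here; optional categories such as former shelfmarks or previous owners are deliberately left out of REQUIRED, since not every manuscript has them.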

D2.2 Minimum metadata

• Europeana's view (according to ESE v3.2):
  – Obligatory: europeana:rights
  – Strongly recommended: dc:title, dcterms:alternative, dc:creator, dc:date, dc:contributor, dcterms:created, dcterms:issued
  – Consider how your data will perform in response to „who“, „what“, „where“, and „when“ queries
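A pre-ingestion check against these ESE requirements might look like the following sketch, assuming a record is held as a dict from ESE element names to lists of values. The function check_record and the grouping of elements by query type are hypothetical; in particular, using dcterms:spatial for "where" is an assumption, since the slide names no place element.

```python
from typing import Dict, List

Record = Dict[str, List[str]]  # ESE element name -> values

OBLIGATORY = ["europeana:rights"]
STRONGLY_RECOMMENDED = ["dc:title", "dcterms:alternative", "dc:creator",
                        "dc:date", "dc:contributor", "dcterms:created",
                        "dcterms:issued"]
# Hypothetical grouping of elements by the query type they can answer.
QUERY_FACETS = {
    "who": ["dc:creator", "dc:contributor"],
    "what": ["dc:title", "dcterms:alternative"],
    "where": ["dcterms:spatial"],  # assumption: not on the slide's list
    "when": ["dc:date", "dcterms:created", "dcterms:issued"],
}


def check_record(record: Record) -> Dict[str, List[str]]:
    """Report missing obligatory/recommended elements and unanswerable query types."""
    def present(element: str) -> bool:
        # an element counts only if it has at least one non-blank value
        return any(v.strip() for v in record.get(element, []))

    return {
        "errors": [e for e in OBLIGATORY if not present(e)],
        "warnings": [e for e in STRONGLY_RECOMMENDED if not present(e)],
        "unanswerable": [q for q, elements in QUERY_FACETS.items()
                         if not any(present(e) for e in elements)],
    }


report = check_record({"dc:title": ["Psalter"], "dc:date": ["9th cent."]})
print(report["errors"])        # ['europeana:rights']
print(report["unanswerable"])  # ['who', 'where']
```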

D2.2 „Academic“ metadata

• All information beyond the minimum level, together with the minimum level itself, is what the project calls "academic metadata"; it will be added in a second step.
• The project partners will make sure that very important pieces of information (authors' names, standardised titles of the contained texts) are accurately provided.
• References to norm data such as VIAF will be included in the descriptions.
• These special metadata will be translated into the relevant languages.

D2.3 Vademecum for librarians

• Study of the existing descriptions (printed catalogues, card files, computer files)
• Selection of common metadata to be provided by each participant, formalised in a guideline for librarians and academic staff
• Description of the format of metadata in TEI, EAD and MARC

D2.4 Attractive Guidelines

• The guidelines were intended to cover the areas
  – content aggregation, metadata, image processing
  – now: description of the project, the partners, the collections, minimum metadata, and users' requirements
• They were intended to address an external audience
  – librarians, scholars, professionals
  – now: the interested public
• The „technical description“ (formerly D2.1 – D2.3) will be published separately

WP3 Integration of metadata

• Integration of the existing (minimum or academic) metadata into each library's system
• Description of the digital object with a table-of-contents editor or an image-related XML file, to update the metadata according to recent research (e.g. date, patron, artist, origin)
• Where appropriate, provision of more detailed information in other internet resources
• Ingestion of the data into Europeana

WP3 Progress on Metadata

• As a first step, librarians will enter the minimum existing data in each library's system, in order to make these metadata immediately available to Europeana.
• Metadata will be updated and (if necessary) amended, following a scheme agreed among the project partners that allows improved access to the digital copy.
• Full descriptions of the manuscripts will be accessible in specialised databases, e.g. Manuscripta Mediaevalia.
• As far as possible, this information should be made available to Europeana.

Mapping data: General rules

• Map as many of the original source elements as possible to the available ESE elements
• If this is not possible, leave them unmapped or consider using <europeana:unstored>
• If possible, use one of the more specific <dcterms> refinements
• Consider how best to meet the expectations of the user and the functionality of the system
• Consider how the data would perform in response to “who, what, where and when” queries; this encompasses names, types, places and dates relevant to the object and what it depicts
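These rules can be sketched as a table-driven crosswalk. Everything in it is hypothetical: the local field names, the table entries (including dcterms:spatial for the place of origin and dc:identifier for the shelfmark) and the "field: value" convention for europeana:unstored are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of a mapping table (local field -> ESE element).
# Where a dcterms refinement exists it is used instead of the generic
# dc element, e.g. dcterms:created rather than dc:date.
MAPPING: Dict[str, str] = {
    "shelfmark": "dc:identifier",
    "summary_title": "dc:title",
    "uniform_title": "dcterms:alternative",
    "author": "dc:creator",
    "scribe": "dc:contributor",
    "date_of_origin": "dcterms:created",
    "place_of_origin": "dcterms:spatial",
}


def crosswalk(source: List[Tuple[str, str]],
              keep_unmapped: bool = True) -> Dict[str, List[str]]:
    """Map (field, value) pairs to ESE elements.

    Fields without a mapping are dropped or, if keep_unmapped is True,
    kept as "field: value" strings in europeana:unstored.
    """
    out: Dict[str, List[str]] = {}
    for name, value in source:
        target = MAPPING.get(name)
        if target is None:
            if not keep_unmapped:
                continue
            target, value = "europeana:unstored", f"{name}: {value}"
        out.setdefault(target, []).append(value)
    return out


source = [("summary_title", "Psalter"), ("date_of_origin", "9th cent."),
          ("binding", "red velvet")]
print(crosswalk(source))
# {'dc:title': ['Psalter'], 'dcterms:created': ['9th cent.'], 'europeana:unstored': ['binding: red velvet']}
```

Keeping the mapping in a table rather than in the code means that refining it touches only data, not logic.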

WP2 vs. WP3

• WP2 = Theoretical foundation (me)
• WP3 = Practical implementation (Stefanie Gehrke)

[Screenshots; example manuscript: Wolfenbütteler Buchspiegel]

• Local presentation: HAB
• Europeana: Search result
• Europeana: Detail view
• europeanaregia.eu: Homepage
• europeanaregia.eu: byRepository – HAB
• europeanaregia.eu: ms detail

Decisions: customise + delivery

• All institutions needed to adapt their cataloguing formats
  – customise the ENRICH-TEI schema, adapt TEI, adapt AMREMM
• Aggregation via TEL (The European Library)
  – obligatory: rights declaration → tei:availability
  – obligatory: thumbnail → tei:pubPlace/tei:ptr
  – needed: project / collection → tei:projectDesc
• Delivery of ESE, preparation for EDM
  – EDM still under construction, but it offers the possibility “to represent” manuscript data
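The three TEL requirements can be illustrated by generating a bare teiHeader with Python's standard xml.etree.ElementTree. This is a sketch, not the ENRICH-TEI customisation or the project's export: the function tel_header, its parameters and the sample values are hypothetical, and a real record would carry far more than the placeholder msDesc shown here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def t(tag: str) -> str:
    """Qualify a TEI element name in ElementTree's {namespace}tag notation."""
    return f"{{{TEI_NS}}}{tag}"


def tel_header(*, title: str, publisher: str, shelfmark: str, rights: str,
               thumbnail_url: str, project: str) -> ET.Element:
    """Build a minimal teiHeader carrying the fields TEL asked for."""
    header = ET.Element(t("teiHeader"))
    file_desc = ET.SubElement(header, t("fileDesc"))
    title_stmt = ET.SubElement(file_desc, t("titleStmt"))
    ET.SubElement(title_stmt, t("title")).text = title

    pub_stmt = ET.SubElement(file_desc, t("publicationStmt"))
    ET.SubElement(pub_stmt, t("publisher")).text = publisher
    # obligatory thumbnail -> tei:pubPlace/tei:ptr
    pub_place = ET.SubElement(pub_stmt, t("pubPlace"))
    ET.SubElement(pub_place, t("ptr"), {"target": thumbnail_url})
    # obligatory rights declaration -> tei:availability
    availability = ET.SubElement(pub_stmt, t("availability"))
    ET.SubElement(availability, t("p")).text = rights

    # the minimum metadata record itself: msDesc inside sourceDesc
    source_desc = ET.SubElement(file_desc, t("sourceDesc"))
    ms_desc = ET.SubElement(source_desc, t("msDesc"))
    ms_id = ET.SubElement(ms_desc, t("msIdentifier"))
    ET.SubElement(ms_id, t("idno")).text = shelfmark

    # needed: project / collection -> tei:projectDesc
    encoding_desc = ET.SubElement(header, t("encodingDesc"))
    project_desc = ET.SubElement(encoding_desc, t("projectDesc"))
    ET.SubElement(project_desc, t("p")).text = project
    return header


header = tel_header(title="Psalter", publisher="HAB", shelfmark="Ms. 1",
                    rights="CC0", thumbnail_url="https://example.org/thumb.jpg",
                    project="Europeana Regia")
print(ET.tostring(header, encoding="unicode"))
```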

Local Decisions: Theory

• One TEI file for the manuscript, referencing the facsimile and resources (descriptions, websites, etc.)
  – only „minimum“ metadata, ready to export to Europeana
  – will be updated
  – <msDesc> = metadata, to be stored in <sourceDesc>
• One TEI file for each description
  – as rich as the original description
  – will stay as it is, as it represents a „historical“ document
  – <msDesc> = data, to be stored in <text>
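Given this convention, a processing step can tell the two kinds of file apart by where the msDesc sits. The function msdesc_role and the two stripped-down sample documents are hypothetical and not schema-complete; they only illustrate the sourceDesc/text distinction.

```python
import xml.etree.ElementTree as ET
from typing import Optional

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def msdesc_role(tei_xml: str) -> Optional[str]:
    """Classify a TEI file by where its msDesc sits (convention from the slide)."""
    root = ET.fromstring(tei_xml)
    if root.find("tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc", NS) is not None:
        return "record"       # minimum metadata, updated, exported to Europeana
    if root.find("tei:text//tei:msDesc", NS) is not None:
        return "description"  # historical catalogue description, kept as it is
    return None


RECORD = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>
<sourceDesc><msDesc/></sourceDesc></fileDesc></teiHeader>
<text><body><p/></body></text></TEI>"""
DESCRIPTION = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>
<sourceDesc><p/></sourceDesc></fileDesc></teiHeader>
<text><body><msDesc/></body></text></TEI>"""
print(msdesc_role(RECORD), msdesc_role(DESCRIPTION))  # record description
```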

Mapping: Practice

• TEL did not handle TEI itself
  – a crosswalk had to be implemented → we supplied XSLTs
• How to
  – make sure every institution submits the same set of information?
  – make similar information from different formats look alike?
• Refinement of the mapping table prepared in WP2
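The project supplied XSLT stylesheets for the crosswalk. As an illustration of the same idea, the Python sketch below applies a path-based mapping table to a TEI msDesc; it is not the project's XSLT, and the table entries (including dcterms:extent) as well as the sample document are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical excerpt of a refined mapping table:
# ESE element -> ElementTree path relative to tei:msDesc.
TEI_TO_ESE: Dict[str, str] = {
    "dc:identifier": "tei:msIdentifier/tei:idno",
    "dc:title": "tei:head/tei:title",
    "dcterms:created": "tei:history/tei:origin/tei:origDate",
    "dcterms:spatial": "tei:history/tei:origin/tei:origPlace",
    "dcterms:extent": "tei:physDesc/tei:objectDesc/tei:supportDesc/tei:extent",
}


def normalise(node: ET.Element) -> str:
    """All text inside the element, with whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())


def tei_to_ese(tei_xml: str) -> Dict[str, List[str]]:
    """Crosswalk the first msDesc in a TEI document to a flat ESE-style record."""
    ms_desc = ET.fromstring(tei_xml).find(".//tei:msDesc", NS)
    if ms_desc is None:
        return {}
    record: Dict[str, List[str]] = {}
    for element, path in TEI_TO_ESE.items():
        values = [normalise(n) for n in ms_desc.findall(path, NS)]
        values = [v for v in values if v]  # skip empty elements
        if values:
            record[element] = values
    return record


DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><sourceDesc>
<msDesc><msIdentifier><idno>Ms. 1</idno></msIdentifier>
<head><title>Psalter</title></head>
<physDesc><objectDesc><supportDesc><extent>120
  leaves</extent></supportDesc></objectDesc></physDesc>
<history><origin><origDate>9th cent.</origDate></origin></history>
</msDesc></sourceDesc></fileDesc></teiHeader></TEI>"""
print(tei_to_ese(DOC))
# {'dc:identifier': ['Ms. 1'], 'dc:title': ['Psalter'], 'dcterms:created': ['9th cent.'], 'dcterms:extent': ['120 leaves']}
```

A second table with the same target elements but different source paths would serve another input format (e.g. EAD or MARC-XML), which is one way to make similar information from different formats look alike.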

Decisions: translate + harmonise

• Translation
  – some of the basic categories can be translated (semi-)automatically (names, dates, etc.)
  – in order to „avoid“ translation, use Latin names and text titles
• Harmonisation
  – done during transformation (e.g. via OAI-PMH) or during processing by the aggregator
  → break with habits: in TEL the summary title contains the shelfmark
• Normalisation, semantic quality
  – norm data like VIAF, TGN, etc. shall be applied wherever possible
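(Semi-)automatic translation of basic categories can be sketched for century dates: recognised expressions are parsed into a number and rendered in the target language, and anything unrecognised is left for manual translation. The patterns and function names below are a small, hypothetical sample (English, German, French, Spanish/Catalan), not a complete treatment of dating conventions.

```python
import re
from typing import Optional

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def roman_to_int(s: str) -> int:
    """Subtractive Roman numerals up to L, e.g. 'XIV' -> 14."""
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = ROMAN_VALUES[ch]
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def int_to_roman(n: int) -> str:
    out = ""
    for value, symbol in ((50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
                          (5, "V"), (4, "IV"), (1, "I")):
        while n >= value:
            out += symbol
            n -= value
    return out


def ordinal_en(n: int) -> str:
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Hypothetical, deliberately small set of century expressions per language.
PARSERS = [
    (re.compile(r"^(\d{1,2})(?:st|nd|rd|th) cent\.?$"), int),   # en: 14th cent.
    (re.compile(r"^(\d{1,2})\. ?Jh\.?$"), int),                 # de: 14. Jh.
    (re.compile(r"^([IVXL]+)(?:er|e) siècle$"), roman_to_int),  # fr: XIVe siècle
    (re.compile(r"^s\. ?([IVXL]+)$"), roman_to_int),            # es/ca: s. XIV
]
RENDERERS = {
    "en": lambda n: f"{ordinal_en(n)} cent.",
    "de": lambda n: f"{n}. Jh.",
    "fr": lambda n: "Ier siècle" if n == 1 else f"{int_to_roman(n)}e siècle",
    "es": lambda n: f"s. {int_to_roman(n)}",
}


def translate_century(text: str, target: str) -> Optional[str]:
    """Translate a century expression; None means: leave it for a human."""
    for pattern, to_int in PARSERS:
        match = pattern.match(text.strip())
        if match:
            return RENDERERS[target](to_int(match.group(1)))
    return None


print(translate_century("XIVe siècle", "de"))  # 14. Jh.
print(translate_century("um 1400", "en"))      # None
```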

Wish-list

• Subject classification
  – in order to allow browsing through repositories, subject classification would be helpful → special index entries?
• Ontologies
  – norm vocabulary would be helpful, e.g. for bindings, decoration, musical notation, scripts, etc.
  → cf. Europeana Regia's TEI customisation

Chan(c|g)es

• The project has seen many changes
  – implementing workflows for the first time (digitisation + metadata – KBR; OAI for mss → HAB; norm data → BHUV, KBR)
  – adaptations (AMREMM – BHUV)
  – delivery to Europeana → aggregation through TEL
  – adapting export formats → harmonisation through TEL
• Adaptation of OAI difficult for BnF/BSB → selection by TEL
  – ESE → EDM
  – copyright status: RR-F → CC0
  – organising multilingual access ourselves → europeanaregia.eu

Conclusions I

• Cataloguing in a world of electronic publication and distribution, of portals and of data exchange has to take into account
  – data formats (local practices)
  – publication paths (in print, electronic)
  – mapping paths (generalisation of data types)
  – even the arrangement of data (position of kinds of data in the character stream)

Conclusions II

• Additionally, some pieces of information need to be encoded (more) explicitly
  – e.g. textual language
• Data tends to sit in multiple places
  – each of them with special views and interests
  – still: the most up-to-date and complete information will usually be available from local presentations
• Until the Semantic Web is realised (and some copyright issues are solved), we might make do with slim descriptions in portals and keep the richness in local (i.e. specialised) presentations.
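What "encoded explicitly" can mean for the textual language is illustrated by TEI's textLang element: language codes in @mainLang and @otherLangs are machine-readable, whereas a prose value is not. The function text_languages and the simplified fragments below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

TEXT_LANG = "{http://www.tei-c.org/ns/1.0}textLang"


def text_languages(tei_xml: str) -> List[str]:
    """Language codes given explicitly in tei:textLang/@mainLang and @otherLangs.

    Prose such as <textLang>Latin</textLang> yields nothing: a portal
    cannot filter on it without further parsing or translation.
    """
    codes: List[str] = []
    for text_lang in ET.fromstring(tei_xml).iter(TEXT_LANG):
        for attribute in ("mainLang", "otherLangs"):
            for code in text_lang.get(attribute, "").split():
                if code not in codes:
                    codes.append(code)
    return codes


EXPLICIT = """<msDesc xmlns="http://www.tei-c.org/ns/1.0"><msContents>
<textLang mainLang="la" otherLangs="fr">Latin, with French glosses</textLang>
</msContents></msDesc>"""
IMPLICIT = """<msDesc xmlns="http://www.tei-c.org/ns/1.0"><msContents>
<textLang>Latin</textLang></msContents></msDesc>"""
print(text_languages(EXPLICIT), text_languages(IMPLICIT))  # ['la', 'fr'] []
```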

References

• http://www.europeanaregia.eu
• http://www.europeana.eu
• http://www.hab.de
• http://diglib.hab.de/?db=mss

Glossary

AMREMM – Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts
EDM – Europeana Data Model
ENRICH – European Networking Resources and Information concerning Cultural Heritage
ESE – Europeana Semantic Elements
MAB – Maschinelles Austauschformat für Bibliotheken
OAI-PMH – Open Archives Initiative Protocol for Metadata Harvesting
PND (GND) – Personennamendatei (→ Gemeinsame Normdatei)
TEL – The European Library
TGN – Getty Thesaurus of Geographic Names
VIAF – Virtual International Authority File