Manuscriptorium seamless access to old European written heritage

Adolf Knoll, National Library of the Czech Republic


Page 1: Manuscriptorium seamless access  to  old European written heritage

Adolf Knoll
National Library of the Czech Republic

MANUSCRIPTORIUM: SEAMLESS ACCESS TO OLD EUROPEAN WRITTEN HERITAGE

Page 2: Manuscriptorium seamless access  to  old European written heritage

Digitizing manuscripts

1992-1993 – pilot projects for UNESCO
1995-1996 – start of routine work
2000 – launch of the national programme for digitization of old manuscripts
2003 – launch of the Manuscriptorium DL
2007-2009 – EU ENRICH project to support the aggregation service
Today – still growing

Page 3: Manuscriptorium seamless access  to  old European written heritage

Metadata framework

1996 – own SGML approach (SGML being a kind of predecessor of XML) – DOBM language (in 1999 recommended by UNESCO for the Memory of the World programme)

2002 – TEI P4 extended MASTER approach (masterx.dtd)

2009 – TEI P5 schema for description of manuscripts (enrich.xsd) / METS rejected

2012/2013 – inclusion of long-term preservation metadata

Two migrations of complex digital documents until co-development of the fully international solution based on TEI P5.
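To make the TEI P5 step more concrete, here is a minimal sketch of reading the identifier block of a manuscript description with Python's standard library. The sample record (place, library name, shelfmark) is invented for illustration and has not been checked against the ENRICH schema; only the TEI namespace and the element names `msDesc`, `msIdentifier`, `settlement`, `repository`, and `idno` are taken from TEI P5 itself.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample record, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample manuscript description</title></titleStmt>
      <publicationStmt><p>Unpublished example</p></publicationStmt>
      <sourceDesc>
        <msDesc xml:id="example-ms-1">
          <msIdentifier>
            <settlement>Praha</settlement>
            <repository>Example Library</repository>
            <idno>EX 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def read_identifier(tei_xml: str) -> dict:
    """Return the children of msDesc/msIdentifier as a plain dict."""
    root = ET.fromstring(tei_xml)
    ident = root.find(".//tei:msDesc/tei:msIdentifier", TEI_NS)
    if ident is None:
        raise ValueError("no msDesc/msIdentifier found")
    # Strip the "{namespace}" prefix that ElementTree puts on tag names.
    return {
        child.tag.split("}", 1)[1]: (child.text or "").strip()
        for child in ident
    }


record = read_identifier(SAMPLE)
print(record)
```

A real record would carry far more (contents, physical description, history, and the structural map); the point is only that a namespaced XML format can be indexed with ordinary tools.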

Page 4: Manuscriptorium seamless access  to  old European written heritage

Providing access

In the beginning only off-line
Several manuscripts mounted on the web
Researchers showed interest in on-line access
Manuscriptorium Digital Library launched 10 years ago
Manuscript owners had to agree

Page 5: Manuscriptorium seamless access  to  old European written heritage

Manuscriptorium Digital Library

Central database
Remote data repositories: those of Manuscriptorium and of partner digital libraries

Metadata
○ TEI P5 enrich.dtd internal format
○ Document description
○ Structural map
○ Possibly image description

Data
○ WWW recommended formats (JPG, PNG, GIF)
○ Tile solution for maps
○ Full texts (TXT, TEI)

Page 6: Manuscriptorium seamless access  to  old European written heritage

The problem

Rare collections are dispersed in space
Users need to travel:
○ Physically from one place to another
○ Virtually from one application to another (different behaviours, rights, tools, opportunities, etc.)
Solution: to bring everything under one interface:
○ Portal: users are navigated to remote applications
○ Digital Library: users work in one place

Page 7: Manuscriptorium seamless access  to  old European written heritage

Digital library

Central model, e.g. World Digital Library

Metadata are in the central database

Data (images, full texts) are in the central data repository

Distributed model, Manuscriptorium

Metadata are in the central database

Data (images, full texts) are in partner repositories

Growth secured through repeated harvests of descriptions and structures

Parallel re-use of data
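The distributed model can be sketched as a small data structure: the central index stores descriptions and page structures, while each page only points at an image that stays on a partner server. This is an illustrative sketch, not Manuscriptorium's actual implementation; the class names (`Page`, `Record`, `CentralIndex`) and the `example.org` URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    label: str       # e.g. a folio number from the structural map
    image_url: str   # the image itself stays on the partner's server


@dataclass
class Record:
    record_id: str
    title: str
    pages: list[Page] = field(default_factory=list)


class CentralIndex:
    """Holds descriptions and structures only, never the image bytes."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def harvest(self, record: Record) -> None:
        # A repeated harvest simply replaces the stored copy.
        self._records[record.record_id] = record

    def search(self, term: str) -> list[Record]:
        term = term.lower()
        return [r for r in self._records.values() if term in r.title.lower()]

    def image_urls(self, record_id: str) -> list[str]:
        return [p.image_url for p in self._records[record_id].pages]


index = CentralIndex()
index.harvest(Record("ms-1", "Example psalter",
                     [Page("1r", "https://partner-a.example.org/ms-1/1r.jpg")]))
index.harvest(Record("ms-2", "Example gradual",
                     [Page("1r", "https://partner-b.example.org/ms-2/1r.jpg")]))
# Re-harvesting ms-1 with an extra page updates the central copy in place.
index.harvest(Record("ms-1", "Example psalter",
                     [Page("1r", "https://partner-a.example.org/ms-1/1r.jpg"),
                      Page("1v", "https://partner-a.example.org/ms-1/1v.jpg")]))
```

Searching runs entirely against the central index; a partner server is contacted only when a page image is actually displayed.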

Page 8: Manuscriptorium seamless access  to  old European written heritage

Virtual aggregation

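[Diagram: the central database and the Manuscriptorium (MNS) data repository, connected to partner image repositories P1, P2, P3, … Pm, Pn, Po, Px, Pz. Legend: P – image repository]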

Page 9: Manuscriptorium seamless access  to  old European written heritage

Seamless aggregation

All metadata indexed in the central database, incl. the structure
Images from partner repositories called into a single presentation interface
Browsing as if everything were in one place
Enhanced use of images

Page 10: Manuscriptorium seamless access  to  old European written heritage

Cooperation

OAI harvest of agreed profiles

Profiles as large as possible

Internal TEI P5 format able to accommodate:
○ Library descriptions (MARC-based)
○ Scientific descriptions (TEI-based)

Off-line batch ingest where OAI inapplicable

Production for Manuscriptorium
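The OAI harvest mentioned on this slide can be sketched with the standard OAI-PMH `ListRecords` verb. The block below only builds request URLs and parses a trimmed response, so it runs without network access. The base URL, the `tei` metadata prefix, and the sample identifiers are assumptions for illustration; a real response also carries `responseDate`, `request`, datestamps, and the metadata payload.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    # In OAI-PMH, resumptionToken is an exclusive argument:
    # it must not be combined with metadataPrefix.
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(response_xml: str) -> tuple[list[str], Optional[str]]:
    """Return the record identifiers on this page and the next token, if any."""
    root = ET.fromstring(response_xml)
    ids = [el.text for el in
           root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Trimmed, invented response for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:partner.example.org:ms-1</identifier></header></record>
    <record><header><identifier>oai:partner.example.org:ms-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://partner.example.org/oai", "tei")
ids, token = parse_page(SAMPLE_RESPONSE)
next_url = list_records_url("https://partner.example.org/oai", "tei", token)
```

A harvester would fetch `first_url`, then keep requesting `next_url` until a page arrives without a resumption token.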

Page 11: Manuscriptorium seamless access  to  old European written heritage

Production for Manuscriptorium

Partner has images without suitable metadata (description & structure)

M-TOOL application, now online, producing TEI P5 (enrich.dtd) compatible files

M-CAN application for upload, control, and offer of XML files (behaviour as if in real Manuscriptorium), while images are stored on home servers

Page 12: Manuscriptorium seamless access  to  old European written heritage

User personalization

User personal library for:

Virtual collections
○ Static
○ Dynamic

Virtual documents (any file from any partner library can become a component part of a new document; this one can be described in M-TOOL online in conformity with the TEI P5 specification for description of manuscripts – enrich.dtd)
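A virtual document, as described on this slide, is essentially an ordered list of references to files that remain in partner repositories. The following sketch illustrates that idea only; the class names, record identifiers, and `example.org` URLs are hypothetical and do not reflect how Manuscriptorium stores such documents.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Component:
    source_record: str  # identifier of the document the file was taken from
    file_url: str       # the file itself stays in the partner repository


@dataclass
class VirtualDocument:
    title: str
    components: list[Component] = field(default_factory=list)

    def add(self, source_record: str, file_url: str) -> None:
        self.components.append(Component(source_record, file_url))

    def partner_hosts(self) -> list[str]:
        """Distinct partner hosts, in order of first use."""
        hosts: list[str] = []
        for component in self.components:
            host = urlparse(component.file_url).netloc
            if host not in hosts:
                hosts.append(host)
        return hosts


doc = VirtualDocument("Dispersed fragments, reassembled")
doc.add("ms-1", "https://partner-a.example.org/ms-1/12v.jpg")
doc.add("ms-7", "https://partner-b.example.org/ms-7/3r.jpg")
doc.add("ms-1", "https://partner-a.example.org/ms-1/13r.jpg")
print(doc.partner_hosts())
```

Because only references are stored, the new document can mix pages from several libraries without copying any image.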

Page 15: Manuscriptorium seamless access  to  old European written heritage

Manuscriptorium placement

[Diagram: Manuscriptorium (MNS) shown together with its partner repositories (P1, … Pw, Px, Py, Pz) and the external services EUROPEANA, TEL, PRIMO, SUMMON, EBSCO DS, CZgateway, CERL MSS]

Page 16: Manuscriptorium seamless access  to  old European written heritage

From whom do the data come

Czech Republic
National Library (3320)
Moravian Library (470)
Strahov Monastery (319)
National Museum Library (272)
…

Abroad
Universidad Complutense, Madrid (2902)
Свято-Троицкая Сергиева Лавра (2668)
UnivLib Wroclaw (1839)
UnivLib Köln (1634) – several administered collections
NL, Italy, Firenze (1566)
NL, Spain, Madrid (1444)
Reykjavík (1176) – NL + Arne Magnusson Found.
UnivLib Vilnius (1085)
UnivLib Heidelberg (1025)
eCodices* Switzerland (889)
NL, Romania, Bucureşti (393)
UnivLib Bratislava (241)
UnivLib Zielona Góra (231)
…

23,655 digitized docs, of which 18,077 from abroad, i.e. 76.4% (Dec. 2013)
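The share quoted on this slide follows directly from the two totals; a short check, using only the figures given above:

```python
total_docs = 23_655
from_abroad = 18_077

share_abroad = from_abroad / total_docs * 100
domestic = total_docs - from_abroad

# prints "76.4% from abroad, 5578 domestic documents"
print(f"{share_abroad:.1f}% from abroad, {domestic} domestic documents")
```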

Page 17: Manuscriptorium seamless access  to  old European written heritage

Traffic generators: all visits

1. Direct: 23.47%
2. Google: 21.78%
3. Europeana: 13.89%
4. NL CZ: 5.95%
5. Seznam: 3.58%
6. Cs.wikipedia.org: 2.58%
7. Vychodoceskearchivy.cz: 2.41%
8. Dasp.at: 1.16%
9. Facebook: 0.80%
10. … other partners …
16. TEL: 0.49%

August 2012 – July 2013

Page 18: Manuscriptorium seamless access  to  old European written heritage

Traffic generators: referring pages: 50.52%

1. Europeana: 27.50%
2. NL CZ: 11.77%
3. Wikipedia CZ: 5.11%
…
6. Facebook: 1.59%
13. TEL: 0.98%

August 2012 – July 2013
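The two traffic slides can be cross-checked against each other. The sketch below assumes that 50.52% is the share of all visits arriving via referring pages, and that "Cs.wikipedia.org" and "Wikipedia CZ" denote the same source; neither is stated explicitly on the slides. Under that reading, dividing a source's share of all visits by 50.52% should reproduce its share among referring pages.

```python
# Assumed reading: % of all visits that arrive via referring pages.
REFERRAL_SHARE = 50.52

# source: (share of all visits, share among referring pages), both in %
reported = {
    "Europeana": (13.89, 27.50),
    "NL CZ": (5.95, 11.77),
    "Wikipedia CZ": (2.58, 5.11),
    "Facebook": (0.80, 1.59),
    "TEL": (0.49, 0.98),
}


def share_among_referrals(share_of_all: float) -> float:
    return share_of_all / REFERRAL_SHARE * 100


# Absolute difference between the recomputed and the reported share.
deviations = {
    name: abs(share_among_referrals(of_all) - among_refs)
    for name, (of_all, among_refs) in reported.items()
}

for name, deviation in deviations.items():
    print(f"{name}: off by {deviation:.3f} percentage points")
```

All five sources agree to within about 0.01 percentage points, which is what rounding of the published figures would produce, so the assumed reading is at least consistent with the data.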

Page 19: Manuscriptorium seamless access  to  old European written heritage

From which countries do the users come

2009 - 2012
1. Domestic (CZ) – 54.3%
2. Germany – 5.5%
3. Poland – 4.3%
4. U.S.A. – 4.0%
5. France – 2.8%
6. Slovakia – 2.7%
7. Italy – 2.7%
8. Spain – 2.6%
9. Austria – 2.5%
10. Romania – 2.1%

2011 - 2012
1. Domestic (CZ) – 52.5%
2. Germany – 5.5%
3. Poland – 4.4%
4. U.S.A. – 3.9%
5. Italy – 3.2%
6. Spain – 2.9%
7. Austria – 2.8%
8. France – 2.8%
9. Romania – 2.5%
10. Slovakia – 2.4%

Page 22: Manuscriptorium seamless access  to  old European written heritage

Known problems

Technical/organizational

Partner servers do not function

Permanent URLs of images have been changed without update of the OAI harvested profiles

Funding, esp. for faster development

Political/cultural

We are not sure about inclusion of documents from Eastern Asia

Some people, institutions, or countries may dislike aggregation operated by a Czech institution

Some people are unwilling to make their collections widely accessible
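The first two technical problems (partner servers down, image URLs changed without a new harvest) both surface as image links that no longer resolve. One possible way to detect them is a periodic link check over the harvested image URLs. This is a sketch under assumptions, not a description of what Manuscriptorium does: `find_stale_links` and `fake_status` are hypothetical names, and the status lookup is injected so that the example runs without network access (in practice it could wrap an HTTP HEAD request).

```python
from typing import Callable, Iterable


def find_stale_links(image_urls: Iterable[str],
                     get_status: Callable[[str], int]) -> list[str]:
    """Return the URLs that are unreachable or do not answer with 2xx."""
    stale = []
    for url in image_urls:
        try:
            status = get_status(url)
        except OSError:
            # Treat an unreachable partner server like a broken link.
            stale.append(url)
            continue
        if not 200 <= status < 300:
            stale.append(url)
    return stale


def fake_status(url: str) -> int:
    """Stand-in for a real HTTP request, for demonstration only."""
    if "down" in url:
        raise ConnectionError("server unreachable")
    return 404 if "moved" in url else 200


harvested = [
    "https://partner-a.example.org/ms-1/1r.jpg",
    "https://partner-b.example.org/moved/ms-2/1r.jpg",
    "https://down.example.org/ms-3/1r.jpg",
]
stale = find_stale_links(harvested, fake_status)
print(stale)
```

The resulting list tells the aggregator which partner profiles need to be harvested again.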

Page 23: Manuscriptorium seamless access  to  old European written heritage

Near future, if funded enough for development …

Further aggregation
Solution to linguistic problems
○ Grapheme variation
○ External thesauri
Imaging: centrally stored images can be pre-processed to create metadata for search of objects within them
Mark-up of music documents
New and more user-friendly interface
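For the grapheme variation item, one common approach is to index and query on a normalized search key, so that spelling variants of the same word match. The slides do not say which approach is planned; the sketch below is one possibility, and the folding table (`v` to `u`, `j` to `i`) is an assumption that a real system would make language- and period-specific.

```python
import unicodedata

# Assumed folding table for Latin-script variants, for illustration only.
VARIANT_FOLD = str.maketrans({"v": "u", "j": "i"})


def search_key(text: str) -> str:
    # NFKD splits base letters from diacritics and maps long s (ſ) to s.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left over from the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold().translate(VARIANT_FOLD)


print(search_key("Vniuerſitas"))
print(search_key("universitas"))
```

Both spellings reduce to the same key, so a query in modern spelling would also find the historical form. The original text is kept for display, and only the key is used for matching.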

Page 24: Manuscriptorium seamless access  to  old European written heritage

www.manuscriptorium.eu

The Manuscriptorium Digital Library is operated by AiP Beroun Ltd. on behalf of the National Library of the Czech Republic

The National Library:
○ does not generate any income from Manuscriptorium services
○ is today the only funding body of Manuscriptorium operation and development (directly or via projects)

Page 25: Manuscriptorium seamless access  to  old European written heritage

www.manuscriptorium.eu

Virtual research environment:

Seamless aggregation, i.e. real-time work on geographically dispersed resources

Saving time and money of researchers (neither physical nor virtual travelling/navigation)

Integrated on-line tools

You are welcome to join us: [email protected]

August 2013: 24,892 digitized docs; more than 600 fulltexts; 303,542 descriptive records