Manuscriptorium seamless access to old European written heritage

Post on 24-Feb-2016

Adolf Knoll, National Library of the Czech Republic. Manuscriptorium: seamless access to old European written heritage (PowerPoint presentation).

Adolf Knoll
National Library of the Czech Republic

MANUSCRIPTORIUM: SEAMLESS ACCESS TO OLD EUROPEAN WRITTEN HERITAGE

Digitizing manuscripts

1992-1993 – pilot projects for UNESCO
1995-1996 – starting routine work
2000 – launch of national programme for digitization of old manuscripts
2003 – launch of Manuscriptorium DL
2007-2009 – EU ENRICH project to support aggregation service
Today – still growing

Metadata framework

1996 – own SGML approach (a kind of predecessor of XML) – DOBM language (in 1999 recommended by UNESCO for the Memory of the World programme)

2002 – TEI P4 extended MASTER approach (masterx.dtd)

2009 – TEI P5 schema for description of manuscripts (enrich.xsd) / METS rejected

2012/2013 – inclusion of long-term preservation metadata

Two migrations of complex digital documents until co-development of the fully international solution based on TEI P5.

Providing access

In the beginning only off-line
Several manuscripts mounted on the web
Researchers showed interest in on-line access
Manuscriptorium Digital Library launched 10 years ago
Manuscript owners had to agree

Manuscriptorium Digital Library

Central database
Remote data repositories: those of Manuscriptorium and of partner digital libraries

Metadata
  TEI P5 enrich.dtd internal format
  Document description
  Structural map
  Possibly image description

Data
  WWW recommended formats (JPG, PNG, GIF)
  Tile solution for maps
  Full texts (TXT, TEI)
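To make the split between metadata and data concrete, here is a minimal sketch of reading a description and a structural map from a TEI P5 file. The sample record is hypothetical and heavily simplified. It assumes the page structure is expressed with the TEI `facsimile`, `surface` and `graphic` elements, and it does not reproduce the actual enrich.dtd profile. The library name and the partner.example URLs are placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified record; real ENRICH-conformant files are far richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Example manuscript</title></titleStmt>
      <publicationStmt><p>Illustrative sample only</p></publicationStmt>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <settlement>Praha</settlement>
            <repository>Example Library</repository>
            <idno>MS 1</idno>
          </msIdentifier>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <facsimile>
    <surface n="1r"><graphic url="https://partner.example/img/0001.jpg"/></surface>
    <surface n="1v"><graphic url="https://partner.example/img/0002.jpg"/></surface>
  </facsimile>
</TEI>"""


def summarize(tei_xml):
    """Return (description, pages) extracted from a TEI P5 document."""
    root = ET.fromstring(tei_xml)
    ident = root.find(".//tei:msDesc/tei:msIdentifier", TEI_NS)
    # Strip the namespace part of each tag, e.g. "{...}idno" -> "idno".
    description = {
        child.tag.split("}")[1]: (child.text or "").strip() for child in ident
    }
    # Structural map: one (page label, image URL) pair per surface.
    pages = [
        (surface.get("n"), surface.find("tei:graphic", TEI_NS).get("url"))
        for surface in root.findall(".//tei:facsimile/tei:surface", TEI_NS)
    ]
    return description, pages


description, pages = summarize(SAMPLE)
```

The description and page list are what a central database could index, while the image files themselves stay wherever the URLs point.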

The problem

Rare collections are dispersed in space
Users need to travel:
  Physically from one place to another
  Virtually from one application to another (different behaviours, rights, tools, opportunities, etc.)
Solution: to bring everything under one interface:
  Portal: users are navigated to remote applications
  Digital Library: users work in one place

Digital library

Central model, e.g. World Digital Library
  Metadata are in the central database
  Data (images, full texts) are in the central data repository

Distributed model, Manuscriptorium
  Metadata are in the central database
  Data (images, full texts) are in partner repositories
  Growth secured through repeated harvests of descriptions and structures
  Parallel re-use of data

Virtual aggregation

[Diagram: the central database and the MNS data repository, together with partner image repositories P1, P2, P3, …, Pm, Pn, Po, Px, Pz. P – image repository]

Seamless aggregation

All metadata indexed in the central database, incl. the structure
Images from partner repositories called into the unique presentation interface
Browsing as if everything were in one place
Enhanced use of images
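The idea of seamless aggregation can be sketched in a few lines: descriptions and page structure live in one central index, while each page image is only referenced by the URL of the partner repository that holds it. Everything in this sketch (class names, fields, partner names, URLs) is hypothetical and is not the Manuscriptorium implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    doc_id: str
    title: str
    partner: str
    page_urls: list  # image URLs that stay on the partner's own server


class CentralIndex:
    """Toy central database: metadata and structure only, no image files."""

    def __init__(self):
        self._records = {}

    def ingest(self, record):
        self._records[record.doc_id] = record

    def search(self, term):
        """Search all partners at once, as if everything were in one place."""
        term = term.lower()
        return sorted(
            (r for r in self._records.values() if term in r.title.lower()),
            key=lambda r: r.doc_id,
        )

    def viewer_pages(self, doc_id):
        """Page list for one viewer; images are fetched from the partner URL."""
        record = self._records[doc_id]
        return [
            {"page": n, "source": record.partner, "url": url}
            for n, url in enumerate(record.page_urls, start=1)
        ]


index = CentralIndex()
index.ingest(Record("a-001", "Latin Bible", "Partner A",
                    ["https://a.example/a-001/1.jpg", "https://a.example/a-001/2.jpg"]))
index.ingest(Record("b-001", "Bible fragment", "Partner B",
                    ["https://b.example/b-001/1.jpg"]))
hits = index.search("bible")
```

A single query returns documents from both partners, and the viewer receives the remote image URLs without the user ever leaving the one interface.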

Cooperation

OAI harvest of agreed profiles
Profiles as large as possible
Internal TEI P5 format able to accommodate:
  Library descriptions (MARC-based)
  Scientific descriptions (TEI-based)
Off-line batch ingest where OAI inapplicable
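An OAI harvest of this kind can be sketched with the standard OAI-PMH `ListRecords` verb and its `resumptionToken` paging. The endpoint URL, the `tei_p5` metadata prefix and the sample responses are assumptions made for illustration, since the slides do not name the actual profiles or endpoints. The `fetch` parameter is a hypothetical hook so the loop can run without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [
        e.text
        for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return ids, (token.strip() or None)


def harvest(base_url, metadata_prefix, fetch):
    """Follow resumption tokens until the repository reports no more pages.

    `fetch` is any callable mapping a URL to response text (hypothetical hook;
    in practice it could wrap urllib.request.urlopen).
    """
    token, all_ids = None, []
    while True:
        url = list_records_url(base_url, metadata_prefix, token)
        ids, token = parse_page(fetch(url))
        all_ids.extend(ids)
        if token is None:
            return all_ids


# Canned two-page response from a hypothetical partner repository.
BASE = "https://partner.example/oai"
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:partner.example:ms-1</identifier></header></record>
<record><header><identifier>oai:partner.example:ms-2</identifier></header></record>
<resumptionToken>page-2</resumptionToken></ListRecords></OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>
<record><header><identifier>oai:partner.example:ms-3</identifier></header></record>
<resumptionToken/></ListRecords></OAI-PMH>"""
FAKE_WEB = {
    BASE + "?verb=ListRecords&metadataPrefix=tei_p5": PAGE_1,
    BASE + "?verb=ListRecords&resumptionToken=page-2": PAGE_2,
}
harvested = harvest(BASE, "tei_p5", FAKE_WEB.__getitem__)
```

Repeating such a harvest on a schedule is one way the central database could pick up new or changed descriptions and structures from partners.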

Production for Manuscriptorium

Partner has images without suitable metadata (description & structure)
M-TOOL application, now online, producing TEI P5 (enrich.dtd) compatible files
M-CAN application for upload, control, and offer of XML files (behaviour as if in real Manuscriptorium), while images are stored on home servers

User personalization

User personal library for:
  His virtual collections
    ○ Static
    ○ Dynamic
  His virtual documents (any file from any partner library can become a component part of a new document; this one can be described in M-TOOL online in conformity with the TEI P5 specification for description of manuscripts – enrich.dtd)

Manuscriptorium placement

[Diagram: Manuscriptorium (MNS) with its partner repositories (P1, …, Pw, Px, Py, Pz), shown together with EUROPEANA, TEL, PRIMO, SUMMON, EBSCO DS, CZ gateway and CERL MSS]

From whom do the data come

Czech Republic
  National Library (3320)
  Moravian Library (470)
  Strahov Monastery (319)
  National Museum Library (272)
  …

Abroad
  Universidad Complutense, Madrid (2902)
  Свято-Троицкая Сергиева Лавра [Trinity Lavra of St. Sergius] (2668)
  UnivLib Wroclaw (1839)
  UnivLib Köln (1634) – several administered collections
  NL, Italy, Firenze (1566)
  NL, Spain, Madrid (1444)
  Reykjavík (1176) – NL + Arne Magnusson Found.
  UnivLib Vilnius (1085)
  UnivLib Heidelberg (1025)
  eCodices* Switzerland (889)
  NL, Romania, Bucureşti (393)
  UnivLib Bratislava (241)
  UnivLib Zielona Góra (231)
  …

23,655 digitized docs, of which 18,077 from abroad, i.e. 76.4% (Dec. 2013)
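The quoted share can be checked directly from the two totals on this slide. The domestic figure below is derived by subtraction and is not stated in the source.

```python
total_docs = 23_655    # all digitized documents (from the slide)
from_abroad = 18_077   # documents contributed from abroad (from the slide)

share_abroad = round(100 * from_abroad / total_docs, 1)

# Derived values, not given on the slide.
domestic = total_docs - from_abroad
share_domestic = round(100 * domestic / total_docs, 1)
```

This gives 76.4% from abroad, matching the slide, and leaves 5,578 documents (23.6%) from Czech institutions.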

Traffic generators: all visits

1. Direct: 23.47%
2. Google: 21.78%
3. Europeana: 13.89%
4. NL CZ: 5.95%
5. Seznam: 3.58%
6. Cs.wikipedia.org: 2.58%
7. Vychodoceskearchivy.cz: 2.41%
8. Dasp.at: 1.16%
9. Facebook: 0.80%
10. … other partners …
16. TEL: 0.49%

August 2012 – July 2013

Traffic generators: referencing pages (50.52%)

1. Europeana: 27.50%
2. NL CZ: 11.77%
3. Wikipedia CZ: 5.11%
…
6. Facebook: 1.59%
13. TEL: 0.98%

August 2012 – July 2013

From which countries do the users come

2009 - 2012
1. Inland (CZ) – 54.3%
2. Germany – 5.5%
3. Poland – 4.3%
4. U.S.A. – 4.0%
5. France – 2.8%
6. Slovakia – 2.7%
7. Italy – 2.7%
8. Spain – 2.6%
9. Austria – 2.5%
10. Romania – 2.1%

2011 - 2012
1. Inland (CZ) – 52.5%
2. Germany – 5.5%
3. Poland – 4.4%
4. U.S.A. – 3.9%
5. Italy – 3.2%
6. Spain – 2.9%
7. Austria – 2.8%
8. France – 2.8%
9. Romania – 2.5%
10. Slovakia – 2.4%

Known problems

Technical/organizational
  Partner servers do not function
  Permanent URLs of images have been changed without update of the OAI harvested profiles
  Funding, esp. for faster development

Political/cultural
  We are not sure about inclusion of documents from Eastern Asia
  Some people, institutions or countries may dislike aggregation operated by a Czech institution
  Some people are unwilling to make their collections widely accessible

Near future, if funded enough for development …

Further aggregation
Solution to linguistic problems
  Grapheme variation
  External thesauri
Imaging: centrally stored images can be pre-processed to create metadata for search of objects within them
Mark-up of music documents
New and more user-friendly interface

www.manuscriptorium.eu

The Manuscriptorium Digital Library is operated by AiP Beroun Ltd. on behalf of the National Library of the Czech Republic
The National Library:
  does not generate any income from Manuscriptorium services
  is today the only funding body of Manuscriptorium operation and development (directly or via projects)

www.manuscriptorium.eu

Virtual research environment:
  Seamless aggregation, i.e. real-time work on geographically dispersed resources
  Saving time and money of researchers (neither physical nor virtual travelling/navigation)
  Integrated on-line tools
You are welcome to join us: adolf.knoll@nkp.cz

August 2013: 24,892 digitized docs; more than 600 full texts; 303,542 descriptive records