The KB e-Depot: long-term preservation of scientific publications in practice
Marcel Ras, National Library of the Netherlands
Libraries: traditional or digital
What is the problem with digital?
• Digital information needs an intermediary to be interpreted
  – Hardware, operating system, software
• Physical carriers can be damaged; digital carriers can be damaged too, and they can also become obsolete
• There is an awful lot of digital information
  – Annual growth of 500 million TB
  – 2008: 490 billion GB of information produced on the internet (the equivalent of 30 billion iPods)
• We rely on digital information
• Some information is digital only (no paper equivalent)
The KB
• The KB is the national library of the Netherlands
• The task of a national library is to collect, describe and preserve the national imprint
• Paper, but also digital
• National deposit since 1974 (but no deposit legislation)
• 49 km of books (2.5 million)
  – Annual growth: 38,000
• 18 km of journals
  – Annual growth: 11,000
• Over 1 km of microfilm
• 13 million digital scientific articles
  – Annual growth: 3 million
The KB e-Depot (2)
• Digital version of the traditional depot
• Operational since 2003
• No legal deposit legislation
• Based on agreements with publishers
• International focus
• International scope of scientific output
• Safe Places Network
History of the KB e-Depot
1994: e-publications part of deposit: need for an infrastructure
1995: Experiments with Elsevier, Dutch Publishers Association
2002: Landmark archiving agreement with Elsevier
2003: e-Depot system operational
2006: Start of project with academic repositories
2007: Archiving agreements with 19 major STM publishers
2007: Start of development of ingest procedures for other materials
2008: Cooperation with Open Access communities (DARe / DOAJ)
What do we preserve?
e-journals international (22 million - 30 TB)
digitized masters (21 million - 275 TB)
websites (100,000 - 120 TB)
IRs, i.e. institutional repositories (300,000 - 30 TB)
e-books international (1 million - 30 TB)
e-books national (400,000 - 15 TB)
e-journals national (600,000 - 10 TB)
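
As a quick cross-check, the storage volumes listed above can be added up. The following is only an illustrative sketch: the dictionary and function names are assumptions made for this example, and the figures are copied from the slide.

```python
# Storage per collection in TB, copied from the list above.
# All names below are illustrative; they are not part of the e-Depot system.
COLLECTIONS_TB = {
    "e-journals international": 30,
    "digitized masters": 275,
    "websites": 120,
    "institutional repositories": 30,
    "e-books international": 30,
    "e-books national": 15,
    "e-journals national": 10,
}


def total_tb(collections):
    """Return the combined storage volume in TB."""
    return sum(collections.values())


def share_percent(collections, name):
    """Return the share of one collection as a percentage of the total."""
    return 100 * collections[name] / total_tb(collections)


if __name__ == "__main__":
    print(f"Total: {total_tb(COLLECTIONS_TB)} TB")
    for name in COLLECTIONS_TB:
        print(f"{name}: {share_percent(COLLECTIONS_TB, name):.1f} %")
```

The listed volumes add up to 510 TB, with the digitized masters taking a little over half. That is in line with the roughly 500 TB mentioned under "New challenges".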
Why should we bother?
• Accessibility of scientific information and knowledge is in danger
• Next generations will not have access to the information of our era (digital dark ages)
• E-only and e-communication only
• Digital information is extremely fragile
  – Threat to cultural heritage
  – Threat to scientific research
  – Financial consequences
Should we bother?
• BBC Domesday project
• 1086: William the Conqueror (the original Domesday Book)
• 1986: information on British society stored on state-of-the-art laser disks
• After 20 years, both carrier and hardware were useless
What can we do?
Digital Preservation
• Long-term and safe storage
• Registration
• Tools for permanent access
[Diagram: the digital archive combines storage, registration (preservation metadata) and permanent access (the "time machine")]
Preservation research
• Research on file formats and tools
  – Characterization (JHOVE, DROID)
  – Validation
• Preservation strategies
  – Migration
  – Emulation
• Preservation metadata
  – PREMIS
• Preservation planning
• Storage
• R&D results are implemented directly in the e-Depot infrastructure
• Quality improvement is a continuous process
• International collaboration
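
To make the idea of format characterization concrete, here is a minimal sketch of signature-based identification: it looks at the first bytes of a file and compares them with a few well-known "magic numbers". It is an assumption-laden illustration, not how JHOVE or DROID are implemented or invoked. The signature table is a tiny hand-picked subset, and the function names are hypothetical.

```python
# (magic bytes, label) pairs. A tiny hand-picked subset for illustration;
# real identification tools work from far larger signature sets.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> str:
    """Return a format label for the leading bytes of a file, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path) -> str:
    """Read the first bytes of a file and identify them (hypothetical helper)."""
    with open(path, "rb") as fh:
        return identify_bytes(fh.read(16))


if __name__ == "__main__":
    print(identify_bytes(b"%PDF-1.4\n"))
    print(identify_bytes(b"plain text"))
```

Identification of this kind only says what a file claims to be. Validation, the second item in the list above, goes further and checks whether the file actually conforms to the format specification.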
What does it cost?
• Initial costs for development (2000 – 2003)
• Annual costs, of which
  – Operational staffing: 30 %
  – Project staffing & development: 25 %
  – Maintenance plus hardware and software licenses: 25 %
  – Storage: 20 %
• But how do we calculate the costs of
  – Preservation management
  – Preservation actions
• Annual costs: about 4 million euros
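
The percentages above can be turned into approximate amounts. The sketch below assumes the annual budget is exactly 4 million euros, whereas the slide says "about", so the results are rough indications only. All names are illustrative.

```python
# Assumption: "about 4 million euro" is treated as exactly 4,000,000.
ANNUAL_BUDGET_EUR = 4_000_000

# Shares in whole percent, copied from the slide.
COST_SHARES_PCT = {
    "operational staffing": 30,
    "project staffing & development": 25,
    "maintenance and licenses": 25,
    "storage": 20,
}


def breakdown(total_eur, shares_pct):
    """Split a total into amounts per category; shares must sum to 100 %."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must add up to 100 %")
    # Integer arithmetic avoids floating-point rounding noise.
    return {name: total_eur * pct // 100 for name, pct in shares_pct.items()}


if __name__ == "__main__":
    for name, amount in breakdown(ANNUAL_BUDGET_EUR, COST_SHARES_PCT).items():
        print(f"{name}: EUR {amount:,}")
```

Under that assumption, staffing (operational plus project) comes to roughly 2.2 million euros a year, and storage to roughly 0.8 million.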
DP is not only a technical issue
[Diagram: the e-Depot system and e-Depot infrastructure, positioned between the Producer and the Designated community]
• Day-to-day operations: e-Depot department (Acquisitions and Processing Division)
  – Functional owner, acquisitions, analysis, quality control, ingest, data management, publisher contact, guidelines
• Research & Development: DP department (Research & Development Division)
  – Research, projects, European research projects, guidelines, development, DP policy
• Technical management: IT department, IBM (IT Department)
  – Daily maintenance, storage, IT infrastructure, coordinating technical improvements
• Access: Online Services department (User Services Division)
  – User services, ILL, user interfaces, user survey, access policy
New challenges
• New collections to be preserved
  – Websites, digitized materials, institutional repositories
• New content types
  – e-books, image files, AV, multimedia, complex objects
• Hybrid collections
  – Websites
  – Publications & research data
  – Analog & digital combined
  – “liquid publications”
  – Compound objects
• Growing storage capacity
  – From 11 TB to 500 TB
Conclusions
• Preservation is not just storage and technique
• It demands long-term organizational commitment
• It requires continuous research
• It requires substantial investments in infrastructure and up-to-date expertise and skills
• It brings organizational changes
  – Process innovation: from traditional library to digital library
  – The e-Depot brought a new set of skills to the “traditional” library
  – Organizational change
• It asks for constant rethinking: next-generation LTP solution