Save This Book

Peter Brantley Internet Archive

description

Discussion of motivations for digital book preservation and national and EU policy changes that may be necessary to encourage national ebook archives.

Transcript of Save This Book

Page 1: Save This Book

Peter Brantley Internet Archive

Page 2: Save This Book

A. Re: preservation

B. National policies

Page 3: Save This Book
Page 4: Save This Book

Preservation for digital materials –

A. Heritage is obligatory for social continuity.
B. Maintenance of assets is business requisite.

Page 5: Save This Book
Page 6: Save This Book

• Myriads of ebook formats
• Digital rights management
• No national policy in place

No one is in charge of the preservation of our growing cultural heritage in digital books.

Page 7: Save This Book

Digital storage means it is easy to preserve in multiple locations, in different formats. More redundancy than paper could ever afford.

Storage costs trending toward insignificant on a per-byte basis.
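The redundancy point can be illustrated with a small fixity audit. This is a minimal sketch, not a description of any archive's actual practice: the function names (`sha256_hex`, `audit_replicas`) and the site labels are hypothetical, and it assumes each copy can be read into memory as bytes. It hashes every copy and flags the ones that disagree with the majority.

```python
import hashlib
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a blob as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return (majority digest, sorted names of replicas that disagree with it).

    Assumption: the most common digest is treated as the good copy.
    With an even split, the digest seen first wins, so a real audit
    would also compare against a digest recorded at deposit time.
    """
    digests = {name: sha256_hex(blob) for name, blob in replicas.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies of one file held at three locations; site-c is corrupted.
replicas = {
    "site-a": b"chapter one",
    "site-b": b"chapter one",
    "site-c": b"chapter 0ne",
}
majority, damaged = audit_replicas(replicas)
print(damaged)  # ['site-c']
```

With three or more copies, a damaged replica can be identified and repaired from the others, which is something a single paper copy cannot offer.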

Page 8: Save This Book

High density storage: Each rack stores between 0.5 and 0.75 petabytes.

Page 9: Save This Book

A book is a complex assembly of content: text, video, foreword, illustrations, photos. Each item needs careful management.

Page 10: Save This Book

Digital production workflow requires:

- content management system
- sophisticated rights / use tracking
- skilled personnel esp. engineering

Page 11: Save This Book

Possibility of loss should motivate preservation. Loss can arise for oneself, or one’s partners, via:

1. complex, tightly coupled systems
2. struggle of memory vs. forgetting

Page 12: Save This Book
Page 13: Save This Book

From an engineering perspective, complexity in workflow systems raises the chance of catastrophic loss. As system efficiency increases, interactions become “tightly coupled”.

Tightly coupled systems are prone to routine accidents with unforeseen cascading effects.

Page 14: Save This Book

- Charles Perrow, Sociology Dept, Yale University

Ex.: Apollo 13

“[The] accident was not the result of a chance malfunction in a statistical sense, but rather resulted from an unusual combination of mistakes, coupled with a somewhat deficient and unforgiving design.” source: Wikipedia.

Page 15: Save This Book

Complexity of production systems and distribution platforms growing alongside online inventory and commerce.

Potential trigger events are ubiquitous.

Page 16: Save This Book

“We're working quickly to recover from a major issue in one of our database clusters. We're incredibly sorry for the inconvenience.”

Page 17: Save This Book

(2 2011):

‘Unfortunately, I have mixed up the accounts and accidentally deleted yours. I am terribly sorry for this grave error and hope that this mistake can be reconciled.’ …

‘Our teams are currently working hard to try to restore the contents of this user's account. We are working on a process that would allow us to easily restore deleted accounts and we plan on rolling this functionality out soon.’ (emphasis added)

Page 18: Save This Book
Page 19: Save This Book

Many archives are lost by simply forgetting where or how they were stored. Canisters of film, shelves of books, or backup tapes.

“Those books are in a warehouse in Jersey.” “The servers are in one of our datacenters.”

Page 20: Save This Book

Commonly, companies get acquired or shutter business operations; records are abandoned.

Servers are redeployed without an audit. Databases run with no backup routines.

Page 21: Save This Book

Preserving without metadata is not helpful. Metadata provides necessary context.

Need to point backwards (provenance) and forward (to find superseded content)

Without necessary metadata infrastructure, preservation architectures are rickety.
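A minimal sketch of what backward and forward pointers might look like in a catalog record. The `Record` fields and the helper names (`latest`, `provenance`) are hypothetical illustrations, not drawn from any particular metadata standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    derived_from: Optional[str] = None   # backward pointer (provenance)
    superseded_by: Optional[str] = None  # forward pointer (newer content)


def latest(catalog: dict[str, Record], identifier: str) -> Record:
    """Follow superseded_by pointers forward to the current record."""
    seen: set[str] = set()
    record = catalog[identifier]
    while record.superseded_by is not None:
        if record.identifier in seen:
            raise ValueError("cycle in superseded_by chain")
        seen.add(record.identifier)
        record = catalog[record.superseded_by]
    return record


def provenance(catalog: dict[str, Record], identifier: str) -> list[str]:
    """Follow derived_from pointers backward, nearest ancestor first."""
    chain: list[str] = []
    current = catalog[identifier].derived_from
    while current is not None and current not in chain:
        chain.append(current)
        current = catalog[current].derived_from
    return chain


# Hypothetical three-edition history of a single title.
catalog = {
    "ed1": Record("ed1", "First edition", superseded_by="ed2"),
    "ed2": Record("ed2", "Second edition", derived_from="ed1", superseded_by="ed3"),
    "ed3": Record("ed3", "Third edition", derived_from="ed2"),
}
print(latest(catalog, "ed1").identifier)  # ed3
print(provenance(catalog, "ed3"))         # ['ed2', 'ed1']
```

Without both pointers, an archive holding only "ed2" could not tell a reader where it came from or that it has been replaced.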

Page 22: Save This Book

If you are a publisher, do you safeguard your digital assets? Has that process been audited against security threats? Technical accidents? Are your workflows well understood? Have you conducted trial asset recovery exercises?

Does your insurance company know? Do you mind if I give them a call?

Page 23: Save This Book
Page 24: Save This Book

For 100s of years, libraries preserved books. Sort of anyway (witness Alexandria).

By-product of the inefficiency of distribution: lots of libraries wound up with the same books.

Lots of copies keep stuff safe.

Page 25: Save This Book

Books were once bespoke – monks with pens.

Over around 500 years, books became industrialized – mass produced. Although ebooks are the ultimate industrial product, we come full circle.

Digital books stand on threshold of tremendous mutability.

Page 26: Save This Book

It is easy to store objects now, arguably, but … It is also easy to save objects that cannot be easily recovered.

E.g., PDF can be a stew of nonstandard formats and arbitrary data. EPUB with DRM locks data, with the risk that the key will be lost.

Page 27: Save This Book

We may be able to preserve digital editions. Can we preserve the book of the future?

• Scripted and Interactive
• Networked and Distributed
• Personalized and Mobile

Page 28: Save This Book
Page 29: Save This Book

In U.S., beyond public and academic library collections, the Library of Congress plays a unique role:

§ 407. The required copies or phonorecords shall be deposited in the Copyright Office for the use or disposition of the Library of Congress.

Page 30: Save This Book

Current U.S. code privileges print as best edition for preservation. Slowly being changed.

(Congress approved e-journal demand deposit for LoC in the 2010 legislative session).

Page 31: Save This Book

The Copyright Office historically held that transmission to the LoC of deposited digital books would be an infringing faithful copy, despite:

§ 704. In the case of published works, all copies, phonorecords, and identifying material deposited are available to the Library of Congress for its collections ….

Page 32: Save This Book
Page 33: Save This Book

I led an initiative to confront this issue in NYC in the summer of 2008.

Convened meeting via Digital Library Federation with LoC, Mellon, Portico, NISO, IDPF, BISG, and publisher consultants and representatives to discuss digital archives that would also serve as escrow.

Page 34: Save This Book

The group decided that we should attempt a trial project with a small set of publishers who would deposit sample books into Portico’s repository; the pilot would shed light on business, legal, and policy issues.

With that in hand, LoC would be in stronger position to solicit rule changes by Congress.

Page 35: Save This Book

Portico is a not-for-profit jointly representing publishers and libraries, preserving a growing volume of digital content with high reliability in a rights-respecting, secure archive. Because it serves as an escrow, access to the archive by its members is triggered only under carefully defined and contractually specified circumstances.

Initial funding came from Mellon and LoC.

Page 36: Save This Book
Page 37: Save This Book

The AAP evidenced support.

Ed McCoyd, Director of Digital Policy:

“Thank you for speaking at the AAP Digital Issues Working Group meeting on [11 June 2008], regarding your interest in bringing parties together to develop digital book archives for preservation.

Page 38: Save This Book

“Your points about preserving cultural patrimony, and assuring permanent access by libraries and other digital content customers as well as by the publishers themselves, certainly resonated with the group.

“I look forward to talking further about your initiative in the coming months, and will keep the publishers apprised of the additional details as they develop.”

Page 39: Save This Book

It fizzled due to the inability of the LoC to determine whether it had the capacity to pursue a deposit initiative, and if so, what department of the Library should proceed.

Page 40: Save This Book
Page 41: Save This Book

Portico decided to continue the pursuit of ebook deposits on behalf of its members.

Members are primarily research libraries, ’cuz who else has a mandate to care about preservation?

With inevitable focus on academic ebook publishers, e.g. Elsevier, starting in 2008.

Page 42: Save This Book

Portico has been unable to penetrate trade publishing sector through private initiative.

No national policy requiring digital deposit.

Page 43: Save This Book
Page 44: Save This Book

Google Book Search (GBS) has emerged in the absence of international rulemaking as the default archive for some institutions.

GBS does not do preservation-quality imaging, and there is no requirement for comprehensive publisher participation.

This is NOT a good solution.

Page 45: Save This Book

Europeana digital library seeking to build a collection preserving Europe’s vast cultural heritage.

In September 2010, Ghent Univ. Library became first in Europe to deposit public domain books scanned by Google into Europeana.

Page 46: Save This Book

“Work begins this week to add over 5 million digital objects, ranging from Spanish civil war photographs and handwritten letters from philosopher Immanuel Kant, to Europeana from 19 of Europe’s leading research and university libraries.

“… It will also add extensive collections from Google Books, theses, dissertations and open-access journal articles to the 15 million items amassed in Europeana to date. Providers include some of Europe’s most prestigious universities and research institutes … .”

(2 2011)

Page 47: Save This Book

GBS partners’ collection management of older public domain books is a pale shadow of the comprehensive international policy framework needed to mandate preservation of our cultural heritage in digital books.

Preservation must mandate participation for libraries and publishers in a legal framework.

Page 48: Save This Book

Widespread recognition of need for new copyright regime supporting digital use.

An internationally interoperable network of rights registries, capable of answering automated rights-status queries across geographic and national domains, would be a very useful support.

Page 49: Save This Book
Page 50: Save This Book

Mandating the deposit of digital works for preservation purposes to ensure adequate representation of copyright assertions for a content manifest would be one approach.

Arguably avoids Berne / TRIPS.

Page 51: Save This Book

Most publishers are demonstrably willing to engage in discussion of efforts supporting the development of persistent heritage archives.

Requires hard political work conjoining some subset of copyright policies, the imperatives of cultural heritage, technical architecture.

Page 52: Save This Book

If nothing is done, and we cannot solve this problem for the simple digital books of the 20th Century, what are we going to do with the books of the 21st Century?

Page 53: Save This Book
Page 54: Save This Book

peter brantley

director, bookserver project
internet archive
san francisco, ca

(twitter) @naypinya (slideshare)

peter at archive.org