148 john shaw2006fall

Archiving: What is it and why should it be important to me? John Shaw, Director, Publishing Technologies, SAGE Publications, U.S.

Transcript of 148 john shaw2006fall

Page 1: 148 john shaw2006fall

Archiving
What is it and why should it be important to me?

John Shaw
Director, Publishing Technologies
SAGE Publications, U.S.

Page 2: 148 john shaw2006fall

I. Archiving Overview

II. Types of Archives

III. A SAGE Example

IV. Risks, Questions, and More Questions

Page 3: 148 john shaw2006fall

Archiving Part I: Archiving Overview

Page 4: 148 john shaw2006fall

What is an Archive?

An authoritative collection
Preserved and professionally managed in perpetuity
History, institutional commitment & policy, integrity re: preservation
“…information needed for society’s memory.” "Schellenberg in Cyberspace," American Archivist 61:2 (Fall 1998), p. 309-327.
Preservation first

Page 5: 148 john shaw2006fall

What is a Repository?

“A place where things can be stored and maintained; a storehouse.” [Society of American Archivists Glossary]
“Depository” is the same
  also a library that receives government documents for public access
Not all repositories are archives

Page 6: 148 john shaw2006fall

Why Care?

“Preserving information for decades or even centuries has proved important. Shang dynasty (12th century BC) Chinese astronomers inscribed eclipse observations on “oracle bones” (animal bones and tortoise shells). About 3200 years later researchers used these records, together with one from 1302BC, to estimate that the accumulated clock error was just over 7 hours, and from this derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.”

****************

Page 7: 148 john shaw2006fall

Why Care?

“These timescales of many decades, even centuries, contrast with the typical 5-year lifetime for computing hardware and digital media”
“A Fresh Look at the Reliability of Long term Digital Storage.” Baker, Mary, et al. EuroSys '06, April 18-21, 2006

Page 8: 148 john shaw2006fall

Why Care?

Preservation: Digital information is impermanent
Publisher: Safety
  to ensure ongoing availability of your content
Your library customers: Custodianship
  to ensure continuity of the record of scientific progress
Very long view: epistemology, history of science and culture

Page 9: 148 john shaw2006fall

What Should be Preserved?

Scholarly content
Research materials
Web-based, digitally born content

Page 10: 148 john shaw2006fall

How e-Archives Differ

Mission: collection v. preservation
Access control, dark v. light
Deposits
  Why: voluntary v. mandated
  Who: author v. publisher
  What: manuscripts v. final work
  When: backfile v. current content
Future format migration
Rights transfer
Costs

Page 11: 148 john shaw2006fall

Archiving Part II: Types of Archives

Page 12: 148 john shaw2006fall

Types of Archives:

National archives
Institutional repositories
Community-based archives
Product solution archives

Page 13: 148 john shaw2006fall

Types of Archives: National

Dutch National library Koninklijke Bibliotheek (KB)
British Library
NIH – PubMedCentral?
  “NIH’s digital repository for biomedical research”
Library of Congress?

Page 14: 148 john shaw2006fall

KB: Dutch National Library

Mission: Legal deposit library
  “…collect, catalogue and preserve all publications appearing in the Netherlands.”
  Capable of ingesting 60,000 articles/day
Deposits: Source files from publishers
  Automated, strict
Costs?
Access Control:
  Local patron access
  Publisher sets remote access rules

Page 15: 148 john shaw2006fall

KB: Dutch National Library

Migration: Preservation research leader
  Committed to format migration
Archiving agreements with:
  OUP, Sage, Blackwell, Elsevier, Kluwer Academic, etc.

Page 16: 148 john shaw2006fall

The British Library Legal Deposit Pilot

Mission: Legal deposit library
  UK-published (to start)
Pilot: Legal deposit for e-journals
  23 volunteer publishers
  Secure infrastructure
    Uses DigiTool by Ex Libris
    Shared with the other UK legal deposit libraries
  To “scope and test” ingest, storage, retrieval
Cost?

Page 17: 148 john shaw2006fall

The British Library: Preservation and Migration

BL’s future for managing digital assets
  preserve any type of digital material in perpetuity
Migration
  ensure that users can view the material with contemporary applications
  preserve the original look-and-feel where possible
Access Control
  “appropriate permissions”

Page 18: 148 john shaw2006fall

PMC: US National Library of Medicine Journal Archive

Mission: Make research more accessible
Free full-text archive of 230 journals
Deposit: publishers submit source files
Migration
Access Control
Cost?

Page 19: 148 john shaw2006fall

PMC: Depository for NIH-Funded Research Articles

Authors of NIH-funded articles “encouraged” to deposit final manuscript
  “After all modifications due to …peer review”
  MS Word, PDF, etc.
  With supplementary information
  Publisher can replace with published version
To be required soon?

Page 20: 148 john shaw2006fall

Library of Congress

National Digital Information Infrastructure and Preservation Program (NDIIPP) – formed in 2000
  Members: National Library of Medicine, the National Agricultural Library, the National Institute of Standards and Technology, the Research Libraries Group, the OCLC Online Computer Library Center, and the Council on Library and Information Resources
Preliminary investigation and software development phase
Primarily e-journal deposit
Future …???

Page 21: 148 john shaw2006fall

Types of Archives: Institutional

University with expansive focus
  Stanford Digital Repository
Automated
  LOCKSS

Page 22: 148 john shaw2006fall

Stanford Digital Repository

Stanford Univ. Libraries initiative
Digital preservation serving
  Stanford University
  Broader academic community
  Publishers
Principles: Trust, Security, Transparency
Costs?

Page 23: 148 john shaw2006fall

LOCKSS

Technology to preserve local library collection
Automated, self-correcting cache servers
  Requires LOCKSS server at library
  Requires publisher participation
Builds collection of all resources which the institution licenses
Goes online to users if data source becomes unavailable
Provides access to static “HTML images” of source
Costs
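The idea of "self-correcting" caches can be illustrated with a small sketch. It is not the actual LOCKSS protocol: it simply assumes that each library cache holds its own copy of an item, that copies are compared by SHA-256 digest, and that a copy disagreeing with a strict majority is overwritten with the majority version. The function name `audit_and_repair` and the cache names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a cached copy so copies can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(caches: dict[str, bytes]) -> list[str]:
    """Repair minority copies in place; return the names of repaired caches.

    Simplified illustration only: a strict majority of identical copies
    is treated as the good version (an assumption, not the LOCKSS design).
    """
    votes = Counter(digest(copy) for copy in caches.values())
    winner, count = votes.most_common(1)[0]
    if count <= len(caches) // 2:
        raise ValueError("no majority; cannot tell which copy is good")

    good_copy = next(c for c in caches.values() if digest(c) == winner)
    repaired = []
    for name, copy in list(caches.items()):
        if digest(copy) != winner:
            caches[name] = good_copy  # overwrite the damaged copy
            repaired.append(name)
    return repaired


# Hypothetical example: three library caches, one with a damaged copy.
caches = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"articl\x00 text",
}
repaired_names = audit_and_repair(caches)
```

The point of the sketch is that no single cache is trusted on its own; agreement among independent copies is what identifies damage.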

Page 24: 148 john shaw2006fall

Types of Archives: Product Solution

Non-profit organization
  Portico

Page 25: 148 john shaw2006fall

Portico

Mission: scholarly preservation
  Standalone archive
  Initiated by JSTOR, with grant funding
Deposits: source files from publisher
Migration: planned
Costs
  Publishers annual fee $250 to $75,000 based on annual revenue
  Libraries annual fee $1,500 to $24,000 based on Library Materials Expenditure

Page 26: 148 john shaw2006fall

Portico: Access Control

Member libraries get access:
  “when specific trigger events occur, and when titles are no longer available from the publisher or other source.”
Trigger events include:
  Publisher stops operations
  Publisher ceases to publish a title
  Publisher no longer offers back issues
  Catastrophic and sustained failure of a publisher’s delivery platform
Can also fulfill “perpetual access” subscription obligations
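The access rule on this slide can be written as a small decision function. This is a sketch of the logic as stated here, not Portico's system: the class and field names are hypothetical, and it assumes a perpetual-access obligation grants access independently of trigger events.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class TriggerEvent(Enum):
    """The four trigger events listed on the slide."""
    PUBLISHER_STOPS_OPERATIONS = auto()
    TITLE_CEASES_PUBLICATION = auto()
    BACK_ISSUES_NO_LONGER_OFFERED = auto()
    DELIVERY_PLATFORM_FAILURE = auto()


@dataclass
class TitleStatus:
    """Hypothetical record of a title's current situation."""
    trigger_events: set = field(default_factory=set)
    available_elsewhere: bool = True   # from the publisher or another source
    perpetual_access_owed: bool = False


def library_may_access(is_member: bool, status: TitleStatus) -> bool:
    """Dark-archive rule: closed by default, opened only under set conditions."""
    if not is_member:
        return False
    if status.perpetual_access_owed:   # assumption: handled independently
        return True
    # Both conditions from the slide must hold.
    return bool(status.trigger_events) and not status.available_elsewhere


# Hypothetical cases.
normal = TitleStatus()
ceased = TitleStatus({TriggerEvent.TITLE_CEASES_PUBLICATION}, available_elsewhere=False)
moved = TitleStatus({TriggerEvent.TITLE_CEASES_PUBLICATION}, available_elsewhere=True)
```

A title that has ceased but is still offered by another source stays dark under this reading, which is why the `moved` case returns no access.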

Page 27: 148 john shaw2006fall

Types of Archives: Community

Community based and openly run
  CLOCKSS

Page 28: 148 john shaw2006fall

CLOCKSS (Controlled LOCKSS)

Long-term global archiving solution
  Community-managed, failsafe repository for scholarly content
  Serve libraries & publishers in the event of a long-term business interruption
  Publisher participation is voluntary
Small number of library participants maintain the archive on behalf of larger community
  libraries preserve member publisher content whether they subscribe or not
Release only after a trigger event
  Publisher, libraries, and society collaborative decision to release
“cost sharing” for system, not access
Costs?

Page 29: 148 john shaw2006fall

Summary Table

Archive   Agency   Primary Mission   Data          A/C        Migration
KB        Gov’t    Preserv           Pub           Twilight   Yes
BL        Gov’t    Preserv           Pub           ?          Yes
Portico   Ind.     Failsafe          Pub           Dark       Yes
PMC       Gov’t    Access            Pub, Author   Light      Yes
LoC       Gov’t    Preserv           Pub           ?          ?
SDR       Inst.    Preserv           Pub           Twilight   Yes
LOCKSS    Inst.    Failsafe          Pub           Dark       --
CLOCKSS   Comm.    Failsafe          Pub           Dark       --

Page 30: 148 john shaw2006fall

Summary: How Repositories Differ

Stated purpose
Dark v. light
Complete backfile v. current only
Deposits
  Who: author v. publisher
  What: manuscripts v. final work
  Why: voluntary v. mandated
Rights transfer
Access control
Costs

Page 31: 148 john shaw2006fall

Archiving Part III: A SAGE Example

Page 32: 148 john shaw2006fall

Why Archive?

SAGE’s commitment to customers and partners
Critical to society arrangements
Essential for new e-sales (consortia + single institutions) – Perpetual access
Business continuity
Long-term preservation
We are not archiving experts!

Page 33: 148 john shaw2006fall

Where to Archive?

Dutch KB
CLOCKSS
LOCKSS
Portico
Library of Congress
British Library

Page 34: 148 john shaw2006fall

How to Archive?

Provide details of digital availability
Provide sample of content
Provide details of content format (DTD)
Send all backfile for loading
Set up content flow for ongoing content

Page 35: 148 john shaw2006fall

SAGE Experience with Dutch KB

Contract and negotiation
Contact with technical team
Delivery of samples and details of scope
Follow-up questions
Visit KB – Find out what’s happening

Delivery of back contentDelivery of back content

Delivery of ongoing issuesDelivery of ongoing issues

Ongoing issue discrepanciesOngoing issue discrepancies
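Checking for ongoing issue discrepancies amounts to comparing what the publisher believes it delivered with what the archive reports having loaded. A minimal sketch, assuming each issue has a simple string identifier (the identifiers and the function name below are hypothetical, not a KB or SAGE interface):

```python
def find_discrepancies(published, received):
    """Compare the publisher's issue list with the archive's holdings.

    Returns issues the archive is missing and issues the archive holds
    that the publisher does not list.
    """
    published, received = set(published), set(received)
    return {
        "missing": sorted(published - received),
        "unexpected": sorted(received - published),
    }


# Hypothetical issue identifiers in volume/issue form.
report = find_discrepancies(
    published=["v1i1", "v1i2", "v1i3"],
    received=["v1i1", "v1i3", "v2i1"],
)
```

Running a comparison like this on a regular schedule would surface gaps while the source files are still easy to resend.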

Page 36: 148 john shaw2006fall

Archiving Part IV: Questions, Questions and More Questions

Page 37: 148 john shaw2006fall

Measurements of Success

Who is overseeing the archiving process and governance?
Compliance?
Accuracy and legitimacy?
Financial stability?

Page 38: 148 john shaw2006fall

Resources

Archiving should be done by librarians and archivists, period. Gordon Tibbitts, Blackwell Publishing. April 4, 2006 UKSG
Portico - http://www.portico.org/
LOCKSS - http://lockss.stanford.edu
CLOCKSS - http://www.lockss.org/clockss/Home
KB E-Depot - http://www.kb.nl/index-en.html
Depot Digital Archiving at the national library of the Netherlands - http://www-5.ibm.com/be/pdf/en/events/nextlevel/presentation_kb_den_haag_edepot_ibm_brussels_v03.pdf
“A Fresh Look at the Reliability of Long term Digital Storage.” Baker, Mary, et al. EuroSys '06, April 18-21, 2006
Digital Archives & Repositories: Why should I care? – Bernard Hecker, HighWire Press, Publishers Meeting, October 2004
Archive Overview – Bernard Hecker, HighWire Press, Publishers Meeting, April 2006

Trusted Digital Repositories: Attributes and Responsibilities An RLG-OCLC Report. © 2002 Research Libraries Group

British Library: Project: JCLD Pilot Project in Anticipation of E-Journals, June 2005 Simon Inger

Note: Presentation based on Digital Archives & Repositories: Why should I care? – Bernard Hecker, HighWire Press, Publishers Meeting, October 2004; Archive Overview. Bernard Hecker, HighWire Press, Publishers Meeting, April 2006; Archiving: A SAGE Example. John Shaw. Publishers Meeting, April 2006

Page 39: 148 john shaw2006fall

Thank You!

Contact info: [email protected]@sagepub.com

www.sagepub.com