Archiving: What is it and why should it be important to me?
John Shaw
Director, Publishing Technologies
SAGE Publications, U.S.
I. Archiving Overview
II. Types of Archives
III. A SAGE Example
IV. Risks, Questions, and More Questions
What is an Archive?
An authoritative collection
Preserved and professionally managed in perpetuity
History, institutional commitment & policy, integrity re: preservation
“…information needed for society’s memory.” ("Schellenberg in Cyberspace," American Archivist 61:2 (Fall 1998), pp. 309-327.)
Preservation first
What is a Repository?
“A place where things can be stored and maintained; a storehouse.” [Society of American Archivists Glossary]
“Depository” is the same; also a library that receives government documents for public access
Not all repositories are archives
Why Care?
“Preserving information for decades or even centuries has proved important. Shang dynasty (12th century BC) Chinese astronomers inscribed eclipse observations on ‘oracle bones’ (animal bones and tortoise shells). About 3200 years later researchers used these records, together with one from 1302 BC, to estimate that the accumulated clock error was just over 7 hours, and from this derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.”
Why Care?
“These timescales of many decades, even centuries, contrast with the typical 5-year lifetime for computing hardware and digital media.” (“A Fresh Look at the Reliability of Long-term Digital Storage.” Baker, Mary, et al. EuroSys '06, April 18-21, 2006)
Preservation: Digital information is impermanent
Publisher: Safety, to ensure ongoing availability of your content
Your library customers: Custodianship, to ensure continuity of the record of scientific progress
Very long view: epistemology, history of science and culture
Why Care?
What Should Be Preserved?
Scholarly content
Research materials
Web-based, digitally born content
How e-Archives Differ
Mission: collection v. preservation
Access control: dark v. light
Deposits
  Why: voluntary v. mandated
  Who: author v. publisher
  What: manuscripts v. final work
  When: backfile v. current content
Future format migration
Rights transfer
Costs
Types of Archives:
National archives
Institutional repositories
Community-based archives
Product solution archives
Types of Archives: National
Dutch National Library: Koninklijke Bibliotheek (KB)
British Library
NIH – PubMed Central?
  “NIH’s digital repository for biomedical research”
Library of Congress?
KB: Dutch National Library
Mission: Legal deposit library
  “…collect, catalogue and preserve all publications appearing in the Netherlands.”
  Capable of ingesting 60,000 articles/day
Deposits: Source files from publishers
  Automated, strict
Costs?
Access Control:
  Local patron access
  Publisher sets remote access rules
KB: Dutch National Library
Migration: Preservation research leader
  Committed to format migration
Archiving agreements with: OUP, Sage, Blackwell, Elsevier, Kluwer Academic, etc.
The British Library Legal Deposit Pilot
Mission: Legal deposit library
  UK-published (to start)
Pilot: Legal deposit for e-journals
  23 volunteer publishers
  Secure infrastructure
Uses DigiTool by Ex Libris
  Shared with the other UK legal deposit libraries
  To “scope and test” ingest, storage, retrieval
Cost?
The British Library: Preservation and Migration
BL’s future for managing digital assets
  Preserve any type of digital material in perpetuity
Migration
  Ensure that users can view the material with contemporary applications
  Preserve the original look-and-feel where possible
Access Control
  “appropriate permissions”
PMC: US National Library of Medicine Journal Archive
Mission: Make research more accessible
Free full-text archive of 230 journals
Deposit: publishers submit source files
Migration
Access Control
Cost?
PMC: Depository for NIH-Funded Research Articles
Authors of NIH-funded articles “encouraged” to deposit final manuscript
  “After all modifications due to …peer review”
  MS Word, PDF, etc.
  With supplementary information
  Publisher can replace with published version
To be required soon?
Library of Congress
National Digital Information Infrastructure and Preservation Program (NDIIPP), formed in 2000
Members: the National Library of Medicine, the National Agricultural Library, the National Institute of Standards and Technology, the Research Libraries Group, the OCLC Online Computer Library Center, and the Council on Library and Information Resources
Preliminary investigation and software development phase
Primarily e-journal deposit
Future …???
Types of Archives: Institutional
University with expansive focus: Stanford Digital Repository
Automated: LOCKSS
Stanford Digital Repository
Stanford Univ. Libraries initiative
Digital preservation serving:
  Stanford University
  Broader academic community
  Publishers
Principles: Trust, Security, Transparency
Costs?
LOCKSS
Technology to preserve the local library collection
Automated, self-correcting cache servers
Requires a LOCKSS server at the library
Requires publisher participation
Builds a collection of all resources that the institution licenses
Goes online to users if the data source becomes unavailable
Provides access to static “HTML images” of the source
Costs
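The slide describes LOCKSS caches as “automated, self-correcting.” The idea can be illustrated with a toy model, assuming several libraries hold copies of the same content and agree on the correct version by a simple majority vote over SHA-256 hashes. This is a deliberate simplification: the function names and the majority rule are hypothetical, and the actual LOCKSS polling and repair protocol is considerably more elaborate.

```python
"""Toy model of self-correcting, LOCKSS-style caches (hypothetical simplification)."""
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint one cached copy."""
    return hashlib.sha256(content).hexdigest()


def audit_and_repair(copies: dict[str, bytes]) -> list[str]:
    """Compare every peer's copy; overwrite minority copies with the majority version.

    Returns the names of the peers whose copies were repaired.
    """
    hashes = {peer: digest(data) for peer, data in copies.items()}
    winning_hash, votes = Counter(hashes.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # No strict majority: refuse to guess which copy is the good one.
        raise RuntimeError("no majority agreement; manual review needed")
    good_copy = next(copies[p] for p, h in hashes.items() if h == winning_hash)
    repaired = []
    for peer, h in hashes.items():
        if h != winning_hash:
            copies[peer] = good_copy  # repair from an agreeing peer
            repaired.append(peer)
    return repaired


if __name__ == "__main__":
    caches = {
        "library_a": b"<article>Original text</article>",
        "library_b": b"<article>Original text</article>",
        "library_c": b"<article>Orig#nal text</article>",  # simulated bit rot
    }
    print(audit_and_repair(caches))  # ['library_c']
```

Refusing to repair when there is no clear majority reflects the preservation-first principle: overwriting a copy on a guess could destroy the only good version.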
Types of Archives: Product Solution
Non-profit organization: Portico
Portico
Mission: scholarly preservation
  Standalone archive
  Initiated by JSTOR, with grant funding
Deposits: source files from publisher
Migration: planned
Costs
  Publishers: annual fee $250 to $75,000, based on annual revenue
  Libraries: annual fee $1,500 to $24,000, based on Library Materials Expenditure
Portico: Access Control
Member libraries get access “when specific trigger events occur, and when titles are no longer available from the publisher or other source.”
Trigger events include:
  Publisher stops operations
  Publisher ceases to publish a title
  Publisher no longer offers back issues
  Catastrophic and sustained failure of a publisher’s delivery platform
Can also fulfill “perpetual access” subscription obligations
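The quoted access rule combines two conditions: a trigger event must have occurred, and the title must no longer be available elsewhere. A minimal sketch of that logic, assuming a simple boolean model; the enum member names and the function are hypothetical labels for the events listed on the slide, not part of any Portico system.

```python
from enum import Enum, auto


class TriggerEvent(Enum):
    """Trigger events listed on the slide (names are illustrative)."""
    PUBLISHER_STOPS_OPERATIONS = auto()
    TITLE_NO_LONGER_PUBLISHED = auto()
    BACK_ISSUES_NO_LONGER_OFFERED = auto()
    DELIVERY_PLATFORM_FAILURE = auto()  # catastrophic and sustained


def member_library_has_access(
    is_member: bool,
    events: set[TriggerEvent],
    available_elsewhere: bool,
) -> bool:
    """Grant access only to a member library, only after a trigger event,
    and only if the title is no longer available from the publisher or
    another source."""
    trigger_occurred = len(events) > 0
    return is_member and trigger_occurred and not available_elsewhere


if __name__ == "__main__":
    # Publisher has shut down and no other source carries the title.
    print(member_library_has_access(True, {TriggerEvent.PUBLISHER_STOPS_OPERATIONS}, False))  # True
    # No trigger event yet: the archive stays dark.
    print(member_library_has_access(True, set(), False))  # False
```

Keeping the trigger check separate from the availability check mirrors the two clauses of the quotation; the “perpetual access” use case is not modeled here.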
Types of Archives: Community
Community-based and openly run: CLOCKSS
CLOCKSS (Controlled LOCKSS)
Long-term global archiving solution
  Community-managed, failsafe repository for scholarly content
  Serves libraries & publishers in the event of a long-term business interruption
Publisher participation is voluntary
A small number of library participants maintain the archive on behalf of the larger community
  Libraries preserve member publisher content whether they subscribe or not
Release only after a trigger event
  Publishers, libraries, and society make a collaborative decision to release
“Cost sharing” for the system, not access
Costs?
Summary Table
Agency   Type    Primary Mission  Data               Access Control  Migration
KB       Gov’t   Preservation     Publisher          Twilight        Yes
BL       Gov’t   Preservation     Publisher          ?               Yes
Portico  Ind.    Failsafe         Publisher          Dark            Yes
PMC      Gov’t   Access           Publisher, Author  Light           Yes
LoC      Gov’t   Preservation     Publisher          ?               ?
SDR      Inst.   Preservation     Publisher          Twilight        Yes
LOCKSS   Inst.   Failsafe         Publisher          Dark            -
CLOCKSS  Comm.   Failsafe         Publisher          Dark            -
Summary: How Repositories Differ
Stated purpose
Dark v. light
Complete backfile v. current only
Deposits
  Who: author v. publisher
  What: manuscripts v. final work
  Why: voluntary v. mandated
Rights transfer
Access control
Costs
Why Archive?
SAGE’s commitment to customers and partners
Critical to society arrangements
Essential for new e-sales (consortia + single institutions): perpetual access
Business continuity
Long-term preservation
We are not archiving experts!
Where to Archive?
Dutch KB
CLOCKSS
LOCKSS
Portico
Library of Congress
British Library
How to Archive?
Provide details of digital availability
Provide sample of content
Provide details of content format (DTD)
Send all backfile for loading
Set up content flow for ongoing content
SAGE Experience with Dutch KB
Contract and negotiation
Contact with technical team
Delivery of samples and details of scope
Follow-up questions
Visit KB: find out what’s happening
Delivery of back content
Delivery of ongoing issues
Ongoing issue discrepancies
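Tracking “ongoing issue discrepancies” amounts to comparing what the publisher sent with what the archive reports having loaded. A minimal sketch, assuming both sides can export lists of (journal, volume, issue) identifiers; the IssueId type, the reconcile function, and the sample journal names are hypothetical and not part of any KB e-Depot interface.

```python
from typing import NamedTuple


class IssueId(NamedTuple):
    """Hypothetical identifier for one delivered journal issue."""
    journal: str
    volume: int
    issue: int


def reconcile(sent: set[IssueId], ingested: set[IssueId]) -> dict[str, list[IssueId]]:
    """Compare the publisher's delivery log with the archive's ingest report."""
    return {
        # Delivered by the publisher but not reported as loaded by the archive.
        "missing_at_archive": sorted(sent - ingested),
        # Reported by the archive but absent from the publisher's own log.
        "unexpected_at_archive": sorted(ingested - sent),
    }


if __name__ == "__main__":
    sent = {IssueId("Journal A", 12, 1), IssueId("Journal A", 12, 2), IssueId("Journal B", 3, 4)}
    ingested = {IssueId("Journal A", 12, 1), IssueId("Journal B", 3, 4)}
    for category, issues in reconcile(sent, ingested).items():
        for item in issues:
            print(category, item)
```

Checking both directions matters: an issue missing at the archive needs redelivery, while an unexpected one suggests a labeling or metadata error on one side.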
Archiving Part IV: Questions, Questions and More Questions
Measurements of Success
Who is overseeing the archiving process and governance?
Compliance?
Accuracy and legitimacy?
Financial stability?
Resources
“Archiving should be done by librarians and archivists, period.” Gordon Tibbitts, Blackwell Publishing. UKSG, April 4, 2006.
Portico: http://www.portico.org/
LOCKSS: http://lockss.stanford.edu
CLOCKSS: http://www.lockss.org/clockss/Home
KB e-Depot: http://www.kb.nl/index-en.html
Digital Archiving at the national library of the Netherlands: http://www-5.ibm.com/be/pdf/en/events/nextlevel/presentation_kb_den_haag_edepot_ibm_brussels_v03.pdf
“A Fresh Look at the Reliability of Long-term Digital Storage.” Baker, Mary, et al. EuroSys '06, April 18-21, 2006.
“Digital Archives & Repositories: Why should I care?” Bernard Hecker, HighWire Press, Publishers Meeting, October 2004.
“Archive Overview.” Bernard Hecker, HighWire Press, Publishers Meeting, April 2006.
Trusted Digital Repositories: Attributes and Responsibilities. An RLG-OCLC Report. © 2002 Research Libraries Group.
British Library project: JCLD Pilot Project in Anticipation of E-Journals. Simon Inger, June 2005.
Note: Presentation based on “Digital Archives & Repositories: Why should I care?” Bernard Hecker, HighWire Press, Publishers Meeting, October 2004; “Archive Overview.” Bernard Hecker, HighWire Press, Publishers Meeting, April 2006; “Archiving: A SAGE Example.” John Shaw, Publishers Meeting, April 2006.
Thank You!
Contact info: [email protected]
www.sagepub.com