Northwestern digital repository initiative: platform and persistence

Introduction to and overview of digital repository projects at Northwestern University, developed for a guest lecture in the Digital Curation course at the Dominican University Graduate School of Library and Information Science. Based in part on an earlier presentation developed by Steve DiDomenico and Claire Stewart.

Transcript of Northwestern digital repository initiative: platform and persistence

Page 1: Northwestern digital repository initiative: platform and persistence

Northwestern digital repository initiative:

Platform and persistence

Page 2: Northwestern digital repository initiative: platform and persistence

Claire Stewart

Director, Center for Scholarly Communication and Digital Curation

Head, Digital Collections, Library Technology Division

Northwestern [email protected]

Page 3: Northwestern digital repository initiative: platform and persistence

What is a repository and why should I care?

Page 4: Northwestern digital repository initiative: platform and persistence

Library as institutional memory

Page 5: Northwestern digital repository initiative: platform and persistence

Tweeted in 2012 by Gail Steinhart, Head of Research Services, Mann Library, Cornell University

Page 6: Northwestern digital repository initiative: platform and persistence

Vines, T. H., Albert, A. Y. K., Andrew, R. L., Débarre, F., Bock, D. G., Franklin, M. T., … Rennison, D. J. (2013). The Availability of Research Data Declines Rapidly with Article Age. Current Biology, 24(1), 94–97. doi:10.1016/j.cub.2013.11.014

“The major cause of the reduced data availability for older papers was the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media. For papers where authors reported the status of their data, the odds of the data being extant decreased by 17% per year (Figure 1D).” [emphasis added]

The Availability of Research Data Declines Rapidly with Article Age
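To make the 17% figure concrete, the following is a minimal sketch of how a constant yearly decline in odds compounds over time. It assumes the decline acts as a fixed multiplicative factor of 0.83 per year, which is one reading of the quoted sentence. The function names and the 10:1 baseline odds in the usage loop are illustrative assumptions, not values from the paper.

```python
def odds_after(initial_odds: float, years: int, yearly_decline: float = 0.17) -> float:
    """Odds that data are still extant after `years`, assuming a constant yearly decline."""
    return initial_odds * (1.0 - yearly_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds, defined as p / (1 - p), back to a probability p."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical baseline: odds of 10:1 that the data exist at publication time.
    for years in (0, 5, 10, 20):
        odds = odds_after(10.0, years)
        print(years, round(odds, 3), round(odds_to_probability(odds), 3))
```

Under this assumption the odds shrink geometrically, so the drop in probability is slow at first and then accelerates once the odds approach 1:1.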

Page 7: Northwestern digital repository initiative: platform and persistence

What is a repository and why should I care?

A concept

The Repository

All the stuff

A set of technologies

Page 8: Northwestern digital repository initiative: platform and persistence

Technologies and architecture

Page 9: Northwestern digital repository initiative: platform and persistence

Repository as service

• Description and characterization: descriptive, provenance and technical metadata

• Selection, conversion, digitization

• Deposit and versioning

• Interoperability, APIs for ingest, discovery

• Access control, copyright support and other legal/regulatory compliance

• Persistence

– Stable, permanent links (URLs, DOIs, etc.)

– Health of digital objects

– Replication and dark archiving

– Migration or emulation, virtualization
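The idea behind stable, permanent links can be sketched as a layer of indirection: a citation carries an identifier that never changes, and a resolver maps it to wherever the object currently lives. The class below is a hypothetical illustration only. `PersistentLinkResolver`, the identifier, and the example.org URLs are all made up for this sketch and do not describe Northwestern's actual resolver or any DOI service.

```python
class PersistentLinkResolver:
    """Maps stable identifiers to whatever location currently holds the object."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, persistent_id: str, current_url: str) -> None:
        """Create or update the location behind an identifier."""
        self._locations[persistent_id] = current_url

    def resolve(self, persistent_id: str) -> str:
        """Return the current location, or raise LookupError if the ID is unknown."""
        try:
            return self._locations[persistent_id]
        except KeyError:
            raise LookupError(f"unknown identifier: {persistent_id}") from None


if __name__ == "__main__":
    resolver = PersistentLinkResolver()
    # Hypothetical identifier and URLs, for illustration only.
    resolver.register("example:map-0001", "https://old-server.example.org/maps/1")
    # After a platform migration only the mapping changes; the cited ID does not.
    resolver.register("example:map-0001", "https://new-server.example.org/objects/map-0001")
    print(resolver.resolve("example:map-0001"))
```

The design point is that a migration updates one mapping instead of breaking every published citation.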

Page 10: Northwestern digital repository initiative: platform and persistence

What’s already in our repository

digital.library.northwestern.edu

Page 11: Northwestern digital repository initiative: platform and persistence

Maps of Africa

First Fedora project @ NU

2006 project, internally funded

116 antique maps at high resolution

Page 13: Northwestern digital repository initiative: platform and persistence

Archival finding aids

findingaids.library.northwestern.edu

Archon for EAD, Fedora + Blacklight for storage and discovery, Primo syndication

Page 15: Northwestern digital repository initiative: platform and persistence

Northwestern Books and the Book Workflow Interface

2009

Mellon-funded

Now used for all in-house book digitization

books.northwestern.edu

Page 16: Northwestern digital repository initiative: platform and persistence

Every page of each digitized book has this information:

| Content | Datastream ID | MIME type | Schema/ontology |
| Dublin Core metadata | DC | text/xml | OAI_DC |
| MODS metadata | MODS | text/xml | MODS |
| Relationship metadata | RELS-EXT | text/xml | RELS-EXT |
| OCR PDF file | PDF | application/pdf | |
| OCR XML | OCR XML | text/xml | ABBYY OCR |
| OCR Text | OCR TEXT | text/plain | |
| Source camera image file | ARCHV-IMG | image/jpeg | |
| Source technical metadata in MIX | ARCHIV-TECHMD | text/xml | MIX |
| Source camera technical metadata in EXIF | ARCHV-EXIF | text/xml | Exif as XML |
| Corrected image file | PROC-IMG | image/jpeg | |
| Corrected image technical metadata in MIX | PROC-TECHMD | text/xml | MIX |
| Delivery image JPEG2000 file | DELIV-IMG | image/jp2 | |
| Delivery image technical metadata in MIX | DELIV-TECHMD | text/xml | MIX |
| SVG for delivery mechanism | DELIV-OPS | text/xml | SVG |
| Viewer html | HTML | text/html | HTML |
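One way to use a datastream inventory like the one above is as a completeness check on each page object. The sketch below is an assumption-laden illustration: the IDs and MIME types are copied from the slide as transcribed (including the spelling ARCHIV-TECHMD), while `missing_datastreams` and the dictionary-based page representation are hypothetical and not part of the Book Workflow Interface or the Fedora API.

```python
# Datastream IDs and MIME types as listed on the slide for each book page.
EXPECTED_DATASTREAMS = {
    "DC": "text/xml",
    "MODS": "text/xml",
    "RELS-EXT": "text/xml",
    "PDF": "application/pdf",
    "OCR XML": "text/xml",
    "OCR TEXT": "text/plain",
    "ARCHV-IMG": "image/jpeg",
    "ARCHIV-TECHMD": "text/xml",
    "ARCHV-EXIF": "text/xml",
    "PROC-IMG": "image/jpeg",
    "PROC-TECHMD": "text/xml",
    "DELIV-IMG": "image/jp2",
    "DELIV-TECHMD": "text/xml",
    "DELIV-OPS": "text/xml",
    "HTML": "text/html",
}


def missing_datastreams(page_datastreams: dict[str, str]) -> list[str]:
    """Return expected IDs that are absent or carry an unexpected MIME type.

    `page_datastreams` is a hypothetical mapping of datastream ID to MIME type
    for one page object, however it was obtained from the repository.
    """
    return sorted(
        ds_id
        for ds_id, mime in EXPECTED_DATASTREAMS.items()
        if page_datastreams.get(ds_id) != mime
    )


if __name__ == "__main__":
    page = dict(EXPECTED_DATASTREAMS)
    del page["PDF"]                     # simulate a page whose PDF was never generated
    page["DELIV-IMG"] = "image/jpeg"    # simulate a wrong delivery format
    print(missing_datastreams(page))
```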

Page 17: Northwestern digital repository initiative: platform and persistence

By the numbers: number of objects

As of November 2013:

• Finding aids: 1,114

• Digitized books: 3,491

• Digitized book pages: 835,806

• Image objects: 216,271

• A few others, including 3D objects, and collection objects

A total of 1,187,414 objects in the repository

Every object has several datastreams (files, descriptive metadata, technical metadata, etc.)

Page 18: Northwestern digital repository initiative: platform and persistence

By the numbers: storage

As of Feb 5, 2014:

97.1 TB of content on the repository (including digitized collections queued for ingestion) and the JPEG2000 server.

Library & NUIT purchased 200 TB of storage replicated between Evanston and Chicago campuses (that is over 400 TB in total).

Page 19: Northwestern digital repository initiative: platform and persistence

Digital preservation/persistence

• Persistent URLs

• Mirrored storage (as of fall 2014)

• PREMIS (preservation) metadata

• Routine health checks for data

• Geographically distributed storage

• Dark archives

• Migration/virtualization services
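Routine health checks for data are commonly implemented as fixity checks: a checksum is recorded when a file is ingested and recomputed later to detect silent corruption or loss. The following is a minimal sketch of that general technique using SHA-256. The manifest format and the function names are assumptions for illustration and do not describe the tooling actually used at Northwestern.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths whose file is missing or whose checksum has changed.

    `manifest` is a hypothetical mapping of relative path to the SHA-256 digest
    recorded at ingest time.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Running such a check on each storage copy independently is what lets a mirrored or geographically distributed setup repair a damaged file from a known-good replica.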

Page 20: Northwestern digital repository initiative: platform and persistence

Distributed storage and dark archives

• DuraCloud

• Amazon Glacier

• Digital Preservation Network (DPN)

Page 21: Northwestern digital repository initiative: platform and persistence

Current repository projects

• Digital Image Library (DIL)

• Avalon

• Hydramata

Page 22: Northwestern digital repository initiative: platform and persistence

Hydra

Northwestern joined 2011

Framework for repository applications using Ruby on Rails

Community with 22 partners

Page 23: Northwestern digital repository initiative: platform and persistence

2007 Provost funded move from Art History to the Library, expansion to other disciplines

115,000 images in Hydra + Fedora

Moving all legacy digital collections into DIL & its Hydra counterparts in 2014-2015

images.northwestern.edu

Digital Image Library (DIL)

Page 24: Northwestern digital repository initiative: platform and persistence

Avalon

IMLS-funded project with Indiana University

Releases:

• 0 July 2012

• .5 October 2012

• 1.0 May 2013

• 2.0 October 2013 (NU pilot)

First NU production release with R3, expected in the next month

media.northwestern.edu (dev/demo)

Page 25: Northwestern digital repository initiative: platform and persistence

Scholarly communication and digital curation

• Options for archiving scholarly materials

• Authors' rights, copyright help and education, open access support

• E-science and research data life cycle

• Digital humanities

• Library-based publishing

• Responding to funder requirements

Page 26: Northwestern digital repository initiative: platform and persistence

Hydramata (formerly Shared IR)

Five-institution project to develop a next-generation institutional repository solution in Hydra

Page 27: Northwestern digital repository initiative: platform and persistence

Expanding our repository program

• Massive storage, planning for growth, sustainability

• Digital preservation services

o Offsite third copy (DPN, DuraCloud, Glacier)

o Verification services

• Research computing

o Research data lifecycle: how to capture metadata early? What to keep?

o Automate deposit from Vault?

• Shared infrastructure and services whenever possible

• Deeper collaboration with NUIT, Research, central admin, schools

Page 28: Northwestern digital repository initiative: platform and persistence

Discussion and questions

Claire Stewart

Director, Center for Scholarly Communication and Digital Curation

Head, Digital Collections, Library Technology Division

Northwestern [email protected]