Migrating Repository Metadata & Users: The Harvard DRS2 Project
Andrea Goethals, Harvard Library IS&T
Archiving 2014, May 15 2014

Library Digital Initiative Funds (1998-)

• Build technical infrastructure: Digital Repository Service (DRS)
• Hire specialists
• Build digital collections via 49 internal grants, to be preserved in the DRS

DRS Users Grew to 55 Organizational Units at Harvard

[Chart: number of Harvard organizational units using the DRS, Oct 2000 to Oct 2013 (y-axis 0 to 60)]

DRS is Central to User Workflows

• DRS
  – Access (discovery, search, delivery platforms): researchers, teachers, learners
  – Ingest (deposit tools): reformatting labs; automated system deposits; library, archives and museum staff
  – Manage (cataloging & management tools): reformatting labs; library, archives and museum staff; repository managers

Why a New DRS?

• Upgrade to best-of-breed technologies
• Adopt digital preservation best practices and standards
• Preserve metadata better
• Improve collection management
• Support preservation planning & activities
• Improve access to content & metadata
• Support more formats & genres

Evolution of the DRS

[Timeline, 2000 to 2015, showing: DRS in production; DRS enhancements; new DRS infrastructure development; new DRS in production; new DRS metadata migration & user adoption]

New DRS - Completed

[Timeline, 2009 to 2015, in two phases: Infrastructure Development, then Metadata Migration & User Adoption]

Completed milestones:
• Convened DRS Advisory Group
• Fedora assessment
• DuraCloud pilot test
• Early release, beta 1, beta 2, beta 3
• Software in production
• First object deposited to the new DRS
• Hardware in production
• Migrated content to new hardware
• Users trained, phase 1

New DRS – In Progress

[Timeline, 2009 to 2015, in two phases: Infrastructure Development, then Metadata Migration & User Adoption]

In-progress milestones:
• Metadata migration tools created & tested
• Migrating metadata
• Moving users

Why “Metadata” Migration?

Why not “content” migration?

Pre-migration

• DRS Content
• DRS Database

Post-migration

• DRS Content
• DRS Database
• New DRS Database
• New DRS Index
• New DRS Object Descriptors

New DRS Data Model

• Not a simple metadata conversion
• A new DRS object is a logical intellectual entity that unifies multiple DRS files, for example:
  – Still image objects: archival and production masters, and deliverables including thumbnails
  – Audio objects: archival and production masters and deliverables
  – PDS objects: page image and text files
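
To make the data model concrete, here is a minimal Python (3.9+) sketch of one object that groups several files by role. It is an illustration under stated assumptions, not the DRS schema: the class names `DrsFile` and `DrsObject`, the role and content-model labels, and the example identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DrsFile:
    """A single stored file (hypothetical fields)."""
    file_id: str
    role: str  # hypothetical role labels, e.g. "archival_master", "thumbnail"
    mime_type: str


@dataclass
class DrsObject:
    """A logical intellectual entity that unifies several DRS files."""
    object_id: str
    content_model: str  # hypothetical labels, e.g. "still_image", "audio", "pds"
    files: list[DrsFile] = field(default_factory=list)

    def files_with_role(self, role: str) -> list[DrsFile]:
        # Every file in this object that plays the given role.
        return [f for f in self.files if f.role == role]


# A still image object: archival and production masters plus deliverables,
# including a thumbnail, grouped under one object (all values hypothetical).
image = DrsObject(
    object_id="obj-0001",
    content_model="still_image",
    files=[
        DrsFile("file-1", "archival_master", "image/tiff"),
        DrsFile("file-2", "production_master", "image/tiff"),
        DrsFile("file-3", "deliverable", "image/jpeg"),
        DrsFile("file-4", "thumbnail", "image/jpeg"),
    ],
)
print(len(image.files), [f.file_id for f in image.files_with_role("thumbnail")])
# prints: 4 ['file-4']
```

Grouping by role is what lets one object stand in for what were previously several separately managed DRS files.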

Object Descriptors

• METS files generated for each object
  – Standards-based schemas (PREMIS, MODS, MIX, etc.)
• Metadata gathered from multiple sources
  – Current DRS database
  – Every content file parsed using FITS
  – In some cases, catalog records, finding aids, legacy METS files
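
The skeleton of such a descriptor can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch only: `build_descriptor`, the file IDs and the `file:///` locations are hypothetical, and the real descriptors also carry PREMIS, MODS and MIX sections assembled from the DRS database, FITS output and other sources, none of which are generated here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_descriptor(object_id: str, files: list[tuple[str, str, str]]) -> ET.Element:
    """Build a bare METS skeleton for one object (hypothetical helper).

    `files` holds (file_id, mime_type, location) tuples. The dmdSec/amdSec
    sections that would carry MODS, PREMIS and MIX are omitted; in METS
    they would come before fileSec.
    """
    root = ET.Element(mets("mets"), {"OBJID": object_id})
    file_grp = ET.SubElement(ET.SubElement(root, mets("fileSec")), mets("fileGrp"))
    struct_map = ET.SubElement(root, mets("structMap"))
    object_div = ET.SubElement(struct_map, mets("div"), {"TYPE": "object"})

    for file_id, mime_type, location in files:
        # Inventory entry: what the content file is and where it lives.
        file_el = ET.SubElement(file_grp, mets("file"), {"ID": file_id, "MIMETYPE": mime_type})
        ET.SubElement(file_el, mets("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": location})
        # Structural entry: ties the file into the object's structure.
        ET.SubElement(ET.SubElement(object_div, mets("div")), mets("fptr"), {"FILEID": file_id})
    return root


descriptor = build_descriptor("obj-0001", [
    ("file-1", "image/tiff", "file:///drs/file-1.tif"),
    ("file-3", "image/jpeg", "file:///drs/file-3.jpg"),
])
print(ET.tostring(descriptor, encoding="unicode"))
```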

Technical Challenges

• Many formats
• Unique migration rules per format
• Preserving all identifiers
• Uninterrupted access for end users
• Large (>5000-file) page-turned documents
• 46+ million DRS files: at 1 sec/file, migration would take 530+ days!
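
The 530+ day figure is straightforward arithmetic: 46,000,000 files × 1 s = 46,000,000 s, and 46,000,000 / 86,400 s per day ≈ 532 days. A one-function Python check, assuming strictly sequential processing with no parallelism (`days_to_migrate` is a hypothetical helper name):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_to_migrate(num_files: int, seconds_per_file: float) -> float:
    """Days of continuous, strictly sequential processing."""
    return num_files * seconds_per_file / SECONDS_PER_DAY


print(round(days_to_migrate(46_000_000, 1.0), 1))  # prints 532.4
```

Because "46+ million" is a lower bound, the true sequential figure would be higher still, which is why per-file throughput matters so much here.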

Formulating a Migration Plan

• Technical analysis
  – DRS content
  – Possible metadata sources
• User analysis
  – Management activity via system logs
  – Preparation via training and testing registration lists
  – Perceived preparation & concerns via a survey of the highest-volume active users
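
Identifying the most active users from system logs can be as simple as counting management actions per owner code. A sketch assuming a hypothetical log that has already been parsed into (owner_code, action) pairs; the slides do not show the actual DRS log format, and the owner codes and action names below are made up.

```python
from collections import Counter

# Hypothetical parsed log records: (owner_code, action).
log = [
    ("OWNER_A", "update_metadata"),
    ("OWNER_B", "deposit"),
    ("OWNER_A", "deposit"),
    ("OWNER_C", "update_metadata"),
    ("OWNER_A", "delete"),
    ("OWNER_B", "update_metadata"),
]


def rank_owners(records, top_n=None):
    """Count management actions per owner code, most active first."""
    counts = Counter(owner for owner, _action in records)
    return counts.most_common(top_n)


print(rank_owners(log, top_n=2))  # prints [('OWNER_A', 3), ('OWNER_B', 2)]
```

A ranked list like this is one possible way to choose whom to survey first.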

Migration Plan

• Combines needs of users with technical requirements
  – Respects all technical requirements
  – Minimizes the time users need to work in two systems at the same time

Migrating Content in 5 Stages

• Migrate 1st: Tier 1 content
• Migrate 2nd: Tier 2 content
• Migrate 3rd: Tier 3 content
• Migrate 4th: Tier 4 content
• Migrate 5th: Tier 5 content

Tiers progress from simpler objects (Tier 1) to more complex objects (Tier 5).

• Dependencies between tiers
• Dependencies within tiers
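
One way to respect both kinds of dependency is to migrate tier by tier and, inside each tier, use a topological sort so an object migrates only after the objects it references. A sketch with Python's standard `graphlib` (3.9+); the object names, tier assignments and specific dependencies (a still image referencing a color profile and a target image, a PDS document referencing a still image) are hypothetical examples, not the DRS's actual rules.

```python
from graphlib import TopologicalSorter

# Hypothetical objects: each maps to the objects it references, which must
# already be in the new DRS before it can be migrated.
depends_on = {
    "color-profile-1": set(),
    "target-image-1": set(),
    "still-image-1": {"color-profile-1", "target-image-1"},  # between tiers
    "pds-doc-1": {"still-image-1"},                          # within tier 2
}
tier_of = {"color-profile-1": 1, "target-image-1": 1, "still-image-1": 2, "pds-doc-1": 2}


def migration_order(deps, tiers):
    """Migrate tier by tier; within a tier, referenced objects come first."""
    order = []
    for tier in sorted(set(tiers.values())):
        members = {obj for obj, t in tiers.items() if t == tier}
        graph = {}
        for obj in members:
            for dep in deps.get(obj, ()):
                # A dependency on a later tier would break tier-by-tier migration.
                if tiers[dep] > tier:
                    raise ValueError(f"{obj} (tier {tier}) needs {dep} (tier {tiers[dep]})")
            # Earlier tiers are already migrated; only same-tier dependencies remain.
            graph[obj] = {dep for dep in deps.get(obj, ()) if dep in members}
        order.extend(TopologicalSorter(graph).static_order())
    return order


print(migration_order(depends_on, tier_of))
```

Within a tier, objects with no mutual dependencies may come out in any order, which is also what makes them safe to process in parallel.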

Content per tier:

• Tier 1: Text (Methodology, ESRI World File), Document, Color Profile, Target Image
• Tier 2: PDS Document, Still Image
• Tier 3: Audio, Text (SMIL)
• Tier 4: Web Harvest, Opaque Container
• Tier 5: Biomedical Image; Google Document Container 1, 2, 3

• Tiers 1, 3, 4, 5: migrate across all DRS owner codes at one time
• Tier 2: migrate one DRS owner code at a time*

* This minimizes how long the content each owner manages most heavily is split across two different systems.
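
The owner-code rule can be expressed as a small batching function. A sketch assuming each object is known as a hypothetical (object_id, tier, owner_code) record; ordering the Tier 2 owners alphabetically is an arbitrary placeholder for whatever schedule is agreed with each owner.

```python
from collections import defaultdict


def migration_batches(objects):
    """Split (object_id, tier, owner_code) records into migration batches.

    Tiers 1, 3, 4 and 5: one batch per tier across all owner codes.
    Tier 2: one batch per owner code.
    """
    by_tier = defaultdict(list)
    for object_id, tier, owner in objects:
        by_tier[tier].append((object_id, owner))

    batches = []
    for tier in sorted(by_tier):
        if tier == 2:
            by_owner = defaultdict(list)
            for object_id, owner in by_tier[tier]:
                by_owner[owner].append(object_id)
            # Placeholder ordering: alphabetical by owner code.
            for owner in sorted(by_owner):
                batches.append((tier, owner, by_owner[owner]))
        else:
            batches.append((tier, "ALL", [oid for oid, _owner in by_tier[tier]]))
    return batches


objects = [("o1", 1, "A"), ("o2", 2, "A"), ("o3", 2, "B"), ("o4", 3, "B"), ("o5", 2, "A")]
for batch in migration_batches(objects):
    print(batch)
```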

Technical Strategies

• Modular, parallelizable migration design
• Delivery services made migration-aware
• Test, test, test
• Design for migration failures: make do-overs possible
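
The slides do not say how the delivery services were made migration-aware. One possible approach, sketched here purely as an assumption, is a lookup table from preserved legacy identifiers to new object IDs. Delivery consults it to decide which system serves a request, and a do-over simply removes the entry so delivery falls back to the legacy DRS. `MigrationRegistry` and its methods are hypothetical.

```python
class MigrationRegistry:
    """Tracks which legacy identifiers have been migrated (hypothetical sketch)."""

    def __init__(self):
        self._migrated = {}  # legacy id -> new DRS object id

    def mark_migrated(self, legacy_id, new_id):
        # Idempotent: re-running a migration that yields the same new id is harmless.
        existing = self._migrated.get(legacy_id)
        if existing is not None and existing != new_id:
            raise ValueError(f"{legacy_id} already migrated as {existing}")
        self._migrated[legacy_id] = new_id

    def roll_back(self, legacy_id):
        # A do-over: forget the migration so the object can be migrated again.
        return self._migrated.pop(legacy_id, None)

    def resolve(self, legacy_id):
        # Delivery asks where to serve from, so access is uninterrupted
        # while content is split across the old and new systems.
        if legacy_id in self._migrated:
            return ("new", self._migrated[legacy_id])
        return ("legacy", legacy_id)


registry = MigrationRegistry()
registry.mark_migrated("drs-123", "obj-1")
print(registry.resolve("drs-123"), registry.resolve("drs-456"))
```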

Technical Strategy – Modular, Parallelizable

START
1) Group files into objects
   ↓ Objects queue
2) Run FITS, combine with metadata to generate object descriptors
   ↓ Descriptors ready queue
3) Ingest into new DRS
END
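
Three stages joined by two queues map naturally onto a producer/consumer pipeline. Below is a minimal threaded Python sketch under these assumptions: fixed-size grouping stands in for the real per-format grouping rules, a string placeholder stands in for running FITS and merging metadata, and appending to a list stands in for ingest. All function names and queue sizes are hypothetical. Threads help when the per-object work mostly waits on an external tool such as FITS; CPU-heavy Python work would call for processes instead.

```python
import queue
import threading

DONE = object()  # sentinel: the producer feeding this queue has finished


def group_files(file_ids, objects_q, group_size, n_workers):
    """Stage 1: group files into objects (fixed-size chunks as a stand-in)."""
    for i in range(0, len(file_ids), group_size):
        objects_q.put(tuple(file_ids[i:i + group_size]))
    for _ in range(n_workers):
        objects_q.put(DONE)  # one sentinel per stage-2 worker


def describe_objects(objects_q, descriptors_q):
    """Stage 2: build a descriptor per object (placeholder for FITS + metadata)."""
    while (obj := objects_q.get()) is not DONE:
        descriptors_q.put({"files": obj, "descriptor": f"<mets for {len(obj)} files>"})
    descriptors_q.put(DONE)


def ingest(descriptors_q, n_workers, ingested):
    """Stage 3: ingest descriptors until every stage-2 worker has finished."""
    finished = 0
    while finished < n_workers:
        item = descriptors_q.get()
        if item is DONE:
            finished += 1
        else:
            ingested.append(item)  # placeholder for depositing into the new DRS


def run_pipeline(file_ids, group_size=3, n_workers=4):
    objects_q, descriptors_q = queue.Queue(maxsize=100), queue.Queue(maxsize=100)
    ingested = []
    threads = [threading.Thread(target=group_files,
                                args=(file_ids, objects_q, group_size, n_workers))]
    threads += [threading.Thread(target=describe_objects, args=(objects_q, descriptors_q))
                for _ in range(n_workers)]
    threads.append(threading.Thread(target=ingest, args=(descriptors_q, n_workers, ingested)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ingested


result = run_pipeline([f"file-{i}" for i in range(10)], group_size=3)
print(len(result), "objects ingested")
```

Because each stage only talks to its queues, any stage can be given more workers (or moved to another machine) without changing the others.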

Tuning Experiments

• Single powerful computer
  – Dell R720 server using Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz CPUs, with 16 cores, 64 GB of memory and 1 TB of internal disk
  – Various thread counts
  – 4 to 35 files processed per second
• Next:
  – RAM disk
  – Multiple computers
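
A thread-count sweep like this can be sketched with `concurrent.futures.ThreadPoolExecutor`: time a fixed batch at several thread counts, then convert files per second into projected days for the whole repository. `process_file` is a hypothetical placeholder that sleeps to simulate I/O-bound work; real numbers depend entirely on the hardware and the actual per-file work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SECONDS_PER_DAY = 86_400
TOTAL_FILES = 46_000_000  # lower bound: "46+ million DRS files"


def process_file(file_id):
    # Hypothetical stand-in for per-file work (e.g. waiting on FITS);
    # sleeping simulates I/O-bound time.
    time.sleep(0.01)
    return file_id


def measure_throughput(n_files, n_threads):
    """Files per second for a batch of n_files run on n_threads threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(process_file, range(n_files)))
    return n_files / (time.perf_counter() - start)


def projected_days(files_per_second, total_files=TOTAL_FILES):
    """Days needed for total_files at a steady throughput."""
    return total_files / files_per_second / SECONDS_PER_DAY


for threads in (1, 4, 16):
    rate = measure_throughput(100, threads)
    print(f"{threads:>2} threads: {rate:6.1f} files/s, "
          f"{projected_days(rate):7.1f} days for 46M files")
```

At the reported 4 and 35 files per second, `projected_days` gives about 133.1 and 15.2 days for 46 million files, compared with roughly 532 days at one file per second.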

User Strategies

• Advisors: DRS Advisory Group
• Minimize disruption
  – Tier 2 migration: one owner at a time
  – Close partners: Imaging Services
• Tap the help of experts: “pioneer” depositors, beta testers, trainers
• Regular monthly communications via HL Update

Migration State Diagram

Migration Set Checklist

• Description of the affected content
• List of steps needing human intervention, who will do them, and date of completion
  – Includes communication, migration kickoff and post-migration verification tasks
• Final step: manager signs off on completion
• Checklist is preserved
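
The checklist translates naturally into a small record type. A sketch, assuming hypothetical field names and role labels: steps carry an assignee and completion date, sign-off is refused while any step is open, and the whole checklist serializes to JSON so it can be preserved alongside the migrated content.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Step:
    """One step needing human intervention (hypothetical fields)."""
    description: str
    assignee: str
    completed_on: Optional[date] = None


@dataclass
class MigrationSetChecklist:
    content_description: str
    steps: list[Step] = field(default_factory=list)
    signed_off_by: Optional[str] = None

    def outstanding(self) -> list[Step]:
        # Steps that still have no completion date.
        return [s for s in self.steps if s.completed_on is None]

    def sign_off(self, manager: str) -> None:
        # Final step: the manager signs off only once every step is complete.
        if self.outstanding():
            raise RuntimeError(f"{len(self.outstanding())} step(s) still outstanding")
        self.signed_off_by = manager

    def to_json(self) -> str:
        # Serialize so the checklist can be preserved; dates become ISO strings.
        return json.dumps(asdict(self), default=str, indent=2)


checklist = MigrationSetChecklist(
    "Tier 1 content, all owner codes",
    [Step("Notify affected owners", "migration coordinator"),
     Step("Post-migration verification", "repository manager")],
)
```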

Learned So Far

• Can migrate at sub-second-per-file speeds
• User-contributed metadata varies in quality
  – Should automate more and/or put more validation checks in place
  – Useful exercise to analyze metadata values and elements periodically
    • Errors in metadata values
    • Value vs. effort of metadata elements
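
Both lessons can be supported by a small amount of code: per-element validation rules catch value errors at deposit time, and a periodic profile shows how often each element is actually filled in, which feeds the value-vs-effort question. The two rules below (a date pattern and a three-letter language code) are hypothetical examples, not DRS rules.

```python
import re
from collections import Counter

# Hypothetical rules for two user-supplied elements.
RULES = {
    "date_created": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?"),  # YYYY, YYYY-MM or YYYY-MM-DD
    "language": re.compile(r"[a-z]{3}"),  # three-letter lowercase code, e.g. "eng"
}


def validate(record: dict) -> list[str]:
    """Names of non-empty elements whose values break their rule."""
    return [name for name, rule in RULES.items()
            if record.get(name) and not rule.fullmatch(record[name])]


def profile(records: list[dict]) -> tuple[Counter, Counter]:
    """Per element: how often it is filled in, and how often it fails validation."""
    filled, errors = Counter(), Counter()
    for record in records:
        filled.update(name for name, value in record.items() if value)
        errors.update(validate(record))
    return filled, errors


records = [
    {"date_created": "2014-05-15", "language": "eng"},
    {"date_created": "5/15/2014", "language": "English"},
    {"date_created": "", "title": "Untitled"},
]
print(profile(records))
```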

Preservation Capability Before and After the DRS2 Project

[Chart: compliance at NDSA Level One through Level Four for each area (Storage & Geographic Location; File Fixity and Data Integrity; Information Security; Metadata; File Formats), marking which levels are already compliant and which will be compliant after the DRS2 project]

Based on the NDSA Levels of Digital Preservation

Q & A

Thanks!

DRS Advisory Group, DRS beta testers, DCSWG, Bobbi Fox, Franziska Frey, Andrea Goethals, Wendy Gogel, Chip Goines, HUIT Security, Jonathan Kennedy, LTS Operations, Spencer McEwen, Grainne Reilly, Tracey Robinson, Randy Stern, Janet Taylor, Chris Vicary, Robin Wendler, Julie Wetherill, Vitaly Zakuta