Building a Digital Archives for the City of Vancouver

Glenn Dingwall

14 September, 2011

Project Context

2004-2006 VanRIMS Classification Project

2008-2009 VanDOCS ERDMS Project

2009-2010 Olympic Legacy Project

Project Phases

I - Proof of Concept (2008-2009)
• Public records
• Controlled creation environment

II - Prototype (2009-2010)
• Private records
• Uncontrolled creation environment

Initial Assumptions

• Use OAIS (Open Archival Information System Reference Model) as a starting point
• Progressively add to requirements, drawing from:
– General preservation standards
  • InterPARES
  • RLG/OCLC Trusted Digital Repositories (TDR)
– Task-specific standards
  • E.g., PREMIS metadata
– Institution-specific requirements

CoV Digital Archives: Producers and Consumers

Digital Preservation: The Business Case

• Technology obsolescence
• Technology incompatibility
• Long-term access and usability

Alternatives – What’s out there already?

Many free/open source tools are already available:

Repository: DSpace, FEDORA, Greenstone
Ingest tools: JHOVE, DROID, XENA
Access: Archivists' Toolkit, ICA-AtoM

Each tool covers only a small part of the preservation chain; there is no single start-to-finish solution.

So, what can we do with the existing tools?

Can we piece all of the various components together to come up with a complete Digital Preservation system?

Constraints:
• Use open source tools wherever possible
• Lightweight system architecture
• Architecturally independent components

What is OAIS?

OAIS (= Open Archival Information System)
• ISO 14721:2003
• A high-level reference model
• De facto standard for discussing digital preservation concepts at this level
• Important concepts include:
– Information Model
– Functional Entities
– Mandatory Responsibilities

OAIS Information Model

Information Packages contain:
– Content (records)
– PDI = Preservation Description Information (metadata)
– Packaging Information

Three types of Information Packages:
SIP = Submission Information Package (what we get)
AIP = Archival Information Package (what we preserve)
DIP = Dissemination Information Package (what we provide)

Information Package Model

[Diagram: an Information Package. Packaging wraps two parts: Preservation Description Information (PDI, “metadata”), made up of Context, Provenance, Fixity and Reference; and Content Information, made up of File 1, File 2, ..., File n.]
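The package structure described above can be sketched as a small data model. This is only an illustration of the OAIS concepts on the slides, not code from Archivematica or any other tool; the class names, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class PackageType(Enum):
    SIP = "Submission Information Package"      # what we get
    AIP = "Archival Information Package"        # what we preserve
    DIP = "Dissemination Information Package"   # what we provide


@dataclass
class PDI:
    """Preservation Description Information: the four parts named on the slide."""
    context: str = ""
    provenance: str = ""
    fixity: Dict[str, str] = field(default_factory=dict)  # file name -> checksum
    reference: str = ""


@dataclass
class InformationPackage:
    package_type: PackageType
    content: List[str] = field(default_factory=list)  # File 1 ... File n
    pdi: PDI = field(default_factory=PDI)
    packaging: str = ""  # hypothetical: how the files are bundled


# Hypothetical sample values, for illustration only.
sip = InformationPackage(
    package_type=PackageType.SIP,
    content=["file-1.doc", "file-2.jpg"],
    pdi=PDI(provenance="Example Department", reference="example-transfer-001"),
    packaging="directory",
)
print(sip.package_type.value, len(sip.content))
```

Modelling the three package types as one class with a type tag reflects that SIP, AIP and DIP share the same parts and differ in role; a real system might well use separate structures.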

OAIS Responsibilities

• Accept submissions from Producer
• Establish control over material
• Implement long-term preservation policies
• Determine who the users are (“Designated Community”)
• Ensure preserved information is understandable to users
• Provide access

OAIS Functional Entities

• Establishes the main functional components of the system

• Defines the relationships of the components to each other in terms of the information that passes between them

[Diagram: the OAIS functional entities. Labels: Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration, Management. Packages shown: SIP, AIP, DIP.]

City of Vancouver Archives Implementation

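[Diagram: the OAIS functional entities (Ingest, Data Management, Archival Storage, Access, Preservation Planning, Administration, Management) annotated with the City of Vancouver components: Archivematica, ICA-AtoM and a 50TB NAS. Labelled flows: SIP, AIP, DIP, Metadata, Search Queries.]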

Archivematica

Archivematica Pipeline

SIP
- Content
- Metadata

Ingest (Archivematica)

AIP (to Archival Storage)
- Original Content
- Metadata
+
- Normalized Content
- Preservation Metadata

DIP (to Access System)
- Access Copies
- Descriptive Metadata

Ingest Workflow Summary

Receive SIP

Audit SIP

Characterize Content

Normalize Content

Appraise Content

Package AIP

Store AIP/Upload DIP
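The workflow summary above can be read as an ordered chain of stages that each take a package and hand it on. The sketch below illustrates only that ordering; the stage bodies are placeholders, and `make_stage`, `run_ingest` and the dictionary-based package are hypothetical names, not Archivematica's implementation.

```python
from typing import Callable, Dict, List

# Stage names copied from the workflow summary, in slide order.
INGEST_STAGES: List[str] = [
    "Receive SIP",
    "Audit SIP",
    "Characterize Content",
    "Normalize Content",
    "Appraise Content",
    "Package AIP",
    "Store AIP/Upload DIP",
]

Stage = Callable[[Dict], Dict]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran."""
    def stage(package: Dict) -> Dict:
        updated = dict(package)  # copy so the caller's package is not mutated
        updated["history"] = package.get("history", []) + [name]
        return updated
    return stage


def run_ingest(package: Dict, stages: List[Stage]) -> Dict:
    """Pass the package through each stage in order."""
    for stage in stages:
        package = stage(package)
    return package


result = run_ingest(
    {"name": "example-transfer"},
    [make_stage(name) for name in INGEST_STAGES],
)
print(result["history"])
```

Keeping each stage a standalone callable mirrors the stated constraint of architecturally independent components: a stage can be swapped or reordered without touching the runner.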

Micro-services

Create SIP backup | Characterize and extract metadata | Scan for viruses in submission documentation
Verify SIP compliance | Set file permissions | Characterize and extract metadata in submission documentation
Assign file UUIDs and checksums | Appraise SIP for preservation | Normalize submission documentation
Verify metadata directory checksums | Scan for removed files post appraise SIP for preservation | Remove files without PREMIS
Remove thumbs.db files | Create DIP directory | Verify PREMIS checksums
Create Dublin Core template | Normalize | Compile METS
Set file permissions | Set file permissions | Add Dublin Core to METS
Appraise SIP for submission | Approve normalization | Copy METS to DIP directory
Scan for removed files post appraise SIP for submission | Check for submission documentation | Generate DIP
Place in quarantine | Move Submission Documentation into objects directory | Set file permissions
Remove from quarantine | Assign file UUIDs and checksums to submission documentation | Prepare AIP
Extract packages | Extract packages in submission documentation | Upload DIP
Sanitize file and directory names | Sanitize file and directory names in submission documentation | Store AIP
Scan for viruses | |
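To give a feel for what individual micro-services do, here is a minimal sketch of three of the tasks named above: removing thumbs.db files, assigning file UUIDs and checksums, and verifying checksums later. It is not Archivematica's code. The function names are hypothetical, and the choice of SHA-256 is an assumption, since the slides do not say which checksum algorithm is used.

```python
import hashlib
import uuid
from pathlib import Path
from typing import Dict


def remove_thumbs_db(root: Path) -> int:
    """Delete Thumbs.db files (case-insensitive match) under root; return the count."""
    removed = 0
    for path in list(root.rglob("*")):
        if path.is_file() and path.name.lower() == "thumbs.db":
            path.unlink()
            removed += 1
    return removed


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assign_ids_and_checksums(root: Path) -> Dict[str, Dict[str, str]]:
    """Map each file's relative path to a new UUID and its SHA-256 checksum."""
    records: Dict[str, Dict[str, str]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records[path.relative_to(root).as_posix()] = {
                "uuid": str(uuid.uuid4()),
                "sha256": sha256_of(path),
            }
    return records


def verify_checksums(root: Path, records: Dict[str, Dict[str, str]]) -> bool:
    """Return True only if every recorded file still matches its checksum."""
    return all(
        (root / rel).is_file() and sha256_of(root / rel) == rec["sha256"]
        for rel, rec in records.items()
    )
```

Calling `assign_ids_and_checksums(Path("some/sip/directory"))` returns the records to store as fixity metadata; running `verify_checksums` against the same directory later reports whether any file has changed or gone missing.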

Media Type Preservation Plans

Media type | File formats | Preservation format(s) | Access format(s) | Normalization tool
Audio | AC3, AIFF, MP3, WAV, WMA | WAVE (LPCM) | MP3 | FFmpeg
Email | PST | MBOX | MBOX | readpst
Office Open XML | DOCX, PPTX, XLSX | Original format | PDF for PPTX | OpenOffice
Plain text | TXT | Original format | Original format | None
Portable Document Format | PDF | PDF/A | PDF | Ghostscript
Presentation files | PPT | ODF | PDF | OpenOffice
Raster images | BMP, GIF, JPG, JP2*, PCT, PNG*, PSD, TIFF, TGA | Uncompressed TIFF | JPEG | ImageMagick
Raw camera files/Digital Negative format** | 3FR, ARW, CR2, CRW, DCR, DNG, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F | Original format | JPEG | ImageMagick/UFRaw
Spreadsheets | XLS | ODF | Original format | OpenOffice
Vector images | AI, EPS, SVG | SVG | PDF | Inkscape
Video | AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4, SWF, WMV | MPEG-2 | MPG | FFmpeg
Word processing files | DOC, WPD, RTF | ODF | PDF | OpenOffice
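A preservation plan of this kind is essentially a lookup from file format to target formats and tool. The sketch below encodes a few rows of the table to show the idea. `Plan`, `PLAN_BY_EXTENSION` and `plan_for` are hypothetical names, and matching on the file extension is a simplifying assumption; a real system would more likely rely on a format identification tool such as DROID or JHOVE.

```python
from pathlib import Path
from typing import Dict, NamedTuple, Optional, Tuple


class Plan(NamedTuple):
    media_type: str
    preservation_format: str
    access_format: str
    tool: str


# A few rows transcribed from the table; the remaining rows follow the same pattern.
_ROWS: Tuple[Tuple[Tuple[str, ...], Plan], ...] = (
    (("ac3", "aiff", "mp3", "wav", "wma"), Plan("Audio", "WAVE (LPCM)", "MP3", "FFmpeg")),
    (("pst",), Plan("Email", "MBOX", "MBOX", "readpst")),
    (("pdf",), Plan("Portable Document Format", "PDF/A", "PDF", "Ghostscript")),
    (("ppt",), Plan("Presentation files", "ODF", "PDF", "OpenOffice")),
    (("xls",), Plan("Spreadsheets", "ODF", "Original format", "OpenOffice")),
    (("ai", "eps", "svg"), Plan("Vector images", "SVG", "PDF", "Inkscape")),
    (("doc", "wpd", "rtf"), Plan("Word processing files", "ODF", "PDF", "OpenOffice")),
)

# Flatten the rows into one extension -> plan dictionary.
PLAN_BY_EXTENSION: Dict[str, Plan] = {
    ext: plan for extensions, plan in _ROWS for ext in extensions
}


def plan_for(filename: str) -> Optional[Plan]:
    """Return the plan for a file name, or None if the extension is not covered."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return PLAN_BY_EXTENSION.get(extension)


for name in ("report.DOC", "scan.pdf", "unknown.xyz"):
    print(name, plan_for(name))
```

Returning `None` for unknown formats leaves the decision to the caller, for example to keep the original file and flag it for manual review.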

GIS Preservation Questions

• Appropriate formats
• Acceptable losses during migration/normalization
• Availability of normalization software
• Availability of viewing software
• Necessary metadata

Archivematica Collaborators

• Artefactual Systems Inc.
• City of Vancouver Archives
• International Monetary Fund
• University of British Columbia Library
• Rockefeller Archive Centre

Documentation Wikis

Vancouver Digital Archives Project
• http://artefactual.com/wiki/index.php?title=Vancouver_Digital_Archives

Archivematica
• http://archivematica.org/wiki

Qubit (ICA-AtoM)
• http://qubit-toolkit.org/wiki