Digital Preservation and Management. Preserving Digital Resources: Why is it an Issue? Technology...

Post on 28-Mar-2015


Digital Preservation and Management

Preserving Digital Resources: Why is it an Issue?

Technology obsolescence
Digital media life expectancy
Variety of file formats
Digital rights management
Costs
Organizational resistance

Assumptions

Digital preservation is more challenging and complex than preservation of analog objects
Digital preservation is more than a technical preservation strategy

“THE” solution doesn’t exist
Digital preservation needs to be integrated into organizational culture

Assumptions

Change Happens
File formats matter

Non-proprietary is best; de facto standards are good

System architecture and documentation matter
Open systems that can be moved to other platforms

Technology isn’t the whole solution
Policies, planning, and resources

The community is just beginning to work on these issues – and everything is new and is changing

Terms

Digital Object: Any resource that can be stored or manipulated by a computer
Digitized Resources: Any resource that has been digitized from an analog source
Born Digital: Any resource that was created digitally and will be managed and preserved digitally

Terms

Digital preservation/archiving: Storage, maintenance, and access to a digital object over the long term, usually as a consequence of applying one or more preservation strategies

Terms

Viability: maintenance of the bitstream
Renderability: viewable by humans and “processable” by computers
Understandability: interpretable by humans
Fixity: the state or quality of being fixed or unchanged
Reliability: the digital objects are created in a trustworthy way; they are what they say they are
Authenticity: the digital object remains reliable over time
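Fixity is usually checked by recording a checksum when an object is received and recomputing it later. The following is a minimal sketch of that idea, assuming SHA-256 as the digest and in-memory bytes in place of a stored file; the function names are illustrative and do not come from any particular repository system.

```python
import hashlib


def compute_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of a bitstream."""
    return hashlib.new(algorithm, data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """True if the bitstream still matches the digest recorded earlier."""
    return compute_digest(data, algorithm) == recorded_digest


# Hypothetical example data standing in for a preserved file.
original = b"master image bytes"
recorded = compute_digest(original)  # stored alongside the object at ingest

print(fixity_ok(original, recorded))               # True: unchanged copy
print(fixity_ok(b"master image bytez", recorded))  # False: one byte differs
```

A matching digest only shows that the bits are unchanged (viability and fixity); it says nothing about whether the object can still be rendered or understood.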

Digital Preservation Strategies

Bitstream Copying
Refreshing
Durable/Persistent Media
Technology Preservation
Digital Archaeology
Analog Backups
Migration

Replication
Reliance on Standards
Normalization
Canonicalization
Emulation
Encapsulation
Universal Virtual Computer

Trusted Digital Repositories

A repository whose mission is to provide reliable, long term access to managed digital resources to a community, now and in the future.

Trusted Digital Repositories

Attributes
Administrative responsibility
Organizational viability
Financial sustainability
Technological suitability
System security
Procedural accountability
OAIS compliant

Trusted Digital Repositories

Implementation approaches will vary
Approach will depend on:

Context
Users (designated community)

Underlying issue remains constant
Functionality
Reliability and authenticity

Open Archival Information System (OAIS) Reference Model

Conceptual framework for an archival system dedicated to preserving and maintaining access to digital information over the long term
Consists of people and systems

http://ssdoo.gsfc.nasa.gov/nost/isoas/overview.html (overview)
http://ssdoo.gsfc.nasa.gov/nost/wwwclassic/documents/pdf/CCSDS-650.0-B-1.pdf (standard)

OAIS: What is it?

Any organization or system charged with the task of preserving information over the long term and making it accessible to a specific group of users

An OAIS archive is expected to meet certain minimum responsibilities

OAIS: Minimum Responsibilities

Negotiate and accept appropriate information from information creators
Obtain sufficient control over the information to ensure preservation
Determine the scope of the “Designated Community” (the users)
Ensure that users can understand the information without assistance from the information creators

OAIS: Minimum Responsibilities

Follow documented policies and procedures

Ensure preservation
Authenticate information
Disseminate (provide access to) information

Make the information available to the Designated Community

Preservation Planning

Monitoring technology and users; developing preservation actions

Preservation planning is part of the administration functions of any archival program; OAIS has highlighted it as a distinct function

The importance of constant, ongoing management and planning for digital preservation calls for this

Components of a Digital Preservation Program

TDR and OAIS imply that there are three components of a digital preservation program

Resource Framework (trust)
Organizational Infrastructure (policy)
Technological Infrastructure (technology)

Resource Framework

Nothing is sustainable without ongoing commitment of resources
A high level commitment to digital preservation must demonstrate an adequate resource commitment

Deliverables that meet the goals
Line item budgets
Staff commitment
Strategic planning
Projections for costs and funding scenarios

Resource Framework

Commitment of resources (time, money, staff) implies organizational commitment and reflects organizational priorities
Staffing is the expensive part!
Curatorial functions

Appraising, acquiring, processing, metadata creation, ongoing management, access

Technical functions
Computer operation, system administrator, database administrator, storage administrator, application programmer, preservation expertise

Planning

Identify stakeholders and their roles
Educate
All partners need a desired outcome

Tangible or intangible
Buy-in
Mission, goals, outcomes

Organizational Infrastructure

Organizational and Curatorial Responsibilities

Policy framework
Operational Responsibilities

Planning framework
Functions and roles

Organizational and Curatorial Responsibilities – Policy Framework

Strategic Plan
Collection Policy
Security Policy
Preservation Policy
Access Policy

Strategic Plan

Overview and scope of the digital preservation program and its context
Mission/Purpose
High level goals and objectives
Commitment to OAIS and community best practices
Related documentation and who is responsible
Administrative/Oversight structure
High level audience statement

Audience (Designated Community)

OAIS requirement

Explicit
All collections
Per collection

Audience = assumed knowledge and resources

Impacts of Audience Identification

The kinds of collections you will accept
The kind of descriptive information (metadata) you will provide
The kind of services you will offer

Software, translators
The kind of preservation actions chosen

Significant properties
The access mechanisms you need to provide

Collection Policy

What kinds of digital resources are you going to collect and digitally preserve?
Content considerations

Are you focusing on a specific content area?

Rights management considerations
Metadata responsibilities and requirements
Requirements for documenting acquisitions

Collection Policy

Technical considerations
Digitization with no physical counterpart
Digitization with a physical counterpart
Anything born digital
Born digital that can’t be reformatted to eye readable

Collection Policy

Are there further limitations on what you will collect? (examples)

Non-proprietary formats only
Specific formats only (TIFF)
Systems/databases only
Distinct documents only
Minimum amount of metadata required at time of acquisition
Materials that can be digitally reformatted in a specific way

Move everything to TIFF?
Move everything to XML?

Documenting Acquisitions

OAIS requires agreements with depositors that address acquisition, maintenance, access and withdrawal

Should already be using these kinds of agreements
May need to revise for digital materials, to include

What happens if functionality is lost?
Is reformatting to eye readable an acceptable preservation option?
What kind of access can you provide and is it acceptable?
Are there digital-specific copyright issues to consider?

Documenting Acquisitions

May need to revise for digital materials, to include

Metadata creation responsibilities
Rights management
What level of functionality will be available from the digital repository?

Security Policy

System security
Physical environment
Backup and recovery
Fixity of the data (reliability)
Disaster preparedness and response
Planning and documentation requirements
Assign responsibility

Preservation Policy

Commitment to digital preservation
Goals of digital preservation
Scope of materials

Formats
Metadata suppliers

Access commitments

Preservation Policy

Definition of overall preservation strategy
Are there limitations?
What happens if preservation actions go wrong?
Is reformatting to eye-readable an acceptable preservation action? Under what circumstances?

Planning and documentation requirements
Responsibilities assigned

Operational Responsibilities

Based on work done by the OAIS community to define the principal obligations of an OAIS compliant repository
Appropriate planning documentation will be necessary to carry out operations

Specific planning based on strategic plan and policies

Operational Responsibilities

Acquisition
Physical and intellectual control
Determines audience (designated community)
Follows policies and procedures to assure preservation of authentic information
Access
Promotes development of best practices and standards

Acquisition

Development of collection policies
Includes specific required formats, if appropriate

Procedures and workflows for copyright clearance for access and preservation
Metadata specifications and implementation
Procedures to ensure the authenticity of submitted material
Assessment of the completeness of the submission
Documentation of all acquisition transactions

Control

Preparing the materials for storage

Content analysis
Significant properties
Verification of metadata
Unique and persistent identifier assigned
Authenticity and integrity check
Move to archival storage
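Several of the control steps above (verification of metadata, identifier assignment, integrity check, move to archival storage) can be sketched in a few lines. This is an illustration only, under these assumptions: the depositor supplies a SHA-256 digest, a UUID stands in for the persistent identifier, and a plain dictionary stands in for archival storage. The names `ArchivedItem` and `take_control` are hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class ArchivedItem:
    identifier: str   # unique identifier (a UUID URN in this sketch)
    sha256: str       # digest recorded at the time of control
    metadata: dict
    content: bytes


def take_control(content, metadata, depositor_sha256, required_fields, storage):
    """Verify metadata and integrity, assign an identifier, store the item."""
    # Verification of metadata: every required field must be present and non-empty.
    missing = [name for name in required_fields if not metadata.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))

    # Integrity check: recompute the digest and compare with the depositor's value.
    digest = hashlib.sha256(content).hexdigest()
    if digest != depositor_sha256:
        raise ValueError("content does not match the depositor's checksum")

    # Unique identifier assigned, then move to (stand-in) archival storage.
    identifier = uuid.uuid4().urn
    storage[identifier] = ArchivedItem(identifier, digest, dict(metadata), content)
    return identifier
```

A real repository would also need a resolver or registry to keep the identifier persistent over time; a UUID alone only guarantees uniqueness.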

Preservation Actions

Monitoring of technology and the digital materials
Technology watch
Preservation planning

Classes of material
Actions to be taken
Documentation of actions and results
Functionality considerations

Access

A system for resource discovery
Mechanism for authenticity check
Access control mechanisms
User support

Standards and Best Practices

Promote and utilize
Results in economies of scale
Creation of high quality digital resources that are more amenable to preservation

Work with software suppliers, potential depositors, designated communities

In-house

Significant investment
Technical expertise
Workflow impacts

Maintain physical control

Outsource

Can the service provider meet your needs and requirements?
Less investment?

No cost models to show if this is accurate

Less reliance on in-house technical expertise and infrastructure necessary
What happens if the service provider goes out of business?

Combination

Build what you can
Build what you need that can’t be outsourced
Buy what you can’t build

Now, digital repositories…

OAIS Metadata Implications

Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects. Standards and best practices are developed to promote the creation of metadata so that it supports interoperability and collaboration.
Metadata sets
Metadata encoding schema

Types of Metadata

Descriptive
Technical
Structural
Administrative
Preservation
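One way to picture the five types is as sections of a single record attached to each digital object. The sketch below is only an illustration: the class, the field values, and the `missing_types` helper are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MetadataRecord:
    """One section per metadata type named on the slide."""
    descriptive: dict = field(default_factory=dict)     # e.g. title, creator
    technical: dict = field(default_factory=dict)       # e.g. format, size
    structural: dict = field(default_factory=dict)      # e.g. page order
    administrative: dict = field(default_factory=dict)  # e.g. rights, depositor
    preservation: dict = field(default_factory=dict)    # e.g. checksums, actions taken


def missing_types(record: MetadataRecord) -> list:
    """Return the names of metadata types that have no values yet."""
    return [name for name, values in asdict(record).items() if not values]


# Hypothetical record with only two sections filled in.
record = MetadataRecord(
    descriptive={"title": "Annual report"},
    technical={"format": "TIFF"},
)
print(missing_types(record))  # ['structural', 'administrative', 'preservation']
```

A check like this could run at acquisition to flag objects whose record is too thin to support later preservation actions.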

Metadata

Each type of metadata will be needed to facilitate the preservation and usability of born digital material
Use standards and best practice metadata sets
Think interoperability

Technologically
Element sets

Immediate Actions

Get Your Team Together
Identify your needs

Do you really need a digital repository right NOW?
Is there an interim solution until the field is more settled?

Agree on vision and goals
Plan

Immediate Actions

Discuss strategy
Communication

Any institutional repository depends on a relationship with IT staff

Priorities
Language barriers

Immediate Actions

Identify the organizational infrastructure changes that need to be made
Investigate existing tools and digital repositories
Learn and experiment with existing tools
Make high level decisions

What kind of digital materials are we going to commit to preserving?

Immediate Actions

Funding
Inventories of digital resources
Establish metadata standards and practices
Identify and understand users
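An inventory of digital resources can start as a simple tally of what is on disk, grouped by file extension. The sketch below assumes the material sits in an ordinary directory tree; `scan` and `summarize` are hypothetical helper names, and extension is only a rough proxy for format.

```python
from collections import defaultdict
from pathlib import Path


def scan(root):
    """Yield (path, size_in_bytes) for every file under root."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            yield str(path), path.stat().st_size


def summarize(files):
    """Tally file count and total bytes per lowercase extension."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for name, size in files:
        extension = Path(name).suffix.lower() or "(none)"
        summary[extension]["count"] += 1
        summary[extension]["bytes"] += size
    return dict(summary)


# Usage against a real directory would be: summarize(scan("/path/to/files"))
# Here a hypothetical listing is used instead.
listing = [("scans/a.tif", 10), ("scans/b.TIF", 5), ("notes/readme", 1)]
print(summarize(listing))
```

The resulting counts give a first picture of which formats dominate the holdings, which feeds directly into collection and preservation policy decisions.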

Take Home Concepts

Use standards and best practices

The solution is complex; the tools are incomplete

Organizational and technological challenges

Learn about what others are doing and build on it
Don’t reinvent the wheel

Take Home Concepts

Resources are the issue
People, not computers!

Expect and plan for change
This is all a work in progress
“First generation” technologies, tools, understanding of issues
You will redo work

Existing Tools

Tools

Technical tools
Interfaces, infrastructure and technologies that allow you to do the work necessary to create, manage and preserve digital resources

Examples might include:
Metadata creation
File format verification
Algorithms for fixity checks
Appraisal/processing tools
Access tools – indexing, finding aids, etc.
Acquisition tools

Tools

Few currently exist
Options

Wait
Build your own
Modify existing tools
Use what there is

Tools

DSpace
Fedora™
LOCKSS
Greenstone
OCLC Digital Archive

DSpace

A specialized content management system that:

manages and distributes digital items
allows for creation, indexing and searching of metadata
supports long term preservation of material
is designed to make submission and administration easy

DSpace

Developed by MIT and Hewlett Packard
Based on freely available software

can use proprietary software as well with minor modifications

Customizable
Academic community is especially active in the use of this implementation
UNIX based; written in Java

DSpace

No support available
Preservation is done locally and is not inherent in the system
Downloads and specific information at http://www.dspace.org

DSpace Demo - MIT Press
https://hpds1.mit.edu/handle/1721.1/1776

Fedora™

Flexible Extensible Digital Object and Repository Architecture
“An Open-Source Digital Repository Management System” – the architectural underpinning or plumbing
Used to support institutional repositories, digital libraries, content management, digital asset management, scholarly publishing, and digital preservation

Fedora™

Cornell and University of Virginia, funded by Mellon
Freely available
Based on open source software and web based technologies
Limited interfaces

Management
Access
Access Lite

Fedora™ Architectural Model (diagram)

Data Object: Persistent ID (PID), Disseminators, System Metadata, Datastreams
Behavior Definition Object: Persistent ID (PID), Method Definition Metadata, System Metadata, Datastreams (specs)
Behavior Mechanism Object: Persistent ID (PID), Method Implementation Metadata, System Metadata, Datastreams (executables)
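The parts of a data object named in the architectural model can be pictured as a small data structure. The sketch below only mirrors the labels in the diagram; it is not Fedora's API or storage format, and the class names, field names, PID value, and behavior name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    label: str       # hypothetical field names, not Fedora's schema
    mime_type: str
    content: bytes


@dataclass
class DataObject:
    pid: str                                            # Persistent ID (PID)
    system_metadata: dict = field(default_factory=dict)
    datastreams: dict = field(default_factory=dict)     # datastream id -> Datastream
    disseminators: list = field(default_factory=list)   # names of behaviors offered


# Hypothetical object with one datastream and one disseminator.
obj = DataObject(pid="demo:1")
obj.datastreams["IMAGE"] = Datastream("Scanned page", "image/tiff", b"II*\x00")
obj.disseminators.append("getThumbnail")
```

In the model, a disseminator links the data object to a behavior definition and a behavior mechanism, which are themselves objects with their own PIDs, system metadata, and datastreams.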

Fedora™

Installs on Windows PC
Packaged to get up and running quickly
Demo set of objects
Scales with hardware in a production environment
No support available
Plumbing only; no inherent preservation
Downloads and information available at http://www.fedora.info

LOCKSS

Lots of Copies Keeps Stuff Safe
To safeguard web journals libraries subscribe to
Mimics the way libraries manage paper collections

Redundant, distributed, decentralized
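The value of redundant copies can be shown with a much simplified comparison: if most copies of an item agree, a copy that differs is probably the damaged one. The sketch below illustrates that idea only; it is not the LOCKSS protocol, and the function name and site names are hypothetical.

```python
import hashlib
from collections import Counter


def find_damaged_copies(copies):
    """Given {site: content bytes}, return sites whose copy differs from the majority."""
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in copies.items()}
    consensus, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority agreement among copies")
    return sorted(site for site, digest in digests.items() if digest != consensus)


# Hypothetical copies of the same journal article held at three libraries.
copies = {
    "library_a": b"article text",
    "library_b": b"article text",
    "library_c": b"article texT",   # one damaged character
}
print(find_damaged_copies(copies))  # ['library_c']
```

Once a damaged copy is identified, it can be repaired from one of the copies that agree with the majority, which is why more copies make the collection safer.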

LOCKSS

Works only for HTTP/HTML standard file types (html, jpeg, gif, pdf, etc)
Open source code

It can be modified
Designed to be low cost, low time

Will run on a dedicated PC
PC specs available on the LOCKSS site

LOCKSS

Publishers can prevent LOCKSS from caching their content
Publishers must give libraries permission

Licensing language available on the LOCKSS web site

Freely available
No support (ease of use is highlighted)
Preservation is not inherent
http://lockss.stanford.edu/

Greenstone

A suite of software for building and distributing digital library collections
Produced by the New Zealand Digital Library Project at the University of Waikato
Developed and distributed in cooperation with UNESCO and the Human Info NGO
Open-source, multilingual software, issued under the terms of the GNU General Public License

Greenstone

“Should in fact work on any Windows or Unix system.”
“Local library”
“Web library”
Greenstone Librarian Interface
The “Organizer”

Greenstone

Documentation is available
Installer's Guide
Developer's Guide
Paper to Collection
Inside Greenstone Collections
MG/MG++

Workshops are also held
Listservs for implementors
Some technical support available
Not preservation oriented
http://www.greenstone.org/cgi-bin/library

OCLC Digital Archive

Standards based
OAIS compliant
METS encoded dissemination packages

Phased support for various formats and material type

Currently text and still image
Can integrate with current library selection and cataloging activities
Content owner manages the archived objects and determines access
Known costs
Offers bit preservation

OCLC Digital Archive Functions

Harvest from web
preview and review

Metadata creation
Ingest

From web or batch
Access management

public or restricted
Viewing
Dissemination
Reports
Periodic Audits of Objects in the Archive
Frequent Backups and Disaster Prevention

Digital Archive Web Services

End User Access

OCLC Digital Archive Development

Preservation policy and plans in progress

Expanding formats and object types accepted

Active in development of preservation metadata standard and will comply

Active in developing digital repository certification

Additional information available at:
http://www.oclc.org/support/training/digitalarchive/
http://www.oclc.org/support/documentation/digitalarchive/

Other Tools

Australian PANDAS-PANDORA
CONTENTdm (content management)
SDSC Data Grid Technology
Web harvesting tools
E-records management software
Document management systems
Data warehousing technology
XML parsing tools

SDSC and others