Altman RDAP11 Policy-based Data Management

Policy Based Digital Preservation: SafeArchive & The Dataverse Network®. Micah Altman, Institute for Quantitative Social Science, Harvard University. Prepared for the Research Data Access and Preservation Summit, ASIS&T, March 2011.

description

Micah Altman, Harvard; Policy-based Data Management. The 2nd Research Data Access and Preservation (RDAP) Summit, an ASIS&T Summit, March 31-April 1, 2011, Denver, CO. In cooperation with the Coalition for Networked Information. http://asist.org/Conferences/RDAP11/index.html

Transcript of Altman RDAP11 Policy-based Data Management

Page 1: Altman RDAP11 Policy-based Data Management

Policy Based Digital Preservation: SafeArchive & The Dataverse Network®

Micah Altman, Institute for Quantitative Social Science, Harvard University

Prepared for the Research Data Access and Preservation Summit, ASIS&T

March 2011

Page 2: Altman RDAP11 Policy-based Data Management

Collaborators*

Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Kevin Condon, Jonathan Crabtree, Merce Crosas, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Lois Timms-Ferrara, Akio Sone, Bob Treacy

Research Support

Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.

* And co-conspirators

Page 3: Altman RDAP11 Policy-based Data Management

Related Work


Reprints available from: http://maltman.hmdc.harvard.edu

Altman, M., and J. Crabtree, 2011. “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS”, Proceedings of Archiving 2011. (Forthcoming)

Altman, M., Beecher, B., and Crabtree, J.; with L. Andreev, E. Bachman, A. Buchbinder, S. Burling, P. King, M. Maynard. 2009. "A Prototype Platform for Policy-Based Archival Replication." Against the Grain. 21(2): 44-47.

Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist. 72(1): 169-182

Crosas, M. 2011, “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data”, D-Lib Magazine 17(1/2).

King, G. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research. 36(2): 173-199.

Gutmann, M., Abrahamson, M., Adams, M.O., Altman, M., Arms, C., Bollen, K., Carlson, M., Crabtree, J., Donakowski, D., King, G., Lyle, J., Maynard, M., Pienta, A., Rockwell, R., Timms-Ferrara, L., Young, C. 2009. "From Preserving the Past to Preserving the Future: The Data-PASS Project and the challenges of preserving digital social science data." Library Trends. 57(3): 315-33

Page 4: Altman RDAP11 Policy-based Data Management

SafeArchive: TRAC-Based Management of LOCKSS

Facilitating collaborative replication and preservation with technology…

• Collaborators declare explicit non-uniform resource commitments
• Policy records commitments, storage network properties
• Storage layer provides replication, integrity, freshness, versioning
• SafeArchive software provides monitoring, auditing, and provisioning
• Content is harvested through HTTP (LOCKSS) or OAI-PMH
• Integration of LOCKSS, The Dataverse Network, TRAC
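The idea that a policy records both each collaborator's resource commitment and the replication expected of the storage network can be sketched as a small data structure. What follows is a minimal illustration in Python, not SafeArchive's actual policy schema: the class names, fields, host names, and sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """Storage a partner pledges to the network (hypothetical fields)."""
    host: str
    storage_gb: int


@dataclass(frozen=True)
class Collection:
    """A unit of content and the number of replicas the policy requires."""
    name: str
    size_gb: int
    min_copies: int


@dataclass(frozen=True)
class Policy:
    commitments: tuple
    collections: tuple

    def total_committed_gb(self) -> int:
        return sum(c.storage_gb for c in self.commitments)

    def total_required_gb(self) -> int:
        # Every replica consumes the full size of its collection.
        return sum(c.size_gb * c.min_copies for c in self.collections)

    def is_feasible(self) -> bool:
        """Necessary (not sufficient) conditions for meeting the policy."""
        enough_hosts = all(
            c.min_copies <= len(self.commitments) for c in self.collections
        )
        enough_space = self.total_required_gb() <= self.total_committed_gb()
        return enough_hosts and enough_space


# Non-uniform commitments: each partner pledges a different amount.
policy = Policy(
    commitments=(
        Commitment("archive-a", 500),
        Commitment("archive-b", 300),
        Commitment("archive-c", 100),
    ),
    collections=(
        Collection("surveys", 100, 3),
        Collection("polls", 150, 2),
    ),
)
print(policy.total_required_gb(), policy.total_committed_gb(), policy.is_feasible())
```

The feasibility check only rules out policies that can never be satisfied. Deciding where each replica actually goes is a separate placement problem.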

Page 5: Altman RDAP11 Policy-based Data Management

Adding Policy to LOCKSS

LOCKSS
• Lots of Copies Keep Stuff Safe
• Widely used in library community
• Self-contained OSS replication system, low maintenance, inexpensive
• Harvests resources via web-crawling, OAI-PMH, database queries,…
• Maintains copies through secure p2p protocol
• Zero trust & self repairing

What does SafeArchive add?
• Auditing – easily monitor number of copies of content in network
• Provisioning – ensure sufficient copies and distribution
• Collaboration – coordinate across partners, monitor resource commitments
• Provide restoration guarantees
• Integrate with Dataverse Network digital repository
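To make the provisioning bullet concrete, here is a hedged sketch of one way to place replicas so that every collection gets enough copies on distinct hosts. It uses a simple greedy heuristic chosen for illustration; the slides do not describe SafeArchive's provisioning algorithm, and the function name, host names, and sizes are assumptions.

```python
def provision(capacity_gb, collections):
    """Greedy placement: put each replica on the host with the most free space.

    capacity_gb: {host: committed storage in GB}
    collections: {name: (size_gb, min_copies)}
    Returns {name: [hosts]}; raises ValueError if a collection cannot be placed.
    """
    free = dict(capacity_gb)
    placement = {}
    # Place the largest collections first, since they are the hardest to fit.
    ordered = sorted(collections.items(), key=lambda kv: (-kv[1][0], kv[0]))
    for name, (size, copies) in ordered:
        # Hosts with room for one replica, most free space first.
        candidates = sorted(
            (h for h in free if free[h] >= size),
            key=lambda h: (-free[h], h),
        )
        if len(candidates) < copies:
            raise ValueError(f"cannot place {copies} copies of {name!r}")
        chosen = candidates[:copies]  # distinct hosts by construction
        for host in chosen:
            free[host] -= size
        placement[name] = chosen
    return placement


plan = provision(
    {"archive-a": 500, "archive-b": 300, "archive-c": 100},
    {"surveys": (100, 3), "polls": (150, 2)},
)
for name, hosts in sorted(plan.items()):
    print(name, hosts)
```

A greedy pass like this can fail on inputs that an exhaustive search would solve, so the `ValueError` means "not placed by this heuristic" rather than "provably impossible".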

Page 6: Altman RDAP11 Policy-based Data Management

Why this tool?

To help institutions make commitments aligned with their policies and incentives, and

to automatically execute and monitor those commitments and policies

(Self-interest… Support Data-PASS partnership agreements and transfer protocols)

This tool provides a targeted vertical slice of functionality through the policy stack…


Page 7: Altman RDAP11 Policy-based Data Management

Another Why…


R.I.P.

Page 8: Altman RDAP11 Policy-based Data Management

SafeArchive Components

[Figure: SafeArchive components diagram; legend: Current, Planned]

Page 9: Altman RDAP11 Policy-based Data Management

SafeArchive Auditing & Reports

[Figure: Example Fragments]

Page 10: Altman RDAP11 Policy-based Data Management

SafeArchive: TRAC Alignment

SafeArchive audits provide evidence for compliance with policies on:
• archival storage & preservation (B4)
• independent audit mechanisms (B2)
• appropriate system infrastructure (C1)
• and disaster planning and recovery (C3)

SafeArchive supports embedded policy documentation:
• Organizational infrastructure (A1-4)
• Collection policies (B2.5, 2.7, 5.2)
• System configuration (C1.7-1.10)

Page 11: Altman RDAP11 Policy-based Data Management

SafeArchive: Schematizing Policy and Behavior


“The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies.”

Policy

Schematization

Behavior (Operationalization)
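The three labels on this slide describe a chain: a policy sentence is turned into a measurable rule, and the rule is turned into something software can check. A minimal Python sketch of that chain for the quoted requirement follows. The threshold of three copies, the schema layout, and the object and host names are illustrative assumptions, not values taken from SafeArchive or TRAC.

```python
# Policy (from the slide): the repository must be able to identify the number
# of copies of every stored object and the location of each copy.

# Schematization: the requirement expressed as a measurable rule.
# The threshold is a hypothetical example value.
SCHEMA = {"min_copies": 3}


def audit(holdings, schema=SCHEMA):
    """Behavior: count copies, list locations, and flag shortfalls.

    holdings: {object_id: set of hosts reporting a verified copy}
    """
    report = {}
    for obj, hosts in sorted(holdings.items()):
        report[obj] = {
            "copies": len(hosts),
            "locations": sorted(hosts),
            "compliant": len(hosts) >= schema["min_copies"],
        }
    return report


observed = {
    "study-001": {"archive-a", "archive-b", "archive-c"},
    "study-002": {"archive-a"},
}
report = audit(observed)
noncompliant = [obj for obj, row in report.items() if not row["compliant"]]
print(noncompliant)
```

Keeping the schema as data rather than hard-coding it in `audit` means the same check can run against a different policy without changing the code.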

Page 12: Altman RDAP11 Policy-based Data Management

The Dataverse Network ®

For Scholars
• Brand it like your own website.
• Upload any type of data.
• Establish a persistent data citation
• Facilitate data discovery
• Provide live analysis
• Receive permanent storage space

For Organizations
• Used by archives, libraries, journals, schools
• Enable contributors to upload data
• Organize studies by collections
• Search across a universe of data
• Control access and terms of use
• Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI

Page 13: Altman RDAP11 Policy-based Data Management

Dataverse Network – Designed for Research Data


Page 14: Altman RDAP11 Policy-based Data Management

Policy Support in the Dataverse Network

Access Control
• Roles: access, curation, administration
• Authenticate by: user, group, network, proxy

Workflow Policies
• Built-in Versioning and Deaccessioning
• Curatorial Review
  ◦ Review of changes prior to release of new version
  ◦ Review of new virtual archives

Legal Policies
• Terms of use: accounts, uploads, downloads
• Hierarchical terms: network, archive, study
• Access request workflow
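One way to read "hierarchical terms: network, archive, study" is that terms of use set at a broader level also apply to everything beneath it. The Python sketch below illustrates that reading only. The slide does not say how the Dataverse Network combines terms, so the accumulation rule, the function names, and the sample terms are all assumptions.

```python
# Levels from the slide, ordered from broadest to narrowest.
LEVELS = ("network", "archive", "study")


def effective_terms(terms_by_level):
    """Collect the terms that apply to a study, broadest level first.

    Assumes terms accumulate down the hierarchy; levels without terms are skipped.
    """
    return [
        (level, terms_by_level[level])
        for level in LEVELS
        if terms_by_level.get(level)
    ]


def outstanding_terms(terms_by_level, accepted_levels):
    """Levels whose terms the user has not yet accepted."""
    return [
        level
        for level, _ in effective_terms(terms_by_level)
        if level not in accepted_levels
    ]


terms = {
    "network": "Cite the data in any publication.",
    "archive": None,  # this archive adds no terms of its own
    "study": "Do not redistribute the files.",
}
print(effective_terms(terms))
print(outstanding_terms(terms, accepted_levels={"network"}))
```

Under this reading, a download would be allowed only once `outstanding_terms` returns an empty list for the requesting user.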

Page 15: Altman RDAP11 Policy-based Data Management

Archival Collaboration through shared infrastructure: Data-PASS

Data-PASS is a broad-based partnership of social science data archives.

Data-PASS partners collaborate to:
• identify and promote good archival practices
• seek out at-risk research data
• mutually safeguard collections
• build preservation infrastructure

Data-PASS uses Dataverse:
• Creates federated catalog
• Manages content for some partners
• Provides simple way for organizations to participate in partnership

Data-PASS uses SafeArchive:
• Collaboration through mutual replication of partner content
• Supports legal transfer agreements

Page 16: Altman RDAP11 Policy-based Data Management

Where Do Policies Fit in Organizational Decisions?

[Figure labels: NSDA, LOCKSS, META-ARCHIVE, DATA-PASS, SAFE, DVN, IRODS]

Page 17: Altman RDAP11 Policy-based Data Management

Ideal integration of policy and technology?

• Expressed in domain/business language
• Translated to a formal schematization
• Automatically measured by technology
• Directly controls procedures & actions to achieve compliance
• Verifiable translation from business domain policy

Where do we go from here?
• Combine flexibility of IRODS and semantic level of TRAC
• Self-documenting infrastructure
• Formal verifiable translation of policy to schema, and schema to action
• Make good policy easy to implement!

Policy: A set of rules and objectives expressed at a high level domain that controls actions at a lower level

Page 18: Altman RDAP11 Policy-based Data Management

Contact Us


Micah Altman

maltman.hmdc.harvard.edu

SafeArchive

safearchive.org

The Dataverse Network ™

thedata.org