A Platform for Auditable, Distributed, Asymmetric Archival Replication

Post on 12-Jan-2016


Micah Altman
Associate Director, Harvard-MIT Data Center
Institute for Quantitative Social Science, Harvard University

Bryan Beecher
Director of Computing and Network Services
Inter-university Consortium for Political and Social Research, University of Michigan

Marc Maynard
Director of Technical Services
The Roper Center for Public Opinion Research, University of Connecticut

Jonathan Crabtree
Assistant Director for Archives and Information Technology
H.W. Odum Institute for Research in Social Science, University of North Carolina

CNI 2008 Fall Task Force Meeting 1

Our Story
• Who are you guys?
• What problem are you trying to solve?
• What have you done?
• Why do we care?

Data-PASS
• Partnership devoted to identifying, acquiring, and preserving data at risk of being lost to the social science research community
• Partners
  – ICPSR
  – Odum Institute
  – Harvard-MIT Data Center
  – Roper Center
  – National Archives

http://flickr.com/photos/phauly/35555985/

Data-PASS

Data-PASS
• Lots of little files (social science data)
  – ASCII data files
  – PDF technical documentation (codebooks)
  – Millions of ’em
• Archival storage
  – Was tape
  – Now disk

Before

After

Archival storage?

http://failblog.org/2008/02/08/floppy-fail/

Archival storage?
• Remote disks
• Grids
• Clouds
• With partners?

Why roll your own?
• Policy-driven
• Auditable
• Asymmetric
• Independence of each location

Syndicated Storage Platform (SSP)
• Start with LOCKSS
  – Lots of Copies Keep Stuff Safe
  – But used in a closed network
• Private LOCKSS Network (PLN)
  – A few of them out there
  – MetaArchive is perhaps the best known
  – Biggest selling point was independence of each node in the PLN

PLNs
• LOCKSS is really easy to set up
• PLNs are more difficult
• Other differences between a traditional PLN and our needs
  – Our content isn’t harvestable via HTTP
  – Our PLN nodes are different sizes
  – Our trust model prevents a centralized authority from controlling the network

SSP = Stone Soup Platform?
• ICPSR and Odum set up a small PLN
• HMDC provided a harvester and designed the schema
• Odum built the Comparator
• Roper is building the Invitor

PLN

Schema
• Nodes
  – IP address
  – Storage commitment
• AUs (archival units)
  – Max size
  – # in the PLN
• Lots more
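
The schema items above can be pictured as a small data model. The following is only a sketch: the slides do not give the real schema format, so every class and field name here is hypothetical, "# in the PLN" is assumed to mean the number of copies an AU should have across the network, and the gigabyte units are an assumption. It also shows one policy check such a schema would allow: whether the partners' storage commitments can cover the worst-case replication load.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One PLN node (hypothetical field names)."""
    ip_address: str
    storage_commitment_gb: int  # storage the partner has pledged (unit assumed)


@dataclass
class ArchivalUnit:
    """One AU and its replication policy (hypothetical field names)."""
    au_id: str
    max_size_gb: int
    copies_required: int  # assumed reading of "# in the PLN"


@dataclass
class NetworkSchema:
    nodes: List[Node] = field(default_factory=list)
    aus: List[ArchivalUnit] = field(default_factory=list)

    def committed_storage_gb(self) -> int:
        """Total storage pledged by all nodes."""
        return sum(n.storage_commitment_gb for n in self.nodes)

    def required_storage_gb(self) -> int:
        """Worst case: every AU at its max size, times its required copies."""
        return sum(au.max_size_gb * au.copies_required for au in self.aus)

    def is_feasible(self) -> bool:
        """True if the pledged storage covers the worst-case requirement."""
        return self.required_storage_gb() <= self.committed_storage_gb()


# Nodes of different sizes, as in an asymmetric network.
schema = NetworkSchema(
    nodes=[Node("192.0.2.10", 500), Node("192.0.2.11", 200), Node("192.0.2.12", 100)],
    aus=[ArchivalUnit("au-001", 50, 3), ArchivalUnit("au-002", 100, 2)],
)
print(schema.required_storage_gb(), schema.committed_storage_gb(), schema.is_feasible())
```

The aggregate check is deliberately simple; a real placement decision would also have to fit each copy onto an individual node.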

Comparator
• diff for our SSP
• Compares
  – Contents of the LOCKSS Cache Manager [sic]
  – Schema
• Produces
  – List of differences between “what is” and “what should be”
  – Feeds into another tool for “fixing the PLN”
• Machine-actionable output (XML)
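
The comparison described above amounts to a set difference per node, serialized as XML. The sketch below is not the Odum tool: the actual report format is not shown in the slides, so the element and attribute names (`comparison`, `difference`, `node`, `au`, `action`) are made up for illustration, and both inputs are assumed to have already been reduced to a mapping from node name to the set of AU ids it should hold ("what should be") or does hold ("what is").

```python
import xml.etree.ElementTree as ET


def compare(desired, actual):
    """Return (node, au_id, action) tuples that would make `actual` match `desired`."""
    diffs = []
    for node in sorted(set(desired) | set(actual)):
        want = desired.get(node, set())
        have = actual.get(node, set())
        for au in sorted(want - have):  # should be held but is not
            diffs.append((node, au, "ADD"))
        for au in sorted(have - want):  # is held but should not be
            diffs.append((node, au, "DROP"))
    return diffs


def to_xml(diffs):
    """Serialize the differences as a machine-actionable XML report (hypothetical format)."""
    root = ET.Element("comparison")
    for node, au, action in diffs:
        ET.SubElement(root, "difference", node=node, au=au, action=action)
    return ET.tostring(root, encoding="unicode")


desired = {"icpsr": {"au-001", "au-002"}, "odum": {"au-001"}}  # from the schema
actual = {"icpsr": {"au-001"}, "odum": {"au-001", "au-003"}}   # from the caches
diffs = compare(desired, actual)
report = to_xml(diffs)
print(report)
```

An empty report means the network already matches the schema, so the same run doubles as an audit.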

Invitor
• Reads the report from the Comparator
• Issues requests to PLN nodes to ADD or DROP an AU
  – Expectation is that PLN nodes always accept an ADD if they can
    • An offer they cannot refuse
• Requests may be reviewed/approved by a human administrator (or not)
• USENET news technology?
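
The Invitor flow above can be sketched as follows. This is an illustration under stated assumptions, not the Roper tool, which was still being built: the XML report format is hypothetical, the `approve` callback merely stands in for optional human review, `node_accepts` encodes the "accept an ADD if they can" expectation as a free-space check, and treating a DROP as always honored is an added assumption. How requests actually travel to the nodes is left out, since the slides leave that open.

```python
import xml.etree.ElementTree as ET

# Hypothetical Comparator report; the real format is not given in the slides.
REPORT = """<comparison>
  <difference node="icpsr" au="au-002" action="ADD"/>
  <difference node="odum" au="au-003" action="DROP"/>
</comparison>"""


def build_requests(report_xml, approve=None):
    """Turn a Comparator report into a list of ADD/DROP requests.

    `approve` is an optional callback standing in for human review;
    when it is omitted, every request is issued automatically.
    """
    requests = []
    for diff in ET.fromstring(report_xml).iter("difference"):
        request = {
            "node": diff.get("node"),
            "au": diff.get("au"),
            "action": diff.get("action"),
        }
        if request["action"] not in ("ADD", "DROP"):
            raise ValueError("unknown action: %r" % request["action"])
        if approve is None or approve(request):
            requests.append(request)
    return requests


def node_accepts(request, free_gb, au_size_gb):
    """Node-side rule: accept an ADD whenever there is room (assumed check)."""
    if request["action"] == "ADD":
        return au_size_gb <= free_gb
    return True  # assumption: a DROP is always honored


all_requests = build_requests(REPORT)
adds_only = build_requests(REPORT, approve=lambda r: r["action"] == "ADD")
print(all_requests)
print(adds_only)
```

Keeping approval as a pluggable callback lets the same code run unattended or behind an administrator's sign-off, matching the "(or not)" on the slide.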

Summary
• Data-PASS is a group of archives committed to preserving social science data
• Exploring various technology options
• One avenue is a custom LOCKSS deployment
  – Network schema
  – OAI data harvester
  – Comparison tool
  – Network update tool
