The Library of Congress Martha Anderson Program Officer, NDIIPP Office of Strategic Initiatives Library of Congress April 2005 LC Perspective : Preservation Partnerships

The Library of Congress

Martha Anderson

Program Officer, NDIIPP
Office of Strategic Initiatives
Library of Congress
April 2005

LC Perspective: Preservation Partnerships

2 The Library of Congress

Born Digital “At-Risk” Web Sites

http://www.loc.gov/minerva/collect/elec2000

http://www.loc.gov/minerva/collect/sept11

3 The Library of Congress

Take Actions that are

• Catalytic
  – Invest in existing strengths

• Collaborative
  – Engage partners in areas of mutual interest and expertise

• Iterative
  – Learn by doing

• Strategic
  – Broad spectrum of balanced short-term & long-term investments

NDIIPP Strategic Direction

4 The Library of Congress

Web of projects

[Diagram labels: UIUC, NARA, GPO, LC Web Projects, IIPC, NDIIP, CDL, IA, AIHT, Preservation Partners, States Initiative]

5 The Library of Congress

Library of Congress Web Archiving Strategy

• Collaborate with partners working on the same preservation issues

• Develop collection strategies to leverage available resources

• Learn by doing

6 The Library of Congress

Collaborate with partners working on the same preservation issues

• Membership in the International Internet Preservation Consortium (IIPC)

• Cooperative projects with NDIIPP Preservation Partners
  – California Digital Library
  – University of Illinois at Urbana-Champaign

• Technical information sharing with other US government agencies
  – Government Printing Office
  – National Archives and Records Administration

7 The Library of Congress

Develop collection strategies to leverage available resources

• Collect thematically, both by crawling and by acquiring collections gathered by others

Learn by doing

• Case studies and regular collection of theme-based collections

• Participate in tools development with IIPC

• Archive Ingest & Handling Test

8 The Library of Congress

Challenges of collecting from the Web

• Characteristics of the resource: dynamic, deep, linked

• Intellectual property laws and regulations

• Tension of preservation vs access goals

• Degree of alignment with current collection policies for other media

• Curation strategy

• Tools for identification and selection

• Tools for collection, curation, and archiving of large web collections

9 The Library of Congress

Average Web Collection

• Begins with a theme or event

• Usually does not include commercial sites

• Starts with a list of about 200 URLs

• Is crawled by a vendor

• Yields about 1 TB of data per month

• Has a frequency of once a week
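The figures on this slide can be combined into a rough per-crawl estimate. The sketch below is only back-of-the-envelope arithmetic: the seed count, monthly volume, and weekly frequency come from the slide, while the weeks-per-month factor, the decimal TB-to-GB conversion, and the function names are assumptions made for illustration.

```python
# Rough sizing for the "average" web collection described on the slide.
SEED_URLS = 200            # slide: "a list of about 200 urls"
TB_PER_MONTH = 1.0         # slide: "about 1 TB of data per month"
CRAWLS_PER_WEEK = 1        # slide: "a frequency of once a week"

WEEKS_PER_MONTH = 52 / 12  # assumption: average number of weeks in a month
GB_PER_TB = 1000           # assumption: decimal units


def per_crawl_gb(tb_per_month: float = TB_PER_MONTH) -> float:
    """Approximate volume captured by a single weekly crawl, in GB."""
    crawls_per_month = CRAWLS_PER_WEEK * WEEKS_PER_MONTH
    return tb_per_month * GB_PER_TB / crawls_per_month


def per_seed_gb(seeds: int = SEED_URLS) -> float:
    """Approximate volume per seed URL per crawl, in GB."""
    return per_crawl_gb() / seeds


if __name__ == "__main__":
    print(f"per crawl: ~{per_crawl_gb():.0f} GB")
    print(f"per seed per crawl: ~{per_seed_gb():.2f} GB")
```

Because every input is an "about" figure, the results are order-of-magnitude estimates rather than measurements.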

10 The Library of Congress

Web Collections to date at LC

• Event-based
  – US National Elections: 2000, 2002, 2004
  – War in Iraq
  – September 11

• Public Policy Topics
  – Health Care
  – Legislative Branch
  – Terrorism

• 26 TB

11 The Library of Congress

Archive Ingest & Handling Test

• AIHT is a first test of the proposed NDIIPP preservation architecture.

• The test is conducted with a common data set.
  – George Mason University 9/11 Archive

• Phase I tests ingest and data handling in local systems.

• Phase II tests export and import between institutions.

• Phase III explores format migration.
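The slides do not say how participants checked that an archive survived ingest or an export/import round trip. One common approach, shown here purely as an assumption, is to build a checksum manifest before transfer and compare it with one built afterwards. The function names `build_manifest` and `compare` are hypothetical, and SHA-256 is an illustrative choice of digest.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch,
            # but large files would call for chunked reads.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare(sent: dict[str, str], received: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed after transfer."""
    common = sent.keys() & received.keys()
    return {
        "missing": sorted(set(sent) - set(received)),
        "unexpected": sorted(set(received) - set(sent)),
        "changed": sorted(k for k in common if sent[k] != received[k]),
    }
```

A sending institution would run `build_manifest` on its copy, the receiving institution would run it on what arrived, and `compare` would list any discrepancies between the two.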

12 The Library of Congress

GMU 9/11 Archive

Participants demonstrate capabilities

Participants exchange archive

13 The Library of Congress

Participants

• Old Dominion University, Department of Computer Science

• Stanford University Libraries & Academic Information Resources

• The Johns Hopkins University, Sheridan Libraries

• Harvard University Library

14 The Library of Congress

George Mason University 9/11 Archive: Breakdown by File Types

• TEXT: 35%
• HTML: 29%
• IMAGES: 27%
• AUDIO: 4%
• PDF: 3%
• OTHER: 2%
• VIDEO: 0.2%

57,450+ files
12 GB
Originally stored in a Linux environment
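The percentages on this slide can be turned into rough per-type file counts. The sketch below assumes the chart is a breakdown by number of files rather than by bytes, which the slide does not state. It also treats 57,450 as the total even though the slide gives "57,450+", and it normalizes by the sum of the shares because the rounded percentages add up to slightly more than 100.

```python
# Shares by file type, in percent, as shown on the slide.
SHARES = {
    "TEXT": 35,
    "HTML": 29,
    "IMAGES": 27,
    "AUDIO": 4,
    "PDF": 3,
    "OTHER": 2,
    "VIDEO": 0.2,
}

TOTAL_FILES = 57_450  # slide says "57,450+", so this is a lower bound


def estimated_counts(shares: dict, total: int) -> dict:
    """Estimate files per type, assuming the shares are by file count."""
    share_sum = sum(shares.values())  # normalize away rounding in the chart
    return {name: round(total * pct / share_sum) for name, pct in shares.items()}


if __name__ == "__main__":
    for name, count in estimated_counts(SHARES, TOTAL_FILES).items():
        print(f"{name:<7} ~{count}")
```

If the chart is actually by storage size, the same function applies with the 12 GB total in place of the file count.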

15 The Library of Congress

Goals of AIHT

• Gain practical experience with multiple institutions

• Document transfer and ingest processes for multiple systems

• Determine next set of tasks for developing interfaces between layers and institutions

16 The Library of Congress

Status of AIHT

• All phases completed.
  – Imports focused on technical assessment of the archive and on developing tools to examine the archive
  – Exports included METS and MPEG-21 DID objects
  – Migrations included transforms to JPEG 2000 and TIFF, and some exploration of HTML to XML and AVI to MPG

• Full report expected by early summer.

17 The Library of Congress

For more information

• NDIIPP Technical Architecture version 0.2 http://www.digitalpreservation.gov

• International Internet Preservation Consortium http://netpreserve.org/about/index.php

• MINERVA: Mapping the INternet Electronic Resources Virtual Archive http://www.loc.gov/minerva/

18 The Library of Congress

Martha Anderson
NDIIP Program Officer
Office of Strategic Initiatives
The Library of Congress
Washington, DC

[email protected]