Harvesting HathiTrust Documents: A New Model for Online Access

Christopher C. Brown, University of Denver, Penrose Library, (303) 871-3404, [email protected]. 2011 Missouri Government Documents Conference.


Brown, Christopher C. “Harvesting HathiTrust Documents: A New Model for Online Access.” Presentation given at the 2011 Missouri Government Documents Conference, 7 June 2011, Columbia, MO.

Transcript of Harvesting HathiTrust Documents: A New Model for Online Access

Page 1: Harvesting HathiTrust Documents: A New Model for Online  Access

Christopher C. Brown

University of Denver, Penrose Library

(303) 871-3404

[email protected]

2011 Missouri Government Documents Conference

Harvesting HathiTrust Documents: A New Model for Online Access

Page 2

This presentation will show how Encore harvesting can be used to mitigate a space problem in a library by substituting online access for physical access to the collection. The government documents collection will be the primary focus.

Note: Encore is the next-generation catalog interface produced by Innovative Interfaces, Inc.

DR, IR, Digital Texts

Inbound Harvesting

Outbound Harvesting

Page 3

Collection Downsizing?

Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/2011-01.pdf.

Page 4

About University of Denver

• Depository since 1909
• Historically a 70-75% selective
• Now a 4.8% selective, but receive 100% of online cataloging
• Adding URLs to historic documents
• Currently 100% of our paper documents are in storage
• We are remodeling our library. Under the remodeling plan, all docs will remain in remote storage.

Page 5

Partial Solution: Using Encore for Outbound Harvesting

• All documents off-site
• Our users are accustomed to using electronic documents
• Need to divert attention away from physical collection holdings
• Encore harvesting of Hathi Trust can do this

Page 6

PD = where docs generally live

Hathi Trust Attributes

From: http://www.hathitrust.org/rights_database

Page 7

Sampling Method

• I wanted to see how many government documents were in our Hathi Trust harvest
• Limit to Hathi Trust for a given year
• Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]
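The sampling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_positions` and `estimate_docs` are hypothetical, and the judgment of whether a sampled record is a government document is made by a person and passed in as a list of booleans.

```python
def sample_positions(total_results, page_size=25, display_cap=1000):
    """Return the 1-based positions of the first result on each results page.

    Only the first `display_cap` results can be examined, mirroring the
    limitation noted on the slide.
    """
    visible = min(total_results, display_cap)
    return list(range(1, visible + 1, page_size))


def estimate_docs(total_results, sampled_is_doc):
    """Scale the share of docs found in the sample up to the full result set.

    `sampled_is_doc` holds one True/False judgment per sampled record.
    Returns (share of docs in the sample, estimated number of docs).
    """
    if not sampled_is_doc:
        raise ValueError("at least one sampled record is required")
    share = sum(sampled_is_doc) / len(sampled_is_doc)
    return share, round(share * total_results)


# Illustrative values only: a result set of 13,369 records yields 40 sample
# positions, because just the first 1,000 results are displayed.
positions = sample_positions(13369)
share, estimate = estimate_docs(2000, [True] * 30 + [False] * 10)
```

Taking one record from each page of 25 gives the 4% sampling rate quoted on the slide, but only within the first 1,000 displayed results, so larger result sets are sampled more thinly than 4%.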

Page 8

Harvesting Hathi Docs: The Stats

Date Range   Hathi Totals   Hathi All Pub Domain   Hathi pdus   DU pd Harvest   Docs Sampling   Docs Sampling
                                     (pdus + pd)                                  (est. docs)        (est. %)
2000-2009         505,682                 14,140          726          13,369          13,340          99.78%
1990-1999         709,214                 29,163          880          28,164          26,662          94.67%
1980-1989         723,657                 33,753        1,204          32,321          31,370          97.06%
1970-1979         631,110                 28,633        2,046          26,189          25,607          97.78%
1960-1969         546,914                 21,244        1,987          18,991           7,668          40.38%
1950-1959         281,615                 20,861          863          19,893           3,888          19.54%
1940-1949         184,755                 17,096          600          16,253           3,771          23.21%
1930-1939         175,103                 16,237          654          15,317           2,600          16.97%
1920-1929         175,226                 66,563       27,108          28,854           1,529           5.30%
1910-1919         175,148                169,923       75,955          61,230           4,124           6.73%
1900-1909         179,018                153,284       70,900          47,999           2,265           4.72%
1890-1899         112,295                110,605       50,502          34,742             596           1.72%
1880-1889          83,950                 82,809       38,928          23,855             699           2.93%
1870-1879          58,624                 57,826       27,202          17,751             319           1.80%
1860-1869          50,907                 50,337        2,273          45,790             248           0.54%
Total           4,593,218                872,474      301,828         430,718         124,686          28.95%

Statistics as of mid-March 2011. The Docs Sampling columns show the estimated number of docs and the estimated percentage of docs in the Harvest for each date range.
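The percentage column appears to be the estimated docs count divided by the DU pd Harvest count for the same date range. The sketch below recomputes it from the figures in the table. The name `docs_share` is hypothetical, and the derivation is inferred from the numbers rather than stated on the slide.

```python
# (date range, DU pd Harvest count, estimated docs) copied from the table.
ROWS = [
    ("2000-2009", 13369, 13340),
    ("1990-1999", 28164, 26662),
    ("1980-1989", 32321, 31370),
    ("1970-1979", 26189, 25607),
    ("1960-1969", 18991, 7668),
    ("1950-1959", 19893, 3888),
    ("1940-1949", 16253, 3771),
    ("1930-1939", 15317, 2600),
    ("1920-1929", 28854, 1529),
    ("1910-1919", 61230, 4124),
    ("1900-1909", 47999, 2265),
    ("1890-1899", 34742, 596),
    ("1880-1889", 23855, 699),
    ("1870-1879", 17751, 319),
    ("1860-1869", 45790, 248),
]


def docs_share(harvest_count, estimated_docs):
    """Return estimated docs as a percentage of harvested pd records."""
    return 100 * estimated_docs / harvest_count


total_harvest = sum(harvest for _, harvest, _ in ROWS)
total_docs = sum(docs for _, _, docs in ROWS)

for label, harvest, docs in ROWS:
    print(f"{label}: {docs_share(harvest, docs):.2f}%")
print(f"Total: {docs_share(total_harvest, total_docs):.2f}%")
```

The column sums reproduce the totals row (430,718 harvested records and 124,686 estimated docs), which gives the overall 28.95%.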

Page 9

Malpas: Docs about 3% of Hathi Total and 15% of Public Domain

GovDocs: 3% overall

GovDocs: 15% of Public Domain
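As a rough cross-check, the totals from the stats slide can be set against these two figures. This is a sketch under a loud assumption: it treats the 124,686 docs estimated in the DU harvest as a stand-in for docs in HathiTrust overall, which the slides do not claim, and the sampling covers only 1860-2009. The helper name `percent` is hypothetical.

```python
# Totals row from the stats slide (statistics as of mid-March 2011).
HATHI_TOTAL = 4_593_218          # all Hathi records, 1860-2009
HATHI_PUBLIC_DOMAIN = 872_474    # pdus + pd
ESTIMATED_DOCS = 124_686         # docs estimated by sampling the DU harvest


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


share_of_total = percent(ESTIMATED_DOCS, HATHI_TOTAL)
share_of_public_domain = percent(ESTIMATED_DOCS, HATHI_PUBLIC_DOMAIN)

print(f"Docs as share of Hathi total: {share_of_total:.2f}%")
print(f"Docs as share of public domain: {share_of_public_domain:.2f}%")
```

Under that assumption the shares come out at about 2.7% and 14.3%, in the same range as the 3% and 15% reported by Malpas, although the two sets of figures were produced by different methods.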

Page 10

Hathi Docs Usage in Proportion to Docs Distribution

Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest); 1976 onward data: CGP

Page 11

% Docs in HathiTrust (est.)

Page 12

Hathi Docs Links Provide Access to Docs in Storage

Page 13

Stripped-Out Fields

008 fixed field data

650 subfields other than “a”

500 notes

5xx shipping list info

300 subfields after “a”

086 SuDocs number
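To make the effect of the stripping concrete, here is a minimal sketch that applies the listed rules to a simplified record. Everything in it is an assumption for illustration: the record is a plain list of (tag, subfields) pairs rather than real MARC, the name `strip_record` and the sample values are hypothetical, "subfields after a" in the 300 field is approximated by keeping only subfield a, and the 5xx shipping list field is not modeled because the slide does not give its tag.

```python
# Hypothetical, simplified record model: a list of (tag, subfields) pairs,
# where subfields is a list of (code, value) pairs. Control fields such as
# 008 carry a single value with an empty subfield code.
DROPPED_TAGS = {"008", "086", "500"}    # removed entirely
SUBFIELD_A_ONLY = {"300", "650"}        # reduced to subfield "a"


def strip_record(record):
    """Return a copy of the record with the stripped-out data removed."""
    stripped = []
    for tag, subfields in record:
        if tag in DROPPED_TAGS:
            continue
        if tag in SUBFIELD_A_ONLY:
            subfields = [(code, value) for code, value in subfields if code == "a"]
        stripped.append((tag, subfields))
    return stripped


# Illustrative values only; not taken from any real catalog record.
sample = [
    ("008", [("", "illustrative fixed-field data")]),
    ("086", [("a", "illustrative SuDocs number")]),
    ("245", [("a", "Illustrative title")]),
    ("300", [("a", "100 p. :"), ("b", "ill. ;"), ("c", "28 cm")]),
    ("500", [("a", "Illustrative note")]),
    ("650", [("a", "Agriculture"), ("z", "United States")]),
]

harvested = strip_record(sample)
```

In this sketch the harvested record keeps only the title, the extent from the 300 field, and the main subject heading, so limits that depend on fixed-field data or searches by SuDocs number would not work against it.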

Page 14

Use Stats for Hathi Trust?

• Statistics are for all Hathi Trust records accessed, not just documents
• Spikes in usage are from the docs librarian's (my) testing, not real users

Statistics from Google Analytics

Page 15

Harvesting with Summon

Page 16

Summon Harvesting of HathiTrust

Page 17

Conclusions

Documents content in HathiTrust can provide a suitable surrogate for a limited subset of documents, but not a wholesale replacement.

HathiTrust documents can be used as surrogates for selected titles, especially larger serial runs. But it is difficult at this time to isolate those titles.

HathiTrust is definitely worth harvesting into local catalogs or other digital repositories.