Post on 27-Mar-2015
Brenda Johnson, Dean of University Libraries
Gary Charbonneau, Systems Librarian
Julie Bobay, Associate Dean for Collection Development and Scholarly Communication
Statewide IT Conference, Indiana University
Sept. 27, 2010
HathiTrust: A Big Idea with Bold Plans
HathiTrust - Outline
A Big Idea
• Mission and Goals; Partners; Governance
Content and Use
• Relationship to Google Books and Internet Archive
• Size, characteristics of content
• A few words about technology
Bold Plans
September 27, 2010, Statewide IT Conference, Indiana University
Importance of A Name
• Hathi (pronounced hah-tee)
Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength
• Trust
A core value of research libraries and one of their greatest assets.
In combination, the words convey the key benefits researchers can expect from a first-of-its-kind shared digital repository.
• There’s an elephant in the library.
What is HathiTrust?
• Started in 2008 as a partnership among research libraries, HathiTrust is an open web resource that aggregates, preserves and provides access to the collections of member libraries.
• Initial purpose was to provide trusted shared repository for books and journals digitized by and available through Google Books and Internet Archive
Google Books/Internet Archive
• In 2004, Google began digitizing the books and journals from many major research libraries in U.S. – including, starting in 2008, IU’s
• Some libraries, including the University of California, had similar digitization projects with the Internet Archive
• Books and journals digitized from these projects were deposited in HathiTrust
Current HathiTrust Partners: 29 and Counting
Columbia University
Dartmouth College
University of California system (11 libraries)
CIC (Committee on Institutional Cooperation) (12 libraries)
• University of Chicago
• University of Illinois
• Indiana University
• University of Iowa
• University of Michigan
• Michigan State University
• University of Minnesota
• Northwestern University
• Ohio State University
• Pennsylvania State University
• Purdue University
• University of Wisconsin, Madison
New York Public Library
Princeton University
University of Virginia
Yale University
If Google and Internet Archive have these books, why do we need HathiTrust?
HathiTrust’s mission is much broader than simply to replicate Google Books:
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Why do we need HathiTrust? (1)
Preservation…For The Long Term
• Better entrusted to research libraries than to a private corporation, even a benevolent one
• Not just preserving bits
• Full preservation program, including active curation, metadata, migration, management plans, etc.
• Seeking TRAC certification (Trustworthy Repositories Audit & Certification)
Why do we need HathiTrust? (2)
Expanded access and discoverability
• Full-text access to pre-1923 books and journals, plus those which have had rights cleared
• Beyond full-text keyword search: enhanced discoverability options
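The full-text access rule in the first bullet can be sketched as a simple check. This is only an illustration of the rule as stated on the slide, not HathiTrust's actual rights logic; the function name, parameter names, and constant are hypothetical.

```python
# Hypothetical sketch of the slide's access rule; not HathiTrust's real rights system.
PUBLIC_DOMAIN_CUTOFF_YEAR = 1923  # taken from the slide's "pre-1923" wording


def full_text_available(publication_year: int, rights_cleared: bool = False) -> bool:
    """Return True if a volume would be readable in full text under the slide's rule.

    A volume qualifies when it was published before 1923, or when its rights
    have been cleared (for example, through the copyright clearance project).
    """
    return publication_year < PUBLIC_DOMAIN_CUTOFF_YEAR or rights_cleared
```

Under this sketch, a volume from 1895 qualifies on date alone, while a 1950 volume qualifies only once its rights have been cleared.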
Why do we need HathiTrust? (3)
Focus on scholarly values and needs
• Develop content, access and functionality that meet the needs of researchers
• Share expertise and cost of preserving and providing access to the scholarly record among institutions that share this fundamental mission
HathiTrust: Getting Started
• Initial development responsibility: University of Michigan, with mirror site at IUPUI, administered by UITS Enterprise Infrastructure
• Much future development will be distributed among partner institutions under direction of HathiTrust Executive Committee
A Unique Partnership
• HathiTrust is library work at scale; an early example of an “above-campus” service
• A new experiment in collaboration
Not a separate entity; not a 501(c)(3) like Sakai, Kuali, DuraSpace or many open source software projects
Instead, a jointly-funded, jointly governed, jointly developed partnership.
• Together, we are HathiTrust.
Sustainability: HathiTrust Governance 2008-2012
• Executive Committee
Budget, finances, decision making
• Strategic Advisory Board
Guidance on policy and planning
• HathiTrust staff
• Working groups and committees
Current Working Groups
• Discovery Interface
• Collections
• Quality
• Communication
• Usability
• Storage
• Development Environment
• Research Center
Financial contributions of partners
HathiTrust Functional Framework
Next steps in governance
• 5-year agreements, reviewed in the third year of every term
• First Constitutional Convention will be in 2012
• Partners will determine governance structures and partnership models, effective 2013
Focus On Users
• Preservation…with access
• Benefits to IU researchers and their colleagues around the world:
– Ensure long-term preservation and access
– Increase discoverability
– Create scholarly tools
– Expand content beyond Google and Internet Archive
HathiTrust – constantly changing
• Rapid growth and development; fluid environment
• Next few slides describe HathiTrust currently
• Will follow with discussion about future plans
HathiTrust - Content
• The vast majority of what is currently in HathiTrust consists of files received from Google from volumes digitized by Google for Google Book Search
• Almost all of the remainder consists of files received from Internet Archive. Much of the content from University of California comes by way of Internet Archive
HathiTrust Content (2)
• Since not all of Google’s “library partners” are members of HathiTrust, and none of Google’s publisher partners are, HathiTrust is still (mostly) a subset of what is in Google Book Search. However….
HathiTrust Content (3)
• Because of HathiTrust’s copyright clearance project, there are some things available in full text in HathiTrust that are only available in “snippet view” in Google.
• Because of Internet Archive, there are probably some things in HathiTrust that are not available in Google at all.
HathiTrust - focus on collections
• HathiTrust is about collections, not simply Google digitization
• For example:
– access for persons with print disabilities
– opening access for public domain volumes
– collection building tool
– high-quality bibliographic data necessary for scholarly work
Content Growth
Content Distribution
Language Distribution (1)
Language Distribution (2)
Dates
Originating Institution
Content Over Time
HathiTrust DataGrid
• Using Isilon Clustered Storage System
• Similar principles to a datagrid using WAFS (OneFS)
– Wide Area File System (2.3 PB per file system)
– Automated data replication among nodes
– Currently Two Nodes
  • Ann Arbor – University of Michigan
  • Indianapolis – Indiana University NOC
• Connected via I-Light and Michigan Lambda Rail
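To make the idea of replicated storage across two nodes concrete, here is a minimal sketch of how one might check that a mirror copy matches the primary by comparing checksums. It does not describe how Isilon OneFS replicates data; the function names are hypothetical, and the two directories simply stand in for the Ann Arbor and Indianapolis nodes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a file's SHA-256 digest, reading in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_replicas(primary: Path, mirror: Path) -> dict:
    """Report files that are missing from the mirror or differ from the primary.

    Both arguments are ordinary directories here; they are stand-ins
    (an assumption for illustration) for the two storage nodes.
    """
    report = {"missing": [], "mismatched": []}
    for source in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = source.relative_to(primary)
        replica = mirror / relative
        if not replica.is_file():
            report["missing"].append(relative.as_posix())
        elif sha256_of(source) != sha256_of(replica):
            report["mismatched"].append(relative.as_posix())
    return report
```

An empty report (no missing and no mismatched files) would indicate that every file on the primary has an identical copy on the mirror.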
HathiTrust Grid
Indianapolis Ann Arbor
Isilon OneFS Currently Supports up to 2.3 PB between Two Nodes
More on HathiTrust Technology
http://www.hathitrust.org/technology
A Use Case
• IUB scholar needed quick access to a definitive 52-volume set of Voltaire’s work published in late 1800s; deadline approaching
• The set had been transferred to the Auxiliary Library Facility
• Available in HathiTrust and Google Books
• Google Books not usable for this scholarly purpose
• Able to do work much more efficiently and quickly in HathiTrust
HathiTrust’s Bold Plans
• We believe the HathiTrust of tomorrow will look very different from the HathiTrust of today
• Volumes digitized by Google and Internet Archive are just the beginning
• The sky’s the limit (or, more accurately, the combined will and resources of the partnership are the limit)
Vision for the future: More Content
• Current and backlist scholarly monographs
• Born-digital materials
• Some locally-digitized collections
• Some non-book/non-journal resources
…anything that is appropriate for a research library collection AND IS A SHARED PRIORITY FOR PARTNERS
Vision for the future: More Content (2)
• More full-text:
Google Book Settlement - if approved:
– could receive all Google-digitized files to preserve
– could make much more full-text available
• Rights-clearing project - open access to public domain materials
Vision for the Future: More Functionality
• Research tools
– Computational research
– Advanced collection builders
– Advanced discovery
• Expanded quality processes
• Rigorous preservation guarantees
• Defining paths for fair uses
• Tools for shared print collection management
Vision for the Future: Enhanced Discoverability
• Not just keyword searching of full-text
• Highly-functional bibliographic access
– HathiTrust catalog
– Integration into other discovery tools: IUCAT, WorldCat, Discovery Services
HathiTrust and local digital library initiatives
• HathiTrust is a solution for large-scale, shared high-priority needs of partners; currently optimized for digitized monographs and journals
• Partners will identify priorities for content and functionality development
• HathiTrust will not supplant all institutionally-based digital library initiatives
• Local digital library collections and services will still be needed
How Can HathiTrust Make a Difference?
• Future not yet known precisely, but…
• For the first time in history, HathiTrust has:
– defined a large-scale partnership to achieve a large-scale goal
– built the first version of a very large, high-quality shared repository
• Building blocks for ensuring that research collections, print and digital:
– are preserved, curated, highly discoverable and accessible
– retain their research value in a digital platform
Some lessons learned so far
• HathiTrust can serve as shared repository for mass digitized library collections
• HathiTrust can provide organizational structure for other collaborations
– Shared print collection management
– Bibliographic integration
• The research library community is able to collaborate deeply to attain shared goals
HathiTrust Mission - redux
Contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Credits
Our thanks to colleagues who generously granted us permission to use their slides for this presentation:
John Wilkin, HathiTrust Executive Director
Jeremy York, HathiTrust Project Librarian
Heather Christenson, Mass Digitization Project Manager, California Digital Library
Also, many of the ideas for this presentation are based on:
Courant, Paul N. and John Wilkin. “Building ‘Above Campus’ Library Services.” Educause Review, July/August 2010, 74-75.