BREEDING Web-Scale Discovery Systems Overview
2009 Annual ASERL Membership Meeting
Marshall Breeding
Director for Innovative Technology and Research
Vanderbilt University Library
http://www.librarytechnology.org/
Highly abstracted model of computing
Displaces the need for local hardware and software
Provisioned on demand
Metered use of storage and computing cycles
Platform-as-a-service
Storage-as-a-service
Emerging model for library discovery and automation
Increasingly dubbed “Web-scale”
Lots of non-library Web destinations deliver content to library patrons
◦ Google Scholar
◦ Amazon.com
◦ Wikipedia
◦ Ask.com
Do Library Web sites and catalogs meet the information needs of our users?
Do they attract their interest?
Print > Electronic
Increasing emphasis on subscribed content, especially articles and databases
Strong emphasis on digitizing local collections
New generations of library users:
◦ Millennial generation
◦ Web savvy
◦ Pervasive Web 2.0 concepts
Silos Prevail
◦ Books: Library OPAC (ILS module)
◦ Articles: Aggregated content products, e-journal collections
◦ OpenURL linking services
◦ E-journal finding aids (often managed by link resolver)
◦ Local digital collections: ETDs, photos, rich media collections
◦ Metasearch engines
All searched separately
More comprehensive information discovery environments
Primary search tool that extends beyond print resources
Digital resources cannot be an afterthought
Systems designed for e-content only are also problematic
Forcing users to use different interfaces depending on type of content is becoming less tenable
Libraries working toward consolidated user environments that give equal footing to digital and print resources
Bound handwritten catalogs
Card Catalogs
Library online catalogs – OPACs
Discovery interfaces
Web-scale discovery services
A single point of entry to all the content and services offered by the library
Search:
Single search box
Query tools
◦ Did you mean
◦ Type-ahead
Relevance ranked results
Faceted navigation
Enhanced visual displays
◦ Cover art
◦ Summaries, reviews
Recommendation services
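Two of the interface features listed above, faceted navigation and type-ahead, can be illustrated with a minimal Python sketch. The record fields (`title`, `format`, `year`) and the function names are assumptions made for illustration, not part of any product named in this talk.

```python
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"title": "Web-scale discovery", "format": "Article", "year": 2009},
    {"title": "Library catalogs", "format": "Book", "year": 2005},
    {"title": "Discovery interfaces", "format": "Book", "year": 2009},
]


def facet_counts(records, field):
    """Count how many records fall under each value of a facet field."""
    return Counter(r[field] for r in records)


def apply_facet(records, field, value):
    """Narrow a result set to the records matching a selected facet value."""
    return [r for r in records if r[field] == value]


def type_ahead(records, prefix, limit=5):
    """Suggest titles that start with what the user has typed so far."""
    p = prefix.lower()
    matches = sorted(r["title"] for r in records if r["title"].lower().startswith(p))
    return matches[:limit]
```

A discovery interface would show the facet counts beside the result list and re-run `apply_facet` each time the user selects a value.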
Online Catalog
◦ Interface conventions from an earlier Web era
◦ Scope: Tied to the ILS and its content domain
Discovery Layer
◦ Modern interface elements
◦ Scope: aims to address broad range of components that constitute library collections
AquaBrowser
Ex Libris Primo
Innovative Interfaces: Encore
Serials Solutions: Summon (under development)
SirsiDynix Enterprise
The Library Corporation: LS2 PAC
VuFind (open source)
BiblioCommons
eXtensible Catalog (under development)
◦ Tags, user-supplied ratings and reviews
◦ Leverage social networking interactions to assist readers in identifying interesting materials: BiblioCommons
◦ Leverage use data for a recommendation service of scholarly content based on link resolver data: Ex Libris bX service
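A usage-based recommendation of the kind mentioned above can be sketched as simple co-occurrence counting. This is only an illustration under stated assumptions: the session data, the function names, and the counting approach are hypothetical and do not describe how the bX service is actually implemented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical link-resolver sessions: each set holds the articles one user requested.
SESSIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "D"},
]


def co_usage(sessions):
    """Count how often each pair of items was requested in the same session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for x, y in combinations(sorted(session), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def recommend(counts, item, limit=3):
    """Return the items most often used together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(limit)]
```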
Initial products focused on technology
◦ AquaBrowser, Endeca, Primo, Encore, VuFind
◦ Mostly locally-installed software
Current phase focused on pre-populated indexes that aim to deliver Web-scale discovery
◦ Summon (Serials Solutions)
◦ WorldCat Local (OCLC)
◦ EBSCO Discovery Service (EBSCO)
◦ Primo Central
Federated Search / Metasearch use real-time queries against multiple information targets
No centralized index – presentation of dynamic results
Shallow results – only a few results initially fetched from each target
Difficult to calculate relevancy
Performance challenges
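The federated search model described above can be sketched in a few lines of Python. The targets here are in-memory stand-ins with hypothetical names and content; a real connector would send a network request to each remote service. The sketch shows why results are shallow (only a few hits are fetched per target) and why relevancy is hard (each target returns its own ordering, so the lists are simply interleaved).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

# Hypothetical stand-ins for remote targets; real connectors would query over the network.
TARGETS = {
    "catalog": [
        "Discovery in libraries (book)",
        "Cataloging discovery tools (book)",
    ],
    "articles": [
        "Web-scale discovery (article)",
        "Discovery layers compared (article)",
        "Federated discovery (article)",
    ],
}


def query_target(name, term, limit):
    """Return at most `limit` hits from one target, in that target's own order."""
    hits = [t for t in TARGETS[name] if term.lower() in t.lower()]
    return hits[:limit]


def federated_search(term, per_target=2):
    """Query every target at search time and interleave the shallow result lists."""
    names = list(TARGETS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() keeps target order; overall latency is set by the slowest target.
        result_lists = list(pool.map(lambda n: query_target(n, term, per_target), names))
    merged = []
    # Round-robin merge: without a shared index, scores from different targets
    # are not comparable, so no global relevance ranking is attempted.
    for row in zip_longest(*result_lists):
        merged.extend(hit for hit in row if hit is not None)
    return merged
```

With `per_target=2`, the third matching article is never fetched, which mirrors the shallow-results limitation.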
[Diagram: federated search. Labels: Search; Search Results; Real-time query and responses. Targets: Digital Collections, ProQuest, EBSCOhost, …, MLA Bibliography, ABC-CLIO, ILS Data.]
[Diagram: metasearch with a local index. Labels: Search; Metasearch Engine; Local Index; Search Results; Real-time query and responses. Targets: Digital Collections, ProQuest, EBSCOhost, …, MLA Bibliography, ABC-CLIO, ILS Data.]
[Diagram: consolidated index. Labels: Search; Consolidated Index; Search Results; Pre-built harvesting and indexing. Sources: Digital Collections, ProQuest, EBSCOhost, …, MLA Bibliography, ABC-CLIO, ILS Data.]
Pre-populated indexes
Web-scale
◦ Exploits the full depth and breadth of library collections
◦ Beyond the bounds of the local library’s collection
◦ Targets the universe of objective, vetted library content
Includes full-text indexing to the fullest extent possible
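In contrast to real-time querying, a pre-populated index is built before any search takes place. The Python sketch below is a minimal illustration assuming a toy inverted index with term-frequency scoring; the record identifiers, sample text, and function names are hypothetical, and production services use far more sophisticated harvesting and ranking.

```python
from collections import Counter, defaultdict

# Hypothetical harvested records from several sources; ids and text are illustrative.
HARVESTED = [
    ("ils:1", "Introduction to library discovery systems"),
    ("ejournal:7", "Discovery services and discovery layers in academic libraries"),
    ("repo:3", "Digitized photographs of the university campus"),
]


def tokenize(text):
    """Split text into lowercase terms."""
    return text.lower().split()


def build_index(records):
    """Build an inverted index ahead of time: term -> {record id: term frequency}."""
    index = defaultdict(Counter)
    for rec_id, text in records:
        for term in tokenize(text):
            index[term][rec_id] += 1
    return index


def search(index, query):
    """Score all matching records together, so every source shares one ranking."""
    scores = Counter()
    for term in tokenize(query):
        for rec_id, tf in index.get(term, {}).items():
            scores[rec_id] += tf
    return [rec_id for rec_id, _ in scores.most_common()]
```

Because catalog, e-journal, and repository records sit in the same index, a single query ranks them against each other, which the federated model cannot do.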
Indexing the full corpus of information available globally
Or at least major portions
Google aims to address all the world’s information
Not quite comprehensive – partial harvesting of any given resource
Discovery Layer Products for libraries aim to address all content collected by libraries:
◦ Print
◦ Remotely accessed electronic content: e-journals, e-books, databases, licensed and open access.
◦ Local special collections: digital and print.
Addresses the comprehensive body of content held within library collections
Comprehensive, unified
New-generation interface
Harvested local content
◦ ILS metadata
◦ Institutional repositories, ETDs, Digital Collection platforms
Vendor-supplied indexes of library content
◦ E-journals, databases, e-books
◦ Full-text and metadata corresponding to e-content subscriptions
◦ Book collections beyond local library collections
Entering post-metadata search era
Increasing opportunities to search the full contents
◦ Google Library Print, Google Publisher, Open Content Alliance, government publications, etc.
◦ High-quality metadata will improve search precision
Commercial search providers already offer “search inside the book” and searching across the full text of large book collections
Not currently available through most library search environments
◦ Will be an important feature of projects such as HathiTrust
Deep search highly improved by high-quality metadata
Now viewed as separate problem
Many interdependencies
Current model of feeding discovery systems from many underlying repositories
◦ ILS / e-journal collections / collections of digital objects
Will models of resource management change to consolidate the repositories?
Realign Discovery and management?
Traditional Proprietary Commercial ILS
◦ Millennium, Symphony, Polaris
Traditional Open Source ILS
◦ Evergreen, Koha
Clean slate automation framework (SOA, enterprise-ready)
◦ Ex Libris URM, OLE Project
Cloud-based automation system
◦ WorldCat Local (+circ, acq, license management)
Beyond selecting one brand from an assortment of similar products
Several conceptually diverse options
Companies and projects now competing on innovation