Discovery platforms: Technology, tools and issues
Discovery Platforms: Technologies, Tools and Issues
Saiful Amin 39th Five Laws Lecture (2011)
Evolution of Discovery Tools
Printed catalogues
Traditional (Web)OPAC
Integrated OPAC portals
Federated search services
Discovery interfaces
Web-scale discovery services
Integrated discovery platform
Printed catalogues
Author browse
Title browse
Series browse
Call Number browse
Subject browse
Shelf list (inventory)
Traditional (Web)OPAC
ILS Database (Bibs)
(Web)Server Application
Pros
Keyword search: author, title, subject; ISBN/LCCN search; Boolean queries; proximity search
Browse index: authority headings, title, call number
Real-time item status: copies and availability info
Link to URL (tag 856)
Cons
Uses database queries: 'LIKE' statements, exact/partial match
Limited use of search algorithms
No relevance ranking
Only physical collection and e-books
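The 'LIKE'-based matching listed under Cons can be illustrated with a minimal sketch. It assumes a hypothetical one-column `bibs` table in an in-memory SQLite database; a real ILS schema is far larger, and the sample titles are invented for illustration.

```python
import sqlite3

def like_search(conn, term):
    """Return titles containing `term` as a substring, in record-id order."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM bibs WHERE title LIKE ? ORDER BY id", (pattern,)
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data, not taken from any real catalogue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bibs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO bibs (title) VALUES (?)",
    [("Fishing in Kerala",), ("A history of fish",), ("The selfish gene",)],
)

hits = like_search(conn, "fish")      # all three titles, including "selfish"
misses = like_search(conn, "fished")  # empty: no stemming, so no match
```

The query returns every substring match in storage order, so an unrelated title such as "The selfish gene" sits alongside the relevant ones, and a variant form such as "fished" finds nothing.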
Integrated OPAC Portal
ILS Database (Bibs)
Web Server Application
ILS Database (Patrons)
Website content
Enrichment Services Web services
Pros
All WebOPAC features: keyword search, headings browse, availability info
Library website integration
Patron empowerment: circulation/account details, online renewal, online hold placement, SDI services, new arrivals
OPAC enrichment: book covers/reviews
Thesaurus integration
Cons
Uses database queries: 'LIKE' statements, exact/partial match
Limited use of search algorithms
No relevance ranking
Still limited to the physical collection and e-books
Federated Search Service
Web Server Application
Library Catalog
Digital Repository
ProQuest, EBSCO, Science Direct, PubMed, Emerald
Full-text links
dbWiz, 360 Search, Pazpar2, Research Pro, …
Federated Search Service
Muse Content Architecture
http://www.museglobal.com/technology/contentIntegration.html
Supports 6300+ databases!
Pros
Single search broadcast
Real-time search results
Based on standards: Z39.50, SRU/W, MARC, ISO 2709, XML
Supports large set of databases: 7000+ in "360 Search", 6300+ in Muse platform
Merging and sorting
No local index (maintenance free!)
Cons
Not all databases are standards compliant: requires custom search scripts, requires metadata crosswalk
Network intensive: performance issues
Mostly available as hosted service: annual subscription
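The broadcast-and-merge pattern behind federated search can be sketched roughly as follows. The three sources here are hypothetical in-memory lists standing in for remote databases; a real connector would speak Z39.50 or SRU, or rely on a custom search script, and would have to handle timeouts and differing record formats.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote targets; names and records are invented.
SOURCES = {
    "catalog": [{"title": "Discovery tools", "year": 2009}],
    "repository": [{"title": "Open discovery", "year": 2011}],
    "vendor_db": [
        {"title": "Discovery at scale", "year": 2010},
        {"title": "Cataloguing rules", "year": 2005},
    ],
}

def search_source(name, query):
    """Search one source and tag each hit with the source it came from."""
    q = query.lower()
    return [dict(rec, source=name) for rec in SOURCES[name] if q in rec["title"].lower()]

def federated_search(query):
    """Broadcast the query to every source in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        result_sets = pool.map(lambda name: search_source(name, query), SOURCES)
        merged = [rec for hits in result_sets for rec in hits]
    # The sources share no relevance score, so sort on a field they all have.
    return sorted(merged, key=lambda rec: rec["year"], reverse=True)

results = federated_search("discovery")
```

Because nothing is indexed locally, every query pays the network cost of the slowest source, which is where the performance issues listed above come from.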
Discovery Interface
Central Index (Solr/Lucene)
Web Server Application
ILS Database
MARC Bib data
Availability/Holds
Digital Repository
DC XML data
Full-text link
Enrichment Services Web services
Discovery Interface
Word stemming: 'fishing', 'fished', 'fish', 'fisher' => 'fish'
Fuzzy search: insertion: cot => coat; deletion: coat => cot; substitution: coat => cost
Auto-suggest: N-gram, Edge N-gram analysis, phrase query
'Did you mean?': spell checker
Relevance ranking: TF-IDF / term vector, term weights, Lucene scores
Faceted browsing: Who are the main authors and their counts? What are the main subjects and their counts?
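Two of the features above, fuzzy search and auto-suggest, can be sketched in a few lines. This is an illustrative approximation, not how Solr/Lucene implements them internally: `edit_distance` is the classic Levenshtein computation, and `edge_ngrams` mimics what an edge n-gram analysis step emits. The function names and the sample vocabulary are assumptions made for this sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def edge_ngrams(term, min_len=1, max_len=10):
    """Prefixes of a term, as an edge n-gram step would emit for type-ahead."""
    return [term[:n] for n in range(min_len, min(len(term), max_len) + 1)]

def suggest(prefix, vocabulary):
    """Return vocabulary terms whose edge n-grams include the typed prefix."""
    return sorted(t for t in vocabulary if prefix in edge_ngrams(t))

# The three slide examples are each one edit apart.
distances = [edit_distance("cot", "coat"), edit_distance("coat", "cot"),
             edit_distance("coat", "cost")]
suggestions = suggest("fi", ["fish", "fiction", "film", "coat"])
```

A fuzzy search would then accept index terms within a small distance (commonly 1 or 2) of the query term, and a type-ahead box would look up the typed prefix among the stored edge n-grams.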
Pros
Google-like search box
Advanced features: fuzzy searching, relevance ranking, word stemming algorithms, social tagging/reviews, "Did you mean?" feature, auto-suggest (type ahead), faceted browsing
Availability/hold requests
Metadata enrichment: linking Amazon/Google/Wikipedia
Digital repository integration
Cons
Searches only locally hosted collections
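Relevance ranking and faceted browsing both depend on having a local index. The sketch below shows the idea with a basic TF-IDF sum over titles and a simple facet count; the records are hypothetical, and Lucene's actual scoring formula differs in its details (for example in normalisation and boosts).

```python
import math
from collections import Counter

# Hypothetical records; a real index would hold MARC or Dublin Core fields.
RECORDS = [
    {"title": "fish and fishing", "author": "Rao", "subject": "Fisheries"},
    {"title": "library catalogs", "author": "Amin", "subject": "Libraries"},
    {"title": "fish markets", "author": "Rao", "subject": "Economics"},
]

def tf_idf_scores(query, records):
    """Score each record's title against the query with a basic TF-IDF sum."""
    docs = [rec["title"].split() for rec in records]
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            df = sum(1 for d in docs if term in d)  # documents containing the term
            if df:
                score += (tf[term] / len(tokens)) * math.log(len(docs) / df)
        scores.append(score)
    return scores

def rank(query, records):
    """Return matching records, highest score first."""
    pairs = zip(tf_idf_scores(query, records), records)
    ranked = sorted(pairs, key=lambda pair: pair[0], reverse=True)
    return [rec for score, rec in ranked if score > 0]

def facet_counts(records, field):
    """Count records per value of a field, most frequent first."""
    return Counter(rec[field] for rec in records).most_common()

ranked_titles = [rec["title"] for rec in rank("fish", RECORDS)]
author_facets = facet_counts(RECORDS, "author")
```

Here the shorter title "fish markets" outranks "fish and fishing" because the query term makes up a larger share of it, and the author facet answers the "main authors and their counts" question directly.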
Can we combine the two?
Modern discovery interface
Local collections + Remote databases
Unified search result
Web-scale Discovery Services
Central Index
Library Catalog
MARC data
Availability Full-text link
EBSCO
ProQuest
ABI Inform
PubMed
Science Direct
Lexis-Nexis
Web Server Application
Full-text and metadata
…
Digital Repository
DC data
Web-scale Discovery Services
Content types include: library catalog records, e-journal articles, institutional repositories, newspaper articles, e-books, dissertations, conference proceedings, grey literature, cited references, reports, digital library, databases, and more.
Summon Service
Pros
Google-like single search box
Pre-indexed licensed content
Inclusion of local collection: OAI-PMH, MARC updates
Advanced features: relevance ranking, "Did you mean?", auto-suggest (type ahead), faceted navigation
Availability/full-text links
Mobile friendly
Web-service APIs
Easier off-campus access
No installation/maintenance
Cons
Supports limited number of databases (1000-1500)
Requires huge investment to maintain centralized index: publisher partnerships (licensing/legal issues), regular pre-publication indexing
Mostly hosted-only service
Content bias? (ranking)
Vendor lock-in?
Annual subscription
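Local collections typically reach a central index through OAI-PMH harvesting. The sketch below builds a `ListRecords` request and parses Dublin Core titles from a response. The base URL and the sample XML are hypothetical and heavily trimmed; the sketch also leaves out resumption tokens and error handling, which any real harvester needs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request; from_date allows incremental harvests."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"

def parse_records(xml_text):
    """Extract identifier and title from each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append({"id": ident, "title": title})
    return records

# Hypothetical, trimmed response body used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Discovery platforms</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("https://repo.example.org/oai", from_date="2011-01-01")
harvested = parse_records(SAMPLE)
```

Passing a `from` date on each run keeps the harvest incremental, so only new or changed records are sent to the index.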
Can we have best of both worlds?
Web Server Application
Digital Repository
ILS database (Bibs)
Remote databases
Modern discovery interface
Local collections + Remote databases
Unified search result
Supports large number of databases
Based on open standards (extensible)
Can be maintained locally (No subscription!)
Integrated Discovery Platform
http://www.indexdata.com/masterkey
Semi-commercial Supports 1000+ databases
Integrated Discovery Platform
Pazpar2 Architecture
https://www.indexdata.com/pazpar2
Open source (GPL) Build your own connector!
Conclusion
Each platform has its own goals:
A pure library catalog can provide expressive search (high precision)
Federated search improves content coverage in a single search
Discovery interfaces are designed to improve the user experience for local collections
Web-scale discovery provides a unified search experience for local and remote collections (but still falls well short in content coverage)
An integrated platform provides extensibility (but requires significant development and maintenance effort)
One size does not fit all. No single system is perfect.
As content becomes more open, the focus of discovery solutions should be on open platforms that are extensible as well as affordable.
Questions and Discussions