Endeca @ NCSU Libraries Kristin Antelman NCSU Libraries June 24, 2006.
Endeca @ NCSU Libraries
Kristin Antelman
NCSU Libraries
June 24, 2006
Overview
The problem
Quick demo
Technical overview
Implementation process
Use data
Assessment data
Next steps
Existing catalogs are hard to use:
– known item searching works pretty well, but …
– users often do keyword searching on topics and get large result sets returned in system sort order
– catalogs are unforgiving on spelling errors, stemming
Why did we do this?
NO RELEVANCY!
Catalog value is buried
Subject headings are not leveraged in searching
– they should be browsed or linked from, not searched
Data from the item record is not leveraged
– should be able to filter by item type, location, circulation status, popularity
What does the Endeca software do?
Provides search software for ecommerce companies
Faceted browse of structured metadata; goal is to expose the ontology
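A minimal sketch of what faceted browse means in practice, assuming a handful of invented records. The field names (format, language, available) and the helpers refine and facet_counts are hypothetical illustrations, not Endeca's data model or API.

```python
from collections import Counter

# Hypothetical, simplified catalog records (not real NCSU data or Endeca's schema).
RECORDS = [
    {"title": "Statistics for Biologists", "format": "Book", "language": "English", "available": True},
    {"title": "Introduction a la statistique", "format": "Book", "language": "French", "available": False},
    {"title": "Statistical Methods", "format": "eBook", "language": "English", "available": True},
    {"title": "Journal of Statistics", "format": "Journal", "language": "English", "available": True},
]


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def facet_counts(records, dimension):
    """Count how many records fall under each value of one dimension."""
    return Counter(r[dimension] for r in records)


# Counts for the full set, then for the set narrowed to one language.
english = refine(RECORDS, language="English")
print(facet_counts(RECORDS, "format"))
print(facet_counts(english, "format"))
```

Each refinement narrows the result set, and the counts are recomputed over what remains, so the user always sees which values are still available to browse.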
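Endeca technical overview
[Diagram: data flow from raw MARC data to the client browser]
Raw MARC data -> NCSU exports and reformats -> Flat text files
Flat text files -> Data Foundry (parse text files) -> Indices -> MDEX Engine
MDEX Engine <-HTTP-> NCSU Web Application <-HTTP-> Client browser
Diagram label: Endeca Information Access Platform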
Integrating Endeca - Enhancements
MarcAdapter plugin for raw MARC data
– Eliminate need for external MARC 21 translation and file merging
Partial Updates
– Update circulation data multiple times throughout the day
Implementation process
Timeline
– License / negotiation: Spring 2005
– Acquire: Summer 2005
– Implementation: August 2005 – January 12, 2006
7 representative team members
– functional requirements, metadata, interface issues (total of 40-60 hours)
– project manager: approximately 10 hours per week for 20 weeks
Java-trained librarian (30-40 hrs/wk for 14 weeks)
It doesn’t have to be perfect!
Key decision points
Search interface
Main search page
Endeca
Web2
Advanced search
A few major issues
Search interface
Selecting dimensions and their order
Dimensions
1. Subject: Topic
2. Subject: Genre
3. Format
4. Library
5. Subject: Region
6. Subject: Era
7. Language
8. Author
9. Availability
10. Library of Congress Classification
A few major issues
Search interface
Selecting dimensions and their order
Defining the relevance algorithm
Relevance defined
Relevance ranking in Endeca – select from a variety of modules and order them based on importance
At NCSU…
1. Original query term(s) (no thesaurus, stemming, spell correction)
2. Exact phrase match
3. Field ranking (Title higher than Author higher than Table of Contents, etc.)
4. Number of fields that contain term(s) …
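One way to picture "ordered modules" is a sort key made of several signals, compared in priority order so that a later module only breaks ties left by the earlier ones. The sketch below is an assumption-laden illustration of the four modules listed above, not Endeca's implementation: the field names (title, author, toc), the sample records, and relevance_key are all hypothetical, and modules beyond the fourth are left out.

```python
# Assumed field names, highest priority first: Title > Author > Table of Contents.
FIELD_PRIORITY = ["title", "author", "toc"]


def relevance_key(record, query):
    """Build a tuple that sorts the best match first; tuples compare left to right."""
    phrase = query.lower()
    terms = phrase.split()
    fields = {f: record.get(f, "").lower() for f in FIELD_PRIORITY}
    words = {f: text.split() for f, text in fields.items()}

    # 1. Original query terms, matched literally (no thesaurus, stemming, spell correction).
    all_words = [w for ws in words.values() for w in ws]
    has_all_terms = all(t in all_words for t in terms)

    # 2. Exact phrase match within a single field.
    has_phrase = any(phrase in text for text in fields.values())

    # 3. Field ranking: index of the highest-priority field containing a term.
    matching = [i for i, f in enumerate(FIELD_PRIORITY) if any(t in words[f] for t in terms)]
    best_field = matching[0] if matching else len(FIELD_PRIORITY)

    # 4. Number of fields that contain a term (more is better, hence the minus sign).
    return (not has_all_terms, not has_phrase, best_field, -len(matching))


# Invented records for illustration only.
records = [
    {"title": "Change management", "author": "Climate Jones", "toc": ""},
    {"title": "Policy studies", "author": "", "toc": "climate change and energy"},
    {"title": "Climate change policy", "author": "Smith", "toc": ""},
]

ranked = sorted(records, key=lambda r: relevance_key(r, "climate change"))
for r in ranked:
    print(r["title"])
```

In this sample, "Policy studies" outranks "Change management" even though the latter matches in the title, because the exact phrase module sits above field ranking in the order.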
Use data
Some search statistics (March - May 2006)
Requests by Search Type
Search: 51%
Search -> Navigation: 29%
Navigation: 20%
Sorting statistics (March - May 2006)
Sorting Requests
Most Popular: 19%
Title A-Z: 13%
Pub Date: 53%
Author A-Z
Call Number
Some navigation statistics (March - May 2006)
Navigation Requests by Dimension
Dimension: Requests
Author: 70,516
Language: 38,074
Subject: Era: 38,605
Subject: Region: 59,248
Library: 87,221
Format: 74,985
Subject: Genre: 65,545
Subject: Topic: 155,856
LC Classification: 169,249
Availability: 23,848
Assessment
Some user reaction
“The new Endeca system is incredible. It would be difficult to exaggerate how much better it is than our old online card catalog (and therefore that of most other universities). I've found myself searching the catalog just for fun, whereas before it was a chore to find what I needed.”
- NCSU Undergrad, Statistics
“The new library catalog search features are a big improvement over the old system. Not only is the search extremely fast, but seemingly it's much more intelligent as well.”
- NCSU faculty, Psychology
Topical searching tasks
Topical Task Success: Web2
Easy: 36%
Medium: 7%
Hard: 23%
Failed: 34%
Topical Task Success: Endeca
Easy: 58%
Medium: 17%
Hard: 3%
Failed: 22%
Average topical task duration
Testing relevance
Are search results in Endeca more likely to be relevant to a user’s query than search results in Web2 OPAC?
100 topical user searches from 1 month in fall 2005
How many of top 5 results relevant?
– 40% relevant in Web2 OPAC
– 68% relevant in Endeca catalog
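The slide does not say exactly how the percentages were computed. One plausible reading is the share of relevant records among the top 5 results, pooled over all test searches. The sketch below shows that calculation under that assumption; precision_at_k and the judgments are invented for illustration and are not the NCSU study data.

```python
def precision_at_k(judgments, k=5):
    """Share of relevant results among the top k, pooled over all searches.

    judgments: one list per search, holding True/False relevance calls
    in ranked order. Missing results (fewer than k) count as not relevant.
    """
    relevant = sum(sum(per_search[:k]) for per_search in judgments)
    return relevant / (k * len(judgments))


# Two made-up searches, each with relevance calls for its top 5 results.
sample = [
    [True, True, False, True, False],
    [False, True, True, False, False],
]
print(f"{precision_at_k(sample):.0%}")
```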
Future plans
FRBR-ized displays
FAST (Faceted Application of Subject Terminology) instead of LCSH
Enrich records with supplemental content
More integration with website search
Use Endeca to index local collections
Thank you
project page: www.lib.ncsu.edu/endeca