Demystifying Endeca’s search results ranking Kristina Spurgin with input & support from Ben...

Post on 11-Dec-2015

Demystifying Endeca’s search results ranking

Kristina Spurgin, with input & support from Ben Pennell & Jeff Campbell

UNC Libraries

Endeca details

• Search Configuration and Relevance Ranking
– The supported search methods and details on how results are ranked for each
• TRLN Endeca Data Model
– The major field groups, with brief descriptions of their use, and indexing and display properties
• Endeca Extract and Mappings Spreadsheet
– Details on how MARC fields get mapped into Endeca fields

TRLN Endeca Search Interfaces

• Words anywhere (i.e. Keyword)
• Author
• Title
• Journal title
• Subject
• ISBN/ISSN
• (Publisher)

Spotting the relevancy strata

• Subject search relevancy strategy
– An exact phrase match, starting from the beginning of a single field, is the gold-standard match
– Subject heading search: commonplace book
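The idea of strata can be sketched in a few lines of Python. Only the top stratum (exact phrase starting at the beginning of a single field) comes from the slide; the lower strata, the function name, and the whitespace tokenization are illustrative assumptions, not Endeca's actual behavior.

```python
def subject_match_stratum(query: str, heading: str) -> int:
    """Return a stratum number for one subject heading; lower is better.

    Stratum 0 is the gold-standard match described on the slide.
    Strata 1-3 are assumed here purely for illustration.
    """
    q = query.lower().split()
    h = heading.lower().split()
    if not q:
        return 3  # nothing to match
    if h[:len(q)] == q:
        return 0  # exact phrase, starting from the beginning of the field
    if any(h[i:i + len(q)] == q for i in range(1, len(h) - len(q) + 1)):
        return 1  # exact phrase, but later in the field (assumed stratum)
    if all(word in h for word in q):
        return 2  # all words present, not as a phrase (assumed stratum)
    return 3      # no match on this heading


headings = [
    "Book of commonplace sayings",
    "English commonplace book",
    "Commonplace book collections",
]
ranked = sorted(headings, key=lambda h: subject_match_stratum("commonplace book", h))
```

Sorting by the stratum number puts the heading that begins with the query phrase first, which is the pattern to look for when spotting strata in a result list.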

PubDateSort = 1700

No pub date!

A more complex search: keyword (AKA “Words anywhere”)

“Searches all indexed fields, but only uses some fields to rank results.” -- Search Configuration and Relevance Ranking

What fields are indexed?
• Endeca Extract and Mappings Spreadsheet gives the detailed info.

More on keyword search (AKA “Words anywhere”)

“Matches in the main title, subject headings, and main author fields will be given the highest ranking.” -- Search Configuration and Relevance Ranking

More on keyword search (AKA “Words anywhere”)

“Queries that match as a phrase are ranked higher than those which do not.” -- Search Configuration and Relevance Ranking

More on keyword search (AKA “Words anywhere”)

“Exact term matches are ranked higher than those returned because of spell correction, stemming, and thesaurus lookups.” -- Search Configuration and Relevance Ranking

More on keyword search (AKA “Words anywhere”)

“Matches in tables of contents, summaries, or selected EAD elements are not used to determine ranking.” -- Search Configuration and Relevance Ranking
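The four quoted rules can be read as a layered sort key. The sketch below is one possible reading: the `Match` class, the abbreviated field list, and the order in which the rules are applied (field first, then phrase, then exact term) are assumptions made for illustration, not the documented Endeca configuration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    record_id: str
    field: str       # field in which the query matched
    is_phrase: bool  # query matched as a phrase
    is_exact: bool   # matched without spell correction, stemming, or thesaurus


# Abbreviated, hypothetical list of ranked fields, most important first.
RANKED_FIELDS = ["Main Title", "Subject Headings", "Main Author"]


def sort_key(m: Match) -> tuple:
    # Fields outside the ranked list (e.g. tables of contents, summaries)
    # still return the record but fall into the last field stratum.
    if m.field in RANKED_FIELDS:
        field_rank = RANKED_FIELDS.index(m.field)
    else:
        field_rank = len(RANKED_FIELDS)
    # False sorts before True, so phrase and exact matches come first.
    return (field_rank, not m.is_phrase, not m.is_exact)


def rank(matches: list) -> list:
    return sorted(matches, key=sort_key)


results = rank([
    Match("toc", "Table of Contents", True, True),
    Match("title-stemmed", "Main Title", True, False),
    Match("title-exact", "Main Title", True, True),
    Match("author", "Main Author", False, True),
    Match("subject", "Subject Headings", True, True),
])
order = [m.record_id for m in results]
```

Note that a perfect phrase match in a table of contents still lands below a weaker match in a ranked field, which is what "not used to determine ranking" implies under this reading.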

An aside on keyword search (AKA “Words anywhere”)

Fields used to rank Keyword results

Most important to least

• Main Title
• Main Title Normalized
• Title Vernacular
• Title Vernacular Segmented
• Subject Headings
• Subjects Normalized
• Subjects Vernacular Segmented
• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Company
• Varying Titles
• Varying Titles Vernacular Segmented
• Other Authors
• Other Author Translation
• Authors Normalized
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented
• Uniform Title
• Uniform Title Vernacular
• Uniform Title Vernacular Segmented
• Title Index
• Earlier Title
• Later Title
• Host Item Linking
• Uncontrolled Subject
• Other Titles
• Other Title Translation
• Translated as Linking
• Translation of Linking
• Series Title Index
• Series Statement
• Series Normalized
• Series Statement Vernacular
• Series Statement Vernacular Segmented
• Publisher
• Publisher Normalized
• Sound Recording Imprint
• Director
• Performer Credits
• Production Credits
• Biographical Sketch
• Related Collections
• Digital Collection
• Genre
• Product

Fields used to rank Title results

Most important to least

• Title1
• Title2
• Title3
• Title4
• Main Title
• Main Title Normalized
• Journal Title Index
• Title Vernacular
• Title Vernacular Segmented
• Varying Titles
• Titles Normalized
• Varying Titles Vernacular Segmented
• Main Uniform Title
• Main Uniform Title Vernacular
• Main Uniform Title Vernacular Segmented

1 word titles

2 word titles

3 word titles

Fields used to rank Journal Title results

Most important to least

• Journal Title Index
• Journal Uniform Title
• Journal Title Abbreviation
• Journal Later Title
• Journal Earlier Title

Fields used to rank Author results

Most important to least

• Main Author
• Main Author Normalized
• Main Author Vernacular
• Main Author Vernacular Segmented
• Director
• Performer Credits
• Production Credits
• Author

Fields used to rank Subject results

Most important to least

• Subject Headings
• Subjects Vernacular Segmented
• Subjects Normalized
• Genre

What is irrelevant to relevancy?

• Many aspects of the record are NOT considered in relevancy ranking

• FORMAT is the biggest surprise, it seems

And, with that whirlwind tour…
