KB subject prediction tool
STITCH final event
KB subject prediction prototype: introduction
• Subject prediction is a special case of book reindexing
• What is reindexing?
• Why build a subject prediction tool?
• How does subject prediction work?
• The technical aspects of the tool
• Demo
• User study results
What do we mean by re-indexing?
• 2 collections
• Each described (indexed) by its own thesaurus/classification system
[Figure: Collection 1 indexed with Thes1; Collection 2 indexed with Thes2]
The re-indexing application
• Goal: have the books of one collection described with the thesaurus used in the other collection
• For instance: if one thesaurus is dropped, the old books have to be re-indexed according to the other thesaurus
Book re-indexing
• Goal: convert the source indexing into a target indexing system
  • from the original thesaurus
  • to the new thesaurus
[Figure: books indexed with Thes1 must receive Thes2 subjects; the correspondences are unknown (? ? ?)]
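The simplest reading of this conversion is a lookup table from Thes1 concepts to Thes2 concepts. The Python sketch below assumes such a one-to-many mapping already exists; the `Alignment` format, the function names and the toy concept labels are hypothetical, and the prototype itself relies on richer rules over combinations of metadata rather than single concepts.

```python
from typing import Dict, List, Set

# Hypothetical alignment format: each source (Thes1) concept maps to the
# target (Thes2) concepts that should replace it.
Alignment = Dict[str, List[str]]


def reindex(source_subjects: Set[str], alignment: Alignment) -> Set[str]:
    """Translate a book's Thes1 subjects into Thes2 subjects.

    Source concepts without an entry in the alignment contribute nothing.
    """
    target: Set[str] = set()
    for concept in source_subjects:
        target.update(alignment.get(concept, []))
    return target


def unmapped(source_subjects: Set[str], alignment: Alignment) -> Set[str]:
    """Return the source concepts the alignment cannot translate."""
    return {c for c in source_subjects if not alignment.get(c)}


if __name__ == "__main__":
    # Toy concept labels, invented for illustration only.
    alignment = {
        "thes1:travel": ["thes2:travel guides"],
        "thes1:spain": ["thes2:Spain"],
    }
    book = {"thes1:travel", "thes1:spain", "thes1:comics"}
    print(sorted(reindex(book, alignment)))   # ['thes2:Spain', 'thes2:travel guides']
    print(sorted(unmapped(book, alignment)))  # ['thes1:comics']
```

The unmapped concepts are exactly the cases a human indexer would still have to handle.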
KB subject prediction prototype
• Book reindexing as a possibly valuable tool for the KB
• Reindexing as a usage scenario for vocabulary alignment
  • Deployment of alignment techniques in a real-world context
  • Scenario-specific vocabulary alignment
• Introduction of Semantic Web (SW) techniques into KB practice
KB subject prediction prototype
Prototype:
• From NBD/Biblion (LTR) subjects to Brinkman subjects
  • Books already indexed by public libraries
• Integration into the WinIBW software
  • the access tool to the Pica library system used at the KB
• User study
  • tool evaluated by 6 experienced indexers (titelbeschrijvers, i.e. cataloguers)
How to predict Brinkman subjects?
• Given NBD/Biblion subject metadata values, predict Brinkman subjects
• Used 240,000 books indexed with both systems
• Tried different reindexing strategies
  • Used standard (lexical and instance-based) techniques
  • Developed an alignment using statistical techniques, very specific to the scenario and using more metadata:
    • LTR: Biblion concepts
    • AUT: main authors of books
    • KAR: "characteristic"
    • DGP: intellectual level/target group
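A statistical alignment of this kind can be sketched as association-rule mining over the dually indexed books: count how often each combination of source values (LTR, AUT, KAR, DGP) co-occurs with each Brinkman subject. This is a minimal sketch, not the prototype's code; `mine_rules`, its parameters and the toy books are hypothetical, and it uses the plain ratio hits/total as confidence. The Subject Suggestion Rules table that follows does not match that plain ratio exactly (182/182 is listed as 0.995, 25/123 as 0.196), so the prototype evidently adjusts for small counts in a way these slides do not describe.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# One dually indexed book: its source metadata values (LTR, AUT, KAR, DGP
# prefixes) and the Brinkman (BTR) subjects it actually received.
Book = Tuple[Set[str], Set[str]]
RuleKey = Tuple[FrozenSet[str], str]


def mine_rules(books: List[Book], max_size: int = 2,
               min_books: int = 1) -> Dict[RuleKey, Tuple[float, int, int]]:
    """Derive rules 'source combination -> target concept'.

    For every combination of up to max_size source values, count the books
    carrying it (total) and, per target concept, how many of those books
    also carry that target (hits). Confidence is the plain ratio hits/total.
    """
    combo_total: Counter = Counter()
    combo_hits: Counter = Counter()
    for source, targets in books:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(source), size):
                key = frozenset(combo)
                combo_total[key] += 1
                for target in targets:
                    combo_hits[(key, target)] += 1
    rules: Dict[RuleKey, Tuple[float, int, int]] = {}
    for (key, target), hits in combo_hits.items():
        total = combo_total[key]
        if total >= min_books:  # drop combinations seen on too few books
            rules[(key, target)] = (hits / total, hits, total)
    return rules


if __name__ == "__main__":
    # Toy training data, invented for illustration.
    books = [
        ({"LTR:Reisgidsen", "LTR:Spanje"}, {"BTR:Spanje ; reisgidsen"}),
        ({"LTR:Reisgidsen", "LTR:Spanje"}, {"BTR:Spanje ; reisgidsen"}),
        ({"LTR:Reisgidsen", "LTR:Italie"}, {"BTR:Italie ; reisgidsen"}),
    ]
    for (combo, target), (conf, hits, total) in mine_rules(books).items():
        print(" + ".join(sorted(combo)), "->", target, f"{conf:.3f} {hits}/{total}")
```

Note how the pair `LTR:Reisgidsen + LTR:Spanje` yields a sharper rule (2/2) than `LTR:Reisgidsen` alone (2/3), which is why combinations of metadata help.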
Subject Suggestion Rules
Source combination → target concept                                       Confidence  Correct books / total
DGP:Jeugd fictie; vanaf 13 jaar + KAR:Stripverhaal → BTR:stripverhalen    0.995       182/182
LTR:Reisgidsen + LTR:Spanje → BTR:Spanje ; reisgidsen                     0.982       50/50
LTR:Liefde + AUT:Jeanette Winterson → BTR:romans en novellen ; vertaald   0.540       1/1
LTR:Bouwkunde → BTR:leermiddelen ; bouwtechniek                           0.196       25/123
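Applying such rules to a new book can be sketched as follows: a rule fires when all values of its source combination appear in the book's metadata, and the fired targets are ranked by confidence. The sketch uses three of the example rules from the table; the firing condition, keeping the highest confidence when several rules propose the same target, and the names `suggest` and `threshold` are assumptions, not the prototype's documented behaviour.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Three of the example rules: (source combination, target concept, confidence).
Rule = Tuple[FrozenSet[str], str, float]
RULES: List[Rule] = [
    (frozenset({"DGP:Jeugd fictie; vanaf 13 jaar", "KAR:Stripverhaal"}),
     "BTR:stripverhalen", 0.995),
    (frozenset({"LTR:Reisgidsen", "LTR:Spanje"}), "BTR:Spanje ; reisgidsen", 0.982),
    (frozenset({"LTR:Bouwkunde"}), "BTR:leermiddelen ; bouwtechniek", 0.196),
]


def suggest(metadata: Set[str], rules: List[Rule],
            threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Rank the target concepts whose rules fire on this book's metadata.

    A rule fires when every value of its source combination is present and
    its confidence exceeds the threshold. If several rules propose the same
    target, the highest confidence is kept (an assumption).
    """
    best: Dict[str, float] = {}
    for combo, target, confidence in rules:
        if combo <= metadata and confidence > threshold:
            best[target] = max(confidence, best.get(target, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    book = {"LTR:Reisgidsen", "LTR:Spanje", "LTR:Bouwkunde"}
    print(suggest(book, RULES))
    # [('BTR:Spanje ; reisgidsen', 0.982), ('BTR:leermiddelen ; bouwtechniek', 0.196)]
    print(suggest(book, RULES, threshold=0.54))
    # [('BTR:Spanje ; reisgidsen', 0.982)]
```

A threshold trades recall for precision: raising it to 0.54 keeps only the most reliable suggestion for this book.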
KB re-indexing prototype
Book indexing suggestion tool
User Study
• 6 indexers (titelbeschrijvers) of the “Depot collectie” (deposit collection)
• They used the tool for 6 weeks
  • only on books that already had LTR subjects: 284 books in total
• Marked the suggestions
• Filled in two questionnaires
• Gave verbal feedback in a final meeting
Most important results for user satisfaction
• Technically the tool worked, but its robustness (storingsgevoeligheid: proneness to failures) absolutely had to be improved.
• Users were interested in the tool and would keep using it, provided its quality improved.
Most important directions for improvement:
• More robust
• More often applicable
• Better subject suggestions
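Results of the user study: quality of suggestions

Confidence      Suggestions   Correct   Precision   Recall
> 0.54                  308       224       72.7%    47.9%
> 0.10                 1188       127       10.7%    27.1%
> 0.02                 2525        28       1.11%    5.98%
not suggested                      89                19.0%

Each row is a separate confidence band (the "> 0.10" row excludes suggestions above 0.54, and so on), so the four recall values add up to 100% of the 468 subjects the indexers chose.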
• Which of the suggestions presented to the indexers were chosen?
• Precision: percentage of suggestions that were correct (chosen by the indexer)
• Recall: percentage of the subjects chosen by the indexers that were found by the tool
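The precision and recall figures can be recomputed from the raw counts in the results table. This sketch treats the rows as disjoint confidence bands, as their recall values imply, and also derives, purely as an illustration not reported in the study, the cumulative figures an indexer would get if every suggestion above a given cut-off were shown. The names `per_band` and `cumulative` are hypothetical.

```python
from typing import List, Tuple

# Rows of the user-study table: (confidence band, suggestions shown, chosen).
BANDS: List[Tuple[str, int, int]] = [
    ("> 0.54", 308, 224),
    ("> 0.10", 1188, 127),
    ("> 0.02", 2525, 28),
]
NOT_SUGGESTED = 89  # subjects chosen by indexers that the tool never proposed


def per_band(bands, not_suggested):
    """Precision and recall of each disjoint band, plus the recall base."""
    chosen_total = sum(correct for _, _, correct in bands) + not_suggested
    rows = [(label, correct / shown, correct / chosen_total)
            for label, shown, correct in bands]
    return chosen_total, rows


def cumulative(bands, chosen_total):
    """Precision and recall if every band down to a given cut-off is shown."""
    shown_sum = correct_sum = 0
    rows = []
    for label, shown, correct in bands:
        shown_sum += shown
        correct_sum += correct
        rows.append((label, correct_sum / shown_sum, correct_sum / chosen_total))
    return rows


if __name__ == "__main__":
    total, rows = per_band(BANDS, NOT_SUGGESTED)
    print("subjects chosen by indexers:", total)  # 468
    for label, p, r in rows:
        print(f"band {label}: precision {p:.1%}, recall {r:.1%}")
    for label, p, r in cumulative(BANDS, total):
        print(f"cut-off {label}: precision {p:.1%}, recall {r:.1%}")
```

For instance, showing every suggestion above 0.10 would have found 75% (351/468) of the chosen subjects, at a precision of roughly 23% (351/1496).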
Thanks, evaluators! Margot, Donita, Bernadette, Judith, Rob, Arjan