
KB subject prediction tool

STITCH final event

KB subject prediction prototype: Introduction

• Subject prediction is a special case of book reindexing

• What is reindexing?
• Why build a subject prediction tool?
• How does subject prediction work?
• The technical aspects of the tool
• DEMO
• User study results


What do we mean by re-indexing?

• 2 collections
• Each described (indexed) by its own thesaurus/classification system

[Diagram: Collection 1 indexed with Thes1; Collection 2 indexed with Thes2]


The re-indexing application

• Goal: have the books of one collection described by the thesaurus used in the second collection

• For instance: if one thesaurus is dropped, old books have to be indexed according to the other thesaurus

[Diagram: Collection 1 indexed with Thes1; Collection 2 indexed with Thes2]


Book re-indexing

• Goal: converting source indexing to a target indexing system
  • From original thesaurus
  • To new thesaurus

[Diagram: books indexed with Thes1; their Thes2 subjects are unknown (? ? ?)]


KB subject prediction prototype

• Book reindexing as a possibly valuable tool for the KB

• Reindexing as a usage scenario for vocabulary alignment
  • Deployment of alignment techniques into a real-world context
  • Scenario-specific vocabulary alignment

• Introduction of SW techniques into KB practice


KB subject prediction prototype

Prototype:

• NBD/Biblion (LTR) to Brinkman
  • Books already indexed by public libraries

• Integration into WinIBW software
  • The access tool to the Pica library system used at KB

• User study
  • Tool evaluated by 6 experienced indexers (titelbeschrijvers)


How to predict Brinkman Subjects?

• Given NBD/Biblion subject metadata values, predict Brinkman subjects

• Used 240,000 books common to both collections

• Tried different reindexing strategies
  • Used standard (lexical and instance-based) techniques
  • Developed an alignment using statistical techniques

• Very specific to the scenario, using more metadata:
  • LTR: Biblion concepts
  • AUT: main authors of books
  • KAR: "characteristic"
  • DGP: intellectual level/target group
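The slides do not say which statistical technique was used. As a rough illustration only, the sketch below counts how often each combination of source values (LTR, AUT, KAR, DGP) co-occurs with a Brinkman subject on books that carry both indexings. The toy records, the function name `mine_rules`, and the `max_size` parameter are assumptions made for this example. Note that the raw ratio computed here is not the confidence level shown in the rule table: there, a 1/1 rule has confidence 0.540, so some unspecified correction for small samples was evidently applied.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (not KB data): each book has source-side metadata
# values and the Brinkman (BTR) subjects assigned to the same book.
books = [
    {"source": {"LTR:Reisgidsen", "LTR:Spanje"}, "target": {"BTR:Spanje ; reisgidsen"}},
    {"source": {"LTR:Reisgidsen", "LTR:Spanje"}, "target": {"BTR:Spanje ; reisgidsen"}},
    {"source": {"LTR:Frankrijk", "LTR:Reisgidsen"}, "target": {"BTR:Frankrijk ; reisgidsen"}},
]


def mine_rules(books, max_size=2):
    """Count co-occurrences of source combinations and target subjects.

    Returns {(source_combination, target): (correct, total, raw_ratio)} where
    total is the number of books carrying the source combination and correct
    is the number of those books that also carry the target subject.
    """
    seen = Counter()     # source combination -> number of books
    correct = Counter()  # (source combination, target) -> number of books
    for book in books:
        values = sorted(book["source"])
        for size in range(1, max_size + 1):
            for combo in combinations(values, size):
                seen[combo] += 1
                for target in book["target"]:
                    correct[(combo, target)] += 1
    return {
        (combo, target): (n, seen[combo], n / seen[combo])
        for (combo, target), n in correct.items()
    }


rules = mine_rules(books)
for (combo, target), (n, total, ratio) in sorted(rules.items()):
    print(" + ".join(combo), "->", target, f"{n}/{total}", round(ratio, 3))
```

Combining several metadata values in one rule is what lets a pair such as LTR:Reisgidsen + LTR:Spanje point to a single compound Brinkman subject, which neither value could predict reliably on its own.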


Subject Suggestion Rules

Source combination → target concept (confidence level; correct books / total)

• DGP:Jeugd fictie; vanaf 13 jaar + KAR:Stripverhaal → BTR:stripverhalen (0.995; 182/182)

• LTR:Reisgidsen + LTR:Spanje → BTR:Spanje ; reisgidsen (0.982; 50/50)

• LTR:Liefde + AUT:Jeanette Winterson → romans en novellen ; vertaald (0.540; 1/1)

• LTR:Bouwkunde → BTR:leermiddelen ; bouwtechniek (0.196; 25/123)
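To make the use of such rules concrete, here is a minimal sketch of how they might be applied to one book: every rule whose source combination is contained in the book's metadata fires, and the resulting Brinkman subjects are ranked by confidence. The three rules are transcribed from this slide; the function `suggest`, its `threshold` parameter, and the example book are assumptions for illustration, not the prototype's actual code.

```python
# Rules transcribed from the slide: (source combination, target, confidence).
RULES = [
    (frozenset({"DGP:Jeugd fictie; vanaf 13 jaar", "KAR:Stripverhaal"}),
     "BTR:stripverhalen", 0.995),
    (frozenset({"LTR:Reisgidsen", "LTR:Spanje"}),
     "BTR:Spanje ; reisgidsen", 0.982),
    (frozenset({"LTR:Bouwkunde"}),
     "BTR:leermiddelen ; bouwtechniek", 0.196),
]


def suggest(book_values, rules=RULES, threshold=0.0):
    """Return (target, confidence) pairs for all rules that fire on the book.

    A rule fires when all of its source values are present on the book and
    its confidence exceeds the threshold. If several rules propose the same
    target, the highest confidence is kept. Results are sorted from most to
    least confident.
    """
    best = {}
    for source, target, confidence in rules:
        if source <= book_values and confidence > threshold:
            best[target] = max(confidence, best.get(target, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical book with three LTR values.
book = {"LTR:Reisgidsen", "LTR:Spanje", "LTR:Bouwkunde"}
suggestions = suggest(book)
confident_only = suggest(book, threshold=0.54)
print(suggestions)
print(confident_only)
```

Raising the threshold trades fewer suggestions for a higher share of correct ones, which is the trade-off the user study measures per confidence band.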


KB re-indexing prototype

Book indexing suggestion tool


User Study

• 6 indexers (titelbeschrijvers) of the “Depot collectie”.

• They used the tool for 6 weeks
  • Only for books that had LTR subjects: 284 books

• Marked the suggestions
• Filled in two questionnaires
• Gave verbal feedback in the final meeting


Most important results for user satisfaction

• Technically the tool was OK, but its robustness (storingsgevoeligheid, i.e. susceptibility to failures) absolutely needed to be improved.

• Users were interested in the tool and would keep on using it provided its quality is improved.

Most important directions for improvement:
• More robust
• More often applicable
• Better subject suggestions


Results User Study: quality of suggestions

confidence       suggestions   correct   precision   recall
> 0.54                   308       224       72.7%    47.9%
> 0.10                  1188       127       10.7%    27.1%
> 0.02                  2525        28       1.11%    5.98%
not suggested                       89                19.0%

• Which suggestions were chosen among the ones presented to the indexer?

• precision: percentage of suggestions that were correct
• recall: percentage of the subjects chosen by indexers that were found by the tool
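As a worked check of these two definitions, the sketch below recomputes the percentages from the counts on the results slide. It assumes that the confidence bands are disjoint and that the total number of subjects chosen by the indexers is the sum of the correct suggestions plus the 89 subjects that were not suggested (468 in all); the slide does not state this total, but it is the reading under which the reported recall figures come out.

```python
# Counts from the results slide: confidence band -> (suggestions, correct).
bands = {
    "> 0.54": (308, 224),
    "> 0.10": (1188, 127),
    "> 0.02": (2525, 28),
}
not_suggested = 89  # subjects chosen by indexers that the tool never proposed

# Assumption: all chosen subjects = correct suggestions + not suggested.
total_chosen = sum(correct for _, correct in bands.values()) + not_suggested


def pct(fraction):
    """Express a fraction as a percentage rounded to one decimal place."""
    return round(100 * fraction, 1)


# band -> (precision %, recall %)
results = {
    band: (pct(correct / suggested), pct(correct / total_chosen))
    for band, (suggested, correct) in bands.items()
}
missed_share = pct(not_suggested / total_chosen)

for band, (precision, recall) in results.items():
    print(f"{band}: precision {precision}%, recall {recall}%")
print(f"not suggested: {missed_share}% of chosen subjects")
```

Read this way, the three bands together cover about 81% of the subjects the indexers chose, with almost half coming from the highest-confidence band alone.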


Thanks, evaluators!
Margot, Donita, Bernadette, Judith, Rob, Arjan