Relevance Ranking and Clustering Small steps towards making the library catalogue more useful Kent Fitch, 16 Sep 2006

Page 1

Relevance Ranking and Clustering

Small steps towards making the library catalogue more useful

Kent Fitch, 16 Sep 2006

Page 2

Motivation

Help people find what they’re looking for

Page 3

The problem

• A reference librarian often has lots of context when someone walks up to them and says “The Civil War”

  – Location
  – Age
  – Clothing
  – What’s on the local syllabus
  – Books they’re carrying
  – Past interactions
  – …

• A computer program has 13 characters

Page 4

Diversion – improving the context?

• IP addr
  – ANU, DFAT, BHP, Nicholls Primary

• Search history
  – “spanish history”, “franco”, “gettysburg”

• Referrer
  – ANU Library, Wikipedia, MySpace

• Browser
  – Visually impaired user?

Page 5

Relevance ranking

“The Civil War”: more relevant if

• Occurs in Title/Subject/Author rather than notes/TOC; main Title/Author rather than added entry…
• Occurs as a phrase or near phrase rather than as scattered words
• Occurs as an exact match
• Occurs multiple times (especially the unusual words)
• Occurs as the only or main words (e.g., as the only subject rather than as 1 of 10)
• Is a collection level record
• Is widely held
• Is held by one of your libraries
• Is on the shelf at one of your libraries
• Is available online
• Is highly rated (sales/reviews) on Amazon or LibraryThing
• Is widely cited by other books or by credible web pages
• Is available for inexpensive purchase and quick delivery new or second hand
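The first few factors above (which field matched, phrase versus scattered words, exact match, main words) could be combined along the following lines. This is only a rough sketch: the field names, the weights and the functions `tokens`, `field_score` and `record_score` are assumptions made for illustration and are not taken from the prototype.

```python
import re

# Hypothetical weights: a match in title/subject/author counts for more
# than a match in notes or the table of contents.
FIELD_WEIGHTS = {"title": 4.0, "subject": 3.0, "author": 3.0,
                 "notes": 1.0, "toc": 1.0}


def tokens(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(query, value):
    """Score one field value against the query (illustrative numbers only)."""
    q, v = tokens(query), tokens(value)
    if not q or not v:
        return 0.0
    if q == v:
        return 3.0  # exact match: the query is the whole field
    n = len(q)
    if any(v[i:i + n] == q for i in range(len(v) - n + 1)):
        # Phrase match: scores higher when the phrase makes up more of the field.
        return 1.0 + n / len(v)
    # Scattered words: fraction of distinct query words present, weighted low.
    distinct = set(q)
    hits = sum(1 for word in distinct if word in v)
    return 0.5 * hits / len(distinct)


def record_score(query, record):
    """Sum weighted field scores; `record` maps field name -> list of strings."""
    total = 0.0
    for field, values in record.items():
        weight = FIELD_WEIGHTS.get(field, 0.0)
        total += weight * sum(field_score(query, value) for value in values)
    return total
```

With these assumed weights, a record titled “The Civil War” scores above one that only contains the phrase inside a longer title, which in turn scores above one where the three words are scattered.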

Page 6

Relevance Ranking

Two approaches

– TeraText Gateway
  • Issue a series of searches, one for each successive criterion
  • Very hard to incorporate non-binary factors (such as quality of phrase match, number of holdings, …)

– Lucene
  • Combine a “score” for each criterion with an innate “score” for each work
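The difference between the two approaches can be sketched in a few lines. This is not TeraText or Lucene code: `tiered_rank`, `combined_rank`, the record layout and the logarithmic holdings formula are all assumptions chosen to show why graded factors fit the second approach more naturally.

```python
import math


def tiered_rank(records, criteria):
    """Successive-search style: each criterion is a yes/no test, tried in order.

    A record is placed by the first criterion it passes; within a tier there
    is nothing left to order by.
    """
    ranked, seen = [], set()
    for passes in criteria:
        for rec in records:
            if rec["id"] not in seen and passes(rec):
                seen.add(rec["id"])
                ranked.append(rec["id"])
    return ranked


def combined_rank(records, match_score):
    """Score-combining style: per-query match score times an innate work score."""
    def innate(rec):
        # Hypothetical innate score that grows slowly with number of holdings.
        return 1.0 + math.log1p(rec["holdings"])

    ordered = sorted(records,
                     key=lambda rec: match_score(rec) * innate(rec),
                     reverse=True)
    return [rec["id"] for rec in ordered]
```

In the tiered version, two records that both match exactly stay in input order however widely each is held; in the combined version the holdings count can separate them.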

Page 7

Relevance Ranking

Example

http://ll01.nla.gov.au/

Page 8

Clustering

Relevance ranking only takes you so far

Relevant to what?
• English civil war
• US civil war
• Spanish civil war
• Angolan civil war
• The church and civil wars
• Post-colonial civil wars

Relevant to whom?
• Audience
• Date published
• Form
  – Picture book
  – Movie
  – Thesis…

Page 9

Clustering

Group results by various criteria

• Subjects (hierarchy or parts/facets)
• Material type/form
• Genre
• When published
• Audience
• Classification (Dewey, LC)
• Author
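Grouping by the criteria above amounts to counting, for each clustering field, how many results carry each value. A minimal sketch, assuming each result is a dictionary mapping a field name to a list of values (the function name `cluster_counts` and that layout are illustrative, not the prototype's):

```python
from collections import Counter, defaultdict


def cluster_counts(records, fields):
    """Count how many results fall under each value of each clustering field."""
    clusters = defaultdict(Counter)
    for rec in records:
        for field in fields:
            # A record may carry several values for one field (e.g. two
            # subjects); set() keeps a repeated value from counting twice.
            for value in set(rec.get(field, [])):
                clusters[field][value] += 1
    return clusters
```

Calling `cluster_counts(results, ["subject", "form"])["subject"].most_common()` would then list subjects from most to least frequent, ready to show beside the result list.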

Page 10

Extracting data from the MARC record for ranking and clustering

• What’s a “title”?
• Deriving ranking and clustering fields
  – Can we use LC/Dewey code names as “subjects”?
    http://ll01.nla.gov.au/search.jsp?topic=class%253A632%2BPlant%2Binjuries%252C%2Bdiseases%252C%2Bpests
  – Can we reliably set
    • “audience” based on 650 0 v Juvenile fiction
    • Genre: “percussion xylophone” based on 048 a pb01
    • Genre: “bibliography” and “technical report” based on 008 040308s2003    xraa     bt  f000 0 eng
    • Subject: “United States -- Florida” based on 043 a n-us-fl
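The four derivations asked about above might look like this in code. It is a sketch under several assumptions: the record is held as a plain dictionary rather than parsed with a MARC library, the lookup tables contain only the codes quoted on the slide, and the label “juvenile” and the function name `derive_fields` are invented for the example. Whether such rules are reliable across a whole catalogue is exactly the open question.

```python
# Hypothetical lookup tables holding only the codes quoted on the slide.
INSTRUMENT_CODES = {"pb": "percussion xylophone"}
GEOGRAPHIC_CODES = {"n-us-fl": "United States -- Florida"}
CONTENTS_CODES = {"b": "bibliography", "t": "technical report"}


def derive_fields(record):
    """Derive clustering fields from a record held as a plain dictionary.

    Assumed layout: record["008"] is the fixed-length 008 string and
    record["fields"] is a list of (tag, indicators, [(code, value), ...]).
    """
    derived = {"audience": set(), "genre": set(), "subject": set()}

    # 008 positions 24-27 hold nature-of-contents codes for book records.
    for code in record.get("008", "")[24:28]:
        if code in CONTENTS_CODES:
            derived["genre"].add(CONTENTS_CODES[code])

    for tag, indicators, subfields in record.get("fields", []):
        for code, value in subfields:
            if tag == "650" and indicators[1:2] == "0" and code == "v":
                # Form subdivision of a subject heading (second indicator 0).
                if value.strip(" .").lower() == "juvenile fiction":
                    derived["audience"].add("juvenile")
            elif tag == "048" and code == "a":
                # First two characters name the instrument; the rest is a count.
                name = INSTRUMENT_CODES.get(value[:2])
                if name:
                    derived["genre"].add(name)
            elif tag == "043" and code == "a":
                # Geographic area codes are padded with trailing hyphens.
                name = GEOGRAPHIC_CODES.get(value.rstrip("-"))
                if name:
                    derived["subject"].add(name)
    return derived
```

Keeping the code tables as data rather than logic means extending coverage is a matter of loading the full published code lists instead of changing the function.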

Page 11

Clustering

Example

http://ll01.nla.gov.au/

Page 12

Please Help

http://ll01.nla.gov.au/ is a prototype

• What do you like and dislike about it?

• How can it be improved?