Relevance Ranking and Clustering
Small steps towards making the
library catalogue more useful
Kent Fitch, 16 Sep 2006
Motivation
Help people find what they’re looking for
The problem
• A reference librarian often has lots of context when someone walks up to them and says “The Civil War”
  – Location
  – Age
  – Clothing
  – What’s on the local syllabus
  – Books they’re carrying
  – Past interactions
  – …
• A computer program has 13 characters
Diversion – improving the context?
• IP address
  – ANU, DFAT, BHP, Nicholls Primary
• Search history
  – “spanish history”, “franco”, “gettysburg”
• Referrer
  – ANU Library, Wikipedia, MySpace
• Browser
  – Visually impaired user?
Relevance ranking
“The Civil War”: more relevant if
• Occurs in Title/Subject/Author rather than notes/TOC; main Title/Author rather than added entry…
• Occurs as a phrase or near phrase rather than as scattered words
• Occurs as an exact match
• Occurs multiple times (especially the unusual words)
• Occurs as the only or main words (e.g., as the only subject rather than as 1 of 10)
• Is a collection level record
• Is widely held
• Is held by one of your libraries
• Is on the shelf at one of your libraries
• Is available online
• Is highly rated (sales/reviews) on Amazon or LibraryThing
• Is widely cited by other books or by credible web pages
• Is available for inexpensive purchase and quick delivery, new or second hand
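A few of the textual criteria above (which field matched, and whether the match is exact, a phrase, or scattered words) can be sketched as a small scoring function. This is a minimal illustration only: the field names, the weights, and the 3/2/fraction scale are assumptions made for the example, not values from the prototype.

```python
# Hypothetical field weights: a match in title/subject/author counts for more
# than a match in notes or table of contents. The numbers are illustrative.
FIELD_WEIGHTS = {"title": 5.0, "subject": 4.0, "author": 4.0, "notes": 1.0, "toc": 1.0}


def tokens(text):
    """Lower-case a string and split it into words, ignoring punctuation."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def field_score(query, value):
    """Score one field: exact match > phrase match > scattered words."""
    q, v = tokens(query), tokens(value)
    if not q or not v:
        return 0.0
    if q == v:
        return 3.0  # the query is the whole field
    # Phrase match: the query words appear contiguously and in order.
    for i in range(len(v) - len(q) + 1):
        if v[i:i + len(q)] == q:
            return 2.0
    # Scattered words: fraction of distinct query words found anywhere.
    distinct = set(q)
    return sum(1 for word in distinct if word in v) / len(distinct)


def relevance(query, record):
    """Sum the weighted field scores over every field of a record (a dict)."""
    return sum(
        FIELD_WEIGHTS.get(field, 0.0) * field_score(query, value)
        for field, value in record.items()
    )
```

With these assumed weights, a record whose whole title is “The Civil War” outranks one that merely contains the phrase in a longer title, which in turn outranks one that only has the words scattered through a note.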
Relevance Ranking
Two approaches
– TeraText Gateway
  • Issue a series of searches, one for each successive criterion
  • Very hard to incorporate non-binary factors (such as quality of phrase match, number of holdings, …)
– Lucene
  • Combine a “score” for each criterion with an innate “score” for each work
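The difference between the two approaches can be sketched in a few lines. This is not TeraText or Lucene code: the record layout, the function names, and the use of a logarithm of the holdings count as the innate score are all assumptions chosen for illustration.

```python
import math


def tiered_rank(records, criteria):
    """Successive searches: everything matching the first criterion comes
    first, then everything matching the second, and so on. Each criterion is
    a yes/no predicate, so graded factors have nowhere to go."""
    ranked, seen = [], set()
    for criterion in criteria:
        for rec in records:
            if rec["id"] not in seen and criterion(rec):
                ranked.append(rec["id"])
                seen.add(rec["id"])
    return ranked


def combined_rank(records, match_score, holdings_weight=1.0):
    """Score-based ranking: add a query-dependent match score to a
    query-independent (innate) score for each work, then sort."""
    def total(rec):
        # Assumed innate score: grows slowly with the number of holdings.
        innate = holdings_weight * math.log1p(rec["holdings"])
        return match_score(rec) + innate

    return [rec["id"] for rec in sorted(records, key=total, reverse=True)]
```

In the tiered version two exact-title matches stay in whatever order the search returned them; in the combined version the more widely held work moves ahead, because the holdings count contributes to the total.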
Relevance Ranking
Example
http://ll01.nla.gov.au/
Clustering
Relevance ranking only takes you so far
Relevant to what?
• English civil war
• US civil war
• Spanish civil war
• Angolan civil war
• The church and civil wars
• Post-colonial civil wars
Relevant to whom?
• Audience
• Date published
• Form
  – Picture book
  – Movie
  – Thesis…
Clustering
Group results by various criteria
• Subjects (hierarchy or parts/facets)
• Material type/form
• Genre
• When published
• Audience
• Classification (Dewey, LC)
• Author
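Grouping a result set by these criteria amounts to counting how many hits carry each value of each field. A minimal sketch, assuming each result is a plain dict whose clustering fields hold either a single string or a list of strings (the field names are placeholders, not the prototype's schema):

```python
from collections import Counter, defaultdict


def cluster_counts(results, facets):
    """For each facet, count how many results carry each value.

    Returns {facet: [(value, count), ...]} with the largest clusters first.
    """
    counts = defaultdict(Counter)
    for rec in results:
        for facet in facets:
            values = rec.get(facet, [])
            if isinstance(values, str):
                values = [values]  # treat a single value as a one-item list
            for value in values:
                counts[facet][value] += 1
    return {facet: counts[facet].most_common() for facet in facets}
```

The counts can then be shown beside the ranked list, so that someone searching for “The Civil War” can narrow the results to one war, one form, or one audience.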
Extracting data from the MARC record for ranking and clustering
• What’s a “title”?
• Deriving ranking and clustering fields
  – Can we use LC/Dewey code names as “subjects”?
    http://ll01.nla.gov.au/search.jsp?topic=class%253A632%2BPlant%2Binjuries%252C%2Bdiseases%252C%2Bpests
  – Can we reliably set:
    Audience based on 650 0 v Juvenile fiction
    Genre: “percussion xylophone” based on 048 a pb01
    Genre: “bibliography” and “technical report” based on 008 040308s2003 xraa bt f000 0 eng
    Subject: “United States -- Florida” based on 043 a n-us-fl
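The derivations asked about above can be sketched as a lookup over a simplified record. Everything about the record layout here is an assumption: a dict with the 008 string and a list of (tag, subfields) pairs, not a real MARC parser. The code tables are tiny excerpts covering only the slide's examples. The 008 is assumed to be the full 40-character fixed field, with nature-of-contents codes for books at positions 24-27; the copy printed on the slide has its padding spaces collapsed, so it would need to be re-padded before use.

```python
# Illustrative excerpts only; a real system would load the full MARC code lists.
GEOGRAPHIC_AREAS = {"n-us-fl": "United States -- Florida"}
INSTRUMENTS = {"pb": "percussion xylophone"}
NATURE_OF_CONTENTS = {"b": "bibliography", "t": "technical report"}


def derive_fields(record):
    """Derive audience, genre and subject values from a simplified record.

    Assumed layout: {"008": str, "fields": [(tag, [(code, value), ...]), ...]}
    """
    derived = {"audience": set(), "genre": set(), "subject": set()}
    for tag, subfields in record.get("fields", []):
        for code, value in subfields:
            if tag == "650" and code == "v" and value.lower().startswith("juvenile"):
                derived["audience"].add("juvenile")
            elif tag == "048" and code == "a":
                # First two characters name the instrument; the rest is a count.
                name = INSTRUMENTS.get(value[:2])
                if name:
                    derived["genre"].add(name)
            elif tag == "043" and code == "a":
                # Area codes may be padded with trailing hyphens.
                name = GEOGRAPHIC_AREAS.get(value.rstrip("-"))
                if name:
                    derived["subject"].add(name)
    fixed = record.get("008", "")
    if len(fixed) >= 28:
        for code in fixed[24:28]:  # nature of contents (books)
            name = NATURE_OF_CONTENTS.get(code)
            if name:
                derived["genre"].add(name)
    return derived
```

Whether such derived values are reliable enough to cluster on is the open question the slide raises; the sketch only shows that the mapping itself is mechanical.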
Clustering
Example
http://ll01.nla.gov.au/
Please Help
http://ll01.nla.gov.au/ is a prototype
• What do you like and dislike about it?
• How can it be improved?