Learning to rank fulltext results from clicks
![Page 1: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/1.jpg)
Learning to rank fulltext results from clicks
Tomáš Kramár
@tkramar @synopsitv
![Page 2: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/2.jpg)
Let's build a fulltext search engine.
Query → Find matches → Rank results
![Page 3: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/3.jpg)
Find matches:
● ElasticSearch
● LIKE %%
● ...
![Page 4: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/4.jpg)
Rank results:
● By number of hits
● By PageRank
● By date
● ...
![Page 5: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/5.jpg)
![Page 6: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/6.jpg)
How do you choose relevant results?
![Page 7: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/7.jpg)
| Feature | Document 1 | Document 2 |
| --- | --- | --- |
| Number of keywords in title | 2 | 2 |
| Number of keywords in text | 2 | 0 |
| Domain | careerjet.sk | vienna-rb.at |
| Category | Job search | Programming |
| Language | Slovak | English |
![Page 8: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/8.jpg)
| Document feature | How much I care about it (the higher, the more I care) |
| --- | --- |
| # keywords in title | 2.1 |
| # keywords in text | 1 |
| Domain is careerjet.sk | -2 |
| Domain is vienna-rb.at | 3.5 |
| Category is Job Search | -1 |
| Category is Programming | 4.2 |
| Language is Slovak | 0.9 |
| Language is English | 1.5 |
![Page 9: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/9.jpg)
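| Document feature | How much I care about it (u) | Document 1 (d) | Document 2 (d) |
| --- | --- | --- | --- |
| # keywords in title | 2.1 | 2 | 2 |
| # keywords in text | 1 | 2 | 0 |
| Domain is careerjet.sk | -2 | 1 | 0 |
| Domain is vienna-rb.at | 3.5 | 0 | 1 |
| Category is Job Search | -1 | 1 | 0 |
| Category is Programming | 4.2 | 0 | 1 |
| Language is Slovak | 0.9 | 1 | 0 |
| Language is English | 1.5 | 0 | 1 |
| rank = d . u | | = 4.1 | = 13.4 |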
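The rank on this slide is a plain dot product of a document's feature vector with the preference vector. A minimal sketch in Python, using the numbers from the slide; the names `rank`, `documents`, `scores` and the labels "document 1" and "document 2" are only illustrative and are not from the talk:

```python
def rank(d, u):
    """Dot product d . u: each feature value times how much the user cares about it."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))

# Preference weights u, in the same order as the feature rows on the slide.
u = [2.1, 1.0, -2.0, 3.5, -1.0, 4.2, 0.9, 1.5]

# Feature vectors d for the two example documents (illustrative labels).
documents = {
    "document 1": [2, 2, 1, 0, 1, 0, 1, 0],
    "document 2": [2, 0, 0, 1, 0, 1, 0, 1],
}

scores = {name: rank(d, u) for name, d in documents.items()}

# Highest rank first: this is the order in which results would be shown.
ordered = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ordered)
```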
![Page 10: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/10.jpg)
Rate each result on a scale 1-5.
![Page 11: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/11.jpg)
$$\text{rating} = d \cdot u = d_1 u_1 + d_2 u_2 + \dots + d_n u_n$$

$$\begin{aligned}
d_{1,1} u_1 + d_{1,2} u_2 + \dots + d_{1,n} u_n &= 3 \\
d_{2,1} u_1 + d_{2,2} u_2 + \dots + d_{2,n} u_n &= 5 \\
d_{3,1} u_1 + d_{3,2} u_2 + \dots + d_{3,n} u_n &= 1 \\
d_{4,1} u_1 + d_{4,2} u_2 + \dots + d_{4,n} u_n &= 3
\end{aligned}$$
![Page 12: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/12.jpg)
The $d_{i,j}$ are known: solve this system of equations and you have $u$. Done.
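To make "solve this system" concrete, here is a small sketch using Gaussian elimination on a square system. It assumes a toy setup that is not in the talk: two features (keywords in title, keywords in text), two rated documents, and the hypothetical helper name `solve`. The ratings 3 and 5 are borrowed from the first two equations.

```python
def solve(a, b):
    """Solve a . u = b for u by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("the system has no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, from the last unknown to the first.
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (m[r][n] - known) / m[r][r]
    return u

# Rows are documents, columns are features (assumed toy values).
d = [[2, 2],
     [2, 0]]
ratings = [3, 5]

u = solve(d, ratings)
print(u)
```

With more rated documents than features the system is usually overdetermined, and this exact approach stops working.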
![Page 13: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/13.jpg)
Except...
● You don't know the explicit ratings
● User preferences change over time
● Those equations probably don't have a solution
![Page 14: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/14.jpg)
Clicked! Assume rating 1.
Not clicked. Assume rating 0.
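As a sketch of this click-to-rating rule, assuming a click log is available as a list of shown results plus the subset that was clicked (the function name `ratings_from_clicks` and the result labels are made up for illustration):

```python
def ratings_from_clicks(shown, clicked):
    """Implicit ratings: 1 for a clicked result, 0 for a result that was shown but not clicked."""
    clicked = set(clicked)
    return {doc: 1 if doc in clicked else 0 for doc in shown}

shown = ["result 1", "result 2", "result 3"]
ratings = ratings_from_clicks(shown, clicked=["result 2"])
print(ratings)
```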
![Page 15: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/15.jpg)
Except..
● You don't know the explicit ratings
● User preferences change in time● Those equations probably don't
have solution
![Page 16: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/16.jpg)
Approximation function
$h(d): d \to \text{rank}$
$h(d) = d_1 u_1 + \dots + d_n u_n = \text{estimated\_rank}$
If the function is good, it should make minimal errors
$\text{error} = (\text{estimated\_rank} - \text{real\_rank})^2$
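A minimal sketch of the approximation function and its error for a single document. The preference values and the document vector below are assumed toy numbers with two features, and the real rank of 1 stands for "clicked":

```python
def h(d, u):
    """Approximation function: estimated rank of document d under preferences u."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))

def squared_error(estimated_rank, real_rank):
    """Squared difference between the estimated and the real rank."""
    return (estimated_rank - real_rank) ** 2

u = [2.1, 1.0]   # assumed preferences for two features
d = [2, 2]       # assumed document: 2 keywords in title, 2 in text
real_rank = 1    # the document was clicked

estimated = h(d, u)
error = squared_error(estimated, real_rank)
print(estimated, error)
```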
![Page 17: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/17.jpg)
Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
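The five steps above can be sketched as batch gradient descent in plain Python. This is one possible reading, not the talk's implementation: the toy documents, the target ranks, the learning rate `alpha`, the stopping `tolerance`, and all function names are assumptions chosen so that the example converges.

```python
def h(d, u):
    """Estimated rank of document d under preferences u."""
    return sum(d_i * u_i for d_i, u_i in zip(d, u))

def mean_square_error(u, docs, ranks):
    return sum((h(d, u) - r) ** 2 for d, r in zip(docs, ranks)) / len(docs)

def gradient_descent(docs, ranks, alpha=0.05, tolerance=1e-12, max_steps=10000):
    n = len(docs[0])
    u = [0.0] * n                                   # 1. arbitrary starting preferences
    previous = mean_square_error(u, docs, ranks)
    for _ in range(max_steps):
        gradient = [0.0] * n
        for d, r in zip(docs, ranks):
            residual = h(d, u) - r                  # 2. estimated rank vs. real rank
            for i in range(n):
                gradient[i] += 2 * residual * d[i] / len(docs)
        # 4. move against the gradient to reduce the error
        u = [u_i - alpha * g_i for u_i, g_i in zip(u, gradient)]
        current = mean_square_error(u, docs, ranks)  # 3. mean square error
        if abs(previous - current) < tolerance:      # 5. stop once the error settles
            break
        previous = current
    return u

# Toy data: two features per document; ranks generated from u = [0.5, 0.25].
docs = [[2, 2], [2, 0], [0, 1], [1, 3]]
ranks = [1.5, 1.0, 0.25, 1.25]

learned_u = gradient_descent(docs, ranks)
print(learned_u)
```

Because the toy ranks were generated from a known preference vector, the learned values can be compared against it directly; with real click data there is no such ground truth.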
![Page 18: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/18.jpg)
[Plot: cost function. x-axis: u for # of keywords in title; y-axis: mean square error]
![Page 19: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/19.jpg)
[Plot: cost function. x-axis: u for # of keywords in title; y-axis: mean square error]
Calculate the derivative of the cost function at this point and it will give you the direction to move in.
![Page 20: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/20.jpg)
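Preference update
$$u_i = u_i - \alpha \, \frac{\partial\, \text{error}}{\partial u_i}$$
$\alpha$: learning rate
$\frac{\partial\, \text{error}}{\partial u_i}$: partial derivative of the cost function (the error of h(d)) with respect to $u_i$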
![Page 21: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/21.jpg)
$\alpha$ (learning rate): how fast you move. Too low: slow progress. Too high: you will overshoot.
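The effect of the learning rate can be seen on a one-dimensional toy cost. The sketch below is not from the talk: it minimizes the assumed cost (u - 3)^2, whose minimum is at u = 3, with three assumed learning rates and a fixed number of steps.

```python
def descend(alpha, steps=20, start=0.0):
    """Run gradient descent on the toy cost (u - 3)^2 and return the final u."""
    u = start
    for _ in range(steps):
        gradient = 2 * (u - 3.0)   # derivative of (u - 3)^2
        u -= alpha * gradient
    return u

too_low = descend(alpha=0.001)   # barely moves away from the start
good = descend(alpha=0.3)        # ends up very close to the minimum at 3
too_high = descend(alpha=1.1)    # every step overshoots further than the last

print(too_low, good, too_high)
```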
![Page 22: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/22.jpg)
$\frac{\partial\, \text{error}}{\partial u_i}$ (partial derivative of the cost function): nothing scary. You can find these online for standard cost functions.
For mean square error, for a single document (the mean averages this over all documents):
$$\frac{\partial\, \text{error}}{\partial u_i} = 2\,(h(d) - \text{rank}(d))\, d_i$$
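As a worked check, the derivative for a single document can be obtained with the chain rule from the definitions $h(d) = d_1 u_1 + \dots + d_n u_n$ and $\text{error} = (h(d) - \text{rank}(d))^2$. Here $\text{rank}(d)$ is the real rank and does not depend on $u$:

$$\begin{aligned}
\frac{\partial\, \text{error}}{\partial u_i}
&= 2\,(h(d) - \text{rank}(d)) \cdot \frac{\partial h(d)}{\partial u_i} \\
&= 2\,(h(d) - \text{rank}(d)) \cdot d_i
\end{aligned}$$

The constant factor 2 is often absorbed into the learning rate $\alpha$, or removed by defining the cost with a factor of $\tfrac{1}{2}$.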
![Page 23: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/23.jpg)
Gradient descent
1. Set user preferences (u) to arbitrary values
2. Calculate the estimated rank h(d) for each document
3. Calculate the mean square error
4. Adjust preferences u in a way that minimizes the error
5. Repeat until the error converges
![Page 24: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/24.jpg)
Clicked! Assume rating 1.
Clicked! Assume rating 1. Or? Doesn't this mean result #1 is not relevant?
![Page 25: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/25.jpg)
Clicked! Assume nothing.
Clicked! Assume it is better than #2 and #3.
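One way to turn this rule into data is to emit a preference pair for every clicked result and each non-clicked result ranked above it. The sketch below assumes four results of which #1 and #4 were clicked; the function name and the labels `d1` to `d4` are hypothetical.

```python
def preferences_from_clicks(ranking, clicked):
    """Return (better, worse) pairs: a clicked result beats each skipped result above it."""
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:position]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs

ranking = ["d1", "d2", "d3", "d4"]   # results in the order they were shown
pairs = preferences_from_clicks(ranking, clicked=["d1", "d4"])
print(pairs)
```

The click on the top result produces no pair, which matches "assume nothing": there was nothing above it to skip.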
![Page 26: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/26.jpg)
What's changed?
We no longer have ratings, just document comparisons.
Cost function: something that considers ordering, e.g., Kendall's τ (based on the number of concordant and discordant pairs)
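For reference, a small sketch of Kendall's τ in its simplest form (often called tau-a, which ignores ties): concordant pairs minus discordant pairs, divided by the total number of pairs. The function name and the example scores are illustrative only.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same documents."""
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two documents to compare orderings")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        agreement = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if agreement > 0:
            concordant += 1     # both lists order this pair the same way
        elif agreement < 0:
            discordant += 1     # the lists disagree on this pair
    return (concordant - discordant) / (n * (n - 1) / 2)

same = kendall_tau([1, 2, 3, 4], [1, 2, 3, 4])
reversed_order = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
one_swap = kendall_tau([1, 2, 3, 4], [1, 2, 4, 3])
print(same, reversed_order, one_swap)
```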
h is now a function of 2 parameters: h(d1, d2). But you can just do d2 - d1 and learn on that.
d4 > d3
d4 > d2
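Learning on difference vectors can be sketched as follows. This swaps in a simple perceptron-style update rather than the gradient descent used earlier: whenever the current preferences fail to score the preferred document higher, u is nudged along the difference. The feature vectors for `d2`, `d3`, `d4` and all names below are assumed toy values.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def learn_from_preferences(preferences, alpha=0.1, max_epochs=100):
    """Learn u so that u . (better - worse) > 0 for each (better, worse) pair."""
    n = len(preferences[0][0])
    u = [0.0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for better, worse in preferences:
            difference = [b - w for b, w in zip(better, worse)]
            if dot(u, difference) <= 0:
                # The pair is ordered wrongly (or tied): move u toward the difference.
                u = [u_i + alpha * x for u_i, x in zip(u, difference)]
                mistakes += 1
        if mistakes == 0:
            break   # every preference pair is now ordered correctly
    return u

# Assumed toy feature vectors: [# keywords in title, # keywords in text].
d2, d3, d4 = [2, 0], [1, 1], [2, 2]
preferences = [(d4, d3), (d4, d2)]   # d4 > d3 and d4 > d2

u = learn_from_preferences(preferences)
print(u)
```

Once u is learned from pairs, a single document can still be scored with the same dot product as before, so ranking at query time is unchanged.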
![Page 27: Learning to rank fulltext results from clicks](https://reader034.fdocuments.us/reader034/viewer/2022051817/548e3607b479597a588b48e7/html5/thumbnails/27.jpg)